Lesson 1 - From Question to Hypothesis

Welcome to From Question to Hypothesis

Lumen’s team has an idea: “let’s make the signup page better.” It’s a fine instinct and a useless experiment. Better how? Measured by what? Better by how much before it’s worth shipping? An experiment can only answer a question that’s been sharpened to the point where data could prove it wrong. That sharpening — from a fuzzy goal to a precise, testable claim — is the first and most important design step, and it’s where a lot of real experiments quietly fail before they even run. This lesson is about doing it well.

By the end of this lesson, you will be able to:

  • Explain why a vague goal can’t be tested and a hypothesis can
  • Write a hypothesis that is specific, directional, and falsifiable
  • State the null and alternative hypotheses for an experiment
  • Recognize that the hypothesis, set up front, defines the decision the test makes

Let’s start with why “make signups better” isn’t something you can test.


Why a Vague Goal Can’t Be Tested

“Make signups better” fails as an experiment for three concrete reasons. It doesn’t say what change you’re making, so you don’t know what to build into the treatment. It doesn’t name a metric, so there’s nothing to measure. And it sets no threshold, so no result could ever count as failure — any wiggle upward could be spun as success. A claim that can’t fail can’t be tested; it can only be rationalized.

Compare it to this: “Lumen’s new signup page increases signup conversion rate by at least 2 percentage points.” Now there’s a specific change (the new page), a named metric (signup conversion rate), a direction (increases), and a threshold (at least 2 points). Every one of those is something an experiment can check. This is a hypothesis — a prediction precise enough to be measured and, crucially, precise enough to be wrong.

Figure: the vague goal “Make Lumen’s signups better” (marked not testable) is sharpened into the testable hypothesis “The new signup page increases signup conversion rate by at least 2 points.” Three callouts label its parts: Specific (one change, one named metric), Directional (predicts which way and by how much), and Falsifiable (data could prove it wrong).
Caption: a good hypothesis names the one change, the metric, the direction, and a threshold, the four things that make it something an experiment can actually answer.
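
One way to see the gap between the vague goal and the hypothesis is to write both down as data and list what each leaves unstated. The sketch below is only an illustration: the `Hypothesis` class, its field names, and `missing_parts` are hypothetical, not part of any Lumen tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    """Hypothetical record of the four parts a testable hypothesis names."""
    change: Optional[str] = None              # what the treatment does differently
    metric: Optional[str] = None              # the one primary metric
    direction: Optional[str] = None           # "increases" or "decreases"
    threshold_points: Optional[float] = None  # smallest lift worth shipping

    def missing_parts(self) -> List[str]:
        """Return the names of the parts that were left unstated."""
        parts = {
            "change": self.change,
            "metric": self.metric,
            "direction": self.direction,
            "threshold": self.threshold_points,
        }
        return [name for name, value in parts.items() if value is None]


# "Make signups better" names none of the four parts.
vague = Hypothesis()

# "The new signup page increases signup conversion rate by at least 2 points."
sharp = Hypothesis(
    change="new signup page",
    metric="signup conversion rate",
    direction="increases",
    threshold_points=2.0,
)

print(vague.missing_parts())  # every part is missing
print(sharp.missing_parts())  # nothing is missing
```

If any part comes back missing, there is something the experiment cannot check yet.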

The Three Marks of a Good Hypothesis

A testable hypothesis has three properties. Miss any one and the experiment gets harder to interpret:

  • Specific — it isolates one change and names one primary metric. “Redesign the page and change the copy and add testimonials” confounds three changes; if conversion moves, you won’t know which one did it. One change, one metric.
  • Directional — it predicts which way the metric moves, and ideally by how much. “The new page changes conversion” is technically testable but useless for a decision; “increases conversion by at least 2 points” tells you what result would justify shipping.
  • Falsifiable: there’s a result that would make you say “no, that’s wrong.” If the new page converts the same or worse, or lifts conversion by less than the 2 points you predicted, the hypothesis is refuted. A hypothesis you can’t imagine being wrong isn’t a hypothesis; it’s a wish.

The threshold (“at least 2 points”) deserves special attention: it’s what turns a direction into a decision rule. You’ll formalize that threshold as the minimum detectable effect later in this module — the smallest change worth caring about, which also drives how much data you’ll need.
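
As a rough illustration of the threshold acting as a decision rule, the sketch below compares an observed lift against it. The function name and the example rates are hypothetical, and the check deliberately ignores noise: whether a lift is distinguishable from chance is the job of the significance tests that come later.

```python
def meets_threshold(control_rate: float,
                    treatment_rate: float,
                    threshold_points: float = 2.0) -> bool:
    """Return True if the observed lift, in percentage points, reaches the threshold.

    Rates are fractions (0.10 means 10%). This looks only at the observed
    difference; it says nothing about statistical significance.
    """
    lift_points = (treatment_rate - control_rate) * 100
    return lift_points >= threshold_points


# Hypothetical results: the old page converts at 10%.
print(meets_threshold(0.10, 0.125))  # a 2.5-point lift clears the 2-point bar
print(meets_threshold(0.10, 0.11))   # a 1-point lift moves the right way but falls short
print(meets_threshold(0.10, 0.09))   # a drop is a clear miss
```

The middle case is the one a bare direction would miss: conversion went up, yet the hypothesis as written was not met.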


The Null and the Alternative

Statisticians frame a hypothesis as an explicit pair of competing claims, and the framing is worth adopting because it’s exactly what the test evaluates:

  • The null hypothesis (H₀) is the skeptical default: the change does nothing. For Lumen, “the new page’s true conversion rate equals the old page’s” — any difference you see is just noise.
  • The alternative hypothesis (H₁) is what you’re proposing: the change has an effect. “The new page’s true conversion rate is higher than the old page’s.”

An experiment doesn’t try to prove the alternative directly. It asks: if the null were true — if the change really did nothing — how surprising is the difference we observed? If a difference this large would be very unlikely under “no effect,” that’s evidence against the null and for your change. That’s the logic behind every significance test you’ll run later; the hypothesis you write now is what those tests are pointed at.
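
To make “how surprising under the null” concrete, here is a minimal sketch using one common choice, a one-sided pooled two-proportion z-test with a normal approximation. The lesson does not prescribe this particular test, and the function name and visitor counts are made up for illustration.

```python
import math


def one_sided_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Probability of a difference at least this large if H0 were true.

    H0: pages A and B share the same true conversion rate.
    H1: page B's true conversion rate is higher than page A's.
    Uses a pooled two-proportion z-test (normal approximation).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both arms share one rate, so estimate it from the pooled data.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical data: old page 500 signups out of 5,000 visitors (10%),
# new page 600 out of 5,000 (12%).
p = one_sided_p_value(500, 5000, 600, 5000)
print(f"p-value: {p:.4f}")  # well under 0.01 for these counts
```

A small p-value here means a gap this large would rarely show up if the new page did nothing. It does not by itself say the lift reaches the 2-point threshold.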

Write the hypothesis before you look at data

The hypothesis — and the threshold, metric, and direction in it — must be fixed before the experiment runs. It’s tempting to run a test, look at whatever moved, and write the hypothesis to match (“aha, it improved time-on-page!”). But a prediction made after seeing the data isn’t a prediction — with enough metrics, something always looks like it moved by chance. Deciding up front what you’re testing and what result would count as success is what keeps the experiment honest. You’ll see the statistical damage that “deciding after” does when we reach peeking and multiple comparisons.
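
A back-of-the-envelope calculation shows why “something always looks like it moved.” The sketch assumes each metric has a 5% chance of looking significant when nothing changed, and that the metrics are independent. Both are simplifying assumptions for illustration, and real metrics are usually correlated.

```python
def chance_of_false_win(num_metrics: int, alpha: float = 0.05) -> float:
    """Probability that at least one metric looks significant purely by chance.

    Assumes no metric truly moved, each has false-positive rate `alpha`,
    and the metrics are independent of one another.
    """
    return 1 - (1 - alpha) ** num_metrics


for k in (1, 5, 20):
    print(f"{k:>2} metrics: {chance_of_false_win(k):.1%} chance of a spurious 'win'")
```

With one pre-chosen metric the risk stays at 5%. Scan twenty after the fact and, under these assumptions, a spurious “win” is more likely than not.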


Practice Exercises

Exercise 1: Sharpen the goal

Turn this vague goal into a testable hypothesis: “Lumen’s onboarding should be more engaging.” Assume the change is adding a progress bar, and the metric is the share of new users who finish onboarding.

Hint

Something like: “Adding a progress bar to onboarding increases the onboarding-completion rate by at least 3 percentage points.” It’s specific (one change: the progress bar; one metric: completion rate), directional (increases, by ≥3 points), and falsifiable (if completion drops, stays flat, or rises by less than 3 points, it’s refuted). The exact threshold is a judgment call, but naming one is what makes it a decision.

Exercise 2: What’s wrong with this one?

Critique this hypothesis: “The redesigned checkout — new layout, new copy, and one-click pay — will improve the business.”

Hint

Two problems. It’s not specific: three changes are bundled, so if a metric moves you can’t tell which change caused it (a confounded treatment). And “improve the business” names no measurable metric and no threshold, so it’s not falsifiable — there’s no result that would count as failure. Fix it by isolating one change and naming one primary metric with a direction and threshold.

Exercise 3: State the null

For the hypothesis “the new signup page increases conversion,” write the null hypothesis, and explain what the experiment actually tests.

Hint

The null (H₀) is “the new page’s true conversion rate equals the old page’s — the change has no effect.” The experiment tests how surprising the observed difference would be if the null were true: a difference too large to be plausible under “no effect” is evidence against the null and for the alternative. The test never proves the change works; it measures the evidence against the change doing nothing.


Summary

An experiment can only answer a question sharp enough to be wrong, so the first design step is turning a vague goal into a testable hypothesis. A vague goal (“make signups better”) names no change, no metric, and no threshold, so no result could ever count as failure. A good hypothesis is specific (one change, one primary metric), directional (which way, and by how much), and falsifiable (some result would refute it) — for example, “the new signup page increases conversion by at least 2 points.” Statisticians frame this as a null hypothesis (the change does nothing) versus an alternative (it has an effect); the experiment measures how surprising the observed difference would be if the null were true. And all of it is fixed before the data arrives, which is what keeps the test honest.

Key Concepts

  • Testable hypothesis — a prediction precise enough to be measured and to be proven wrong.
  • Specific, directional, falsifiable — the three marks of a hypothesis an experiment can answer.
  • Null vs. alternative — H₀ says the change does nothing; H₁ says it has an effect.
  • Fixed up front — writing the hypothesis before seeing data prevents fitting a story to noise.

Why This Matters

Most experiments that “didn’t work out” were doomed at this step: a fuzzy goal, a bundled change, or a metric chosen after the fact. Learning to write a sharp hypothesis is what makes everything downstream — choosing metrics, sizing the test, running the analysis — even possible, because each of those needs to know exactly what you’re testing. It’s also the cheapest quality lever you have: a few minutes of precision here saves weeks of an experiment that can’t answer anything. Next, you’ll choose the metrics that hypothesis points to — the one primary metric your decision hinges on, and the guardrails that protect everything else.


Next Steps

Continue to Lesson 2 - Choosing Your Metrics

Pick the one primary metric the decision hinges on, plus guardrails and secondary metrics for context.


Continue Building Your Skills

You can now turn a vague goal into a hypothesis an experiment can actually answer — specific, directional, falsifiable, and framed as a null versus an alternative, all fixed before any data arrives. Next you’ll choose the metrics that hypothesis points at: the single primary metric your decision depends on, the guardrails that must not regress, and the secondary metrics that help you understand what happened.