Lesson 1 - Why We Run Experiments

Welcome to Why We Run Experiments

Here’s a question every product team faces: Lumen, an e-learning app, ships a new signup page, and the people who saw it converted at a much higher rate than those who didn’t. Did the new page cause more signups? It’s tempting to say yes — the numbers are right there. But if the people who saw the new page were different to begin with — more motivated, further along, more likely to sign up anyway — then the gap might have nothing to do with the page. This is the trap that experiments exist to escape. The whole reason we run an A/B test, rather than just reading our dashboards, is to earn the right to say a change caused a result. This lesson is about why that’s hard, and why randomization is the way out.

By the end of this lesson, you will be able to:

Explain why correlation isn’t causation, in the language of confounders
Describe what a confounder is and how it biases an observational comparison
See, in a simulation, an effect overstated four-fold — and then corrected
State why random assignment is what licenses a causal claim

Let’s start with the comparison that lies.

Correlation Isn’t Causation

Two things being associated — moving together — doesn’t mean one caused the other. The classic reason is a confounder: a third variable that influences both, creating an association that isn’t causal. Ice cream sales and drowning deaths rise together, but neither causes the other; hot weather drives both.
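To make the ice cream example concrete, here is a minimal sketch with invented numbers. The temperature range, the coefficients, and the `residual` helper are all assumptions for illustration, not real data. Both outcomes are generated from temperature alone, so any correlation between them comes entirely from the confounder.

```python
import numpy as np

rng = np.random.default_rng(0)
days = 3650                                    # about ten years of daily data
temperature = rng.uniform(0, 35, days)         # made-up daily high, in °C

# Made-up coefficients: each outcome depends on temperature, never on the other.
ice_cream_sales = 50 + 10 * temperature + rng.normal(0, 40, days)
drownings = rng.poisson(0.2 + 0.1 * temperature)

raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residual(y, x):
    """Remove the straight-line trend of y on x, leaving what x cannot explain."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlate what is left of each series once temperature is accounted for.
partial_corr = np.corrcoef(residual(ice_cream_sales, temperature),
                           residual(drownings, temperature))[0, 1]
print(f"raw correlation: {raw_corr:.2f}   after removing temperature: {partial_corr:.2f}")
```

The raw correlation should come out clearly positive, while the correlation of the residuals should sit near zero: once temperature is accounted for, nothing links the two series.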

For Lumen, the lurking confounder is motivation. Highly motivated learners are more likely to seek out and try the new signup page — and they’re also more likely to sign up no matter what page they see. So if we just compare “people who used the new page” against “people who didn’t,” motivation is baked into the comparison. The new-page group is stacked with motivated people, so it converts more — and we might credit the page for what motivation did.

Figure: two panels comparing observational and randomized assignment. Left, "Observational (self-selected)": the hidden confounder Motivation points to both "Sees new page" and "Converts", the direct link between those two is labeled spurious, and the measured gap is 0.126 against a true effect of only 0.030. Right, "Randomized (coin flip)": Motivation is still present, but its link to group assignment is crossed out because a coin flip assigns the page; motivation is balanced across the groups and the measured gap is 0.030, matching the true effect. Caption: random assignment cuts the link from the confounder to the group, so the only systematic difference left is the change itself.

Watch the Comparison Lie

This isn’t hand-waving — we can build it and watch it happen. We’ll simulate 20,000 users, each with a hidden motivation score. In the observational world, motivated users self-select into the new page, and motivation strongly drives conversion. The new page has a small real effect of 0.03 (three extra percentage points of conversion), but motivation’s influence is much larger.

import numpy as np

rng = np.random.default_rng(42)
n = 20000
motivation = rng.random(n)            # hidden confounder in [0, 1]
TRUE_EFFECT = 0.03                     # the real lift the new page adds

# Observational: motivated users SELF-SELECT into the new page
in_treat = rng.random(n) < motivation             # higher motivation -> more likely to try it
base = 0.05 + 0.30 * motivation                   # motivation drives conversion hard
convert = rng.random(n) < (base + TRUE_EFFECT * in_treat)

rate_new = convert[in_treat].mean()
rate_old = convert[~in_treat].mean()
print(f"new page: {rate_new:.3f}   old page: {rate_old:.3f}   gap: {rate_new - rate_old:.3f}")
print(f"avg motivation — new: {motivation[in_treat].mean():.3f}  old: {motivation[~in_treat].mean():.3f}")

Running it:

new page: 0.276   old page: 0.150   gap: 0.126
avg motivation — new: 0.667  old: 0.334

The gap is 0.126 — more than four times the true effect of 0.03. And the tell is right there in the second line: the new-page group has average motivation 0.667 versus 0.334 for the old page. The groups weren’t the same to begin with, so the comparison is measuring motivation plus page, not the page. An observational comparison can’t separate the two.
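We can check where that gap comes from with a little arithmetic. Because conversion probability in the simulation is linear in motivation, the expected gap splits into a motivation part (the 0.30 slope times the difference in average motivation) plus the page's true effect. The sketch below just plugs in the numbers printed above; the variable names are ours.

```python
MOTIVATION_SLOPE = 0.30        # from base = 0.05 + 0.30 * motivation
TRUE_EFFECT = 0.03             # the real lift the new page adds

mean_motivation_new = 0.667    # average motivation printed for the new-page group
mean_motivation_old = 0.334    # average motivation printed for the old-page group

# Part of the gap that exists only because the groups differ in motivation
motivation_part = MOTIVATION_SLOPE * (mean_motivation_new - mean_motivation_old)
expected_gap = motivation_part + TRUE_EFFECT

print(f"from motivation: {motivation_part:.3f}")
print(f"from the page:   {TRUE_EFFECT:.3f}")
print(f"expected gap:    {expected_gap:.3f}")
```

This gives about 0.100 from motivation plus 0.030 from the page, for an expected gap of about 0.130. The observed 0.126 is that same quantity with a bit of sampling noise, so roughly three quarters of what the dashboard shows is motivation, not the page.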

Randomization Breaks the Confounder

Now change one thing: instead of letting users self-select, we assign the page at random — a coin flip, independent of motivation. Everything else stays the same.

# Randomized: assignment is a coin flip, independent of motivation
in_treat = rng.random(n) < 0.50
convert = rng.random(n) < (0.05 + 0.30 * motivation + TRUE_EFFECT * in_treat)

rate_new = convert[in_treat].mean()
rate_old = convert[~in_treat].mean()
print(f"new page: {rate_new:.3f}   old page: {rate_old:.3f}   gap: {rate_new - rate_old:.3f}")
print(f"avg motivation — new: {motivation[in_treat].mean():.3f}  old: {motivation[~in_treat].mean():.3f}")

Running it:

new page: 0.233   old page: 0.203   gap: 0.030
avg motivation — new: 0.500  old: 0.500

The gap is now 0.030, matching the true effect to three decimal places. (A different seed would land slightly above or below 0.030 because of sampling noise, but not systematically off in one direction.) And average motivation is 0.500 in both groups: the coin flip spread motivated and unmotivated users evenly across the two pages. Motivation still affects conversion, but it no longer differs between the groups, so it can’t distort the comparison. The only systematic difference left between new and old is the thing we changed, the page, so the difference we measure can be attributed to it.
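A single run can land on the truth partly by luck, so it is worth rerunning the randomized experiment many times. The sketch below wraps the same simulation in a hypothetical helper, `randomized_gap`, and repeats it over 200 seeds; the number of repetitions is an arbitrary choice.

```python
import numpy as np

TRUE_EFFECT = 0.03

def randomized_gap(seed, n=20000):
    """Run one randomized experiment and return new-page rate minus old-page rate."""
    rng = np.random.default_rng(seed)
    motivation = rng.random(n)                      # hidden confounder in [0, 1]
    in_treat = rng.random(n) < 0.50                 # coin flip, independent of motivation
    convert = rng.random(n) < (0.05 + 0.30 * motivation + TRUE_EFFECT * in_treat)
    return convert[in_treat].mean() - convert[~in_treat].mean()

gaps = np.array([randomized_gap(seed) for seed in range(200)])
print(f"mean gap: {gaps.mean():.3f}   std of gaps: {gaps.std():.3f}")
```

Individual runs should scatter around 0.03 by something on the order of half a percentage point, while their average sits very close to 0.03. That is what unbiased means here: any one experiment is noisy, but the noise is not pushed in a particular direction.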

Randomization balances what you didn’t measure

The magic of random assignment isn’t that it controls for motivation specifically; we didn’t tell it about motivation at all. Because the coin flip is independent of everything about the user, it balances motivation and every other variable on average, including confounders you’ve never thought of and couldn’t measure. That’s the unique power an experiment has over any observational analysis: you can only statistically “control for” confounders you know about and recorded, but randomization handles the unknown ones for free. It’s why the randomized gap landed on the truth without us doing anything clever.
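Here is a small sketch of that claim. The traits below (`prior_courses`, `on_mobile`, `account_age_days`) and their distributions are invented for illustration; the point is only that the coin flip never looks at any of them, yet all of them end up nearly equal across the two groups.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000

# Hypothetical user traits that the analyst never recorded
hidden = {
    "motivation": rng.random(n),
    "prior_courses": rng.poisson(2.0, n),
    "on_mobile": rng.random(n) < 0.6,
    "account_age_days": rng.exponential(90.0, n),
}

in_treat = rng.random(n) < 0.50     # assignment ignores every trait above

std_diffs = {}
for name, values in hidden.items():
    values = values.astype(float)
    diff = values[in_treat].mean() - values[~in_treat].mean()
    std_diffs[name] = diff / values.std()   # difference in units of the trait's spread
    print(f"{name:>18}: standardized difference = {std_diffs[name]:+.3f}")
```

Every standardized difference should be close to zero. Swapping in self-selection (for example `in_treat = rng.random(n) < hidden["motivation"]`) would leave the other traits balanced only because, in this toy setup, they happen to be independent of motivation; in real data that is not something you can count on.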

Practice Exercises

Exercise 1: Spot the confounder

A gym finds that members who use its app lose more weight than members who don’t, and concludes the app causes weight loss. What confounder might explain the gap without the app causing anything?

Hint

Motivation (or commitment) again: more dedicated members are both more likely to use the app and more likely to lose weight through diet and exercise regardless of the app. The app-users and non-users differ to begin with, so the comparison is confounded. Only randomly assigning some members to use the app would isolate its effect.

Exercise 2: Why did the gap shrink?

In the simulation, the same TRUE_EFFECT = 0.03 and the same motivation-driven conversion were used both times. Why did the measured gap fall from 0.126 to 0.030 just by changing how users were assigned?

Hint

Because assignment changed from self-selected (correlated with motivation) to random (independent of motivation). In the first case the new-page group had far higher average motivation (0.667 vs 0.334), so the gap included motivation’s effect. Random assignment equalized motivation across groups (0.500 vs 0.500), leaving only the page’s true 0.03 effect.

Exercise 3: What does randomization license?

Why can you make a causal claim (“the new page caused higher conversion”) from a randomized experiment but not from an observational comparison, even a big one?

Hint

In a randomized experiment the groups are the same on average on every variable, measured or not, so the only systematic difference is the treatment, and any difference in the outcome beyond chance variation must be caused by it. An observational comparison can differ on unmeasured confounders no matter how large it is, so a difference in the outcome could always be caused by something you didn’t account for. Sample size doesn’t fix confounding; randomization does.
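To see that last point in numbers, here is a sketch that reruns the self-selected comparison from the lesson at growing sample sizes. The helper name `observational_gap` and the particular sizes are our own choices.

```python
import numpy as np

TRUE_EFFECT = 0.03

def observational_gap(n, seed=0):
    """Self-selected comparison: return new-page rate minus old-page rate."""
    rng = np.random.default_rng(seed)
    motivation = rng.random(n)                      # hidden confounder in [0, 1]
    in_treat = rng.random(n) < motivation           # motivated users opt in more often
    convert = rng.random(n) < (0.05 + 0.30 * motivation + TRUE_EFFECT * in_treat)
    return convert[in_treat].mean() - convert[~in_treat].mean()

gaps_by_n = {n: observational_gap(n) for n in (2_000, 20_000, 200_000, 2_000_000)}
for n, gap in gaps_by_n.items():
    print(f"n = {n:>9,}   observational gap: {gap:.3f}   (true effect: {TRUE_EFFECT})")
```

More data makes the estimate more stable, but it stabilizes around roughly 0.13, not 0.03. A bigger sample shrinks noise; it does nothing about bias.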

Summary

We run experiments to earn causal claims, which plain observation can’t support because of confounders: third variables that influence both the treatment and the outcome and create associations that aren’t causal. In the simulation, motivation drove both which page users chose and whether they converted, so the observational gap (0.126) overstated the true effect (0.030) more than four-fold, with the new-page group’s average motivation (0.667) far above the old page’s (0.334). Random assignment fixed it: a coin flip independent of motivation balanced it across groups (0.500 each), and the measured gap fell to 0.030, matching the true effect up to sampling noise. Randomization’s superpower is that it balances every variable on average, including confounders you never measured, which is precisely what licenses a causal claim.

Key Concepts

Correlation ≠ causation: an association can come from a confounder rather than from a causal link.
Confounder: a variable that affects both treatment and outcome, biasing the comparison.
Self-selection bias: when users choose their own treatment and that choice is tied to factors that also drive the outcome, the measured gap is distorted.
Randomization: assigning treatment by chance balances all variables across groups on average, licensing causal claims.

Why This Matters

Almost every “insight” pulled from raw product data is a confounded comparison waiting to mislead — users who used a feature vs. those who didn’t, customers who got an email vs. those who didn’t. Knowing why those comparisons can’t establish cause, and why a randomized experiment can, is the foundation everything else in this course builds on. It’s also what stops you from shipping a change because a confounded dashboard moved. Next, you’ll formalize the experiment: the control and treatment groups that turn this logic into a repeatable design.

Next Steps

Continue to Lesson 2 - Control and Treatment Groups

Turn the logic into a design: the control and treatment groups that make up the A/B frame.

Back to Module Overview

Return to The Logic of Experiments module overview

Continue Building Your Skills

You’ve seen why observation alone can’t establish cause — a confounder like motivation quietly biases the comparison — and why randomization is the escape hatch, balancing every variable across groups so a measured difference points to the change. Next you’ll formalize this into the A/B frame: control and treatment groups, and the randomization unit that makes the split fair.
