Lesson 1 - Why We Run Experiments
Welcome to Why We Run Experiments
Here’s a question every product team faces: Lumen, an e-learning app, ships a new signup page, and the people who saw it converted at a much higher rate than those who didn’t. Did the new page cause more signups? It’s tempting to say yes — the numbers are right there. But if the people who saw the new page were different to begin with — more motivated, further along, more likely to sign up anyway — then the gap might have nothing to do with the page. This is the trap that experiments exist to escape. The whole reason we run an A/B test, rather than just reading our dashboards, is to earn the right to say a change caused a result. This lesson is about why that’s hard, and why randomization is the way out.
By the end of this lesson, you will be able to:
- Explain why correlation isn’t causation, in the language of confounders
- Describe what a confounder is and how it biases an observational comparison
- See, in a simulation, an effect overstated four-fold — and then corrected
- State why random assignment is what licenses a causal claim
Let’s start with the comparison that lies.
Correlation Isn’t Causation
Two things being associated — moving together — doesn’t mean one caused the other. The classic reason is a confounder: a third variable that influences both, creating an association that isn’t causal. Ice cream sales and drowning deaths rise together, but neither causes the other; hot weather drives both.
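The ice cream example can be reproduced with made-up numbers. In this minimal sketch, all coefficients are hypothetical. Temperature drives both series and neither series affects the other. The raw correlation comes out strongly positive. After removing temperature's linear influence from each series (a simple partial correlation), it falls to roughly zero.

```python
import numpy as np

# Hypothetical numbers: temperature drives BOTH series; neither series affects the other.
rng = np.random.default_rng(0)
days = 5000
temperature = rng.normal(25, 5, days)                      # the confounder (degrees C)
ice_cream = 10 * temperature + rng.normal(0, 20, days)    # sales depend only on temperature
drownings = 0.3 * temperature + rng.normal(0, 1.5, days)  # drownings depend only on temperature


def residualize(y, x):
    """Return what is left of y after removing a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


raw_corr = np.corrcoef(ice_cream, drownings)[0, 1]
partial_corr = np.corrcoef(
    residualize(ice_cream, temperature),
    residualize(drownings, temperature),
)[0, 1]

print(f"raw correlation:             {raw_corr:.2f}")
print(f"with temperature held fixed: {partial_corr:.2f}")
```

Residualizing on a straight-line fit is only one way to hold a variable fixed. It works here because we know temperature is the confounder and we recorded it.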
For Lumen, the lurking confounder is motivation. Highly motivated learners are more likely to seek out and try the new signup page — and they’re also more likely to sign up no matter what page they see. So if we just compare “people who used the new page” against “people who didn’t,” motivation is baked into the comparison. The new-page group is stacked with motivated people, so it converts more — and we might credit the page for what motivation did.
Watch the Comparison Lie
This isn’t hand-waving — we can build it and watch it happen. We’ll simulate 20,000 users, each with a hidden motivation score. In the observational world, motivated users self-select into the new page, and motivation strongly drives conversion. The new page has a small real effect of 0.03 (three extra percentage points of conversion), but motivation’s influence is much larger.
import numpy as np
rng = np.random.default_rng(42)
n = 20000
motivation = rng.random(n) # hidden confounder in [0, 1]
TRUE_EFFECT = 0.03 # the real lift the new page adds
# Observational: motivated users SELF-SELECT into the new page
in_treat = rng.random(n) < motivation # higher motivation -> more likely to try it
base = 0.05 + 0.30 * motivation # motivation drives conversion hard
convert = rng.random(n) < (base + TRUE_EFFECT * in_treat)
rate_new = convert[in_treat].mean()
rate_old = convert[~in_treat].mean()
print(f"new page: {rate_new:.3f} old page: {rate_old:.3f} gap: {rate_new - rate_old:.3f}")
print(f"avg motivation — new: {motivation[in_treat].mean():.3f} old: {motivation[~in_treat].mean():.3f}")
Running it:
new page: 0.276 old page: 0.150 gap: 0.126
avg motivation — new: 0.667 old: 0.334
The gap is 0.126, more than four times the true effect of 0.03. The tell is right there in the second line: the new-page group has average motivation 0.667 versus 0.334 for the old page. The groups weren't the same to begin with, so the comparison measures motivation plus page, not the page alone. An observational comparison can't separate the two.
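The observational numbers can be checked by hand. Assume motivation m is uniform on [0, 1] and a user tries the new page with probability m, which is what `rng.random(n) < motivation` does. Under that assumption, the expected motivation in each group and the expected gap work out as follows:

```latex
% Motivation m ~ Uniform(0, 1); self-selection gives P(T = 1 | m) = m
\mathbb{E}[m \mid T=1] = \frac{\int_0^1 m \cdot m \, dm}{\int_0^1 m \, dm} = \frac{1/3}{1/2} = \tfrac{2}{3},
\qquad
\mathbb{E}[m \mid T=0] = \frac{\int_0^1 m(1-m) \, dm}{\int_0^1 (1-m) \, dm} = \frac{1/6}{1/2} = \tfrac{1}{3}

% Conversion probability is 0.05 + 0.30 m + 0.03 T, so the expected gap is
\mathbb{E}[\text{gap}] = 0.03 + 0.30\left(\tfrac{2}{3} - \tfrac{1}{3}\right) = 0.03 + 0.10 = 0.13
```

The simulated 0.667, 0.334, and 0.126 sit close to these values, and the leftover differences are sampling noise. The extra 0.10 is pure confounding: motivation's 0.30 slope times the 1/3 motivation gap between the groups.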
Randomization Breaks the Confounder
Now change one thing: instead of letting users self-select, we assign the page at random — a coin flip, independent of motivation. Everything else stays the same.
# Randomized: assignment is a coin flip, independent of motivation
# (continues from the block above: reuses rng, n, motivation, and TRUE_EFFECT)
in_treat = rng.random(n) < 0.50
convert = rng.random(n) < (0.05 + 0.30 * motivation + TRUE_EFFECT * in_treat)
rate_new = convert[in_treat].mean()
rate_old = convert[~in_treat].mean()
print(f"new page: {rate_new:.3f} old page: {rate_old:.3f} gap: {rate_new - rate_old:.3f}")
print(f"avg motivation — new: {motivation[in_treat].mean():.3f} old: {motivation[~in_treat].mean():.3f}")
Running it:
new page: 0.233 old page: 0.203 gap: 0.030
avg motivation — new: 0.500 old: 0.500
The gap is now 0.030, on the true effect to three decimal places. A different random seed would land close to 0.03 rather than exactly on it. Motivation is 0.500 in both groups: the coin flip spread motivated and unmotivated users evenly across the two pages. Motivation still affects conversion, but it no longer differs systematically between the groups, so it can't distort the comparison. The only systematic difference left between new and old is the thing we changed, the page. Apart from chance variation, the difference we measure is caused by it.
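How close should a single randomized run land? This minimal sketch repeats the lesson's randomized design with 500 different seeds, where the seed count is an arbitrary choice. The estimates center on 0.03, with a run-to-run spread on the order of half a percentage point at n = 20,000. Randomization removes the bias, but it does not remove sampling noise.

```python
import numpy as np

TRUE_EFFECT = 0.03
n = 20000


def randomized_gap(seed):
    """One run of the lesson's randomized experiment, using a fresh seed."""
    rng = np.random.default_rng(seed)
    motivation = rng.random(n)                      # hidden confounder in [0, 1]
    in_treat = rng.random(n) < 0.50                 # coin-flip assignment
    convert = rng.random(n) < (0.05 + 0.30 * motivation + TRUE_EFFECT * in_treat)
    return convert[in_treat].mean() - convert[~in_treat].mean()


gaps = np.array([randomized_gap(seed) for seed in range(500)])
print(f"mean gap over 500 runs: {gaps.mean():.4f}")
print(f"spread (std) of gaps:   {gaps.std():.4f}")
print(f"middle 95% of gaps:     {np.percentile(gaps, 2.5):.3f} to {np.percentile(gaps, 97.5):.3f}")
```

The spread shrinks roughly with the square root of the sample size. Deciding how many users an experiment needs is largely about making this spread small compared with the effect you hope to detect.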
Randomization balances what you didn’t measure
The magic of random assignment isn't that it controls for motivation specifically. We never told it about motivation at all. Because the coin flip ignores every user trait, it balances motivation and every other variable on average, including confounders you've never thought of and couldn't measure. That's the unique power an experiment has over any observational analysis: you can statistically “control for” only the confounders you know about and recorded, but randomization handles the unknown ones for free. It's why the randomized gap landed on the true effect without us doing anything clever. Any imbalance that remains between the groups is due to chance, and it shrinks as the sample grows.
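To see the difference between measured and unmeasured confounders, this sketch uses two confounders. The first is motivation, which we assume was recorded. The second is a hypothetical `experience` trait that nobody recorded. The sketch controls for motivation by averaging the gap within motivation deciles, a simple stratification approach chosen for illustration. That adjustment removes only part of the bias, while the coin-flip design recovers the effect. The 0.20 slopes differ from the lesson's and are made up.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
TRUE_EFFECT = 0.03

motivation = rng.random(n)  # confounder we assume was recorded
experience = rng.random(n)  # hypothetical confounder nobody recorded


def convert_prob(in_treat):
    """Both traits raise conversion (made-up 0.20 slopes); the page adds TRUE_EFFECT."""
    return 0.05 + 0.20 * motivation + 0.20 * experience + TRUE_EFFECT * in_treat


def stratified_gap(in_treat, convert, strata=10):
    """New-vs-old gap averaged within motivation bins: adjusts for motivation only."""
    bins = np.minimum((motivation * strata).astype(int), strata - 1)
    gaps, sizes = [], []
    for b in range(strata):
        in_bin = bins == b
        treated, control = in_bin & in_treat, in_bin & ~in_treat
        gaps.append(convert[treated].mean() - convert[control].mean())
        sizes.append(in_bin.sum())
    return np.average(gaps, weights=sizes)


# Observational: the chance of trying the new page rises with BOTH traits
obs_treat = rng.random(n) < (motivation + experience) / 2
obs_convert = rng.random(n) < convert_prob(obs_treat)
obs_raw = obs_convert[obs_treat].mean() - obs_convert[~obs_treat].mean()
obs_adjusted = stratified_gap(obs_treat, obs_convert)

# Randomized: a coin flip, independent of both traits
rct_treat = rng.random(n) < 0.5
rct_convert = rng.random(n) < convert_prob(rct_treat)
rct_gap = rct_convert[rct_treat].mean() - rct_convert[~rct_treat].mean()

print(f"observational, raw:                   {obs_raw:.3f}")
print(f"observational, adjusted (motivation): {obs_adjusted:.3f}")
print(f"randomized:                           {rct_gap:.3f}   true effect: {TRUE_EFFECT}")
```

With these made-up settings, the adjusted observational gap should still sit well above 0.03. The unrecorded trait still differs between the self-selected groups, and no amount of adjusting for motivation can see it.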
Practice Exercises
Exercise 1: Spot the confounder
A gym finds that members who use its app lose more weight than members who don’t, and concludes the app causes weight loss. What confounder might explain the gap without the app causing anything?
Hint
Motivation (or commitment) again: more dedicated members are both more likely to use the app and more likely to lose weight through diet and exercise regardless of the app. The app-users and non-users differ to begin with, so the comparison is confounded. Only randomly assigning some members to use the app would isolate its effect.
Exercise 2: Why did the gap shrink?
In the simulation, the same TRUE_EFFECT = 0.03 and the same motivation-driven conversion were used both times. Why did the measured gap fall from 0.126 to 0.030 just by changing how users were assigned?
Hint
Because assignment changed from self-selected (correlated with motivation) to random (independent of motivation). In the first case the new-page group had far higher average motivation (0.667 vs 0.334), so the gap included motivation’s effect. Random assignment equalized motivation across groups (0.500 vs 0.500), leaving only the page’s true 0.03 effect.
Exercise 3: What does randomization license?
Why can you make a causal claim (“the new page caused higher conversion”) from a randomized experiment but not from an observational comparison, even a big one?
Hint
In a randomized experiment the groups are the same on average on every variable, measured or not. The only systematic difference between them is the treatment, so any difference in the outcome beyond chance variation must be caused by it. An observational comparison can differ on unmeasured confounders no matter how large it is, so a difference in the outcome could always be caused by something you didn't account for. Sample size doesn't fix confounding; randomization does.
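The claim that sample size doesn't fix confounding can be checked with the lesson's own self-selected setup. In this sketch, where the sample sizes are arbitrary choices, larger n makes the observational gap less noisy. However, the gap settles near 0.13 (the 0.03 effect plus about 0.10 of motivation), not near 0.03.

```python
import numpy as np

TRUE_EFFECT = 0.03


def observational_gap(n, seed=0):
    """The lesson's self-selected comparison at sample size n."""
    rng = np.random.default_rng(seed)
    motivation = rng.random(n)                     # hidden confounder in [0, 1]
    in_treat = rng.random(n) < motivation          # self-selection into the new page
    convert = rng.random(n) < (0.05 + 0.30 * motivation + TRUE_EFFECT * in_treat)
    return convert[in_treat].mean() - convert[~in_treat].mean()


for n in (2_000, 20_000, 200_000, 2_000_000):
    print(f"n = {n:>9,}: observational gap = {observational_gap(n):.3f}")
```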
Summary
We run experiments to earn causal claims, which plain observation can't support because of confounders: third variables that influence both the treatment and the outcome and create associations that aren't causal. In the simulation, motivation drove both which page users chose and whether they converted. As a result, the observational gap (0.126) overstated the true effect (0.03) more than four-fold, with the new-page group's average motivation (0.667) far above the old page's (0.334). Random assignment fixed it. A coin flip independent of motivation balanced motivation across groups (0.500 each), and the measured gap fell to 0.030, matching the true effect. Randomization's superpower is that it balances every variable at once on average, including confounders you never measured, which is precisely what licenses a causal claim.
Key Concepts
- Correlation ≠ causation: an association can come from a confounder, not a cause.
- Confounder: a variable affecting both treatment and outcome, biasing the comparison.
- Self-selection bias: when users choose their own treatment based on traits that also drive the outcome, the gap mixes those traits' effects with the treatment's.
- Randomization: assigning treatment by chance balances all variables across groups on average, licensing causal claims.
Why This Matters
Almost every “insight” pulled from raw product data is a confounded comparison waiting to mislead — users who used a feature vs. those who didn’t, customers who got an email vs. those who didn’t. Knowing why those comparisons can’t establish cause, and why a randomized experiment can, is the foundation everything else in this course builds on. It’s also what stops you from shipping a change because a confounded dashboard moved. Next, you’ll formalize the experiment: the control and treatment groups that turn this logic into a repeatable design.
Next Steps
Continue to Lesson 2 - Control and Treatment Groups
Turn the logic into a design: the control and treatment groups that make up the A/B frame.
Back to Module Overview
Return to The Logic of Experiments module overview
Continue Building Your Skills
You’ve seen why observation alone can’t establish cause — a confounder like motivation quietly biases the comparison — and why randomization is the escape hatch, balancing every variable across groups so a measured difference points to the change. Next you’ll formalize this into the A/B frame: control and treatment groups, and the randomization unit that makes the split fair.