Lesson 1 - Variance Reduction with CUPED
Welcome to Variance Reduction with CUPED
In Module 3 you learned the iron law of sample size: to detect a smaller effect, you need dramatically more users, because the noise in your metric drowns small signals. What if you could reduce that noise directly, not by collecting more data, but by using data you already had before the experiment even started? That’s the idea behind CUPED (Controlled-experiment Using Pre-Experiment Data), one of the highest-leverage techniques in modern experimentation. If your metric is predictable from a user’s history, CUPED strips out the predictable part and leaves a quieter signal. With a covariate correlated about 0.7 with the outcome, the test reaches the same power with roughly half the users. This lesson builds the adjustment and demonstrates the saving in a simulation.
By the end of this lesson, you will be able to:
- Explain why removing predictable variation shrinks a metric’s variance
- Build the CUPED adjustment from a pre-experiment covariate
- Show that variance drops by approximately the correlation squared
- Connect the variance reduction to a smaller required sample size
Let’s start with the core idea.
Subtract What You Already Knew
Most metrics are partly predictable. A user who spent a lot last month tends to spend a lot this month; a user who was highly active before the experiment tends to stay active. That predictability is the key. If you can guess part of a user’s outcome from their history, then the surprising part — the part your experiment might actually be moving — is smaller and less noisy than the raw outcome.
CUPED formalizes this. Take a pre-experiment covariate X — any metric measured before the experiment that correlates with your outcome Y (last month’s spend, prior activity, historical conversion). Then define an adjusted metric:
Y' = Y − θ(X − X̄), where θ = cov(X, Y) / var(X)
The term θ(X − X̄) is the part of Y you could have predicted from X; subtracting it removes that predictable variation. Crucially, because X was measured before the experiment, it can’t have been affected by the treatment — so subtracting it doesn’t change the effect you’re measuring, only the noise around it.
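The adjustment can be sketched as a small helper. This is a minimal illustration, not a production implementation: the function name `cuped_adjust` and the synthetic "spend" data are assumptions made up for this example, and θ is estimated from the same data it adjusts.

```python
import numpy as np

def cuped_adjust(y, x):
    """Return (Y', theta) where Y' = Y - theta * (X - mean(X))."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # theta = cov(X, Y) / var(X): the slope from regressing Y on X
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean()), theta

# Hypothetical data: this month's spend is partly driven by last month's spend
rng = np.random.default_rng(0)
x = rng.normal(10, 2, 1000)            # pre-experiment covariate
y = 0.8 * x + rng.normal(0, 1, 1000)   # outcome

y_adj, theta = cuped_adjust(y, x)
print(f"theta = {theta:.3f}")
print(f"mean: raw {y.mean():.3f}, adjusted {y_adj.mean():.3f}")
print(f"var:  raw {y.var(ddof=1):.3f}, adjusted {y_adj.var(ddof=1):.3f}")
```

Because X − X̄ averages to zero, the adjusted metric keeps the same overall mean as the raw one; only its spread shrinks.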
How Much Noise Disappears
The size of the win is governed by one number: the correlation ρ between X and Y. The variance of the adjusted metric is var(Y') = var(Y)·(1 − ρ²) — so the fraction of variance you remove is ρ². A covariate correlated 0.7 with your outcome removes about half the variance; one correlated 0.9 removes 81%.
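For readers who want to see where the 1 − ρ² factor comes from, here is a short derivation. It treats θ and X̄ as fixed constants, which is an approximation: in practice both are estimated from the data.

```latex
\begin{aligned}
\operatorname{var}(Y')
  &= \operatorname{var}(Y) - 2\theta\,\operatorname{cov}(X,Y) + \theta^{2}\operatorname{var}(X)
  && \text{with } \theta = \frac{\operatorname{cov}(X,Y)}{\operatorname{var}(X)} \\
  &= \operatorname{var}(Y) - \frac{\operatorname{cov}(X,Y)^{2}}{\operatorname{var}(X)} \\
  &= \operatorname{var}(Y)\,\bigl(1 - \rho^{2}\bigr),
  && \text{since } \rho = \frac{\operatorname{cov}(X,Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}
\end{aligned}
```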
Let’s verify it. We simulate an experiment with a true effect of 0.20, where the outcome Y correlates 0.7 with a pre-experiment covariate X, then compare the raw and CUPED analyses:
import numpy as np

rng = np.random.default_rng(5)
n, rho, delta = 5000, 0.7, 0.20

def make_arm(shift):
    x = rng.normal(0, 1, n)                        # pre-experiment covariate
    eps = rng.normal(0, np.sqrt(1 - rho**2), n)    # unpredictable part
    return x, shift + rho * x + eps                # Y = shift + rho*X + noise

xc, yc = make_arm(0.0)
xt, yt = make_arm(delta)                           # treatment adds the effect
x, y = np.concatenate([xc, xt]), np.concatenate([yc, yt])

theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # CUPED coefficient
yc_adj = yc - theta * (xc - x.mean())
yt_adj = yt - theta * (xt - x.mean())

var_reduction = 1 - np.var(np.r_[yc_adj, yt_adj], ddof=1) / np.var(y, ddof=1)
se_raw = np.sqrt(yc.var(ddof=1)/n + yt.var(ddof=1)/n)
se_cuped = np.sqrt(yc_adj.var(ddof=1)/n + yt_adj.var(ddof=1)/n)
print(f"variance reduction: {var_reduction:.3f} (rho^2 = {rho**2:.2f})")
print(f"std error raw {se_raw:.4f} -> cuped {se_cuped:.4f}")
Running it:
variance reduction: 0.480 (rho^2 = 0.49)
std error raw 0.0201 -> cuped 0.0144
The variance fell 48%, right on the predicted ρ² = 0.49, and the standard error dropped from 0.0201 to 0.0144, a 28% reduction. Here’s why that matters so much: for a fixed effect and power, the required sample size is proportional to the metric’s variance, that is, to the square of the standard error at a given n. Cutting the SE by 28% therefore means you need only about (0.0144/0.0201)² ≈ 0.51, roughly half, as many users to reach the same power. Same experiment, same effect, half the traffic, just by using data you already had.
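The link from variance to sample size can be sketched with a standard normal-approximation formula for a two-sided test on a difference in means. The helper name `n_per_arm` and the defaults (alpha = 0.05, power = 0.80) are assumptions for illustration and may differ in detail from the formula used in Module 3.

```python
from statistics import NormalDist

def n_per_arm(sigma2, delta, alpha=0.05, power=0.80):
    """Approximate users per arm: n = 2 * (z_{1-alpha/2} + z_power)^2 * sigma^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma2 / delta ** 2

rho, delta = 0.7, 0.20
n_raw = n_per_arm(sigma2=1.0, delta=delta)              # raw metric with var(Y) = 1
n_cuped = n_per_arm(sigma2=1.0 - rho**2, delta=delta)   # var(Y') = var(Y) * (1 - rho^2)
ratio = n_cuped / n_raw
print(f"raw: {n_raw:.0f} per arm, CUPED: {n_cuped:.0f} per arm, ratio: {ratio:.2f}")
```

The variance enters the formula linearly, so the ratio of the two sample sizes is 1 − ρ² = 0.51 whatever alpha and power you choose.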
CUPED is free power — with one honest caveat
CUPED is close to a free lunch: it’s unbiased (the expected effect estimate is unchanged), it only needs a pre-experiment covariate you almost certainly already log, and the variance reduction is real, not a trick. The one caveat is the word pre-experiment: the covariate must be measured before assignment, so the treatment can’t have influenced it. If the treatment did move the covariate (say, activity during the experiment), subtracting it removes part of the real effect along with the noise and biases your result. Pick something from before the user entered the test, their history, and the win is genuine. The best covariate is usually the same metric measured in a pre-period, which tends to correlate strongly with its in-experiment value.
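A small simulation illustrates the caveat. Everything here is hypothetical: the variable names, the 0.3 lift that the treatment gives to in-experiment activity, and the noise levels are made-up values chosen only to make the bias visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 50_000, 0.20                      # users per arm, true effect

treat = np.repeat([0, 1], n)                 # 0 = control, 1 = treatment
x_pre = rng.normal(0, 1, 2 * n)              # measured before assignment
y = delta * treat + 0.7 * x_pre + rng.normal(0, np.sqrt(0.51), 2 * n)
# Contaminated covariate: in-experiment activity, which the treatment also raises
x_during = x_pre + 0.3 * treat + rng.normal(0, 0.5, 2 * n)

def cuped_effect(y, x, treat):
    """Difference in CUPED-adjusted means, treatment minus control."""
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    y_adj = y - theta * (x - x.mean())
    return y_adj[treat == 1].mean() - y_adj[treat == 0].mean()

effect_raw = y[treat == 1].mean() - y[treat == 0].mean()
effect_pre = cuped_effect(y, x_pre, treat)
effect_during = cuped_effect(y, x_during, treat)
print(f"true {delta:.3f} | raw {effect_raw:.3f} | "
      f"pre-period covariate {effect_pre:.3f} | in-experiment covariate {effect_during:.3f}")
```

With the pre-period covariate the estimate stays close to the true effect. With the in-experiment covariate the estimate lands well below it, because the adjustment subtracts the part of the lift that shows up in the covariate.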
Practice Exercises
Exercise 1: How good must the covariate be?
Your teammate wants to use a covariate correlated only 0.3 with the outcome. Roughly what variance reduction would CUPED give, and is it worth it?
Hint
Variance reduction ≈ ρ² = 0.3² = 0.09, so about 9% — modest. The standard error would drop only ~4.6% (√0.91), saving under 10% of the sample. It’s not nothing, and CUPED is cheap to apply, so it may still be worth it — but the big wins come from covariates correlated 0.6+ (ρ² ≥ 0.36). The usual high-correlation choice is the same metric measured in a pre-period.
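To see how quickly the payoff grows with correlation, the arithmetic in this hint can be tabulated for a few values of ρ. The helper name `cuped_savings` is an assumption for this sketch; the formulas are the ones from the lesson.

```python
import math

def cuped_savings(rho):
    """Return (fraction of variance removed, fraction by which the SE shrinks)."""
    var_cut = rho ** 2
    se_cut = 1 - math.sqrt(1 - rho ** 2)
    return var_cut, se_cut

for rho in (0.3, 0.5, 0.7, 0.9):
    var_cut, se_cut = cuped_savings(rho)
    # The sample-size saving equals the variance saving, since n scales with variance
    print(f"rho = {rho:.1f}: variance -{var_cut:.0%}, std error -{se_cut:.1%}")
```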
Exercise 2: Why doesn’t CUPED change the effect?
Subtracting θ(X − X̄) from every user’s outcome changes their individual numbers. Why doesn’t it change the estimated treatment effect (the difference in group means)?
Hint
Because X is a pre-experiment covariate, randomization makes its distribution the same in both groups on average, so the adjustment θ(X − X̄) has the same average in each arm and cancels out of the difference in means. It removes noise symmetrically from both groups without shifting the gap between them. (In finite samples it even helpfully corrects small chance imbalances in X.) What it can’t do is subtract something the treatment affected — that would remove real effect, not just noise.
Exercise 3: From variance to sample size
CUPED cut your standard error by 28%. Your original test needed 8,000 users per arm. Roughly how many does the CUPED-adjusted test need for the same power?
Hint
Sample size scales with the square of the standard error, and the SE fell to 0.72 of its original value (a 28% cut). So you need about 0.72² ≈ 0.52 of the users: roughly 4,100 to 4,200 per arm instead of 8,000. The variance reduction (48%) maps directly onto the sample-size saving (also about 48%), which is why CUPED is so valuable when traffic is scarce or tests are slow.
Summary
CUPED cuts the noise in a metric by subtracting the part you could have predicted from a pre-experiment covariate X: the adjusted metric is Y' = Y − θ(X − X̄) with θ = cov(X, Y)/var(X). Because X is measured before assignment, the adjustment leaves the treatment effect unchanged while removing variance equal to about ρ², the squared correlation between X and Y. On simulated data with ρ = 0.7, variance fell 48% and the standard error dropped from 0.0201 to 0.0144 — and since sample size scales with the SE squared, that halves the users needed for the same power. It’s one of the closest things to free statistical power in experimentation, with the single requirement that the covariate come from before the experiment.
Key Concepts
- CUPED adjustment: Y' = Y − θ(X − X̄), with θ = cov(X, Y)/var(X), using a pre-experiment covariate.
- Variance reduction ≈ ρ²: a covariate correlated 0.7 removes ~49% of the variance.
- Effect unchanged: the adjustment cancels from the difference in means, so it’s unbiased.
- Half the sample: sample size scales with SE², so a 28% SE cut roughly halves the traffic needed.
Why This Matters
Traffic is the scarcest resource in experimentation: it caps how many tests you can run and how fast you learn. When a good pre-period covariate exists (ρ around 0.7), CUPED can roughly double your experimentation throughput: the same infrastructure runs about twice as many tests, or each test finishes in about half the time, for very little extra cost. That is why it is widely used on large-scale experimentation platforms. It also reframes a deep idea from earlier in the course: reducing noise is just as powerful as increasing signal, and sometimes far cheaper. Next, you’ll tackle the other big frustration, the temptation to stop early, with sequential testing done correctly.
Next Steps
Continue to Lesson 2 - Sequential Testing
The correct way to peek at a running test and stop early — with the error control naive peeking destroys.
Continue Building Your Skills
You’ve seen how CUPED turns a pre-experiment covariate into free statistical power — a 48% variance reduction and roughly half the required sample size, just by subtracting predictable noise. That’s one way to make experiments cheaper. Next you’ll make them faster to conclude: sequential testing lets you look at a running experiment and stop the moment there’s enough evidence, with the valid error control that the naive peeking of Module 6 threw away.