Lesson 4 - The Minimum Detectable Effect
Welcome to The Minimum Detectable Effect
Back in Lesson 1 you wrote a hypothesis with a threshold in it — “increases conversion by at least 2 points.” That “at least” was doing quiet, important work, and now it’s time to name it. The threshold is the minimum detectable effect (MDE): the smallest true change you want your experiment to be able to catch reliably. It sounds like a statistical setting, but it’s really a business decision — how big does a lift have to be before it’s worth the engineering, the risk, and the opportunity cost of shipping it? Get this number right and everything downstream falls into place. Get it wrong and you’ll either run a test that can’t see the effect you care about, or spend months chasing one too small to matter.
By the end of this lesson, you will be able to:
- Define the minimum detectable effect as a design choice, not a result
- Distinguish practical significance from statistical significance
- Tell absolute effects (percentage points) apart from relative effects (percent lift)
- Explain why a smaller MDE demands a much larger sample size
Let’s start with what the MDE actually is — and what it isn’t.
What the MDE Is (and Isn’t)
The minimum detectable effect is the smallest true effect you want the experiment to detect reliably. The word design matters: you set the MDE before the experiment runs, the same way you fixed the hypothesis before seeing data. It is not something the test hands back to you — it’s an input you choose, and it encodes a judgment about the business: how big is big enough to care?
For Lumen, suppose the pricing page converts at 10% today. Is a lift to 10.2% worth a redesign? Probably not — that’s a rounding-error’s worth of extra revenue against real engineering cost. A lift to 12%? Now we’re talking. Somewhere between those two sits the line where you’d actually change your decision, and that line is your MDE. Set it from value, not from what feels statistically convenient: pick the smallest lift that would still justify shipping, and make the experiment able to see that.
Practical significance vs. statistical significance
These are not the same thing, and confusing them is one of the most common experiment mistakes. Statistical significance asks: is this difference too large to be plausibly just noise? Practical significance asks: is this difference big enough to be worth acting on? With enough users, a truly trivial effect — a +0.1 point lift — can be statistically significant, because huge samples can resolve tiny differences. But it may be nowhere near practically significant. The MDE is how you encode practical significance up front, so a technically “significant” but pointless result doesn’t drag you into shipping something that doesn’t matter.
Absolute vs. Relative: Say Which One
Before you can act on an MDE, you have to be precise about how it’s measured — and here there’s a trap worth naming explicitly, because it’s the same one from Module 1’s discussion of lift. An effect can be stated two ways:
- Absolute — in percentage points. Going from a 10% conversion rate to 12% is a +2 point absolute effect.
- Relative — as a percent of the baseline. That same move, 10% to 12%, is a +20% relative lift, because 2 is 20% of 10.
Both describe the identical change, but “+2%” is dangerously ambiguous: does it mean +2 points (to 12%) or +2% relative (to 10.2%)? Those are wildly different experiments to size. Whenever you write an MDE down, say points or say relative, and when someone hands you one, ask which they mean. For the rest of this lesson we’ll work in absolute percentage points on a 10% baseline.
Why a Smaller MDE Costs So Much More
Here’s the relationship that makes the MDE the single most important number in your experiment design: the smaller the effect you want to detect, the more users you need — and it grows fast. Detecting a big, obvious lift is cheap; detecting a small one is expensive, because small differences are easy for random noise to hide.
You’ll get the full treatment of power and sample size in Module 3. For now, here’s a preview computed for real with scipy — a small function that returns the users needed per arm to detect a given lift over a 10% baseline, at the standard 80% power and 5% significance level:
import math
from scipy.stats import norm
def n_per_arm(p1, p2, alpha=0.05, power=0.80):
z_a = norm.ppf(1 - alpha/2)
z_b = norm.ppf(power)
num = (z_a + z_b)**2 * (p1*(1-p1) + p2*(1-p2))
return math.ceil(num / (p2 - p1)**2)
baseline = 0.10
for mde in (0.01, 0.02, 0.03):
print(f"MDE +{mde:.2f} -> {n_per_arm(baseline, baseline+mde):,} per arm")Running it prints:
MDE +0.01 -> 14,749 per arm
MDE +0.02 -> 3,839 per arm
MDE +0.03 -> 1,772 per armLook at what happens. Detecting a +3 point lift takes about 1,772 users per arm. Ask instead for a +2 point lift — a smaller target — and the cost more than doubles to 3,839. Insist on catching a +1 point lift and it jumps to 14,749, nearly 3.8x the users needed for +2 points. The pattern is roughly quadratic: halving the effect you want to catch roughly quadruples the sample size. Don’t worry about the formula’s internals here — that’s Module 3’s job. The point for design is the tradeoff: your MDE is the biggest lever you have on how expensive and how long the experiment will be. A smaller MDE is not free ambition; it’s a real bill in users and calendar time.
Practice Exercises
Exercise 1: Choose an MDE from value
Lumen’s checkout converts at 8%. Engineering estimates a proposed one-click flow will take a full sprint to build and maintain. A +0.3 point lift wouldn’t cover that cost; a +1.5 point lift clearly would. What MDE should you set, and why is it a design choice rather than a measurement?
Hint
Set the MDE somewhere around the smallest lift that still justifies the sprint — say +1 point absolute. It’s a design choice because you’re picking it before the test from the business value of the change (does the lift pay for the work?), not reading it off any data. Choosing +0.3 would force a huge, slow experiment to catch a lift too small to be worth shipping; choosing +1 point sizes the test to the decision you actually face.
Exercise 2: Absolute or relative?
A teammate says “our MDE is a 20% lift” on a page converting at 10%. What could that mean, and how would you pin it down before sizing the test?
Hint
“20% lift” almost certainly means relative — a 20% increase over the 10% baseline, i.e., from 10% to 12%, which is +2 points absolute. But you must confirm, because if they actually meant +20 points (to 30%), that’s a completely different — and far easier to detect — experiment. Restate it back explicitly: “so, 10% to 12%, a +2 point absolute lift?” Ambiguity here silently corrupts every sample-size number that follows.
Exercise 3: The cost of ambition
Using the numbers from this lesson, your team wants to tighten the MDE from +2 points to +1 point on a 10% baseline. Roughly how much bigger does the experiment get, and what’s the practical consequence?
Hint
From the scipy output, +2 points needs 3,839 per arm and +1 point needs 14,749 — about 3.8x more users, consistent with “halving the effect roughly quadruples the sample.” Practically, an experiment that would have taken two weeks at the +2 point MDE now takes closer to two months at the same traffic, or needs nearly four times the traffic to finish on schedule. Tightening the MDE is a real cost, so tighten it only when the smaller lift genuinely matters to the decision.
Summary
The minimum detectable effect is the smallest true change you want your experiment to catch reliably — and it’s a design choice you make up front from business value, not a number the test returns. It encodes practical significance (is this effect big enough to act on?), which is distinct from statistical significance (is this effect too large to be noise?); with enough data a trivial effect can be statistically significant yet practically worthless, and the MDE is how you rule that out in advance. State every MDE clearly as absolute percentage points or a relative percent lift, because “+2%” is ambiguous enough to size the wrong experiment. And the reason the MDE matters so much is the sample-size relationship, computed for real with scipy: a smaller MDE demands a much larger sample, roughly quadratically — detecting +1 point on a 10% baseline takes about 3.8x the users of +2 points. That makes the MDE the single biggest lever on how expensive and how long your experiment will be.
Key Concepts
- Minimum detectable effect (MDE) — the smallest true effect you want to detect reliably, chosen before the test from business value.
- Practical vs. statistical significance — worth acting on vs. too large to be noise; the MDE encodes the former up front.
- Absolute vs. relative — percentage points vs. percent of baseline; always say which, since they size different experiments.
- MDE drives sample size — smaller effects cost far more data; halving the MDE roughly quadruples the users needed.
Why This Matters
The MDE is where good experiment design gets expensive or cheap, fast or slow — and where teams most often set themselves up to fail without noticing. Aim too small and you commit to an experiment that needs months of traffic to detect a lift that wasn’t worth shipping anyway. Aim too big and you might miss a real, valuable effect because the test was never powered to see it. Choosing the MDE deliberately — from what the change is actually worth — is what ties the business decision to the statistics, and it’s the number every sample-size calculation you’ll do in Module 3 depends on. Next, you’ll put all of Module 2 together on a real design problem: sizing and specifying Lumen’s pricing experiment end to end.
Next Steps
Continue to Lesson 5 - Guided Project: Design Lumen's Pricing Experiment
Put the whole module together: turn a business goal into a fully specified, correctly sized pricing experiment for Lumen.
Back to Module Overview
Return to the Designing an Experiment module overview
Continue Building Your Skills
You can now set a minimum detectable effect the way experienced practitioners do: as a deliberate design choice grounded in what a change is actually worth, stated unambiguously as absolute or relative, and understood as the lever that decides how much data — and time — your experiment will need. Next, you’ll bring every piece of Module 2 together on a real design problem, specifying and sizing Lumen’s pricing experiment from a business goal all the way to a runnable plan.