Lesson 5 - Guided Project: Your First Experiment on Lumen
Welcome to the Guided Project
Across this module you built the logic of an experiment piece by piece: why we randomize to earn a causal claim, what control and treatment groups are, and how the randomization unit makes the split fair. In Lesson 4 you framed Lumen’s first real experiment — the signup page. Now you’ll run it, end to end, and read what it says.
Lumen is testing a redesigned signup page. Half of arriving users see the current page (control), half see the new one (treatment), and the metric that matters is signup conversion — did the visitor create an account? By the end of this project you’ll have gone from a random split to a measured result: two conversion rates, an absolute lift, and a relative lift. You’ll read the outcome descriptively — the new page converted higher in this sample — and you’ll see exactly where that reading stops and a significance test begins. We deliberately do not compute a p-value here; that’s Module 4. This lesson is about honestly reading what the experiment shows.
By the end of this project, you will be able to:
- Generate a clean, seeded A/B dataset for a two-group conversion experiment
- Compute observed conversion rates for control and treatment from raw outcomes
- Calculate and correctly distinguish absolute lift (percentage points) from relative lift (percent)
- Read an experiment’s result descriptively and name the open question a significance test answers
We’ll build it in stages. Let’s run Lumen’s first experiment.
Stage 1: The Setup
Here’s the experiment, exactly as Lesson 4 framed it. Lumen randomly splits its arriving users into two equal groups:
- Control — the current signup page, unchanged.
- Treatment — the new, redesigned signup page.
The primary metric is signup conversion: the fraction of users in each group who created an account. Because the split was random, the two groups are alike on average in everything else, so if their conversion rates differ, the page is the only systematic difference between them.
To simulate this experiment we need to pretend we’re the universe and decide the truth in advance. In reality nobody knows these numbers — that’s why we run the test — but to generate believable data we’ll set the true conversion rates ourselves: the current page converts at 10%, the new page at 12%. So the true effect of the redesign is a 2-percentage-point improvement. Our job for the rest of the project is to recover that from data, the way a real analyst would, and see how close the observed result lands.
Stage 2: Generate the Seeded Data
Each user in a group either converts or doesn’t — a coin flip weighted by that group’s true rate. We simulate 5000 users per group. For control, each user converts with probability 0.10; for treatment, with probability 0.12. We seed the generator so you get the exact same dataset every time you run it.
import numpy as np
rng = np.random.default_rng(7)
n_c = n_t = 5000
conv_control = rng.random(n_c) < 0.10
conv_treatment = rng.random(n_t) < 0.12
rng.random(n_c) draws 5000 numbers uniformly in [0, 1). Comparing each to 0.10 turns it into True (converted) with probability 0.10 and False otherwise, so conv_control is a boolean array of 5000 signup outcomes for the current page. conv_treatment does the same at 0.12 for the new page. That’s the whole experiment’s raw data: two arrays of True/False, one per group. Nothing here knows the “true” rates were 0.10 and 0.12. Those numbers are gone once the coins are flipped, exactly as they would be in a real test.
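One way to keep the generation step reusable is to wrap it in a small function. The sketch below is illustrative only: simulate_experiment and its parameter names are hypothetical, not part of the lesson's code. It makes the same two draws in the same order, so calling it twice with the same seed returns identical arrays.

```python
import numpy as np


def simulate_experiment(seed, n_per_group, p_control, p_treatment):
    """Return two boolean outcome arrays (control, treatment).

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not part of the lesson's code.
    """
    rng = np.random.default_rng(seed)
    # A uniform draw in [0, 1) falls below p with probability p,
    # so each comparison yields True (converted) at that rate.
    control = rng.random(n_per_group) < p_control
    treatment = rng.random(n_per_group) < p_treatment
    return control, treatment


conv_control, conv_treatment = simulate_experiment(7, 5000, 0.10, 0.12)
repeat_control, repeat_treatment = simulate_experiment(7, 5000, 0.10, 0.12)

print(conv_control.dtype, conv_control.shape)        # bool (5000,)
print(np.array_equal(conv_control, repeat_control))  # True
```

Creating the generator inside the function is a design choice: each call is reproducible from its seed alone, with no hidden state shared between calls.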
Stage 3: Compute the Observed Conversion Rates
A boolean array makes counting trivial: True counts as 1 and False as 0, so .sum() is the number of signups and .mean() is the conversion rate.
signups_control = conv_control.sum()
signups_treatment = conv_treatment.sum()
rate_control = conv_control.mean()
rate_treatment = conv_treatment.mean()
print(f"control: {signups_control}/{n_c} converted = {rate_control:.4f}")
print(f"treatment: {signups_treatment}/{n_t} converted = {rate_treatment:.4f}")
Running it:
control: 503/5000 converted = 0.1006
treatment: 613/5000 converted = 0.1226
There’s the result. The current page converted 503 of 5000 users, a rate of 0.1006, about 10.1%. The new page converted 613 of 5000, a rate of 0.1226, about 12.3%. The new page converted more users in this sample. Now we quantify how much more.
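The .sum() and .mean() shortcut is easy to check by hand on a tiny array. A minimal sketch, using five made-up visitors rather than the lesson's data:

```python
import numpy as np

# Five hypothetical visitors; True means the visitor signed up.
outcomes = np.array([True, False, False, True, False])

signups = outcomes.sum()  # True counts as 1, False as 0
rate = outcomes.mean()    # signups divided by the number of visitors

print(signups, len(outcomes), rate)  # 2 5 0.4
```

Two signups out of five visitors is a rate of 0.4, the same arithmetic that turns 503 out of 5000 into 0.1006.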
Stage 4: Compute the Lift
“How much more” has two honest answers, and confusing them is one of the most common ways experiment results get miscommunicated.
Absolute lift is the plain difference between the two rates, measured in percentage points:
abs_lift = rate_treatment - rate_control
print(f"absolute lift = {abs_lift:.4f}")  # 0.0220 -> 2.2 percentage points
absolute lift = 0.0220
The new page converted 2.2 percentage points higher: 12.26% minus 10.06%.
Relative lift expresses that gain as a fraction of the control rate — how much better the new page did relative to where we started:
rel_lift = abs_lift / rate_control
print(f"relative lift = {rel_lift:.1%}")  # 21.9%
relative lift = 21.9%
The same result is a 21.9% relative improvement: 0.0220 divided by 0.1006. Both numbers describe the identical outcome, but they answer different questions. Absolute lift says “the rate moved by 2.2 points.” Relative lift says “conversions went up by about a fifth.” Mixing them up misleads badly: a stakeholder who hears “signups went up 22%” and pictures 22 percentage points, imagining the rate jumped from 10% to 32%, has wildly overestimated the effect. Always say which one you mean. A good habit: state absolute lift in points and relative lift in percent, and never drop the unit.
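The habit of never dropping the unit can be built into the code that reports the result. The sketch below is one possible approach, not the lesson's own code: describe_lift is a hypothetical helper that returns both lifts as strings with their units attached. It uses the signup counts reported above.

```python
def describe_lift(control_rate, treatment_rate):
    """Return (absolute, relative) lift as strings that carry their units.

    Hypothetical helper, named here for illustration only.
    """
    abs_lift = treatment_rate - control_rate
    rel_lift = abs_lift / control_rate
    # Absolute lift is reported in percentage points, relative lift in percent.
    return (f"{abs_lift * 100:.1f} percentage points", f"{rel_lift:.1%}")


# Counts reported in this lesson: 503 and 613 signups out of 5000 each.
absolute, relative = describe_lift(503 / 5000, 613 / 5000)
print(absolute)  # 2.2 percentage points
print(relative)  # 21.9%
```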
A higher number isn’t a real difference — yet
Treatment converting at 0.1226 versus control’s 0.1006 tells you the new page did better in this sample — nothing more. Descriptive lift alone can’t ship a decision, because some gap between two random groups appears even when the underlying pages are identical, purely from the luck of who landed where. The measured lift is a starting point, not a verdict. Before Lumen rolls out the new page, it needs to know whether a 2.2-point gap is bigger than the wobble chance produces on its own — and that’s a question descriptive numbers can’t answer.
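The claim that a gap appears even between identical pages can be seen directly. The sketch below simulates many experiments in which both groups share the same true rate of 0.10. The seed, the number of repetitions, and the variable names are assumptions made for this illustration. It draws signup counts with rng.binomial rather than flipping each coin individually, and it stays descriptive: it shows that chance gaps exist and how large they typically are, without testing anything.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for this sketch
n = 5000          # users per group, as in the lesson
true_rate = 0.10  # both groups share the same true rate
reps = 1000       # number of simulated experiments (an assumption)

# Draw the signup count for each group in every simulated experiment.
signups_a = rng.binomial(n, true_rate, size=reps)
signups_b = rng.binomial(n, true_rate, size=reps)

# Observed gap between the two groups' conversion rates, per experiment.
gaps = (signups_b - signups_a) / n

print(f"experiments with a nonzero gap: {(gaps != 0).mean():.1%}")
print(f"average size of the gap: {np.abs(gaps).mean() * 100:.2f} points")
```

Almost every simulated experiment shows some gap, even though the two pages are identical by construction.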
Stage 5: Read the Result
Step back and read what the experiment says, carefully.
What we can say. The new signup page converted higher than the current one in this experiment — 0.1226 versus 0.1006, an absolute lift of 2.2 points and a relative lift of about 22%. Because the split was random, the two groups were the same on average to begin with, so the page is the only systematic difference between them. That’s a genuine, causally clean description of what happened in this sample.
What we can’t say yet. Notice the observed rates aren’t the true rates we set. We built control at exactly 0.10 and treatment at exactly 0.12, but the data came back 0.1006 and 0.1226. That gap between truth and observation is sampling noise: 5000 coin flips rarely land on the exact expected fraction. Change the seed and the numbers move: default_rng(8) will almost certainly give different counts in place of 503 and 613, and therefore a different lift. The truth didn’t change; the sample did.
Which raises the question this whole result hangs on: is the 2.2-point gap real, or could it be chance? If the two pages were secretly identical, random assignment would still hand one group a few more signups than the other, and nothing in the descriptive numbers tells us how large such a chance gap can get. So a positive lift, on its own, isn’t proof the new page is better. We need a way to ask: how likely is a gap this big if the pages were actually the same?
That question has a name — statistical significance — and a tool: the two-proportion z-test, which we build in Module 4. It takes exactly these four numbers (503 and 5000 for control, 613 and 5000 for treatment) and returns how surprising this gap would be under pure chance. We are deliberately not computing it here. Module 1 was about the logic of experiments — why we randomize, what the groups mean, and how to read a raw result honestly. You now have a complete, correctly-described experiment result. Turning “the new page looks better” into “the new page is better” is the job of significance testing, and it’s where Module 4 picks up.
Practice Exercises
Exercise 1: Change the seed and watch the rates move
Re-run the whole experiment with np.random.default_rng(8) instead of 7, keeping everything else the same. Do the observed conversion rates change? Do they still land near the true 0.10 and 0.12? What does that tell you about the 0.1006 and 0.1226 you got with seed 7?
Hint
Only the seed changes — the true rates (0.10, 0.12) and sample sizes (5000) stay put. You’ll get different counts and a different lift, but both rates will still sit in the same neighborhood as the truth. That’s sampling noise made visible: the observed numbers wobble around the true rates from sample to sample, which is exactly why a single observed lift can’t be trusted on its own — and why a significance test exists.
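The hint's claim can be checked across several seeds at once. A minimal sketch, reusing the lesson's true rates and sample size; seeds 9 and 10 and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

n = 5000                             # users per group, as in the lesson
p_control, p_treatment = 0.10, 0.12  # the lesson's true rates

results = []
for seed in (7, 8, 9, 10):  # 9 and 10 are arbitrary extra seeds
    rng = np.random.default_rng(seed)
    # Same two draws, in the same order, as the lesson's code.
    rate_control = (rng.random(n) < p_control).mean()
    rate_treatment = (rng.random(n) < p_treatment).mean()
    results.append((seed, rate_control, rate_treatment))
    print(f"seed {seed}: control {rate_control:.4f}, "
          f"treatment {rate_treatment:.4f}, "
          f"lift {rate_treatment - rate_control:+.4f}")
```

Each row is one possible sample from the same underlying truth. Only the seed differs between rows.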
Exercise 2: Compute the lift for a smaller true effect
Suppose the new page were only slightly better: control’s true rate is 0.10 and treatment’s is 0.11. Without simulating, compute the absolute and relative lift at the true rates. How does a 1-point absolute lift compare to the 2-point case as a relative number?
Hint
Absolute lift = 0.11 - 0.10 = 0.01, i.e. 1 percentage point. Relative lift = 0.01 / 0.10 = 0.10, i.e. 10%. In the 2-point case the relative lift at the true rates is 0.02 / 0.10 = 20% (the roughly 22% seen earlier was the observed lift, not the true one). So halving the absolute effect (2 points to 1 point) also halves the relative effect (20% to 10%) here, because the control baseline (0.10) is unchanged. Relative lift always depends on the baseline you divide by, which is why the same absolute gain gives a larger relative lift on a low-converting page than on a high-converting one.
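The dependence on the baseline is easiest to see by holding the absolute gain fixed and varying where it starts. A small sketch; the three baselines are hypothetical values chosen for illustration.

```python
# The same 1-point absolute gain at three hypothetical baselines.
baselines = [0.02, 0.10, 0.50]
gain = 0.01

relative_lifts = [gain / base for base in baselines]

for base, rel in zip(baselines, relative_lifts):
    print(f"{base:.0%} -> {base + gain:.0%}: "
          f"1 point absolute, {rel:.0%} relative")
# 2% -> 3%: 1 point absolute, 50% relative
# 10% -> 11%: 1 point absolute, 10% relative
# 50% -> 51%: 1 point absolute, 2% relative
```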
Exercise 3: Explain absolute vs relative on a new pair
A different Lumen experiment reports that a checkout tweak moved purchase conversion from 4% to 5%. Write one sentence stating the absolute lift and one stating the relative lift, and explain why quoting only “conversions rose 25%” could mislead a stakeholder.
Hint
Absolute lift is 5% - 4% = 1 percentage point. Relative lift is 0.01 / 0.04 = 25%. Quoting only “25%” invites a stakeholder to imagine the rate jumped by 25 points (from 4% to 29%), when it actually rose by a single point. On a low baseline like 4%, relative lift looks dramatic while the absolute movement is small — so always name the unit and, ideally, report both.
Summary
You ran Lumen’s first experiment end to end. Starting from the Lesson 4 frame — control is the current signup page, treatment is the new page, and the primary metric is signup conversion — you generated a seeded dataset of 5000 users per group, with true conversion rates of 0.10 and 0.12 that the data then had to reveal. Counting signups gave observed rates of 0.1006 (503/5000) for control and 0.1226 (613/5000) for treatment. From those you computed absolute lift — 0.0220, or 2.2 percentage points — and relative lift — 0.0220 / 0.1006, or 21.9% — and saw why keeping the two straight matters. You read the result descriptively: the new page converted higher in this sample, but the observed rates differ from the true rates because of sampling noise, so whether a 2.2-point gap is real or chance is the open question a significance test answers. Every number in this lesson was produced and verified for real with numpy, seeded so you reproduce it exactly; there’s no API key and no cost.
Key Concepts
- Observed conversion rate: signups divided by users in a group; .mean() of a boolean outcome array.
- Absolute lift: the plain difference between rates, measured in percentage points.
- Relative lift: absolute lift divided by the control rate, measured in percent of the baseline.
- Sampling noise: observed rates differ from true rates run to run, so a single observed lift isn’t proof of a real effect.
Why This Matters
This is the shape of every A/B result you’ll ever read: two rates, a lift, and a decision waiting on whether that lift is real. Reading it honestly — knowing that a higher number in one sample is a description, not a verdict, and that absolute and relative lift tell different stories — is what separates a careful analyst from someone who ships on noise. You’ve now done the descriptive half of experiment analysis cleanly. The inferential half, which asks whether the gap could be chance, is exactly what significance testing in Module 4 will give you.
Next Steps
Continue to Module 2 - Designing an Experiment
Hypotheses, primary vs guardrail metrics, randomization units, and the minimum effect worth detecting.
Continue Building Your Skills
You’ve taken Lumen’s signup experiment from a random split all the way to a measured, honestly-read result — two conversion rates, an absolute and a relative lift, and a clear-eyed view of what the numbers do and don’t prove. That completes the logic of experiments. Next, Module 2 turns from reading an experiment to designing one: writing a testable hypothesis, choosing primary and guardrail metrics, picking the right randomization unit, and deciding the smallest effect worth detecting before you ever collect data.