Lesson 2 - Control and Treatment Groups

Welcome to Control and Treatment Groups

In Lesson 1 you saw why randomization matters: it balances every variable across groups on average, so a measured difference points to the change rather than to a hidden confounder. Now we turn that logic into a concrete design. An A/B test has two groups. The control group keeps the current experience: the status quo, unchanged. The treatment group gets the new version, with exactly one thing different. You show each group its version, measure the same metric for both, and compare. That’s the whole frame, and almost every word of it earns its place. This lesson is about what each group is for, and the three decisions that make the comparison trustworthy: what the control really represents, changing one thing at a time, and what you randomize.

By the end of this lesson, you will be able to:

  • Explain what control and treatment groups are, and how control stands in for the counterfactual
  • Say why you change only one thing between the two versions
  • Choose a randomization unit and explain why randomizing by user is usually right
  • Split users randomly into control and treatment in Python

Let’s start with the group that does the quiet work.


The Control Group Is a Stand-In for the Counterfactual

The question an experiment really wants to answer is a counterfactual: what would have happened to these users if we hadn’t made the change? If we knew that, we could compare it against what actually happened with the change and read off the effect directly. But we can’t. The same user can’t both see the new Lumen signup page and not see it — you only ever observe one of the two worlds for any given person.

The control group is our practical answer to that impossible question. Instead of observing the same users both ways, we take a different group of users — one that is equivalent to the treatment group on average, because we assigned them by chance — and let them keep the status quo. Because the two groups started the same (that’s what randomization from Lesson 1 buys us), the control group’s outcome is a fair estimate of what the treatment group would have done without the change. The control isn’t a throwaway “do nothing” bucket; it’s the measuring stick. Without it, a conversion rate in the treatment group is just a number with nothing to compare against.

The treatment group, then, is the group that receives the new version: the one change we want to evaluate. By convention the control is labeled A and the treatment B, which is where the name A/B test comes from.

The A/B frame: a random split routes each Lumen user to control (A, the current page) or treatment (B, the new page); the same metric (conversion rate) is measured for both groups and compared. Because assignment is random, the groups start equivalent, so a difference in the metric points to the change.
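
The comparison described above can be sketched in a few lines. This is a minimal simulation rather than real Lumen data: the true conversion rates (18% for control, 22% for treatment), the seed, and the variable names are all assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Random split: each user gets an independent coin flip.
is_treatment = rng.random(n) < 0.5

# Hypothetical true conversion rates (assumptions for illustration only).
true_rate = np.where(is_treatment, 0.22, 0.18)
converted = rng.random(n) < true_rate

# Same metric, measured the same way, for both groups.
control_rate = converted[~is_treatment].mean()
treatment_rate = converted[is_treatment].mean()

# Control's rate stands in for the counterfactual, so the difference
# between the two rates estimates the effect of the change.
lift = treatment_rate - control_rate
print(f"control={control_rate:.3f} treatment={treatment_rate:.3f} lift={lift:.3f}")
```

The treatment rate only becomes an effect once the control rate is subtracted from it; on its own it is just a number.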

Change One Thing at a Time

For the comparison to mean anything, the treatment must differ from the control in exactly one way. If B has a new headline and a new button color and a shorter form, and it wins, you’ve learned that some combination of those helped — but you can’t say which one, or whether one of them actually hurt while the others carried it. The single difference is what lets you attribute the result to a specific change.

Concretely, if Lumen wants to test a new signup button color, then the button color is the only thing that should differ between A and B. Same headline, same layout, same copy, same load behavior, same everything else. The control page and the treatment page are identical twins with one deliberate difference. Anything else that varies becomes a competing explanation for the result — a confounder you built in by hand.

This is the design-time counterpart of what Lesson 1 taught. Randomization removes confounders that come from who is in each group; changing one thing at a time removes confounders that come from what the two versions do. You need both.
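
One way to enforce the single-difference rule is to compare the two page definitions before launch. The sketch below assumes each variant is described by a plain dictionary; the setting names, values, and the `differing_keys` helper are hypothetical, not part of any real Lumen system.

```python
# Hypothetical page configurations; the keys and values are illustrative.
control_page = {
    "headline": "Start your free trial",
    "button_color": "blue",
    "form_fields": 4,
}
treatment_page = {
    "headline": "Start your free trial",
    "button_color": "green",
    "form_fields": 4,
}

_MISSING = object()  # marks a setting that one variant does not define


def differing_keys(a, b):
    """Return the sorted setting names whose values differ between two configs."""
    return sorted(
        k for k in a.keys() | b.keys()
        if a.get(k, _MISSING) != b.get(k, _MISSING)
    )


changed = differing_keys(control_page, treatment_page)
print(changed)  # ['button_color']

# Refuse to launch unless the variants differ in exactly one setting.
assert len(changed) == 1, f"expected exactly one difference, found {changed}"
```

If the designer had also changed the headline and the form length, the check would list three settings and the test would be flagged as a bundle rather than a single change.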


The Randomization Unit: Randomize by User

Random assignment is the engine, but you have to decide what you’re assigning. The thing you randomize is called the randomization unit, and the usual choice is the user. Sometimes teams randomize by session, device, or account instead, but user is the default for a good reason.

Randomizing by user means each person is assigned once and stays in that group for the whole experiment, across every visit. That matters for two reasons. First, consistency: a user who lands in treatment sees the new page every time they come back, rather than flipping between old and new on each visit — which would be a confusing experience and would muddy what they actually reacted to. Second, independence: if you randomized by session, one person’s ten visits could scatter across both groups, so your “counts” would no longer be counts of independent people. The same user appearing on both sides breaks the assumption that the groups are made of separate, comparable units — and that assumption is what the whole comparison rests on.
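
A small simulation makes the independence problem visible. It is a sketch under assumed numbers (1,000 users with 10 visits each) and compares a fresh coin flip on every visit with a single flip per user.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users = 1_000
visits_per_user = 10  # assumed for illustration

# Session-level randomization: a fresh coin flip on every visit.
# Rows are users, columns are that user's visits; True means treatment.
session_flips = rng.random((n_users, visits_per_user)) < 0.5

# A user is contaminated if their visits landed in both groups.
saw_treatment = session_flips.any(axis=1)
saw_control = (~session_flips).any(axis=1)
saw_both = saw_treatment & saw_control
print(f"session-level, users who saw both versions: {saw_both.mean():.1%}")

# User-level randomization: one flip per user, reused for every visit.
user_flip = rng.random(n_users) < 0.5
user_level = np.repeat(user_flip[:, None], visits_per_user, axis=1)
saw_both_user_level = user_level.any(axis=1) & (~user_level).any(axis=1)
print(f"user-level, users who saw both versions: {saw_both_user_level.mean():.1%}")
```

With ten visits and a fair coin, almost every user ends up on both sides under session-level assignment, while user-level assignment puts no one on both sides by construction.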

Session-level (or device-level) randomization does exist and occasionally makes sense, but it carries exactly this risk: the same person can see both A and B, blurring the very difference you’re trying to measure. Unless you have a specific reason, randomize by user.
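
In a live system, keeping a user in the same group across visits is often done by hashing the user ID instead of storing a coin flip. That is a different mechanism from the seeded draw used in this lesson, and the sketch below is only one way to do it: the `assign_group` function, the experiment name, and the ID format are all hypothetical.

```python
import hashlib


def assign_group(user_id, experiment="signup_button_color", treatment_share=0.5):
    """Deterministically map a user to 'control' or 'treatment'.

    Hashing the experiment name together with the user id gives each user a
    stable pseudo-random bucket, so repeat visits get the same group.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Reduce the first 8 bytes of the hash to a bucket from 0 to 9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "treatment" if bucket < round(treatment_share * 10_000) else "control"


# The same user gets the same answer on every visit.
visits = [assign_group("user-42") for _ in range(5)]
print(visits)

# Across many users the split is still close to the requested ratio.
groups = [assign_group(f"user-{i}") for i in range(10_000)]
print(groups.count("control"), groups.count("treatment"))
```

Including the experiment name in the hash means a user's bucket in one experiment does not determine their bucket in another.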


Splitting Users in Python

Here are the mechanics of the split itself. We assign each of 10,000 users to control or treatment with a coin flip, using a seeded random draw so the result is reproducible.

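import numpy as np

# Seed the generator so the same split comes out on every run.
rng = np.random.default_rng(0)
n = 10000  # number of users to assign
# One uniform draw per user: below 0.5 means treatment, otherwise control.
group = np.where(rng.random(n) < 0.5, "treatment", "control")
# Print the group sizes: control first, then treatment.
print((group == "control").sum(), (group == "treatment").sum())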

Running it:

5010 4990

Roughly 5,000 in each group — a near-even split, but not exactly 5,000/5,000, because each user is an independent coin flip rather than a forced half-and-half. That small wobble is expected and fine.
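
How much wobble is normal? With independent coin flips, the treatment count follows a binomial distribution, so its typical deviation can be computed directly. This is a short sketch using the standard binomial formula; the variable names are ours.

```python
import math

n = 10_000
p = 0.5  # probability that any one user lands in treatment

# The treatment count is Binomial(n, p): n independent coin flips.
expected_count = n * p
std_dev = math.sqrt(n * p * (1 - p))
print(expected_count, std_dev)  # 5000.0 50.0
```

A standard deviation of 50 means a group size that misses 5,000 by a few dozen users is unremarkable.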

A 50/50 split is the most common choice, but it isn’t sacred: any fixed ratio works — 90/10, 80/20 — and you’ll sometimes want a smaller treatment group when a change is risky. What actually matters is not the ratio but that assignment is random and independent of the user’s characteristics. That independence is the whole point from Lesson 1: because the coin flip doesn’t know or care how motivated a user is, motivation (and everything else) lands evenly on both sides, and the two groups start the same.
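
An uneven split uses the same coin flip with a different weight. Here is a minimal sketch of a 90/10 split; the seed and the `treatment_share` name are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
treatment_share = 0.10  # a 90/10 split; the value is an illustrative choice

# Same coin flip as before, just weighted: each user independently has a
# 10% chance of treatment, regardless of who they are.
group = np.where(rng.random(n) < treatment_share, "treatment", "control")
n_control = int((group == "control").sum())
n_treatment = int((group == "treatment").sum())
print(n_control, n_treatment)
```

Only the threshold changed. Assignment still ignores everything about the user, which is the property that matters.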

The split doesn’t need to be 50/50 — it needs to be random

It’s easy to fixate on getting a perfectly even split, but the even ratio is a convenience, not the source of the guarantee. The causal magic comes entirely from randomness independent of the user, not from the proportions. A 70/30 split assigned by a fair coin still balances confounders across groups; a 50/50 split assigned by something correlated with the user — say, “new users go to treatment” — does not, and quietly reintroduces exactly the confounding Lesson 1 warned about. Pick a ratio for practical reasons; protect the randomness for statistical ones.
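
The contrast can be checked with a simulated user trait. In this sketch, half of the users are flagged as new (an assumed figure), and we compare a 70/30 coin-flip split against a rule that sends new users to treatment.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

# Hypothetical user trait: about half of the users are new (assumed figure).
is_new = rng.random(n) < 0.50

# (a) 70/30 split by a weighted coin that ignores the user entirely.
random_treat = rng.random(n) < 0.30

# (b) Roughly 50/50 split driven by the trait: new users go to treatment.
biased_treat = is_new.copy()


def share_new(in_treatment):
    """Return the fraction of new users in (control, treatment)."""
    return is_new[~in_treatment].mean(), is_new[in_treatment].mean()


rand_control, rand_treatment = share_new(random_treat)
bias_control, bias_treatment = share_new(biased_treat)
print(f"random 70/30:     control={rand_control:.2f} treatment={rand_treatment:.2f}")
print(f"new -> treatment: control={bias_control:.2f} treatment={bias_treatment:.2f}")
```

Under the coin flip, both groups contain about the same share of new users even though the groups differ in size. Under the rule, treatment is entirely new users and control has none, so any difference in the metric is tangled up with being new.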


Practice Exercises

Exercise 1: What is the control group for?

Lumen runs a test and reports that the treatment page converted at 22%. A teammate says, “Great, the new page works.” Why is that number, on its own, not enough — and what does the control group add?

Hint

22% is meaningless without a comparison. The control group answers the counterfactual: what these users would have converted at without the change. If control also converted at 22%, the new page did nothing; if control converted at 18%, the page added about 4 percentage points. The control is the measuring stick that turns a raw rate into an effect.

Exercise 2: One change or several?

Lumen’s designer wants B to have a new headline, a green button (instead of blue), and a two-field form (instead of four). B wins by 5 points. What can and can’t the team conclude, and how should they have run it?

Hint

They can conclude the bundle of three changes together beat the old page — but not which change did the work, or whether one of them actually hurt while the others carried it. To learn what each change does, vary one thing at a time (separate tests, or a design that isolates each factor). The single controlled difference is what lets you attribute a result to a specific change.

Exercise 3: Why randomize by user?

An engineer proposes assigning the page randomly on every page load instead of per user, arguing it’s simpler. Give two problems this causes.

Hint

First, inconsistency: a returning user would flip between the old and new page across visits, a confusing experience that muddies what they reacted to. Second, broken independence: one person’s visits would land in both groups, so your group counts are no longer counts of separate, comparable people — undermining the comparison the whole test relies on. Randomizing by user gives each person one consistent, independent assignment.


Summary

An A/B test is built from two groups. The control (A) keeps the status quo and serves as a practical stand-in for the counterfactual (what the treatment group would have done without the change), which works only because random assignment makes the two groups equivalent to begin with. The treatment (B) carries exactly one change, so any measured difference can be attributed to that change and not to a bundle of them. You randomize by a unit, usually the user, so each person gets one consistent assignment and the groups stay made of independent people. A seeded split sent 10,000 users to roughly 5,000 in each group, a near-even 50/50, but the ratio is a convenience; the guarantee comes from assignment being random and independent of the user, the same independence that broke the confounder in Lesson 1.

Key Concepts

  • Control group (A) — the unchanged status quo, standing in for the counterfactual.
  • Treatment group (B) — the version with exactly one change under evaluation.
  • One change at a time — the single controlled difference is what lets you attribute a result.
  • Randomization unit — what you assign (usually the user), keeping groups consistent and independent.

Why This Matters

This frame — control, treatment, one change, randomized by user — is the skeleton of every A/B test you’ll ever run or read about. Get any piece wrong and the result quietly stops meaning what you think it means: no control and you have a number with nothing to compare it to; multiple changes and you can’t say what worked; the wrong randomization unit and your “users” aren’t independent. With the frame in place, the natural next question is why random assignment actually delivers equivalent groups — which is exactly what Lesson 3 pins down.


Next Steps

Continue to Lesson 3 - Why Randomization Works

See exactly why a random split produces groups that are equivalent on everything at once.


Continue Building Your Skills

You’ve turned Lesson 1’s logic into a design: a control group that stands in for the counterfactual, a treatment group with exactly one change, and a random split by user that keeps the groups comparable and independent. You even assigned 10,000 users in a few lines of Python. Next you’ll look under the hood of the split itself and see why randomization produces groups that start the same — the guarantee everything here has been leaning on.