Module · 5 lessons

Analyzing Mean Metrics

Revenue and time metrics are averages, not rates — and they're skewed. Learn the two-sample t-test, why Welch is the safe default, confidence intervals for a difference in means, and the Mann-Whitney fallback when the average lies.


At a glance

Level: Intermediate

Lessons: 5 lessons

Time to complete: 1 week

Cost: Free forever · no sign-up

Welcome to Analyzing Mean Metrics, the fifth module. Not every metric is a rate. Revenue per user, time on task, order value, sessions per week — these are averages of a continuous quantity, and the two-proportion z-test you just learned doesn’t apply to them. They also bring a new complication: money and time metrics are almost always heavily skewed, with a long tail of big spenders or long sessions that can make the average behave in surprising — and misleading — ways. This module gives you the tools to analyze mean metrics honestly.
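To make the skew concrete, here is a minimal sketch on simulated data. The lognormal shape, its parameters, and the seed are illustrative assumptions, not Lumen's actual revenue data.

```python
import numpy as np

rng = np.random.default_rng(42)  # hypothetical seed, chosen only for reproducibility

# Simulated revenue per user: lognormal is an assumed stand-in for a right-skewed metric.
revenue = rng.lognormal(mean=3.0, sigma=1.2, size=10_000)

mean_rev = revenue.mean()
median_rev = np.median(revenue)

# Share of total revenue contributed by the top 1% of users (100 of 10,000).
top_share = np.sort(revenue)[-100:].sum() / revenue.sum()

print(f"mean   = {mean_rev:.2f}")
print(f"median = {median_rev:.2f}")
print(f"top 1% share of revenue = {top_share:.1%}")
```

In a distribution like this the long tail pulls the mean well above the median, so a handful of big spenders can move the average without any change for the typical user.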

You’ll start with the two-sample t-test, the mean-metric counterpart to the z-test. You’ll learn why Welch’s t-test, which allows the two groups to have different variances, is the safe default, and watch it overturn a falsely “significant” result on Lumen’s revenue data. You’ll meet the Mann-Whitney U test, a rank-based fallback for when skew makes the average untrustworthy, on a Lumen revenue experiment where the mean went up but the typical user actually spent less. You’ll build a confidence interval for the difference in means. The module ends with a full guided analysis and a decision that the raw average alone would have gotten wrong.
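As a rough preview of the three tests named above, the sketch below runs each with scipy on simulated data. The group sizes, lognormal parameters, and seed are assumptions for illustration; the two groups are built with the same median but different spread, and the output is not meant to reproduce the module's results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # hypothetical seed

# Simulated, right-skewed revenue per user; not Lumen's actual data.
# Both groups share the same median (exp(3.0)); treatment has a heavier tail.
control = rng.lognormal(mean=3.0, sigma=1.0, size=4000)
treatment = rng.lognormal(mean=3.0, sigma=1.3, size=2500)

# Student's t-test pools the two variances; Welch's does not.
student = stats.ttest_ind(treatment, control, equal_var=True)
welch = stats.ttest_ind(treatment, control, equal_var=False)

# Mann-Whitney U compares ranks rather than means.
mwu = stats.mannwhitneyu(treatment, control, alternative="two-sided")

print(f"Student t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
print(f"Welch   t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
print(f"Mann-Whitney U = {mwu.statistic:.0f}, p = {mwu.pvalue:.4f}")
```

Here the smaller group also has the larger variance, which is the situation where pooling understates the standard error and Student's test looks more confident than Welch's. The rank-based test answers a different question, so its p-value need not agree with either t-test.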

Every result here is real, runnable Python: the t-tests, the confidence interval, and the Mann-Whitney test are computed with numpy and scipy on the same seeded, skewed revenue data, so you see exactly how — and when — the average can deceive you. Start with Lesson 1 on the two-sample t-test.
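For a sense of what the confidence-interval computation looks like, here is a minimal sketch of a Welch 95% interval for a difference in means, using the unpooled standard error and Welch-Satterthwaite degrees of freedom. The data, parameters, and seed are simulated assumptions rather than the module's dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)  # hypothetical seed

# Simulated revenue per user for two groups; parameters are illustrative only.
control = rng.lognormal(mean=3.0, sigma=1.0, size=4000)
treatment = rng.lognormal(mean=3.05, sigma=1.0, size=4000)

diff = treatment.mean() - control.mean()

# Per-group squared standard errors (sample variance / n), not pooled.
var_t = treatment.var(ddof=1) / treatment.size
var_c = control.var(ddof=1) / control.size
se = np.sqrt(var_t + var_c)

# Welch-Satterthwaite approximation to the degrees of freedom.
df = (var_t + var_c) ** 2 / (
    var_t ** 2 / (treatment.size - 1) + var_c ** 2 / (control.size - 1)
)

# Two-sided 95% interval: estimate +/- critical t value times standard error.
t_crit = stats.t.ppf(0.975, df)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

print(f"difference in means = {diff:.2f}")
print(f"95% CI = [{ci_low:.2f}, {ci_high:.2f}] (df = {df:.1f})")
```

An interval that includes 0 means the data are compatible with no lift at all, which is the kind of reading the confidence-interval lesson builds toward.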

Lessons in this module

1. The Two-Sample T-Test: When the metric is an average, such as revenue per user, the z-test no longer applies. Meet the two-sample t-test: how a difference in means becomes a t statistic and a p-value, run on Lumen's revenue experiment.

2. Welch's T-Test and Unequal Variances: The two-sample t-test comes in two flavors, Student's and Welch's, and on Lumen's revenue data the choice between them flips the decision from ship to don't ship.

3. Skewed Metrics and the Mann-Whitney Test: When revenue is heavily right-skewed, the mean can rise while the typical user spends less. Meet the Mann-Whitney U test, a rank-based tool that answers the per-user question the mean cannot.

4. Confidence Intervals for the Difference in Means: A p-value gives a yes/no; a confidence interval gives dollars. Build the Welch 95% CI for Lumen's revenue lift and see why the interval, which barely includes $0, says don't ship yet.

5. Guided Project: Analyze Lumen's Revenue Experiment: Analyze a skewed revenue metric end to end (Welch t-test, confidence interval, and a Mann-Whitney check) and reach a decision the raw average would have gotten wrong.

Achievement

Complete all 5 lessons to finish the Analyzing Mean Metrics module.

