Lesson 1 - Sampling Distributions and Confidence Intervals

Welcome to Sampling Distributions and Confidence Intervals

In the last module you learned that a sample statistic — a sample mean, say — is just one draw from a lottery, and that it scatters around the true population value. This lesson takes that scatter seriously and turns it into a tool. If you can describe how a statistic varies from sample to sample, you can attach an honest margin of error to it and report a range that probably contains the truth.

You will do this with a real dataset of cars, simulating thousands of samples to watch a statistic’s own distribution take shape, then turning that distribution into a confidence interval you can compute with a one-line formula or with resampling.

By the end of this lesson, you will be able to:

Describe the sampling distribution of a statistic and simulate it from data
Compute the standard error and explain why larger samples give tighter estimates
State the Central Limit Theorem and recognize it at work on skewed data
Build and correctly interpret a 95% confidence interval, both with a formula and with the bootstrap

You only need a little Python, pandas, and numpy. Let’s begin.

A Statistic Is Random

Load the dataset for this module. It records fuel economy and engine details for 398 classic cars.

import numpy as np
import pandas as pd

cars = pd.read_csv("https://datatweets.com/datasets/cars.csv")
mpg = cars["mpg"]
print(round(mpg.mean(), 2), round(mpg.std(), 2), mpg.size)

23.51 7.82 398

Treat these 398 cars as our population. The population mean fuel economy is $\mu = 23.51$ mpg with a standard deviation of $\sigma = 7.82$ mpg. As before, this is a convenient fiction that lets us know the true answer and then see how close a sample gets.

Now draw four samples of 30 cars each and look at their means:

rng = np.random.default_rng(0)
for _ in range(4):
    sample = rng.choice(mpg.values, size=30, replace=False)
    print(round(sample.mean(), 2))

Four samples, four different means — 22.61, 24.46, 23.77, 23.75 — all hovering near the true 23.51. The lesson here is the one that everything else builds on: a sample statistic is itself a random variable. Recompute it on fresh data and it changes. So a statistic has its own distribution, and that distribution has a name.

The Sampling Distribution of the Mean

The sampling distribution of a statistic is the distribution of values it takes across all possible samples of a fixed size. We cannot enumerate every possible sample, but we can approximate the distribution by simulation: draw a sample, record its mean, and repeat thousands of times.

rng = np.random.default_rng(0)
sample_means = np.array([
    rng.choice(mpg.values, size=30, replace=False).mean()
    for _ in range(2000)
])
print("mean of the sample means:", round(sample_means.mean(), 2))
print("std of the sample means: ", round(sample_means.std(ddof=1), 2))

mean of the sample means: 23.52
std of the sample means:  1.37

Two things stand out. First, the average of our 2000 sample means is 23.52 — essentially the population mean of 23.51. The sample mean is unbiased: it does not systematically run high or low. Second, the sample means spread out far less than the raw data did. The cars themselves have a standard deviation of 7.82 mpg, but the means of samples of 30 have a standard deviation of only about 1.37 mpg.

The figure below makes that contrast visible. The grey histogram is the raw mpg values; the blue histogram is the 2000 sample means. Same data, but averaging 30 cars at a time produces something tight and bell-shaped centered on the truth.

Two overlaid histograms: a wide grey histogram of raw mpg values, and a much narrower blue histogram of 2000 sample means, both centered near 23.5 mpg. — The raw mpg values (grey) spread widely, but the means of 30-car samples (blue) cluster tightly into a bell shape around the true mean of 23.51 mpg.
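The simulated spread of 1.37 mpg is slightly below what $7.82/\sqrt{30} \approx 1.43$ would suggest. A likely reason is that each sample was drawn without replacement from only 398 cars, and the standard finite population correction accounts for that. The sketch below checks this using only the rounded values quoted in this lesson; the function name is illustrative, not part of any library.

```python
import math

def se_mean_without_replacement(s, n, N):
    """Standard error of a sample mean when n items are drawn without
    replacement from a finite population of N items.

    s is the population standard deviation computed with the n-1 divisor,
    which is what pandas .std() returns by default.
    """
    return s / math.sqrt(n) * math.sqrt(1 - n / N)

# Rounded values from the lesson: s = 7.82 mpg, samples of 30 out of 398 cars.
naive = 7.82 / math.sqrt(30)
corrected = se_mean_without_replacement(7.82, 30, 398)
print(round(naive, 2), round(corrected, 2))  # 1.43 1.37
```

When the population is much larger than the sample, the correction factor is close to 1 and the plain $s/\sqrt{n}$ is all you need.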

The standard deviation of that blue distribution — how much a single sample mean typically misses by — is important enough to get its own name.

The Standard Error

The standard error (SE) is the standard deviation of a statistic’s sampling distribution. For a sample mean it has a clean formula:

SE = \frac{s}{\sqrt{n}}

where $s$ is the sample standard deviation and $n$ is the sample size. Notice what the $\sqrt{n}$ in the denominator does: as the sample grows, the standard error shrinks, so estimates from bigger samples land closer to the truth.

n = mpg.size
se = mpg.std() / np.sqrt(n)
print(round(se, 3))

0.392

Here we switch roles and treat the 398 cars as a sample from some larger population of cars: the standard error of their mean is 0.392 mpg. Watch the standard error fall as the sample size climbs:

for size in [10, 50, 200]:
    print(size, round(mpg.std() / np.sqrt(size), 3))

10 2.472
50 1.105
200 0.553

Quadrupling the sample size from 50 to 200 only halves the standard error: that is the $\sqrt{n}$ at work. Precision is expensive. Cutting the standard error by a factor of 10, roughly one more digit of accuracy, takes 100 times as much data.
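To see the cost of precision in concrete terms, you can invert the formula: a target standard error requires $n \ge (s / SE)^2$. The sketch below applies that to the lesson's rounded standard deviation of 7.82 mpg; `required_sample_size` is a hypothetical helper name.

```python
import math

def required_sample_size(s, target_se):
    """Smallest n for which s / sqrt(n) is at most target_se."""
    return math.ceil((s / target_se) ** 2)

s = 7.82  # standard deviation of mpg quoted in the lesson
sizes = {target: required_sample_size(s, target) for target in [1.0, 0.5, 0.25]}
for target, n in sizes.items():
    print(target, n)
# 1.0 62
# 0.5 245
# 0.25 979
```

Each halving of the target multiplies the required sample size by about four.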

Standard deviation vs. standard error

The standard deviation describes how spread out the individual data points are. The standard error describes how spread out a statistic (like the mean) is across samples. They are linked by $SE = s/\sqrt{n}$: for any sample of more than one observation the standard error is the smaller of the two, and it keeps shrinking as you collect more data, while the standard deviation settles near the population’s spread.
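A small experiment makes the distinction concrete. The sketch below uses a made-up, roughly normal population (mean 50, standard deviation 10) rather than the cars data, so it runs on its own; every name in it is an assumption for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)
# Hypothetical population: 100,000 values, roughly normal, mean 50 and SD 10.
population = [rng.gauss(50, 10) for _ in range(100_000)]

results = {}
for n in [10, 100, 1000, 10_000]:
    sample = rng.sample(population, n)   # simple random sample, no replacement
    sd = statistics.stdev(sample)        # spread of the individual values
    se = sd / math.sqrt(n)               # estimated spread of the sample mean
    results[n] = (sd, se)
    print(n, round(sd, 2), round(se, 3))
```

The standard deviation column should hover around 10 at every sample size, while the standard error column drops by roughly a factor of $\sqrt{10}$ per row.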

The Central Limit Theorem

There is a reason the blue histogram came out bell-shaped even though the raw mpg values are lopsided. The Central Limit Theorem (CLT) says that for a large enough sample size, the sampling distribution of the mean is approximately normal — regardless of the shape of the population it was drawn from.

A skewed population on the left feeds arrows labelled draw many samples take each mean into a narrow normal bell of sample means centered at mu on the right — Whatever shape the population takes, the distribution of sample means is approximately normal and far tighter, centered on the true mean.

The weight column is moderately right-skewed, with a long tail of heavy cars:

print("weight skew:", round(cars["weight"].skew(), 2))

weight skew: 0.53

A positive skew means the bulk of the values sits toward the low end, with a tail stretching out to the right. Yet if we sample 40 cars at a time and take the mean weight, those means form a tidy bell anyway:

rng = np.random.default_rng(0)
weight = cars["weight"].values
weight_means = np.array([
    rng.choice(weight, size=40, replace=False).mean()
    for _ in range(2000)
])
print("population mean weight:", round(cars["weight"].mean(), 1))
print("mean of sample means:  ", round(weight_means.mean(), 1))
print("std of sample means:   ", round(weight_means.std(ddof=1), 1))

population mean weight: 2970.4
mean of sample means:   2970.3
std of sample means:    129.2

The skewed population averages out: the sample means center on the true 2970.4 and spread symmetrically. This is what makes inference possible. Because the sampling distribution of the mean is approximately normal, we can use the normal distribution’s familiar landmark — about 95% of values fall within 1.96 standard deviations of the center — to build an interval estimate.
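The same effect can be seen on a far more lopsided population than car weights. The sketch below is an illustration under stated assumptions: it draws from a hypothetical exponential population and uses independent draws (not sampling without replacement, as the lesson's code does). Its `skewness` helper is the simple unadjusted version, so values can differ slightly from pandas' `.skew()`.

```python
import random
import statistics

def skewness(values):
    """Unadjusted sample skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

rng = random.Random(0)

# Hypothetical, strongly right-skewed population: exponential with mean 1.
raw = [rng.expovariate(1.0) for _ in range(5000)]

# 2000 sample means, each computed from 40 independent draws.
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(40)])
    for _ in range(2000)
]

raw_skew = skewness(raw)
means_skew = skewness(means)
print("skew of raw values:  ", round(raw_skew, 2))
print("skew of sample means:", round(means_skew, 2))
```

The raw values should show a skew near 2, while the sample means should sit much closer to 0. Some skew remains at $n = 40$, which is a reminder that "large enough" depends on how lopsided the population is.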

Confidence Intervals

A single number like “23.51 mpg” hides its own uncertainty. A confidence interval reports a range instead, built so that the procedure captures the true value a known fraction of the time. For a 95% interval around a mean, we reach out 1.96 standard errors on each side:

About twenty horizontal confidence interval bars stacked beside a vertical true mean line; nineteen blue bars cross the line and one orange bar misses it — 95% of such intervals contain the true mean — that is what 95% confidence means, and roughly one bar in twenty misses.

$$\bar{x} \pm z\,\frac{s}{\sqrt{n}}$$

with $z = 1.96$ for 95% confidence. Plug in our numbers:

mean = mpg.mean()
se = mpg.std() / np.sqrt(mpg.size)
lo = mean - 1.96 * se
hi = mean + 1.96 * se
print(round(lo, 2), round(hi, 2))

22.75 24.28

So the 95% confidence interval for mean mpg is (22.75, 24.28). The interval is $23.51 \pm 1.96 \times 0.392$ : the estimate plus or minus its margin of error.
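The 1.96 is not special to this dataset: it is the point on a standard normal curve that leaves 2.5% in each tail. If you want a different confidence level, you can look up the matching $z$ with Python's standard library. The sketch below assumes the normal approximation used in this lesson; `z_for` and `mean_ci` are illustrative names, and the inputs are the rounded values quoted above, so the endpoints can differ from the printed interval in the last digit.

```python
from statistics import NormalDist

def z_for(level):
    """Two-sided critical value: `level` of a standard normal
    lies between -z and +z."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def mean_ci(mean, se, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    margin = z_for(level) * se
    return mean - margin, mean + margin

for level in [0.90, 0.95, 0.99]:
    print(level, round(z_for(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576

# Rounded values from the lesson: mean 23.51 mpg, SE 0.392 mpg.
lo, hi = mean_ci(23.51, 0.392, level=0.95)
```

Higher confidence means a larger $z$ and therefore a wider interval for the same data.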

What “95% confident” actually means

Here is the subtle part that trips up almost everyone. The 95% does not mean there is a 95% chance the true mean lies inside this particular interval. The true mean is a fixed number; this one interval either contains it or it does not. The 95% is a property of the procedure: if you repeated the whole sampling-and-interval process many times, about 95% of the intervals you built would capture the true mean.

We can check that claim directly. Draw 1000 samples of 40 cars, build a confidence interval from each, and count how many contain the true mean of 23.51:

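rng = np.random.default_rng(0)
mu = mpg.mean()  # "true" mean of the 398-car population
covered = 0
for _ in range(1000):
    s = rng.choice(mpg.values, size=40, replace=False)  # one sample of 40 cars
    e = 1.96 * s.std(ddof=1) / np.sqrt(40)  # margin of error for this sample
    if (s.mean() - e) <= mu <= (s.mean() + e):  # does this interval contain mu?
        covered += 1
print(round(covered / 1000, 3))  # fraction of intervals that captured mu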

0.958

About 95.8% of the intervals caught the true mean — right on target. The next figure shows 20 of these intervals: most (blue) straddle the red line at the true mean, and the occasional one (red) misses entirely. That is the 5% you expect to fail.

Twenty horizontal confidence intervals stacked vertically; most cross a red vertical line at the true mean, while one or two fall entirely to one side and miss it. — Each horizontal bar is a 95% confidence interval from one sample. Most capture the true mean (red line); the few that miss are the price of 95% confidence, not a mistake.
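The same experiment can be repeated at other confidence levels to confirm that the level really does describe the long-run hit rate of the procedure. The sketch below is a self-contained variation, not the lesson's code: it assumes a hypothetical normal population (mean 50, standard deviation 10) and independent draws, and `coverage` is an illustrative name.

```python
import math
import random
import statistics
from statistics import NormalDist

def coverage(level, n=40, reps=1000, seed=0):
    """Fraction of normal-approximation intervals that contain the true mean.

    Samples are independent draws from a hypothetical normal population
    with mean 50 and standard deviation 10.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    true_mean = 50.0
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 10.0) for _ in range(n)]
        m = statistics.fmean(sample)
        margin = z * statistics.stdev(sample) / math.sqrt(n)
        if m - margin <= true_mean <= m + margin:
            hits += 1
    return hits / reps

for level in [0.80, 0.90, 0.95]:
    print(level, coverage(level))
```

Each printed fraction should land close to its nominal level. Because every level reuses the same seed, the same 1000 samples are judged each time, so the only thing that changes is how far the interval reaches.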

The Bootstrap Confidence Interval

The formula interval leans on the Central Limit Theorem and the normal landmark of 1.96. The bootstrap offers a simulation-based alternative that needs no formula at all. The idea is disarmingly simple: treat your sample as a stand-in for the population, then resample it with replacement many times, computing the statistic on each resample.

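rng = np.random.default_rng(0)
# 10,000 resamples, each the same size as the data and drawn with replacement
boot_means = np.array([
    rng.choice(mpg.values, size=mpg.size, replace=True).mean()
    for _ in range(10000)
])
lo = np.percentile(boot_means, 2.5)   # lower end of the middle 95%
hi = np.percentile(boot_means, 97.5)  # upper end of the middle 95%
print(round(lo, 2), round(hi, 2))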

22.77 24.28

Each resample draws 398 cars with replacement from the original 398, so some cars appear twice and others not at all — that variation mimics drawing a fresh sample from the population. The middle 95% of the 10,000 bootstrap means runs from the 2.5th to the 97.5th percentile, giving (22.77, 24.28).

Compare that to the formula interval of (22.75, 24.28): the two agree to within a couple of hundredths of an mpg. When the Central Limit Theorem applies, the bootstrap and the formula tell the same story. The bootstrap’s real value shows up for statistics with no neat formula, such as a median, a correlation, or a ratio, where resampling still works without any new math.
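Because the recipe never depends on which statistic you compute, it can be wrapped in one reusable function. The sketch below is one possible way to do that, written with the standard library and tried on a made-up sample rather than the cars data. `bootstrap_ci` is a hypothetical helper, and it uses a simple nearest-rank percentile, so its endpoints can differ slightly from `np.percentile`.

```python
import random
import statistics

def bootstrap_ci(data, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for any statistic `stat`."""
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(
        stat(rng.choices(data, k=n))  # resample n items with replacement
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lo = boot[int(alpha * n_boot)]            # approx. lower percentile
    hi = boot[int((1 - alpha) * n_boot) - 1]  # approx. upper percentile
    return lo, hi

# Hypothetical sample: 200 values from a normal population (mean 50, SD 10).
rng = random.Random(1)
data = [rng.gauss(50, 10) for _ in range(200)]

# Interval for the standard deviation, a statistic with no simple SE formula.
lo, hi = bootstrap_ci(data, statistics.stdev)
print(round(lo, 2), round(hi, 2))
```

Passing a different function as `stat` is the only change needed to get an interval for another statistic.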

When to reach for the bootstrap

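Use the formula interval for a mean when your sample is reasonably large: it is fast and standard. Reach for the bootstrap when you need an interval for a statistic without a tidy standard-error formula, or when you would rather not lean on the normal approximation. Be cautious with very small samples, though: the bootstrap treats the sample as a stand-in for the population, and a handful of points is a poor stand-in, so percentile intervals can come out too narrow.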

Practice Exercises

Exercise 1: Standard error from a smaller sample

Draw a single simple random sample of 60 cars (use rng = np.random.default_rng(7)), then compute its sample mean mpg and its standard error. Is the standard error larger or smaller than the 0.392 we got from all 398 cars, and why?

Hint

Take sample = rng.choice(mpg.values, size=60, replace=False), then sample.mean() and sample.std(ddof=1) / np.sqrt(60). With $n = 60$ instead of 398, the $\sqrt{n}$ denominator is smaller, so the standard error is larger.

Exercise 2: A 90% confidence interval

Build a 90% confidence interval for mean mpg from the full dataset. A 90% interval reaches out $z = 1.645$ standard errors instead of 1.96. Is the 90% interval wider or narrower than the 95% one, and does that make sense?

Hint

Reuse se = mpg.std() / np.sqrt(mpg.size), then mpg.mean() - 1.645*se and mpg.mean() + 1.645*se. Less confidence means a smaller $z$ , so the interval is narrower — you are willing to be wrong more often in exchange for a tighter range.

Exercise 3: Bootstrap a different statistic

The formula $s/\sqrt{n}$ is for the mean, but the bootstrap works on any statistic. Build a 95% bootstrap confidence interval for the median mpg by resampling with replacement and taking the 2.5th and 97.5th percentiles of the resampled medians.

Hint

Copy the bootstrap loop but replace .mean() with np.median(...): np.median(rng.choice(mpg.values, size=mpg.size, replace=True)). Collect 10,000 of them and read off np.percentile(boot, [2.5, 97.5]). No standard-error formula required.

Summary

A sample statistic is a random variable with its own distribution — the sampling distribution. By simulating thousands of samples from the cars data, you saw the sample mean center on the true value and spread far less than the raw data, with a spread measured by the standard error, $SE = s/\sqrt{n}$ , which shrinks as samples grow. The Central Limit Theorem explains why that sampling distribution is bell-shaped even when the population is skewed, and that normality lets us build a confidence interval: $\bar{x} \pm 1.96 \cdot SE$ gave a 95% interval of (22.75, 24.28) for mean mpg. The bootstrap reproduced almost the same interval by resampling alone — and works for statistics that have no formula.

Key Concepts

Sampling distribution — the distribution of a statistic across all possible samples of a fixed size.
Standard error (SE) — the standard deviation of a statistic’s sampling distribution; for a mean, $s/\sqrt{n}$ .
Central Limit Theorem — the sampling distribution of the mean is approximately normal for large $n$ , whatever the population’s shape.
Confidence interval — a range, built by a procedure that captures the true parameter a known fraction of the time (e.g. 95%).
Confidence interpretation — the confidence level describes the long-run procedure, not the probability for one fixed interval.
Bootstrap — resampling a sample with replacement to approximate a statistic’s sampling distribution without a formula.

Why This Matters

Every estimate you will ever report — a click-through rate, an average response time, a model’s accuracy — is computed from a sample and therefore carries uncertainty. Knowing how to quantify that uncertainty with a standard error and communicate it with a confidence interval is what turns a number into a defensible claim. And the bootstrap means you are never stuck just because a textbook formula does not exist for your statistic.

Next Steps

Continue to Lesson 2 - Hypothesis Testing

Turn confidence intervals into decisions: frame a null hypothesis, compute a p-value, and judge whether an effect is real.

Back to Module Overview

Return to the Statistical Inference module overview

Continue Building Your Skills

You can now describe how a statistic varies, summarize that variation with a standard error, and report an honest range instead of a single point. Next you will put that machinery to work in reverse: instead of estimating where the truth lies, you will ask whether a specific claim about it can survive the evidence — the logic of hypothesis testing.
