Lesson 1 - Estimating Probabilities

Welcome to Estimating Probabilities

Every time you check a weather app, read a poll, or trust a model that says “70% likely,” you are reading a probability — a single number that measures how likely something is. Before you can combine probabilities or reason about uncertainty, you need to answer a more basic question: where does that number come from in the first place?

There are two answers. Sometimes you count what actually happened in data, and sometimes you reason about what could happen from the structure of the problem. In this lesson you will do both — estimating a real probability from a dataset of penguins, calculating exact probabilities for coins and dice, and then running a simulation that shows the two ideas are really one.

By the end of this lesson, you will be able to:

Define a probability as a number between 0 and 1 and read it correctly
Estimate an empirical probability from observed data as a relative frequency
Calculate a theoretical probability when outcomes are equally likely
Explain the law of large numbers and why a few trials cannot be trusted

You only need a little Python, numpy, and pandas. Let’s begin.

What a Probability Is

A probability is a number that measures how likely an event is, and it always lies between 0 and 1. A probability of 0 means the event is impossible; a probability of 1 means it is certain; everything interesting sits in between. We write the probability of an event $A$ as $P(A)$.

$$0 \le P(A) \le 1$$

You can read a probability as a long-run fraction. If $P(A) = 0.36$, then if you could repeat the situation many times, event $A$ would happen about 36% of the time. That single idea, that a probability is the fraction of times something happens in the long run, is the thread running through everything below. The only question is whether you measure that fraction from data you have collected or compute it from the rules of the game.
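As a small sketch of that reading, the helper below (the name read_probability is hypothetical, introduced only for illustration) rejects any number outside the 0-to-1 range and turns a valid probability into an expected count per batch of repetitions.

```python
def read_probability(p, trials=100):
    """Describe probability p as an expected count over `trials` repetitions."""
    # A probability must lie between 0 and 1 inclusive.
    if not 0 <= p <= 1:
        raise ValueError(f"{p} is not a valid probability; it must lie in [0, 1]")
    expected = p * trials  # long-run fraction times number of repetitions
    return f"P = {p}: expect the event about {expected:g} times in {trials} repetitions"


print(read_probability(0.36))
print(read_probability(0.36, trials=1000))
```

The expected count is a long-run average, not a promise about any particular batch of 100 repetitions.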

Empirical Probability: Counting What Happened

An empirical probability is an estimate based on observed data. You count how often an event occurred and divide by the total number of observations. It is sometimes called the relative frequency of the event.

$$P(A) = \frac{\text{number of times } A \text{ occurred}}{\text{total number of observations}}$$

To make this concrete, load the penguins dataset — body measurements of three penguin species near Palmer Station, Antarctica — and look at how the species break down.

import pandas as pd

penguins = pd.read_csv("https://datatweets.com/datasets/penguins.csv")
print(penguins.shape)
print(penguins["species"].value_counts())

(344, 7)
species
Adelie       152
Gentoo       124
Chinstrap     68
Name: count, dtype: int64

There are 344 penguins in total. Suppose you reach into this group and pick one penguin at random. What is the probability it is a Gentoo? You already have the ingredients: 124 of the 344 penguins are Gentoo, so the empirical probability is that fraction.

p_gentoo = (penguins["species"] == "Gentoo").mean()
print(round(p_gentoo, 4))

0.3605

So $P(\text{Gentoo}) = \dfrac{124}{344} \approx 0.3605$. Notice the trick: comparing a column to a value gives a column of True/False values, and .mean() treats True as 1 and False as 0, so the mean of that column is exactly the proportion of rows that are True. That proportion is the empirical probability.
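To see the True/False trick without downloading anything, here is a minimal sketch on a made-up six-row sample (hypothetical data, not taken from the penguins file). It computes the proportion twice, once by explicit counting and once with .mean(), so you can confirm the two agree.

```python
import pandas as pd

# Hypothetical six-row sample, invented for illustration only.
species = pd.Series(["Gentoo", "Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo"])

is_gentoo = species == "Gentoo"                 # Series of True/False values
by_counting = is_gentoo.sum() / len(species)    # number of True rows over total rows
by_mean = is_gentoo.mean()                      # True counts as 1, False as 0

print(is_gentoo.tolist())
print(by_counting, by_mean)
```

Three of the six rows are Gentoo, so both calculations give 0.5.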

The same recipe works for any event you can express as a condition. Here are all three species as probabilities:

print((penguins["species"].value_counts(normalize=True)).round(4))

species
Adelie       0.4419
Gentoo       0.3605
Chinstrap    0.1977

The exact proportions add up to 1, because every penguin is exactly one species, so one of these outcomes is certain to happen. (The rounded values shown above sum to 1.0001; the extra 0.0001 is rounding error.)
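A quick way to check the sum is sketched below. It rebuilds the proportions from the three counts printed earlier instead of reloading the file, and compares the sum of the exact proportions with the sum of the values rounded to four places.

```python
import pandas as pd

# Species counts copied from the value_counts() output earlier in the lesson.
counts = pd.Series({"Adelie": 152, "Gentoo": 124, "Chinstrap": 68})

probs = counts / counts.sum()    # relative frequency of each species

print(probs.round(4))
print("sum of exact proportions:  ", probs.sum())
print("sum of rounded proportions:", probs.round(4).sum())
```

The exact proportions sum to 1 up to floating-point error, while the rounded ones overshoot slightly because each value was rounded up.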

Why empirical probability matters

Look closely: the three species are not equally likely. Adelie penguins show up more than twice as often as Chinstraps (152 versus 68). You could never have guessed $P(\text{Gentoo}) \approx 0.3605$ by reasoning about the structure of penguins; you had to measure it. Whenever outcomes are unequal or unknown, estimating from data is the only honest way to get a probability.

Theoretical Probability: Equally Likely Outcomes

Sometimes you do not need data at all. When every possible outcome is equally likely, you can calculate a theoretical probability by simply counting. Divide the number of outcomes that count as a success — the favorable outcomes — by the total number of possible outcomes.

$$P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}$$

Figure: Two routes to the same probability. Left panel: a fair die's six faces, each labelled with probability one-sixth. Right panel: a bar chart of observed relative frequencies from about 1,000 rolls, hovering around a dashed one-sixth line. Theory reads one-sixth straight off the die's symmetry (favorable over total), while the empirical estimate comes from counting actual rolls and settles near one-sixth as the data piles up.

A fair coin has two equally likely outcomes, heads and tails. Exactly one of them is heads, so:

$$P(\text{heads}) = \frac{1}{2} = 0.5$$

A fair six-sided die has six equally likely faces. Three of them are even (2, 4, 6), so the probability of rolling an even number is:

$$P(\text{even}) = \frac{3}{6} = 0.5$$

You did not roll a single die to get that — you reasoned it out from the structure of the die. That is the power and the limit of theoretical probability: it is exact and instant, but it only works when you can honestly claim every outcome is equally likely. A fair coin qualifies. The penguin species above do not, which is why you had to fall back on counting.
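The favorable-over-total recipe can be sketched in a few lines of plain Python. The helper name theoretical_probability is hypothetical, and the code simply assumes that every listed outcome is equally likely; it does nothing to check that assumption. Fractions keep the answers exact.

```python
from fractions import Fraction


def theoretical_probability(outcomes, is_favorable):
    """Favorable outcomes over total outcomes, assuming all are equally likely."""
    outcomes = list(outcomes)
    favorable = [o for o in outcomes if is_favorable(o)]
    return Fraction(len(favorable), len(outcomes))


die = range(1, 7)            # faces 1 through 6
coin = ["heads", "tails"]

p_even = theoretical_probability(die, lambda face: face % 2 == 0)
p_heads = theoretical_probability(coin, lambda side: side == "heads")

print(p_even, float(p_even))
print(p_heads, float(p_heads))
```

Both calls return the fraction 1/2, matching the two hand calculations above. Passing in the penguin species would also return a number, but a wrong one, because those outcomes are not equally likely.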

So we now have two numbers that both claim to be a probability: the theoretical $P(\text{heads}) = 0.5$ we calculated, and the empirical fraction we would get by actually flipping a coin. Do they agree? The answer is the most important idea in this lesson.

The Law of Large Numbers

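The law of large numbers says that as you repeat a random experiment more and more times, the empirical probability (the fraction you observe) tends to get closer and closer to the theoretical probability. The approach is not perfectly steady, since the fraction can drift away for a while, but large gaps become less and less likely as trials accumulate. The two faces of probability converge.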
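In symbols, one common way to write this is sketched below. The notation is introduced here for illustration and is not used elsewhere in the lesson: $X_i$ is 1 if event $A$ happens on trial $i$ and 0 otherwise, the trials are assumed independent and identical, and $\hat{p}_n$ is the empirical probability after $n$ trials.

```latex
\hat{p}_n = \frac{1}{n} \sum_{i=1}^{n} X_i
\qquad\text{and}\qquad
\hat{p}_n \longrightarrow P(A) \quad \text{as } n \to \infty
```

The left-hand formula is the relative frequency from earlier: the sum counts how many times $A$ occurred, and $n$ is the total number of observations.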

Let’s watch it happen. We’ll simulate flipping a fair coin with numpy, using a seeded random generator so you get the exact same flips when you run this yourself. We represent heads as 1 and tails as 0, so the running mean of the flips is the running proportion of heads.

import numpy as np

rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=10000)   # 0 = tails, 1 = heads
print(flips[:10])

[1 1 1 0 0 0 0 0 0 1]

Those are the first ten flips: three heads, then six tails, then a head. Now track the running proportion of heads — the cumulative number of heads divided by the number of flips so far — and check it at a few milestones.

running = np.cumsum(flips) / np.arange(1, len(flips) + 1)

for n in [10, 100, 1000, 10000]:
    print(n, round(running[n - 1], 4))

10 0.4
100 0.56
1000 0.537
10000 0.503

After only 10 flips the proportion is 0.4, a full 0.1 away from 0.5. After 100 it is 0.56, still off by 0.06. By 1,000 flips the gap has narrowed to 0.037 (a proportion of 0.537), and by 10,000 flips the proportion is 0.503, within 0.003 of the theoretical value. The more you flip, the closer the observed fraction tends to stay to the true probability.
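To see how the running proportion is assembled, here is a small sketch that repeats the cumsum / arange calculation on just the first ten flips printed above. The flips are hardcoded so no random generator is needed, and the variable names heads_so_far and flips_so_far are illustrative.

```python
import numpy as np

# The first ten flips printed above (1 = heads, 0 = tails).
first_ten = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 1])

heads_so_far = np.cumsum(first_ten)                # running count of heads
flips_so_far = np.arange(1, len(first_ten) + 1)    # 1, 2, ..., 10
running = heads_so_far / flips_so_far              # running proportion of heads

for n, h, r in zip(flips_so_far, heads_so_far, running):
    print(f"after {n:2d} flips: {h} heads -> {r:.3f}")
```

The last line reports 4 heads in 10 flips, which is the 0.4 shown at the first milestone.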

The figure below plots the running proportion for a seeded run of 5,000 flips. The blue line is the running proportion of heads; the dashed gray line marks the theoretical 0.5. Watch how wildly the line swings at the start and how it calms down and settles onto 0.5 as the flips pile up.

Figure: The running proportion of heads from a seeded simulation of 5,000 fair-coin flips. Early on the proportion lurches above and below 0.5; as the number of flips grows (the x-axis is on a log scale) it settles onto the theoretical value of 0.5, the law of large numbers in action.

This convergence is exactly why empirical and theoretical probability are two views of the same thing. The theoretical 0.5 is the value the empirical fraction is heading toward; the empirical fraction is the theoretical value made visible by repetition.

Why a few trials cannot be trusted

The flip side of the law of large numbers is a warning: with a small number of trials, the empirical fraction can be far off. Flip a coin just ten times, with several different seeds, and look at how much the answer jumps around.

for seed in [0, 1, 2, 3]:
    r = np.random.default_rng(seed)
    f = r.integers(0, 2, size=10)
    print(seed, f.sum(), "heads ->", round(f.mean(), 2))

0 4 heads -> 0.4
1 5 heads -> 0.5
2 3 heads -> 0.3
3 4 heads -> 0.4

Four runs of ten flips give 0.4, 0.5, 0.3, 0.4 — anywhere from 30% to 50% heads, even though the coin is perfectly fair. A single short run could badly mislead you. Now do the same with 1,000 flips each:

for seed in [0, 1, 2, 3]:
    r = np.random.default_rng(seed)
    f = r.integers(0, 2, size=1000)
    print(seed, round(f.mean(), 4))

Run this, and every proportion lands much closer to 0.5 than the ten-flip runs did. The lesson is sharp: an empirical probability is only as trustworthy as the number of observations behind it. Estimating $P(\text{Gentoo})$ from 344 penguins is reasonably solid; estimating it from 5 penguins would not be.
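Four seeds give only a glimpse of the variation. The sketch below repeats the experiment 200 times for each run length and reports the lowest and highest proportion of heads it sees. It is an illustration rather than the lesson's own code: the helper name proportion_spread and the choice of 200 runs are assumptions, and all runs are drawn from one seeded generator instead of a separate seed per run.

```python
import numpy as np


def proportion_spread(n_flips, n_runs=200, seed=0):
    """Lowest and highest proportion of heads across n_runs runs of n_flips fair flips."""
    rng = np.random.default_rng(seed)
    # One row per run, one column per flip (0 = tails, 1 = heads).
    flips = rng.integers(0, 2, size=(n_runs, n_flips))
    proportions = flips.mean(axis=1)    # proportion of heads in each run
    return proportions.min(), proportions.max()


for n in [10, 100, 1000]:
    low, high = proportion_spread(n)
    print(f"{n:5d} flips per run: lowest {low:.3f}, highest {high:.3f}, range {high - low:.3f}")
```

The range between the lowest and highest proportion should shrink as the number of flips per run grows, which is the same warning stated numerically.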

Empirical or theoretical?

Ask one question: are the outcomes equally likely and known? If yes — a fair coin, a balanced die, a shuffled deck — calculate the theoretical probability by counting favorable over total. If no, or if you simply do not know — penguin species, customer churn, whether it rains tomorrow — collect data and estimate the empirical probability, and gather as much data as you can.

Practice Exercises

Exercise 1: Estimate a probability from data

Using the penguins dataset, estimate the empirical probability that a randomly chosen penguin lives on the island "Biscoe". Then state in words what that number means.

Hint

Build a True/False condition and take its mean: (penguins["island"] == "Biscoe").mean(). You should get about 0.4884 — read it as “roughly 49% of these penguins live on Biscoe.” You can check the raw counts with penguins["island"].value_counts().

Exercise 2: Calculate a theoretical probability, then test it

A fair six-sided die has six equally likely faces. Calculate the theoretical probability of rolling a 5 or a 6. Then simulate 100,000 rolls with rng = np.random.default_rng(0) and compare the empirical fraction to your theoretical answer.

Hint

Two faces out of six are favorable, so the theoretical probability is $\tfrac{2}{6} \approx 0.3333$. Simulate with rolls = rng.integers(1, 7, size=100000) (the upper bound 7 is excluded, so faces run 1–6) and compute (rolls >= 5).mean(). You should land near 0.3326, close to the theoretical value, as the law of large numbers predicts.

Exercise 3: See small samples mislead you

Flip a fair coin only 12 times using rng = np.random.default_rng(2) and compute the proportion of heads. Then do the same with 5,000 flips. Which result do you trust more, and why?

Hint

Use rng.integers(0, 2, size=12).mean() for the short run and np.random.default_rng(2).integers(0, 2, size=5000).mean() for the long one. The short run can drift well away from 0.5, while the long run will sit close to it — the larger sample is the more reliable estimate.

Summary

You learned that a probability is a number between 0 and 1 measuring how likely an event is, and that there are two ways to find one. An empirical probability is a relative frequency you count from observed data, like $P(\text{Gentoo}) \approx 0.3605$ estimated from 344 penguins. A theoretical probability is a calculation of favorable outcomes over total outcomes, valid when every outcome is equally likely, like $P(\text{heads}) = 0.5$ for a fair coin. The law of large numbers unites them: as trials pile up, the empirical fraction converges on the theoretical value, which is also why estimates from only a handful of trials cannot be trusted.

Key Concepts

Probability — a number in $[0, 1]$ measuring how likely an event is; $P(A) = 0$ is impossible, $P(A) = 1$ is certain.
Empirical probability — an estimate from data, equal to the relative frequency (count of the event divided by total observations).
Theoretical probability — favorable outcomes divided by total outcomes, valid only when outcomes are equally likely.
Equally likely outcomes — outcomes with the same chance of occurring, the condition theoretical probability depends on.
Law of large numbers — as the number of trials grows, the empirical probability converges to the theoretical probability.

Why This Matters

Almost every probability you will use in real work is empirical — churn rates, click-through rates, disease prevalence, model confidence — because the outcomes are rarely equally likely and the true value is unknown. Knowing that those estimates are relative frequencies, that they get more reliable with more data, and that a small sample can lie to you is the difference between a number you can build a decision on and one that just happens to sound precise.

Next Steps

Continue to Lesson 2 - Probability Rules

Combine events with the addition rule, multiplication rule, and complements to compute probabilities of compound events.

Back to Module Overview

Return to the Probability Fundamentals module overview

Continue Building Your Skills

You can now put a trustworthy number on a single event — by counting it in data or calculating it from equally likely outcomes — and you understand why more trials make that number more reliable. Next you will learn how to combine events: the rules that let you find the probability of “this or that” and “this and that,” and reason about what happens when one event depends on another.

