Lesson 1 - Estimating Probabilities
Welcome to Estimating Probabilities
Every time you check a weather app, read a poll, or trust a model that says “70% likely,” you are reading a probability — a single number that measures how likely something is. Before you can combine probabilities or reason about uncertainty, you need to answer a more basic question: where does that number come from in the first place?
There are two answers. Sometimes you count what actually happened in data, and sometimes you reason about what could happen from the structure of the problem. In this lesson you will do both — estimating a real probability from a dataset of penguins, calculating exact probabilities for coins and dice, and then running a simulation that shows the two ideas are really one.
By the end of this lesson, you will be able to:
- Define a probability as a number between 0 and 1 and read it correctly
- Estimate an empirical probability from observed data as a relative frequency
- Calculate a theoretical probability when outcomes are equally likely
- Explain the law of large numbers and why a few trials cannot be trusted
You only need a little Python, numpy, and pandas. Let’s begin.
What a Probability Is
A probability is a number that measures how likely an event is, and it always lives between 0 and 1. A probability of 0 means the event is impossible; a probability of 1 means it is certain; everything interesting sits in between. We write the probability of an event A as P(A).
You can read a probability as a long-run fraction. If P(A) = 0.36, it means that if you could repeat the situation many times, event A would happen about 36% of the time. That single idea, that a probability is the fraction of times something happens in the long run, is the thread running through everything below. The only question is whether you measure that fraction from data you have collected or compute it from the rules of the game.
Empirical Probability: Counting What Happened
An empirical probability is an estimate based on observed data. You count how often an event occurred and divide by the total number of observations. It is sometimes called the relative frequency of the event.
To make this concrete, load the penguins dataset — body measurements of three penguin species near Palmer Station, Antarctica — and look at how the species break down.
import pandas as pd

penguins = pd.read_csv("https://datatweets.com/datasets/penguins.csv")
print(penguins.shape)
print(penguins["species"].value_counts())

Output:
(344, 7)
species
Adelie       152
Gentoo       124
Chinstrap     68
Name: count, dtype: int64

There are 344 penguins in total. Suppose you reach into this group and pick one penguin at random. What is the probability it is a Gentoo? You already have the ingredients: 124 of the 344 penguins are Gentoo, so the empirical probability is that fraction.
p_gentoo = (penguins["species"] == "Gentoo").mean()
print(round(p_gentoo, 4))

Output:
0.3605

So P(Gentoo) = 124/344 ≈ 0.3605. Notice the trick: comparing a column to a value gives a column of True/False values, and .mean() treats True as 1 and False as 0, so the mean of that column is exactly the proportion that are True. That proportion is the empirical probability.
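To see the True/False averaging trick in isolation, here is a small sketch on a hand-made column. The six labels are made up for illustration and are not drawn from the penguins file.

```python
import pandas as pd

# Hypothetical toy column (not taken from the penguins file).
species = pd.Series(["Gentoo", "Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo"])

is_gentoo = species == "Gentoo"    # Boolean column: True where the label matches
count_true = int(is_gentoo.sum())  # True counts as 1, so the sum is the number of matches
total = len(species)

print(count_true, "of", total)     # 3 of 6
print(is_gentoo.mean())            # 0.5, the same value as count_true / total
```

The mean and the count-over-total fraction are the same calculation, which is why .mean() on a condition gives a relative frequency directly.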
The same recipe works for any event you can express as a condition. Here are all three species as probabilities:
print(penguins["species"].value_counts(normalize=True).round(4))

Output:
species
Adelie       0.4419
Gentoo       0.3605
Chinstrap    0.1977

These three numbers add up to 1 (up to rounding), because every penguin is exactly one species: one of these outcomes is certain to happen.
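The same proportions can be rebuilt from the raw counts with plain arithmetic. The sketch below uses only the three counts reported earlier and exact fractions from the standard library, so the sum comes out as exactly 1 with no rounding involved.

```python
from fractions import Fraction

# Species counts as reported by value_counts() in the lesson.
counts = {"Adelie": 152, "Gentoo": 124, "Chinstrap": 68}
total = sum(counts.values())  # 344 penguins

# Relative frequency = count of the event / total observations.
probs = {name: Fraction(n, total) for name, n in counts.items()}

for name, p in probs.items():
    print(f"{name:<10}{counts[name]:>3}/{total} = {float(p):.4f}")

print("sum of probabilities:", sum(probs.values()))  # 1
```

The rounded decimals add to 1.0001, while the exact fractions add to 1, so the small discrepancy comes only from rounding each value to four places.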
Why empirical probability matters
Look closely: the three species are not equally likely. Adelie penguins show up more than twice as often as Chinstraps (152 versus 68). You could never have guessed that by reasoning about the structure of penguins; you had to measure it. Whenever outcomes are unequal or unknown, estimating from data is the only honest way to get a probability.
Theoretical Probability: Equally Likely Outcomes
Sometimes you do not need data at all. When every possible outcome is equally likely, you can calculate a theoretical probability by simply counting. Divide the number of outcomes that count as a success — the favorable outcomes — by the total number of possible outcomes.
A fair coin has two equally likely outcomes, heads and tails. Exactly one of them is heads, so:
P(heads) = 1/2 = 0.5
A fair six-sided die has six equally likely faces. Three of them are even (2, 4, 6), so the probability of rolling an even number is:
P(even) = 3/6 = 0.5
You did not roll a single die to get that — you reasoned it out from the structure of the die. That is the power and the limit of theoretical probability: it is exact and instant, but it only works when you can honestly claim every outcome is equally likely. A fair coin qualifies. The penguin species above do not, which is why you had to fall back on counting.
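The counting rule can also be written as a few lines of Python. This is a minimal sketch: the helper name theoretical_probability is made up for illustration, and the result is only meaningful when every outcome in the list really is equally likely.

```python
from fractions import Fraction

def theoretical_probability(outcomes, is_favorable):
    """Favorable outcomes over total outcomes (hypothetical helper).

    Only valid when every outcome in `outcomes` is equally likely.
    """
    favorable = [o for o in outcomes if is_favorable(o)]
    return Fraction(len(favorable), len(outcomes))

coin = ["heads", "tails"]
die = [1, 2, 3, 4, 5, 6]

p_heads = theoretical_probability(coin, lambda o: o == "heads")
p_even = theoretical_probability(die, lambda o: o % 2 == 0)

print(p_heads, float(p_heads))  # 1/2 0.5
print(p_even, float(p_even))    # 1/2 0.5
```

Passing the three penguin species as the outcome list would return 1/3 for each, which is wrong for this dataset. The function cannot check the equally-likely assumption; that part is up to you.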
So we now have two numbers that both claim to be a probability: the theoretical P(heads) = 0.5 we calculated, and the empirical fraction we would get by actually flipping a coin. Do they agree? The answer is the most important idea in this lesson.
The Law of Large Numbers
The law of large numbers says that as you repeat a random experiment more and more times, the empirical probability (the fraction you observe) gets closer and closer to the theoretical probability. The two faces of probability converge.
Let’s watch it happen. We’ll simulate flipping a fair coin with numpy, using a seeded random generator so you get the exact same flips when you run this yourself. We represent heads as 1 and tails as 0, so the running mean of the flips is the running proportion of heads.
import numpy as np

rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=10000)  # 0 = tails, 1 = heads
print(flips[:10])

Output:
[1 1 1 0 0 0 0 0 0 1]

Those are the first ten flips: three heads, then six tails, then a head. Now track the running proportion of heads (the cumulative number of heads divided by the number of flips so far) and check it at a few milestones.
running = np.cumsum(flips) / np.arange(1, len(flips) + 1)
for n in [10, 100, 1000, 10000]:
    print(n, round(running[n - 1], 4))

Output:
10 0.4
100 0.56
1000 0.537
10000 0.503

After only 10 flips the proportion is 0.4, a long way from 0.5. After 100 it is 0.56, still off. But by 1,000 flips it has tightened to 0.537, and by 10,000 flips it is 0.503, almost exactly the theoretical value. The more you flip, the closer the observed fraction hugs the true probability.
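The running-proportion calculation can be traced by hand on the ten flips printed earlier. This sketch hard-codes those ten values instead of regenerating them, so each step of the cumulative sum and the division is visible.

```python
import numpy as np

# The first ten flips shown in the lesson (1 = heads, 0 = tails).
first_ten = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 1])

heads_so_far = np.cumsum(first_ten)              # 1, 2, 3, 3, 3, 3, 3, 3, 3, 4
flips_so_far = np.arange(1, len(first_ten) + 1)  # 1, 2, ..., 10
running = heads_so_far / flips_so_far            # element-wise division

for n, h, r in zip(flips_so_far.tolist(), heads_so_far.tolist(), running.tolist()):
    print(f"after {n:2d} flips: {h} heads -> {r:.3f}")
```

The proportion starts at 1.000 after three straight heads, falls to 0.333 after nine flips, and ends at 0.400, matching the value reported at the 10-flip milestone.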
The figure below plots the full run. The blue line is the running proportion of heads; the dashed gray line marks the theoretical 0.5. Watch how wildly the line swings at the start and how it calms down and locks onto 0.5 as the flips pile up.
This convergence is exactly why empirical and theoretical probability are two views of the same thing. The theoretical 0.5 is the value the empirical fraction is heading toward; the empirical fraction is the theoretical value made visible by repetition.
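For readers who want the convergence in symbols, one common way to state it for the coin example is sketched below. The notation is introduced here for illustration and is not used elsewhere in the lesson: X_i is the i-th flip (1 for heads, 0 for tails), p-hat_n is the proportion of heads after n flips, and p is the theoretical probability. This is the weak form of the law, and it assumes the flips are independent and share the same probability of heads.

```latex
\hat{p}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\lim_{n\to\infty} P\bigl(\lvert \hat{p}_n - p \rvert > \varepsilon\bigr) = 0
\quad \text{for every } \varepsilon > 0 .
```

For the fair coin p = 0.5, and p-hat_n is the quantity stored in running[n - 1] in the simulation.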
Why a few trials cannot be trusted
The flip side of the law of large numbers is a warning: with a small number of trials, the empirical fraction can be far off. Flip a coin just ten times, with several different seeds, and look at how much the answer jumps around.
for seed in [0, 1, 2, 3]:
    r = np.random.default_rng(seed)
    f = r.integers(0, 2, size=10)
    print(seed, f.sum(), "heads ->", round(f.mean(), 2))

Output:
0 4 heads -> 0.4
1 5 heads -> 0.5
2 3 heads -> 0.3
3 4 heads -> 0.4

Four runs of ten flips give 0.4, 0.5, 0.3, 0.4: anywhere from 30% to 50% heads, even though the coin is perfectly fair. A single short run could badly mislead you. Now do the same with 1,000 flips each:
for seed in [0, 1, 2, 3]:
    r = np.random.default_rng(seed)
    f = r.integers(0, 2, size=1000)
    print(seed, round(f.mean(), 4))

Output:
0 0.537
1 0.491
2 0.5
3 0.491

Now every run lands close to 0.5. The lesson is sharp: an empirical probability is only as trustworthy as the number of observations behind it. Estimating P(Gentoo) from 344 penguins is reasonably solid; estimating it from 5 penguins would not be.
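Four seeds are a small sample of runs in their own right, so it can help to repeat the comparison over many more. The sketch below is one way to do that; the choice of 200 seeds and the helper name proportion_spread are assumptions made for illustration.

```python
import numpy as np

def proportion_spread(n_flips, seeds):
    """Smallest and largest proportion of heads across independent runs (hypothetical helper)."""
    props = [np.random.default_rng(s).integers(0, 2, size=n_flips).mean() for s in seeds]
    return min(props), max(props)

seeds = range(200)  # assumption: 200 independent runs, seeds 0..199
widths = {}
for n_flips in [10, 1000]:
    low, high = proportion_spread(n_flips, seeds)
    widths[n_flips] = high - low
    print(f"{n_flips:>5} flips: min={low:.3f} max={high:.3f} width={high - low:.3f}")
```

The exact numbers depend on the seeds, but the gap between the smallest and largest proportion should be much wider for the 10-flip runs than for the 1,000-flip runs.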
Empirical or theoretical?
Ask one question: are the outcomes equally likely and known? If yes — a fair coin, a balanced die, a shuffled deck — calculate the theoretical probability by counting favorable over total. If no, or if you simply do not know — penguin species, customer churn, whether it rains tomorrow — collect data and estimate the empirical probability, and gather as much data as you can.
Practice Exercises
Exercise 1: Estimate a probability from data
Using the penguins dataset, estimate the empirical probability that a randomly chosen penguin lives on the island "Biscoe". Then state in words what that number means.
Hint
Build a True/False condition and take its mean: (penguins["island"] == "Biscoe").mean(). You should get about 0.4884 — read it as “roughly 49% of these penguins live on Biscoe.” You can check the raw counts with penguins["island"].value_counts().
Exercise 2: Calculate a theoretical probability, then test it
A fair six-sided die has six equally likely faces. Calculate the theoretical probability of rolling a 5 or a 6. Then simulate 100,000 rolls with rng = np.random.default_rng(0) and compare the empirical fraction to your theoretical answer.
Hint
Two faces out of six are favorable, so the theoretical probability is 2/6 = 1/3 ≈ 0.3333. Simulate with rolls = rng.integers(1, 7, size=100000) (the upper bound 7 is excluded, so faces run 1–6) and compute (rolls >= 5).mean(). You should land near 0.3326, close to the theoretical value, exactly as the law of large numbers predicts.
Exercise 3: See small samples mislead you
Flip a fair coin only 12 times using rng = np.random.default_rng(2) and compute the proportion of heads. Then do the same with 5,000 flips. Which result do you trust more, and why?
Hint
Use rng.integers(0, 2, size=12).mean() for the short run and np.random.default_rng(2).integers(0, 2, size=5000).mean() for the long one. The short run can drift well away from 0.5, while the long run will sit close to it — the larger sample is the more reliable estimate.
Summary
You learned that a probability is a number between 0 and 1 measuring how likely an event is, and that there are two ways to find one. An empirical probability is a relative frequency you count from observed data, like P(Gentoo) ≈ 0.3605 estimated from 344 penguins. A theoretical probability is a calculation of favorable outcomes over total outcomes, valid when every outcome is equally likely, like P(heads) = 0.5 for a fair coin. The law of large numbers unites them: as trials pile up, the empirical fraction converges on the theoretical value, which is also why estimates from only a handful of trials cannot be trusted.
Key Concepts
- Probability: a number in [0, 1] measuring how likely an event is; 0 is impossible, 1 is certain.
- Empirical probability: an estimate from data, equal to the relative frequency (count of the event divided by total observations).
- Theoretical probability: favorable outcomes divided by total outcomes, valid only when outcomes are equally likely.
- Equally likely outcomes: outcomes with the same chance of occurring, the condition theoretical probability depends on.
- Law of large numbers: as the number of trials grows, the empirical probability converges to the theoretical probability.
Why This Matters
Almost every probability you will use in real work is empirical — churn rates, click-through rates, disease prevalence, model confidence — because the outcomes are rarely equally likely and the true value is unknown. Knowing that those estimates are relative frequencies, that they get more reliable with more data, and that a small sample can lie to you is the difference between a number you can build a decision on and one that just happens to sound precise.
Next Steps
Continue to Lesson 2 - Probability Rules
Combine events with the addition rule, multiplication rule, and complements to compute probabilities of compound events.
Continue Building Your Skills
You can now put a trustworthy number on a single event — by counting it in data or calculating it from equally likely outcomes — and you understand why more trials make that number more reliable. Next you will learn how to combine events: the rules that let you find the probability of “this or that” and “this and that,” and reason about what happens when one event depends on another.