Lesson 1 - Conditional Probability Fundamentals
Welcome to Conditional Probability
Most useful probabilities come with a condition attached. You rarely ask “what is the chance of rain?” in the abstract — you ask “what is the chance of rain given that the sky is grey?” The extra information changes the answer. Conditional probability is the tool for updating what you believe once you learn that something is true, and it is the single idea that everything in Bayesian reasoning is built on.
In this lesson you will make that idea concrete with a real dataset of penguins. You will condition on which island a penguin lives on, watch a probability move as you add that knowledge, and compute every number yourself in Python so the formula stops being abstract.
By the end of this lesson, you will be able to:
- Explain what $P(A \mid B)$ means and read it as “the probability of A given B”
- Use the formula $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ and compute it directly from data
- Show how conditioning on new information changes a probability
- Recognize that $P(A \mid B)$ is generally not $P(B \mid A)$, and see how conditioning relates to independence
You only need a little Python and pandas. Let’s begin.
What Conditional Probability Means
A plain probability $P(A)$ measures how likely an event $A$ is across the whole population. A conditional probability, written $P(A \mid B)$, measures how likely $A$ is once you already know that another event $B$ happened. The vertical bar reads as the word “given”: $P(A \mid B)$ is “the probability of $A$ given $B$.”
The key mental move is this: conditioning on $B$ means you throw away every outcome where $B$ is false and ask your question only inside the survivors. You shrink the world down to the rows where $B$ holds, then look at how often $A$ happens there.
Load the dataset for this module — body measurements of penguins from three species near Palmer Station, Antarctica — and look at how species and island relate:
import pandas as pd
penguins = pd.read_csv("https://datatweets.com/datasets/penguins.csv")
print(pd.crosstab(penguins["island"], penguins["species"]))
species    Adelie  Chinstrap  Gentoo
island
Biscoe         44          0     124
Dream          56         68       0
Torgersen      52          0       0
This table is the entire world we will reason about. Every cell is a real count of penguins. Notice already that the islands are not interchangeable: Gentoos appear only on Biscoe, Chinstraps only on Dream, and Torgersen is pure Adelie. That structure is exactly what conditioning will let us exploit.
Computing P(A | B) From the Data
The unconditional probability of drawing a Gentoo from all 344 penguins is just its share of the whole dataset:
print(round((penguins["species"] == "Gentoo").mean(), 4))
0.3605
So with no extra information, $P(\text{Gentoo}) = 0.3605$, a little over one in three. Now suppose someone tells you the penguin lives on Biscoe. Conditioning means we keep only the Biscoe rows and ask what fraction of them are Gentoo:
biscoe = penguins[penguins["island"] == "Biscoe"]
print(round((biscoe["species"] == "Gentoo").mean(), 4))
0.7381
The probability jumped from 0.3605 to 0.7381. That is $P(\text{Gentoo} \mid \text{Biscoe})$: among Biscoe penguins, almost three in four are Gentoo. Learning the island more than doubled the chance. The boolean mask penguins["island"] == "Biscoe" did the conditioning by discarding the other two islands, and .mean() of a True/False column gives the proportion that are True.
The formula behind the mask
What .mean() computed has a name. The definition of conditional probability, valid whenever $P(B) > 0$, is:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Here $P(A \cap B)$ is the probability that both $A$ and $B$ happen (the “intersection”), and $P(B)$ is the probability of the condition on its own. Dividing by $P(B)$ is the mathematical version of shrinking the world down to $B$. Let’s confirm it gives the same 0.7381:
p_both = ((penguins["species"] == "Gentoo") & (penguins["island"] == "Biscoe")).mean()
p_biscoe = (penguins["island"] == "Biscoe").mean()
print(round(p_both, 4), round(p_biscoe, 4), round(p_both / p_biscoe, 4))
0.3605 0.4884 0.7381
The intersection probability $P(\text{Gentoo} \cap \text{Biscoe})$ is 0.3605, the probability of Biscoe is 0.4884, and their ratio is 0.7381, identical to the masked answer. (The intersection equals $P(\text{Gentoo})$ here only because every Gentoo in the dataset lives on Biscoe.) Restricting to a subgroup and applying the formula are two views of the same operation.
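The two views can be wrapped in one small helper. The sketch below is only an illustration: conditional_probability is a hypothetical name, not part of pandas, and the DataFrame is rebuilt from the crosstab counts shown earlier (on the assumption that the CSV matches them) so the block runs without a download.

```python
import pandas as pd

# Rebuild the island/species columns from the crosstab counts above
# (assumption: the CSV contains exactly these counts).
counts = {
    ("Biscoe", "Adelie"): 44,
    ("Biscoe", "Gentoo"): 124,
    ("Dream", "Adelie"): 56,
    ("Dream", "Chinstrap"): 68,
    ("Torgersen", "Adelie"): 52,
}
penguins = pd.DataFrame(
    [
        {"island": island, "species": species}
        for (island, species), n in counts.items()
        for _ in range(n)
    ]
)


def conditional_probability(event: pd.Series, given: pd.Series) -> float:
    """Return P(event | given) from two boolean Series over the same rows."""
    p_given = given.mean()
    if p_given == 0:
        # The definition needs P(given) > 0; otherwise there is no world left.
        raise ValueError("P(given) is zero, so P(event | given) is undefined")
    return float((event & given).mean() / p_given)


is_gentoo = penguins["species"] == "Gentoo"
on_biscoe = penguins["island"] == "Biscoe"

via_formula = conditional_probability(is_gentoo, on_biscoe)  # P(A and B) / P(B)
via_mask = float(is_gentoo[on_biscoe].mean())                # share inside the B rows
print(round(via_formula, 4), round(via_mask, 4))
```

Both numbers should agree, since the helper divides the joint share by the share of the condition while the mask simply counts inside the surviving rows. The explicit zero check is a design choice of this sketch: conditioning on an event that never occurs has no defined answer.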
Conditioning can pin a probability to certainty
Conditioning does not always nudge a probability gently. Sometimes it removes all doubt. Look at Torgersen:
torgersen = penguins[penguins["island"] == "Torgersen"]
print(round((torgersen["species"] == "Adelie").mean(), 4))
1.0
$P(\text{Adelie} \mid \text{Torgersen}) = 1.0$. Every penguin on Torgersen is an Adelie, so once you know the island is Torgersen, the species is certain. Compare that with Dream, a mixed island:
dream = penguins[penguins["island"] == "Dream"]
print(round((dream["species"] == "Adelie").mean(), 4))
0.4516
$P(\text{Adelie} \mid \text{Dream}) = 0.4516$. Dream is split between Adelie and Chinstrap, so the same question gives a very different answer depending on which condition you impose. The figure below shows the full picture: how the species mix changes as you move from island to island.
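The per-island species mix can also be tabulated in one call. The following is a minimal sketch, assuming the crosstab counts shown earlier; it rebuilds the data from those counts rather than downloading the CSV, and uses the normalize="index" option of pd.crosstab so that each row becomes a conditional distribution of species given island.

```python
import pandas as pd

# Rebuild the island/species columns from the crosstab counts above
# (assumption: the CSV contains exactly these counts).
counts = {
    ("Biscoe", "Adelie"): 44,
    ("Biscoe", "Gentoo"): 124,
    ("Dream", "Adelie"): 56,
    ("Dream", "Chinstrap"): 68,
    ("Torgersen", "Adelie"): 52,
}
penguins = pd.DataFrame(
    [
        {"island": island, "species": species}
        for (island, species), n in counts.items()
        for _ in range(n)
    ]
)

# normalize="index" divides every row by its own total, so row i holds
# P(species | island = i) for each species.
species_given_island = pd.crosstab(
    penguins["island"], penguins["species"], normalize="index"
)
print(species_given_island.round(4))
```

Each row of the result sums to 1, because a row describes a complete distribution inside one island. The Torgersen row puts all of its mass on Adelie, while the Dream row splits between Adelie and Chinstrap.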
Why we divide by P(B)
Dividing by $P(B)$ rescales the probabilities inside $B$ so they add back up to 1. The raw share of Biscoe-and-Gentoo penguins in the whole dataset is only 0.3605, but inside the Biscoe world it has to compete only with other Biscoe penguins, so it grows to 0.7381. The denominator is what turns a slice of the whole into a probability within the subgroup.
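One way to see the rescaling is to add up the probabilities before and after the division. This is a small sketch under the same assumption as the crosstab above: the data is rebuilt from those counts, and the variable names are illustrative only.

```python
import pandas as pd

# Rebuild the island/species columns from the crosstab counts above
# (assumption: the CSV contains exactly these counts).
counts = {
    ("Biscoe", "Adelie"): 44,
    ("Biscoe", "Gentoo"): 124,
    ("Dream", "Adelie"): 56,
    ("Dream", "Chinstrap"): 68,
    ("Torgersen", "Adelie"): 52,
}
penguins = pd.DataFrame(
    [
        {"island": island, "species": species}
        for (island, species), n in counts.items()
        for _ in range(n)
    ]
)

on_biscoe = penguins["island"] == "Biscoe"
p_biscoe = float(on_biscoe.mean())

joint = {}        # P(species and Biscoe), measured against the whole dataset
conditional = {}  # P(species | Biscoe), measured inside Biscoe only
for species in sorted(penguins["species"].unique()):
    is_species = penguins["species"] == species
    joint[species] = float((is_species & on_biscoe).mean())
    conditional[species] = joint[species] / p_biscoe

joint_total = sum(joint.values())              # adds up to P(Biscoe), not 1
conditional_total = sum(conditional.values())  # adds up to 1 after rescaling
print(round(joint_total, 4), round(conditional_total, 4))
```

The joint shares total roughly 0.4884, which is just $P(\text{Biscoe})$, while the conditional shares total 1 up to floating-point rounding. Dividing by $P(B)$ is what closes that gap.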
P(A | B) Is Not P(B | A)
Here is the trap that conditional probability sets, and the reason it deserves a whole lesson before we reach Bayes’ theorem: the order of the condition matters. $P(A \mid B)$ and $P(B \mid A)$ are different questions, and they usually have different answers.
We already found $P(\text{Gentoo} \mid \text{Biscoe}) = 0.7381$. Now flip it: among Gentoos, what fraction live on Biscoe?
gentoo = penguins[penguins["species"] == "Gentoo"]
print(round((gentoo["island"] == "Biscoe").mean(), 4))
1.0
$P(\text{Biscoe} \mid \text{Gentoo}) = 1.0$. Every Gentoo lives on Biscoe, so given a Gentoo you can be certain of the island. But the reverse, $P(\text{Gentoo} \mid \text{Biscoe}) = 0.7381$, is far from certain, because Biscoe also houses plenty of Adelies. Two conditional probabilities that look like mirror images, 1.0 versus 0.74, describe completely different facts.
Confusing the two directions is one of the most common reasoning errors in statistics. “Every Gentoo lives on Biscoe” does not mean “every Biscoe penguin is a Gentoo.” Keeping these straight is the whole motivation for Bayes’ theorem, which is the formal recipe for converting one direction into the other. You will meet it in the next lessons.
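The asymmetry is easiest to see in raw counts: both directions share the same numerator and differ only in the denominator. The sketch below uses plain Python and the crosstab counts shown earlier; the variable names are illustrative.

```python
# Counts copied from the crosstab above, keyed by (island, species).
counts = {
    ("Biscoe", "Adelie"): 44,
    ("Biscoe", "Gentoo"): 124,
    ("Dream", "Adelie"): 56,
    ("Dream", "Chinstrap"): 68,
    ("Torgersen", "Adelie"): 52,
}

# Shared numerator: penguins that are both Gentoo and on Biscoe.
n_gentoo_and_biscoe = counts[("Biscoe", "Gentoo")]

# Different denominators: everyone on Biscoe vs. every Gentoo.
n_biscoe = sum(n for (island, _), n in counts.items() if island == "Biscoe")
n_gentoo = sum(n for (_, species), n in counts.items() if species == "Gentoo")

p_gentoo_given_biscoe = n_gentoo_and_biscoe / n_biscoe  # 124 / 168
p_biscoe_given_gentoo = n_gentoo_and_biscoe / n_gentoo  # 124 / 124
print(round(p_gentoo_given_biscoe, 4), round(p_biscoe_given_gentoo, 4))
# prints: 0.7381 1.0
```

The 44 Adelies on Biscoe enlarge the first denominator but not the second, which is why the two answers differ.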
Conditioning and Independence
Sometimes conditioning on $B$ changes nothing at all. When that happens, $A$ and $B$ are independent: knowing $B$ tells you nothing new about $A$. The precise definition is that $A$ and $B$ are independent when
$$P(A \mid B) = P(A)$$
In words, the conditional probability equals the plain probability — the condition is irrelevant. Our penguin data is a study in the opposite: island and species are strongly dependent, which is exactly why conditioning moved the numbers so much.
p_gentoo = (penguins["species"] == "Gentoo").mean()
p_gentoo_given_biscoe = (penguins[penguins["island"] == "Biscoe"]["species"] == "Gentoo").mean()
p_gentoo_given_dream = (penguins[penguins["island"] == "Dream"]["species"] == "Gentoo").mean()
print(round(p_gentoo, 4), round(p_gentoo_given_biscoe, 4), round(p_gentoo_given_dream, 4))
0.3605 0.7381 0.0
Unconditionally, $P(\text{Gentoo}) = 0.3605$, but conditioning on Biscoe pushes it up to 0.7381 and conditioning on Dream drops it to 0.0. Because $P(\text{Gentoo} \mid \text{Biscoe}) \neq P(\text{Gentoo})$, the two events are not independent: island carries real information about species. If they had been independent, all three numbers would have been the same, and conditioning would have been a waste of effort.
That contrast is worth holding onto: conditional probability is only powerful when events are dependent. The more a condition reshapes a probability, the more that condition is worth knowing.
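To see what the independent case would look like, the sketch below compares the penguin counts with a made-up table in which every island has the same species mix. The function name conditioning_changes_nothing and the made_up_counts table are hypothetical and exist only for illustration; the penguin counts are copied from the crosstab above.

```python
import math


def conditioning_changes_nothing(counts, island, species):
    """Return True when P(species | island) equals P(species) for these counts."""
    total = sum(counts.values())
    n_island = sum(n for (i, _), n in counts.items() if i == island)
    n_species = sum(n for (_, s), n in counts.items() if s == species)
    p_species = n_species / total
    p_species_given_island = counts.get((island, species), 0) / n_island
    # isclose guards against tiny floating-point differences.
    return math.isclose(p_species, p_species_given_island)


# Counts from the crosstab above, keyed by (island, species).
palmer_counts = {
    ("Biscoe", "Adelie"): 44,
    ("Biscoe", "Gentoo"): 124,
    ("Dream", "Adelie"): 56,
    ("Dream", "Chinstrap"): 68,
    ("Torgersen", "Adelie"): 52,
}

# Invented counts (not real data): both islands are one-third Adelie and
# two-thirds Gentoo, so the island says nothing about the species.
made_up_counts = {
    ("North", "Adelie"): 30,
    ("North", "Gentoo"): 60,
    ("South", "Adelie"): 10,
    ("South", "Gentoo"): 20,
}

print(conditioning_changes_nothing(palmer_counts, "Biscoe", "Gentoo"))
print(conditioning_changes_nothing(made_up_counts, "North", "Gentoo"))
# prints: False, then True
```

This checks a single pair of events at a time. Showing that island and species are independent as whole variables would mean repeating the check for every island and species combination.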
Practice Exercises
Exercise 1: Condition on a different island
Compute $P(\text{Chinstrap} \mid \text{Dream})$, the probability that a Dream penguin is a Chinstrap. Then compare it to the unconditional $P(\text{Chinstrap})$. Did conditioning on Dream raise or lower the probability?
Hint
Restrict first with dream = penguins[penguins["island"] == "Dream"], then take (dream["species"] == "Chinstrap").mean(). Get the unconditional version from the full dataset with (penguins["species"] == "Chinstrap").mean().
Exercise 2: Verify the formula by hand
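Confirm $P(\text{Adelie} \mid \text{Dream}) = 0.4516$ using the formula instead of a mask. Compute the intersection probability $P(\text{Adelie} \cap \text{Dream})$ and $P(\text{Dream})$ separately, then divide.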
Hint
The intersection is ((penguins["species"] == "Adelie") & (penguins["island"] == "Dream")).mean() and the condition is (penguins["island"] == "Dream").mean(). Dividing the first by the second should give 0.4516.
Exercise 3: Flip a conditional probability
You know $P(\text{Biscoe} \mid \text{Gentoo}) = 1.0$. Now compute the reverse, $P(\text{Gentoo} \mid \text{Biscoe})$, and explain in one sentence why the two are so different even though they involve the same two events.
Hint
For $P(\text{Gentoo} \mid \text{Biscoe})$, restrict to Biscoe and take the share of Gentoos. The asymmetry comes from the denominators: there are more Biscoe penguins than Gentoo penguins, because Biscoe also holds Adelies.
Summary
You met conditional probability, the probability of an event once you know that another event is true, written $P(A \mid B)$ and read “A given B.” Computing it means restricting your world to the rows where $B$ holds and asking how often $A$ happens there, which the formula $P(A \mid B) = P(A \cap B) / P(B)$ captures exactly. You saw conditioning move $P(\text{Gentoo})$ from 0.36 up to 0.74 given Biscoe and pin $P(\text{Adelie} \mid \text{Torgersen})$ to 1.0, you learned that $P(A \mid B)$ is generally not $P(B \mid A)$, and you connected conditioning to independence, which holds exactly when the condition changes nothing.
Key Concepts
- Conditional probability: $P(A \mid B)$, the probability of $A$ given that $B$ is true.
- The formula: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$; dividing by $P(B)$ restricts the world to $B$.
- Conditioning from data: keep only the rows where $B$ holds, then take the share where $A$ holds.
- Asymmetry: $P(A \mid B) \neq P(B \mid A)$ in general; the direction of the bar matters.
- Independence: $A$ and $B$ are independent exactly when $P(A \mid B) = P(A)$, so the condition adds no information.
Why This Matters
Every model that updates a belief from evidence (spam filters, medical tests, recommendation engines, fraud detection) runs on conditional probability. Knowing how to condition correctly, and never confusing $P(A \mid B)$ with $P(B \mid A)$, is what separates a trustworthy inference from a confident mistake. It is also the exact foundation Bayes’ theorem is built on, which is where this module heads next.
Next Steps
Continue to Lesson 2 - Conditional Probability Intermediate
Go deeper with the multiplication rule, chained conditions, and building the bridge to Bayes' theorem.
Continue Building Your Skills
You now know how to update a probability the moment you learn a new fact — the quiet engine behind every system that reasons under uncertainty. Next you will chain conditions together and watch the path toward Bayes’ theorem take shape, turning “given B” from a single step into a full method for learning from evidence.