Lesson 2 - Conditional Probability: Intermediate

Welcome to Conditional Probability, Intermediate

In the last lesson you learned what conditional probability means: how knowing that one event happened reshapes the probability of another. This lesson turns that idea into machinery you can actually compute with. You will chain conditional probabilities through multi-stage problems, use them to decide whether two events are truly independent, and finish with the law of total probability — the rule that stitches a handful of conditional probabilities back into one unconditional answer.

That last rule is more than a curiosity. It is exactly the denominator that Bayes’ theorem needs in the next lesson, so the work you do here is the foundation for everything that follows. We will keep using the Palmer penguins, and add a deck of cards when we need a clean second example.

By the end of this lesson, you will be able to:

  • Apply the general multiplication rule $P(A \cap B) = P(B)\,P(A \mid B)$ and chain it through a probability tree
  • Solve multi-stage problems like drawing two cards without replacement
  • Decide whether two events are independent using conditional probability
  • Use the law of total probability to recover an unconditional probability from a partition

You only need a little Python and pandas. Let’s begin.


The General Multiplication Rule

Last lesson defined conditional probability as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Multiply both sides by $P(B)$ and you get a rule for the probability that both events happen — the general multiplication rule:

$$P(A \cap B) = P(B)\,P(A \mid B)$$

In words: the chance of $A$ and $B$ is the chance of $B$, times the chance of $A$ once $B$ is known. It works in either order, so $P(A \cap B) = P(A)\,P(B \mid A)$ too — whichever conditional you happen to know.

Let’s see it on the penguins. Load the data and treat all 344 birds as our population.

import pandas as pd

penguins = pd.read_csv("https://datatweets.com/datasets/penguins.csv")
print(pd.crosstab(penguins["island"], penguins["species"], margins=True))
species    Adelie  Chinstrap  Gentoo  All
island
Biscoe         44          0     124  168
Dream          56         68       0  124
Torgersen      52          0       0   52
All           152         68     124  344

Pick a penguin at random. What is the probability it is on Biscoe and an Adelie? We can read it straight from the table — 44 of 344 birds are both — but let’s instead build it from a condition, the way the multiplication rule does:

total = len(penguins)
p_biscoe = (penguins["island"] == "Biscoe").mean()

biscoe = penguins[penguins["island"] == "Biscoe"]
p_adelie_given_biscoe = (biscoe["species"] == "Adelie").mean()

p_both = p_biscoe * p_adelie_given_biscoe
print(round(p_biscoe, 4), round(p_adelie_given_biscoe, 4), round(p_both, 4))
0.4884 0.2619 0.1279

So $P(\text{Biscoe}) = 0.4884$, and given you are on Biscoe the chance of Adelie is $0.2619$. Multiply them and you get $0.1279$. In exact terms, $\frac{168}{344}\cdot\frac{44}{168} = \frac{44}{344}$, the same count we read from the table. The rule reconstructs the joint probability from one marginal and one conditional, and that is the move every tree diagram is built on.
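The multiplication rule works in either order, and you can check this without re-downloading anything. The sketch below uses only the counts from the crosstab above. It relies on Python's standard `fractions` module, so the arithmetic is exact. The `counts` dictionary is our own transcription of the table, not something the dataset provides.

```python
from fractions import Fraction

# Island-by-species counts transcribed from the crosstab above (344 penguins).
counts = {
    "Biscoe":    {"Adelie": 44, "Chinstrap": 0,  "Gentoo": 124},
    "Dream":     {"Adelie": 56, "Chinstrap": 68, "Gentoo": 0},
    "Torgersen": {"Adelie": 52, "Chinstrap": 0,  "Gentoo": 0},
}
total = sum(sum(row.values()) for row in counts.values())   # 344 birds

n_biscoe = sum(counts["Biscoe"].values())                   # 168 on Biscoe
n_adelie = sum(row["Adelie"] for row in counts.values())    # 152 Adelies
n_both = counts["Biscoe"]["Adelie"]                         # 44 Biscoe Adelies

# Order 1: P(Biscoe) * P(Adelie | Biscoe)
order_1 = Fraction(n_biscoe, total) * Fraction(n_both, n_biscoe)
# Order 2: P(Adelie) * P(Biscoe | Adelie)
order_2 = Fraction(n_adelie, total) * Fraction(n_both, n_adelie)

print(order_1, order_2, float(order_1))
```

Both orders print `11/86`, which is 44/344 in lowest terms. Conditioning on the island or on the species leads to the same joint probability.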


Chaining Probabilities with a Tree

A probability tree is the multiplication rule drawn out for a process that happens in stages. Each branch carries a probability, and the chance of any complete path is the product of the branches along it. Stage one here is picking an island; stage two is the species you find there.

A probability tree for picking a random penguin. The first set of branches splits on island with probabilities 0.488 for Biscoe, 0.360 for Dream, and 0.151 for Torgersen. From each island a branch leads to Adelie with the conditional probability of Adelie given that island: 0.262 from Biscoe, 0.452 from Dream, and 1.000 from Torgersen.
A two-stage tree: pick an island (blue branches, weighted by how common each island is), then read off the chance the bird is an Adelie given that island (grey branches). Multiplying along a path gives the probability of that island-and-Adelie combination.

Notice the branch values. The first split uses $P(\text{island})$ — how common each island is — and the second uses $P(\text{Adelie} \mid \text{island})$. The Torgersen branch is $1.000$ because every Torgersen penguin in this dataset happens to be an Adelie. Walking the Biscoe path, $0.488 \times 0.262 = 0.128$, reproduces the joint probability we just computed in code.
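The figure draws only the Adelie branches. Here is a sketch that expands the full tree, with every species branch under every island. The path probabilities it prints should sum to exactly 1. As before, `counts` is our own transcription of the crosstab, and branches with a count of zero are skipped because they carry no probability.

```python
from fractions import Fraction

# Island-by-species counts transcribed from the crosstab (344 penguins).
counts = {
    "Biscoe":    {"Adelie": 44, "Chinstrap": 0,  "Gentoo": 124},
    "Dream":     {"Adelie": 56, "Chinstrap": 68, "Gentoo": 0},
    "Torgersen": {"Adelie": 52, "Chinstrap": 0,  "Gentoo": 0},
}
total = sum(sum(row.values()) for row in counts.values())

paths = {}
for island, row in counts.items():
    n_island = sum(row.values())
    p_island = Fraction(n_island, total)            # stage 1 branch: P(island)
    for species, n in row.items():
        if n == 0:
            continue                                # zero-probability branch, not drawn
        p_species_given_island = Fraction(n, n_island)   # stage 2 branch
        paths[(island, species)] = p_island * p_species_given_island  # product along the path

for (island, species), p in paths.items():
    print(f"{island:<10} {species:<10} {float(p):.4f}")
print("sum over all paths:", sum(paths.values()))
```

Each path probability collapses to its cell count divided by 344. Because the paths cover every way a random penguin can turn out, they sum to 1.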

Two cards without replacement

Trees shine when the stages genuinely depend on each other. Draw two cards from a shuffled deck without replacement and ask for the probability that both are hearts. There are 13 hearts in 52 cards, so the first card is a heart with probability $13/52$. If that first card is a heart, only 51 cards remain and just 12 of them are hearts. The second card is then a heart with probability $12/51$ — a conditional probability, because it depends on the first draw.

p_first_heart = 13 / 52
p_second_heart_given_first = 12 / 51
p_both_hearts = p_first_heart * p_second_heart_given_first
print(round(p_first_heart, 4), round(p_second_heart_given_first, 4), round(p_both_hearts, 4))
0.25 0.2353 0.0588

Write $H_1$ and $H_2$ for the events that the first and second cards are hearts. The chance both cards are hearts is $P(H_1 \cap H_2) = P(H_1)\,P(H_2 \mid H_1) = 0.25 \times 0.2353 = 0.0588$. That is exactly 1 in 17, since $\frac{13 \cdot 12}{52 \cdot 51} = \frac{156}{2652} = \frac{1}{17}$. The "without replacement" detail is the whole point: removing the first heart changes the deck, so the second probability is conditional on the first. That dependence is exactly what a tree keeps track of.
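A Monte Carlo check can make the tree's answer feel less abstract. The sketch below shuffles a simulated deck many times and counts how often both cards are hearts. It assumes a deck modelled as `(rank, suit)` tuples and uses `random.sample`, which draws distinct items and so matches drawing without replacement. The seed and the number of trials are arbitrary choices.

```python
import random

# A 52-card deck as (rank, suit) tuples; ranks 1..13 per suit.
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]

rng = random.Random(0)      # fixed seed so reruns give the same estimate
trials = 100_000
hits = 0
for _ in range(trials):
    first, second = rng.sample(deck, 2)     # two distinct cards: no replacement
    if first[1] == "hearts" and second[1] == "hearts":
        hits += 1

estimate = hits / trials
exact = (13 / 52) * (12 / 51)               # the tree: P(H1) * P(H2 | H1)
print(f"simulated {estimate:.4f}  vs  exact {exact:.4f}")
```

With this many trials, the simulated frequency should land within a few thousandths of the tree's $1/17 \approx 0.0588$.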

Why the second branch shrinks

With replacement, you would put the first card back and the second draw would again be $13/52$ — the draws would be independent. Without replacement, the population changes between stages, so each branch must use the conditional probability given everything that came before. That is the difference the next section makes precise.
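To see the two sampling schemes side by side, you can enumerate every equally likely ordered pair of cards exactly. `itertools.product(deck, repeat=2)` lets the same card appear twice, which models replacement. `itertools.permutations(deck, 2)` forces the two cards to differ, which models no replacement. The helper `p_both_hearts` is a name we made up for this sketch.

```python
from fractions import Fraction
from itertools import permutations, product

suits = ("hearts", "diamonds", "clubs", "spades")
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]   # 52 cards

def p_both_hearts(pairs):
    """Fraction of equally likely ordered pairs in which both cards are hearts."""
    pairs = list(pairs)
    hits = sum(1 for first, second in pairs
               if first[1] == "hearts" and second[1] == "hearts")
    return Fraction(hits, len(pairs))

# With replacement: a card can come up twice, so 52 * 52 ordered pairs.
with_replacement = p_both_hearts(product(deck, repeat=2))
# Without replacement: the two cards must differ, so 52 * 51 ordered pairs.
without_replacement = p_both_hearts(permutations(deck, 2))

print("with replacement:   ", with_replacement)     # matches 13/52 * 13/52
print("without replacement:", without_replacement)  # matches 13/52 * 12/51
```

The two lines print `1/16` and `1/17`. The small gap between them is entirely due to the changed deck on the second draw.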


Independence Revisited

Two events are independent when knowing one happened tells you nothing about the other. We can now state that cleanly with conditional probability: $A$ and $B$ are independent when

$$P(A \mid B) = P(A)$$

The condition simply does not move the probability. When that holds, the multiplication rule collapses to its simplest form, because $P(A \cap B) = P(B)\,P(A \mid B) = P(B)\,P(A)$.

Are island and species independent for our penguins? Compare $P(\text{Adelie} \mid \text{Biscoe})$ with the plain $P(\text{Adelie})$:

p_adelie = (penguins["species"] == "Adelie").mean()
print("P(Adelie)            =", round(p_adelie, 4))
print("P(Adelie | Biscoe)   =", round(p_adelie_given_biscoe, 4))
P(Adelie)            = 0.4419
P(Adelie | Biscoe)   = 0.2619

These are far apart — $0.4419$ versus $0.2619$ — so island and species are not independent. Learning that a penguin lives on Biscoe genuinely lowers your belief that it is an Adelie (Biscoe is dominated by Gentoos). That dependence is not a nuisance; it is information, and Bayes’ theorem will be the tool for cashing it in.
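One mismatch is already enough to rule out independence. Using the collapsed multiplication rule, you can also see how thoroughly it fails. If island and species were independent, every cell of the crosstab would satisfy $P(\text{island} \cap \text{species}) = P(\text{island})\,P(\text{species})$. This sketch checks all nine cells with exact fractions. The `counts` dictionary is our own transcription of the table.

```python
from fractions import Fraction

# Island-by-species counts transcribed from the crosstab (344 penguins).
counts = {
    "Biscoe":    {"Adelie": 44, "Chinstrap": 0,  "Gentoo": 124},
    "Dream":     {"Adelie": 56, "Chinstrap": 68, "Gentoo": 0},
    "Torgersen": {"Adelie": 52, "Chinstrap": 0,  "Gentoo": 0},
}
total = sum(sum(row.values()) for row in counts.values())

# Column totals: how many birds of each species overall.
species_totals = {}
for row in counts.values():
    for species, n in row.items():
        species_totals[species] = species_totals.get(species, 0) + n

mismatches = []
for island, row in counts.items():
    p_island = Fraction(sum(row.values()), total)
    for species, n in row.items():
        joint = Fraction(n, total)                                     # P(island and species)
        if_independent = p_island * Fraction(species_totals[species], total)  # P(island) * P(species)
        if joint != if_independent:
            mismatches.append((island, species))
        print(f"{island:<10} {species:<10} joint={float(joint):.4f}  product={float(if_independent):.4f}")

print(len(mismatches), "of 9 cells have joint != product")
```

Every one of the nine cells fails the check. The zero cells fail most obviously: there are no Chinstraps on Biscoe at all, yet independence would predict some.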

For contrast, the card draws with replacement would be independent: $P(H_2 \mid H_1) = 13/52 = P(H_2)$, because putting the card back leaves the deck exactly as it was. Independence is the special case where conditioning changes nothing.
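You can apply the $P(H_2 \mid H_1) = P(H_2)$ test directly by enumerating all ordered pairs under each scheme. The sketch below computes both sides exactly. It assumes the same `(rank, suit)` deck model, and `second_card_check` is a hypothetical helper name. The unconditional $P(H_2)$ comes out the same either way. Only the conditional moves when the card is not put back, and that is exactly what breaks independence.

```python
from fractions import Fraction
from itertools import permutations, product

suits = ("hearts", "diamonds", "clubs", "spades")
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]   # 52 cards

def second_card_check(pairs):
    """Return (P(H2), P(H2 | H1)) over equally likely ordered pairs of cards."""
    pairs = list(pairs)
    second_heart = sum(1 for first, second in pairs if second[1] == "hearts")
    first_heart = [(first, second) for first, second in pairs if first[1] == "hearts"]
    both = sum(1 for first, second in first_heart if second[1] == "hearts")
    p_h2 = Fraction(second_heart, len(pairs))
    p_h2_given_h1 = Fraction(both, len(first_heart))   # restrict to pairs where H1 happened
    return p_h2, p_h2_given_h1

with_repl = second_card_check(product(deck, repeat=2))     # card goes back
without_repl = second_card_check(permutations(deck, 2))    # card stays out

print("with replacement:    P(H2) =", with_repl[0], "  P(H2|H1) =", with_repl[1])
print("without replacement: P(H2) =", without_repl[0], "  P(H2|H1) =", without_repl[1])
```

With replacement, both numbers are `1/4`, so the events are independent. Without replacement, $P(H_2)$ is still `1/4`, but $P(H_2 \mid H_1)$ drops to `4/17` (that is, $12/51$), so the events are dependent.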


The Law of Total Probability

Here is the rule that ties the lesson together. Suppose you split the population into non-overlapping, exhaustive pieces — a partition. Our three islands do exactly this: every penguin lives on exactly one of Biscoe, Dream, or Torgersen. The law of total probability says you can find the overall probability of an event by computing it within each piece and combining:

$$P(A) = \sum_i P(A \mid B_i)\,P(B_i)$$

Each term $P(A \mid B_i)\,P(B_i)$ is one path through the tree: by the multiplication rule it equals the joint probability $P(A \cap B_i)$. Summing over all the pieces covers every way $A$ can happen.
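The formula translates almost line for line into a small function. In the sketch below, `total_probability` is a hypothetical helper, not a library call. It takes the conditionals $P(A \mid B_i)$ and the weights $P(B_i)$ as dictionaries keyed by the same partition labels. It refuses weights that do not sum to 1, which is a cheap guard against a broken partition. The island sizes and Adelie counts are transcribed from the crosstab.

```python
from fractions import Fraction

def total_probability(conditionals, weights):
    """Return sum_i P(A | B_i) * P(B_i) over a partition B_1, ..., B_k.

    `conditionals` maps each piece B_i to P(A | B_i); `weights` maps it to P(B_i).
    """
    if set(conditionals) != set(weights):
        raise ValueError("conditionals and weights must cover the same pieces")
    if sum(weights.values()) != 1:
        raise ValueError("weights P(B_i) must sum to 1 for a partition")
    return sum(conditionals[b] * weights[b] for b in weights)

# Island sizes and Adelie counts transcribed from the crosstab (344 penguins).
island_sizes = {"Biscoe": 168, "Dream": 124, "Torgersen": 52}
adelie_counts = {"Biscoe": 44, "Dream": 56, "Torgersen": 52}
total = sum(island_sizes.values())

weights = {isl: Fraction(n, total) for isl, n in island_sizes.items()}
conditionals = {isl: Fraction(adelie_counts[isl], n) for isl, n in island_sizes.items()}

p_adelie = total_probability(conditionals, weights)
print(p_adelie, float(p_adelie))
```

The result is `19/43`, which is 152/344 in lowest terms and about 0.4419. This matches the unconditional share of Adelies.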

A sample-space rectangle split into three non-overlapping blocks B1, B2, B3, with event A drawn as a band cutting across all three; its slice in each block is shaded as A-and-B1, A-and-B2, A-and-B3.
The partition splits the sample space into non-overlapping blocks, and P(A) is the sum of A's shaded slice in each one, weighted by how common that block is.

Let’s recover $P(\text{Adelie})$ by conditioning on island:

counts = penguins["island"].value_counts()
p_island = counts / total

p_adelie_given = {}
for isl in ["Biscoe", "Dream", "Torgersen"]:
    sub = penguins[penguins["island"] == isl]
    p_adelie_given[isl] = (sub["species"] == "Adelie").mean()

total_prob = sum(p_adelie_given[isl] * p_island[isl] for isl in p_island.index)
print({k: round(float(v), 4) for k, v in p_adelie_given.items()})  # float() keeps NumPy 2 from printing np.float64(...)
print("Law of total probability P(Adelie) =", round(total_prob, 4))
print("Unconditional        P(Adelie) =", round(p_adelie, 4))
{'Biscoe': 0.2619, 'Dream': 0.4516, 'Torgersen': 1.0}
Law of total probability P(Adelie) = 0.4419
Unconditional        P(Adelie) = 0.4419

Written out, that sum is

$$P(\text{Adelie}) = \frac{44}{168}\cdot\frac{168}{344} + \frac{56}{124}\cdot\frac{124}{344} + \frac{52}{52}\cdot\frac{52}{344} = \frac{152}{344} \approx 0.4419$$

The pieces — $0.2619$, $0.4516$, and $1.0000$ — are wildly different, yet weighted by how common each island is, they average back to the single unconditional value $0.4419$. That is the intuition worth keeping: total probability is a weighted average of conditional probabilities, weighted by the probability of each condition. Islands where Adelies are rare pull down, the all-Adelie island pulls up, and the weights settle the balance.
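The weights matter. To see how much, compare the weighted average with a naive average that treats the three islands as equally important. This sketch uses counts transcribed from the crosstab and exact fractions.

```python
from fractions import Fraction

# Island sizes and Adelie counts transcribed from the crosstab (344 penguins).
island_sizes = {"Biscoe": 168, "Dream": 124, "Torgersen": 52}
adelie_counts = {"Biscoe": 44, "Dream": 56, "Torgersen": 52}
total = sum(island_sizes.values())

# P(Adelie | island) for each island.
conditionals = {isl: Fraction(adelie_counts[isl], n) for isl, n in island_sizes.items()}

# Law of total probability: weight each conditional by P(island).
weighted = sum(conditionals[isl] * Fraction(n, total) for isl, n in island_sizes.items())
# Naive average: every island counts equally, regardless of size.
unweighted = sum(conditionals.values()) / len(conditionals)

print(f"weighted average:   {float(weighted):.4f}")
print(f"unweighted average: {float(unweighted):.4f}")
```

The weighted version reproduces 0.4419. The naive version lands near 0.57 because it lets tiny Torgersen, where 100% of the birds are Adelies, count as much as Biscoe, which is more than three times its size.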

Why build up the answer you already had? Because conditioning is often the only way in. When you cannot see $P(A)$ directly but you can reason within each subgroup, the law of total probability assembles the whole from the parts. And the denominator of Bayes’ theorem — coming next lesson — is precisely a law-of-total-probability sum. Master this, and Bayes is mostly bookkeeping.

What makes a valid partition

The pieces $B_i$ must be mutually exclusive (no overlap) and exhaustive (they cover everything), and each must have nonzero probability. Island works because every penguin sits in exactly one island. If your groups overlap or leave gaps, the weights will not sum to 1 and the law breaks.
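A quick diagnostic for a proposed partition is to add up its weights $P(B_i)$. The sketch below does this for the three islands and for two deliberately bad groupings. The bad groupings are hypothetical: one counts Dream twice, and one leaves Torgersen out. The helper `weights_sum` is also our own illustrative name.

```python
from fractions import Fraction

# Island sizes transcribed from the crosstab; each penguin is on exactly one island.
island_sizes = {"Biscoe": 168, "Dream": 124, "Torgersen": 52}
total = sum(island_sizes.values())   # 344

def weights_sum(groups):
    """Sum of P(B_i) when each group B_i is a set of islands."""
    return sum(Fraction(sum(island_sizes[i] for i in g), total) for g in groups)

good = [{"Biscoe"}, {"Dream"}, {"Torgersen"}]                # exclusive and exhaustive
overlapping = [{"Biscoe", "Dream"}, {"Dream", "Torgersen"}]   # Dream counted twice
gappy = [{"Biscoe"}, {"Dream"}]                               # Torgersen left out

for name, groups in [("islands", good), ("overlapping", overlapping), ("gappy", gappy)]:
    print(f"{name:<12} weights sum to {weights_sum(groups)}")
```

Only the genuine partition sums to `1`. The overlapping groups sum to more than 1 and the gappy ones to less. In either case, the weighted average would count some birds twice or miss them entirely.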


Practice Exercises

Exercise 1: Chain a joint probability

Using the multiplication rule (not the raw count), compute $P(\text{Dream} \cap \text{Chinstrap})$ — the probability a random penguin is on Dream and a Chinstrap. Then check your answer against the count in the crosstab.

Hint

Compute p_dream = (penguins["island"] == "Dream").mean(), then the conditional p_chinstrap_given_dream on the penguins[penguins["island"] == "Dream"] subset, and multiply. It should equal $68/344$.
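If you want an exact reference value to compare your pandas result against, here is a sketch that uses only the Dream row of the crosstab. The `dream_counts` dictionary is our own transcription of that row.

```python
from fractions import Fraction

dream_counts = {"Adelie": 56, "Chinstrap": 68, "Gentoo": 0}   # Dream row of the crosstab
total = 344                                                   # all penguins

n_dream = sum(dream_counts.values())                          # 124 birds on Dream
p_dream = Fraction(n_dream, total)                            # P(Dream)
p_chinstrap_given_dream = Fraction(dream_counts["Chinstrap"], n_dream)  # P(Chinstrap | Dream)
p_joint = p_dream * p_chinstrap_given_dream                   # multiplication rule
print(p_joint, p_joint == Fraction(68, total))
```

It prints `17/86 True`. Here 17/86 is 68/344 in lowest terms, or about 0.1977.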

Exercise 2: Draw two aces

Without replacement, what is the probability that the first two cards off a shuffled deck are both aces? Build it as a two-branch tree, then say in one sentence why the second branch is not $4/52$.

Hint

The first ace has probability $4/52$. After removing it, 3 aces remain in 51 cards, so the second branch is $3/51$. Multiply: $(4/52)\times(3/51) \approx 0.0045$.
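To confirm the tree by brute force, you can enumerate every ordered pair of distinct cards. This sketch models the deck as `(rank, suit)` tuples and treats rank 1 as the ace, which is a convention of ours.

```python
from fractions import Fraction
from itertools import permutations

suits = ("hearts", "diamonds", "clubs", "spades")
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]   # rank 1 = ace

tree = Fraction(4, 52) * Fraction(3, 51)              # P(A1) * P(A2 | A1)

pairs = list(permutations(deck, 2))                   # all 52 * 51 ordered draws
two_aces = sum(1 for first, second in pairs if first[0] == 1 and second[0] == 1)
brute = Fraction(two_aces, len(pairs))

print(tree, brute, float(tree))
```

Both approaches give `1/221`. The enumeration finds 12 ace-ace pairs out of 2,652.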

Exercise 3: Total probability for Gentoo

Use the law of total probability, conditioning on island, to recover $P(\text{Gentoo})$. Print each $P(\text{Gentoo} \mid \text{island})$ and confirm the weighted sum matches the unconditional $P(\text{Gentoo})$. Which island dominates the answer, and why?

Hint

Reuse the loop from the law of total probability section, but test (sub["species"] == "Gentoo"). Only Biscoe has Gentoos, so its term $(124/168)\times(168/344)$ carries the whole sum; the other two contribute zero. The total should come to $0.3605$.
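For an exact check that does not depend on the download, here is a sketch built from the island sizes and Gentoo counts in the crosstab. Both dictionaries are our own transcription.

```python
from fractions import Fraction

# Island sizes and Gentoo counts transcribed from the crosstab (344 penguins).
island_sizes = {"Biscoe": 168, "Dream": 124, "Torgersen": 52}
gentoo_counts = {"Biscoe": 124, "Dream": 0, "Torgersen": 0}
total = sum(island_sizes.values())

terms = {}
for isl, n in island_sizes.items():
    p_gentoo_given_isl = Fraction(gentoo_counts[isl], n)      # P(Gentoo | island)
    terms[isl] = p_gentoo_given_isl * Fraction(n, total)      # times P(island)
    print(f"{isl:<10} P(Gentoo|island) = {float(p_gentoo_given_isl):.4f}   term = {float(terms[isl]):.4f}")

p_gentoo = sum(terms.values())
print("P(Gentoo) =", round(float(p_gentoo), 4))
```

The Dream and Torgersen terms are exactly zero. The Biscoe term alone equals 124/344, about 0.3605.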


Summary

You turned conditional probability into a working toolkit. The general multiplication rule $P(A \cap B) = P(B)\,P(A \mid B)$ builds a joint probability from a marginal and a conditional, and chaining it along a probability tree solves multi-stage problems like drawing cards without replacement. You sharpened independence into a single test — $P(A \mid B) = P(A)$ — and saw that island and species fail it, which means the condition carries real information. Finally, the law of total probability reassembled the unconditional $P(\text{Adelie}) = 0.4419$ from its three island pieces, revealing total probability as a weighted average of conditional probabilities — and quietly setting up the denominator that Bayes’ theorem will demand next.

Key Concepts

  • General multiplication rule: $P(A \cap B) = P(B)\,P(A \mid B)$; the joint probability of two events from a marginal and a conditional.
  • Probability tree — a stage-by-stage diagram where each path’s probability is the product of its branch probabilities.
  • Without replacement — sampling that changes the population between draws, making later draws conditional on earlier ones.
  • Independence — events $A$ and $B$ with $P(A \mid B) = P(A)$; conditioning changes nothing.
  • Partition — mutually exclusive, exhaustive subgroups that cover the whole population.
  • Law of total probability: $P(A) = \sum_i P(A \mid B_i)\,P(B_i)$; a weighted average of conditional probabilities across a partition.

Why This Matters

Almost every real probability you will estimate is hidden inside subgroups — spam rates differ by sender, churn differs by plan, defect rates differ by supplier. The multiplication rule and the law of total probability are how you reason across those subgroups instead of pretending they do not exist, and the same weighted-average logic is the engine inside Bayes’ theorem, Naive Bayes classifiers, and the risk calculations behind medical testing and fraud detection.


Next Steps

Continue to Lesson 3 - Bayes' Theorem

Flip a conditional probability around and learn the rule for updating a belief when new evidence arrives.

Back to Module Overview

Return to the Conditional Probability & Bayes module overview


Continue Building Your Skills

You can now move smoothly between conditional and unconditional probabilities — splitting a problem into subgroups and weaving the pieces back together. Next you will run that machinery in reverse: given that an event happened, what is the probability of the cause behind it? That question is Bayes’ theorem, and you already built its hardest part today.