Lesson 3 - Solving Complex Probability Problems

Welcome to Solving Complex Probability Problems

In the last lesson you combined events with the addition rule — the tool for “or” questions. But most interesting probability questions are not about a single event; they are about a sequence of them. What is the chance a system stays up if three components must each survive? What is the chance you roll at least one six in four tries? These are “and” questions and “at least one” questions, and they need a different set of tools.

This lesson gives you those tools. You will learn the multiplication rule for combining events, the all-important distinction between independent and dependent events, a first look at conditional probability, and a clever shortcut, the complement, that turns hard “at least one” problems into easy ones. You will check the dice results against simulations and the penguin results against real data, so you can see theory and experiment agree.

By the end of this lesson, you will be able to:

  • Apply the multiplication rule to find the probability that several events all happen
  • Tell independent events from dependent ones, and prove dependence with real data
  • Read conditional probability $P(B \mid A)$ and use it inside the multiplication rule
  • Solve “at least one” problems with the complement trick

You will need pandas, numpy, and the ideas from the previous two lessons. Let’s begin.


The Multiplication Rule

The addition rule answers “what is the probability of A or B?” The multiplication rule answers a different question: “what is the probability of A and B — that both happen?” Its general form is:

$$P(A \cap B) = P(A)\,P(B \mid A)$$

Read this as: the chance both happen equals the chance the first happens, times the chance the second happens given that the first already did. The symbol $P(B \mid A)$ is a conditional probability: the probability of B once you know A has occurred. We will come back to it shortly; for now, notice that the rule chains events together one step at a time.

A two-step die roll

Suppose you roll a fair die twice and want the probability of getting a six both times. The first roll is a six with probability $1/6$. The second roll is also a six with probability $1/6$, and crucially, the second roll does not care what the first one did. So:

$$P(\text{six} \cap \text{six}) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36} \approx 0.0278$$

Let’s confirm that by rolling a million pairs of dice:

import numpy as np

rng = np.random.default_rng(0)
trials = 1_000_000
pairs = rng.integers(1, 7, size=(trials, 2))   # two rolls per trial
both_six = (pairs == 6).all(axis=1).mean()
print(round(both_six, 4))
0.028

The simulation lands on 0.028, right on top of the theoretical $1/36 \approx 0.0278$. The multiplication rule works, but notice we did something quietly: we multiplied $1/6 \times 1/6$ and ignored the conditional part. That shortcut is only allowed for a special kind of event.


Independent Events

Two events are independent when knowing that one happened tells you nothing about the other. The second die roll is independent of the first: the die has no memory. When events are independent, the conditional probability collapses to the plain probability, $P(B \mid A) = P(B)$, and the multiplication rule simplifies to:

$$P(A \cap B) = P(A)\,P(B)$$

Dice, coin flips, and cards drawn from a deck that is fully reshuffled before each draw are the classic independent events. You can multiply their probabilities straight across, which is exactly what we did above. The same logic extends to longer chains: the probability of three sixes in a row is $(1/6)^3$, and four heads in a row is $(1/2)^4$. Independence is what lets you raise a probability to a power.
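As a quick check of that arithmetic, the sketch below evaluates both chains exactly with Python's standard `fractions` module. The variable names are illustrative only and are not part of the lesson's code.

```python
from fractions import Fraction

# Independent events: multiply the same single-trial probability repeatedly.
p_six = Fraction(1, 6)    # one six on a fair die
p_heads = Fraction(1, 2)  # one head on a fair coin

p_three_sixes = p_six ** 3    # (1/6)^3
p_four_heads = p_heads ** 4   # (1/2)^4

print(p_three_sixes, float(p_three_sixes))
print(p_four_heads, float(p_four_heads))
```

The exact values are 1/216 and 1/16. Working in fractions avoids rounding until the final conversion to a decimal.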

Figure: An urn of three orange and two blue balls, drawn twice. With replacement, the drawn ball is returned, so the second draw is still 3 out of 5 orange (independent). Without replacement, the ball is kept, so after an orange first draw the second draw drops from three-fifths to two-fourths orange, because the first draw alters what is left (dependent).
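The urn comparison can be sketched in code. The example below is a minimal simulation, assuming the same urn of three orange and two blue balls; the encoding (1 for orange, 0 for blue), the seed, and the trial count are choices made here for illustration. It estimates the probability that both draws are orange, with and without replacement, and compares each estimate to the multiplication rule.

```python
from fractions import Fraction

import numpy as np

rng = np.random.default_rng(1)
trials = 200_000
urn = np.array([1, 1, 1, 0, 0])  # 1 = orange, 0 = blue (illustrative encoding)

# With replacement: each draw sees the full urn, so the draws are independent.
with_repl = rng.choice(urn, size=(trials, 2), replace=True)

# Without replacement: shuffle each row independently and keep the first two balls.
without_repl = rng.permuted(np.tile(urn, (trials, 1)), axis=1)[:, :2]

sim_with = (with_repl == 1).all(axis=1).mean()
sim_without = (without_repl == 1).all(axis=1).mean()

# Multiplication rule: P(orange first) * P(orange second | orange first)
exact_with = Fraction(3, 5) * Fraction(3, 5)     # second draw unchanged
exact_without = Fraction(3, 5) * Fraction(2, 4)  # one orange ball is gone

print(float(exact_with), round(sim_with, 3))
print(float(exact_without), round(sim_without, 3))
```

The exact answers are 9/25 with replacement and 3/10 without. The only difference between the two lines is the conditional factor for the second draw.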

Independence is a strong claim

Independence is not the default — it is a property you have to justify. Physical setups like dice and coins are independent by design. Real-world events (rain today and rain tomorrow, two stocks in the same sector) usually are not. Multiplying probabilities of dependent events as if they were independent is one of the most common mistakes in applied probability.


Dependent Events

Two events are dependent when knowing one does change the probability of the other. To see dependence in real data, we will return to the penguins. Load the dataset and cross-tabulate species against island:

import pandas as pd

penguins = pd.read_csv("https://datatweets.com/datasets/penguins.csv")
print(pd.crosstab(penguins["species"], penguins["island"]))
island     Biscoe  Dream  Torgersen
species
Adelie         44     56         52
Chinstrap       0     68          0
Gentoo        124      0          0

Look at the Gentoo row: all 124 Gentoo penguins live on Biscoe, and none anywhere else. That single fact makes species and island dependent. Let’s prove it with numbers. First the individual probabilities:

# The mean of a boolean Series is the fraction of True rows, i.e. the empirical probability
p_gentoo = (penguins["species"] == "Gentoo").mean()
p_biscoe = (penguins["island"] == "Biscoe").mean()
print(round(p_gentoo, 4), round(p_biscoe, 4))
0.3605 0.4884

So $P(\text{Gentoo}) = 0.3605$ and $P(\text{Biscoe}) = 0.4884$. If species and island were independent, the multiplication rule would predict:

$$P(\text{Gentoo} \cap \text{Biscoe}) = P(\text{Gentoo})\,P(\text{Biscoe}) = 0.3605 \times 0.4884 = 0.176$$

Now compute what the data actually shows:

p_both = ((penguins["species"] == "Gentoo") &
          (penguins["island"] == "Biscoe")).mean()
print(round(p_both, 4))
0.3605

The true joint probability is 0.3605, not 0.176. Because every Gentoo is on Biscoe, “Gentoo and Biscoe” is just “Gentoo” — they describe the same penguins. The two numbers disagree, and that disagreement is the definition of dependence:

$$P(\text{Gentoo} \cap \text{Biscoe}) = 0.3605 \neq 0.176 = P(\text{Gentoo})\,P(\text{Biscoe})$$

When the joint probability does not equal the product of the individual probabilities, the events are dependent — and you cannot multiply straight across.
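The same comparison can be run for every species and island pair at once. The sketch below is one way to do it, assuming the counts from the cross-tabulation printed earlier are typed in by hand so the example runs without downloading the dataset. The names `joint`, `expected`, and `gap` are illustrative.

```python
import numpy as np
import pandas as pd

# Counts copied from the species-by-island cross-tabulation
counts = pd.DataFrame(
    {"Biscoe": [44, 0, 124], "Dream": [56, 68, 0], "Torgersen": [52, 0, 0]},
    index=["Adelie", "Chinstrap", "Gentoo"],
)
total = counts.to_numpy().sum()   # 344 penguins

joint = counts / total            # observed P(species and island)
p_species = joint.sum(axis=1)     # P(species), row totals
p_island = joint.sum(axis=0)      # P(island), column totals

# What the joint table would look like if species and island were independent
expected = pd.DataFrame(
    np.outer(p_species, p_island), index=counts.index, columns=counts.columns
)
gap = joint - expected            # all zeros would mean independence

print(gap.round(3))
```

The Gentoo and Biscoe cell reproduces the gap between 0.3605 and 0.176, and the other cells are far from zero as well, so the dependence is not limited to one pair.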


Conditional Probability — a Preview

To handle dependent events correctly, you need the conditional probability $P(B \mid A)$, “the probability of B given A.” Here it is for the penguins: given that a penguin is a Gentoo, what is the chance it lives on Biscoe? We already know the answer must be 1, because every Gentoo is there. The formula confirms it:

$$P(\text{Biscoe} \mid \text{Gentoo}) = \frac{P(\text{Biscoe} \cap \text{Gentoo})}{P(\text{Gentoo})}$$

p_biscoe_given_gentoo = p_both / p_gentoo
print(round(p_biscoe_given_gentoo, 4))
1.0

So $P(\text{Biscoe} \mid \text{Gentoo}) = 1.0$. Knowing a penguin is a Gentoo turns the island from a 49% guess into a certainty, the clearest possible sign of dependence. Compare this with the independent case: for two dice, $P(\text{six on roll 2} \mid \text{six on roll 1}) = 1/6$, unchanged, because the rolls do not influence each other.

Plug conditional probability back into the general multiplication rule and it always works, dependent or not:

$$P(A \cap B) = P(A)\,P(B \mid A)$$

For the penguins: $P(\text{Gentoo}) \times P(\text{Biscoe} \mid \text{Gentoo}) = 0.3605 \times 1.0 = 0.3605$, matching the data exactly. Conditional probability is a deep topic, and the next module is devoted to it. For now, just hold onto the notation $P(B \mid A)$ and the idea that it is the engine inside the multiplication rule.
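To see the general rule hold for every cell, not just Gentoo and Biscoe, the sketch below builds the full table of $P(\text{island} \mid \text{species})$ and multiplies it back by $P(\text{species})$. It assumes the same hand-typed counts from the cross-tabulation, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Counts copied from the species-by-island cross-tabulation
counts = pd.DataFrame(
    {"Biscoe": [44, 0, 124], "Dream": [56, 68, 0], "Torgersen": [52, 0, 0]},
    index=["Adelie", "Chinstrap", "Gentoo"],
)
total = counts.to_numpy().sum()

joint = counts / total                        # P(species and island)
p_species = counts.sum(axis=1) / total        # P(species)

# Divide each row by its own total: P(island | species); every row sums to 1
p_island_given_species = counts.div(counts.sum(axis=1), axis=0)

# General multiplication rule: P(A) * P(B | A) should rebuild the joint table
rebuilt_joint = p_island_given_species.mul(p_species, axis=0)

print(p_island_given_species.round(4))
print(np.allclose(rebuilt_joint, joint))
```

The Gentoo and Chinstrap rows each put all their probability on one island, while the Adelie row is spread across three. In every case, multiplying by $P(\text{species})$ recovers the observed joint probability.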


The Complement Trick for “At Least One”

Some of the most common probability questions ask for “at least one” — at least one defective part, at least one rainy day, at least one six. Computing these directly is painful, because “at least one” hides many cases: exactly one, exactly two, exactly three, and so on. The shortcut is to flip the question. The complement of “at least one” is “none,” and the two probabilities must add to 1:

$$P(\text{at least one}) = 1 - P(\text{none})$$

Figure: Four dice in a row, each marked with the “no six” outcome (probability $5/6$). The complement trick: instead of adding up every way to get at least one six, take the single easy “no six” outcome on each roll, multiply the four $5/6$ factors to get $P(\text{no six}) = (5/6)^4 \approx 0.48$, and subtract from 1.

“None” is easy: if the events are independent, it is a single product. Take the classic problem: roll a fair die four times, what is the probability of at least one six? The chance of no six on a single roll is $5/6$. Across four independent rolls:

$$P(\text{no six in 4 rolls}) = \left(\frac{5}{6}\right)^4 \approx 0.4823$$

$$P(\text{at least one six}) = 1 - \left(\frac{5}{6}\right)^4 \approx 0.5177$$

So it is slightly better than a coin flip — 51.77%. Let’s confirm with a simulation of a million four-roll trials:

rng = np.random.default_rng(0)
trials = 1_000_000
rolls = rng.integers(1, 7, size=(trials, 4))   # 4 rolls per trial
at_least_one = (rolls == 6).any(axis=1).mean()
print(round(at_least_one, 4))
0.518

The simulation gives 0.518, matching the theoretical 0.5177 to three decimals. The complement turned a four-case sum into one quick subtraction. The figure below shows the agreement across one through four rolls:

Figure: Grouped bar chart comparing the theoretical and simulated probability of rolling at least one six in one, two, three, and four die rolls. Theory and a million-trial simulation agree at every step: the chance climbs from 0.167 with a single roll to 0.518 with four rolls.
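The numbers behind that comparison can be reproduced with a short loop. This is a sketch rather than the code that produced the figure: it assumes 200,000 trials per roll count (fewer than the lesson's million, to keep it quick), and the dictionary names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 200_000  # assumption: smaller than the lesson's million, for speed

theory = {}
simulated = {}
for n in range(1, 5):
    # Complement trick: 1 - P(no six in n independent rolls)
    theory[n] = 1 - (5 / 6) ** n
    rolls = rng.integers(1, 7, size=(trials, n))   # n rolls per trial
    simulated[n] = (rolls == 6).any(axis=1).mean()
    print(n, round(theory[n], 4), round(simulated[n], 4))
```

Each extra roll multiplies the "no six" probability by another factor of 5/6, so the chance of at least one six rises with every roll but by a smaller amount each time.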

When in doubt, flip it

Any time a problem says “at least one,” your first instinct should be to compute $1 - P(\text{none})$. It is almost always faster than adding up the “exactly one,” “exactly two,” … cases, and it is far less error-prone.
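To make the comparison concrete, the sketch below does the four-roll problem both ways. It lists all $6^4 = 1296$ equally likely sequences, counts how many contain exactly one, two, three, or four sixes, and checks that their total matches the complement. The brute-force enumeration is an illustration chosen here, not a method the lesson prescribes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every possible sequence of four rolls of a fair die (all equally likely)
outcomes = list(product(range(1, 7), repeat=4))

# Tally outcomes by how many sixes they contain: keys 0, 1, 2, 3, 4
sixes_per_outcome = Counter(rolls.count(6) for rolls in outcomes)

# Long way: add up the "exactly one", "exactly two", ... cases
favourable = sum(count for k, count in sixes_per_outcome.items() if k >= 1)
p_direct = Fraction(favourable, len(outcomes))

# Short way: one subtraction
p_complement = 1 - Fraction(5, 6) ** 4

print(dict(sorted(sixes_per_outcome.items())))
print(p_direct, p_complement, float(p_complement))
```

Both routes give 671/1296, about 0.5177. The long way needs four separate counts (500, 150, 20, and 1), while the complement needs only the 625 sequences with no six.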


Tree Diagrams for Multi-Step Problems

When a problem unfolds in stages, a tree diagram keeps the bookkeeping straight. Each stage branches into its possible outcomes, you write the probability on each branch, and you multiply along a path to get the probability of that full sequence. Paths that lead to the same event are then added together.

Picture rolling a die twice and asking for at least one six. The first roll branches into “six” $(1/6)$ and “not six” $(5/6)$; each of those branches again for the second roll. The only path with no six is “not six” then “not six,” with probability $5/6 \times 5/6 = 25/36$. Every other path contains a six, so:

$$P(\text{at least one six in 2 rolls}) = 1 - \frac{25}{36} = \frac{11}{36} \approx 0.3056$$

This is the same complement logic, now drawn out as a tree. The two habits — multiply along a path, add across paths — are all you need to break almost any multi-step probability problem into pieces you can compute.
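The two habits translate directly into code. The sketch below writes the two-roll tree as a dictionary of paths, a representation chosen here purely for illustration. It multiplies along each path and then adds the paths that contain a six.

```python
from fractions import Fraction
from itertools import product

# One stage of the tree: the two branches and their probabilities
branch = {"six": Fraction(1, 6), "not six": Fraction(5, 6)}

# Multiply along each path (the two rolls are independent)
paths = {}
for first, second in product(branch, repeat=2):
    paths[(first, second)] = branch[first] * branch[second]

# Add across the paths that contain at least one "six" branch
p_at_least_one = sum(p for path, p in paths.items() if "six" in path)

for path, p in paths.items():
    print(path, p)
print(p_at_least_one)
```

The four path probabilities are 1/36, 5/36, 5/36, and 25/36, which sum to 1. The three paths containing a six add up to 11/36, the same value the complement gives.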


Practice Exercises

Exercise 1: Three independent flips

A fair coin is flipped three times. Use the multiplication rule to find the probability of getting heads all three times, then confirm it by simulating one million sets of three flips. Do the theory and the simulation agree?

Hint

The flips are independent, so multiply: $(1/2)^3$. Simulate with flips = rng.integers(0, 2, size=(1_000_000, 3)) and compute (flips == 1).all(axis=1).mean().

Exercise 2: Are species and sex dependent?

Using the penguins dataset, check whether species and sex are independent. Compute $P(\text{Gentoo})$, $P(\text{FEMALE})$, and $P(\text{Gentoo} \cap \text{FEMALE})$, then compare the joint probability to the product of the two individual probabilities. Are they close?

Hint

Drop missing values first with clean = penguins.dropna(subset=["sex"]). If $P(\text{Gentoo} \cap \text{FEMALE}) \approx P(\text{Gentoo})\,P(\text{FEMALE})$, the two are (approximately) independent; a clear gap signals dependence.

Exercise 3: At least one defective

A machine produces parts that are defective 2% of the time, independently. What is the probability that a box of 20 parts contains at least one defective part? Solve it with the complement, then verify by simulation.

Hint

The answer is $1 - 0.98^{20}$. To simulate, draw rng.random(size=(1_000_000, 20)) < 0.02 and check .any(axis=1).mean().


Summary

You learned to combine events into multi-step problems. The multiplication rule, $P(A \cap B) = P(A)\,P(B \mid A)$, finds the chance several events all happen. When events are independent, like dice or coin flips, it simplifies to $P(A \cap B) = P(A)\,P(B)$, so you can multiply straight across. When they are dependent, like penguin species and island, where every Gentoo lives on Biscoe, that shortcut fails, and you need the conditional probability $P(B \mid A)$ to get the right answer. Finally, the complement trick turns hard “at least one” problems into a single subtraction, $1 - P(\text{none})$, as in the 0.5177 chance of at least one six in four rolls. The dice results were confirmed by simulation, and the penguin results were computed directly from the data.

Key Concepts

  • Multiplication rule: $P(A \cap B) = P(A)\,P(B \mid A)$; the probability that several events all occur.
  • Independent events: events where one tells you nothing about the other; then $P(A \cap B) = P(A)\,P(B)$.
  • Dependent events: events where knowing one changes the other’s probability; the joint probability differs from the product of the individual ones.
  • Conditional probability: $P(B \mid A)$, the probability of B given that A has occurred.
  • Complement trick: $P(\text{at least one}) = 1 - P(\text{none})$.
  • Tree diagram: a branching picture; multiply along a path, add across paths.

Why This Matters

Almost every real probability question is a chain of events, not a single one — a model pipeline succeeding only if each stage does, a system surviving only if no component fails. Knowing when you may multiply probabilities (independence) and when you may not (dependence), and reaching for the complement on “at least one” questions, is what lets you compute the odds correctly instead of guessing. These same moves underpin reliability engineering, A/B testing, and the likelihoods inside every probabilistic model.


Next Steps

Continue to Lesson 4 - Permutations and Combinations

Learn to count outcomes so you can solve probability problems that are too large to list by hand.


Continue Building Your Skills

You can now chain events together, separate independence from dependence, and flip “at least one” into an easy complement. There is one tool left that makes large probability problems tractable: counting. Next you will learn permutations and combinations — the art of counting outcomes you could never list by hand — and finally compute the genuinely tiny odds of winning a lottery.