Lesson 6 - Guided Project: Telling the Species Apart
Welcome to the Guided Project
You’re handed penguin body measurements with the species labels hidden. Using only the descriptive-statistics tools from this module — distributions, summaries, and comparisons — how well can you tell the three species apart?
That is the whole project. No machine learning, no fancy models — just the habits you have been building: look at the data, summarize each group, compare distributions, and turn what you see into a simple rule. Then you will do the honest thing most tutorials skip: measure exactly how often your rule is right, and look hard at where it fails.
By the end of this lesson, you will be able to:
- Profile an unfamiliar dataset and summarize each measurement by group
- Read distributions to find which variables separate groups and which do not
- Translate a visual pattern into a written rule-of-thumb classifier
- Measure a rule’s real accuracy and diagnose its errors honestly
You only need pandas, a little numpy, and the descriptive statistics from the earlier lessons. Let’s begin.
Step 1: Load and Inspect the Data
Start by loading the penguins and taking the measure of what you have.
```python
import pandas as pd

penguins = pd.read_csv("https://datatweets.com/datasets/penguins.csv")
print(penguins.shape)
print(penguins["species"].value_counts())
```

```
(344, 7)
species
Adelie       152
Gentoo       124
Chinstrap     68
Name: count, dtype: int64
```

There are 344 penguins across three species (Adelie, Gentoo, and Chinstrap), and the classes are uneven, with more than twice as many Adelie as Chinstrap. That imbalance matters later. A lazy rule that simply guesses “Adelie” every time would already be right 44% of the time (152 of 344), so any rule we build has to beat that bar by a wide margin.
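The 44% bar is the majority-class baseline: the share of the most common label. Here is a minimal sketch that computes it from the class counts printed above, typed in by hand rather than downloaded, so it runs on its own:

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series({"Adelie": 152, "Gentoo": 124, "Chinstrap": 68})

# Always guessing the most common species is right exactly as often
# as that species appears in the data.
shares = counts / counts.sum()
baseline_species = shares.idxmax()
baseline_accuracy = shares.max()

print(baseline_species, round(baseline_accuracy * 100, 1))  # Adelie 44.2
```

On the real data, `penguins["species"].value_counts(normalize=True)` returns the same shares directly.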
Four numeric measurements describe each bird: bill_length_mm, bill_depth_mm, flipper_length_mm, and body_mass_g. A few values are missing from the three this project leans on:

```python
print(penguins[["bill_length_mm", "flipper_length_mm", "body_mass_g"]].isna().sum())
```

```
bill_length_mm       2
flipper_length_mm    2
body_mass_g          2
dtype: int64
```

Each measurement is missing in only two rows: a couple of birds that were never fully measured. We will drop those rows before classifying so that every penguin we judge actually has the numbers our rule needs.
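Dropping rows costs little only if the gaps fall in the same rows. A quick check counts the rows that are missing at least one measurement. If that count is two, the same two birds account for every gap, and dropping them leaves 342 birds. The sketch below runs on a small made-up frame in place of the real file. `incomplete_rows` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

MEASUREMENTS = ["bill_length_mm", "flipper_length_mm", "body_mass_g"]

def incomplete_rows(df: pd.DataFrame, cols=MEASUREMENTS) -> int:
    """Count rows missing at least one of the given measurements (hypothetical helper)."""
    return int(df[cols].isna().any(axis=1).sum())

# Toy data (made up for illustration): one bird is missing every measurement.
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo"],
    "bill_length_mm": [39.1, np.nan, 46.1],
    "flipper_length_mm": [181.0, np.nan, 211.0],
    "body_mass_g": [3750.0, np.nan, 4500.0],
})

print(incomplete_rows(toy))                   # 1 row has a gap
print(len(toy.dropna(subset=MEASUREMENTS)))   # 2 rows survive the drop
```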
The labels aren’t really hidden
We keep the species column the whole time — we just promise not to use it while building the rule. Treating it as the hidden answer key lets us do something you rarely can in the real world: check our work against the truth and compute an exact accuracy.
Step 2: Summarize Each Measurement by Species
The single most useful table in this project is the mean and standard deviation of every measurement, split by species. groupby plus agg builds it in one expression:

```python
# Mean and standard deviation of each measurement, one row per species
summary = (
    penguins.groupby("species")[["bill_length_mm", "flipper_length_mm", "body_mass_g"]]
    .agg(["mean", "std"])
    .round(1)
)
print(summary)
```

```
          bill_length_mm      flipper_length_mm      body_mass_g
                    mean  std              mean  std        mean    std
species
Adelie              38.8  2.7             190.0  6.5      3700.7  458.6
Chinstrap           48.8  3.3             195.8  7.1      3733.1  384.3
Gentoo              47.5  3.1             217.2  6.5      5076.0  504.1
```

The rest of the project depends on reading this table carefully. Three patterns jump out:
- Gentoo is the giant. Its flippers average 217.2 mm versus ~190–196 mm for the others, and it is over 1,300 g heavier. On size alone, Gentoo stands apart.
- Adelie has a short bill. At 38.8 mm, its bill is about 9 to 10 mm shorter than those of the other two (47.5 and 48.8 mm). That is the cleanest single gap in the table.
- Chinstrap and Gentoo have nearly identical bill lengths (48.8 vs 47.5 mm) but completely different bodies. So bill length tells Adelie from Chinstrap, while size tells Gentoo from both of the others.
The standard deviations are the second half of the story. A 6.5 mm spread on flipper length is small next to the 27 mm gap between the Adelie and Gentoo means, so those groups barely overlap. Adelie and Chinstrap bill lengths, by contrast, sit 10 mm apart with spreads of about 3 mm each. That gap is only about three standard deviations wide, so their tails will mingle. Already we can predict where a simple rule will struggle.
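One way to put the "gap versus spread" comparison on a single scale is to divide the distance between two group means by their typical spread. The sketch below uses the rounded values from the summary table. Averaging the two standard deviations is a simplification we chose for this example; a pooled standard deviation would give similar numbers.

```python
# (mean, std) pairs copied from the rounded summary table above, in mm
flipper = {"Adelie": (190.0, 6.5), "Gentoo": (217.2, 6.5)}
bill = {"Adelie": (38.8, 2.7), "Chinstrap": (48.8, 3.3)}

def gap_in_sds(a, b):
    """Distance between two group means, in units of their average std."""
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    return abs(mean_b - mean_a) / ((sd_a + sd_b) / 2)

print(round(gap_in_sds(flipper["Adelie"], flipper["Gentoo"]), 1))  # 4.2
print(round(gap_in_sds(bill["Adelie"], bill["Chinstrap"]), 1))     # 3.3
```

The Adelie/Chinstrap bill gap comes out at about 3.3 spreads, against about 4.2 for the Adelie/Gentoo flipper gap. That matches the expectation that the bill tails will touch.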
Step 3: Visualize Which Measurements Separate the Species
A summary table hints at separation; a distribution shows it. Plot flipper length as one histogram per species and the picture becomes obvious.
Gentoo lives almost entirely to the right; Adelie and Chinstrap pile up together on the left. A single vertical cut around 206 mm would peel off nearly every Gentoo while catching almost no one else. That is our first rule.
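The same comparison can be made without a chart. A histogram bins a measurement and counts how many birds fall in each bin, so `pd.cut` plus `pd.crosstab` gives it as a table, one column per species. The sketch below assumes a frame with `species` and a numeric column. `binned_counts` is a hypothetical helper, and the toy rows are invented for illustration.

```python
import pandas as pd

def binned_counts(df: pd.DataFrame, column: str, edges) -> pd.DataFrame:
    """Counts of each species per bin of `column`: a histogram as a table."""
    bins = pd.cut(df[column], bins=edges)   # right-closed intervals like (200, 210]
    return pd.crosstab(bins, df["species"])

# Toy rows (made up) standing in for the real measurements
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "flipper_length_mm": [185.0, 192.0, 193.0, 201.0, 215.0, 222.0],
})

table = binned_counts(toy, "flipper_length_mm", edges=[170, 180, 190, 200, 210, 220, 230])
print(table)
```

On the real data you would pass `penguins` and edges that cover the full range, for example `list(range(170, 250, 10))` for flipper length. Swap in `"bill_length_mm"` with suitable edges to tabulate the bill histogram.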
But flipper length cannot tell Adelie from Chinstrap — they share the same hump. For that we need bill length:
Now Adelie sits clearly on the left with a short bill, and a cut near 43 mm divides it from the longer-billed Chinstrap and Gentoo. Put both measurements on one scatter plot and you can see all three groups pulled into their own corners:
The scatter plot is the blueprint for our classifier. The horizontal cut at 206 mm separates the high-flippered Gentoo on top. Below that line, the vertical cut at 43 mm separates short-billed Adelie on the left from long-billed Chinstrap on the right. Two cuts, three regions.
Step 4: Build a Rule-of-Thumb Classifier
We can now write the rule the plots described, in plain order:
- If flipper_length_mm is greater than 206, call it Gentoo (it is large).
- Otherwise, if bill_length_mm is greater than 43, call it Chinstrap (long bill).
- Otherwise, call it Adelie (short bill, smaller body).
The thresholds are not guesses — they come straight from the distributions in Step 3, sitting in the gaps between the species. Here it is in code. First drop the rows with missing measurements, then apply the rule to every penguin:
```python
# Keep only birds that have both measurements the rule uses
clean = penguins.dropna(subset=["bill_length_mm", "flipper_length_mm"]).copy()

def classify(row):
    """Two-threshold rule read off the Step 3 distributions."""
    if row["flipper_length_mm"] > 206:    # large flippers: Gentoo
        return "Gentoo"
    elif row["bill_length_mm"] > 43:      # long bill: Chinstrap
        return "Chinstrap"
    else:                                 # short bill, smaller body: Adelie
        return "Adelie"

# axis=1 passes each row (one penguin) to classify()
clean["predicted"] = clean.apply(classify, axis=1)
print(clean["predicted"].value_counts())
```

```
predicted
Adelie       146
Gentoo       129
Chinstrap     67
```

The rule predicts 146 Adelie, 129 Gentoo, and 67 Chinstrap. Compare that with the true counts (152, 124, 68) and the totals are already close, which is a good sign. But matching the totals is not the same as getting each individual bird right, so we have to check properly.
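Row-wise `apply` is easy to read but slow on large tables. The same rule can be written as whole-column comparisons with numpy's `np.select`. `np.select` takes, for each element, the first condition that is True, so the order of the rule chain is preserved. This is a sketch: `classify_vectorized` is a name made up here, and the three toy birds are invented.

```python
import numpy as np
import pandas as pd

def classify_vectorized(df: pd.DataFrame) -> pd.Series:
    """Same two-threshold rule as classify(), applied to whole columns at once."""
    conditions = [
        df["flipper_length_mm"] > 206,   # rule 1: large flippers -> Gentoo
        df["bill_length_mm"] > 43,       # rule 2: long bill -> Chinstrap
    ]
    choices = ["Gentoo", "Chinstrap"]
    # np.select uses the first True condition per row; otherwise the default
    return pd.Series(np.select(conditions, choices, default="Adelie"), index=df.index)

# Toy birds (made up): short bill, long bill, large flippers
toy = pd.DataFrame({
    "bill_length_mm": [39.0, 49.0, 47.0],
    "flipper_length_mm": [190.0, 196.0, 217.0],
})
print(classify_vectorized(toy).tolist())  # ['Adelie', 'Chinstrap', 'Gentoo']
```

Both functions encode the same chain, so on the real data `(classify_vectorized(clean) == clean["predicted"]).all()` should be True.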
Order matters in a rule chain
Because the checks run top to bottom, the flipper test gets first say. A long-billed Gentoo is caught by rule 1 before the bill rule can mislabel it Chinstrap. When you write a rule chain, put the cleanest, most decisive split first.
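To see why order matters, run one bird through the chain in both orders. The bird below is hypothetical, built from the Gentoo means in Step 2 (bill 47.5 mm, flipper 217.2 mm). `classify_bill_first` is a deliberately wrong reordering written for this comparison.

```python
def classify_flipper_first(bird: dict) -> str:
    """The Step 4 rule: flipper check first."""
    if bird["flipper_length_mm"] > 206:
        return "Gentoo"
    elif bird["bill_length_mm"] > 43:
        return "Chinstrap"
    return "Adelie"

def classify_bill_first(bird: dict) -> str:
    """Same thresholds, checks swapped: the bill test now runs first."""
    if bird["bill_length_mm"] > 43:
        return "Chinstrap"
    elif bird["flipper_length_mm"] > 206:
        return "Gentoo"
    return "Adelie"

# Hypothetical Gentoo-like bird built from the Step 2 means
bird = {"bill_length_mm": 47.5, "flipper_length_mm": 217.2}

print(classify_flipper_first(bird))  # Gentoo
print(classify_bill_first(bird))     # Chinstrap
```

With the bill test first, every long-billed Gentoo lands in the Chinstrap branch, and the flipper check never runs for it.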
Step 5: Check How Often the Rule Is Right
The moment of truth. Accuracy is the share of penguins whose predicted species matches their real one. Because we kept the species column as our hidden answer key, we can compute it directly:
```python
# True where the prediction matches; the mean of booleans is the share of True
accuracy = (clean["predicted"] == clean["species"]).mean()
print(round(accuracy * 100, 1))
```

```
94.4
```

The rule is right 94.4% of the time (323 of 342 penguins), using nothing but two thresholds read off a couple of histograms. Against the 44% you would get by always guessing Adelie, that is a genuine result. But an overall number can hide trouble in one group, so break it down by species:
```python
clean["correct"] = clean["predicted"] == clean["species"]
per_species = (clean.groupby("species")["correct"].mean() * 100).round(1)
print(per_species)
```

```
species
Adelie       94.0
Chinstrap    86.8
Gentoo       99.2
Name: correct, dtype: float64
```

The accuracy is not spread evenly. Gentoo is nearly perfect (99.2%), exactly what the well-separated flipper histogram promised. Adelie is strong (94.0%), but Chinstrap lags at 86.8%. To see which mistakes the rule makes, build a confusion table, with actual species down the rows and predicted species across the columns:
```python
print(pd.crosstab(clean["species"], clean["predicted"]))
```

```
predicted  Adelie  Chinstrap  Gentoo
species
Adelie        142          7       2
Chinstrap       4         59       5
Gentoo          0          1     122
```

Every off-diagonal cell is an error. Reading them tells the whole story of where the rule breaks:
- Adelie ↔ Chinstrap confusion (7 + 4 = 11 errors). This is exactly the overlap we predicted in Step 2. A handful of Adelie have unusually long bills (over 43 mm) and get called Chinstrap; a few Chinstrap have unusually short bills and get called Adelie. Their bill-length distributions touch, so no single cut can separate them cleanly.
- Chinstrap → Gentoo (5 errors). Five Chinstrap have long flippers (over 206 mm) and trip the Gentoo rule. The Chinstrap flipper distribution has a long upper tail that reaches into Gentoo territory.
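- Adelie → Gentoo (2 errors). Two Adelie have flippers longer than 206 mm and are swept up by the Gentoo rule.
- Gentoo almost never misses (1 error, a Gentoo called Chinstrap). Its size makes it the easiest species to call.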
The lesson here is honest and important: the rule is only as good as the gaps between the groups. Where distributions are well separated (Gentoo’s size), the rule is nearly flawless. Where they overlap (Adelie vs Chinstrap bills), no threshold can be perfect, and the errors cluster exactly where the histograms told us they would.
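Both accuracy figures in Step 5 can be read straight off the confusion table. Overall accuracy is the diagonal total divided by all birds, and each species' accuracy is its diagonal cell divided by its row total. A small check using the counts printed above:

```python
import numpy as np
import pandas as pd

# Confusion table from above: actual species (rows) x predicted (columns)
labels = ["Adelie", "Chinstrap", "Gentoo"]
confusion = pd.DataFrame(
    [[142, 7, 2],
     [4, 59, 5],
     [0, 1, 122]],
    index=labels, columns=labels,
)

correct = np.trace(confusion.to_numpy())   # diagonal: correct calls
total = confusion.to_numpy().sum()         # every classified bird
overall = round(100 * correct / total, 1)

# Per species: correct calls divided by how many of that species there are
per_species = (100 * pd.Series(np.diag(confusion.to_numpy()), index=labels)
               / confusion.sum(axis=1)).round(1)

print(correct, total, overall)   # 323 342 94.4
print(per_species.to_dict())     # {'Adelie': 94.0, 'Chinstrap': 86.8, 'Gentoo': 99.2}
```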
Take It Further
The rule is good, but you can push it further with the same descriptive tools:
- Tune the thresholds. Try shifting the bill cut between 42 and 45 mm and recompute the accuracy. Does nudging it help Chinstrap at Adelie’s expense, or the reverse? Use the confusion table to decide what “better” even means here.
- Add a third measurement. body_mass_g also separates Gentoo clearly from the other two species. Add a mass check to the Gentoo rule and see whether it rescues any of the misread Chinstrap.
- Quantify the overlap. For Adelie and Chinstrap bill length, count how many Adelie fall above 43 mm and how many Chinstrap fall at or below it. Repeat the count at other cut points. The smallest total you can reach is the floor on the Adelie/Chinstrap split: the mistakes no single bill cut can avoid.
- Stratify your check. Compute accuracy separately by sex or island. Does the rule work as well for females as for males? Uneven accuracy across subgroups is something every honest analysis should report.
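The threshold-tuning and overlap suggestions can share one loop. For each candidate bill cut, count the Adelie above it (called Chinstrap) and the Chinstrap at or below it (called Adelie). The sketch assumes a frame with `species` and `bill_length_mm` columns, such as `clean`. `sweep_bill_cuts` is a hypothetical helper, and the toy rows are invented, so the printed counts say nothing about the real penguins. In the full rule, only birds with flipper_length_mm of 206 or less reach the bill test, so treat the sweep on all birds as an approximation.

```python
import pandas as pd

def sweep_bill_cuts(df: pd.DataFrame, cuts) -> pd.DataFrame:
    """For each bill cut, count Adelie above it and Chinstrap at or below it."""
    adelie = df.loc[df["species"] == "Adelie", "bill_length_mm"]
    chinstrap = df.loc[df["species"] == "Chinstrap", "bill_length_mm"]
    rows = []
    for cut in cuts:
        adelie_above = int((adelie > cut).sum())         # would be called Chinstrap
        chinstrap_below = int((chinstrap <= cut).sum())  # would be called Adelie
        rows.append({"cut_mm": cut,
                     "adelie_above": adelie_above,
                     "chinstrap_below": chinstrap_below,
                     "total": adelie_above + chinstrap_below})
    return pd.DataFrame(rows)

# Toy rows (made up for illustration)
toy = pd.DataFrame({
    "species": ["Adelie"] * 4 + ["Chinstrap"] * 4,
    "bill_length_mm": [37.0, 39.5, 42.5, 44.0, 42.8, 46.0, 49.0, 51.0],
})

result = sweep_bill_cuts(toy, cuts=[42.0, 43.0, 44.0, 45.0])
print(result)
print(result.loc[result["total"].idxmin(), "cut_mm"])  # first cut with the fewest errors
```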
Summary
You told three penguin species apart using only descriptive statistics. You profiled the data, summarized every measurement by species, and read the distributions to find which variables separate the groups — flipper length isolates the large Gentoo, bill length splits short-billed Adelie from long-billed Chinstrap. You turned those gaps into a two-threshold rule, applied it to all 342 measured birds, and measured an honest 94.4% accuracy — then used a confusion table to pin the errors on exactly the overlap your summaries had predicted.
Key Concepts
- Group summaries: groupby with mean and standard deviation reveals which measurements separate groups and which overlap.
- Separation vs overlap: a variable is useful for classification only where its group distributions don’t blend together.
- Rule-of-thumb classifier: an ordered chain of threshold checks read directly off the distributions.
- Accuracy: the share of cases predicted correctly; compare it against a naive baseline.
- Confusion table: a cross-tabulation of actual versus predicted that shows where a rule fails, not just how often.
Why This Matters
This is descriptive statistics doing real work. Long before anyone reaches for a model, the questions that decide a project are the ones you just answered: which variables actually carry signal, where do groups overlap, and how good is “good enough”? A rule you can read in one sentence, whose every error you can explain, is often more trustworthy than a black box — and the habit of measuring accuracy honestly, then staring at the mistakes, is what separates careful analysis from wishful thinking.
Next Steps
Continue to Module 2 - Measures of Center & Variability (next in the course)
Go deeper on the mean, median, standard deviation, and the summaries you leaned on throughout this project.
Back to Module Overview
Return to the Statistics Fundamentals module overview
Continue Building Your Skills
You just turned a pile of measurements into a working classifier without a single model — only the descriptive tools from this module and the discipline to check your own work. That instinct, to summarize first, look for the gaps, and always measure how often you’re right, will serve you in every analysis you ever do. Onward to measures of center and variability, where these summaries get a proper foundation.