Lesson 4 - Chi-Squared Test of Independence
Welcome to the Chi-Squared Test of Independence
So far you have tested claims about numbers — means, proportions, differences. But a great deal of real data is categorical: a penguin’s species, a car’s country of origin, a customer’s plan tier. When you have two categorical variables, the natural question is whether they are related. Does species depend on which island a penguin lives on? Does engine size depend on where a car was built?
The chi-squared test of independence answers exactly that question. In this lesson you will count two categorical variables against each other, work out what their table would look like if they were unrelated, and measure how far reality strays from that expectation.
By the end of this lesson, you will be able to:
- Build a contingency table of two categorical variables with pd.crosstab
- Compute the expected counts that independence would predict
- Calculate the chi-squared statistic and its degrees of freedom by hand
- Run the test with scipy.stats.chi2_contingency and interpret the result honestly
You only need pandas and scipy. Let’s begin.
Two Categorical Variables
Every test you have met so far compared numbers. The chi-squared test of independence compares categories. It starts from a question of the form: are these two labels related, or do they vary independently of each other?
Load the penguins dataset and look at the two categorical columns we will study — species and island:
import pandas as pd
# Load the penguins dataset and show the two categorical columns
penguins = pd.read_csv("https://datatweets.com/datasets/penguins.csv")
print(penguins[["species", "island"]].head())
  species     island
0  Adelie  Torgersen
1  Adelie  Torgersen
2  Adelie  Torgersen
3  Adelie  Torgersen
4  Adelie  Torgersen
We want to know whether species and island are independent. If they were, then knowing a penguin’s island would tell you nothing about its likely species: the species mix on every island would look the same. Let’s count and see.
The Contingency Table
A contingency table (also called a cross-tabulation) counts how often each combination of two categories occurs. pandas builds one in a single call:
# Count every (species, island) combination
observed = pd.crosstab(penguins["species"], penguins["island"])
print(observed)
island     Biscoe  Dream  Torgersen
species
Adelie         44     56         52
Chinstrap       0     68          0
Gentoo        124      0          0
Each cell is an observed count: the number of penguins with that species and that island. Already the table tells a vivid story: Gentoo penguins appear only on Biscoe, and Chinstrap only on Dream. That is the opposite of independence; species is clearly tied to island. The heatmap below makes the pattern impossible to miss.
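To see exactly what pd.crosstab counts, here is a minimal sketch on a hypothetical six-row DataFrame (the rows are invented for illustration and are not taken from the penguins data). It also uses crosstab's normalize="columns" option to turn each island's column into a species mix, the view in which independence would show up as columns that all look alike.

```python
import pandas as pd

# Hypothetical toy rows, invented purely for illustration.
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo", "Chinstrap"],
    "island":  ["Dream", "Torgersen", "Biscoe", "Biscoe", "Biscoe", "Dream"],
})

# Raw counts for every (species, island) combination.
counts = pd.crosstab(toy["species"], toy["island"])
print(counts)

# normalize="columns" divides each column by its total, giving the
# species mix on each island. Under independence the columns would match.
species_mix = pd.crosstab(toy["species"], toy["island"], normalize="columns")
print(species_mix.round(2))
```

In this toy table the columns are far from identical (Biscoe is all Gentoo, Torgersen all Adelie), which is the same kind of pattern the real penguin table shows.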
Margins: the row and column totals
The totals along the edges of the table — called marginal totals — are what the test uses to build its expectation. Add them with margins=True:
print(pd.crosstab(penguins["species"], penguins["island"], margins=True))
island     Biscoe  Dream  Torgersen  All
species
Adelie         44     56         52  152
Chinstrap       0     68          0   68
Gentoo        124      0          0  124
All           168    124         52  344
There are $N = 344$ penguins in total. The row totals (152, 68, 124) give the overall species mix; the column totals (168, 124, 52) give the overall island mix. Hold on to these: independence is defined entirely by them.
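The margins are nothing more than row and column sums. This sketch types the observed counts from the table above into a DataFrame by hand (an assumption made so the block runs without downloading the CSV) and recovers the same totals:

```python
import pandas as pd

# Observed species-by-island counts, copied from the contingency table.
observed = pd.DataFrame(
    [[44, 56, 52], [0, 68, 0], [124, 0, 0]],
    index=pd.Index(["Adelie", "Chinstrap", "Gentoo"], name="species"),
    columns=pd.Index(["Biscoe", "Dream", "Torgersen"], name="island"),
)

row_totals = observed.sum(axis=1)     # overall species mix
col_totals = observed.sum(axis=0)     # overall island mix
grand_total = observed.values.sum()   # number of penguins

print(row_totals.tolist(), col_totals.tolist(), grand_total)
```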
Expected Counts Under Independence
The test asks a precise hypothetical question: if species and island were independent, how many penguins would we expect in each cell?
Under independence, the probability of landing in a cell is just the row probability times the column probability. Multiply that by the grand total $N$ and the row and column totals do all the work. For the cell in row $i$ and column $j$:
$$E_{ij} = N \cdot \frac{R_i}{N} \cdot \frac{C_j}{N} = \frac{R_i \, C_j}{N}$$
where $R_i$ is that row’s total, $C_j$ is that column’s total, and $N$ is the grand total.
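The formula can be applied to every cell at once with an outer product of the row and column totals. This is a sketch using NumPy on the hard-coded penguin counts; it cross-checks the result against scipy.stats.contingency.expected_freq, a SciPy helper that computes the same expected table from the margins.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Observed species-by-island counts (rows: Adelie, Chinstrap, Gentoo;
# columns: Biscoe, Dream, Torgersen).
observed = np.array([[44, 56, 52],
                     [0, 68, 0],
                     [124, 0, 0]])

row_totals = observed.sum(axis=1)   # R_i
col_totals = observed.sum(axis=0)   # C_j
n = observed.sum()                  # N

# E_ij = R_i * C_j / N for every cell at once.
expected = np.outer(row_totals, col_totals) / n

# SciPy's helper should agree with the hand-built table.
assert np.allclose(expected, expected_freq(observed))
print(expected.round(2))
```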
Working one cell by hand
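Take Adelie on Biscoe. The Adelie row total is $R_{\text{Adelie}} = 152$, the Biscoe column total is $C_{\text{Biscoe}} = 168$, and $N = 344$:
$$E_{\text{Adelie, Biscoe}} = \frac{152 \times 168}{344} = \frac{25536}{344} \approx 74.23$$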
If species and island were independent, we would expect about 74 Adelie penguins on Biscoe. We actually observed only 44 — a big gap. The whole test is just adding up gaps like this one across every cell.
scipy computes the full table of expected counts for you:
from scipy.stats import chi2_contingency
# Returns the statistic, p-value, degrees of freedom and expected table
chi2, p_value, dof, expected = chi2_contingency(observed)
# Label the expected array like the observed table
print(pd.DataFrame(expected, index=observed.index,
                   columns=observed.columns).round(2))
island     Biscoe  Dream  Torgersen
species
Adelie      74.23  54.79      22.98
Chinstrap   33.21  24.51      10.28
Gentoo      60.56  44.70      18.74
Notice every row of expected counts has the same island proportions; that is what “independent” means. Compare this to the observed table full of zeros, and you can feel how far reality is from the independence story.
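One way to check that every row shares the same island proportions is to divide each row of the expected table by its row total and compare it with the overall island mix $C_j / N$. A short sketch on the hard-coded counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[44, 56, 52],
                     [0, 68, 0],
                     [124, 0, 0]])
_, _, _, expected = chi2_contingency(observed)

# Divide each row of expected counts by that row's total.
row_props = expected / expected.sum(axis=1, keepdims=True)

# Overall island mix C_j / N, broadcast against every row.
island_mix = observed.sum(axis=0) / observed.sum()
assert np.allclose(row_props, island_mix)
print(row_props.round(3))
```

All three printed rows are identical, which is exactly what independence predicts.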
The Chi-Squared Statistic and Degrees of Freedom
To turn those gaps into a single number, the chi-squared statistic sums the squared difference between observed and expected in each cell, scaled by the expected count:
$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where $O_{ij}$ is the observed count in cell $(i, j)$ and $E_{ij}$ its expected count. Squaring makes every gap positive, and dividing by $E_{ij}$ puts each cell on a fair footing: a gap of 30 matters far more where you expected 10 than where you expected 1000. A $\chi^2$ near 0 means observed and expected agree (consistent with independence); a large $\chi^2$ means they diverge.
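The statistic translates directly into a few lines of NumPy. The sketch below defines a small helper (chi_squared_statistic is a hypothetical name, not a SciPy function) and uses it to reproduce the point about a gap of 30 being weighed differently at different expected counts:

```python
import numpy as np

def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over every cell."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(((observed - expected) ** 2 / expected).sum())

# The same gap of 30, scaled by two very different expected counts.
print(chi_squared_statistic([40], [10]))      # 30**2 / 10
print(chi_squared_statistic([1030], [1000]))  # 30**2 / 1000
```

The first call gives 90.0 and the second 0.9: the same raw gap counts a hundred times more when only 10 were expected.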
One cell’s contribution
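For Adelie on Biscoe, $O = 44$ and $E \approx 74.23$:
$$\frac{(44 - 74.23)^2}{74.23} = \frac{(-30.23)^2}{74.23} \approx \frac{913.85}{74.23} \approx 12.31$$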
That single cell already contributes 12.31. Summing this quantity over all nine cells gives the full statistic:
# Per-cell contributions (O - E)^2 / E, using the raw observed count array
contributions = (observed.values - expected) ** 2 / expected
print(round(contributions.sum(), 2))
299.55
Degrees of freedom
The degrees of freedom count how many cells are free to vary once the margins are fixed. For a table with $r$ rows and $c$ columns:
$$\text{df} = (r - 1)(c - 1)$$
With 3 species and 3 islands that is $(3 - 1)(3 - 1) = 4$. Degrees of freedom set the scale of the chi-squared distribution we compare against, so a $\chi^2$ of 10 means something very different on 1 degree of freedom than on 40.
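To make that concrete, this sketch computes the degrees of freedom from a table's shape (degrees_of_freedom is a hypothetical helper) and then asks how surprising a $\chi^2$ of 10 is on 1 versus 40 degrees of freedom, using the upper-tail function scipy.stats.chi2.sf. Here chi2 names SciPy's distribution object, whereas the lesson's own code uses chi2 for the statistic, so avoid mixing the two in one session.

```python
from scipy.stats import chi2

def degrees_of_freedom(n_rows, n_cols):
    # (r - 1)(c - 1) for an r x c contingency table
    return (n_rows - 1) * (n_cols - 1)

print(degrees_of_freedom(3, 3))  # species by island

# Upper-tail probability P(chi2 >= 10) under two different df.
p_df1 = chi2.sf(10, df=1)
p_df40 = chi2.sf(10, df=40)
print(f"df=1:  P(chi2 >= 10) = {p_df1:.4f}")
print(f"df=40: P(chi2 >= 10) = {p_df40:.4f}")
```

On 1 degree of freedom a value of 10 lies far in the upper tail (p below 0.01); on 40 degrees of freedom it is smaller than almost every value the distribution produces, so p is close to 1.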
Reading the p-value
chi2_contingency returns the statistic, the p-value, the degrees of freedom, and the expected table all at once:
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-squared = {chi2:.1f}")
print(f"dof = {dof}")
print(f"p-value = {p_value:.2e}")
chi-squared = 299.6
dof = 4
p-value = 1.35e-63
The p-value is the probability of seeing a $\chi^2$ at least this large if species and island really were independent. Here it is $1.35 \times 10^{-63}$, astronomically small. We reject the independence hypothesis with overwhelming confidence: species and island are clearly dependent. That matches what the raw table screamed at us, with Gentoo confined to Biscoe and Chinstrap to Dream.
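The p-value is the upper-tail area of the chi-squared distribution with dof degrees of freedom beyond the observed statistic. This sketch recomputes it with scipy.stats.chi2.sf and compares it with what chi2_contingency returns; the statistic is named stat so it does not clash with the chi2 distribution object. The match holds here because SciPy applies its Yates continuity correction only when there is exactly 1 degree of freedom, which is not the case for this 3 x 3 table.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[44, 56, 52],
                     [0, 68, 0],
                     [124, 0, 0]])
stat, p_value, dof, expected = chi2_contingency(observed)

# Upper tail of chi-squared(dof) beyond the observed statistic.
p_manual = chi2.sf(stat, dof)

# Compare on a relative scale; the values are far below any absolute tolerance.
print(np.isclose(p_manual, p_value, rtol=1e-6, atol=0))
```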
A Second Example: Cars
The method works on any pair of categorical columns. Let’s ask whether a car’s engine size (number of cylinders) depends on its country of origin:
cars = pd.read_csv("https://datatweets.com/datasets/cars.csv")
# Count cars for every (origin, cylinders) combination
origin_cyl = pd.crosstab(cars["origin"], cars["cylinders"])
print(origin_cyl)
cylinders  3   4  5   6    8
origin
europe     0  63  3   4    0
japan      4  69  0   6    0
usa        0  72  0  74  103
The pattern is just as stark as before: every 8-cylinder car in the dataset is from the USA, while European and Japanese cars cluster at 4 cylinders. Run the test:
chi2, p_value, dof, expected = chi2_contingency(origin_cyl)
print(f"chi-squared = {chi2:.1f}")
print(f"dof = {dof}")
print(f"p-value = {p_value:.2e}")
chi-squared = 180.1
dof = 8
p-value = 9.80e-35
Here the table is $3 \times 5$, so $\text{df} = (3 - 1)(5 - 1) = 8$. The p-value of $9.80 \times 10^{-35}$ again leaves little room for doubt: engine size depends on origin. US cars skew heavily toward large 8-cylinder engines, a difference that random chance alone would almost never produce.
Cautions: What the Test Does Not Tell You
The chi-squared test is easy to run and easy to over-read. Keep three warnings in mind.
Significance is not strength
A tiny p-value says the variables are almost certainly related; it says nothing about how strongly. A barely-there association and an iron-clad one can both produce $p < 0.05$. To measure strength you need an effect-size measure such as Cramér’s V, not the p-value.
Large samples make trivial effects “significant”
Look back at the formula: the statistic grows with the counts. Feed the test enough rows and even a microscopic, practically meaningless deviation from independence will cross the significance threshold. With big data, statistically significant and important drift apart — always ask how large the effect actually is.
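Both cautions can be seen numerically. The sketch below computes Cramér’s V, $V = \sqrt{\chi^2 / (N(k-1))}$ with $k$ the smaller of the number of rows and columns, on a hypothetical 2 x 2 table with made-up counts, then multiplies every count by 100. Because the table is 2 x 2, it passes correction=False so chi2_contingency reports the plain Pearson statistic rather than the Yates-corrected one; cramers_v is an illustrative helper, not a SciPy function.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Return (Cramér's V, chi-squared statistic, p-value) for a table."""
    table = np.asarray(table)
    # correction=False: plain Pearson chi-squared even for a 2 x 2 table.
    stat, p, _, _ = chi2_contingency(table, correction=False)
    k = min(table.shape)
    v = np.sqrt(stat / (table.sum() * (k - 1)))
    return v, stat, p

# Hypothetical 2 x 2 table with a mild association (made-up counts).
small = np.array([[55, 45],
                  [45, 55]])
big = small * 100   # same proportions, 100 times as many rows

for name, table in [("small", small), ("big", big)]:
    v, stat, p = cramers_v(table)
    print(f"{name}: chi2 = {stat:.1f}, p = {p:.3g}, V = {v:.2f}")
```

Scaling the counts by 100 multiplies $\chi^2$ by 100 (from 2.0 to 200.0) and drives the p-value from about 0.16 to far below 0.05, while V stays at 0.10: the association is exactly as weak as before, only measured on more rows.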
A relationship is not causation
The test detects that two variables move together, not why. Origin and cylinders are related, but origin does not magically stamp cylinders into an engine — design choices, regulations, and markets drive both. Chi-squared finds association; explaining it is your job, not the test’s.
One assumption to check
The chi-squared approximation is only reliable when expected counts are reasonably large — a common rule of thumb is that every expected cell should be at least 5. When several expected counts fall below that, the p-value can be misleading; use Fisher’s exact test instead. Always inspect the expected table that chi2_contingency returns.
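Inspecting the expected table can be automated. This sketch hard-codes the cars origin-by-cylinders counts from the table above (rows europe, japan, usa; columns 3, 4, 5, 6, 8 cylinders) and counts how many expected cells fall below the rule-of-thumb threshold of 5:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Origin-by-cylinders counts copied from the cars contingency table.
# Rows: europe, japan, usa. Columns: 3, 4, 5, 6, 8 cylinders.
origin_cyl = np.array([[0, 63, 3, 4, 0],
                       [4, 69, 0, 6, 0],
                       [0, 72, 0, 74, 103]])

_, _, _, expected = chi2_contingency(origin_cyl)

# Flag cells that break the "every expected count >= 5" rule of thumb.
too_small = expected < 5
print(f"{too_small.sum()} of {expected.size} expected counts are below 5")
print("smallest expected count:", round(float(expected.min()), 2))
```

For these counts, 6 of the 15 expected cells are below 5, all in the rare 3- and 5-cylinder columns, so the cars p-value above should be read with this assumption in mind.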
Practice Exercises
Exercise 1: Sex and island
Build a contingency table of penguin sex against island, then run chi2_contingency. Is there evidence that sex depends on island? Does the result make biological sense?
Hint
Drop missing values first with clean = penguins.dropna(subset=["sex"]), then pd.crosstab(clean["sex"], clean["island"]). A large p-value means you cannot reject independence — which is what you would expect, since males and females share every island.
Exercise 2: Expected count by hand
Using the cars origin by cylinders table, compute the expected count for 4-cylinder USA cars by hand with $E_{ij} = R_i C_j / N$, then confirm it against the expected array from chi2_contingency.
Hint
The USA row total $R_i$ and the 4-cylinder column total $C_j$ come from pd.crosstab(..., margins=True); $N$ is the grand total. Compare your number to the matching entry of the returned expected table.
Exercise 3: Measure the strength
For the species-by-island table, compute Cramér’s V to quantify how strong the association is, using $V = \sqrt{\dfrac{\chi^2}{N\,(k - 1)}}$ where $k$ is the smaller of the number of rows and columns. How does a strength measure differ from the p-value?
Hint
Here $\chi^2 \approx 299.55$, $N = 344$, and $k = 3$, so $V = \sqrt{299.55 / (344 \times 2)} \approx 0.66$. A V near 1 means a very strong association; the p-value only told you the association was real, not that it was strong.
Summary
You learned to test whether two categorical variables are related. You built a contingency table with pd.crosstab, computed the expected counts independence would predict from the row and column totals, and summed the scaled, squared gaps into the chi-squared statistic. With $(r - 1)(c - 1)$ degrees of freedom, scipy.stats.chi2_contingency turned that statistic into a p-value. Penguin species proved strongly dependent on island, and car engine size strongly dependent on origin. Most importantly, you learned what the test does not say: significance is not strength, big samples inflate significance, and association is never proof of cause.
Key Concepts
- Contingency table: a cross-tabulation counting every combination of two categorical variables.
- Expected count: the cell count predicted under independence, $E_{ij} = R_i C_j / N$.
- Chi-squared statistic: $\chi^2 = \sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}$, measuring total divergence from independence.
- Degrees of freedom: $(r - 1)(c - 1)$, the number of freely varying cells given fixed margins.
- Test of independence: rejects independence when $\chi^2$ is large and the p-value small.
Why This Matters
Categorical relationships are everywhere in real data work — which marketing channel drives which plan, which region buys which product, which treatment pairs with which outcome. The chi-squared test of independence is the standard first tool for asking “are these two labels connected?”, and knowing its cautions keeps you from mistaking a tiny p-value for a big, causal, or important finding.
Next Steps
Continue to Lesson 5 - Guided Project: Japanese vs American Cars
Put hypothesis and chi-squared tests to work in a full, open-ended analysis comparing two groups of cars.
Back to Module Overview
Return to the Statistical Inference module overview
Continue Building Your Skills
You can now take two columns of labels and ask, rigorously, whether they are connected. In the next lesson you will stop running tests one at a time and pull everything together — sampling, hypothesis tests, and chi-squared — into a single guided investigation of how Japanese and American cars really differ.