Lesson 3 - Confidence Bands and Multiple Testing
Welcome to Confidence Bands and Multiple Testing
Every ACF and PACF plot you’ll see, including every one in this module, comes with a shaded band, and it’s tempting to read it as a simple rule: bars inside the band don’t matter, bars outside do. That’s mostly right, but there’s a real trap hiding in “mostly.” Test enough lags, and chance alone will occasionally push a bar outside the band even when there’s genuinely nothing there. This lesson gives you the actual formula behind the band, and then demonstrates the trap with a simulation.
By the end of this lesson, you will be able to:
- State the formula for the standard ACF/PACF significance band and what it assumes
- Explain what “significant at this lag” actually means statistically
- Recognize the multiple-testing problem and quantify how often it produces false alarms
- Apply healthy skepticism to an isolated significant spike with no independent explanation
Let’s start with where the band comes from.
Where the Band Comes From
Under the null hypothesis that a series is pure white noise — no autocorrelation at any lag — the sample ACF at any given lag is approximately normally distributed with a standard deviation of about 1/√n, where n is the number of observations. A 95% confidence interval around zero is then approximately:
±1.96 / √n
That’s the band you see shaded on every ACF/PACF plot in this module. A bar that pokes outside it is further from zero than you’d expect from pure randomness alone, at the conventional 95% confidence level. This is the same logic as the ADF test’s p-value threshold back in Module 3, just applied lag by lag instead of series-wide.
import numpy as np
n = 84 # Cyclepath's seasonally-differenced series, from Module 3
band = 1.96 / np.sqrt(n)
print(round(band, 4)) # 0.2139
For Cyclepath’s stationary series (n = 84, from Module 3), any ACF or PACF value with an absolute value above 0.2139 is conventionally treated as significant.
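The 1.96 multiplier is the 97.5th percentile of the standard normal distribution, which leaves 2.5% in each tail of a two-sided 95% interval. The sketch below recovers it with Python's standard-library `statistics.NormalDist` and shows how the band for n = 84 would change at other confidence levels. The helper name `acf_band` and the 90% and 99% comparisons are illustrative additions, not part of the lesson's own code.

```python
from math import sqrt
from statistics import NormalDist

def acf_band(n, confidence=0.95):
    """Half-width of the white-noise band for a series of length n (hypothetical helper)."""
    # Two-sided interval: (1 - confidence) / 2 of the probability sits in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z / sqrt(n)

for confidence in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Columns: confidence level, normal multiplier, band half-width for n = 84
    print(confidence, round(z, 3), round(acf_band(84, confidence), 3))
```

A stricter confidence level widens the band, so fewer bars cross it by chance, at the cost of missing weaker real autocorrelation.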
The Trap: Testing Many Lags at Once
Here’s the problem. A 95% confidence interval means a 5% chance of a false positive at any single lag — a bar crossing the band even though the true value is zero. That’s an acceptable risk for one test. But an ACF/PACF plot doesn’t test one lag; it typically shows ten, fifteen, or more, all at once. Test enough of them, and the odds that at least one crosses the line by pure chance climb fast.
Simulate this directly: generate 2,000 independent pure-white-noise series (no autocorrelation whatsoever, by construction), each the same length as Cyclepath’s stationary series (n=84), and check how often the ACF plot alone would flag something:
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(3)  # fixed seed so the run is reproducible
n, nlags, nsim = 84, 15, 2000
band = 1.96 / np.sqrt(n)
total_tests, total_sig, at_least_one = 0, 0, 0
for _ in range(nsim):
    x = rng.normal(0, 1, n)  # one pure-white-noise series
    a = acf(x, nlags=nlags, fft=True)[1:]  # drop lag 0, which is always 1
    sig = np.abs(a) > band  # True where a lag crosses the band
    total_tests += nlags
    total_sig += sig.sum()
    if sig.any():  # this series shows at least one false alarm
        at_least_one += 1
print(round(total_sig / total_tests, 3)) # 0.037
print(round(at_least_one / nsim, 3)) # 0.419
Two numbers matter here. The per-lag false-positive rate (0.037, close to the expected 5%, with the small gap being a known finite-sample effect) confirms the band is doing roughly what it claims at any single lag, tested alone. But the series-level rate, the fraction of pure-noise series that show at least one “significant” bar somewhere across 15 tested lags, is 42%. Roughly two times in five, a series with zero genuine autocorrelation will still produce a plot with at least one bar poking outside the band, purely from chance.
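To see how the series-level rate depends on how many lags you scan, the same experiment can be repeated for several lag counts. The sketch below is an illustration under stated assumptions: `sample_acf` and `series_level_rate` are hypothetical helpers, the ACF is hand-rolled in NumPy (intended to mirror the usual biased sample estimator) rather than taken from statsmodels, and the seed differs from the run above, so the numbers will not match 0.419 exactly.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations at lags 1..nlags (hypothetical helper)."""
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[k:], xc[:-k]) / denom for k in range(1, nlags + 1)])

def series_level_rate(n, nlags, nsim, seed=0):
    """Fraction of white-noise series with at least one lag outside the band."""
    rng = np.random.default_rng(seed)
    band = 1.96 / np.sqrt(n)
    hits = 0
    for _ in range(nsim):
        r = sample_acf(rng.normal(0, 1, n), nlags)
        if np.any(np.abs(r) > band):
            hits += 1
    return hits / nsim

# Same seed for every lag count, so each row scans the same simulated series
for m in (1, 5, 10, 15, 20):
    print(m, round(series_level_rate(84, m, 2000), 3))
```

Because every row reuses the same simulated series, the rate can only stay the same or rise as more lags are scanned.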
This doesn’t make the band useless
The band is still the right tool — it correctly flags roughly 5% of individual lags in pure noise, exactly as designed. The lesson isn’t “ignore the band,” it’s “don’t trust a single isolated spike with no other explanation, especially when you’re scanning many lags at once.” A spike that lines up with something you already have independent reason to expect — a known seasonal period, a lag that showed up consistently across multiple related series — is much more trustworthy than one that appears at an arbitrary lag with no story behind it.
Applying This to a Real Plot
This is exactly the discipline you’ll need in Lesson 4, where Cyclepath’s real ACF/PACF plot shows two lags crossing the band: lag 7 and lag 12. Lag 12 has a ready explanation — it’s the seasonal period, and Module 3 already found leftover seasonal autocorrelation at exactly this lag even after seasonal differencing (-0.417 in the ACF back in that module). Lag 7 has no such story; nothing about a monthly bike-share series gives lag 7 special meaning the way lag 12 obviously does. Knowing that roughly 42% of pure noise series would show some spurious spike somewhere among 15 lags is exactly the reason to treat lag 7 with real suspicion rather than building a model term around it just because it crossed a line.
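One way to make this discipline routine is to record, for every lag that crosses the band, whether you had a reason to expect it beforehand. The sketch below is a minimal illustration: `flag_lags` is a hypothetical helper, and the ACF values are toy numbers invented for this example, not Cyclepath's actual ACF.

```python
import numpy as np

def flag_lags(acf_values, n, expected_lags=()):
    """Return (lag, value, corroborated) for each lag outside the 1.96/sqrt(n) band.

    acf_values[0] is taken to be lag 1 (lag 0 already dropped).
    """
    band = 1.96 / np.sqrt(n)
    flagged = []
    for lag, value in enumerate(acf_values, start=1):
        if abs(value) > band:
            flagged.append((lag, float(value), lag in expected_lags))
    return flagged

# Toy ACF values for lags 1..15, invented purely for illustration
toy_acf = [0.05, -0.10, 0.02, 0.08, -0.03, 0.11, 0.24, -0.06,
           0.01, 0.09, -0.12, -0.42, 0.04, -0.02, 0.07]

# Lag 12 is listed as expected because it is the seasonal period of a monthly series
for lag, value, corroborated in flag_lags(toy_acf, n=84, expected_lags={12}):
    note = "matches an expected lag" if corroborated else "isolated, treat with suspicion"
    print(f"lag {lag}: {value:+.2f} ({note})")
```

The point of passing `expected_lags` in up front is that the list is fixed before you look at the plot, which keeps you from inventing a story for whichever bar happened to cross the line.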
Practice Exercises
Exercise 1: Recompute the band for a longer series
If Cyclepath’s stationary series had 500 observations instead of 84, what would the significance band be, and what does that imply about detecting weaker autocorrelation?
Hint
1.96 / √500 ≈ 0.0877 — a much narrower band than 84 observations gives (0.2139). A narrower band means weaker autocorrelation can still register as statistically significant, since the test has more data to distinguish a small-but-real effect from pure noise. This is the general pattern behind all statistical significance testing: more data means more power to detect subtle real effects, not just bigger effects.
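The arithmetic in this hint can be checked in a couple of lines. This is a minimal sketch; the helper name `band` and the extra n = 2000 row are illustrative additions.

```python
from math import sqrt

def band(n):
    """95% white-noise band half-width, 1.96 / sqrt(n) (hypothetical helper)."""
    return 1.96 / sqrt(n)

for n in (84, 500, 2000):
    print(n, round(band(n), 4))
# 84 0.2139
# 500 0.0877
# 2000 0.0438
```

Because the band scales with 1/√n, quadrupling the number of observations only halves its width.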
Exercise 2: One-in-fifteen isn’t one-in-twenty
If each individual lag has roughly a 5% false-positive rate, why did the simulation find a 42% chance of at least one false positive across 15 lags, rather than something closer to 15 × 5% = 75%, or exactly 5%?
Hint
It’s not simple addition, because you’re asking for the probability of at least one false positive across 15 roughly independent trials, which compounds differently: 1 - (1 - 0.05)^15 ≈ 1 - 0.463 ≈ 0.537 is the naive independence-based estimate at a 5% per-lag rate. The simulation’s 42% is lower mainly because its realized per-lag rate was about 3.7% rather than 5%; plugging that in gives 1 - (1 - 0.037)^15 ≈ 0.43, which nearly closes the gap. Any remainder can come from simulation noise and from ACF estimates at different lags not being perfectly independent in a finite sample, since they share overlapping data. The key intuition either way: the chance of some false alarm grows quickly with the number of lags tested, even though each individual lag’s error rate stays fixed near its nominal 5%.
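The compounding arithmetic can be verified directly. This is a minimal sketch that assumes the lags behave as independent tests, which is only approximately true for sample ACF values; the function name `at_least_one_rate` is illustrative.

```python
def at_least_one_rate(per_lag_rate, n_lags):
    """P(at least one false positive) across n_lags tests, assuming independence."""
    return 1 - (1 - per_lag_rate) ** n_lags

print(round(at_least_one_rate(0.05, 15), 3))   # 0.537, nominal 5% per lag
print(round(at_least_one_rate(0.037, 15), 3))  # 0.432, the simulated 3.7% per lag

# How the nominal rate grows with the number of lags scanned
for m in (1, 5, 10, 15, 20):
    print(m, round(at_least_one_rate(0.05, m), 3))
```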
Exercise 3: Two significant lags, different confidence
Two lags in a real ACF plot are significant: one at the known seasonal period, one at an arbitrary lag with no calendar meaning. How should your confidence in each differ, even though both technically crossed the same band?
Hint
Much higher confidence in the seasonal-period lag. Both bars crossed the same statistical threshold, but the seasonal lag has independent, out-of-band evidence supporting it — a known 12-month cycle, corroborated by Module 3’s decomposition and stationarity work — while the arbitrary lag has nothing beyond “it happened to cross the line,” which the 42% simulation result shows happens often even in pure noise. Statistical significance from the band alone is necessary but not sufficient; a substantive reason for a spike to exist is what turns “crossed the line” into “probably real.”
Summary
The standard ACF/PACF significance band is ±1.96/√n, the 95% confidence interval under the null hypothesis that the series is pure white noise — for Cyclepath’s 84-point stationary series, that’s ±0.2139. Individually, this band is well-calibrated: a real simulation of 2,000 pure-white-noise series confirmed a roughly 3.7% per-lag false-positive rate, close to the nominal 5%. But testing many lags at once compounds that risk — the same simulation found that 42% of pure-noise series showed at least one “significant” bar somewhere across 15 tested lags, purely by chance. The practical discipline: treat an isolated spike with real suspicion unless it has independent backing — a known seasonal period, a structural reason, or corroboration from other analysis — rather than building a model term around any bar that happens to cross the line.
Key Concepts
- Significance band formula: ±1.96/√n, the 95% CI for ACF/PACF under the white-noise null.
- Per-lag vs. series-level false-positive rate: each lag has roughly a 5% error rate alone, but testing many lags compounds that risk substantially.
- 42% false-alarm rate (simulated): about two in five pure-noise series show at least one spurious “significant” lag across 15 tested.
- Corroboration matters: a spike with an independent explanation (like a known seasonal period) is far more trustworthy than an isolated one.
Why This Matters
Reading an ACF/PACF plot without this caution leads to a specific, common mistake: treating every bar that crosses the band as real structure to model, when a meaningful fraction of the time some of those bars are just noise. The discipline this lesson builds — trust corroborated spikes, treat isolated ones skeptically — is exactly what keeps Lesson 4’s reading of Cyclepath’s real plot honest, where lag 7 and lag 12 both cross the band but only one of them has a story behind it.
Next Steps
Continue to Lesson 4 - Reading Cyclepath's ACF and PACF
Apply the AR/MA signatures and the significance discipline to Cyclepath's real, stationary series.
Continue Building Your Skills
You now know the significance band’s formula and, more importantly, its limits: a simulation showed that 42% of pure-noise series trip at least one false alarm across 15 lags. Next you’ll put every tool from this module together on Cyclepath’s actual stationary series: the AR/MA signatures from Lesson 2, and the skepticism about isolated spikes from this lesson.