Lesson 2 - The Augmented Dickey-Fuller Test
Welcome to the Augmented Dickey-Fuller Test
Lesson 1 compared Cyclepath’s first-half and second-half statistics by hand and saw a clear mean shift — useful, but not a real answer to “is this series stationary?” It doesn’t say how confident to be, and it wouldn’t generalize to a subtler case where the difference is small but still real. The Augmented Dickey-Fuller (ADF) test replaces that eyeball check with an actual statistical test: a single function call that returns a number and a p-value you can act on.
By the end of this lesson, you will be able to:
- State the ADF test’s null and alternative hypotheses in terms of a “unit root”
- Run adfuller from statsmodels and read its output
- Interpret the p-value against the conventional 0.05 threshold
- Confirm formally that raw Cyclepath is non-stationary
Let’s run it.
What the Test Actually Asks
The ADF test’s hypotheses are usually stated in terms of a unit root, a technical way of describing a series that wanders — each value is close to the last one plus random noise, with no force pulling it back toward a fixed level (the mathematical signature of a trending or drifting series):
- Null hypothesis (H₀): the series has a unit root — it is non-stationary.
- Alternative hypothesis (H₁): the series does not have a unit root — it is stationary.
That framing puts the burden of proof on stationarity: non-stationarity is the default assumption, and you only conclude “stationary” if the data gives you a good enough reason to reject the null. Practically, that comes down to one number: the p-value. A small p-value (conventionally below 0.05) lets you reject H₀ and conclude the series is stationary. A large p-value means you don’t have enough evidence to reject H₀, so you keep treating the series as if it has a unit root, i.e. as non-stationary.
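To make “unit root” concrete, here is a minimal sketch that simulates the two behaviors the hypotheses describe. The seed, the length of 500, and the coefficient 0.5 are arbitrary illustrative choices, not values from this lesson.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n = 500                          # arbitrary illustrative length
shocks = rng.normal(0, 1, n)

# Unit root: each value is the previous value plus a random shock,
# so the series is just the running sum of its shocks and never "forgets" one.
random_walk = np.cumsum(shocks)

# No unit root: each value keeps only half of the previous one (coefficient 0.5),
# so every shock fades and the series is pulled back toward zero.
mean_reverting = np.zeros(n)
for t in range(1, n):
    mean_reverting[t] = 0.5 * mean_reverting[t - 1] + shocks[t]

# The wandering series spreads out far more than the mean-reverting one.
print("random walk std:   ", random_walk.std())
print("mean-reverting std:", mean_reverting.std())
```

Both series are driven by the same shocks; the only difference is whether the coefficient on the previous value is exactly 1 (H₀) or smaller than 1 (H₁).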
Running It on Cyclepath
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()
stat, pval, used_lag, nobs, crit_values, icbest = adfuller(y, autolag="AIC")
print(f"{stat:.3f}") # -0.920
print(round(pval, 4)) # 0.7815
print(used_lag) # 11
print(nobs) # 84
print({k: round(v, 3) for k, v in crit_values.items()})
# {'1%': -3.511, '5%': -2.897, '10%': -2.585}
adfuller returns six things, but three matter most: the ADF statistic (-0.920), the p-value (0.7815), and the critical values at the standard significance levels (1%, 5%, 10%). The p-value here is nowhere close to 0.05 (it’s over 0.78), so there is essentially no evidence to reject the null hypothesis. Cyclepath, tested formally, is non-stationary. That confirms exactly what Lesson 1’s informal split-half comparison suggested, but now with a rigorous test behind it instead of a by-hand comparison.
Two equivalent ways to read the result
You can read adfuller’s output two ways, and they always agree: compare the p-value to 0.05 (0.7815 > 0.05 → fail to reject → non-stationary), or compare the ADF statistic to a critical value (-0.920 is not more negative than -2.897, the 5% critical value → fail to reject → non-stationary). The ADF statistic needs to be more negative than the critical value to reject the null — think of it as needing enough evidence of “pulling back toward a fixed level” to overcome the default assumption of drift. Both readings point the same direction here, and in practice most people just check the p-value.
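The two readings can be written out as a small helper. This is a sketch only: read_adf is a hypothetical name, not part of statsmodels, and the numbers passed in are the Cyclepath results reported above.

```python
def read_adf(stat, pval, crit_values, alpha=0.05):
    """Return (reject_by_pvalue, reject_by_statistic) for one ADF result.

    Hypothetical helper for illustration; crit_values is the dict that
    adfuller returns, keyed by "1%", "5%", and "10%".
    """
    level = f"{round(alpha * 100)}%"              # 0.05 -> "5%"
    reject_by_pvalue = pval < alpha               # reading 1: p-value below threshold
    reject_by_stat = stat < crit_values[level]    # reading 2: statistic more negative
    return reject_by_pvalue, reject_by_stat

# Cyclepath values from this lesson
crit = {"1%": -3.511, "5%": -2.897, "10%": -2.585}
by_p, by_stat = read_adf(-0.920, 0.7815, crit)
print(by_p, by_stat)  # False False -> fail to reject, treat as non-stationary
```

Rejecting H₀ corresponds to True; both readings return False for Cyclepath, which is the "fail to reject" outcome described above.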
What autolag="AIC" and used_lag Mean
The ADF test needs to account for autocorrelation in the series before testing for a unit root, and it does that by including a number of lagged difference terms in its internal regression. autolag="AIC" tells adfuller to pick that number of lags automatically by minimizing the Akaike Information Criterion, a standard model-selection score that balances fit against complexity, so you don’t have to guess a lag count by hand. On Cyclepath, it settled on 11 lags, leaving 84 usable observations (nobs) out of the original 96: one observation is lost to the first difference the regression is built on, and 11 more to the lags. autolag="AIC" is also adfuller’s default and a sensible choice for real data; you’d only override it if you had a specific reason to fix the lag count yourself.
Practice Exercises
Exercise 1: Reading a p-value
An ADF test on a different series returns a p-value of 0.02. What do you conclude, and what would you do next?
Hint
A p-value of 0.02 is below the conventional 0.05 threshold, so you reject the null hypothesis — there’s sufficient evidence the series does not have a unit root, meaning it’s stationary. Unlike Cyclepath, this series wouldn’t need differencing before fitting an ARIMA-family model; you could proceed directly to choosing model orders (Module 4’s ACF/PACF reading) instead of this module’s differencing steps.
Exercise 2: Statistic vs. critical value
A series returns an ADF statistic of -3.8, with critical values of -3.51 (1%), -2.90 (5%), and -2.58 (10%). Is the series stationary at the 5% level? At the 1% level?
Hint
-3.8 is more negative than all three critical values (-3.51, -2.90, -2.58), so the null hypothesis is rejected at every listed significance level, including the strictest, 1%. The series is stationary by this test at both the 5% and the 1% level, and confidently so: the ADF statistic would need to be less negative than -2.58 to fail even the loosest (10%) threshold, which isn’t the case here.
Exercise 3: Why not just eyeball it?
Lesson 1’s split-half check and this lesson’s ADF test agreed on Cyclepath. Given that, why bother with the formal test at all?
Hint
They agreed here because Cyclepath’s non-stationarity is large and obvious — but the whole point of a formal test is to handle the cases where it isn’t obvious: a subtle trend easy to miss by eye, a borderline case where two analysts might disagree looking at the same plot, or a series where you need to prove stationarity (or its absence) to justify a modeling choice, not just assert it. The ADF test also gives you something the eyeball check can’t: a specific, reproducible threshold (p < 0.05) that doesn’t depend on how the plot happens to look, which matters when you’re testing dozens of series or re-running the same pipeline on new data automatically.
Summary
The Augmented Dickey-Fuller test formalizes the stationarity question with a null hypothesis that the series has a unit root (is non-stationary) and an alternative that it doesn’t (is stationary). Running statsmodels.tsa.stattools.adfuller(y, autolag="AIC") on raw Cyclepath returns an ADF statistic of -0.920 and a p-value of 0.7815 — far above the conventional 0.05 threshold, so there’s no evidence to reject non-stationarity. The test automatically selected 11 lags via AIC, using 84 of the 96 available observations. This confirms formally what Lesson 1’s informal split-half check suggested: Cyclepath is not stationary, and needs fixing before an ARIMA-family model can be fit to it.
Key Concepts
- Null hypothesis (unit root) — the ADF test assumes non-stationarity by default; you need evidence to reject that assumption.
- p-value threshold — conventionally, p < 0.05 rejects the null and concludes stationarity.
- ADF statistic vs. critical values — an equivalent way to read the same result; more negative than the critical value means reject the null.
- autolag="AIC" — automatically chooses how many lagged terms the test accounts for, using the standard default rather than a manual guess.
Why This Matters
Every differencing decision in the rest of this module is judged against this same test — you’ll difference, re-run adfuller, and check whether the p-value crosses 0.05. Without a formal, repeatable test, “did that fix it?” would be a matter of opinion; with the ADF test, it’s a number you can check every time, on any series, without re-litigating what “looks stationary enough” means. Next, you’ll use the tool this module is really about: differencing, starting with the most common fix — removing trend.
Next Steps
Continue to Lesson 3 - Differencing to Remove Trend
Use pandas' diff() to remove trend, then confirm the fix by re-running the ADF test.
Continue Building Your Skills
You now have a formal, repeatable test for stationarity — and confirmed evidence that Cyclepath fails it. Next you’ll fix that with the most common tool in the kit: differencing. You’ll remove the trend, re-run the exact same ADF test, and watch the p-value cross the 0.05 line.