Lesson 2 - The Augmented Dickey-Fuller Test

Welcome to the Augmented Dickey-Fuller Test

Lesson 1 compared Cyclepath’s first-half and second-half statistics by hand and saw a clear mean shift — useful, but not a real answer to “is this series stationary?” It doesn’t say how confident to be, and it wouldn’t generalize to a subtler case where the difference is small but still real. The Augmented Dickey-Fuller (ADF) test replaces that eyeball check with an actual statistical test: a single function call that returns a number and a p-value you can act on.

By the end of this lesson, you will be able to:

State the ADF test’s null and alternative hypotheses in terms of a “unit root”
Run adfuller from statsmodels and read its output
Interpret the p-value against the conventional 0.05 threshold
Confirm formally that raw Cyclepath is non-stationary

Let’s run it.

What the Test Actually Asks

The ADF test’s hypotheses are usually stated in terms of a unit root, a technical way of describing a series that wanders — each value is close to the last one plus random noise, with no force pulling it back toward a fixed level (the mathematical signature of a trending or drifting series):

Null hypothesis (H₀): the series has a unit root — it is non-stationary.
Alternative hypothesis (H₁): the series does not have a unit root — it is stationary.

That framing puts the burden of proof on stationarity: the test assumes a unit root by default, and you conclude “stationary” only if the data gives you strong enough evidence to reject the null. Practically, that comes down to one number: the p-value. A small p-value (conventionally below 0.05) lets you reject H₀ and conclude the series is stationary. A large p-value means you don’t have enough evidence to reject H₀, so you treat the series as having a unit root, i.e. as non-stationary.
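To make “no force pulling it back” concrete, here is a minimal sketch of the plain (non-augmented) Dickey-Fuller regression, which regresses each change in the series on a constant and the previous level. It is an illustrative simplification, not what adfuller does internally: adfuller also adds lagged difference terms and judges the result against special critical values. The helper name df_gamma and both simulated series are assumptions made up for this example.

```python
import numpy as np

def df_gamma(y):
    """Least-squares estimate of gamma in: y[t] - y[t-1] = alpha + gamma * y[t-1] + error."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # change from one step to the next
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    return coef[1]                                    # coefficient on the lagged level

rng = np.random.default_rng(0)
shocks = rng.normal(size=2000)

# Random walk: each value is the previous value plus noise (a unit root).
random_walk = np.cumsum(shocks)

# Mean-reverting AR(1): each value is half the previous value plus noise.
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]

gamma_walk = df_gamma(random_walk)
gamma_ar1 = df_gamma(ar1)
print(f"random walk gamma: {gamma_walk:.3f}")   # close to 0: no pull-back
print(f"AR(1) gamma:       {gamma_ar1:.3f}")    # close to -0.5: strong pull-back
```

A gamma near zero is what a unit root looks like in this regression; a clearly negative gamma means high values tend to be followed by falls and low values by rises, which is the pull-back the alternative hypothesis describes.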

Running It on Cyclepath

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def cyclepath():
    # Synthetic monthly trip counts: 96 months starting January 2016
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t                                # steady upward trend
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)   # 12-month cycle
    noise = rng.normal(0, 350, 96)                       # random month-to-month variation
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()

stat, pval, used_lag, nobs, crit_values, icbest = adfuller(y, autolag="AIC")
print(round(stat, 3))            # -0.920
print(round(pval, 4))            # 0.7815
print(used_lag)                  # 11
print(nobs)                      # 84
print({k: round(v, 3) for k, v in crit_values.items()})
# {'1%': -3.511, '5%': -2.897, '10%': -2.585}

adfuller returns six things, but three matter most: the ADF statistic (-0.920), the p-value (0.7815), and the critical values at the standard 1%, 5%, and 10% significance levels. The p-value here is nowhere close to 0.05 (it’s above 0.78), so there is essentially no evidence against the null hypothesis. Cyclepath, tested formally, is treated as non-stationary. That agrees with what Lesson 1’s informal split-half comparison suggested, but now with a formal test behind it instead of a visual impression.

Two equivalent ways to read the result

You can read adfuller’s output two ways, and because both are derived from the same statistic they almost always agree: compare the p-value to 0.05 (0.7815 > 0.05 → fail to reject → non-stationary), or compare the ADF statistic to a critical value (-0.920 is not more negative than -2.897, the 5% critical value → fail to reject → non-stationary). The ADF statistic needs to be more negative than the critical value to reject the null. Think of it as needing enough evidence of “pulling back toward a fixed level” to overcome the default assumption of drift. Both readings point the same direction here, and in practice most people just check the p-value.

The ADF decision rule: a p-value under 0.05 lets you reject the null hypothesis of a unit root and call the series stationary. Cyclepath's p-value of 0.7815 lands squarely on the "fail to reject" side — non-stationary, confirmed formally.
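The two readings can be written down as a small helper. This is a sketch only: read_adf is a hypothetical name, not part of statsmodels, and the numbers passed in are the Cyclepath values reported above rather than a fresh run of the test.

```python
def read_adf(stat, pval, crit_values, alpha=0.05, level="5%"):
    """Apply both decision rules to an ADF result and report each one."""
    reject_by_pvalue = pval < alpha                   # rule 1: p-value below threshold
    reject_by_statistic = stat < crit_values[level]   # rule 2: statistic more negative than critical value
    verdict = "stationary" if reject_by_pvalue else "non-stationary (fail to reject)"
    return {
        "reject_by_pvalue": reject_by_pvalue,
        "reject_by_statistic": reject_by_statistic,
        "verdict": verdict,
    }

# Cyclepath numbers from the run above.
cyclepath_crit = {"1%": -3.511, "5%": -2.897, "10%": -2.585}
result = read_adf(-0.920, 0.7815, cyclepath_crit)
print(result)
```

Neither rule rejects the null for Cyclepath, so the helper reports a fail-to-reject verdict. Passing level="1%" or level="10%" checks the statistic against a stricter or looser critical value.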

What `autolag="AIC"` and `used_lag` Mean

The ADF test needs to account for autocorrelation in the series before testing for a unit root, and it does that by including a number of lagged difference terms in its internal regression. autolag="AIC" tells adfuller to pick that number of lags automatically by minimizing the Akaike Information Criterion, a standard model-selection score that balances fit against complexity, so you don’t have to guess a lag count by hand. On Cyclepath, it settled on 11 lags, leaving 84 usable observations (nobs) out of the original 96: one observation is lost to differencing and 11 more to the lags (96 - 1 - 11 = 84). autolag="AIC" is also adfuller’s default setting; you’d only override it if you had a specific reason to fix the lag count yourself.

Practice Exercises

Exercise 1: Reading a p-value

An ADF test on a different series returns a p-value of 0.02. What do you conclude, and what would you do next?

Hint

A p-value of 0.02 is below the conventional 0.05 threshold, so you reject the null hypothesis — there’s sufficient evidence the series does not have a unit root, meaning it’s stationary. Unlike Cyclepath, this series wouldn’t need differencing before fitting an ARIMA-family model; you could proceed directly to choosing model orders (Module 4’s ACF/PACF reading) instead of this module’s differencing steps.

Exercise 2: Statistic vs. critical value

A series returns an ADF statistic of -3.8, with critical values of -3.51 (1%), -2.90 (5%), and -2.58 (10%). Is the series stationary at the 5% level? At the 1% level?

Hint

-3.8 is more negative than all three critical values (-3.51, -2.90, -2.58), so the null hypothesis is rejected at every listed significance level, including the strictest, 1%. The series is stationary by this test at both the 5% and the 1% level. The ADF statistic would have to be less negative than -2.58 to fail even the loosest (10%) threshold, which isn’t the case here.

Exercise 3: Why not just eyeball it?

Lesson 1’s split-half check and this lesson’s ADF test agreed on Cyclepath. Given that, why bother with the formal test at all?

Hint

They agreed here because Cyclepath’s non-stationarity is large and obvious — but the whole point of a formal test is to handle the cases where it isn’t obvious: a subtle trend easy to miss by eye, a borderline case where two analysts might disagree looking at the same plot, or a series where you need to prove stationarity (or its absence) to justify a modeling choice, not just assert it. The ADF test also gives you something the eyeball check can’t: a specific, reproducible threshold (p < 0.05) that doesn’t depend on how the plot happens to look, which matters when you’re testing dozens of series or re-running the same pipeline on new data automatically.

Summary

The Augmented Dickey-Fuller test formalizes the stationarity question with a null hypothesis that the series has a unit root (is non-stationary) and an alternative that it doesn’t (is stationary). Running statsmodels.tsa.stattools.adfuller(y, autolag="AIC") on raw Cyclepath returns an ADF statistic of -0.920 and a p-value of 0.7815 — far above the conventional 0.05 threshold, so there’s no evidence to reject non-stationarity. The test automatically selected 11 lags via AIC, using 84 of the 96 available observations. This confirms formally what Lesson 1’s informal split-half check suggested: Cyclepath is not stationary, and needs fixing before an ARIMA-family model can be fit to it.

Key Concepts

Null hypothesis (unit root): the ADF test assumes non-stationarity by default; you need evidence to reject that assumption.
p-value threshold: conventionally, p < 0.05 rejects the null and concludes stationarity.
ADF statistic vs. critical values: an equivalent way to read the same result; more negative than the critical value means reject the null.
autolag="AIC": automatically chooses how many lagged terms the test accounts for, using the standard default rather than a manual guess.

Why This Matters

Every differencing decision in the rest of this module is judged against this same test — you’ll difference, re-run adfuller, and check whether the p-value crosses 0.05. Without a formal, repeatable test, “did that fix it?” would be a matter of opinion; with the ADF test, it’s a number you can check every time, on any series, without re-litigating what “looks stationary enough” means. Next, you’ll use the tool this module is really about: differencing, starting with the most common fix — removing trend.

Next Steps

Continue to Lesson 3 - Differencing to Remove Trend

Use pandas' diff() to remove trend, then confirm the fix by re-running the ADF test.

Back to Module Overview

Return to the Stationarity and Differencing module overview

Continue Building Your Skills

You now have a formal, repeatable test for stationarity — and confirmed evidence that Cyclepath fails it. Next you’ll fix that with the most common tool in the kit: differencing. You’ll remove the trend, re-run the exact same ADF test, and watch the p-value cross the 0.05 line.
