Lesson 1 - What Is Stationarity?

Welcome to What Is Stationarity?

Module 2 ended by decomposing Cyclepath into a rising trend and a repeating seasonal wave. That was useful for understanding the series, but it also quietly established something important: a series whose average keeps climbing and whose typical value swings with the calendar cannot be stationary. The word gets used constantly in forecasting, and for good reason: the ARIMA family of models this course builds toward, starting in Module 5, assumes that the series being modeled is stationary. Before fixing Cyclepath’s non-stationarity (this module’s job), you need a precise definition of what “stationary” actually requires.

By the end of this lesson, you will be able to:

  • State the three properties a stationary series must have: constant mean, constant variance, and time-independent autocovariance
  • Explain why ARIMA-family models need stationarity to work correctly
  • Check informally whether a series looks stationary by comparing statistics across different windows
  • Predict, from Module 2’s decomposition alone, that Cyclepath is not stationary

Let’s define it precisely.


Three Requirements

A time series is (weakly) stationary if three things hold, none of which depend on where in time you look:

  • Constant mean — the average level doesn’t drift. A series with a trend fails this immediately: its mean over the first few years is different from its mean over the last few, by construction.
  • Constant variance — how much the series spreads around its mean doesn’t change over time. A series whose swings get wilder (or calmer) as time passes fails this, even if its mean is flat.
  • Time-independent autocovariance — how strongly y_t relates to y_{t-k} depends only on the lag k, never on the specific time t. This month’s relationship to last month should look the same whether “this month” is January 2016 or January 2023.

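A series with a trend or a seasonal pattern violates at least one of these, and in both cases the requirement that fails is the constant mean. Trend makes the average level drift from year to year. Seasonality makes the expected level depend on the calendar month: a typical June sits well above a typical December, every year, so the mean is a function of where in the year you look rather than a single constant. Seasonality is easy to mistake for an autocovariance violation, because June relates to July differently than it relates to December. But that is a difference between lags (1 month versus 6 months), which stationarity allows; it is the month-dependent mean that breaks the definition. Cyclepath, with both, breaks stationarity in two separate ways at once.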
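
Written compactly, the three requirements say that the mean, the variance, and the autocovariance of the series do not depend on t. This is a sketch in standard notation; the symbols \mu, \sigma^2, and \gamma_k are labels introduced here for the constant mean, the constant variance, and the autocovariance at lag k, and are not used elsewhere in the lesson.

```latex
\begin{aligned}
\mathbb{E}[y_t] &= \mu && \text{for all } t \\
\operatorname{Var}(y_t) &= \sigma^2 && \text{for all } t \\
\operatorname{Cov}(y_t,\, y_{t-k}) &= \gamma_k && \text{for all } t \text{ and each lag } k
\end{aligned}
```

Nothing on the right-hand side carries a t subscript. The autocovariance may change with the lag k, but never with the date.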

Why ARIMA needs this

ARIMA-family models describe a series as a function of its own recent past: this month depends on the last few months in a fixed, learnable way. That only makes sense if the relationship between “now” and “the recent past” is the same no matter which “now” you pick. Fit such a model to a series with a trend, and it has no way to represent “the average keeps climbing forever.” Its forecasts are pulled back toward a single long-run average, so it will systematically underpredict a rising series and overpredict a falling one. Stationarity isn’t a formality; it’s the specific assumption that makes the model’s math valid.
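
To make “the same relationship no matter which now you pick” concrete, here is a minimal sketch, assuming a simulated first-order autoregressive series rather than real data. The helper names simulate_ar1 and lag1_slope, the coefficient 0.7, and the series length are all illustrative choices, not part of the course materials. It estimates how strongly each value depends on the previous one, separately in each half of the series.

```python
import numpy as np

def simulate_ar1(phi, n, seed=0):
    """Simulate y_t = phi * y_{t-1} + noise, a stationary series when |phi| < 1."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, 1.0, n)
    y = np.zeros(n)
    for i in range(1, n):
        y[i] = phi * y[i - 1] + eps[i]
    return y

def lag1_slope(y):
    """Least-squares slope of y_t on y_{t-1}: the learned 'dependence on last step'."""
    prev, curr = y[:-1], y[1:]
    prev_centered = prev - prev.mean()
    return float(np.dot(prev_centered, curr - curr.mean()) / np.dot(prev_centered, prev_centered))

series = simulate_ar1(phi=0.7, n=4000)
slope_first = lag1_slope(series[:2000])   # relationship learned from the first half
slope_second = lag1_slope(series[2000:])  # relationship learned from the second half
print(round(slope_first, 2), round(slope_second, 2))
```

Because the simulated series is stationary, both halves should recover roughly the same slope, close to the 0.7 used to generate it. That stability is what lets a model learn one relationship and reuse it for forecasting.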


An Informal Check: Split and Compare

Before reaching for a formal test (Lesson 2), you can often see non-stationarity just by comparing statistics across different slices of the same series:

import numpy as np
import pandas as pd

def cyclepath():
    """Rebuild the Cyclepath series: 96 monthly trip counts starting January 2016."""
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t                                # steady linear climb
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)   # 12-month seasonal wave
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()

# Split into two four-year halves and compare mean and standard deviation.
first_half, second_half = y.iloc[:48], y.iloc[48:]
print(round(first_half.mean(), 1), round(first_half.std(), 1))    # 11142.9 2679.3
print(round(second_half.mean(), 1), round(second_half.std(), 1))  # 15391.1 2718.1

The first four years (2016–2019) average 11,142.9 trips a month; the last four (2020–2023) average 15,391.1 — a 38% jump in the mean between two halves of the same series. That alone violates the constant-mean requirement and is exactly what you’d expect from Module 2’s trend, which climbed from about 8,946 to 17,413 across the same period. Notice, though, that the standard deviation barely moves (2,679.3 versus 2,718.1) — consistent with Module 2’s finding that Cyclepath is additive, with a seasonal swing that stays a roughly constant absolute size rather than growing with the level. Non-stationarity here is specifically a mean problem (the trend), not a variance problem — a distinction that will matter for choosing the right fix later in this module.

[Figure: two side-by-side line charts sharing a y-axis. Left panel, 'stationary': a flat, evenly wiggling series whose first-half and second-half mean bands sit at the same height. Right panel, 'Cyclepath: not stationary': a rising, seasonal series whose first-half mean band sits noticeably below its second-half mean band.]
A stationary series (left) has the same statistical character in any window: same center, same spread. Cyclepath (right) does not. Its first-half mean band sits well below its second-half mean band, a direct, visible symptom of the trend Module 2 measured.
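
The same check extends naturally from two halves to any number of equal windows. The sketch below is one possible way to do that, assuming a hypothetical helper named window_stats (not part of the course code) and a stand-in series rebuilt with the same recipe as cyclepath() so the block runs on its own.

```python
import numpy as np
import pandas as pd

def window_stats(series, window):
    """Mean and standard deviation of consecutive, non-overlapping windows."""
    labels = np.arange(len(series)) // window  # 0,0,...,1,1,...: one label per window
    return series.groupby(labels).agg(["mean", "std"])

# Stand-in for Cyclepath, rebuilt with the same recipe as cyclepath() in this lesson.
t = np.arange(96)
rng = np.random.default_rng(42)
values = 9000 + 90 * t + 3200 * np.sin(2 * np.pi * (t - 3) / 12) + rng.normal(0, 350, 96)
y = pd.Series(
    np.round(values).astype(int),
    index=pd.date_range("2016-01-01", periods=96, freq="MS"),
    name="trips",
)

stats = window_stats(y, window=24)  # four two-year windows
print(stats.round(1))
```

A window of 24 months is a deliberate choice here: each window covers a whole number of seasonal cycles, so the seasonal wave averages out and any difference between window means reflects the trend rather than which months happened to land in the window.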

What “Fixing” This Will Mean

Splitting in half and eyeballing the numbers is a useful gut check, but it’s not a real test: it doesn’t tell you how sure to be, and it would miss subtler cases. That’s what Lesson 2’s Augmented Dickey-Fuller test provides: a formal statistical test with a p-value, giving you a quantified measure of evidence instead of “the numbers look different.” Once you can detect non-stationarity reliably, the rest of the module is about fixing it. Lesson 3 removes trend with differencing, Lesson 4 removes seasonality with seasonal differencing and discusses when a transformation like log helps, and Lesson 5’s capstone runs the whole decision process on Cyclepath to find the specific fix that works without over-correcting.


Practice Exercises

Exercise 1: Which requirement does a trend break?

A series climbs steadily with no seasonality and constant-sized random noise around the climb. Which of the three stationarity requirements does it violate, and which does it satisfy?

Hint

It violates constant mean — the average level keeps rising, so any two non-overlapping windows will have different means, exactly like Cyclepath’s first-half/second-half comparison. It satisfies constant variance (the noise stays the same size throughout) and, with no seasonality and no other structure in the noise, it likely satisfies time-independent autocovariance too. A pure trend with constant-variance noise is a clean example of a series that fails stationarity for exactly one of the three reasons, not all three.

Exercise 2: A variance-only violation

A series has no trend (flat mean throughout) and no seasonality, but its swings get progressively wilder over time — small wiggles early on, large wiggles later. Is this series stationary?

Hint

No. It violates constant variance, even though the mean is flat. This is a common pattern in financial and volume data (a stock whose price is flat but whose day-to-day swings grow during turbulent periods), and it’s a different failure mode from Cyclepath’s, which is fundamentally a mean problem. Recognizing which requirement fails matters because the fix differs: differencing (this module’s tool) targets mean non-stationarity, while a variance problem calls for a different tool. When the spread grows along with the level, that tool is typically a transformation like a log or Box-Cox, which Lesson 4 discusses.
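
A variance-only violation is easy to simulate and then catch with the same split-and-compare check. This is a minimal sketch on made-up data: the series length, the seed, and the noise scale growing linearly from 1 to 5 are all arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
scale = np.linspace(1.0, 5.0, n)          # noise scale grows steadily over time
series = rng.normal(0.0, 1.0, n) * scale  # flat mean of zero, widening swings

first_half, second_half = series[: n // 2], series[n // 2 :]
mean_first, mean_second = first_half.mean(), second_half.mean()
std_first, std_second = first_half.std(ddof=1), second_half.std(ddof=1)

print(round(mean_first, 2), round(mean_second, 2))  # both should sit near zero
print(round(std_first, 2), round(std_second, 2))    # second should be clearly larger
```

The pattern is the mirror image of Cyclepath’s: here the two means agree and the two standard deviations do not.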

Exercise 3: Predict before testing

Without running any code, would you expect Cyclepath’s autocovariance between June and July to look the same as its autocovariance between June and December? What does your answer imply about time-independent autocovariance?

Hint

No. June and July are adjacent months in the same season (both near the summer peak), so they should be strongly positively related; June and December are opposite points in the seasonal cycle (peak versus trough), so they should be much less alike, or even inversely related. Critically, this same June-July-versus-June-December pattern repeats every year, which means the relationship depends on the lag between two points (1 month vs. 6 months) and not on which specific year you check. That’s consistent with time-independent autocovariance in the technical sense: the pattern by lag doesn’t change from year to year. The requirement seasonality violates is the constant mean, since the expected level differs by month. That seasonal content, still present even after removing trend, is what Lesson 4’s seasonal differencing specifically targets.
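
If you want to check the prediction afterwards, the sketch below measures the sample autocorrelation at lag 1 and lag 6 in each half of a seasonal series. It assumes a trend-free stand-in (the same seasonal wave and noise recipe as this lesson’s series, minus the climb), and the variable name seasonal_only is introduced here for illustration.

```python
import numpy as np
import pandas as pd

t = np.arange(96)
rng = np.random.default_rng(42)
# Seasonal wave plus noise, with no trend.
seasonal_only = pd.Series(3200 * np.sin(2 * np.pi * (t - 3) / 12) + rng.normal(0, 350, 96))

halves = [("first half", seasonal_only.iloc[:48]), ("second half", seasonal_only.iloc[48:])]
for name, part in halves:
    lag1 = part.autocorr(lag=1)  # neighbors such as June and July
    lag6 = part.autocorr(lag=6)  # opposite points such as June and December
    print(name, round(lag1, 2), round(lag6, 2))
```

Both halves should show the same shape: a clearly positive value at lag 1 and a clearly negative one at lag 6. The pattern differs by lag, but not by which years you look at.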


Summary

Stationarity requires three things to hold regardless of when you look: constant mean, constant variance, and time-independent autocovariance (the relationship between points a fixed distance apart doesn’t change over time). Trend breaks the mean requirement, and so does seasonality, because the expected level depends on the calendar month. ARIMA-family models assume stationarity because they describe a series as a fixed relationship to its own recent past, a relationship that has to mean the same thing at every point in time to be learnable at all. An informal split-and-compare check already shows Cyclepath’s mean jumping 38% between its first and second half (11,142.9 → 15,391.1) while its standard deviation barely moves (2,679.3 → 2,718.1). That is a mean problem consistent with the additive trend Module 2 measured, not a variance problem.

Key Concepts

  • Constant mean — the average level doesn’t drift over time; trend is the most common violation.
  • Constant variance — the spread around the mean doesn’t change; a distinct failure mode from a trend, usually fixed differently.
  • Time-independent autocovariance — the relationship between points a fixed lag apart doesn’t depend on which point in time you start from.
  • Why ARIMA needs it — these models learn one fixed relationship to the recent past; a moving target (trend, changing variance) breaks that assumption.

Why This Matters

Every technique in this module exists to turn a series that fails these three requirements into one that satisfies them, because that’s the raw material ARIMA and SARIMA can actually use. Recognizing how a series fails (a trend like Cyclepath’s, growing volatility, or a seasonal pattern that ties the level to the calendar) tells you which fix to reach for before you’ve wasted time on the wrong one. Next, you’ll replace the informal split-and-compare check with a real statistical test: the Augmented Dickey-Fuller test, which gives you a quantified answer instead of a visual impression.


Next Steps

Continue to Lesson 2 - The Augmented Dickey-Fuller Test

Replace the informal check with a formal statistical test for stationarity, and run it on Cyclepath.


Continue Building Your Skills

You now have a precise definition of stationarity and have seen Cyclepath fail it informally — a 38% mean shift between halves of the series. Next you’ll formalize that impression with the Augmented Dickey-Fuller test, a real statistical test that turns “the numbers look different” into a rigorous, quantified answer.