Lesson 4 - Seasonal Differencing and Transformations

Welcome to Seasonal Differencing and Transformations

Lesson 3 left off with a real problem: regular differencing passed the ADF test but left a lag-12 autocorrelation of 0.801 — almost the entire seasonal pattern, untouched. Seasonal differencing is the direct fix: instead of comparing each point to the one right before it, compare it to the one a full cycle earlier. You’ll also see a genuine trap here — combining regular and seasonal differencing doesn’t always help, and on Cyclepath it actively makes things worse by one measure. The lesson closes with transformations (log, Box-Cox), and why Cyclepath, confirmed additive back in Module 2, doesn’t need one.

By the end of this lesson, you will be able to:

  • Compute a seasonal difference and explain what relationship it targets
  • Compare seasonal differencing alone against combining it with regular differencing, using variance as a tiebreaker
  • Recognize overdifferencing when it happens
  • Explain when a log or Box-Cox transformation is the right tool, and why Cyclepath isn’t a case for one

Let’s target that lag-12 structure directly.


Seasonal Differencing Alone

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, acf

def cyclepath():
    # Eight years of monthly trips: linear trend + fixed-size seasonal swing + noise
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()

D1 = y.diff(12).dropna()
print(len(D1))                  # 84
print(round(D1.mean(), 2))      # 1073.26

stat, pval, *_ = adfuller(D1, autolag="AIC")
print(round(stat, 3), round(pval, 4))   # -4.689 0.0001

y.diff(12) computes y_t - y_{t-12} (this June minus last June, this January minus last January), which directly targets the calendar relationship regular differencing ignored. On Cyclepath it passes the ADF test on its own (p = 0.0001, comfortably under 0.05). Its mean, 1,073.26, isn’t zero, and that’s expected rather than a problem: over 12 months the trend climbs about 12 × 90 = 1,080, so subtracting the point from exactly a year earlier still carries a full year’s worth of trend growth. Stationarity requires a constant mean, not a zero one, and the seasonally differenced series stays centered near 1,073 from start to finish. It just isn’t centered at 0.
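To see why the mean lands near 1,080, it can help to strip the series down to its trend. The sketch below is illustrative only: `trend_only` is a hypothetical noise-free series that reuses the lesson's level (9000) and slope (90) but has no seasonal term and no noise.

```python
import numpy as np
import pandas as pd

# Hypothetical noise-free series: only the lesson's linear trend (9000 + 90*t)
t = np.arange(96)
trend_only = pd.Series(9000 + 90 * t)

# Seasonal difference: each value minus the value 12 steps earlier
seasonal_diff = trend_only.diff(12).dropna()

print(len(seasonal_diff))        # 84 values remain (the first 12 have no partner)
print(seasonal_diff.unique())    # [1080.] -> every entry equals 12 * 90
```

Every entry is exactly 1,080, so on the real series the seasonal difference is that constant plus differenced noise, which is why its sample mean sits close to 1,080 without drifting.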

Check what happened to the seasonal autocorrelation that regular differencing left behind:

a = acf(D1, nlags=13, fft=True)
print(round(a[12], 3))   # -0.417

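That is down from 0.801 after Lesson 3’s regular differencing to -0.417. The value is still not negligible, but its magnitude is roughly half what it was, and a moderate negative correlation at lag 12 is a typical by-product of seasonal differencing rather than a sign that the seasonal pattern survived.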
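One way to put the -0.417 in context is to ask what value differenced noise alone would produce. The derivation below is a sketch under an assumption the lesson does not state explicitly: after seasonal differencing, only a constant plus differenced noise remains, and the noise terms \varepsilon_t are uncorrelated with common variance \sigma^2.

```latex
\begin{aligned}
x_t &= \varepsilon_t - \varepsilon_{t-12} \\
\operatorname{Var}(x_t) &= \sigma^2 + \sigma^2 = 2\sigma^2 \\
\operatorname{Cov}(x_t, x_{t-12}) &= \operatorname{Cov}(\varepsilon_t - \varepsilon_{t-12},\ \varepsilon_{t-12} - \varepsilon_{t-24}) = -\sigma^2 \\
\rho_{12} &= \frac{-\sigma^2}{2\sigma^2} = -\tfrac{1}{2}
\end{aligned}
```

Under that assumption the theoretical lag-12 autocorrelation is -0.5, so a sample value of -0.417 from 84 points is about what differenced noise by itself would leave behind.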


Combining Both: Does It Help?

If regular differencing handles trend and seasonal differencing handles the yearly cycle, combining them seems like the obvious next move:

d1D1 = y.diff().diff(12).dropna()
print(round(d1D1.mean(), 2))    # -3.48

stat, pval, *_ = adfuller(d1D1, autolag="AIC")
print(round(stat, 3), round(pval, 4))   # -4.542 0.0002

The combined series’ mean drops to -3.48, essentially zero — makes sense, since differencing twice fully cancels a linear trend (Lesson 3’s Exercise 1 previewed exactly this). It still passes ADF comfortably (p = 0.0002). So far this looks like an improvement. But check the variance of all three candidates side by side:

print(round(y.diff().dropna().var(), 1))        # 1567486.3  (regular diff alone)
print(round(D1.var(), 1))                         # 124680.0   (seasonal diff alone)
print(round(d1D1.var(), 1))                       # 257768.8   (both combined)

Seasonal differencing alone has the smallest variance of the three — over 12 times smaller than regular differencing alone, and about half the variance of combining both. Adding the regular difference on top of the seasonal one didn’t help; it made the series noisier. This is overdifferencing: applying more differencing than a series actually needs, which doesn’t just fail to help — it actively introduces extra variance that a model then has to treat as noise, even though it’s an artifact of the transformation, not the underlying data.

[Figure: bar chart comparing the variance of four versions of the Cyclepath series after different differencing choices: raw (about 11.8 million, tallest bar, cut off with a break mark), regular difference d=1 (about 1.57 million), seasonal difference D=1 alone (about 125 thousand, the shortest bar), and both combined d=1 and D=1 (about 258 thousand, taller than seasonal-alone). A checkmark highlights the seasonal-alone bar as the lowest-variance option that still passes the ADF test.]
Figure caption: Variance by differencing choice. Seasonal differencing alone (D=1) achieves the lowest variance of any ADF-passing option: less than half of combining regular and seasonal differencing, and over 12 times less than regular differencing alone.

More differencing isn’t automatically better

It’s easy to assume that if one difference helps, two must help more — but differencing isn’t free. Each difference operation amplifies short-term noise even as it cancels out longer-term structure, and once the structure you’re targeting (trend, or a specific seasonal lag) is gone, any further differencing is pure cost with no benefit. The discipline is: difference only as much as the evidence (ADF test and variance) says you need, and stop as soon as it does. For Cyclepath, that point is seasonal differencing alone — regular differencing on top of it isn’t earning its keep.
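The claim that each extra difference amplifies short-term noise can be checked on noise alone. This is a minimal sketch, not the lesson's data: `noise` is a hypothetical white-noise series that borrows only the lesson's noise scale (350), and the seed and length are arbitrary choices made so the ratios come out stable.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed for this sketch
noise = rng.normal(0, 350, 200_000)       # pure white noise: no trend, no seasonality

# Seasonal difference only: e_t - e_{t-12}
seasonal_only = noise[12:] - noise[:-12]

# Regular difference first, then the seasonal difference on top
regular = np.diff(noise)
both = regular[12:] - regular[:-12]

base = noise.var()
print(round(seasonal_only.var() / base, 2))   # close to 2
print(round(both.var() / base, 2))            # close to 4
```

With nothing left to remove, one seasonal difference doubles the noise variance and stacking a regular difference on top doubles it again. That 2-to-1 ratio is in line with the gap between 257,769 and 124,680 measured on Cyclepath.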


When a Transformation Helps Instead

Differencing fixes a mean that drifts. A different kind of non-stationarity, variance that grows or shrinks over time, calls for a different tool: a transformation, most commonly the log (or, more generally, a Box-Cox transformation, of which the log is a special case). The log transform compresses large values proportionally more than small ones, which is exactly what shrinks a growing seasonal swing back down to a roughly constant size.
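To make the "special case" relationship concrete, here is a hand-written sketch of the standard Box-Cox formula, (x^λ - 1) / λ, with the log taking over at λ = 0. The helper name `box_cox` and the sample values are assumptions for illustration; libraries such as SciPy ship their own Box-Cox routines, and this is not a substitute for them.

```python
import numpy as np

def box_cox(x, lam):
    """Illustrative Box-Cox transform for strictly positive x."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)                  # the log is the lambda = 0 case
    return (x ** lam - 1.0) / lam

values = np.array([100.0, 1000.0, 10000.0])   # hypothetical sample values

print(box_cox(values, 0))       # natural logs of the values
print(box_cox(values, 1e-6))    # a tiny lambda gives nearly the same numbers
print(box_cox(values, 1))       # lambda = 1 only shifts the data: x - 1
```

As λ shrinks toward 0 the output converges on the log, while λ = 1 leaves the shape of the data unchanged, so λ acts as a dial for how aggressively large values are compressed.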

This is precisely the situation Module 2, Lesson 3 identified with its toy multiplicative series — the one where the seasonal swing grew from 173 to 269 units alongside the rising level, because the swing was a fixed percentage of the level rather than a fixed amount. A series like that has genuinely non-constant variance, and logging it before differencing is the standard fix.
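The effect of logging a multiplicative series can be sketched on a small synthetic example. The series below is hypothetical and is not Module 2's toy data: it assumes a level that grows 1% per month and a seasonal factor that swings 20% above and below that level.

```python
import numpy as np

# Hypothetical multiplicative series (not Module 2's data): the level grows
# 1% per month and the seasonal factor swings +/-20% around it.
t = np.arange(72)                                   # six years, monthly
level = 500 * 1.01 ** t
series = level * (1 + 0.2 * np.sin(2 * np.pi * t / 12))

def yearly_swing(x):
    """Max minus min within each consecutive block of 12 months."""
    blocks = np.asarray(x).reshape(-1, 12)
    return blocks.max(axis=1) - blocks.min(axis=1)

raw_swing = yearly_swing(series)
log_swing = yearly_swing(np.log(series))

print(np.round(raw_swing, 1))   # grows every year
print(np.round(log_swing, 3))   # the same size every year
```

Because the log turns "a fixed percentage of the level" into "a fixed amount," the logged swing is constant in this idealized case. With a level that grows linearly rather than by a fixed percentage, the logged swing would be only roughly constant rather than exactly so.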

Cyclepath is different. Module 2 tested it directly and confirmed it’s additive — its seasonal swing stays a roughly constant absolute size (6,537 to 7,840 trips across all eight years) even as the level nearly doubles. There’s no growing-variance problem here for a log to solve. Confirming this with the numbers:

logy = np.log(y)
logd1 = logy.diff().dropna()
stat, pval, *_ = adfuller(logd1, autolag="AIC")
print(round(stat, 3), round(pval, 4))   # -8.502 0.0000
print(round(logd1.var(), 5))            # 0.0105

Log-then-difference passes ADF just as convincingly as plain differencing did (p effectively 0), which makes sense — a log transform doesn’t undo a valid fix, it’s just solving a problem Cyclepath doesn’t have. Its variance (0.0105) isn’t directly comparable to the earlier numbers since it’s in log-units rather than trip counts, but there’s no evidence here that logging bought anything regular differencing didn’t already provide. The right general rule: reach for a transformation when Module 2’s swing-to-level test flagged multiplicative structure; reach for differencing when the problem is trend or a fixed-lag seasonal relationship — and for Cyclepath, that’s a differencing problem through and through.


Practice Exercises

Exercise 1: Explain the nonzero mean

Seasonal differencing alone left a mean of 1,073.26, not zero. Is that a problem for stationarity? Why or why not?

Hint

No — stationarity requires a constant mean, not a zero one. 1,073.26 is the same expected value at every point in the seasonally-differenced series (it doesn’t drift as you move through time), which is exactly what the ADF test is checking for and exactly why it passed at p = 0.0001. The number itself makes sense as roughly 12 months of trend growth (12 × 90 = 1,080) leaking through a comparison that’s a year apart, but its size doesn’t threaten stationarity — only a mean that changes over time would.

Exercise 2: Spot overdifferencing without a variance table

If you didn’t have the variance comparison in this lesson, what other clue might suggest a series has been overdifferenced?

Hint

A classic symptom of overdifferencing is a strong negative autocorrelation at lag 1 in the differenced series, typically around -0.5 or below: differencing something that was already close to stationary tends to introduce an artificial back-and-forth alternation that wasn’t in the original data. Don’t expect the ADF test to flag it. Both the right amount of differencing and too much will pass, and the size of the statistic doesn’t settle the question either (in this lesson the combined series scored -4.542 against -4.689 for seasonal differencing alone), so the test by itself can’t tell you when to stop.
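The lag-1 symptom is easy to reproduce. In this sketch, `stationary` is hypothetical white noise standing in for a series that needs no differencing, and `lag1_autocorr` is a simple illustrative estimator written for this example, not a library function.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation (simple illustrative estimator)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x * x))

rng = np.random.default_rng(1)            # arbitrary seed for this sketch
stationary = rng.normal(0, 1, 50_000)     # already stationary: white noise
overdifferenced = np.diff(stationary)     # differencing it anyway

print(round(lag1_autocorr(stationary), 2))        # near 0
print(round(lag1_autocorr(overdifferenced), 2))   # near -0.5
```

The original noise has no lag-1 relationship, but the differenced version alternates strongly, which is the artificial back-and-forth pattern described above.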

Exercise 3: Choosing between differencing and a transformation

A colleague has a series whose swings visibly grow every year, but whose mean stays roughly flat (no trend). Should they reach for differencing or a log transform first?

Hint

A log transform, not differencing. Differencing targets a drifting mean (a trend or a fixed seasonal relationship), and this series doesn’t have one; its problem is growing variance, which is exactly the multiplicative-style symptom a log transform is built to compress back down to roughly constant size. Applying differencing here would be solving the wrong problem: it might still technically pass an ADF test by coincidence, but it wouldn’t address the actual non-constant-variance issue. It’s the mirror image of logging Cyclepath, which applies a variance fix to a series whose real problems are trend and seasonality.


Summary

Seasonal differencing (y.diff(12)) targets the year-over-year relationship directly, and on Cyclepath it beats regular differencing alone on the measures this lesson tracked: it passes the ADF test (p = 0.0001), it moves the lag-12 autocorrelation from 0.801 to -0.417, and its variance (124,680) is over 12 times smaller than regular differencing’s (1,567,486). Combining regular and seasonal differencing still passes ADF (p = 0.0002) and drives the mean to nearly zero, but its variance (257,769) is more than double that of seasonal differencing alone: a real, measured case of overdifferencing, where more transformation hurts rather than helps. Separately, a log transform fixes a different problem, growing variance, the multiplicative symptom Module 2 tested for. Since Cyclepath was confirmed additive, logging it addresses a problem the series doesn’t have; log-then-difference still passes ADF, but there’s no evidence it improves on plain differencing here.

Key Concepts

  • Seasonal differencing — y_t - y_{t-s} for seasonal period s; targets the fixed-lag relationship regular differencing can’t touch.
  • Nonzero mean ≠ non-stationary — stationarity requires a constant mean, not a zero one.
  • Overdifferencing — applying more differencing than needed increases variance without further reducing structure; use variance, not just the ADF pass/fail, as a tiebreaker.
  • Transformation vs. differencing — differencing fixes a drifting mean (trend, seasonal lag); a transformation like log fixes growing variance (multiplicative structure).

Why This Matters

Real differencing decisions aren’t “difference until the test passes and stop” — as this lesson showed, a series can pass the ADF test at multiple different differencing choices, and the variance comparison is what tells you which one is actually the best fit rather than merely adequate. That discipline, plus knowing when the right tool is a transformation instead of a difference, is what keeps a stationarity pipeline from either underfitting (leftover structure, like Lesson 3’s untouched seasonality) or overfitting the transformation itself (this lesson’s overdifferencing case). The module capstone next runs this entire decision process on Cyclepath from scratch, landing on the specific choice that stationarizes it most efficiently.


Next Steps

Continue to Lesson 5 - Guided Project: Making Cyclepath Stationary

Run the full decision process — ADF, differencing, seasonal differencing, and the overdifferencing check — end to end.


Continue Building Your Skills

You’ve now built every tool this module needs: the ADF test, regular differencing, seasonal differencing, the overdifferencing check, and the transformation-vs-differencing distinction. The guided project runs all of it on Cyclepath from the top, landing on the single transformation that stationarizes it most efficiently — the series Module 4 will read ACF and PACF plots from to choose ARIMA’s orders.