Lesson 3 - Stationarizing and Reading the Autocorrelation
On this page
Welcome to Stationarizing and Reading the Autocorrelation
Lesson 2 settled two questions: Lantern & Vine is multiplicative, and its growth rate genuinely decelerated partway through. Both findings change how this lesson proceeds. Working in log space follows directly from the multiplicative finding, the same logic Module 3 taught but genuinely needed here for the first time. And the regime change means Cyclepath’s specific differencing answer, seasonal differencing alone, cannot simply be assumed to transfer.
By the end of this lesson, you will be able to:
- Explain why a multiplicative series’ stationarity work should happen in log space
- Run the same ADF-and-variance comparison Module 3 taught, on a series where the winner is different
- Explain why a trend regime change specifically breaks seasonal differencing
- Read leftover autocorrelation to justify a SARIMA seasonal term even when seasonal differencing itself fails
Let’s stationarize the series properly.
Working in Log Space
Lesson 2 confirmed multiplicative structure, so every stationarity test in this lesson runs on log(y), not y directly, exactly the log-transform logic Module 3 introduced but never strictly needed for Cyclepath:
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, acf
def lantern_vine():
idx = pd.date_range("2020-01-06", periods=208, freq="W-MON")
t = np.arange(208)
rng = np.random.default_rng(7)
growth_rate = np.where(t < 104, 0.008, 0.003)
log_level = np.cumsum(growth_rate)
level = 500 * np.exp(log_level)
seasonal_factor = 1 + 0.35 * np.sin(2 * np.pi * (t - 35) / 52)
noise_factor = rng.normal(1, 0.04, 208)
y = level * seasonal_factor * noise_factor
return pd.Series(np.round(y).astype(int), index=idx, name="units_sold")
y = lantern_vine()
logy = np.log(y)
def report(name, s):
s = s.dropna()
stat, pval, *_ = adfuller(s, autolag="AIC")
print(f"{name:20s} n={len(s):3d} ADF={stat:8.3f} p={pval:.4f} var={s.var():10.5f}")
report("log(y) raw", logy)log(y) raw n=208 ADF= -1.928 p=0.3192 var= 0.19994Raw log sales fail the ADF test decisively (p = 0.3192), exactly as expected for a series with both a trend and a season.
The Same Four-Way Comparison, a Different Winner
report("diff log(y)", logy.diff())
report("seasonal-diff (52)", logy.diff(52))
report("d=1,D=1(52)", logy.diff().diff(52))diff log(y) n=207 ADF= -5.925 p=0.0000 var= 0.00375
seasonal-diff (52) n=156 ADF= -0.706 p=0.8451 var= 0.01587
d=1,D=1(52) n=155 ADF= -7.168 p=0.0000 var= 0.00594First differencing alone passes decisively (p = 0.0000) with the lowest variance of the three, 0.00375. Seasonal differencing alone fails outright (p = 0.8451), nowhere close to significant. Combining both differences passes (p = 0.0000) but at nearly double the variance of first differencing alone (0.00594 versus 0.00375). On Cyclepath, Module 3 found the exact opposite: seasonal differencing alone won cleanly, and regular differencing alone left too much seasonal structure behind. Here, plain first differencing is the answer, and seasonal differencing alone does not even clear the stationarity bar.
Why the Regime Change Breaks Seasonal Differencing Here
Seasonal differencing compares each week to the same week one year earlier, y_t - y_{t-52}. That comparison implicitly assumes the rate of growth between those two points has been roughly constant across the intervening year. Lesson 2 found that assumption is false for a large stretch of this series: any seasonal difference spanning the boundary around week 104 is comparing a week growing at one pace against a week from a year earlier growing at a genuinely different pace, leaving behind exactly the kind of non-constant residual drift that fails an ADF test. Cyclepath’s constant growth rate meant this problem never arose; Lantern & Vine’s regime change means it arises here directly, and is the reason seasonal differencing alone is not the fix on this series.
What’s Still Left: The Case for a Seasonal Term Anyway
Even though seasonal differencing alone was not the answer, check whether first differencing left any seasonal structure behind, exactly the check Module 3 taught you to always run:
d1 = logy.diff().dropna()
a = acf(d1, nlags=55, fft=True)
print(round(a[1], 3), round(a[26], 3), round(a[52], 3))-0.099 -0.237 0.154A real autocorrelation of 0.154 remains at lag 52, the seasonal lag, and -0.237 at lag 26, half a cycle away. Passing the ADF test with first differencing alone does not mean the series is free of seasonal structure, precisely Module 3’s warning, now demonstrated on a series where the differencing recipe itself came out different. This leftover autocorrelation is exactly the evidence that a SARIMA model still needs seasonal AR or MA terms, even though the stationarizing transformation here is plain first differencing rather than seasonal differencing.
Two separate decisions, not one
This lesson separates two decisions that are easy to conflate: what differencing makes the series stationary (here, d=1 alone, decided by ADF and variance), and what a SARIMA model’s seasonal orders should be (still informed by the leftover seasonal autocorrelation, regardless of whether D ended up being 0 or 1). On Cyclepath these two decisions pointed to the same transformation. Here they do not, and Lesson 4 builds a SARIMA specification that uses d=1 for stationarity while still including seasonal terms to capture the autocorrelation this lesson found left over.
Practice Exercises
Exercise 1: Why does the combined differencing’s variance rise?
The combined d=1, D=1 transformation had a higher variance (0.00594) than d=1 alone (0.00375), similar to Module 3’s overdifferencing finding on Cyclepath. Is the mechanism the same here?
Hint
Partly the same general mechanism, differencing more than a series needs amplifies noise without removing additional real structure, but here there is an added, more specific cause: seasonal differencing across the regime-change boundary introduces its own distortion, as this lesson explained, so the combined transformation is not just “extra” differencing on top of an already-adequate fix, it is compounding a differencing operation that was measurably broken (failing ADF on its own) with one that worked. Both effects push the combined variance up.
Exercise 2: Would seasonal differencing work on just one half?
If you ran seasonal differencing separately on weeks 0 to 103 and weeks 104 to 207, would you expect it to pass the ADF test within each half, even though it failed on the whole series?
Hint
Within either half alone, the growth rate is constant (0.78% a week throughout the first half, 0.36% throughout the second), so a seasonal difference computed entirely within one half would not cross the regime-change boundary, and this lesson’s specific failure mechanism would not apply. You would likely see it perform much better within either half separately, direct evidence that the whole-series failure really is about the boundary between the two regimes, not something wrong with seasonal differencing as a technique in general.
Exercise 3: What would you tell someone who assumed Cyclepath’s recipe would transfer?
Someone forecasting Lantern & Vine assumes, without testing, that seasonal differencing alone will work here because it worked well on Cyclepath. What would you tell them?
Hint
You would show them this lesson’s numbers directly: seasonal differencing alone fails the ADF test outright on this series (p = 0.8451), decisively worse than assuming nothing changed. The broader point is that Module 3’s discipline, test the actual evidence rather than assume the recipe from a previous series still applies, is precisely what this lesson demonstrates the value of. A series with a real trend regime change needs its differencing choice re-verified, not inherited from a different series just because the same general toolkit produced a good answer there.
Summary
Working in log space, as Lesson 2’s multiplicative finding requires, raw log sales fail the ADF test (p = 0.3192). First differencing alone passes decisively with the lowest variance of any option tested (p = 0.0000, variance 0.00375). Seasonal differencing alone fails outright (p = 0.8451), the opposite of Cyclepath’s result, because it compares points across a regime-change boundary where the growth rate itself was different on either side. Combining both differences passes (p = 0.0000) but at nearly double the variance of first differencing alone. Even with the right differencing choice, first-differenced log sales still show a real autocorrelation of 0.154 at the seasonal lag (52) and -0.237 at half a cycle (26), evidence that SARIMA seasonal terms are still needed for modeling, independent of the differencing decision itself.
Key Concepts
- Log space for multiplicative series — stationarity testing follows the structure diagnosis; a multiplicative series’ differencing work happens on its log.
- The same test, a different winner — Module 3’s ADF-and-variance comparison applies unchanged, but Lantern & Vine’s answer (
d=1alone) is the opposite of Cyclepath’s (D=1alone). - A regime change specifically breaks seasonal differencing — comparing across a boundary where the growth rate itself changed leaves behind non-constant drift.
- Stationarity and seasonal modeling are separate decisions — leftover seasonal autocorrelation can justify a SARIMA seasonal term even when seasonal differencing was not the fix.
Why This Matters
This lesson is the clearest demonstration in the whole course that a diagnostic discipline, not a specific answer, is what actually transfers between series. Every method here, the ADF test, the variance comparison, the leftover-autocorrelation check, is identical to Module 3’s. The recipe it produces is not. Trusting the discipline over the memorized answer is exactly what separates someone who understands time series forecasting from someone who has only memorized what worked once. Next, Lesson 4 builds a SARIMA specification and a Holt-Winters model informed by everything this lesson found, and backtests both properly.
Next Steps
Continue to Lesson 4 - Fitting and Backtesting Candidate Models
Fit SARIMA and Holt-Winters on Lantern & Vine, and backtest both across several origins before trusting either.
Back to Module Overview
Return to the Capstone module overview
Continue Building Your Skills
You have stationarized Lantern & Vine correctly, in log space, with a differencing recipe that is the mirror image of Cyclepath’s, and you know a seasonal term is still needed even though seasonal differencing itself was not the answer. Next, you will fit real candidate models with these findings in hand, and backtest them the way Module 8 insists on, rather than trusting a single split.