Lesson 3 - Holt-Winters Seasonal
Welcome to Holt-Winters Seasonal
Lesson 2 ended with a warning: Holt’s method, missing a seasonal term, mistook an ordinary yearly dip for a collapsing trend and crashed its forecast toward zero. Holt-Winters seasonal smoothing adds exactly the component that was missing, a third smoothed value that tracks the repeating seasonal pattern, leaving the level and trend to track only the genuine long-run movement. This is the model the whole module has been building toward.
By the end of this lesson, you will be able to:
- Write the Holt-Winters additive seasonal update formulas
- Fit Holt-Winters on Cyclepath and read what its smoothing parameters and initial values reveal
- Compare an additive and a multiplicative seasonal component using both AIC and test error
- Connect Holt-Winters’ recovered trend and season back to Module 2’s decomposition
Let’s add the missing piece.
The Holt-Winters Additive Formulas
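Holt-Winters keeps three smoothed components: level $\ell_t$, trend $b_t$, and a set of seasonal indices $s_t$, one for each point in the cycle, with season length $m$ (12 for Cyclepath):
$$\ell_t = \alpha\,(y_t - s_{t-m}) + (1-\alpha)\,(\ell_{t-1} + b_{t-1})$$
$$b_t = \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1}$$
$$s_t = \gamma\,(y_t - \ell_t) + (1-\gamma)\,s_{t-m}$$
$$\hat{y}_{t+h} = \ell_t + h\,b_t + s_{t+h-m} \quad (1 \le h \le m)$$
Here $\alpha$, $\beta$, and $\gamma$ are the smoothing parameters for level, trend, and season, each between 0 and 1.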
The level update now removes the seasonal effect before comparing against the trend-adjusted previous level, exactly the same idea as Module 2’s classical decomposition, which subtracted a seasonal index before estimating the trend. The seasonal update compares the observation against the current level, so a repeated seasonal shape reinforces the same twelve indices year after year instead of leaking into the trend the way it did in Lesson 2. This is an additive season: the seasonal indices are added to the level, a fixed amount at each point in the cycle, exactly Module 2’s confirmed structure for Cyclepath.
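To make the three updates concrete, here is a minimal pure-Python sketch of a single additive Holt-Winters step. The function name `hw_additive_step` and the toy numbers are illustrative assumptions, and the seasonal update uses the "observation minus current level" form described above; statsmodels' internal implementation may differ in detail.

```python
def hw_additive_step(y_t, level_prev, trend_prev, season_prev_cycle,
                     alpha, beta, gamma):
    """Apply one additive Holt-Winters update and return (level, trend, season).

    season_prev_cycle is the seasonal index from one full cycle ago
    (the same calendar month last year when the season length is 12).
    """
    # Level: deseasonalize the observation, then blend it with the
    # trend-adjusted previous level.
    level = alpha * (y_t - season_prev_cycle) + (1 - alpha) * (level_prev + trend_prev)
    # Trend: blend the latest change in level with the previous trend.
    trend = beta * (level - level_prev) + (1 - beta) * trend_prev
    # Season: blend the observation's gap from the current level
    # with last cycle's index for this month.
    season = gamma * (y_t - level) + (1 - gamma) * season_prev_cycle
    return level, trend, season


# Toy numbers (hypothetical, not Cyclepath output).
level, trend, season = hw_additive_step(
    y_t=12_500, level_prev=9_000, trend_prev=90, season_prev_cycle=3_200,
    alpha=0.0, beta=0.0, gamma=0.0,
)
print(level, trend, season)  # 9090.0 90.0 3200.0
```

With all three weights at 0 the observation is ignored entirely: the level simply advances by the trend, and the trend and seasonal index carry over unchanged.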
Fitting Holt-Winters on Cyclepath
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")
y = cyclepath()
train, test = y.iloc[:-12], y.iloc[-12:]
hw = ExponentialSmoothing(
    train, trend="add", seasonal="add", seasonal_periods=12,
    initialization_method="estimated",
).fit()
print({k: float(round(v, 4)) for k, v in hw.params.items()
       if k in ("smoothing_level", "smoothing_trend", "smoothing_seasonal")})
# {'smoothing_level': 0.0, 'smoothing_trend': 0.0, 'smoothing_seasonal': 0.0}
All three smoothing parameters optimize to exactly 0. Unlike Lesson 2’s crash, this is a good sign here, not a bad one: it means a single fixed level, trend, and seasonal pattern, estimated once from the whole training set and never updated month to month, already fits almost perfectly. That makes sense given how Cyclepath was built: a constant trend slope and an identical seasonal wave every year, plus independent noise. There is no genuine drift in the underlying pattern for the model to adapt to, so the optimizer correctly finds that adapting adds nothing.
Look at what those fixed initial values actually are:
print(round(hw.params["initial_level"], 1)) # 8973.8
print(round(hw.params["initial_trend"], 1)) # 89.4
print(np.round(hw.params["initial_seasons"], 1))
# [-3347.5 -2857.2 -1565.7 109.2 1479.4 2698.6 3275. 2635. 1583.9 -112.1 -1582.7 -2736.3]
The initial level (8,973.8) and trend (89.4) sit close to Cyclepath’s true generative values, 9,000 and 90 per month. And the twelve seasonal indices closely match the classical decomposition from Module 2, which found a July peak of +3,307.3 and a January trough of -3,399.8 by explicitly averaging detrended values by calendar month. Holt-Winters found the same shape, peak of +3,275.0 in July, trough of -3,347.5 in January, through smoothing rather than decomposition. Two different methods, built on different logic, landing on nearly the same answer, is strong evidence both are reading the real structure in the data rather than an artifact of either method.
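Because all three smoothing parameters are 0, the components never change after initialization, so any fitted or forecast value is just the initial level, plus accumulated trend, plus the matching monthly index. The sketch below rebuilds that arithmetic from the rounded values printed above. The helper name `fixed_component_value` is hypothetical, and the indexing convention (step 1 is the first training month, January 2016) is an assumption about how the initial state lines up with the data, so treat the result as approximate rather than as statsmodels output.

```python
# Rounded values copied from the fitted model's printed parameters above.
INITIAL_LEVEL = 8973.8
INITIAL_TREND = 89.4
INITIAL_SEASONS = [-3347.5, -2857.2, -1565.7, 109.2, 1479.4, 2698.6,
                   3275.0, 2635.0, 1583.9, -112.1, -1582.7, -2736.3]


def fixed_component_value(step, level0=INITIAL_LEVEL, trend0=INITIAL_TREND,
                          seasons=INITIAL_SEASONS):
    """Level + trend + season at a 1-based month step when
    alpha = beta = gamma = 0, so the components never update.

    Assumption: step 1 is the first training month (January 2016).
    """
    season = seasons[(step - 1) % len(seasons)]
    return level0 + trend0 * step + season


# Step 85 is the first test month (January 2023) under that assumption.
print(round(fixed_component_value(85), 1))  # 13225.3
```

If the assumed convention matches, the values for steps 85 to 96 should track `hw.forecast(12)` closely, with small gaps coming from the rounding of the printed parameters.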
Scoring the Forecast
def mape(a, f): return np.mean(np.abs((a - f) / a)) * 100
fc = hw.forecast(12)
print(round(mape(test, fc), 2)) # 1.57
print(round(hw.aic, 2)) # 960.57
The 1.57% test MAPE is more than three and a half times better than the seasonal-naive baseline’s 5.9% from Module 1, and it bears no resemblance to Lesson 2’s crashed 59.93%. Check the residuals directly:
resid = hw.resid
print(round(resid.mean(), 2)) # -0.0
print(round(resid.std(), 2)) # 252.94
The residual mean is essentially 0 and the standard deviation is 252.94, in the same range as Module 2’s classical decomposition residual (229.2) and STL’s (215.25). Level, trend, and season together have absorbed almost everything predictable, leaving a residual that looks like noise, the same standard this course has held decomposition and ARIMA-family models to throughout.
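A mean near zero and a stable spread do not by themselves show that residuals are unstructured; leftover seasonality would show up as correlation at lag 12. As a rough check, the sketch below computes the sample autocorrelation at a chosen lag in plain Python. The function name `autocorr` and the toy input are illustrative assumptions; in practice you might pass `list(hw.resid)`.

```python
def autocorr(values, lag):
    """Sample autocorrelation of `values` at a positive integer `lag`."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("values are constant; autocorrelation is undefined")
    num = sum((values[i] - mean) * (values[i - lag] - mean)
              for i in range(lag, n))
    return num / denom


# Toy residuals with an obvious repeating pattern of period 4 (hypothetical data).
patterned = [1.0, -1.0, 0.5, -0.5] * 6
print(round(autocorr(patterned, 4), 2))  # 0.83
```

A value that large at the seasonal lag would mean a pattern was left behind. For noise-like residuals the autocorrelations at lags 1 and 12 should sit near 0; with 84 training residuals, a common rule of thumb treats anything within roughly ±0.22 (about 2 divided by the square root of 84) as indistinguishable from zero.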
Additive vs. Multiplicative
Holt-Winters supports a multiplicative season too, where the seasonal effect scales with the level rather than adding a fixed amount, the structure Module 2 tested for and ruled out for Cyclepath:
hw_mul = ExponentialSmoothing(
    train, trend="add", seasonal="mul", seasonal_periods=12,
    initialization_method="estimated",
).fit()
print(round(mape(test, hw_mul.forecast(12)), 2)) # 1.32
print(round(hw_mul.aic, 2)) # 1028.83
The multiplicative version scores a slightly better test MAPE (1.32% versus the additive model’s 1.57%) but a clearly worse AIC (1028.83 versus 960.57). This is the same tension Modules 5 and 6 raised between AIC and out-of-sample error, now with a twist: you also already have independent, structural evidence from Module 2 that Cyclepath’s seasonal swing stays a constant absolute size rather than scaling with the level. That evidence, plus the additive model’s better AIC, is a more reliable basis for choosing than one test year’s slightly lower MAPE, which could easily be this particular year’s luck rather than a real difference. The additive model is the right choice here, not because it happened to win on every single number, but because it matches what you independently know to be true about the series.
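Underneath the AIC and MAPE comparison, the practical difference between the two variants is how the seasonal term combines with the level. The pure-Python sketch below contrasts them using made-up numbers (the function names and values are illustrative assumptions, not fitted Cyclepath parameters): an additive index is a fixed number of trips, while a multiplicative index is a ratio, so its swing in trips grows as the level rises.

```python
def additive_value(level, season_offset):
    """Additive season: a fixed number of units is added to the level."""
    return level + season_offset


def multiplicative_value(level, season_factor):
    """Multiplicative season: the level is scaled by a ratio."""
    return level * season_factor


# Hypothetical peak-month effect: +3,000 trips versus a factor of 1.25.
for level in (12_000, 18_000):
    add_swing = additive_value(level, 3_000) - level
    mul_swing = multiplicative_value(level, 1.25) - level
    print(level, add_swing, mul_swing)
# 12000 3000 3000.0
# 18000 3000 4500.0
```

The two effects coincide at a level of 12,000 but diverge as the level climbs, which is why a series whose swing stays the same absolute size is better described by the additive form.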
Agreement with independent evidence beats a single test-year edge
When two reasonable models are close on test error, the tiebreaker should not just be “which number is smaller this one time.” Module 2’s swing-to-level test gave independent, structural evidence that Cyclepath is additive, arrived at without looking at any held-out year at all. When the additive model also wins on AIC, that is two separate pieces of evidence pointing the same way, against one test year’s narrow MAPE edge for the alternative. Prefer the model backed by more independent evidence, not just the one with the lowest number on the metric you happen to be looking at.
Practice Exercises
Exercise 1: Why did all three parameters go to zero?
All three Holt-Winters smoothing parameters optimized to exactly 0 on Cyclepath. What would it mean if, instead, gamma (the seasonal smoothing parameter) alone had optimized to something like 0.4?
Hint
A nonzero gamma would mean the seasonal shape itself is genuinely changing over time, so the model benefits from updating its seasonal indices as new years arrive rather than locking them in from the start. Since Cyclepath’s true seasonal wave is identical every year by construction, there is nothing for gamma to adapt to, which is exactly why it optimized to 0. A real series whose seasonal peak gradually shifts or grows would instead show a meaningfully nonzero gamma.
Exercise 2: Interpreting the residual standard deviation
Holt-Winters’ residual standard deviation (252.94) is close to, but not identical to, Module 2’s classical decomposition residual (229.2) and STL’s (215.25). Should that small difference concern you?
Hint
No. The three methods, classical decomposition, STL, and Holt-Winters, estimate trend and season in different ways, using different amounts of the data and different smoothing assumptions, so small differences in how much noise is left over are expected rather than alarming. What matters is that all three land in the same general range (roughly 210 to 255) and all three residuals look like unstructured noise, which is the real agreement to pay attention to, not matching to the decimal.
Exercise 3: Choosing additive versus multiplicative on a new series
You are given a new series and told nothing about its structure. How would you decide between an additive and a multiplicative Holt-Winters season, using tools from this course?
Hint
Run Module 2’s swing-to-level test first: group by year, compute the peak-to-trough swing and the average level, and see whether the ratio stays flat (multiplicative signature) or drifts while the raw swing stays constant (additive signature). Then fit both Holt-Winters variants and compare AIC. If the structural test and the AIC agree, that is a confident choice. If held-out test error then points the other way, as it did in this lesson (the multiplicative variant had a better test MAPE but a worse AIC and contradicted the structural evidence), trust the structural test and the AIC over one held-out year’s narrow accuracy edge.
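The swing-to-level check described in this hint can be sketched in a few lines of plain Python. The helper name `swing_to_level` and the noise-free toy series are assumptions for illustration (Module 2's own implementation may differ); the series mimics Cyclepath's construction with a linear trend and a constant-amplitude sine wave.

```python
import math


def swing_to_level(values, period=12):
    """Return (swing, mean level, swing / level) for each complete cycle."""
    rows = []
    for start in range(0, len(values) - period + 1, period):
        year = values[start:start + period]
        swing = max(year) - min(year)   # peak-to-trough range within the year
        level = sum(year) / period      # average level for the year
        rows.append((swing, level, swing / level))
    return rows


# Noise-free additive toy series: linear trend plus a fixed-size seasonal wave.
series = [9000 + 90 * t + 3200 * math.sin(2 * math.pi * (t - 3) / 12)
          for t in range(48)]

for swing, level, ratio in swing_to_level(series):
    print(round(swing), round(level), round(ratio, 3))
# 6940 9495 0.731
# 6940 10575 0.656
# 6940 11655 0.595
# 6940 12735 0.545
```

The raw swing is identical every year while the ratio drifts downward as the level rises, which is the additive signature. A multiplicative series would show the opposite: a growing swing and a roughly flat ratio.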
Summary
Holt-Winters seasonal smoothing adds a seasonal component alongside Holt’s level and trend, forecasting $\hat{y}_{t+h} = \ell_t + h\,b_t + s_{t+h-m}$: the current level, plus $h$ steps of trend, plus the seasonal index for the target month. Fit on Cyclepath with an additive season, all three smoothing parameters optimized to 0, meaning a single fixed level (8,973.8), trend (89.4), and seasonal pattern already fit almost perfectly. The level and trend closely match Cyclepath’s true generative values, and the seasonal indices closely match Module 2’s classical decomposition. The test MAPE was 1.57%, more than three and a half times better than the 5.9% seasonal-naive baseline. A multiplicative variant scored a slightly better test MAPE (1.32%) but a clearly worse AIC (1028.83 versus 960.57), and Module 2’s independent structural evidence agrees with the additive model, making it the right choice despite the narrow test-year edge for the alternative.
Key Concepts
- Holt-Winters seasonal update: $s_t = \gamma\,(y_t - \ell_t) + (1-\gamma)\,s_{t-m}$; the level update subtracts the seasonal effect before smoothing.
- All parameters at zero: a strong fit, not a failure, when the underlying pattern genuinely does not change over time.
- Independent recovery: Holt-Winters and classical decomposition, built on different logic, agreeing on trend and season is strong evidence both are real.
- Structural evidence over one test year: prefer the model matching known structure (Module 2) and AIC, not just the narrowest test-year MAPE win.
Why This Matters
This lesson is the payoff for the whole module’s build-up: SES’s flat naive-equivalent forecast and Holt’s crash both trace back to a missing seasonal term, and adding it here produces a model that recovers the series’ true shape from two independent angles and comfortably beats the baseline that has stood since Module 1. The additive-versus-multiplicative choice also reinforces a habit that runs through this entire course: when metrics disagree, reach for independent, structural evidence rather than trusting whichever number happens to be smaller. Next, Lesson 4 steps back to compare every model built in this module side by side, and takes a closer look at what a fitted parameter and an AIC score can and cannot tell you.
Next Steps
Continue to Lesson 4 - Fitting, Comparing, and Choosing a Configuration
Compare every exponential smoothing model built so far on AIC and test error, and see the starkest AIC-versus-accuracy gap yet.
Continue Building Your Skills
You now have a Holt-Winters model that beats the seasonal-naive baseline by more than three and a half times, and independently recovers almost the same trend and seasonal shape Module 2 found by decomposition. Next, you will lay every model from this module side by side, SES, Holt, and both Holt-Winters variants, and use their AIC scores and test errors together to justify picking one configuration over the others.