Lesson 5 - Guided Project: Holt-Winters vs. SARIMA vs. the Baselines
Welcome to the Guided Project
Every prior module in this course built toward one number: the seasonal-naive baseline’s 5.9% MAPE from Module 1. Module 6 beat it with SARIMA, at 1.06%. This module built Holt-Winters, a completely different classical family, and Lesson 3 showed it beating the same baseline too. This capstone puts every model from the whole course on one table, forecasts a real future year with both surviving contenders, and asks the question a working forecaster actually needs answered: when do you reach for Holt-Winters, and when for SARIMA?
By the end of this project, you will be able to:
- Score Holt-Winters against every baseline this course has built
- Compare Holt-Winters and SARIMA directly, on accuracy and on what each model requires from you
- Retrain both models on the full series and cross-check their forward forecasts
- Give a reasoned answer for which classical family to reach for, and when
Let’s finish the comparison.
Stage 1: Rebuild and Refit
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
def cyclepath():
    # Synthetic monthly trips series, 2016-01 through 2023-12 (96 months).
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)  # fixed seed so the numbers reproduce
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)  # trough in January, peak in July
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")
y = cyclepath()
train, test = y.iloc[:-12], y.iloc[-12:]
hw = ExponentialSmoothing(
    train, trend="add", seasonal="add", seasonal_periods=12,
    initialization_method="estimated",
).fit()
print(round(hw.aic, 2)) # 960.57
This is the same model Lesson 3 built and Lesson 4 confirmed as the best configuration: additive trend, additive season, AIC 960.57.
Stage 2: Score Against Every Baseline in the Course
def mape(a, f): return np.mean(np.abs((a - f) / a)) * 100
def mae(a, f): return np.mean(np.abs(a - f))
naive = pd.Series(train.iloc[-1], index=test.index)
seasonal_naive = pd.Series(train.iloc[-12:].values, index=test.index)
hw_fc = hw.forecast(12)
print(f"naive {mape(test, naive):.1f}%") # 19.0%
print(f"seasonal-naive {mape(test, seasonal_naive):.1f}%") # 5.9%
print(f"best ARIMA 16.0%") # Module 5
print(f"Holt-Winters {mape(test, hw_fc):.2f}%") # 1.57%
print(f"SARIMA 1.06%") # Module 6
naive 19.0%
seasonal-naive 5.9%
best ARIMA 16.0%
Holt-Winters 1.57%
SARIMA 1.06%
Holt-Winters lands at 1.57% MAPE, a 3.76 times improvement over the 5.9% seasonal-naive bar that has stood since Module 1. Its mean absolute error, computed with the mae helper above, is 247.9 trips per month against the baseline’s 997.8. It comfortably beats every non-seasonal model in the course. It does not, however, beat SARIMA, whose MAPE is about 1.48 times lower (1.06% versus 1.57%).
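The ratios quoted for this stage can be reproduced from the reported scores. The sketch below uses only numbers printed in this lesson; the variable names are illustrative. Note that 3.76 is the ratio of MAPEs, that the MAE ratio comes out slightly higher, and that the 5.9% input is itself rounded, so the ratios are approximate.

```python
# Scores as reported in this lesson (MAPE in percent, MAE in trips per month).
mape_seasonal_naive, mape_hw, mape_sarima = 5.9, 1.57, 1.06
mae_seasonal_naive, mae_hw = 997.8, 247.9

# How many times smaller Holt-Winters' error is than the baseline's.
mape_gain = mape_seasonal_naive / mape_hw
mae_gain = mae_seasonal_naive / mae_hw

# How many times smaller SARIMA's MAPE is than Holt-Winters'.
sarima_edge = mape_hw / mape_sarima

print(round(mape_gain, 2))    # 3.76
print(round(mae_gain, 2))     # 4.03
print(round(sarima_edge, 2))  # 1.48
```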
Stage 3: Holt-Winters vs. SARIMA, Beyond the Number
A 1.48 times accuracy gap in SARIMA’s favor is real, but it is not the only thing worth weighing. The two models ask different amounts of you:
- Holt-Winters needs one function call with a trend type and a seasonal type. It has no order to select by reading ACF and PACF plots, no differencing decision, and its fitted parameters (alpha, beta, gamma) have a direct, intuitive reading: how much the level, trend, and season are each still adapting.
- SARIMA needs six orders chosen deliberately, informed by Module 4’s autocorrelation reading and Module 3’s stationarity work, and its coefficients (ar.L1, ar.S.L12) require you to already understand autoregression to interpret.
- SARIMA comes with the Ljung-Box test and the built-in normality and heteroskedasticity checks from Module 6, a formal way to check whether the residuals look like white noise. Holt-Winters has no equivalent built-in diagnostic suite in statsmodels; you would check its residuals by hand, with the same tools from Modules 3 and 4.
On this specific series, SARIMA’s explicit autoregressive terms capture slightly more of the remaining structure than Holt-Winters’ smoothing can. That will not always be true. A series with a less stable seasonal pattern, or one where you do not have the time or the ACF/PACF fluency to select SARIMA’s orders carefully, is exactly where Holt-Winters’ simplicity is the more practical choice, even at a small accuracy cost.
Two families, one discipline
Whichever family you reach for, the discipline this course has built is the same: establish a baseline first, prove a model actually earns its complexity against that baseline, validate on genuinely held-out data, and check the residuals before trusting a forecast. Holt-Winters and SARIMA arrived at their results through completely different mechanics, exponential decay of past values versus explicit autoregression, but both were only trustworthy because they were held to that same standard.
Stage 4: Forecast the Future, Both Ways
Retrain Holt-Winters on the full 96 months and forecast 2024, the same exercise Module 6 ran for SARIMA:
production = ExponentialSmoothing(
    y, trend="add", seasonal="add", seasonal_periods=12,
    initialization_method="estimated",
).fit()
fc2024 = production.forecast(12)
fc2024.index = pd.date_range("2024-01-01", periods=12, freq="MS")
print(round(fc2024.iloc[0], 1)) # 14237.8 <- January
print(round(fc2024.iloc[6], 1)) # 21450.9 <- July
print(round(fc2024.sum(), 1)) # 216823.9 <- full-year total
Holt-Winters forecasts a January low of 14,237.8, a July peak of 21,450.9, and a full-year total of 216,823.9 trips. Module 6’s SARIMA forecast for the same year: January 13,800, July 21,346, total 213,839. The two totals differ by about 1.4%, close enough that two models built on entirely different mechanics, one smoothing the past exponentially, one modeling autocorrelation directly, agree on the shape and rough size of a year that has not happened. That agreement is itself evidence: when independently built models converge on a similar answer, it is a much stronger signal than either model’s internal confidence alone.
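The cross-check can be made explicit with a few lines of arithmetic. The sketch below takes the three Holt-Winters and SARIMA figures quoted in this stage and reports each gap relative to the SARIMA value; the dictionary names are illustrative, and measuring against SARIMA rather than Holt-Winters is an arbitrary choice.

```python
# Forecast figures for 2024 as quoted in this stage.
hw_fc = {"January": 14237.8, "July": 21450.9, "total": 216823.9}
sarima_fc = {"January": 13800.0, "July": 21346.0, "total": 213839.0}

pct_gap = {}
for label in hw_fc:
    gap = hw_fc[label] - sarima_fc[label]
    pct_gap[label] = 100 * gap / sarima_fc[label]  # relative to SARIMA
    print(f"{label}: {gap:+.1f} trips ({pct_gap[label]:+.2f}%)")
# January: +437.8 trips (+3.17%)
# July: +104.9 trips (+0.49%)
# total: +2984.9 trips (+1.40%)
```

The gap is not uniform across months: on these figures the two models sit closer together at the July peak than at the January low, so a monthly comparison can be worth a look alongside the annual total.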
Stage 5: The Takeaway
Step back and see what this project, and the last two modules together, produced:
- A second validated family — Holt-Winters, built from level, trend, and seasonal smoothing, reaches 1.57% MAPE, beating the seasonal-naive baseline 3.76 times over, through completely different mechanics than SARIMA.
- An honest ranking — SARIMA remains the more accurate model on this series (1.06% versus 1.57%), a real difference worth reporting, not a reason to dismiss Holt-Winters.
- Two independent forecasts of the same future, agreeing within about 1.4% of each other on 2024’s total, each retrained on all the data and validated before being trusted.
The wider arc, across Modules 5, 6, and 7: a series can be forecast well by more than one kind of classical model, and the right one to reach for depends on the series, on how much time you have to tune orders by hand, and on how much you need a formal diagnostic suite versus a simpler, more interpretable fit. Both paths, done honestly, with a baseline, a held-out test, and a checked residual, earn their place.
Practice Exercises
Exercise 1: When would you reach for Holt-Winters first?
Describe a realistic forecasting situation where you would try Holt-Winters before attempting a SARIMA.
Hint
Any situation where you need a result quickly, are forecasting many series at once and cannot hand-tune SARIMA orders for each one, or are working with a team that needs to understand and trust the model without a background in autoregression. Holt-Winters’ one-line fit and its intuitive alpha, beta, gamma parameters make it a strong default for exactly these cases, even knowing SARIMA might do slightly better with enough dedicated attention on any single series.
Exercise 2: Interpreting the 1.4% forecast agreement
Two very differently built models landed within about 1.4% of each other on a genuine forward forecast. Does that prove the forecast is accurate?
Hint
Not on its own. Agreement between two models tells you they are both picking up on the same underlying pattern in the historical data, which is reassuring, but both models were fit on the same series and could in principle share the same blind spot, for instance if some real-world change in 2024 breaks the pattern both models learned from 2016 through 2023. Agreement raises confidence, but the only real proof would be waiting for 2024’s actual data and checking both forecasts against it, the same honest validation this course has insisted on throughout.
Exercise 3: A series where the ranking might flip
Describe a kind of series where you would expect Holt-Winters to outperform SARIMA, rather than the other way around, as it did here.
Hint
A series whose seasonal pattern genuinely drifts or evolves over time, rather than repeating identically every cycle the way Cyclepath was built to, would likely favor Holt-Winters, since its seasonal smoothing parameter can adapt the seasonal shape gradually, while SARIMA’s seasonal terms assume a fixed relationship at the seasonal lag. A short series, with too few years of data for SARIMA’s seasonal differencing to have much to work with, might also favor Holt-Winters’ more direct, less data-hungry smoothing approach.
Summary
Holt-Winters scored 1.57% MAPE against the full lineage of course baselines: naive (19.0%), best non-seasonal ARIMA (16.0%), and the seasonal-naive bar (5.9%). That is a 3.76 times improvement in MAPE over seasonal-naive, with a mean absolute error of 247.9 trips per month against the baseline’s 997.8. SARIMA remains the more accurate model on this series (1.06% MAPE), a real 1.48 times edge, but Holt-Winters asks far less of the practitioner: no orders to select, no ACF or PACF to read, an intuitive set of three parameters. Retrained on the full 96 months, Holt-Winters forecast 2024 at a total of 216,823.9 trips, within about 1.4% of SARIMA’s independently built 213,839: two different classical families agreeing on the shape of a year neither had seen.
Key Concepts
- Holt-Winters vs. SARIMA — different mechanics, both capable of beating the seasonal-naive baseline decisively; SARIMA edged out Holt-Winters here, by 1.48 times.
- Accuracy is not the only cost — SARIMA’s edge comes with order selection, differencing decisions, and coefficients that need autoregressive fluency to read.
- Cross-model agreement as evidence — two independently built forecasts landing close together strengthens, but does not prove, confidence in a forward prediction.
- Choose the family to fit the situation — reach for Holt-Winters when speed, simplicity, or scale across many series matters; reach for SARIMA when you have the time to tune it and need its formal diagnostics.
Why This Matters
This is the second time in the course a model has beaten the seasonal-naive baseline decisively, and the fact that it happened through an entirely different mechanism than SARIMA is the real lesson: there is rarely exactly one right classical model for a series, and the discipline you have built across this whole course, baseline first, prove complexity earns its place, validate honestly, check the residuals, applies just as fully to a smoothing model as to an autoregressive one. You now have two complete, trustworthy toolkits, and the judgment to choose between them based on the situation in front of you, not just habit.
Next Steps
Continue to Module 8 - Evaluation and Backtesting
Why one train/test split is not enough, walk-forward validation, and a backtest that overturns this lesson's SARIMA-vs-Holt-Winters ranking.
Continue Building Your Skills
You now have two complete classical forecasting families, exponential smoothing and the ARIMA family, both validated against the same honest baseline and both capable of forecasting Cyclepath’s future with real accuracy. Every model so far has been checked with a single train/test split. Module 8 asks a harder question: is one twelve-month split enough to trust a model at all, or does honest evaluation need something more thorough?