Lesson 4 - Fitting, Comparing, and Choosing a Configuration
Welcome to Fitting, Comparing, and Choosing a Configuration
This module has built up three models in sequence: simple exponential smoothing, which quietly reduced to the naive forecast; Holt’s linear trend, which crashed badly by mistaking a seasonal dip for a real decline; and Holt-Winters, which finally recovered the series’ true shape and beat the baseline. This lesson steps back, lays all of them side by side on the same two numbers, AIC and test MAPE, and adds one more configuration option: a damped trend.
By the end of this lesson, you will be able to:
- Explain what a damped trend does and why Cyclepath does not need one
- Read a full comparison table across every model this module has built
- Point to the single clearest case in this course of AIC and forecast accuracy disagreeing
- Justify choosing one final configuration using more than one kind of evidence
Let’s put every model on the same table.
Damped Trend: A Fourth Option
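A damped trend shrinks the trend’s influence the further out you forecast, instead of extending it in a perfectly straight line forever. A damping parameter $\phi$, between 0 and 1, controls how quickly the trend fades:

$$\hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \cdots + \phi^h)\, b_t + s_{t+h-m}$$

Here $\ell_t$ is the level, $b_t$ the trend, and $s_{t+h-m}$ the seasonal term for the matching month of the last observed season (written for $h \le m$, with $m = 12$).
When $\phi = 1$, the sum simplifies back to $h$, exactly Lesson 3’s undamped forecast. When $\phi$ is meaningfully below 1, each additional step ahead adds a little less trend than the one before, so a long-horizon forecast levels off rather than climbing or falling forever. Fit it on Cyclepath: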
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def cyclepath():
    """Synthetic monthly trips: linear trend + annual seasonality + noise."""
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()
train, test = y.iloc[:-12], y.iloc[-12:]  # hold out the final 12 months

def mape(a, f):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((a - f) / a)) * 100

hw_damped = ExponentialSmoothing(
    train, trend="add", damped_trend=True, seasonal="add", seasonal_periods=12,
    initialization_method="estimated",
).fit()
print(round(hw_damped.params["damping_trend"], 4))  # 0.995
print(round(mape(test, hw_damped.forecast(12)), 2))  # 1.47
print(round(hw_damped.aic, 2))  # 980.1

The fitted damping parameter is 0.995, barely below 1, so the model is essentially undamped. That is exactly what you would expect: Cyclepath’s true trend, built with a fixed slope of 90 per month, never decelerates, so the optimizer correctly finds that almost no damping is warranted. The damped model’s test MAPE (1.47%) is slightly better than that of Lesson 3’s non-damped additive model (1.57%), but its AIC (980.1) is clearly worse than the non-damped model’s (960.57), because the extra damping parameter costs a complexity penalty it barely earns back. A damping value this close to 1 on a genuinely linear trend is much the same signal Modules 1 through 3 kept surfacing: extra flexibility that the data does not actually need shows up as a worse AIC even when it nudges test error slightly in its favor.
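To make “essentially undamped” concrete, the sketch below evaluates the quantity the damping parameter controls: the sum of its powers from 1 through h, which multiplies the trend h steps ahead. It is a minimal illustration in plain Python. The helper name `damped_trend_multiplier` and the chosen horizons are assumptions made for this example, not part of the lesson’s code.

```python
def damped_trend_multiplier(phi: float, h: int) -> float:
    """Return phi + phi**2 + ... + phi**h, the factor applied to the trend h steps ahead."""
    return sum(phi ** k for k in range(1, h + 1))

horizons = [1, 12, 60, 120]  # illustrative horizons, in months
for phi in (1.0, 0.995, 0.7):
    row = "  ".join(f"h={h}: {damped_trend_multiplier(phi, h):7.2f}" for h in horizons)
    print(f"phi={phi:<5}  {row}")
```

With a damping parameter of 1 the multiplier is simply h. At 0.995 it is still about 11.6 at a 12-month horizon, so a one-year forecast is nearly indistinguishable from the undamped one. At 0.7 the multiplier can never exceed 0.7 / 0.3, roughly 2.33, so the trend’s contribution flattens out within a few steps.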
The Full Comparison
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt

# Every model this module has built, fitted on the same training window
models = {
    "SES": SimpleExpSmoothing(train, initialization_method="estimated").fit(),
    "Holt (trend, no season)": Holt(train, initialization_method="estimated").fit(),
    "HW additive": ExponentialSmoothing(
        train, trend="add", seasonal="add", seasonal_periods=12,
        initialization_method="estimated").fit(),
    "HW multiplicative": ExponentialSmoothing(
        train, trend="add", seasonal="mul", seasonal_periods=12,
        initialization_method="estimated").fit(),
    "HW damped additive": hw_damped,
}
for name, m in models.items():
    fc = m.forecast(12)
    print(f"{name:28s} AIC={m.aic:8.2f} MAPE={mape(test, fc):6.2f}%")

Running this prints:

SES                          AIC= 1199.74 MAPE= 18.99%
Holt (trend, no season)      AIC= 1147.01 MAPE= 59.93%
HW additive                  AIC=  960.57 MAPE=  1.57%
HW multiplicative            AIC= 1028.83 MAPE=  1.32%
HW damped additive           AIC=  980.10 MAPE=  1.47%

Read the first two rows together, because they are the single starkest disagreement between AIC and forecast accuracy anywhere in this course. Holt’s method has the better AIC (1147.01 versus SES’s 1199.74), suggesting it fits the training data a little better once its extra parameters are accounted for. But Holt’s test MAPE is more than three times worse (59.93% versus 18.99%). A model that scores better on the metric that only looks backward can still be dramatically worse on the metric that actually matters: forecasting data it has never seen. If you compared these two models by AIC alone, you would deploy the one that crashes.
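One way to see the disagreement at a glance is to rank the five models twice, once by AIC and once by test MAPE, and flag every model whose two ranks differ. The sketch below does that using the numbers copied from the comparison output above. The `results` dictionary and the `rank_by` helper are hypothetical names introduced for this illustration.

```python
# (AIC, test MAPE in percent), copied from the comparison output in this lesson
results = {
    "SES": (1199.74, 18.99),
    "Holt (trend, no season)": (1147.01, 59.93),
    "HW additive": (960.57, 1.57),
    "HW multiplicative": (1028.83, 1.32),
    "HW damped additive": (980.10, 1.47),
}

def rank_by(position: int) -> dict:
    """Rank models 1..n on the metric at `position`; lower is better for both metrics."""
    ordered = sorted(results, key=lambda name: results[name][position])
    return {name: rank for rank, name in enumerate(ordered, start=1)}

aic_rank, mape_rank = rank_by(0), rank_by(1)
for name in results:
    flag = "  <- rankings disagree" if aic_rank[name] != mape_rank[name] else ""
    print(f"{name:28s} AIC rank={aic_rank[name]}  MAPE rank={mape_rank[name]}{flag}")
```

Ranks hide the size of each gap, which is why the lesson reads the raw numbers as well: Holt and SES swap only one place, yet the MAPE difference between them is more than 40 percentage points.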
Reading a Fitted Parameter
Across this module, a fitted smoothing parameter has meant one of three things every time:
- Near 0: the optimizer found a single fixed value already fits well, and updating over time adds nothing. Lesson 1’s pure-noise toy series, Lesson 2’s clean toy trend, and all three of Holt-Winters’ parameters on Cyclepath landed here.
- Near 1: the optimizer found almost no long-run structure worth trusting, and the best strategy is to chase the most recent data as closely as possible. Lesson 1’s SES on Cyclepath (alpha = 1.0) and Lesson 2’s Holt on Cyclepath (alpha = beta = 0.9856) both landed here, for two different reasons: SES because it had no trend term to use instead, Holt because it had no seasonal term and mistook a seasonal dip for real movement.
- Somewhere in between: the optimizer found genuine, gradually evolving structure worth adapting to. Lesson 1’s drifting toy series (alpha = 0.5513) landed here.
A fitted parameter is never just a number to report. It is a direct readout of how much real, trackable change the model found in the series, and reading it correctly is what lets you notice a Lesson 2-style failure before you trust its forecast.
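The three readings can be written down as a small helper that turns a fitted value into a plain-language note. This is only a sketch: the function name `read_smoothing_parameter` and the cutoffs of 0.1 and 0.9 are assumptions chosen for illustration, since the lesson describes the regions qualitatively and gives no numeric thresholds.

```python
def read_smoothing_parameter(value: float, low: float = 0.1, high: float = 0.9) -> str:
    """Map a fitted smoothing parameter to one of the three readings.

    The `low` and `high` cutoffs are illustrative assumptions, not values from the lesson.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("smoothing parameters lie between 0 and 1")
    if value <= low:
        return "near 0: a fixed value already fits; updating adds little"
    if value >= high:
        return "near 1: chasing recent data; check for missing structure"
    return "in between: gradually evolving structure worth adapting to"

# Fitted values quoted in this lesson
fitted = {
    "SES on Cyclepath (alpha)": 1.0,
    "Holt on Cyclepath (alpha, beta)": 0.9856,
    "SES on drifting toy series (alpha)": 0.5513,
}
for label, value in fitted.items():
    print(f"{label:36s} {value:6.4f} -> {read_smoothing_parameter(value)}")
```

A note like this is a prompt to look closer, not a verdict. A value near 1 says the model is chasing recent data, and it is still up to you to work out which missing component (trend, season, or neither) is responsible.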
Choosing a Final Configuration
For Cyclepath, the additive, non-damped Holt-Winters model is the right choice on the combined evidence: it has the best AIC (960.57) of any model in this comparison, a test MAPE (1.57%) within 0.25 percentage points of the best variant, and a seasonal structure that matches Module 2’s independent, structural confirmation that Cyclepath is additive. The multiplicative variant’s slightly better MAPE (1.32%) and the damped variant’s slightly better MAPE (1.47%) are each bought with a worse AIC and no supporting structural evidence, which is not a good trade. When one configuration wins on AIC, wins or nearly wins on test error, and matches what you already know about the series, that is the configuration to carry forward.
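The same reasoning can be laid out as three explicit checks per model. The sketch below is one hypothetical way to encode it, not a standard selection procedure. The 0.5 percentage-point tolerance for a “competitive” MAPE is an assumption, and the `structural_support` flags are hand-entered from this lesson’s text (Module 2 supports additive seasonality, and nothing supports a decelerating trend).

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    aic: float
    test_mape: float          # percent
    structural_support: bool  # hand-entered judgment, not computed

candidates = [
    Candidate("SES", 1199.74, 18.99, False),
    Candidate("Holt (trend, no season)", 1147.01, 59.93, False),
    Candidate("HW additive", 960.57, 1.57, True),
    Candidate("HW multiplicative", 1028.83, 1.32, False),
    Candidate("HW damped additive", 980.10, 1.47, False),
]

MAPE_TOLERANCE = 0.5  # percentage points; an illustrative assumption

def evidence(c: Candidate, pool: list) -> dict:
    """Return the three checks for one candidate against the whole pool."""
    best_aic = min(x.aic for x in pool)
    best_mape = min(x.test_mape for x in pool)
    return {
        "best AIC": c.aic == best_aic,
        "competitive MAPE": c.test_mape - best_mape <= MAPE_TOLERANCE,
        "structural support": c.structural_support,
    }

winners = []
for c in candidates:
    checks = evidence(c, candidates)
    passed = [label for label, ok in checks.items() if ok]
    print(f"{c.name:28s} {len(passed)}/3  {', '.join(passed) or 'none'}")
    if all(checks.values()):
        winners.append(c.name)

print("carry forward:", winners)  # ['HW additive']
```

Only the additive model passes all three checks. On another series the list of winners could be empty or hold several names, and in either case the decision goes back to judgment rather than to the script.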
Practice Exercises
Exercise 1: Spot the danger in Holt’s AIC
A colleague, working only from AIC scores and never looking at a test-set forecast, tells you Holt’s method is a better fit than SES for Cyclepath. What would you say?
Hint
You would point out that AIC alone cannot be trusted here: Holt’s method does have the better AIC (1147.01 versus 1199.74), but its test-set forecast crashes to 59.93% MAPE, more than three times worse than SES’s 18.99%. AIC measures in-sample fit adjusted for parameter count, not out-of-sample accuracy, and this pair of models is the clearest demonstration in the whole course that the two can point in completely opposite directions. Always check a genuine held-out forecast before trusting an AIC-based ranking.
Exercise 2: Interpret a damping parameter near 0.7
If a damped-trend model on a different series fit a damping parameter of 0.7, what would that suggest about the series, compared to Cyclepath’s fitted 0.995?
Hint
A damping parameter of 0.7 suggests the series’ trend genuinely decelerates over a fairly short horizon: whatever is driving growth or decline is running out of room relatively quickly, so extrapolating the current slope in a straight line would overshoot. Cyclepath’s 0.995 is nearly undamped, meaning its trend keeps going at essentially the same rate indefinitely, consistent with how it was actually built, a fixed slope of 90 per month with no deceleration term at all.
Exercise 3: Building the same table for a different series
You want to run this same five-model comparison on a new series with a strong trend but no seasonality. What would you expect to see differently, and what would you skip?
Hint
You would skip the seasonal Holt-Winters rows entirely, since there is no season to model, so Holt’s method and a damped Holt become the natural top candidates rather than a step on the way to something better. You would also expect Holt’s method to behave well here, unlike on Cyclepath: without a seasonal pattern to misinterpret, its trend term is finally tracking only genuine long-run movement, the situation Lesson 2’s Exercise 2 described.
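A minimal sketch of that expectation, under strong simplifying assumptions: the series is Cyclepath’s trend line alone (9000 plus 90 per month, with no season and no noise), Holt’s recursions are written out by hand in plain Python, and the smoothing parameters are fixed at 0.5 rather than optimized. The function names and the simple initialization are choices made for this illustration, and a flat forecast at the last observed value serves as the reference.

```python
def holt_forecast(y, alpha, beta, horizon):
    """Holt's linear trend method with fixed (not optimized) smoothing parameters."""
    level, trend = y[0], y[1] - y[0]  # simple initialization: an assumption
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Assumption: Cyclepath's trend only, with no seasonal term and no noise
series = [9000 + 90 * t for t in range(96)]
train, test = series[:-12], series[-12:]

holt_mape = mape(test, holt_forecast(train, alpha=0.5, beta=0.5, horizon=12))
naive_mape = mape(test, [train[-1]] * 12)  # flat forecast at the last observed value
print(f"Holt  MAPE = {holt_mape:.2f}%")
print(f"Naive MAPE = {naive_mape:.2f}%")
```

On this idealized series Holt’s forecast continues the line exactly, while the flat forecast falls further behind every month. Real data with noise would not give a zero error. The point is only that with no season to misread, the trend term does the job it was designed for.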
Summary
Laying every model from this module on one table produced the clearest AIC-versus-accuracy disagreement anywhere in this course: Holt’s method scored a better AIC than SES (1147.01 versus 1199.74) while forecasting more than three times worse (59.93% MAPE versus 18.99%). A damped-trend Holt-Winters variant fit a damping parameter of 0.995, essentially undamped, confirming Cyclepath’s real trend never decelerates, and traded a worse AIC (980.10 versus 960.57) for a marginally better test MAPE (1.47% versus 1.57%), not a good trade. The additive, non-damped Holt-Winters model remains the best overall choice: best AIC among all five models, a test MAPE within a fraction of a percentage point of the best variant, and a seasonal structure backed by Module 2’s independent evidence.
Key Concepts
- Damped trend — a damping parameter shrinks the trend’s contribution the further ahead you forecast; near 1 means essentially undamped.
- AIC versus test accuracy — Holt’s better AIC alongside its much worse forecast is this course’s starkest example of the two disagreeing.
- Reading a fitted parameter — near 0 means a fixed pattern fits, near 1 means no long-run structure was found, in between means genuine adaptation.
- Choosing with more than one score — the best configuration wins on AIC, is competitive on test error, and matches independently known structure.
Why This Matters
Every model-selection decision in this course, from Module 5’s AIC-versus-test-error tension to Module 6’s airline-model comparison to this lesson’s Holt-versus-SES contradiction, teaches the same lesson from a different angle: no single number tells the whole story, and the discipline of checking several kinds of evidence together is what protects you from deploying a model that looks fine on paper and fails in production. With a clear winner chosen, the guided project puts Holt-Winters against SARIMA and every baseline from the entire course, side by side, for the final word on this series.
Next Steps
Continue to Lesson 5 - Guided Project: Holt-Winters vs. SARIMA vs. the Baselines
Put every model built across the whole course on one table and forecast a genuine future year.
Continue Building Your Skills
You have now compared every exponential smoothing configuration this module built, and settled on the additive Holt-Winters model with clear, multi-sided justification. The guided project takes that model and asks the biggest question left in the course: how does it compare to SARIMA, the other classical family you have already proven can beat the baseline decisively?