Lesson 4 - Fitting, Comparing, and Choosing a Configuration

Welcome to Fitting, Comparing, and Choosing a Configuration

This module has built up three models in sequence: simple exponential smoothing, which quietly reduced to the naive forecast; Holt’s linear trend, which crashed badly by mistaking a seasonal dip for a real decline; and Holt-Winters, which finally recovered the series’ true shape and beat the baseline. This lesson steps back, lays all of them side by side on the same two numbers, AIC and test MAPE, and adds one more configuration option: a damped trend.

By the end of this lesson, you will be able to:

Explain what a damped trend does and why Cyclepath does not need one
Read a full comparison table across every model this module has built
Point to the single clearest case in this course of AIC and forecast accuracy disagreeing
Justify choosing one final configuration using more than one kind of evidence

Let’s put every model on the same table.

Damped Trend: A Fourth Option

A damped trend shrinks the trend’s influence the further out you forecast, instead of extending it in a perfectly straight line forever. A damping parameter $\phi$, between 0 and 1, controls how quickly the trend fades:

$$\hat{y}_{t+h} = l_t + \left(\sum_{i=1}^{h} \phi^i\right) b_t + s_{t+h-s}$$

When $\phi = 1$, the sum equals $h$, so the trend term simplifies back to $h \times b_t$, exactly Lesson 3’s undamped forecast. When $\phi$ is meaningfully below 1, each additional step ahead adds a little less trend than the one before, so a long-horizon forecast levels off rather than climbing or falling forever. Fit it on Cyclepath:

import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

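y = cyclepath()
# Hold out the final 12 months as the test set.
train, test = y.iloc[:-12], y.iloc[-12:]


def mape(a, f):
    """Mean absolute percentage error of forecast f against actuals a, in percent."""
    return np.mean(np.abs((a - f) / a)) * 100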

hw_damped = ExponentialSmoothing(
    train, trend="add", damped_trend=True, seasonal="add", seasonal_periods=12,
    initialization_method="estimated",
).fit()

print(round(hw_damped.params["damping_trend"], 4))   # 0.995
print(round(mape(test, hw_damped.forecast(12)), 2))    # 1.47
print(round(hw_damped.aic, 2))                          # 980.1

The fitted damping parameter is 0.995, barely below 1, so the model is essentially undamped. That is exactly what you would expect: Cyclepath’s true trend, built with a fixed slope of 90 per month, never decelerates, so the optimizer correctly finds that almost no damping is warranted. The damped model’s test MAPE (1.47%) is slightly better than that of Lesson 3’s non-damped additive model (1.57%), but its AIC (980.1) is clearly worse than the non-damped model’s (960.57). One extra parameter adds only 2 points to AIC’s complexity penalty, so most of the 19.53-point gap reflects a somewhat worse training fit from the damped optimization: the damping parameter cost complexity and earned nothing back in-sample. A damping value this close to 1 on a genuinely linear trend is close to the same signal Modules 1 through 3 kept surfacing: extra flexibility that the data does not actually need shows up as a worse AIC even when it nudges test error slightly in its favor.
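To see how little a damping parameter of 0.995 changes a 12-month forecast, the bracketed sum from the formula above can be evaluated directly. This is a minimal sketch in plain Python; the helper name `trend_multiplier` is made up for illustration and is not part of statsmodels.

```python
def trend_multiplier(phi: float, h: int) -> float:
    """Sum of phi**i for i = 1..h: how many multiples of b_t an h-step forecast adds."""
    return sum(phi ** i for i in range(1, h + 1))


# Compare the undamped case with the value fitted on Cyclepath, at the 12-month horizon.
for phi in (1.0, 0.995):
    print(f"phi={phi}: trend multiplier at h=12 is {trend_multiplier(phi, 12):.2f}")
```

With $\phi = 1$ the multiplier is exactly 12; with $\phi = 0.995$ it comes out at roughly 11.6, so the damped forecast uses nearly all of the trend an undamped one would over the test year.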

The Full Comparison

from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt

models = {
    "SES": SimpleExpSmoothing(train, initialization_method="estimated").fit(),
    "Holt (trend, no season)": Holt(train, initialization_method="estimated").fit(),
    "HW additive": ExponentialSmoothing(
        train, trend="add", seasonal="add", seasonal_periods=12,
        initialization_method="estimated").fit(),
    "HW multiplicative": ExponentialSmoothing(
        train, trend="add", seasonal="mul", seasonal_periods=12,
        initialization_method="estimated").fit(),
    "HW damped additive": hw_damped,
}

for name, m in models.items():
    fc = m.forecast(12)
    print(f"{name:28s} AIC={m.aic:8.2f}  MAPE={mape(test, fc):6.2f}%")

SES                          AIC= 1199.74  MAPE= 18.99%
Holt (trend, no season)      AIC= 1147.01  MAPE= 59.93%
HW additive                  AIC=  960.57  MAPE=  1.57%
HW multiplicative            AIC= 1028.83  MAPE=  1.32%
HW damped additive           AIC=  980.10  MAPE=  1.47%

Read the first two rows together, because they are the single starkest disagreement between AIC and forecast accuracy anywhere in this course. Holt’s method has the better AIC (1147.01 versus SES’s 1199.74), suggesting it fits the training data a little better once its extra parameter is accounted for. But Holt’s test MAPE is more than three times worse (59.93% versus 18.99%). A model that scores better on the metric that only looks backward can still be dramatically worse on the metric that actually matters, forecasting data it has never seen. If you compared these two models by AIC alone, you would deploy the one that crashes.

Five models, two scores each. Holt's row is the clearest AIC-versus-accuracy contradiction in the course. The three Holt-Winters variants all beat the baseline convincingly; additive wins on AIC and matches the series' known structure.
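One way to make disagreements like this visible is to rank the models by each score and list every pair where the lower-AIC model has the higher test MAPE. The sketch below uses only the numbers from the table above; the variable names and the pairwise check are illustrative choices, not a standard procedure.

```python
from itertools import combinations

# (AIC, test MAPE %) copied from the comparison table above.
scores = {
    "SES": (1199.74, 18.99),
    "Holt (trend, no season)": (1147.01, 59.93),
    "HW additive": (960.57, 1.57),
    "HW multiplicative": (1028.83, 1.32),
    "HW damped additive": (980.10, 1.47),
}

# Lower is better for both metrics.
by_aic = sorted(scores, key=lambda name: scores[name][0])
by_mape = sorted(scores, key=lambda name: scores[name][1])

# A pair "disagrees" when the model with the lower AIC has the higher MAPE.
disagreements = []
for a, b in combinations(scores, 2):
    aic_gap = scores[a][0] - scores[b][0]
    mape_gap = scores[a][1] - scores[b][1]
    if aic_gap * mape_gap < 0:
        disagreements.append((a, b, abs(mape_gap)))

# Largest MAPE gap first, so the most costly disagreement leads the list.
disagreements.sort(key=lambda item: item[2], reverse=True)

print("Ranked by AIC: ", by_aic)
print("Ranked by MAPE:", by_mape)
for a, b, gap in disagreements:
    print(f"{a} vs {b}: MAPE gap {gap:.2f} points")
```

The SES and Holt pair leads the list with a gap of about 41 percentage points. The Holt-Winters variants also disagree with one another, but only by a quarter of a point or less, which is why those disagreements are settled by other evidence rather than treated as failures.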

Reading a Fitted Parameter

Across this module, a fitted smoothing parameter has meant one of three things every time:

Near 0: the optimizer found a single fixed value already fits well, and updating over time adds nothing. Lesson 1’s pure-noise toy series, Lesson 2’s clean toy trend, and all three of Holt-Winters’ parameters on Cyclepath landed here.
Near 1: the optimizer found almost no long-run structure worth trusting, and the best strategy is to chase the most recent data as closely as possible. Lesson 1’s SES on Cyclepath (alpha = 1.0) and Lesson 2’s Holt on Cyclepath (alpha = beta = 0.9856) both landed here, for two different reasons: SES because it had no trend term to use instead, Holt because it had no seasonal term and mistook a seasonal dip for real movement.
Somewhere in between: the optimizer found genuine, gradually evolving structure worth adapting to. Lesson 1’s drifting toy series (alpha = 0.5513) landed here.

A fitted parameter is never just a number to report. It is a direct readout of how much real, trackable change the model found in the series, and reading it correctly is what lets you notice a Lesson 2-style failure before you trust its forecast.
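The three readings above can be turned into a quick check to run on any fitted smoothing parameter. This is only a sketch: the function name `read_parameter` and the cut-offs of 0.1 and 0.9 are assumptions chosen for illustration, not thresholds defined by this course or by statsmodels.

```python
def read_parameter(value: float, low: float = 0.1, high: float = 0.9) -> str:
    """Classify a fitted smoothing parameter (hypothetical cut-offs low and high)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("smoothing parameters lie between 0 and 1")
    if value <= low:
        return "near 0: a fixed value already fits; updating adds nothing"
    if value >= high:
        return "near 1: chasing recent data; check for missing structure"
    return "in between: adapting to gradually evolving structure"


# Fitted values quoted earlier in this module.
fitted = {
    "SES alpha on Cyclepath (Lesson 1)": 1.0,
    "Holt alpha on Cyclepath (Lesson 2)": 0.9856,
    "SES alpha on the drifting toy series (Lesson 1)": 0.5513,
}

for label, value in fitted.items():
    print(f"{label}: {value} -> {read_parameter(value)}")
```

A "near 1" reading is not a verdict by itself; it is a prompt to ask which component the model lacks, as with Holt’s missing seasonal term.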

Choosing a Final Configuration

For Cyclepath, the additive, non-damped Holt-Winters model is the right choice on every count that matters: the best AIC (960.57) of any model in this comparison, a test MAPE (1.57%) close to the best of any variant, and a seasonal structure that matches Module 2’s independent, structural confirmation that Cyclepath is additive. The multiplicative variant’s slightly better MAPE (1.32%) and the damped variant’s slightly better MAPE (1.47%) are each bought with a worse AIC and no supporting structural evidence, which is not a good trade. When one configuration wins on AIC, wins or nearly wins on test error, and matches what you already know about the series, that is the configuration to carry forward.
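The three-part argument above can be written down as an explicit checklist. The sketch below reuses the table’s numbers for the Holt-Winters variants; the 0.5-point MAPE tolerance and all names in it are hypothetical, chosen only to show the shape of the check.

```python
# AIC, test MAPE (%), and seasonal form for each Holt-Winters variant, from the table.
candidates = {
    "HW additive": {"aic": 960.57, "mape": 1.57, "seasonal": "add"},
    "HW multiplicative": {"aic": 1028.83, "mape": 1.32, "seasonal": "mul"},
    "HW damped additive": {"aic": 980.10, "mape": 1.47, "seasonal": "add"},
}

KNOWN_SEASONAL = "add"  # the structure this lesson attributes to Module 2's analysis
MAPE_TOLERANCE = 0.5    # hypothetical: "competitive" = within 0.5 points of the best

best_aic_name = min(candidates, key=lambda name: candidates[name]["aic"])
best_mape = min(c["mape"] for c in candidates.values())


def evidence(name: str) -> dict:
    """Return the three yes/no checks for one candidate."""
    c = candidates[name]
    return {
        "best_aic": name == best_aic_name,
        "mape_competitive": c["mape"] - best_mape <= MAPE_TOLERANCE,
        "matches_structure": c["seasonal"] == KNOWN_SEASONAL,
    }


for name in candidates:
    print(name, evidence(name))

# Keep only the candidates that pass every check.
chosen = [name for name in candidates if all(evidence(name).values())]
print("Passes all three checks:", chosen)
```

Only the additive, non-damped model passes all three checks. Tightening or loosening the tolerance changes which variants count as competitive on MAPE, so the value should be agreed on before looking at the results.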

Practice Exercises

Exercise 1: Spot the danger in Holt’s AIC

A colleague, working only from AIC scores and never looking at a test-set forecast, tells you Holt’s method is a better fit than SES for Cyclepath. What would you say?

Hint

You would point out that AIC alone cannot be trusted here: Holt’s method does have the better AIC (1147.01 versus 1199.74), but its test-set forecast crashes to 59.93% MAPE, more than three times worse than SES’s 18.99%. AIC measures in-sample fit adjusted for parameter count, not out-of-sample accuracy, and this pair of models is the clearest demonstration in the whole course that the two can point in completely opposite directions. Always check a genuine held-out forecast before trusting an AIC-based ranking.

Exercise 2: Interpret a damping parameter near 0.7

If a damped-trend model on a different series fit a damping parameter of 0.7, what would that suggest about the series, compared to Cyclepath’s fitted 0.995?

Hint

A damping parameter of 0.7 suggests the series’ trend genuinely decelerates over a fairly short horizon: whatever is driving growth or decline is running out of room relatively quickly, so extrapolating the current slope in a straight line would overshoot. Cyclepath’s 0.995 is nearly undamped, meaning its trend keeps going at essentially the same rate indefinitely, consistent with how it was actually built, a fixed slope of 90 per month with no deceleration term at all.
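The contrast between the two values can be made concrete with the closed form of the geometric sum in the damped forecast. This is a standard identity, worked here for the two damping values in this exercise:

```latex
\sum_{i=1}^{h} \phi^i
  = \frac{\phi\,(1 - \phi^{h})}{1 - \phi}
  \;\xrightarrow{\;h \to \infty\;}\;
  \frac{\phi}{1 - \phi},
  \qquad 0 < \phi < 1

\phi = 0.7:\quad \frac{0.7}{0.3} \approx 2.33
\qquad\qquad
\phi = 0.995:\quad \frac{0.995}{0.005} = 199
```

With $\phi = 0.7$, the total trend contribution never exceeds about $2.33 \times b_t$ however far ahead you forecast, so the forecast flattens within a few steps. With $\phi = 0.995$, the ceiling is $199 \times b_t$, far beyond anything a 12-month horizon reaches, which is why the Cyclepath forecast is nearly indistinguishable from a straight-line trend.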

Exercise 3: Building the same table for a different series

You want to run this same five-model comparison on a new series with a strong trend but no seasonality. What would you expect to see differently, and what would you skip?

Hint

You would skip the seasonal Holt-Winters rows entirely, since there is no season to model, which makes Holt’s method (or a damped Holt) the natural top candidate rather than a step on the way to something better. You would also expect Holt’s method to behave well here, unlike on Cyclepath: without a seasonal pattern to misinterpret, its trend term finally tracks only genuine long-run movement, the situation Lesson 2’s Exercise 2 described.

Summary

Laying every model from this module on one table produced the clearest AIC-versus-accuracy disagreement anywhere in this course: Holt’s method scored a better AIC than SES (1147.01 versus 1199.74) while forecasting more than three times worse (59.93% MAPE versus 18.99%). A damped-trend Holt-Winters variant fit a damping parameter of 0.995, essentially undamped, confirming that Cyclepath’s real trend never decelerates. It accepted a worse AIC (980.10 versus 960.57) in exchange for a marginally better test MAPE (1.47% versus 1.57%), which is not a good trade. The additive, non-damped Holt-Winters model remains the best overall choice: best AIC among all five models, a test MAPE within a fraction of a percentage point of the best variant, and a seasonal structure backed by Module 2’s independent evidence.

Key Concepts

Damped trend — a damping parameter $\phi$ shrinks the trend’s contribution the further ahead you forecast; near 1 means essentially undamped.
AIC versus test accuracy — Holt’s better AIC alongside its much worse forecast is this course’s starkest example of the two disagreeing.
Reading a fitted parameter — near 0 means a fixed pattern fits, near 1 means no long-run structure was found, in between means genuine adaptation.
Choosing with more than one score — the best configuration wins on AIC, is competitive on test error, and matches independently known structure.

Why This Matters

Every model-selection decision in this course, from Module 5’s AIC-versus-test-error tension to Module 6’s airline-model comparison to this lesson’s Holt-versus-SES contradiction, teaches the same lesson from a different angle: no single number tells the whole story, and the discipline of checking several kinds of evidence together is what protects you from deploying a model that looks fine on paper and fails in production. With a clear winner chosen, the guided project puts Holt-Winters against SARIMA and every baseline from the entire course, side by side, for the final word on this series.

Next Steps

Continue to Lesson 5 - Guided Project: Holt-Winters vs. SARIMA vs. the Baselines

Put every model built across the whole course on one table and forecast a genuine future year.

Back to Module Overview

Return to the Exponential Smoothing module overview

Continue Building Your Skills

You have now compared every exponential smoothing configuration this module built, and settled on the additive Holt-Winters model with clear, multi-sided justification. The guided project takes that model and asks the biggest question left in the course: how does it compare to SARIMA, the other classical family you have already proven can beat the baseline decisively?
