Lesson 2 - Holt's Linear Trend

Welcome to Holt’s Linear Trend

Lesson 1 ended with simple exponential smoothing quietly giving up on Cyclepath: its fitted alpha went to 1, and its forecast became the naive baseline in disguise. The obvious next step is to give the model a trend to work with. Holt’s linear trend method does exactly that, adding a second smoothed component that tracks the slope of the series alongside its level. On a series that actually has a clean trend, this works well. On Cyclepath, it does not, and the way it fails is one of the most useful lessons in this module.

By the end of this lesson, you will be able to:

  • Write Holt’s level and trend update formulas
  • Fit Holt’s method with statsmodels and recover a known trend on a toy series
  • Explain why Holt’s method can mistake a seasonal swing for a real trend
  • Read an AIC-versus-forecast-accuracy disagreement as a warning, not just a curiosity

Let’s add the trend term.


The Holt Formulas

Holt’s method keeps two smoothed values instead of one: a level $l_t$ and a trend $b_t$, each with its own smoothing parameter:

$$l_t = \alpha\, y_t + (1 - \alpha)(l_{t-1} + b_{t-1})$$
$$b_t = \beta\, (l_t - l_{t-1}) + (1 - \beta)\, b_{t-1}$$
$$\hat{y}_{t+h} = l_t + h\, b_t$$

The level update looks like SES’s, except it compares the new observation against the previous level plus the previous trend, the best guess of where the series should be if the trend held. The trend update smooths the change in level from one step to the next, the same way the level smooths the series itself. The forecast is no longer flat: it is a straight line, extending the current level forward at the current trend’s slope for every step $h$ into the future.
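The three formulas translate almost line for line into code. The sketch below is a minimal, hand-rolled version in plain Python, not statsmodels’ implementation; the function names `holt_filter` and `holt_forecast` are hypothetical, and it assumes the caller supplies the starting level and trend (statsmodels estimates those itself when `initialization_method="estimated"`).

```python
def holt_filter(y, alpha, beta, level0, trend0):
    """Run Holt's level and trend recursions over y and return the final (level, trend).

    Assumption for this sketch: level0/trend0 are the state *before* the first observation.
    """
    level, trend = level0, trend0
    for obs in y:
        prev_level = level
        # Level: blend the new observation with last step's projection (level + trend).
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


def holt_forecast(level, trend, horizon):
    """Straight-line forecast l_t + h * b_t for h = 1..horizon."""
    return [level + h * trend for h in range(1, horizon + 1)]


# Perfectly linear data that matches the starting state: the recursions track it exactly.
l, b = holt_filter([12.0, 14.0, 16.0], alpha=0.5, beta=0.5, level0=10.0, trend0=2.0)
print(l, b)                     # 16.0 2.0
print(holt_forecast(l, b, 3))   # [18.0, 20.0, 22.0]

# alpha = beta = 0: the observations are ignored and the state walks along a fixed line.
l0, b0 = holt_filter([100.0, -50.0, 7.0], alpha=0.0, beta=0.0, level0=5.0, trend0=1.0)
print(l0, b0)                   # 8.0 1.0
```

The second call is the alpha = beta = 0 case that shows up on the toy series: whatever the data do, the state just advances one trend step per observation.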


Recovering a Known Trend

Build a toy series with a genuine linear trend, slope 2.5 per step, and small noise:

import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import Holt

rng = np.random.default_rng(3)
n = 60
t = np.arange(n)
toy_trend = pd.Series(100 + 2.5 * t + rng.normal(0, 4, n))

holt_toy = Holt(toy_trend, initialization_method="estimated").fit()
print(round(holt_toy.params["smoothing_level"], 3))   # 0.0
print(round(holt_toy.params["smoothing_trend"], 3))    # 0.0
print(np.round(holt_toy.forecast(5).values, 2))
# [250.05 252.56 255.06 257.56 260.07]

Both smoothing parameters go to 0, the same signature as Lesson 1’s pure-noise toy series: with a genuinely constant slope and only mild noise, there is nothing to adapt to, so the best fit is a single fixed line through the whole series, which is exactly what alpha = beta = 0 produces. The forecast climbs by about 2.5 per step ($260.07 - 250.05 = 10.02$ over 4 steps, or 2.505 per step), matching the true slope almost exactly. The first forecast value (250.05) predicts step $t = 60$, where the true trend line sits at $100 + 2.5 \times 60 = 250$, so it lands almost exactly on target. On a series that actually behaves the way Holt’s method assumes, it recovers the truth cleanly.
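Because alpha = beta = 0 turns Holt’s method into a single fixed line, one quick sanity check (not part of the original lesson) is to fit an ordinary least-squares line to the same toy series with `np.polyfit` and compare slopes. If the optimizer really settled at zero, the two lines should be close, but the exact numbers depend on how statsmodels estimates the initial level and trend, so treat any agreement as approximate.

```python
import numpy as np

# Rebuild the lesson's toy series (same seed, same formula).
rng = np.random.default_rng(3)
n = 60
t = np.arange(n)
toy_trend = 100 + 2.5 * t + rng.normal(0, 4, n)

# Ordinary least-squares line: for deg=1, np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(t, toy_trend, deg=1)
print(round(slope, 3))

# Extend the fitted line over the same 5-step horizon (t = 60..64).
future_t = np.arange(n, n + 5)
ols_forecast = intercept + slope * future_t
print(np.round(ols_forecast, 2))
```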


Applying Holt’s Method to Cyclepath

Now fit the same model on Cyclepath’s training set:

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    # Yearly cycle: peaks in July (t % 12 == 6), bottoms out in January (t % 12 == 0)
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()
train, test = y.iloc[:-12], y.iloc[-12:]

holt = Holt(train, initialization_method="estimated").fit()
print(round(holt.params["smoothing_level"], 4))   # 0.9856
print(round(holt.params["smoothing_trend"], 4))    # 0.9856
print(round(holt.level.iloc[-1], 1))                 # 13559.9
print(round(holt.trend.iloc[-1], 1))                 # -1091.5

Both smoothing parameters land at 0.9856, close to their maximum of 1, meaning the model trusts almost nothing but the most recent handful of points for both its level and its trend. The fitted trend at the end of training is -1091.5 per month, a steep decline. To see where that number comes from, look at how the training set ends:

print(train.iloc[-8:].tolist())
# [17290, 18757, 19439, 18773, 17960, 16058, 14653, 13565]

Training ends in December 2022, partway through the downswing of Cyclepath’s yearly cycle: ridership peaked at 19,439 in July, then descended steadily to 13,565 in December on its way into the winter trough. That descent is exactly the shape the seasonal cycle produces every single year, and it has nothing to do with a genuine long-run decline. But Holt’s method has no seasonal term. All it can see is that the last several months went sharply down, and with beta pushed to 0.9856, it treats that recent local slope as the trend and extrapolates it in a straight line:

fc = holt.forecast(12)
print(np.round(fc.values, 1).tolist())
# [12468.4, 11376.9, 10285.3, 9193.8, 8102.3, 7010.7, 5919.2, 4827.7, 3736.1, 2644.6, 1553.1, 461.5]

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return np.mean(np.abs((actual - forecast) / actual)) * 100

print(round(mape(test, fc), 2))   # 59.93

The forecast falls by roughly 1,091 every month, all the way down to 461.5 by December 2023, an absurd figure for a bike-share series that recorded 13,565 trips in December 2022, just one year earlier. The test MAPE is 59.93%, dramatically worse than Lesson 1’s SES result (18.99%) and the naive baseline (19.0%) that Holt’s method was supposed to improve on.
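Replaying Holt’s recursions by hand on just the eight training values printed above shows how little history the fitted parameters actually use. This sketch starts from an assumed state (level equal to the May 2022 value, zero trend) rather than the model’s real fitted state, yet with alpha = beta = 0.9856 that starting point washes out within a step or two, and the trend swings from strongly positive during the climb toward July to strongly negative during the descent into December.

```python
# Last eight training values (May-Dec 2022), copied from the printout above.
tail = [17290, 18757, 19439, 18773, 17960, 16058, 14653, 13565]

alpha = beta = 0.9856                  # the smoothing parameters statsmodels fitted
level, trend = float(tail[0]), 0.0     # assumed starting state: first value, flat trend

for month, obs in enumerate(tail[1:], start=1):
    prev_level = level
    level = alpha * obs + (1 - alpha) * (prev_level + trend)
    trend = beta * (level - prev_level) + (1 - beta) * trend
    print(month, round(level, 1), round(trend, 1))

# Straight-line extrapolation over the 12-month test horizon.
forecast = [level + h * trend for h in range(1, 13)]
print(round(forecast[-1], 1))
```

The final level and trend land close to the statsmodels values (13559.9 and -1091.5), which is the point: with both parameters this near 1, the fitted state is almost entirely a function of the last few months.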

A line chart of Cyclepath's training data ending at a steep seasonal descent from a July peak near 19,400 down to 13,565 in December. From that last point, a straight forecast line labeled 'Holt's extrapolated trend' continues the same steep downward slope for 12 more months, falling below 1,000 by the end, while a second line labeled 'actual 2023' shows the real values rising and falling with the season, staying well above 12,000 throughout. The two lines diverge dramatically.
Holt's method reads the last few months of a normal seasonal descent as a real trend and extrapolates it in a straight line, crashing toward zero. The actual 2023 values follow the season instead, staying in the same range as every prior year.

Adding a component is not automatically an improvement

It would be easy to assume that giving a model more structure, a trend term where there was none before, can only help. This is the clearest counterexample so far in the course: Holt’s method has a real capability SES lacks, and on this series that capability actively hurts, because the model has no way to tell a genuine long-run trend apart from an ordinary seasonal dip. A new component only helps when the series actually needs that component and nothing is left for it to misinterpret. Lesson 3 adds the missing piece, a seasonal term, so the trend component finally has only the real trend left to track.


AIC Says Something Different

Check Holt’s AIC against SES’s from Lesson 1:

from statsmodels.tsa.holtwinters import SimpleExpSmoothing

ses = SimpleExpSmoothing(train, initialization_method="estimated").fit()
print(round(ses.aic, 2))    # 1199.74
print(round(holt.aic, 2))    # 1147.01

Holt’s AIC (1147.01) is better (lower) than SES’s (1199.74), even though Holt’s test forecast is catastrophically worse. AIC measures how well a model fits the data it was trained on, adjusted for how many parameters it used, not how well it forecasts data it has never seen. For exponential smoothing, that in-sample fit is built from one-step-ahead errors, and one step ahead, extending the most recent slope is a decent guess, because a seasonal climb or descent usually continues for several months in a row. Holt chases recent moves aggressively, so it tracks the training data’s ups and downs more closely and earns a better in-sample score. Twelve steps ahead, that same slope compounds month after month, which is precisely the behavior that destroys its forecast the moment the series turns a corner it has not seen yet. This is the same warning from Modules 5 and 6, now in its most extreme form: never trust a fit-quality score without also checking an honest, held-out forecast.
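The gap between one-step-ahead fit and multi-step accuracy can be seen without statsmodels. This toy sketch (an illustration, not the lesson’s model) uses a noiseless sine wave with a 12-step period and the extreme case of Holt’s method with alpha = beta = 1, where the level is just the latest value and the trend is just the latest change. It compares mean absolute error one step ahead, the horizon that in-sample fit scores are typically built on for exponential smoothing, with error twelve steps ahead. The helper name `slope_chasing_forecast` is hypothetical.

```python
import numpy as np

# Toy series: pure yearly-style seasonality (period 12), no trend, no noise.
y = 3200 * np.sin(2 * np.pi * np.arange(120) / 12)

def slope_chasing_forecast(series, origin, h):
    """Holt's forecast with alpha = beta = 1: level = y[origin], trend = y[origin] - y[origin - 1]."""
    return series[origin] + h * (series[origin] - series[origin - 1])

# Forecast origins that have one previous point and 12 future points available.
origins = range(1, len(y) - 12)

one_step = [abs(y[i + 1] - slope_chasing_forecast(y, i, 1)) for i in origins]
twelve_step = [abs(y[i + 12] - slope_chasing_forecast(y, i, 12)) for i in origins]
flat_one_step = [abs(y[i + 1] - y[i]) for i in origins]   # no-trend (naive) forecast, for comparison

mae_1 = np.mean(one_step)
mae_12 = np.mean(twelve_step)
mae_flat_1 = np.mean(flat_one_step)
print(round(mae_1, 1), round(mae_flat_1, 1), round(mae_12, 1))
```

The slope-chasing forecaster beats the flat forecast one step ahead, yet misses by more than ten times its own one-step error at twelve steps, the same pattern as Holt’s lower AIC sitting alongside its collapsed test forecast.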


Practice Exercises

Exercise 1: Predict the fit on a longer toy series

If the toy trending series from this lesson were extended another 60 points with the same true slope of 2.5 and similar noise, would you expect Holt’s fitted alpha and beta to change much?

Hint

No, you would expect them to stay close to 0. The series’ underlying structure has not changed (a constant slope with mild noise around it), so the same reasoning that pushed alpha and beta to 0 the first time still applies: there is nothing new to adapt to, and a single fixed line remains the best fit. More data would only make the fitted slope estimate more precise, not change the qualitative conclusion that smoothing is unnecessary here.
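The claim that more data only sharpens the slope can be made concrete under one simplifying assumption: with alpha = beta = 0, Holt’s fitted values form a straight line in $t$, so the fit behaves like an ordinary least-squares line, whose slope has standard error $\sigma / \sqrt{\sum_t (t - \bar{t})^2}$. The sketch evaluates that textbook formula for 60 and 120 equally spaced points at the toy series’ noise level $\sigma = 4$; it is an approximation for intuition, not something statsmodels reports, and `ols_slope_se` is a hypothetical helper.

```python
import math

def ols_slope_se(n, sigma):
    """Standard error of an OLS slope fitted to t = 0..n-1 with i.i.d. noise of std sigma."""
    sxx = n * (n**2 - 1) / 12   # sum of (t - mean(t))**2 for equally spaced t
    return sigma / math.sqrt(sxx)

se_60 = ols_slope_se(60, sigma=4.0)
se_120 = ols_slope_se(120, sigma=4.0)
print(round(se_60, 4), round(se_120, 4))   # 0.0298 0.0105
print(round(se_60 / se_120, 2))            # 2.83
```

Doubling the length cuts the slope’s standard error by roughly $2^{3/2} \approx 2.83$: a tighter estimate of the same slope, with no reason for alpha or beta to move away from 0.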

Exercise 2: Where would Holt’s method work well on a real series?

Name a kind of real series where Holt’s linear trend method, without a seasonal term, would likely perform well.

Hint

Any series with a genuine, evolving trend and no seasonality: for example, a company’s cumulative registered users over its first few years, which climbs steadily but does not repeat a yearly or weekly pattern, or a slowly rising temperature-adjusted energy efficiency metric measured monthly with no seasonal cycle. The failure in this lesson is specifically about seasonality being mistaken for trend. Remove the seasonality, and the series has exactly the structure Holt’s method assumes.
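As an illustration of the "evolving trend, no seasonality" case, the sketch below runs hand-rolled Holt recursions (a hypothetical `holt_filter` helper, with alpha = 0.5 and beta = 0.3 chosen by hand rather than fitted) over a noiseless series whose slope jumps from 2 to 5 halfway through. With no seasonal swing to misread, the trend component simply moves to the new slope.

```python
def holt_filter(y, alpha, beta, level0, trend0):
    """Hand-rolled Holt recursions; returns the final (level, trend)."""
    level, trend = level0, trend0
    for obs in y:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend

# A noiseless, non-seasonal series whose slope changes from 2 to 5 halfway through.
first_half = [2.0 * t for t in range(50)]
second_half = [first_half[-1] + 5.0 * (k + 1) for k in range(50)]
y = first_half + second_half

# Start on the first line (level0 + trend0 equals y[0]) so only the slope change matters.
level, trend = holt_filter(y, alpha=0.5, beta=0.3, level0=-2.0, trend0=2.0)
print(round(trend, 3))            # 5.0
print(round(level, 1), y[-1])     # 348.0 348.0
```

On real, noisy data the fitted beta would govern how quickly that switch happens; the noiseless series only shows that the mechanism works when a trend is the only structure present.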

Exercise 3: Why did beta end up so close to alpha?

Holt’s fitted alpha and beta were both about 0.9856, essentially identical. Is that a coincidence?

Hint

Not really a coincidence, more a symptom of the same underlying problem. With no seasonal term, both the level and the trend are being asked to explain the same steep seasonal descent, and the optimizer’s best available response, for both parameters, is to weight the most recent few points as heavily as possible and discount everything older. Since the same local pattern is driving both updates, it is not surprising the optimizer pushes both parameters to a similar extreme rather than landing on two very different values.


Summary

Holt’s linear trend method adds a smoothed trend $b_t = \beta(l_t - l_{t-1}) + (1-\beta) b_{t-1}$ alongside the level and forecasts $\hat{y}_{t+h} = l_t + h\, b_t$, a straight line rather than SES’s flat one. On a toy series with a genuine linear trend, it recovered the true slope almost exactly, with both smoothing parameters at 0. On Cyclepath, both parameters were pushed to 0.9856, and the model read the ordinary seasonal descent from a July peak (19,439) to December (13,565) as a real trend, extrapolating a straight-line crash to 461.5 by the end of the test year: a 59.93% MAPE, far worse than the naive baseline. Its AIC (1147.01) was nevertheless better than SES’s (1199.74), an extreme demonstration that AIC and out-of-sample accuracy can point in opposite directions.

Key Concepts

  • Holt’s trend update: $b_t = \beta(l_t - l_{t-1}) + (1-\beta) b_{t-1}$; forecasts extend the level in a straight line at the current slope.
  • Trend versus seasonal dip: with no seasonal term, a model can mistake an ordinary seasonal decline for a genuine long-run trend.
  • More structure is not automatically better: a component that does not match what the series actually needs can make forecasts worse, not better.
  • AIC is not a forecast guarantee: a better in-sample fit score can coexist with a much worse held-out forecast.

Why This Matters

This lesson’s failure is more instructive than a clean success would have been. It shows precisely why SARIMA needed seasonal terms in Module 6, and it previews exactly what exponential smoothing needs next: a component that absorbs the seasonal pattern, so the trend component is left to track only genuine long-run movement. It also reinforces, in the starkest terms this course has shown, why a fit-quality score like AIC can never substitute for checking a real forecast against real held-out data. Next, Holt-Winters adds the seasonal term, and the trend component finally gets to do the job it was designed for.


Next Steps

Continue to Lesson 3 - Holt-Winters Seasonal

Add a seasonal component to exponential smoothing, and watch the level, trend, and season converge on the truth.

Back to Module Overview

Return to the Exponential Smoothing module overview


Continue Building Your Skills

You have now seen exponential smoothing fail twice, once quietly (SES reducing to naive) and once dramatically (Holt’s method crashing toward zero), and both failures trace back to the same missing piece: a seasonal component. Next, Holt-Winters adds exactly that, and the change in behavior is not subtle.

Sponsor

Keep DATATWEETS free. Help fund practical data, AI, and engineering lessons for learners worldwide.
