Lesson 1 - Simple Exponential Smoothing

Welcome to Simple Exponential Smoothing

Modules 5 and 6 built one whole forecasting family around autocorrelation: AR, MA, ARMA, ARIMA, and SARIMA. This module builds a second, different family. Exponential smoothing does not model autocorrelation directly. Instead, it forecasts with a weighted average of past values, where recent values get more weight and older ones get exponentially less. Simple exponential smoothing (SES) is the simplest member: it assumes the series has a level that can shift, but no trend and no seasonality.

By the end of this lesson, you will be able to:

Write the SES update formula and explain what the smoothing parameter alpha controls
Compute a few steps of SES by hand
Fit SES with statsmodels and read the fitted alpha
Explain why SES fit to a trending series reduces to the naive forecast

Let’s build the simplest smoother first.

The SES Formula

SES keeps a single running estimate of the series’ level, called $l_t$. Each new observation nudges that level a little, and the forecast for every future point is just the most recent level:

$$l_t = \alpha\, y_t + (1 - \alpha)\, l_{t-1}$$

$$\hat{y}_{t+h} = l_t \quad \text{for every } h > 0$$

The parameter $\alpha$ (alpha), between 0 and 1, controls the trade-off between reacting to new data and trusting history. A high $\alpha$ means the level moves quickly toward the newest observation, trusting recent data heavily. A low $\alpha$ means the level barely moves, trusting the long history built up so far. Because each update mixes in the previous level, and that previous level already mixed in the one before it, the influence of any single past observation shrinks by a factor of $(1 - \alpha)$ every step after it arrives, so $y_{t-k}$ enters $l_t$ with weight $\alpha (1 - \alpha)^k$. That is the “exponential” in exponential smoothing: the weights on past observations decay exponentially, not the forecast itself.
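The decay of the weights can be checked numerically. The sketch below is a minimal illustration rather than part of the lesson's own code: it unrolls the recursion into an explicit weighted sum, assuming the level is initialized at the first observation. The helper names `ses_level` and `unrolled_level` are hypothetical, introduced only for this example.

```python
def ses_level(obs, alpha):
    """Run the SES recursion, starting the level at the first observation."""
    level = obs[0]
    for x in obs[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def unrolled_level(obs, alpha):
    """Same level, written as an explicit weighted sum of the observations."""
    n = len(obs) - 1  # number of updates applied after the initial level
    # The observation k steps back from the end gets weight alpha * (1 - alpha)**k.
    weights = [alpha * (1 - alpha) ** k for k in range(n)]
    recent_first = obs[:0:-1]  # obs[n], obs[n-1], ..., obs[1]
    weighted = sum(w * x for w, x in zip(weights, recent_first))
    # Whatever weight is left over stays on the initial level.
    return weighted + (1 - alpha) ** n * obs[0], weights


obs = [50, 54, 49, 53, 51, 55]
level, weights = unrolled_level(obs, alpha=0.3)
print([round(w, 4) for w in weights])
# [0.3, 0.21, 0.147, 0.1029, 0.072]
print(round(level, 3), round(ses_level(obs, 0.3), 3))
# 52.336 52.336
```

Each weight is 0.7 times the one before it, and the explicit sum reproduces the recursive level, which is the sense in which the weights, not the forecast, decay exponentially.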

A Manual Example

Take six observations and a fixed $\alpha = 0.3$, and compute the level step by step:

obs = [50, 54, 49, 53, 51, 55]
alpha = 0.3

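level = obs[0]  # initialize the level at the first observation
levels = [level]
for x in obs[1:]:
    # move the level a fraction alpha of the way toward the new observation
    level = alpha * x + (1 - alpha) * level
    levels.append(round(level, 3))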

print(levels)
# [50, 51.2, 50.54, 51.278, 51.195, 52.336]

The level starts at the first observation (50), then drifts toward each new value by 30% of the gap each step. After 54 arrives, the level moves from 50 to $0.3 \times 54 + 0.7 \times 50 = 51.2$. After 49 arrives, it moves back down to 50.54. It never jumps all the way to the newest value. It always takes a fraction $\alpha$ of the step. Forecasting from here for any number of steps ahead just repeats the last level, 52.336, forever. A flat line is the only shape SES can produce, because it has no mechanism to represent a trend or seasonality.
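The forecasting step can be sketched the same way. `ses_forecast` is a hypothetical helper written for this illustration (it is not a statsmodels function), and it assumes the level starts at the first observation, as in the manual example. Its only purpose is to show that every horizon receives the same value.

```python
def ses_forecast(obs, alpha, horizon):
    """Flat SES forecast: the final level repeated `horizon` times."""
    level = obs[0]  # assumption: initialize at the first observation
    for x in obs[1:]:
        level = alpha * x + (1 - alpha) * level
    return [round(level, 3)] * horizon


forecast = ses_forecast([50, 54, 49, 53, 51, 55], alpha=0.3, horizon=4)
print(forecast)
# [52.336, 52.336, 52.336, 52.336]
```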

What Alpha Means, on Two Toy Series

Rather than guessing $\alpha$, you let statsmodels fit it by minimizing the squared one-step-ahead forecast errors on the training data. Two small toy series show the two extremes:

import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(5)
pure_noise = pd.Series(50 + rng.normal(0, 3, 60))

ses_noise = SimpleExpSmoothing(pure_noise, initialization_method="estimated").fit()
print(round(ses_noise.params["smoothing_level"], 4))   # 0.0
print(np.round(ses_noise.forecast(5).values, 2))        # [49.36 49.36 49.36 49.36 49.36]

rng2 = np.random.default_rng(9)
steps = rng2.normal(0, 5, 60)
drifting = pd.Series(50 + np.cumsum(steps) * 0.3 + rng2.normal(0, 1, 60))

ses_drift = SimpleExpSmoothing(drifting, initialization_method="estimated").fit()
print(round(ses_drift.params["smoothing_level"], 4))    # 0.5513

The first series is pure noise scattered around a constant level, with nothing real to track. The fitted $\alpha$ is 0.0: the level never moves from its estimated starting value, so the forecast is effectively the historical average. The optimizer ignores every individual fluctuation because reacting to noise only adds error. The second series genuinely wanders, with each step building on the last. The fitted $\alpha$ is 0.5513, roughly splitting the difference between trusting history and trusting the newest point, because here the newest point really does carry information the average would miss. Reading a fitted $\alpha$ is reading how much real, trackable movement the optimizer found in the series.
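To make "fitting alpha" concrete, here is a rough sketch of the idea using a coarse grid search in place of a numerical optimizer. It is an illustration under stated assumptions, not a description of statsmodels internals: the initial level is fixed by hand rather than estimated, the helper names `one_step_sse` and `best_alpha` are made up for this example, and the two simulated series are new ones, not the toy series fitted above.

```python
import numpy as np


def one_step_sse(y, alpha, initial_level):
    """Sum of squared one-step-ahead errors for a given alpha."""
    level, sse = initial_level, 0.0
    for obs in y:
        sse += (obs - level) ** 2  # the forecast for this step is the current level
        level = alpha * obs + (1 - alpha) * level
    return sse


def best_alpha(y, initial_level, grid=np.linspace(0, 1, 11)):
    """Pick the grid value of alpha with the smallest one-step SSE."""
    return float(min(grid, key=lambda a: one_step_sse(y, a, initial_level)))


rng = np.random.default_rng(0)
noise = 50 + rng.normal(0, 3, 500)            # fluctuations around a fixed level
walk = 50 + np.cumsum(rng.normal(0, 3, 500))  # each step builds on the last

# Assumption: start the noise series at its mean and the walk at its first value.
alpha_noise = best_alpha(noise, initial_level=noise.mean())
alpha_walk = best_alpha(walk, initial_level=walk[0])
print(alpha_noise, alpha_walk)
```

On a run like this, the noise series should favor an alpha at or near 0 and the random walk an alpha at or near 1, mirroring the two ends of the dial described in this section.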

Applying SES to Cyclepath

Now fit SES on the trending, seasonal Cyclepath series and see what happens:

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()
train, test = y.iloc[:-12], y.iloc[-12:]

ses = SimpleExpSmoothing(train, initialization_method="estimated").fit()
print(round(ses.params["smoothing_level"], 4))   # 1.0
print(round(ses.forecast(12).iloc[0], 1))          # 13565.0

The fitted $\alpha$ is exactly 1.0, the opposite extreme from the pure-noise toy series. At $\alpha = 1$, the SES update collapses to $l_t = y_t$: no smoothing at all, just the latest observed value carried forward as the forecast for every future month. Compare that to the naive baseline from Module 1, which forecasts every future month as the last observed value:

def mape(a, f):
    # mean absolute percentage error, in percent
    return np.mean(np.abs((a - f) / a)) * 100

naive = pd.Series(train.iloc[-1], index=test.index)
ses_fc = ses.forecast(12)

print(np.allclose(ses_fc.values, naive.values))   # True
print(round(mape(test, ses_fc), 2))                 # 18.99

The two forecasts are numerically identical, and SES’s test MAPE (18.99%) matches Module 1’s naive baseline (19.0%) to within rounding. This is not a coincidence, and it is not a bug. Cyclepath has a real trend, and SES has no term for a trend. Given a level-only model and a series that keeps climbing, the optimizer’s best move is to abandon smoothing entirely and just track the latest point, which is exactly what the naive forecast does. SES did not fail by producing something worse than naive. It failed by discovering that naive is the best a level-only model can do here.
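One way to see why the optimizer ends up at 1 is to strip Cyclepath down to its trend alone. The sketch below is a simplification, not the lesson's model fit: it uses only the deterministic line 9000 + 90t from the `cyclepath()` generator (no seasonality, no noise), extends it to 300 points so the lag has time to settle, and relies on a hypothetical helper `final_one_step_error`. On a pure line with slope b, the SES level lags behind the data and the one-step error settles at b / alpha, which is smallest when alpha = 1.

```python
import numpy as np


def final_one_step_error(y, alpha):
    """Run SES from the first observation and return the last one-step error."""
    level = y[0]
    error = 0.0
    for obs in y[1:]:
        error = obs - level  # the forecast was the previous level
        level = alpha * obs + (1 - alpha) * level
    return error


slope = 90
trend_only = 9000 + slope * np.arange(300)  # trend term only, no seasonality or noise

for alpha in (0.25, 0.5, 1.0):
    err = final_one_step_error(trend_only, alpha)
    print(alpha, round(float(err), 2), slope / alpha)
# 0.25 360.0 360.0
# 0.5 180.0 180.0
# 1.0 90.0 90.0
```

With the trend alone, any alpha below 1 only makes the one-step errors larger. The actual Cyclepath fit also has to contend with seasonality and noise, so this is a partial explanation of the fitted value, not a proof of it.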

Where a fitted alpha lands tells you how much trackable structure SES found. Pure noise pushes alpha to 0 (trust the average). A trending series with no trend term pushes alpha to 1 (trust only the last point), which is exactly the naive forecast.

A flat forecast is the tell

Whatever alpha turns out to be, an SES forecast is always flat: the same value repeated for every future step, because the model has nothing but a level to work with. When you see a flat forecast from an exponential smoothing model, that is not a modeling choice; it is the model telling you it has no way to represent trend or seasonality. If the series you are forecasting clearly has either, that flat line is a diagnostic, not a result to trust. The next two lessons add exactly the components SES is missing.

Practice Exercises

Exercise 1: Predict the next level by hand

Continuing the manual example ($\alpha = 0.3$, last level 52.336), a seventh observation of 58 arrives. What is the new level?

Hint

Apply the formula directly: $l_7 = 0.3 \times 58 + 0.7 \times 52.336 = 17.4 + 36.635 = 54.035$. The new level moves about 30% of the way from 52.336 toward 58, landing at roughly 54.0, the same partial-step behavior as every earlier update.

Exercise 2: Interpret alpha near 0.9

You fit SES to a series and get $\alpha = 0.92$. What does that tell you about the series, and what would the forecast look like?

Hint

An alpha that close to 1 means the optimizer found almost no benefit in smoothing. The series is close to behaving like a random walk, where the best guess for tomorrow is very close to today’s value with little useful signal in the older history. The forecast would still be flat (SES always forecasts flat), but the flat value would sit very close to the last observed point, almost like the naive forecast but not quite identical, since alpha is close to 1 but not exactly 1.

Exercise 3: Why not just always use naive, then?

If SES on a trending series just becomes the naive forecast, why bother fitting SES at all instead of using the one-line naive forecast directly?

Hint

You would not, if you already knew the series had a strong trend and nothing else. The value of fitting SES is that the fitted alpha tells you something you might not have known in advance: an alpha near 1 is itself the diagnostic that a level-only model cannot do better than naive here, which is exactly the signal that you need a trend term (Lesson 2) or a seasonal term (Lesson 3). SES is a useful first probe precisely because its failure mode is informative, not because you would deploy it once you already know a series is trending.

Summary

Simple exponential smoothing forecasts a single smoothed level, updated by $l_t = \alpha y_t + (1-\alpha) l_{t-1}$, with every future forecast equal to the most recent level, a flat line. On a pure-noise toy series, the fitted $\alpha$ went to 0.0, trusting the historical average over any single fluctuation. On a genuinely drifting toy series, it landed at 0.5513, splitting the difference. On Cyclepath, a series with a real trend and no trend term available, the fitted $\alpha$ went to the opposite extreme, 1.0, which collapses the update to $l_t = y_t$ and makes the SES forecast numerically identical to the naive baseline from Module 1: same values, 18.99% MAPE versus the naive baseline’s 19.0%.

Key Concepts

SES level update: $l_t = \alpha y_t + (1-\alpha) l_{t-1}$; the forecast is the most recent level, repeated flat forever.
Alpha as a dial: near 0 trusts history (the average), near 1 trusts only the newest observation.
Exponentially decaying weights: each past observation’s influence shrinks by $(1-\alpha)$ every step after it arrives.
Alpha = 1 is naive: on a series with a real trend and no trend term, SES optimizes to the naive forecast.

Why This Matters

A fitted alpha is not just a number to report; it is a diagnostic. An alpha near 0 says the series is close to noise around a constant. An alpha near 1 says the model found nothing worth smoothing and gave up on history entirely, which is exactly what happens when a real trend has no term to attach to. Recognizing that failure mode here, rather than being surprised by it later, is what makes the next two lessons feel like a direct response to a problem you have already seen, rather than new material appearing out of nowhere. Next, Holt’s method adds a trend term directly, and you will see it fail in an even more informative way on this same series.

Next Steps

Continue to Lesson 2 - Holt's Linear Trend

Add a trend component to exponential smoothing, and watch it go badly wrong on a seasonal series.

Back to Module Overview

Return to the Exponential Smoothing module overview

Continue Building Your Skills

You have seen SES reduce to the naive forecast on a trending series, and you know why: alpha went to 1 because a level-only model has nothing else to offer a series that keeps climbing. Next, Holt’s method gives the model an explicit trend term. You might expect that to fix things immediately. Instead, on this seasonal series, it produces a forecast that is dramatically worse, for a reason that is just as informative as this lesson’s naive-equivalent result.
