Lesson 1 - Simple Exponential Smoothing
Welcome to Simple Exponential Smoothing
Modules 5 and 6 built one whole forecasting family around autocorrelation: AR, MA, ARMA, ARIMA, and SARIMA. This module builds a second, different family. Exponential smoothing does not model autocorrelation directly. Instead, it forecasts with a weighted average of past values, where recent values get more weight and older ones get exponentially less. Simple exponential smoothing (SES) is the simplest member: it assumes the series has a level that can shift, but no trend and no seasonality.
By the end of this lesson, you will be able to:
- Write the SES update formula and explain what the smoothing parameter alpha controls
- Compute a few steps of SES by hand
- Fit SES with statsmodels and read the fitted alpha
- Explain why SES fit to a trending series reduces to the naive forecast
Let’s build the simplest smoother first.
The SES Formula
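SES keeps a single running estimate of the series’ level, called $\ell_t$. Each new observation $y_t$ nudges that level a little, and the forecast for every future point is just the most recent level:
$$\ell_t = \alpha\, y_t + (1 - \alpha)\, \ell_{t-1}, \qquad \hat{y}_{t+h \mid t} = \ell_t \quad \text{for } h = 1, 2, \dots$$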
The parameter $\alpha$ (alpha), between 0 and 1, controls the trade-off. A high $\alpha$ means the level moves quickly toward the newest observation, trusting recent data heavily. A low $\alpha$ means the level barely moves, trusting the long history built up so far. Because each update mixes in the previous level, and that previous level already mixed in the one before it, the influence of any single past observation shrinks by a factor of $(1 - \alpha)$ every step after it arrives. That is the “exponential” in exponential smoothing: the weights on past observations decay exponentially, not the forecast itself.
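To make the decaying weights concrete, the sketch below unrolls the recursion for a tiny made-up series and checks that the recursive level matches an explicit weighted sum of the observations. It assumes the level is seeded with the first observation (one common convention, not the only one); the names `ses_level` and `ses_weights` are illustrative, not part of any library.

```python
def ses_level(obs, alpha):
    """Run the SES recursion, seeding the level with the first observation."""
    level = obs[0]
    for x in obs[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def ses_weights(n, alpha):
    """Weights the final level places on obs[0], ..., obs[n-1].

    An observation that arrived k steps ago gets alpha * (1 - alpha)**k.
    The first observation seeds the level, so it keeps the leftover
    weight (1 - alpha)**(n - 1).
    """
    weights = [alpha * (1 - alpha) ** (n - 1 - i) for i in range(n)]
    weights[0] = (1 - alpha) ** (n - 1)
    return weights


obs = [10, 12, 11, 15]  # made-up values, for illustration only
alpha = 0.5

weights = ses_weights(len(obs), alpha)
weighted_sum = sum(w * x for w, x in zip(weights, obs))

print([round(w, 4) for w in weights])   # [0.125, 0.125, 0.25, 0.5]
print(round(weighted_sum, 3), round(ses_level(obs, alpha), 3))  # 13.0 13.0
```

The weights sum to 1, and each step back in time multiplies the weight by $(1 - \alpha)$, except for the seed observation, which absorbs whatever weight is left over.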
A Manual Example
Six observations, and a fixed $\alpha = 0.3$, computed by hand:
obs = [50, 54, 49, 53, 51, 55]
alpha = 0.3
level = obs[0]
levels = [level]
for x in obs[1:]:
    level = alpha * x + (1 - alpha) * level
    levels.append(round(level, 3))
print(levels)
# [50, 51.2, 50.54, 51.278, 51.195, 52.336]
The level starts at the first observation (50), then moves toward each new value by 30% of the gap at each step. After 54 arrives, the level moves from 50 to $50 + 0.3 \times (54 - 50) = 51.2$. After 49 arrives, it moves back down to 50.54. It never jumps all the way to the newest value. It always takes a fraction of the step. Forecasting from here for any number of steps ahead just repeats the last level, 52.336, forever. A flat line is the only shape SES can produce, because it has no mechanism to represent a trend or seasonality.
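To see how the choice of alpha changes the path of the level, here is a small sketch that reruns the same six observations with a low, a medium, and a high alpha, then produces the flat forecast. The helper names `ses_levels` and `ses_forecast` are hypothetical, and the level is again seeded with the first observation.

```python
def ses_levels(obs, alpha):
    """Return the level after each observation, seeded with the first one."""
    levels = [float(obs[0])]
    for x in obs[1:]:
        levels.append(alpha * x + (1 - alpha) * levels[-1])
    return levels


def ses_forecast(obs, alpha, horizon):
    """SES forecast: the last level repeated for every future step."""
    return [ses_levels(obs, alpha)[-1]] * horizon


obs = [50, 54, 49, 53, 51, 55]

# Low alpha barely moves; high alpha nearly copies each new observation.
for alpha in (0.1, 0.3, 0.9):
    print(alpha, [round(v, 3) for v in ses_levels(obs, alpha)])

print([round(v, 3) for v in ses_forecast(obs, 0.3, 4)])
# [52.336, 52.336, 52.336, 52.336]
```

At the two extremes, `ses_levels(obs, 0.0)` stays at 50 throughout and `ses_levels(obs, 1.0)` simply reproduces the observations.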
What Alpha Means, on Two Toy Series
Rather than guessing $\alpha$, you let statsmodels fit it by minimizing the one-step-ahead forecast error on the training data. Two small toy series show the two extremes:
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
rng = np.random.default_rng(5)
pure_noise = pd.Series(50 + rng.normal(0, 3, 60))
ses_noise = SimpleExpSmoothing(pure_noise, initialization_method="estimated").fit()
print(round(ses_noise.params["smoothing_level"], 4)) # 0.0
print(np.round(ses_noise.forecast(5).values, 2)) # [49.36 49.36 49.36 49.36 49.36]

rng2 = np.random.default_rng(9)
steps = rng2.normal(0, 5, 60)
drifting = pd.Series(50 + np.cumsum(steps) * 0.3 + rng2.normal(0, 1, 60))
ses_drift = SimpleExpSmoothing(drifting, initialization_method="estimated").fit()
print(round(ses_drift.params["smoothing_level"], 4)) # 0.5513
The first series is pure noise scattered around a constant level, with nothing real to track. The fitted $\alpha$ is 0.0: with no updating at all, the level never moves from its estimated starting value, so the forecast is in effect the historical average. The optimizer ignores every individual fluctuation because reacting to noise only adds error. The second series genuinely wanders, with each step building on the last. The fitted $\alpha$ is 0.5513, roughly splitting the difference between trusting history and trusting the newest point, because here the newest point really does carry information the average would miss. Reading a fitted $\alpha$ is reading how much real, trackable movement the optimizer found in the series.
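The idea of "fitting alpha by minimizing forecast error" can be sketched without statsmodels. The block below is a plain grid search over alpha, not the numerical optimizer statsmodels actually uses, and it seeds the level with the first observation instead of estimating it. The two series are generated with the standard-library `random` module, so they are stand-ins for the toy series above and will not reproduce the same fitted values.

```python
import random


def one_step_sse(series, alpha):
    """Sum of squared one-step-ahead errors for a given alpha.

    Assumption: the level starts at the first observation instead of
    being estimated, which keeps the sketch short.
    """
    level = series[0]
    sse = 0.0
    for x in series[1:]:
        sse += (x - level) ** 2  # the forecast for this step is the prior level
        level = alpha * x + (1 - alpha) * level
    return sse


def best_alpha(series, steps=100):
    """Grid search over alpha = 0.00, 0.01, ..., 1.00."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: one_step_sse(series, a))


rng = random.Random(0)

# Noise around a constant level: nothing to track.
noise = [50 + rng.gauss(0, 3) for _ in range(400)]

# Random walk: every step builds on the previous value.
walk = [50.0]
for _ in range(399):
    walk.append(walk[-1] + rng.gauss(0, 3))

print("noise:", best_alpha(noise))
print("walk: ", best_alpha(walk))
```

The noise series should come out with a small alpha and the random walk with an alpha close to 1, the same pattern as the fitted values above, though the exact numbers depend on the seed and on the simplified initialization.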
Applying SES to Cyclepath
Now fit SES on the real, trending, seasonal Cyclepath series and see what happens:
def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")
y = cyclepath()
train, test = y.iloc[:-12], y.iloc[-12:]
ses = SimpleExpSmoothing(train, initialization_method="estimated").fit()
print(round(ses.params["smoothing_level"], 4)) # 1.0
print(round(ses.forecast(12).iloc[0], 1)) # 13565.0
The fitted $\alpha$ is exactly 1.0, the opposite extreme from the pure-noise toy series. At $\alpha = 1$, the SES update collapses to $\ell_t = y_t$: no smoothing at all, just the latest observed value carried forward as the forecast for every future month. Compare that to the naive baseline from Module 1, which forecasts every future month as the last observed value:
def mape(a, f):
    # Mean absolute percentage error of forecast f against actuals a
    return np.mean(np.abs((a - f) / a)) * 100
naive = pd.Series(train.iloc[-1], index=test.index)
ses_fc = ses.forecast(12)
print(np.allclose(ses_fc.values, naive.values)) # True
print(round(mape(test, ses_fc), 2)) # 18.99
The two forecasts are numerically identical, and SES’s test MAPE (18.99%) matches Module 1’s naive baseline (19.0%) to within rounding. This is not a coincidence, and it is not a bug. Cyclepath has a real trend, and SES has no term for a trend. Given a level-only model and a series that keeps climbing, the optimizer’s best move is to abandon smoothing entirely and just track the latest point, which is exactly what the naive forecast does. SES did not fail by producing something worse than naive. It failed by discovering that naive is the best a level-only model can do here.
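One way to see why the optimizer runs all the way to alpha = 1 on a climbing series: on a noiseless straight line, the SES level always lags the data, and the one-step-ahead error settles at slope / alpha. The sketch below checks that numerically. It uses only a simplified stand-in for Cyclepath's trend component (9000 + 90t, with no seasonality and no noise) and seeds the level with the first observation; both are assumptions made for illustration.

```python
def final_one_step_error(series, alpha):
    """One-step-ahead forecast error on the last point of the series."""
    level = series[0]
    for x in series[1:-1]:
        level = alpha * x + (1 - alpha) * level
    # The forecast for the last point is the level just before it arrives.
    return series[-1] - level


slope = 90
trend = [9000 + slope * t for t in range(96)]

for alpha in (0.2, 0.5, 1.0):
    err = final_one_step_error(trend, alpha)
    print(alpha, round(err, 2), "vs slope / alpha =", slope / alpha)
```

The error is smallest at alpha = 1, where it equals a single month's trend increment; any amount of smoothing makes the level fall further behind. The real series also has noise and seasonality, so this isolates only the trend's contribution to the result.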
A flat forecast is the tell
Whatever alpha turns out to be, an SES forecast is always flat: the same value repeated for every future step, because the model has nothing but a level to work with. When you see a flat forecast from an exponential smoothing model, that is not a modeling choice; it is the model telling you it has no way to represent trend or seasonality. If the series you are forecasting clearly has either, that flat line is a diagnostic, not a result to trust. The next two lessons add exactly the components SES is missing.
Practice Exercises
Exercise 1: Predict the next level by hand
Continuing the manual example ($\alpha = 0.3$, last level 52.336), a seventh observation of 58 arrives. What is the new level?
Hint
Apply the formula directly: $0.3 \times 58 + 0.7 \times 52.336 = 54.035$. The new level moves 30% of the way from 52.336 toward 58, landing at roughly 54.0, the same partial-step behavior as every earlier update.
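A quick way to check the arithmetic is to write the update in its equivalent "error-correction" form, where the level moves a fraction alpha of the gap toward the new observation. The helper name `ses_update` is hypothetical.

```python
def ses_update(level, x, alpha):
    """One SES step: move the level a fraction alpha of the way toward x."""
    return level + alpha * (x - level)


new_level = ses_update(52.336, 58, 0.3)
print(round(new_level, 3))  # 54.035
```

This is algebraically the same as `alpha * x + (1 - alpha) * level`; it just makes the "30% of the gap" reading explicit.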
Exercise 2: Interpret alpha near 0.9
You fit SES to a series and get an $\alpha$ near 0.9. What does that tell you about the series, and what would the forecast look like?
Hint
An alpha that close to 1 means the optimizer found almost no benefit in smoothing. The series is close to behaving like a random walk, where the best guess for tomorrow is very close to today’s value with little useful signal in the older history. The forecast would still be flat (SES always forecasts flat), but the flat value would sit very close to the last observed point, almost like the naive forecast but not quite identical, since alpha is close to 1 but not exactly 1.
Exercise 3: Why not just always use naive, then?
If SES on a trending series just becomes the naive forecast, why bother fitting SES at all instead of using the one-line naive forecast directly?
Hint
You would not, if you already knew the series had a strong trend and nothing else. The value of fitting SES is that the fitted alpha tells you something you might not have known in advance: an alpha near 1 is itself the diagnostic that a level-only model cannot do better than naive here, which is exactly the signal that you need a trend term (Lesson 2) or a seasonal term (Lesson 3). SES is a useful first probe precisely because its failure mode is informative, not because you would deploy it once you already know a series is trending.
Summary
Simple exponential smoothing forecasts a single smoothed level, updated by $\ell_t = \alpha\, y_t + (1 - \alpha)\, \ell_{t-1}$, with every future forecast equal to the most recent level, a flat line. On a pure-noise toy series, the fitted $\alpha$ went to 0.0, trusting the historical average over any single fluctuation. On a genuinely drifting toy series, it landed at 0.5513, splitting the difference. On Cyclepath, a series with a real trend and no trend term available, the fitted $\alpha$ went to the opposite extreme, 1.0, which collapses the update to $\ell_t = y_t$ and makes the SES forecast numerically identical to the naive baseline from Module 1: same values, 18.99% MAPE versus the naive baseline’s 19.0%.
Key Concepts
- SES level update: $\ell_t = \alpha\, y_t + (1 - \alpha)\, \ell_{t-1}$; the forecast is the most recent level, repeated flat forever.
- Alpha as a dial: near 0 trusts history (the average), near 1 trusts only the newest observation.
- Exponentially decaying weights: each past observation’s influence shrinks by a factor of $(1 - \alpha)$ every step after it arrives.
- Alpha = 1 is naive: on a series with a real trend and no trend term, SES optimizes to the naive forecast.
Why This Matters
A fitted alpha is not just a number to report; it is a diagnostic. An alpha near 0 says the series is close to noise around a constant. An alpha near 1 says the model found nothing worth smoothing and gave up on history entirely, which is exactly what happens when a real trend has no term to attach to. Recognizing that failure mode here, rather than being surprised by it later, is what makes the next two lessons feel like a direct response to a problem you have already seen, rather than new material appearing out of nowhere. Next, Holt’s method adds a trend term directly, and you will see it fail in an even more informative way on this same series.
Next Steps
Continue to Lesson 2 - Holt's Linear Trend
Add a trend component to exponential smoothing, and watch it go badly wrong on a seasonal series.
Continue Building Your Skills
You have seen SES reduce to the naive forecast on a trending series, and you know why: alpha went to 1 because a level-only model has nothing else to offer a series that keeps climbing. Next, Holt’s method gives the model an explicit trend term. You might expect that to fix things immediately. Instead, on this seasonal series, it produces a forecast that is dramatically worse, for a reason that is just as informative as this lesson’s naive-equivalent result.