Lesson 2 - Classical Decomposition by Hand

Welcome to Classical Decomposition by Hand

The classical decomposition method is old — it predates modern software by decades — but it’s still the clearest way to see how trend, seasonality, and residual get separated, because every step is something you could do with a pencil and a long enough table. You’ll build it in three moves: smooth away the wiggle to estimate trend, subtract that trend to expose the seasonal shape, then average by calendar month to pin down the seasonal indices. Do this once by hand and statsmodels.tsa.seasonal.seasonal_decompose stops being a black box.

By the end of this lesson, you will be able to:

  • Estimate trend with a centered moving average, including why an even period needs special handling
  • Subtract the trend to expose the seasonal-plus-residual shape and extract seasonal indices by averaging across years
  • Compute the residual as whatever’s left, and check that it looks like noise
  • Verify a hand-built decomposition against seasonal_decompose

Let’s build it on Cyclepath.


Step 1: Estimate Trend with a Moving Average

The idea: average away a full seasonal cycle’s worth of points around each month, and the seasonal wiggle cancels out, leaving the trend. For Cyclepath’s monthly data with a 12-month cycle, that means each trend estimate should average 12 consecutive months — but 12 is even, so there’s no single “center” month. The fix is a 2×12 moving average: a 13-point window centered on the target month, where the two endpoint months each get half weight so the total still averages exactly 12 months’ worth of data.

import numpy as np, pandas as pd

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96); rng = np.random.default_rng(42)
    trend = 9000 + 90*t; seasonal = 3200*np.sin(2*np.pi*(t-3)/12); noise = rng.normal(0,350,96)
    return pd.Series(np.round(trend+seasonal+noise).astype(int), index=idx, name="trips")

y = cyclepath()

weights = np.array([0.5] + [1]*11 + [0.5]) / 12
trend = y.rolling(window=13, center=True).apply(lambda w: np.dot(w, weights), raw=True)

print(y.index[11:14].tolist())
print(trend.iloc[11:14].round(1).tolist())   # [10008.7, 10114.6, 10219.4]
print(trend.notna().sum())                    # 84

At Dec 2016, Jan 2017, Feb 2017 the trend estimate is 10,008.7 → 10,114.6 → 10,219.4 — smoothly rising, with no trace of the seasonal peak-and-trough left. The centered 13-point window means the first six and last six months of the series have no valid trend estimate (there’s no full window to average), so trend has exactly 84 non-missing values out of 96. That’s the price of a centered window: you lose data at both edges, not just one.
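A quick way to convince yourself that the 2×12 window really cancels the seasonal wiggle is to run it on a noise-free version of the same ingredients. The sketch below is our own check, not a step of the classical method. It builds Cyclepath's straight-line trend plus its 12-month sine with the noise term left out, applies the Step 1 weights, and confirms two things: six estimates are missing at each edge, and every estimate that does exist lands exactly on the straight line.

```python
import numpy as np
import pandas as pd

# Cyclepath's trend and seasonal ingredients, with the noise term left out.
idx = pd.date_range("2016-01-01", periods=96, freq="MS")
t = np.arange(96)
true_trend = 9000 + 90 * t
clean = pd.Series(true_trend + 3200 * np.sin(2 * np.pi * (t - 3) / 12), index=idx)

# Same 2x12 weights as Step 1: 13 points, half weight on the two endpoints.
weights = np.array([0.5] + [1] * 11 + [0.5]) / 12
ma = clean.rolling(window=13, center=True).apply(lambda w: np.dot(w, weights), raw=True)

valid = ma.notna()
print(valid.sum())                                  # 84
print(valid.iloc[:6].any(), valid.iloc[-6:].any())  # False False: six lost at each edge
# Wherever an estimate exists, it equals the straight line: the sine cancelled out.
print(np.allclose(ma[valid].to_numpy(), true_trend[valid.to_numpy()]))  # True
```

Because the weights are symmetric and sum to 1, a straight line passes through untouched, while any pattern that repeats every 12 months averages to exactly zero.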


Step 2: Subtract Trend to Expose Seasonality

Subtracting the trend from the observed series removes the slow drift, leaving seasonal-plus-residual (S + R) behind:

detrended = y - trend
print(y.iloc[11:14].tolist())          # [7491, 6903, 7793]
print(detrended.iloc[11:14].round(1).tolist())   # [-2517.7, -3211.6, -2426.4]

December, January, and February all land well below the trend line — exactly the winter trough Module 1 described. But this detrended value still mixes two things: the seasonal pattern (winter is always low) and residual noise (this particular December might be a bit higher or lower than a typical December). Separating those is the next step.
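To see the "this particular December" effect directly, the sketch below rebuilds Steps 1 and 2 (repeating the lesson's cyclepath() generator so it runs standalone) and pulls out every detrended December. Because six months are lost at each edge, each calendar month keeps seven detrended values, one per year. The year-to-year spread among a month's values is the residual part that Step 3's averaging is meant to wash out.

```python
import numpy as np
import pandas as pd

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96); rng = np.random.default_rng(42)
    trend = 9000 + 90*t; seasonal = 3200*np.sin(2*np.pi*(t-3)/12); noise = rng.normal(0,350,96)
    return pd.Series(np.round(trend+seasonal+noise).astype(int), index=idx, name="trips")

y = cyclepath()
weights = np.array([0.5] + [1]*11 + [0.5]) / 12
trend = y.rolling(window=13, center=True).apply(lambda w: np.dot(w, weights), raw=True)
detrended = (y - trend).dropna()   # keep only months that have a trend estimate

# How many detrended values each calendar month contributes to the Step 3 average.
counts = detrended.groupby(detrended.index.month).size()
print(counts.tolist())             # [7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7]

# Every December from Dec 2016 to Dec 2022: each well below trend,
# but by a different amount each year (that difference is the noise).
december = detrended[detrended.index.month == 12]
print(december.round(1))
print("spread across years:", round(december.std(), 1))
```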


Step 3: Average by Month for Seasonal Indices

To isolate the seasonal part, group the detrended series by calendar month and average — this cancels out the year-to-year noise, since noise is random but the seasonal effect is consistent every year. Then center the twelve monthly averages around zero, so the seasonal component adds nothing to the overall level (that’s already the trend’s job):

seasonal_raw = detrended.groupby(detrended.index.month).mean()
seasonal_idx = seasonal_raw - seasonal_raw.mean()
print(seasonal_idx.round(1).to_dict())
# {1: -3399.8, 2: -2760.3, 3: -1606.2, 4: 81.9, 5: 1623.2, 6: 2817.8,
#  7: 3307.3, 8: 2666.6, 9: 1614.3, 10: -77.8, 11: -1552.1, 12: -2714.9}

Twelve numbers, one per calendar month, and they tell the whole seasonal story at a glance: July is the peak at 3,307 above trend, January is the trough at 3,400 below trend, and the months in between step smoothly from one extreme to the other, which is exactly the sine-shaped summer-peak pattern Cyclepath was built with. Broadcast these twelve values back across all 96 months (matching each date to its calendar month) to get the full seasonal component.
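Two quick checks on Step 3's output follow, as a sketch that rebuilds the earlier steps so it runs standalone. First, centering makes the twelve indices sum to zero over a year. Second, the calendar-month lookup used to broadcast them gives the same result as simply repeating the twelve values eight times with np.tile. That repeat shortcut is only valid here because the series starts in January and covers whole years; the month lookup works for any start date.

```python
import numpy as np
import pandas as pd

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96); rng = np.random.default_rng(42)
    trend = 9000 + 90*t; seasonal = 3200*np.sin(2*np.pi*(t-3)/12); noise = rng.normal(0,350,96)
    return pd.Series(np.round(trend+seasonal+noise).astype(int), index=idx, name="trips")

y = cyclepath()
weights = np.array([0.5] + [1]*11 + [0.5]) / 12
trend = y.rolling(window=13, center=True).apply(lambda w: np.dot(w, weights), raw=True)
detrended = y - trend
seasonal_raw = detrended.groupby(detrended.index.month).mean()
seasonal_idx = seasonal_raw - seasonal_raw.mean()

# Centering: the seasonal effects net out to zero over a full year.
print(np.isclose(seasonal_idx.sum(), 0.0))   # True

# Broadcast by calendar-month lookup (works for any start month)...
by_lookup = pd.Series(y.index.month.map(seasonal_idx).to_numpy(), index=y.index)
# ...or by repeating Jan..Dec 8 times (only valid because the data starts in January).
by_tile = pd.Series(np.tile(seasonal_idx.to_numpy(), len(y) // 12), index=y.index)
print(np.allclose(by_lookup, by_tile))       # True
```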


Step 4: What’s Left Is the Residual

Subtract both trend and seasonal from the original series, and whatever remains is the residual:

seasonal_full = pd.Series(y.index.month.map(seasonal_idx).values, index=y.index)
resid = y - trend - seasonal_full

print(resid.iloc[11:14].round(1).tolist())   # [197.3, 188.3, 333.9]
print(round(resid.std(), 2))                  # 229.2

The residual has no visible pattern left (no trend, no repeating shape), just values scattered around zero with a standard deviation of about 229. Compare that to Cyclepath’s noise term, generated with rng.normal(0, 350, 96): the residual’s spread is noticeably smaller than 350. That makes sense, because the trend and seasonal estimates are themselves computed from noisy data, so each one absorbs part of the noise and less of it is left over in the residual. A residual this unstructured is good evidence that trend and seasonality were both captured correctly.
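Two numeric checks can back up the eyeball test. The first confirms the additive identity: wherever the trend exists, trend + seasonal + residual rebuilds the observed series exactly. The second is our own addition rather than part of the classical method. It computes the residual's autocorrelation at lag 1 and at the seasonal lag 12 with pandas' Series.autocorr. With only 84 residuals these will not be exactly zero, but a large lag-12 value would suggest seasonality left behind.

```python
import numpy as np
import pandas as pd

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96); rng = np.random.default_rng(42)
    trend = 9000 + 90*t; seasonal = 3200*np.sin(2*np.pi*(t-3)/12); noise = rng.normal(0,350,96)
    return pd.Series(np.round(trend+seasonal+noise).astype(int), index=idx, name="trips")

# Steps 1-4 as in the lesson.
y = cyclepath()
weights = np.array([0.5] + [1]*11 + [0.5]) / 12
trend = y.rolling(window=13, center=True).apply(lambda w: np.dot(w, weights), raw=True)
detrended = y - trend
seasonal_raw = detrended.groupby(detrended.index.month).mean()
seasonal_idx = seasonal_raw - seasonal_raw.mean()
seasonal_full = pd.Series(y.index.month.map(seasonal_idx).values, index=y.index)
resid = y - trend - seasonal_full

# 1) Additive identity: the three pieces rebuild the data wherever trend exists.
valid = trend.notna()
rebuilt = (trend + seasonal_full + resid)[valid]
print(valid.sum(), np.allclose(rebuilt, y[valid]))   # 84 True

# 2) Rough "does it look like noise?" check. Series.autocorr drops the
#    NaN edge months pairwise before correlating.
print("lag 1:", round(resid.autocorr(lag=1), 2))
print("lag 12:", round(resid.autocorr(lag=12), 2))
```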

Figure: The four-step classical decomposition pipeline. Panel 1: the raw Cyclepath series with a 13-point centered moving-average window sliding across it. Panel 2: the extracted trend, with a gap at the first and last six months. Panel 3: the detrended series (original minus trend), still showing a repeating wiggle, beside twelve bars for the averaged seasonal indices, Jan through Dec. Panel 4: the residual, a flat band of scattered noise with no visible pattern.

Now Do It With statsmodels

Everything above is exactly what statsmodels.tsa.seasonal.seasonal_decompose does internally for the additive model. Run it and compare:

from statsmodels.tsa.seasonal import seasonal_decompose

add = seasonal_decompose(y, model="additive", period=12)

print(add.trend.iloc[11:14].round(1).tolist())     # [10008.7, 10114.6, 10219.4]
print(add.seasonal.iloc[:12].round(1).tolist())     # matches seasonal_idx by month
print(round(add.resid.dropna().std(), 2))           # 229.2

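add.trend, add.seasonal, and add.resid match the hand-built trend, seasonal_full, and resid to the decimal, because seasonal_decompose(model="additive") runs the same 2×12 centered moving average, the same subtraction, and the same averaging you just did manually. One detail differs on paper: the library averages by position in the cycle (every 12th observation, counting from the first one), not by calendar month. Here the two are identical because Cyclepath starts in January, so position 0 is always January. Knowing all this means you can trust the library’s output and explain exactly what it computed, which matters the moment a decomposition looks wrong and you need to know where to look.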

period=12 has to match the real cycle

period=12 tells seasonal_decompose how many points make up one full seasonal cycle: 12 months for Cyclepath’s yearly pattern. Get this wrong (say, period=4) and the moving-average window no longer spans a full cycle, so it fails to cancel the seasonal wiggle, and every downstream component (trend, seasonal, residual) comes out wrong. Set period to the data’s actual seasonal cycle length: 12 for monthly data with yearly seasonality, 7 for daily data with weekly seasonality, 24 for hourly data with daily seasonality.
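The "fails to cancel" claim can be measured. The sketch below applies a centered moving average built the Step 1 way (2×period weights for an even period, a plain average for an odd one) to a pure 12-month wave with Cyclepath's amplitude and phase, and reports the largest amount of wave that survives the smoothing. The helper name centered_ma is our own, chosen for illustration.

```python
import numpy as np
import pandas as pd

def centered_ma(series: pd.Series, period: int) -> pd.Series:
    """Hypothetical helper: centered moving average built as in Step 1.
    Even period -> (period + 1) points with half weight on both ends;
    odd period -> plain period-point mean."""
    if period % 2 == 0:
        w = np.array([0.5] + [1.0] * (period - 1) + [0.5]) / period
    else:
        w = np.full(period, 1.0 / period)
    return series.rolling(window=len(w), center=True).apply(lambda v: np.dot(v, w), raw=True)

# A pure 12-month wave: no trend, no noise, so a perfect "trend" estimate is 0.
t = np.arange(96)
wave = pd.Series(3200 * np.sin(2 * np.pi * (t - 3) / 12))

leftovers = {}
for period in (12, 4):
    leftovers[period] = centered_ma(wave, period).abs().max()
    print(f"period={period}: largest leftover = {leftovers[period]:.1f}")
# period=12 leaves essentially nothing; period=4 leaves about 2,586 of the
# 3,200 amplitude sitting in the "trend", roughly 80% of the wave.
```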


Practice Exercises

Exercise 1: Why half-weight at the endpoints?

The moving-average weights are [0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5] over 13 points (before dividing by 12), not 12 equal weights over 12 points. Why?

Hint

With an even period (12), there’s no single middle month in a 12-point window — the center falls exactly between two months. Using a 13-point window and giving the two outermost months half weight each effectively averages two overlapping 12-month windows (one centered slightly early, one slightly late) into a single, properly centered estimate. The weights still sum to 12 total “months” of influence (11 full months + 2 half months = 12), so the average is still a true 12-month average — just correctly centered on a single date.
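The "two overlapping 12-month windows" explanation is easy to check numerically. In the sketch below, which uses an arbitrary random-walk series for illustration rather than Cyclepath, a plain trailing 12-month mean is computed once. For each target month, its window ending five months later (centered half a month early) and its window ending six months later (centered half a month late) are averaged and compared with the Step 1 2×12 filter.

```python
import numpy as np
import pandas as pd

# Any series works for this identity; a random walk is used for illustration.
rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=60).cumsum())

# The 2x12 filter from Step 1.
weights = np.array([0.5] + [1] * 11 + [0.5]) / 12
two_by_twelve = y.rolling(window=13, center=True).apply(lambda w: np.dot(w, weights), raw=True)

# Trailing 12-point mean: m12[k] averages y[k-11 .. k].
m12 = y.rolling(window=12).mean()
# At target t: m12.shift(-5) covers y[t-6 .. t+5] and m12.shift(-6) covers y[t-5 .. t+6].
avg_of_two = (m12.shift(-5) + m12.shift(-6)) / 2

print(np.allclose(two_by_twelve, avg_of_two, equal_nan=True))  # True
```

Months t-5 through t+5 appear in both windows, while t-6 and t+6 appear in only one, which is where the half weights come from.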

Exercise 2: Recompute seasonal indices for a different period

Suppose you had daily data with a 7-day weekly cycle instead of monthly data with a 12-month cycle. What would change in Step 1 and Step 3?

Hint

Step 1’s moving average would use a 7-point centered window (odd, so no half-weight trick needed — a plain 7-point centered average works, since there’s a genuine middle day). Step 3 would group the detrended series by .index.dayofweek (or .day_name()) instead of .index.month, averaging each of the 7 weekdays across all the weeks in the data to get 7 seasonal indices instead of 12. Everything else — subtract, average, subtract again — stays the same; only the window size and the grouping key change to match the cycle length.

Exercise 3: What would a wrong period do?

If you ran seasonal_decompose(y, model="additive", period=6) on Cyclepath instead of period=12, what would you expect to see in the residual?

Hint

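A 6-month period means the moving average and the by-group averaging both assume the pattern repeats every 6 months, but Cyclepath’s true cycle is 12 months. The 7-point (2×6) moving average spans only half a cycle, so it can’t cancel the yearly wave; the trend soaks up most of that wave and comes out visibly wavy. The seasonal step fails too: grouping by position in a 6-month cycle pools each month with the month six months away (January with July, February with August), and for Cyclepath’s sine-shaped pattern those sit on opposite sides of the trend, so they largely cancel and the six seasonal indices come out close to zero. Whatever part of the yearly wave the trend didn’t absorb lands in the residual, which would show a clear yearly wave instead of looking like noise. This is exactly the “bad residual” diagnostic from Lesson 1’s Exercise 2: a patterned residual means the decomposition’s assumptions were wrong.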


Summary

You built a classical additive decomposition from scratch on Cyclepath, in four steps. Trend: a centered 2×12 moving average (13-point window, half weight at the endpoints) smooths away seasonality and noise, losing 6 months of data at each edge — 10,008.7 → 10,114.6 → 10,219.4 across Dec 2016–Feb 2017. Detrend: subtracting trend from the observed series exposes seasonal-plus-residual. Seasonal indices: grouping the detrended series by calendar month and averaging isolates twelve values — July at +3,307 (peak), January at -3,400 (trough) — that cancel year-to-year noise. Residual: whatever’s left after removing both, with a standard deviation of about 229 and no visible pattern. Every one of these numbers matched statsmodels.tsa.seasonal.seasonal_decompose(y, model="additive", period=12) exactly, because that’s precisely the computation it runs internally.

Key Concepts

  • 2×12 moving average — a 13-point centered window with half weight at the endpoints, needed because 12 (an even period) has no single center month.
  • Detrending — subtracting the trend estimate exposes the seasonal-plus-residual shape.
  • Seasonal indices — averaging the detrended series by calendar month isolates the repeating pattern, one value per period.
  • period parameter — must match the data’s true seasonal cycle length, or the decomposition silently produces a patterned residual.

Why This Matters

Knowing how to build a decomposition by hand means you’re never stuck trusting a library function you can’t explain. When a real decomposition looks wrong — a residual with a leftover wave, a trend that looks choppy — you now know exactly which of the four steps to inspect: is the period right? Is the moving-average window losing too much data at the edges? Is the by-month averaging being thrown off by an outlier year? That diagnostic instinct is worth more than the mechanics themselves, and it’s exactly what the next lesson builds on when it formalizes the choice between additive and multiplicative models.


Next Steps

Continue to Lesson 3 - Additive vs Multiplicative Models

Formalize the choice between constant-size and constant-percentage seasonal swings, with real evidence from Cyclepath and a contrasting series.

Back to Module Overview

Return to the Components and Decomposition module overview


Continue Building Your Skills

You’ve built additive decomposition from scratch and confirmed it matches seasonal_decompose exactly. Next you’ll formalize when the additive model is actually the right one to reach for — versus the multiplicative alternative — using real evidence instead of a guess, and see what happens to the residual when that choice is wrong.