Lesson 5 - Guided Project: Decomposing Cyclepath

Welcome to the Guided Project

This module took decomposition apart piece by piece: what trend, seasonality, and residual mean (Lesson 1), how to extract them by hand with a moving average (Lesson 2), how to prove a series is additive rather than multiplicative (Lesson 3), and how STL improves on the classical method (Lesson 4). Now you’ll run the whole pipeline on Cyclepath at once and, more importantly, read what it tells you — not just compute three components, but interpret each one as a fact about the bike-share business.

By the end of this project, you will be able to:

  • Run both classical and STL decomposition on a full series and compare their outputs
  • Confirm additive is the right model for a series using the swing-to-level test
  • Interpret a trend component as a growth rate and a seasonal component as a calendar pattern
  • Check a residual for leftover structure and recognize when a decomposition has done its job

Let’s decompose Cyclepath for the last time this module — and read every number it gives back.


Stage 1: Rebuild the Series

Same cyclepath() generator from Module 1, unchanged — every number below is reproducible from this one function.

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose, STL

def cyclepath():
    # 96 month-start timestamps: Jan 2016 through Dec 2023
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t                                # linear growth
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)   # July peak, January trough
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()
print(len(y), y.index[0].date(), y.index[-1].date())   # 96 2016-01-01 2023-12-01

Stage 2: Confirm Additive, With Evidence

Before decomposing, settle the additive-vs-multiplicative question the way Lesson 3 taught — don’t assume:

by_year = y.groupby(y.index.year)
swing = by_year.agg(lambda s: s.max() - s.min())
level = by_year.mean()
ratio = (swing / level * 100).round(1)
print(ratio.to_dict())
# {2016: 72.8, 2017: 68.1, 2018: 66.7, 2019: 52.8, 2020: 47.3, 2021: 50.0, 2022: 46.9, 2023: 44.8}

The ratio falls from 72.8% to 44.8% while the raw swing stays roughly flat — the additive signature, exactly as Lesson 3 established. Every decomposition below uses model="additive".
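
To see why a falling ratio over a flat swing points to an additive model, it can help to run the same test on two noise-free toy series. This is a minimal sketch: swing_to_level is a hypothetical helper name, and the multiplicative series (with an assumed 30% seasonal factor) is invented purely for contrast. Only the additive series reuses Cyclepath's trend and seasonal terms.

```python
import numpy as np
import pandas as pd

def swing_to_level(s):
    """Per-year swing (max - min), mean level, and swing/level ratio."""
    by_year = s.groupby(s.index.year)
    swing = by_year.max() - by_year.min()
    level = by_year.mean()
    return pd.DataFrame({"swing": swing, "level": level, "ratio": swing / level})

idx = pd.date_range("2016-01-01", periods=96, freq="MS")
t = np.arange(96)
trend = 9000 + 90 * t
wave = np.sin(2 * np.pi * (t - 3) / 12)

# Cyclepath's structure without the noise term
additive = pd.Series(trend + 3200 * wave, index=idx)
# Assumed 30% seasonal factor, for contrast only
multiplicative = pd.Series(trend * (1 + 0.3 * wave), index=idx)

add_tbl = swing_to_level(additive)
mul_tbl = swing_to_level(multiplicative)
print(add_tbl.round(3))
print(mul_tbl.round(3))
```

In the additive table the swing column is constant and the ratio column falls year after year. In the multiplicative table the swing grows with the level and the ratio stays nearly constant. Cyclepath's measured ratios follow the first pattern.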


Stage 3: Decompose Two Ways

Run both the classical method (Lesson 2) and STL (Lesson 4) on the same series:

add = seasonal_decompose(y, model="additive", period=12)
stl = STL(y, period=12, robust=True).fit()

print("classical trend coverage:", add.trend.notna().sum(), "/ 96")   # 84 / 96
print("STL trend coverage:", stl.trend.notna().sum(), "/ 96")          # 96 / 96

print("classical resid std:", round(add.resid.dropna().std(), 2))      # 229.2
print("STL resid std:", round(stl.resid.std(), 2))                     # 215.25

Both agree on the shape; STL covers all 96 months instead of 84, and leaves a slightly smaller residual (215.25 vs. 229.20) — consistent with everything Lesson 4 showed. From here, the two are close enough that either is fair to interpret; STL’s full coverage makes it the more convenient one to read.
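
The 84-versus-96 gap has a mechanical cause. For an even period, classical decomposition estimates the trend with a centered 2×12 moving average, which needs six months on each side of every point, so the first and last six months get no trend value. The sketch below hand-rolls that kind of filter on a noise-free copy of the series (an assumption made so the result is exact). centered_ma_2x12 is a hypothetical name, not a statsmodels function.

```python
import numpy as np
import pandas as pd

def centered_ma_2x12(s):
    """Centered 2x12 moving average: half weight on the two end months."""
    w = np.r_[0.5, np.ones(11), 0.5] / 12          # 13 weights summing to 1
    out = pd.Series(np.nan, index=s.index)
    # "valid" keeps only positions with a full 13-month window
    out.iloc[6:len(s) - 6] = np.convolve(s.to_numpy(), w, mode="valid")
    return out

idx = pd.date_range("2016-01-01", periods=96, freq="MS")
t = np.arange(96)
true_trend = 9000 + 90 * t
y_clean = pd.Series(true_trend + 3200 * np.sin(2 * np.pi * (t - 3) / 12), index=idx)

ma_trend = centered_ma_2x12(y_clean)
print(ma_trend.notna().sum(), "/", len(ma_trend))   # 84 / 96
print(ma_trend.dropna().index[[0, -1]])             # first and last covered months
```

On this noise-free series the filter returns the linear trend exactly wherever it is defined, because a full 12-month window averages the seasonal wave to zero. STL avoids the missing ends by fitting local regressions instead of a fixed-width average.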

Figure: The full Cyclepath decomposition in four stacked panels (observed, trend, seasonal, residual) on a shared 2016 to 2023 time axis. The trend climbs steadily from about 8,900 to 17,400 trips; the seasonal panel repeats an identical wave every year, peaking in July at roughly +3,300 and troughing in January at roughly -3,400; and the residual is flat, unstructured noise across all eight years, the sign that trend and seasonality were both fully captured.

Stage 4: Interpret the Trend

print(round(stl.trend.iloc[0], 1), "->", round(stl.trend.iloc[-1], 1))   # 8945.9 -> 17412.8
print(round(stl.trend.iloc[-1] - stl.trend.iloc[0], 1))                   # 8466.9

The trend climbs from about 8,946 trips a month at the start of 2016 to about 17,413 by the end of 2023. That is a rise of roughly 8,467 in the monthly level (about 89 additional trips with each passing month, averaged over the 95 month-to-month steps), nearly doubling ridership over eight years with no reversals anywhere in between. That’s the story Module 1 told in prose (“ridership climbs year over year as the network grows”); now it’s a specific, extracted number you could plug into a growth-rate calculation or a capacity-planning conversation.
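
One way to turn those two endpoints into growth rates is sketched below. The endpoint values are copied from the printed output above, and 95 is the number of month-to-month steps between the first and last of 96 observations. The compound annual rate is an illustrative extra rather than something the lesson computes.

```python
start, end = 8945.9, 17412.8      # STL trend endpoints printed above
steps = 96 - 1                    # month-to-month steps between them

total_rise = end - start                          # change in the monthly level
per_month = total_rise / steps                    # average rise per calendar month
growth_factor = end / start                       # how many times larger the trend is
annual_rate = growth_factor ** (12 / steps) - 1   # compound annual growth rate

print(round(total_rise, 1))       # 8466.9
print(round(per_month, 1))        # 89.1
print(round(growth_factor, 2))    # 1.95
print(f"{annual_rate:.1%}")
```

The per-month figure lands close to the slope of 90 built into the cyclepath() generator, which is a useful sanity check that the extracted trend reflects the structure that was put in.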


Stage 5: Interpret the Seasonality

seas = add.seasonal.iloc[:12]
print("peak:", seas.idxmax().month, round(seas.max(), 1))     # peak: 7 3307.3
print("trough:", seas.idxmin().month, round(seas.min(), 1))   # trough: 1 -3399.8

These figures come from the classical component, add.seasonal, which repeats identically every year by construction, so its first 12 values describe the whole series. Every year, July runs about 3,307 trips above the trend line (the summer peak) and January runs about 3,400 trips below it (the winter trough). That’s a swing of roughly 6,700 trips between the best and worst month of a typical year, and because the model is additive, that swing applies as a roughly constant offset no matter how high the trend has climbed. This is the number that would drive a real operations decision, such as how many extra bikes to deploy each summer or how much to scale back winter maintenance staffing, read directly off the seasonal component instead of guessed from eyeballing a noisy raw series.
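
A seasonal component can be thought of as a month-of-year average of what remains once the trend is removed. The sketch below shows that idea on a noise-free copy of Cyclepath with the generator's known trend subtracted. Both simplifications are assumptions for illustration: the real decomposition has to estimate the trend, and its figures also reflect noise.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-01-01", periods=96, freq="MS")
t = np.arange(96)
known_trend = 9000 + 90 * t
y_clean = pd.Series(known_trend + 3200 * np.sin(2 * np.pi * (t - 3) / 12), index=idx)

detrended = y_clean - known_trend
monthly = detrended.groupby(detrended.index.month).mean()   # one offset per calendar month
monthly = monthly - monthly.mean()                           # center so offsets sum to zero

print("peak:", monthly.idxmax(), round(monthly.max(), 1))     # peak: 7 3200.0
print("trough:", monthly.idxmin(), round(monthly.min(), 1))   # trough: 1 -3200.0
print("swing:", round(monthly.max() - monthly.min(), 1))      # swing: 6400.0

# Additive model: the same offsets apply at any trend level.
for level in (9000, 17400):
    print(level, "->", round(level + monthly.min()), "to", round(level + monthly.max()))
```

The loop at the end is the additive assumption in action: the January-to-July range is the same width whether the trend sits near 9,000 or near 17,400.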


Stage 6: Interpret the Residual

r = stl.resid
print("max:", r.idxmax().date(), round(r.max(), 1))    # max: 2018-07-01 727.8
print("min:", r.idxmin().date(), round(r.min(), 1))     # min: 2020-05-01 -714.1
print("std:", round(r.std(), 2))                         # 215.25

The single largest positive surprise is July 2018 (+727.8 above what trend and seasonality predicted) and the largest negative one is May 2020 (-714.1 below). Against the residual’s overall spread (std 215.25), those are about 3.4 and 3.3 standard deviations respectively: unusual, but isolated single months among 96 observations rather than part of any pattern. There’s no repeating shape left in the residual and no cluster of consecutive extreme months, which is exactly the “residual looks like noise” check from Lesson 1: trend and seasonality between them have captured essentially everything predictable, and what’s left is the kind of unstructured variation any forecasting model will have to live with rather than eliminate.
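
Two quick numeric versions of this check are sketched below: the size of the extremes in standard-deviation units, and the autocorrelation of a residual at lags 1 and 12. The residual statistics are the ones printed above. The two simulated residuals (clean and leaky) and the helper name lag_autocorr are illustrative assumptions, not part of the lesson's pipeline.

```python
import numpy as np
import pandas as pd

# Extremes in units of the residual's standard deviation (values printed above).
std = 215.25
print(round(727.8 / std, 1), round(714.1 / std, 1))   # 3.4 3.3

def lag_autocorr(resid, lags=(1, 12)):
    """Autocorrelation at each lag; values near 0 suggest no leftover structure."""
    return {lag: resid.autocorr(lag=lag) for lag in lags}

rng = np.random.default_rng(0)
t = np.arange(96)
clean = pd.Series(rng.normal(0, 215, 96))           # simulated pure noise
leaky = clean + 600 * np.sin(2 * np.pi * t / 12)    # noise plus a missed seasonal wave

clean_ac = lag_autocorr(clean)
leaky_ac = lag_autocorr(leaky)
print({k: round(v, 2) for k, v in clean_ac.items()})
print({k: round(v, 2) for k, v in leaky_ac.items()})
```

A common rule of thumb treats autocorrelations within about ±2/√n (roughly ±0.2 for 96 points) as indistinguishable from noise. Applying lag_autocorr to stl.resid from Stage 6 would give a numeric counterpart to the visual check, with a large lag-12 value signalling seasonality the decomposition missed.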

A clean residual is where this module hands off

Module 3 asks a related but distinct question of the original series: is it stationary — free of trend and seasonality, with statistical properties that don’t drift over time? You’ve just shown Cyclepath clearly isn’t (it has both), which is precisely why Module 3 opens with differencing, a technique for removing exactly the trend and seasonal structure you spent this module identifying by name. Decomposition and stationarity are two lenses on the same underlying fact: a raw series with structure needs that structure stripped out — one way or another — before the ARIMA family of models can be fit to what remains.


Stage 7: The Takeaway

Step back and look at what this project produced. You now hold three things that carry forward into the rest of the course:

  1. A confirmed additive model — verified with the swing-to-level test, not assumed: the ratio fell from 72.8% to 44.8% while the raw swing stayed flat, the additive signature.
  2. Three named, interpretable components — trend growing from 8,946 to 17,413 (roughly doubling), a seasonal wave peaking +3,307 in July and troughing -3,400 in January, and a residual with std 215.25 and no leftover pattern.
  3. STL as the default extraction method — full 96-point coverage and a smaller residual than the classical moving average, with the robustness Lesson 4 demonstrated on an injected outlier.

That’s the whole point of Module 2: a raw series is not one signal, it’s several, and now you can name, extract, and interpret each one. Next up, Module 3 asks whether Cyclepath is stationary — and uses differencing to strip out exactly the trend and seasonality you just spent this module isolating.


Practice Exercises

Exercise 1: Compute the seasonal swing as a fraction of the final trend level

Using the seasonal peak (+3,307) and trough (-3,400) and the final trend level (17,413), what fraction of the current trend does the full seasonal swing represent?

Hint

The full swing is 3307.3 - (-3399.8) = 6707.1. As a fraction of the final trend level: 6707.1 / 17412.8 ≈ 0.385, or about 38.5%. That is somewhat below 2023’s measured ratio of 44.8% from Stage 2, for two reasons: the Stage 2 number used the raw series’ peak-to-trough within 2023, which also includes noise and that year’s trend growth, and it divided by 2023’s mean level rather than the slightly higher end-of-year trend level. Both numbers tell the same story: seasonality is still a substantial swing, but a shrinking fraction of an ever-larger trend.
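
The arithmetic in this hint can be checked in a few lines. The inputs are the peak, trough, and trend endpoints printed in Stages 4 and 5. Comparing against the starting trend level as well is an added illustration.

```python
peak, trough = 3307.3, -3399.8            # seasonal extremes from Stage 5
trend_start, trend_end = 8945.9, 17412.8  # STL trend endpoints from Stage 4

swing = peak - trough
share_end = swing / trend_end       # swing relative to the final trend level
share_start = swing / trend_start   # same swing relative to the starting level

print(round(swing, 1))          # 6707.1
print(f"{share_end:.1%}")       # 38.5%
print(f"{share_start:.1%}")     # 75.0%
```

The same absolute swing was about three quarters of the trend level in early 2016 and is under two fifths of it by the end of 2023.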

Exercise 2: Would a multiplicative model have caught July 2018 differently?

If you re-ran this decomposition with model="multiplicative" instead of "additive", would you expect the residual around July 2018 to look better, worse, or about the same?

Hint

Since Stage 2 already established Cyclepath is additive (constant absolute swing, not scaling with the level), forcing a multiplicative model would be the wrong tool — Lesson 3 showed exactly this scenario turns a clean residual into one with growing structure over time. July 2018’s specific residual might shift a little either way, but the overall residual would almost certainly look worse across the full eight years, not better, because the model would be fighting the series’ true structure the whole time, not just around one unusual month.
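
A small experiment can make this concrete. The sketch below uses a noise-free copy of Cyclepath and the generator's known trend (both assumptions, chosen so that the model choice is the only source of residual). It estimates one seasonal term per calendar month under each model and compares what is left over. This is a hand-rolled illustration, not a description of what seasonal_decompose does internally.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-01-01", periods=96, freq="MS")
t = np.arange(96)
trend = pd.Series(9000 + 90 * t, index=idx, dtype=float)
y_clean = trend + 3200 * np.sin(2 * np.pi * (t - 3) / 12)   # additive by construction

month = idx.month.to_numpy()
year = idx.year.to_numpy()

# Additive: seasonal offset = month-of-year mean of (y - trend).
s_add = (y_clean - trend).groupby(month).transform("mean")
resid_add = y_clean - trend - s_add

# Multiplicative: seasonal factor = month-of-year mean of (y / trend).
s_mul = (y_clean / trend).groupby(month).transform("mean")
resid_mul = y_clean - trend * s_mul          # expressed in trips for a fair comparison

by_year = resid_mul.abs().groupby(year).max()
print("additive max |resid|:", round(resid_add.abs().max(), 1))
print("multiplicative max |resid|:", round(resid_mul.abs().max(), 1))
print(by_year.round(0))                      # size of the leftover wave, year by year
```

Under the additive model the residual vanishes. Under the multiplicative model a seasonal wave is left behind whose size changes from year to year, largest at the two ends of the series and smallest in the middle, because a single percentage factor cannot match a constant absolute swing at every trend level.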

Exercise 3: What would you tell an operations team?

Using only the numbers from Stages 4 and 5 (trend growth and seasonal swing), write two sentences of advice for a Cyclepath operations team planning next summer’s bike deployment.

Hint

Something like: “Baseline demand is still climbing: expect roughly 8,500 more trips a month now than at the start of 2016, so capacity planned for 2016 levels is badly undersized. On top of that baseline, expect a July peak running about 3,300 trips above the current trend and a January trough about 3,400 below it, a swing of about 6,700 trips between the two extremes, so summer deployment should scale well beyond whatever the trend alone suggests.” That’s the practical payoff of decomposition: turning “ridership goes up and is seasonal” into numbers a staffing or inventory plan can actually use.


Summary

You decomposed the full Cyclepath series and interpreted every piece. Confirmed additive first, with the swing-to-level test from Lesson 3 (ratio 72.8% → 44.8%, flat raw swing). Decomposed two ways — classical (84/96 trend coverage, resid std 229.2) and STL (96/96 coverage, resid std 215.25) — and found them in close agreement. Interpreted trend as real growth: 8,946 → 17,413 trips a month, an 8,467-trip climb over eight years. Interpreted seasonality as a calendar-driven operational fact: July runs +3,307 above trend, January runs -3,400 below it, a roughly 6,700-trip swing every year. And checked the residual: std 215.25, largest deviations at July 2018 (+727.8) and May 2020 (-714.1), neither dramatic, no leftover pattern — the sign that trend and seasonality captured what there was to capture.

Key Concepts

  • Verify before decomposing — confirm additive vs. multiplicative with evidence (Lesson 3’s test) rather than assuming it.
  • Trend as growth rate — a decomposed trend component turns “ridership is climbing” into a specific number you can plan around.
  • Seasonal as calendar fact — a decomposed seasonal component turns “it’s seasonal” into “expect +3,307 in July, -3,400 in January.”
  • Clean residual = job done — no leftover pattern in the residual means trend and seasonality between them explained everything predictable.

Why This Matters

Decomposition is only useful if you read what it gives you back — three arrays of numbers are not, by themselves, an insight. This project turned Cyclepath’s decomposition into things a real team could act on: a growth number for capacity planning, a seasonal swing for staffing, and a residual clean enough to trust that nothing predictable was left on the table. That same discipline — decompose, verify the model choice, then interpret each component in the language of the actual problem — applies to every series you’ll ever decompose, not just Cyclepath. With trend and seasonality now named and measured, Module 3 turns to stationarity and differencing: the formal tools for stripping exactly this structure out of a series so the ARIMA family of models, starting in Module 5, has something well-behaved to work with.


Continue Building Your Skills

You now have Cyclepath fully decomposed and interpreted: a confirmed additive model, a trend that has nearly doubled ridership, a seasonal swing of roughly 6,700 monthly trips between July and January, and a residual clean enough to trust. Next, Module 3 asks a closely related question of the raw series itself: is it stationary? You’ll learn the formal test for it and the differencing technique that strips out exactly the trend and seasonality you just spent this module naming.