Lesson 5 - Guided Project: A SARIMA Forecast for Cyclepath

Welcome to the Guided Project

This is the capstone the whole course has been building toward. Since Module 1, one number has stood as the bar: the seasonal-naive baseline at 5.9% MAPE. Module 5 threw every non-seasonal ARIMA at it and lost — the best managed only 16%. Now you’ll fit a SARIMA end to end, prove it’s adequate, and finally clear that bar — not by a hair, but by more than fivefold. Then you’ll do what forecasting is actually for: retrain on all the data and predict a year that hasn’t happened yet.

By the end of this project, you will be able to:

Fit and diagnose a SARIMA in a single end-to-end workflow
Score a forecast against every baseline established across the course
Compare competing SARIMA specifications on both AIC and out-of-sample error
Retrain on the full series and produce a genuine forward forecast

Let’s clear the bar.

Stage 1: Fit and Diagnose

Rebuild Cyclepath, split off the test year, fit the model, and immediately check it’s adequate — the fit-then-diagnose workflow from Lessons 3 and 4, compressed:

import warnings; warnings.filterwarnings("ignore")
import numpy as np, pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96); rng = np.random.default_rng(42)
    trend = 9000 + 90*t; seasonal = 3200*np.sin(2*np.pi*(t-3)/12); noise = rng.normal(0,350,96)
    return pd.Series(np.round(trend+seasonal+noise).astype(int), index=idx, name="trips")

y = cyclepath()
train, test = y.iloc[:-12], y.iloc[-12:]

model = SARIMAX(train, order=(1, 1, 0), seasonal_order=(1, 1, 0, 12)).fit(disp=False)
resid = model.resid.iloc[model.loglikelihood_burn:]   # skip the burn-in observations excluded from the likelihood
lb = acorr_ljungbox(resid, lags=[12], return_df=True)["lb_pvalue"].iloc[0]   # Ljung-Box p-value at lag 12

print(round(model.aic, 2))    # 1056.25
print(round(lb, 4))           # 0.2091  -> passes: no significant residual autocorrelation

AIC 1056.25, Ljung-Box p = 0.209: the model is fit, and with p well above 0.05 the residuals show no significant autocorrelation, which is the adequacy standard Lesson 4 established. Only now, with adequacy confirmed, is it worth scoring the forecast.
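To make the adequacy check less of a black box, here is a minimal pure-Python sketch of the statistic `acorr_ljungbox` reports: Q = n(n+2) times the sum over k = 1..h of r_k^2 / (n - k), compared against a chi-square distribution with h degrees of freedom. The helper names `acf`, `chi2_sf_even`, and `ljung_box` are hypothetical; the p-value uses the closed-form chi-square tail, which only holds for an even number of degrees of freedom (lag 12 qualifies); and, like the call above, it makes no `model_df` adjustment. In real work, keep using statsmodels.

```python
import math

def acf(x, nlags):
    """Sample autocorrelations r_1..r_nlags (demeaned, biased denominator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, nlags + 1)]

def chi2_sf_even(q, df):
    """P(X > q) for X ~ chi-square with an EVEN number of degrees of freedom."""
    if df % 2:
        raise ValueError("closed form only valid for even df")
    half = q / 2.0
    term, total = 1.0, 1.0          # i = 0 term of sum_{i < df/2} half^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def ljung_box(x, h):
    """Return (Q, p-value) of the Ljung-Box test at lag h."""
    n = len(x)
    r = acf(x, h)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, chi2_sf_even(q, h)

# A series that flips sign every month is strongly autocorrelated at lag 1
alternating = [1.0, -1.0] * 24
q, p = ljung_box(alternating, 12)
print(p < 0.05)   # True: this series would fail the adequacy check
```

A residual series that passes, like the SARIMA's at p = 0.209, is one whose first 12 autocorrelations are jointly too small to produce a large Q.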

Stage 2: Score Against Every Baseline

Forecast the 12 test months and compare against the full lineage of baselines this course has built — the naive and seasonal-naive from Module 1, and the best non-seasonal ARIMA from Module 5:

from statsmodels.tsa.arima.model import ARIMA
def mape(a, f): return np.mean(np.abs((a - f) / a)) * 100

naive          = pd.Series(train.iloc[-1], index=test.index)
seasonal_naive = pd.Series(train.iloc[-12:].values, index=test.index)
arima          = ARIMA(train, order=(0,1,0), trend="t").fit().forecast(12)
sarima         = model.forecast(12)

print(f"naive           {mape(test, naive):4.1f}%")            # 19.0%
print(f"seasonal-naive  {mape(test, seasonal_naive):4.1f}%")   #  5.9%
print(f"non-seasonal    {mape(test, arima):4.1f}%")            # 16.0%
print(f"SARIMA          {mape(test, sarima):5.2f}%")           #  1.06%

naive           19.0%
seasonal-naive   5.9%
non-seasonal    16.0%
SARIMA           1.06%

There it is. The SARIMA forecasts the held-out year at 1.06% MAPE: 5.6 times better than the seasonal-naive baseline (5.9%) that has stood unbeaten since Module 1, and 15 times better than the best non-seasonal ARIMA (16.0%). This is the first model in the entire course to beat the seasonal-naive bar, and it doesn’t just edge past it; it leaves it far behind. In absolute terms, the SARIMA’s mean absolute error is 175 trips per month, against the seasonal-naive baseline’s 998.
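The two accuracy measures in this paragraph answer different questions: MAPE is relative (a percentage of each actual value), while MAE is in trips. A minimal pure-Python sketch of both, plus the "times better" ratio; the four `actual` and `forecast` values are made up for illustration and are not Cyclepath data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    """Mean absolute error, in the series' own units (trips)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical values, for illustration only
actual   = [14000, 16000, 18000, 20000]
forecast = [14140, 15840, 18180, 19800]

print(round(mape(actual, forecast), 2))   # 1.0   (every month is off by exactly 1%)
print(mae(actual, forecast))              # 170.0 (trips)
print(round(5.9 / 1.06, 1))               # 5.6   (seasonal-naive MAPE / SARIMA MAPE)
```

Note how the same 1% error costs more trips in the high-volume months, which is why the lesson reports both numbers.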

The whole course in one chart: the naive and non-seasonal ARIMA forecasts never beat the 5.9% seasonal-naive bar, but the SARIMA clears it more than fivefold at 1.06% — the payoff of modeling seasonality explicitly.

Stage 3: Compare Two Adequate SARIMAs

The featured model isn’t the only good one. The famous airline model, SARIMA(0,1,1)(0,1,1)[12], keeps the same regular and seasonal differencing but uses MA terms where the featured model uses AR terms, and it also fits Cyclepath well. Compare them on both criteria that matter:

airline = SARIMAX(train, order=(0,1,1), seasonal_order=(0,1,1,12)).fit(disp=False)

print(f"featured (1,1,0)(1,1,0)  AIC={model.aic:.1f}  MAPE={mape(test, sarima):.2f}%")
print(f"airline  (0,1,1)(0,1,1)  AIC={airline.aic:.1f}  MAPE={mape(test, airline.forecast(12)):.2f}%")
# featured (1,1,0)(1,1,0)  AIC=1056.2  MAPE=1.06%
# airline  (0,1,1)(0,1,1)  AIC=1033.0  MAPE=1.33%

A familiar tension resurfaces: the airline model has the better AIC (1033.0 vs. 1056.2), but the featured model has the better test MAPE (1.06% vs. 1.33%). It’s the same AIC-versus-out-of-sample disagreement you saw in Module 5, now between two good models rather than as a warning sign. (The AIC comparison itself is legitimate here: AICs are only comparable between models with the same differencing orders, and both use d = 1 and D = 1.) Both pass diagnostics and both crush the baseline, so the choice between them is a reasonable judgment call. The featured model wins on the metric that matters most for deployment, held-out accuracy, so it’s the one to carry forward; but reporting that the airline model is a close, defensible alternative is exactly the honesty this course has emphasized throughout.
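Written in backshift notation (B y_t = y_{t-1}), with the sign convention statsmodels documents for SARIMAX (AR polynomials written as 1 - phi B, MA polynomials as 1 + theta B), the two candidates share the differencing operator and differ only in whether the remaining dynamics are AR or MA. Each estimates two coefficients plus the noise variance, so the AIC penalty 2k is identical and, since AIC = 2k - 2 ln L, the whole 23.2-point AIC gap is a log-likelihood gap:

```latex
\begin{aligned}
&\text{featured } (1,1,0)(1,1,0)_{12}: && (1-\phi B)(1-\Phi B^{12})(1-B)(1-B^{12})\,y_t = \varepsilon_t \\
&\text{airline } (0,1,1)(0,1,1)_{12}: && (1-B)(1-B^{12})\,y_t = (1+\theta B)(1+\Theta B^{12})\,\varepsilon_t \\
&\text{same } k = 3 \;(\text{two coefficients} + \sigma^2): && \ln L_{\text{airline}} - \ln L_{\text{featured}} = \tfrac{1}{2}(1056.2 - 1033.0) = 11.6
\end{aligned}
```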

Stage 4: Forecast the Future

A validated model measured on a held-out year is trustworthy — so now use it for its actual purpose. Retrain on the full 96 months (no hold-out; you want every scrap of data for a real forward forecast) and predict all of 2024, a year Cyclepath hasn’t lived yet:

production = SARIMAX(y, order=(1, 1, 0), seasonal_order=(1, 1, 0, 12)).fit(disp=False)
fc2024 = production.get_forecast(steps=12).predicted_mean

print(f"{fc2024.loc['2024-01-01']:.0f}")   # 13800   <- winter low
print(f"{fc2024.loc['2024-07-01']:.0f}")   # 21346   <- summer peak
print(f"{fc2024.sum():.0f}")               # 213839  <- full-year total

The 2024 forecast has exactly the shape you’d expect and can now trust: a winter low near 13,800 in January, climbing to a summer peak of 21,346 in July, for a full-year total of 213,839 trips — about 5% above 2023’s actual 203,501, continuing the steady growth the trend captured. This is what all the work was for: not a backward-looking fit, but a credible, seasonally-shaped prediction of a year that hasn’t happened, from a model you’ve proven adequate and accurate.
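The "about 5%" figure is plain arithmetic on the two annual totals quoted above; a one-line check:

```python
total_2023_actual = 203_501     # sum of the 12 held-out months
total_2024_forecast = 213_839   # sum of the production model's 12-step forecast

growth_pct = 100 * (total_2024_forecast - total_2023_actual) / total_2023_actual
print(f"{growth_pct:.1f}%")     # 5.1%
```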

Why retrain on the full series for the real forecast

The train/test split exists to measure a model honestly: you hold out data the model never saw, forecast it, and compare. But once you’ve measured the model and trust it, throwing away the most recent 12 months for the actual forward forecast would be wasteful, because those are the freshest, most relevant observations. So the workflow is: split to validate, then refit on everything to forecast the genuine future. The validated error (1.06% MAPE) is your honest estimate of how well the full-data forecast will do, since you measured it on data the model hadn’t seen, which is the whole point of the hold-out. Because it rests on a single held-out year, treat it as an approximate guide rather than a guarantee.
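The validate-then-refit workflow has the same shape whatever the model. The sketch below shows it in pure Python using a seasonal-naive forecaster as a hypothetical stand-in for the SARIMA, so it runs without statsmodels; `seasonal_naive`, `mape`, and the synthetic `history` are illustrative assumptions, not part of the lesson's code.

```python
def seasonal_naive(history, steps, season=12):
    """Stand-in model: repeat the last observed season forward."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(steps)]

def mape(actual, forecast):
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical monthly series: 4 years of a fixed seasonal pattern plus steady growth
pattern = [80, 85, 95, 105, 115, 120, 125, 120, 110, 100, 90, 82]
history = [p + 2 * t for t, p in enumerate(pattern * 4)]

# 1. Validate: hold out the final 12 months, fit on the rest, score the hold-out
train, test = history[:-12], history[-12:]
validated_error = mape(test, seasonal_naive(train, 12))

# 2. Refit on everything and forecast the genuine future (no hold-out left)
future = seasonal_naive(history, 12)

print(round(validated_error, 1))   # the error estimate you carry forward
print(future[:3])                  # [152, 159, 171]
```

The key design point is that step 2 reuses the exact specification validated in step 1; only the data it is fit on changes.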

Stage 5: The Takeaway

Step back and see what this project — and the whole course — produced:

An adequate, validated model — SARIMA(1,1,0)(1,1,0)[12], with significant terms, white-noise residuals (Ljung-Box p = 0.209), and every diagnostic passing.
A decisive win over the baseline — 1.06% MAPE versus the 5.9% seasonal-naive bar that stood unbeaten for five modules: 5.6 times better, the first and only model in the course to clear it.
A genuine forward forecast — 2024 predicted from winter low (~13,800) to summer peak (21,346), retrained on all the data, ready to actually use.

The arc is complete. Module 1 built the series, the honest test split, and the baseline. Modules 2–4 decomposed, stationarized, and read the autocorrelation structure. Module 5 built the ARIMA family and proved, rigorously, that it wasn’t enough. This module added the one missing piece — explicit seasonal terms — and finally beat the bar. That’s the discipline of classical forecasting in miniature: understand the data, respect the baseline, add complexity only when you can prove it earns its place, and validate honestly before you trust a forecast.

Practice Exercises

Exercise 1: Why score only after diagnosing?

Stage 1 checked the Ljung-Box test before Stage 2 scored the forecast. Why is that ordering deliberate?

Hint

Because a good test-set score from an inadequate model can’t be trusted — if the residuals still contain structure (failing Ljung-Box), the model got lucky on this particular year rather than genuinely capturing the process, and its performance may not generalize. Diagnosing first ensures you’re scoring a model that’s actually sound, so its low MAPE reflects real skill, not a fluke. It’s the discipline from Lesson 4’s Exercise 3: when a good forecast and a diagnostic disagree, trust the diagnostic — so you check the diagnostic first, and only celebrate the forecast once the model has earned it.
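The ordering can be enforced in code with a simple gate: compute the test score only when the adequacy check passes. A minimal sketch; `score_if_adequate` is a hypothetical helper, and the 0.05 threshold is the conventional significance level rather than a value the lesson prescribes.

```python
def mape(actual, forecast):
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def score_if_adequate(lb_pvalue, actual, forecast, alpha=0.05):
    """Return the test MAPE only if the Ljung-Box check passes; otherwise refuse to score."""
    if lb_pvalue < alpha:
        # Residual autocorrelation remains: revise the model before looking at the test set
        return None
    return mape(actual, forecast)

actual, forecast = [100, 200], [101, 198]
print(score_if_adequate(0.2091, actual, forecast))   # 1.0
print(score_if_adequate(0.003, actual, forecast))    # None
```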

Exercise 2: Choosing between the two SARIMAs

The airline model had a better AIC but the featured model had a better test MAPE. Give one reasonable argument for each choice.

Hint

For the featured model: it directly wins on out-of-sample accuracy (1.06% vs 1.33%), which is what you ultimately care about for forecasting, and held-out error is the most trustworthy guide (Module 5’s lesson). For the airline model: it has the better AIC, suggesting a better fit-to-complexity balance in-sample, and it’s the widely validated canonical seasonal model with a long track record, so it may be more robust across different test years than a single hold-out reveals. Both are defensible. With only one test year, the MAPE difference could partly be luck, so a practitioner might even average the two forecasts. The key skill is recognizing there’s no single “correct” answer here, and reporting the tradeoff honestly rather than hiding it.
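If you do average the two forecasts, the combination is just an element-wise mean. One property follows directly from the triangle inequality: the averaged forecast's MAPE can never exceed the mean of the two individual MAPEs, although it can still be worse than the better of the two. A sketch with made-up numbers (not the Cyclepath forecasts); `average_forecasts` is a hypothetical helper.

```python
def mape(actual, forecast):
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def average_forecasts(*forecasts):
    """Element-wise mean of several equal-length forecasts."""
    return [sum(values) / len(values) for values in zip(*forecasts)]

# Hypothetical values, for illustration only
actual   = [100.0, 120.0, 150.0, 130.0]
featured = [101.0, 118.0, 151.0, 131.0]
airline  = [ 98.0, 121.0, 148.0, 132.0]

combo = average_forecasts(featured, airline)
print(combo)   # [99.5, 119.5, 149.5, 131.5]

# Guaranteed: the combination is no worse than the two models' average error
print(mape(actual, combo) <= (mape(actual, featured) + mape(actual, airline)) / 2)   # True
```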

Exercise 3: Interpreting the 2024 forecast

The 2024 forecast totals 213,839 trips, about 5% above 2023. Where does that growth come from in the model, given the featured model has no explicit trend term?

Hint

The growth comes from the differencing, not an explicit trend coefficient. Neither difference produces growth on its own: with d=1 alone and no constant, the forecasts level off, and with D=1 alone they settle into a repeating copy of the seasonal pattern. Combined, (1 - B)(1 - B^12) with no constant makes each forecast month-to-month change repeat the change for the same month a year earlier (the AR terms adjust this in the short run), so the year-over-year increase the model saw in the most recent data carries forward. That lets the model project continued growth without a dedicated drift term: the upward trajectory is baked into how the twice-differenced series behaves, which the model learned from eight years of steady climbing. This is the same “differencing handles the trend” principle from Module 5, Lesson 3, now producing a real forward projection.
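Stripping out the AR terms and the noise makes the mechanism visible. Setting (1 - B)(1 - B^12) y_t = 0 gives the recursion y_t = y_{t-1} + y_{t-12} - y_{t-13}. The sketch below applies it to a noise-free series shaped like the Cyclepath generator (trend 9000 + 90t plus the same sine seasonality, rounded to whole trips). The fitted model's AR coefficients and the noise would bend this path, so treat it as an illustration of the differencing alone; `extrapolate_double_difference` is a hypothetical helper.

```python
def extrapolate_double_difference(history, steps, season=12):
    """Forecast by setting (1 - B)(1 - B^season) y_t = 0: the AR-free, noise-free path."""
    path = list(history)
    for _ in range(steps):
        # y_t = y_{t-1} + y_{t-season} - y_{t-season-1}
        path.append(path[-1] + path[-season] - path[-season - 1])
    return path[len(history):]

# Noise-free stand-in: 9000 + 90*t plus 3200*sin(2*pi*(t-3)/12), rounded, Jan..Dec
seasonal = [-3200, -2771, -1600, 0, 1600, 2771, 3200, 2771, 1600, 0, -1600, -2771]
history = [9000 + 90 * t + seasonal[t % 12] for t in range(96)]

forecast = extrapolate_double_difference(history, 12)
yoy = [f - h for f, h in zip(forecast, history[-12:])]
print(set(yoy))   # {1080}: every month sits 12 * 90 trips above the same month last year
```

With no constant term, nothing in the recursion mentions a trend, yet the 90-trips-per-month growth embedded in the last 13 observations is carried forward exactly.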

Summary

The capstone fit SARIMA(1,1,0)(1,1,0)[12] to Cyclepath, confirmed adequacy (Ljung-Box p = 0.209), and scored its forecast against every baseline in the course: naive (19.0%), best non-seasonal ARIMA (16.0%), and the seasonal-naive bar (5.9%). The SARIMA landed at 1.06% MAPE, 5.6 times better than the baseline and the first model in the course to beat it, with a mean absolute error of 175 trips versus the baseline’s 998. Compared against the canonical airline model (0,1,1)(0,1,1)[12], the featured model won on test MAPE (1.06% vs 1.33%) though the airline model had a better AIC (1033 vs 1056); both are adequate and both crush the baseline. Finally, retrained on all 96 months, the model forecast 2024 from a winter low near 13,800 to a July peak of 21,346, totaling 213,839 trips (~5% growth): a genuine, validated forward prediction.

Key Concepts

Diagnose before scoring — confirm adequacy (Ljung-Box) before trusting a test-set forecast.
Beating the baseline — SARIMA at 1.06% MAPE clears the 5.9% seasonal-naive bar more than fivefold, the course’s turning point.
AIC vs. test error, again — the airline model wins AIC, the featured model wins MAPE; both are defensible, report the tradeoff.
Refit on full data for the real forecast — validate on a hold-out, then use every observation to forecast the genuine future.

Why This Matters

This is the destination of the entire course: a model that beats the honest baseline, passes every diagnostic, and produces a trustworthy forecast of a year that hasn’t happened. But the deeper lesson is the process that got here — building a baseline first, proving simpler models insufficient before adding complexity, validating on held-out data, and diagnosing rigorously before trusting. That discipline is what separates real forecasting from curve-fitting, and it transfers to any forecasting problem you’ll face, with any model, seasonal or not. You now have both the SARIMA toolkit and the judgment to use it well.

Next Steps

Back to Course Overview

Module 7 - Exponential Smoothing is coming soon. Check the course overview for what's live.

Back to Module Overview

Return to the Seasonality: SARIMA module overview

Continue Building Your Skills

You’ve reached the course’s turning point: a SARIMA that beats the seasonal-naive baseline more than fivefold, passes every diagnostic, and forecasts a genuine future year. From here, the remaining modules broaden your toolkit — exponential smoothing as a complementary forecasting approach, and rigorous backtesting to validate models across many windows rather than a single hold-out — but the core discipline you’ve built here, of respecting baselines and validating honestly, carries through all of it.

DATATWEETS
