Lesson 3 - Fitting SARIMA with statsmodels

Welcome to Fitting SARIMA with statsmodels

You’ve built the seasonal terms in isolation; now you’ll fit a complete SARIMA with both seasonal and non-seasonal parts to Cyclepath. The tool is SARIMAX — the same class whose seasonal-free special case you used as ARIMA in Module 5, now with its seasonal arguments switched on. This lesson fits the specific model this whole module builds toward, reads its summary, and produces a forecast that finally looks like the seasonal series it’s forecasting.

By the end of this lesson, you will be able to:

Fit a SARIMA with SARIMAX using the order and seasonal_order arguments
Read a summary containing both non-seasonal and seasonal coefficients
Confirm the seasonal term is doing real work via its significance
Forecast the test year and see predictions that follow the seasonal shape

Let’s fit the model.

Fitting with SARIMAX

SARIMAX takes two order tuples: order=(p, d, q) for the non-seasonal part and seasonal_order=(P, D, Q, s) for the seasonal part. Fit SARIMA(1,1,0)(1,1,0)[12] — one non-seasonal AR term plus first differencing, one seasonal AR term plus seasonal differencing — to Cyclepath’s training set (first 84 months, the same split as Module 5):

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

def cyclepath():
    # 96 months of synthetic ridership: linear trend + yearly sine wave + noise
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()
train, test = y.iloc[:-12], y.iloc[-12:]   # first 84 months to fit, last 12 to test

# order=(p, d, q) is the non-seasonal part; seasonal_order=(P, D, Q, s) is the seasonal part
res = SARIMAX(train, order=(1, 1, 0), seasonal_order=(1, 1, 0, 12)).fit(disp=False)
print(round(res.aic, 2))     # 1056.25
print(res.nobs)              # 84

That’s the whole fitting call: one order, one seasonal_order, and .fit(). The model applied both differences internally (d=1 and D=1), estimated the two AR terms, and reported an AIC of 1056.25. (By default, SARIMAX enforces stationarity of the AR terms and invertibility of any MA terms, so the estimated coefficients stay in the valid region without extra arguments.)
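To build intuition about what d=1 and D=1 remove, you can apply the same two differences by hand with pandas. This is only a sketch. By default (simple_differencing=False), SARIMAX keeps the differencing inside its state-space representation and does not pre-difference the data. The operator is the same either way: a first difference combined with a lag-12 difference.

```python
import numpy as np
import pandas as pd

def cyclepath():
    # Same synthetic Cyclepath generator as in the lesson
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

train = cyclepath().iloc[:-12]

# Seasonal difference (D=1, s=12), then a first difference (d=1)
w = train.diff(12).diff(1).dropna()

print(len(train), len(w))                  # 84 71: the two differences use up 13 observations
print(round(train.std()), round(w.std()))  # the differenced series is far less spread out
```

On this series, the lag-12 difference cancels the yearly sine wave exactly and turns the linear trend into a constant. The first difference then removes that constant, which leaves mostly differenced noise.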

Reading the Summary

The summary’s coefficient table is where SARIMA’s structure becomes visible — a non-seasonal and a seasonal term, side by side:

print(res.summary().tables[1])

==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.4990      0.105     -4.743      0.000      -0.705      -0.293
ar.S.L12      -0.5093      0.093     -5.456      0.000      -0.692      -0.326
sigma2      1.464e+05   2.36e+04      6.211      0.000       1e+05    1.93e+05
==============================================================================

Two coefficients, both telling a clear story:

ar.L1 = -0.499, p = 0.000 — the non-seasonal AR term, highly significant. After differencing, each month’s change is negatively related to the previous month’s change (a common pattern in differenced series — an up-move tends to be followed by a partial pull-back).
ar.S.L12 = -0.509, p = 0.000 — the seasonal AR term, also highly significant. This is the term ARIMA never had: it connects each month directly to the same month one year earlier (after seasonal differencing).

Both p-values are 0.000. That is a sharp contrast with Module 5’s non-seasonal ARIMA, where the terms came out insignificant because the model was the wrong shape. Here both terms are pulling real weight, which is the first sign that SARIMA actually fits this series. The two coefficients being nearly equal (-0.499 and -0.509) is not just a coincidence of this data. Differencing a noisy series pushes its lag-1 autocorrelation toward -0.5, and seasonal differencing does the same at lag 12. AR coefficients near -0.5 at both lags are therefore a common outcome once both differences are applied.
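To put the size of these coefficients in context, it helps to know what the two differences alone do to pure noise. The sketch below is a simulation that assumes i.i.d. Gaussian noise; it does not use Cyclepath itself. It applies a first difference and a lag-12 difference to a long white-noise series, then measures the sample autocorrelation at a few lags. The helper `acf_at` is a hypothetical name defined here.

```python
import numpy as np

def acf_at(x, lag):
    # Sample autocorrelation of x at one positive lag
    x = x - x.mean()
    return np.dot(x[lag:], x[:-lag]) / np.dot(x, x)

rng = np.random.default_rng(0)
e = rng.normal(0, 1, 200_000)   # i.i.d. noise with no structure at all

w = np.diff(e)                  # first difference (d=1)
w = w[12:] - w[:-12]            # seasonal difference (D=1, s=12)

for lag in (1, 5, 11, 12, 13):
    print(lag, round(acf_at(w, lag), 3))
# Theory for this differencing operator: about -0.5 at lags 1 and 12,
# about +0.25 at lags 11 and 13, and about 0 elsewhere
```

Autocorrelations of about -0.5 at lags 1 and 12 are what the differencing alone produces. Keep that in mind when interpreting the fitted ar.L1 and ar.S.L12.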

Forecasting the Test Year

Now the payoff. Forecast the 12 held-out months and compare a few against what actually happened:

fc = res.get_forecast(steps=12)
mean = fc.predicted_mean
ci = fc.conf_int()

for i in [0, 6, 11]:
    print(f"{mean.index[i].date()}  forecast {mean.iloc[i]:.0f}  actual {test.iloc[i]}")
# 2023-01-01  forecast 12969  actual 12941
# 2023-07-01  forecast 20354  actual 20533
# 2023-12-01  forecast 14518  actual 14272

Look at how closely these track:

- January: forecast 12,969 vs. actual 12,941 (off by 28).
- July: forecast 20,354 vs. actual 20,533 (off by 179, at the summer peak).
- December: forecast 14,518 vs. actual 14,272 (off by 246).

Crucially, the forecast rises and falls with the season. It climbs from a winter low through the July peak and back down, reproducing the yearly shape. Compare that to Module 5’s ARIMA(1,1,1), whose forecast was nearly flat (12,871 down to 12,400) and blind to the seasonal swing. The seasonal part of the model makes this difference: the seasonal difference (D=1) together with the seasonal AR term.

Figure: Cyclepath ridership with the SARIMA(1,1,0)(1,1,0)[12] forecast (green) over the 2023 test year, plotted against the actual values (black) with a shaded confidence band. The forecast tracks the actual seasonal curve almost exactly, reproducing the winter trough, the July peak near 20,500, and the descent (Jan off by 28, Jul off by 179). A non-seasonal ARIMA could never produce this seasonal shape.

get_forecast gives you the intervals too

As in Module 5, get_forecast returns both the point predictions (predicted_mean) and the confidence intervals (conf_int()). Because this model actually fits the series well, its intervals stay far tighter than the non-seasonal model’s exploding fan. The better a model captures the real structure, the less residual uncertainty it has to spread across the forecast horizon. The next two lessons of this module quantify how well those forecasts perform and confirm whether the model is statistically adequate.

Practice Exercises

Exercise 1: Change the order tuple

You want to fit SARIMA(0,1,1)(0,1,1)[12] (the airline model) instead. What exactly changes in the SARIMAX call?

Hint

Only the two tuples: SARIMAX(train, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12)). The order becomes (0, 1, 1) — no non-seasonal AR, one difference, one non-seasonal MA — and the seasonal_order becomes (0, 1, 1, 12) — no seasonal AR, one seasonal difference, one seasonal MA, period 12. Everything else about the fitting call is identical; SARIMAX reads the structure entirely from those two tuples, which is what makes comparing different specifications so mechanical.

Exercise 2: Interpret a significant seasonal term

Why is it reassuring that ar.S.L12 came out significant (p = 0.000) here, when the terms in Module 5’s non-seasonal ARIMA were insignificant?

Hint

A significant seasonal term means it’s capturing real, non-random structure — the year-over-year relationship genuinely exists in the data and the model is using it, not just adding a parameter that could be zero. In Module 5, the non-seasonal terms came out insignificant precisely because the model had no way to capture the dominant (seasonal) structure, so its terms were straining at the wrong thing. The flip to strong significance here is direct evidence that SARIMA is the right shape for this series — the terms it adds are the terms the data actually needs.

Exercise 3: Predict the forecast shape

Without running code, how would you expect the forecast from SARIMA(1,1,0)(0,0,0)[12] (seasonal orders all zero) to look, compared to the model in this lesson?

Hint

With all seasonal orders zero, SARIMA(1,1,0)(0,0,0)[12] is just a non-seasonal ARIMA(1,1,0) — it has no seasonal terms at all, so its forecast would be roughly flat (or a smooth drift), exactly like Module 5’s models, missing the seasonal rise and fall entirely. The whole reason this lesson’s forecast follows the seasonal curve is the (1,1,0)[12] seasonal part; strip it out and you’re back to the non-seasonal failure. This is a useful sanity check: the seasonal shape in the forecast comes specifically from the seasonal orders, so a forecast that doesn’t follow the season means the seasonal terms aren’t there or aren’t working.

Summary

SARIMAX fits a full SARIMA from two tuples, order=(p,d,q) and seasonal_order=(P,D,Q,s). Fitting SARIMA(1,1,0)(1,1,0)[12] to Cyclepath’s training set gave a clean summary with two highly significant coefficients, both at p = 0.000: ar.L1 = -0.499 (non-seasonal AR) and ar.S.L12 = -0.509 (seasonal AR). That is a sharp contrast with Module 5’s insignificant non-seasonal terms, and the first evidence that SARIMA fits this series. The forecast tracks the seasonal shape closely: January is off by 28, July (the peak) by 179, and December by 246. It reproduces the winter-to-summer-to-winter curve that a non-seasonal ARIMA’s flat forecast never could. The seasonal part of the model is precisely what makes that difference.

Key Concepts

SARIMAX(order, seasonal_order) — fit a full SARIMA from two tuples; the seasonal one carries (P, D, Q, s).
Reading a mixed summary — ar.L1 (non-seasonal) and ar.S.L12 (seasonal) sit side by side, distinguished by .S. and the lag.
Significant seasonal terms — both coefficients at p = 0.000 signal SARIMA is the right shape, unlike Module 5.
Seasonal forecast shape — the forecast follows the yearly rise and fall, driven specifically by the seasonal orders.

Why This Matters

This is the moment the course’s whole arc pays off: a model that produces a forecast actually shaped like the seasonal series it’s predicting, with every term statistically justified. Being able to fit a SARIMA in one call, read which terms are seasonal versus non-seasonal, and confirm from the significance column that the seasonal structure is real is the core practical skill of seasonal forecasting. But a good-looking forecast isn’t proof of adequacy — the next lesson runs the formal diagnostics (the Ljung-Box test Module 4’s preview model failed) to confirm this model’s residuals are genuinely white noise, not just that its forecast looks right.

Next Steps

Continue to Lesson 4 - Diagnostics: Is the Model Adequate?

Run the formal residual diagnostics — Ljung-Box and residual plots — to confirm the model is genuinely adequate, not just good-looking.

Back to Module Overview

Return to the Seasonality: SARIMA module overview

Continue Building Your Skills

You’ve fit a full SARIMA to Cyclepath, read its summary with seasonal and non-seasonal terms both significant, and seen a forecast that finally follows the seasonal shape. Next you’ll confirm the model is genuinely adequate — running the Ljung-Box residual test that the Module 4 preview model failed, and checking the residual plots that separate a model that looks right from one that is right.

Previous lesson

Lesson 2 - Seasonal AR and MA Terms

Next lesson

Lesson 4 - Diagnostics: Is the Model Adequate?
