Lesson 3 - Fitting SARIMA with statsmodels
Welcome to Fitting SARIMA with statsmodels
You’ve built the seasonal terms in isolation; now you’ll fit a complete SARIMA with both seasonal and non-seasonal parts to Cyclepath. The tool is SARIMAX — the same class whose seasonal-free special case you used as ARIMA in Module 5, now with its seasonal arguments switched on. This lesson fits the specific model this whole module builds toward, reads its summary, and produces a forecast that finally looks like the seasonal series it’s forecasting.
By the end of this lesson, you will be able to:
- Fit a SARIMA with SARIMAX using the order and seasonal_order arguments
- Read a summary containing both non-seasonal and seasonal coefficients
- Confirm the seasonal term is doing real work via its significance
- Forecast the test year and see predictions that follow the seasonal shape
Let’s fit the model.
Fitting with SARIMAX
SARIMAX takes two order tuples: order=(p, d, q) for the non-seasonal part and seasonal_order=(P, D, Q, s) for the seasonal part. Fit SARIMA(1,1,0)(1,1,0)[12] — one non-seasonal AR term plus first differencing, one seasonal AR term plus seasonal differencing — to Cyclepath’s training set (first 84 months, the same split as Module 5):
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

def cyclepath():
    # Synthetic monthly trips: linear trend + 12-month sine cycle + Gaussian noise
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")
y = cyclepath()
train, test = y.iloc[:-12], y.iloc[-12:]
res = SARIMAX(train, order=(1, 1, 0), seasonal_order=(1, 1, 0, 12)).fit(disp=False)
print(round(res.aic, 2)) # 1056.25
print(res.nobs) # 84
That’s the whole fitting call: one order, one seasonal_order, and .fit(). The model applies both differences internally (d=1 and D=1), fits the two AR terms, and reports an AIC of 1056.25. (SARIMAX’s defaults, enforce_stationarity=True and enforce_invertibility=True, constrain the AR terms to be stationary and any MA terms to be invertible, so you get clean, interpretable parameters without extra arguments.)
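To make the d=1 and D=1 differencing concrete, here is a minimal sketch on a noise-free stand-in for the training series. It reuses the trend and seasonal formula from cyclepath() but omits the noise, which is an assumption made purely for illustration. It shows the transformation the two orders describe, not how statsmodels carries it out internally.

```python
import numpy as np
import pandas as pd

# Noise-free stand-in for the 84 training months (illustrative assumption)
t = np.arange(84)
idx = pd.date_range("2016-01-01", periods=84, freq="MS")
clean = pd.Series(9000 + 90 * t + 3200 * np.sin(2 * np.pi * (t - 3) / 12), index=idx)

# d=1: change from the previous month
# D=1 with s=12: change from the same month one year earlier
double_diff = clean.diff(1).diff(12).dropna()

# One row is lost to the first difference and twelve to the seasonal one
print(len(double_diff))               # 71
# A linear trend and an exact 12-month cycle cancel completely
print(np.allclose(double_diff, 0.0))  # True
```

On the real series the result is not zero, because the noise survives differencing. That leftover structure is what the two AR terms are fitted to.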
Reading the Summary
The summary’s coefficient table is where SARIMA’s structure becomes visible — a non-seasonal and a seasonal term, side by side:
print(res.summary().tables[1])
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.4990      0.105     -4.743      0.000      -0.705      -0.293
ar.S.L12      -0.5093      0.093     -5.456      0.000      -0.692      -0.326
sigma2      1.464e+05   2.36e+04      6.211      0.000       1e+05    1.93e+05
==============================================================================
Two coefficients, both telling a clear story:
- ar.L1 = -0.499, p < 0.001 (printed as 0.000): the non-seasonal AR term, highly significant. After differencing, each month’s change is negatively related to the previous month’s change (a common pattern in differenced series, where an up-move tends to be followed by a partial pull-back).
- ar.S.L12 = -0.509, p < 0.001 (printed as 0.000): the seasonal AR term, also highly significant. This is the term ARIMA never had: it connects each month directly to the same month one year earlier (after seasonal differencing).
Both p-values are 0.000 — a sharp contrast with Module 5’s non-seasonal ARIMA, where the terms came out insignificant because the model was the wrong shape. Here, both terms are pulling real weight, which is the first sign that SARIMA actually fits this series. The two coefficients being nearly equal (-0.499 and -0.509) is a coincidence of this particular data, but a tidy one: the model finds comparable structure at the short lag and the seasonal lag.
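A p-value printed as 0.000 is rounded to three decimals, not literally zero. The sketch below recomputes the two-sided p-value from the z statistics in the table, assuming the standard normal reference distribution implied by the P>|z| column. The helper name two_sided_p is hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# z statistics copied from the coefficient table
for name, z in [("ar.L1", -4.743), ("ar.S.L12", -5.456)]:
    p = two_sided_p(z)
    # Scientific notation shows the actual size; .3f shows what the table prints
    print(f"{name}: p = {p:.2e}, shown in the table as {p:.3f}")
```

Both values fall far below 0.0005, which is why they round to 0.000.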
Forecasting the Test Year
Now the payoff. Forecast the 12 held-out months and compare a few against what actually happened:
fc = res.get_forecast(steps=12)
mean = fc.predicted_mean
ci = fc.conf_int()
for i in [0, 6, 11]:
    print(f"{mean.index[i].date()} forecast {mean.iloc[i]:.0f} actual {test.iloc[i]}")
# 2023-01-01 forecast 12969 actual 12941
# 2023-07-01 forecast 20354 actual 20533
# 2023-12-01 forecast 14518 actual 14272
Look at how closely these track: January forecast 12,969 vs. actual 12,941 (off by 28); July forecast 20,354 vs. actual 20,533 (off by 179, at the summer peak); December forecast 14,518 vs. actual 14,272 (off by 246). Crucially, the forecast rises and falls with the season: it climbs from a winter low through the July peak and back down, reproducing the yearly shape. Compare that to Module 5’s ARIMA(1,1,1), whose forecast was nearly flat (12,871 down to 12,400), blind to the seasonal swing. The seasonal part of the model (seasonal differencing plus the seasonal AR term) is what makes this difference.
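The gaps between forecast and actual can be checked with a few lines of arithmetic. This sketch uses only the three forecast/actual pairs printed above and adds the error as a share of the actual value. The percentage column is an illustrative addition, not a metric the lesson defines.

```python
# (month, forecast, actual) copied from the printed output above
rows = [
    ("2023-01", 12969, 12941),
    ("2023-07", 20354, 20533),
    ("2023-12", 14518, 14272),
]

# Signed error: positive means the forecast was too high
errors = [forecast - actual for _, forecast, actual in rows]
pct = [100 * abs(e) / actual for e, (_, _, actual) in zip(errors, rows)]

for (month, _, _), e, p in zip(rows, errors, pct):
    print(f"{month}  error {e:+d}  ({p:.1f}% of actual)")
# 2023-01  error +28  (0.2% of actual)
# 2023-07  error -179  (0.9% of actual)
# 2023-12  error +246  (1.7% of actual)
```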
get_forecast gives you the intervals too
As in Module 5, get_forecast returns both the point predictions (predicted_mean) and confidence intervals (conf_int()). Because this model actually fits the series well, its intervals stay far tighter than the non-seasonal model’s exploding fan — the better a model captures the real structure, the less residual uncertainty it has to spread across the forecast horizon. You’ll quantify exactly how well those forecasts perform, and confirm the model is statistically adequate, in the next two sections of this module.
Practice Exercises
Exercise 1: Change the order tuple
You want to fit SARIMA(0,1,1)(0,1,1)[12] (the airline model) instead. What exactly changes in the SARIMAX call?
Hint
Only the two tuples: SARIMAX(train, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12)). The order becomes (0, 1, 1) — no non-seasonal AR, one difference, one non-seasonal MA — and the seasonal_order becomes (0, 1, 1, 12) — no seasonal AR, one seasonal difference, one seasonal MA, period 12. Everything else about the fitting call is identical; SARIMAX reads the structure entirely from those two tuples, which is what makes comparing different specifications so mechanical.
Exercise 2: Interpret a significant seasonal term
Why is it reassuring that ar.S.L12 came out significant (p = 0.000) here, when the terms in Module 5’s non-seasonal ARIMA were insignificant?
Hint
A significant seasonal term means it’s capturing real, non-random structure — the year-over-year relationship genuinely exists in the data and the model is using it, not just adding a parameter that could be zero. In Module 5, the non-seasonal terms came out insignificant precisely because the model had no way to capture the dominant (seasonal) structure, so its terms were straining at the wrong thing. The flip to strong significance here is direct evidence that SARIMA is the right shape for this series — the terms it adds are the terms the data actually needs.
Exercise 3: Predict the forecast shape
Without running code, how would you expect the forecast from SARIMA(1,1,0)(0,0,0)[12] (seasonal orders all zero) to look, compared to the model in this lesson?
Hint
With all seasonal orders zero, SARIMA(1,1,0)(0,0,0)[12] is just a non-seasonal ARIMA(1,1,0) — it has no seasonal terms at all, so its forecast would be roughly flat (or a smooth drift), exactly like Module 5’s models, missing the seasonal rise and fall entirely. The whole reason this lesson’s forecast follows the seasonal curve is the (1,1,0)[12] seasonal part; strip it out and you’re back to the non-seasonal failure. This is a useful sanity check: the seasonal shape in the forecast comes specifically from the seasonal orders, so a forecast that doesn’t follow the season means the seasonal terms aren’t there or aren’t working.
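The flattening can be seen by writing out the ARIMA(1,1,0) forecast recursion by hand. In this sketch the function name, the starting level, the last change, and the coefficient are all hypothetical values chosen for illustration. It also assumes a model with no constant term.

```python
def arima110_forecast(last_level, last_change, phi, steps):
    """Point forecasts for an ARIMA(1,1,0) with no constant term."""
    forecasts = []
    level, change = last_level, last_change
    for _ in range(steps):
        change = phi * change   # AR(1) applied to the differenced series
        level = level + change  # undo the first difference
        forecasts.append(level)
    return forecasts

# Hypothetical inputs: last observed level, last observed change, AR coefficient
path = arima110_forecast(last_level=13000.0, last_change=-400.0, phi=-0.5, steps=12)

# The predicted changes shrink geometrically, so the path settles at this level
limit = 13000.0 + (-400.0) * (-0.5) / (1 - (-0.5))

print([round(v, 1) for v in path[:4]])  # [13200.0, 13100.0, 13150.0, 13125.0]
print(round(limit, 1))                  # 13133.3
```

Each predicted change is the previous one multiplied by phi, so after a few steps the forecast barely moves. Nothing in the recursion looks back 12 months, which is why no seasonal shape can appear.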
Summary
SARIMAX fits a full SARIMA from two tuples: order=(p,d,q) and seasonal_order=(P,D,Q,s). Fitting SARIMA(1,1,0)(1,1,0)[12] to Cyclepath’s training set gave a clean summary with two highly significant coefficients: ar.L1 = -0.499 (non-seasonal AR) and ar.S.L12 = -0.509 (seasonal AR), both at p = 0.000. That is a sharp contrast with Module 5’s insignificant non-seasonal terms, and the first evidence SARIMA fits this series. The forecast tracks the seasonal shape closely: January off by 28, July (the peak) off by 179, December off by 246, reproducing the winter-to-summer-to-winter curve that a non-seasonal ARIMA’s flat forecast never could. The seasonal part of the model (seasonal differencing plus the seasonal AR term) is what makes that difference.
Key Concepts
- SARIMAX(order, seasonal_order): fit a full SARIMA from two tuples; the seasonal one carries (P, D, Q, s).
- Reading a mixed summary: ar.L1 (non-seasonal) and ar.S.L12 (seasonal) sit side by side, distinguished by .S. and the lag.
- Significant seasonal terms: both coefficients at p = 0.000 signal SARIMA is the right shape, unlike Module 5.
- Seasonal forecast shape: the forecast follows the yearly rise and fall, driven specifically by the seasonal orders.
Why This Matters
This is the moment the course’s whole arc pays off: a model that produces a forecast actually shaped like the seasonal series it’s predicting, with every term statistically justified. Being able to fit a SARIMA in one call, read which terms are seasonal versus non-seasonal, and confirm from the significance column that the seasonal structure is real is the core practical skill of seasonal forecasting. But a good-looking forecast isn’t proof of adequacy — the next lesson runs the formal diagnostics (the Ljung-Box test Module 4’s preview model failed) to confirm this model’s residuals are genuinely white noise, not just that its forecast looks right.
Next Steps
Continue to Lesson 4 - Diagnostics: Is the Model Adequate?
Run the formal residual diagnostics — Ljung-Box and residual plots — to confirm the model is genuinely adequate, not just good-looking.
Continue Building Your Skills
You’ve fit a full SARIMA to Cyclepath, read its summary with seasonal and non-seasonal terms both significant, and seen a forecast that finally follows the seasonal shape. Next you’ll confirm the model is genuinely adequate — running the Ljung-Box residual test that the Module 4 preview model failed, and checking the residual plots that separate a model that looks right from one that is right.