Lesson 4 - Fitting and Forecasting ARIMA with statsmodels

Welcome to Fitting and Forecasting ARIMA with statsmodels

You’ve fit models in the last three lessons, but only glanced at a coefficient or two. A fitted ARIMA carries far more information than that — a full statistical summary that tells you which terms are pulling their weight, how well the model fits, and, when you forecast, how uncertain each prediction is. This lesson slows down to read all of it properly, because a forecast without its uncertainty is worse than useless: it’s misleading.

By the end of this lesson, you will be able to:

  • Read an ARIMA model summary: coefficient table, standard errors, and p-values
  • Judge which coefficients are statistically significant and what that implies
  • Produce point forecasts and confidence intervals with get_forecast
  • Explain why forecast intervals widen with the horizon, and read that widening honestly

Let’s fit a model and read everything it tells us.


The Model Summary

Fit an ARIMA(1, 1, 1) with a drift term (trend="t") to Cyclepath’s training set — the first 84 months, holding out the last 12 as a test set, exactly the split from Module 1 — and print its summary:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA


def cyclepath():
    """Simulate 96 months of Cyclepath trips: linear trend + annual cycle + noise."""
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")


y = cyclepath()
train, test = y.iloc[:-12], y.iloc[-12:]  # first 84 months for fitting, last 12 held out

# ARIMA(1, 1, 1) with a linear trend term, which acts as drift after first differencing
res = ARIMA(train, order=(1, 1, 1), trend="t").fit()
print(res.summary())
```

The interesting part is the coefficient table:

                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            92.2648    264.984      0.348      0.728    -427.095     611.624
ar.L1          0.6561      0.103      6.399      0.000       0.455       0.857
ma.L1          0.0629      0.148      0.424      0.671      -0.227       0.353
sigma2      6.735e+05   1.25e+05      5.395      0.000    4.29e+05    9.18e+05

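Each row is a fitted parameter: x1 is the drift (trend) term, ar.L1 the autoregressive coefficient, ma.L1 the moving-average coefficient, and sigma2 the estimated shock variance. The columns that matter most are coef (the estimate) and P>|z| (the p-value). The p-value applies the same significance logic as the ADF and Ljung-Box tests to each coefficient, with the null hypothesis being “this coefficient is really zero.” The last two columns bound a 95% confidence interval for the coefficient, so a p-value above 0.05 goes hand in hand with an interval that straddles zero, as the x1 and ma.L1 rows show.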
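To see how the columns connect, here is a minimal sketch that rebuilds z, P>|z| and the 95% bounds from coef and std err alone. It assumes the large-sample normal approximation that the z column implies: z = coef / std err, the two-sided p-value is 2·(1 − Φ(|z|)), and the bounds are coef ± 1.96·std err. The helper names `normal_cdf` and `row_stats` are hypothetical, and the inputs are copied from the printed table.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def row_stats(coef, std_err, z_crit=1.959964):
    """Rebuild the z, P>|z| and 95% CI columns from a coefficient and its std err.

    z_crit is the 97.5% standard normal quantile, used for a two-sided 95% interval.
    """
    z = coef / std_err
    p = 2.0 * (1.0 - normal_cdf(abs(z)))  # two-sided test of H0: coefficient = 0
    lower, upper = coef - z_crit * std_err, coef + z_crit * std_err
    return z, p, lower, upper


# (coef, std err) pairs copied from the summary table
table = {"x1": (92.2648, 264.984), "ar.L1": (0.6561, 0.103), "ma.L1": (0.0629, 0.148)}

for name, (coef, se) in table.items():
    z, p, lo, hi = row_stats(coef, se)
    print(f"{name:6s} z={z:6.3f}  p={p:.3f}  95% CI=[{lo:.3f}, {hi:.3f}]")
```

The x1 row comes back as z ≈ 0.348, p ≈ 0.728 and bounds of about [-427.1, 611.6], matching the table. For ar.L1 the rebuilt z lands near 6.37 rather than 6.399 only because the printed std err is rounded to three decimals.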


Reading the Significance Column

The p-values tell a revealing story about this model:

  • ar.L1 = 0.656, p = 0.000: highly significant (the p-value is below 0.0005, so it rounds to zero). There’s genuine autoregressive persistence in the differenced series, and this term earns its place.
  • ma.L1 = 0.063, p = 0.671: not significant. The MA term is close to zero and could plausibly be zero; it’s not contributing.
  • x1 (drift) = 92.26, p = 0.728: not significant either, despite being numerically close to Cyclepath’s true trend slope of 90. Its standard error (265) is nearly three times the estimate itself: with only 84 noisy points and a strong seasonal swing muddying things, the drift can’t be distinguished from zero at conventional significance levels.
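The same reading can be done programmatically. A fitted statsmodels result exposes the P>|z| column as `res.pvalues`, a pandas Series indexed by parameter name. The sketch below uses a plain dict copied from the table so it runs on its own; the helper name `flag_insignificant` and the 0.05 cutoff are illustrative choices, not a statsmodels feature. It skips sigma2, because testing whether the shock variance is zero says nothing about which structural terms belong in the model.

```python
def flag_insignificant(pvalues, alpha=0.05, skip=("sigma2",)):
    """Return parameter names whose p-value exceeds alpha (candidates to drop).

    `pvalues` is any name -> p-value mapping, e.g. res.pvalues from statsmodels.
    The shock variance is skipped because it is not a structural term.
    """
    return [name for name, p in pvalues.items() if name not in skip and p > alpha]


# P>|z| column copied from the ARIMA(1, 1, 1) + drift summary
pvals = {"x1": 0.728, "ar.L1": 0.000, "ma.L1": 0.671, "sigma2": 0.000}
print(flag_insignificant(pvals))  # ['x1', 'ma.L1']
```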

Two of the four parameters aren’t statistically significant. That’s a real signal — it hints that ARIMA(1, 1, 1) isn’t quite the right specification for this series, which makes sense: Cyclepath’s dominant structure is seasonal, and a non-seasonal ARIMA has no term built to capture that. Reading the significance column is how you catch this kind of mismatch early, rather than trusting a forecast blindly. (The capstone in Lesson 5 confronts this limitation head-on.)

AIC/BIC for comparison, p-values for inspection

The summary also reports AIC (1365.36 here) and BIC (1375.04) in its header — use these to compare whole models against each other, as Module 4’s preview did and as Lesson 5 will. The p-values in the coefficient table serve a different purpose: inspecting within one model whether each individual term is justified. A model can have a competitive AIC while still containing insignificant terms — both views are worth checking, and they answer different questions.
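As a worked check on the header numbers, assume the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, with k = 4 estimated parameters (x1, ar.L1, ma.L1, sigma2) and n = 83 effective observations. The n = 83 is an assumption about how statsmodels counts observations for a differenced model (84 months minus the one consumed by the first difference), and the reported scores bear it out: backing ln L out of the AIC predicts the BIC almost exactly.

```python
import math

aic_reported, bic_reported = 1365.36, 1375.04  # from the summary header
k = 4   # estimated parameters: x1, ar.L1, ma.L1, sigma2
n = 83  # assumption: 84 training months minus 1 lost to first differencing

log_lik = (2 * k - aic_reported) / 2           # invert AIC = 2k - 2 lnL
bic_predicted = k * math.log(n) - 2 * log_lik  # BIC = k ln(n) - 2 lnL

print(f"log-likelihood ~ {log_lik:.2f}")
print(f"predicted BIC  ~ {bic_predicted:.2f} (reported {bic_reported})")
```

The gap BIC − AIC = k(ln n − 2), about 9.68 here, is the extra penalty BIC charges for the same four parameters. Because ln n keeps growing with the sample size, BIC leans harder toward smaller models as data accumulates.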


Forecasting with Confidence Intervals

A point forecast alone is a false promise of precision. Use get_forecast to get both the point prediction and a confidence interval that quantifies how uncertain it is:

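```python
fc = res.get_forecast(steps=12)  # forecasts for the 12 held-out months
mean = fc.predicted_mean         # point forecasts (pandas Series)
ci = fc.conf_int()               # lower/upper bounds; alpha=0.05 (95%) by default

# Step 1 (first row) and step 12 (last row): point forecast and interval
print(f"{mean.iloc[0]:.0f}  [{ci.iloc[0,0]:.0f}, {ci.iloc[0,1]:.0f}]")     # 12871  [11262, 14479]
print(f"{mean.iloc[-1]:.0f}  [{ci.iloc[-1,0]:.0f}, {ci.iloc[-1,1]:.0f}]")   # 12400  [-2425, 27225]
```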

The one-step-ahead forecast is 12,871 trips, with a 95% confidence interval of [11,262, 14,479] — about 3,200 wide. The twelve-step-ahead forecast is 12,400, but its interval has ballooned to [-2,425, 27,225] — nearly 30,000 wide, roughly 9× the step-1 interval, and wide enough to include negative trips (impossible for a count, a sign the interval is straining). This widening isn’t a defect; it’s honesty. The further ahead you forecast, the more unknown future shocks accumulate, and the interval grows to reflect genuinely increasing uncertainty.
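The step-1 width can be checked by hand. One step ahead, the only unknown is the next shock, so the forecast standard error is simply √sigma2 and the 95% interval is the point forecast ± 1.96·√sigma2. A minimal check using the rounded numbers printed above:

```python
import math

sigma2 = 6.735e5     # estimated shock variance from the summary
mean_step1 = 12871   # step-1 point forecast printed above
z95 = 1.959964       # two-sided 95% standard normal quantile

se1 = math.sqrt(sigma2)  # one-step forecast std error = shock std dev
lower, upper = mean_step1 - z95 * se1, mean_step1 + z95 * se1
print(f"se = {se1:.1f}, interval = [{lower:.0f}, {upper:.0f}], width = {upper - lower:.0f}")
```

The result lands within a trip of [11,262, 14,479]; the small gap comes from the rounded sigma2 and point forecast. In statsmodels, the per-step standard errors are also available directly as `fc.se_mean`, and `fc.conf_int(alpha=0.2)` gives 80% bands instead of 95%.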

Figure: The ARIMA(1,1,1) forecast with 95% confidence intervals, continuing from the end of the 84-month training series. The interval fans from about 3,200 wide at step 1 to nearly 30,000 at step 12 as uncertainty compounds, with its lower edge dipping below zero near the end. The flat point forecast (around 12,400) also reveals the non-seasonal model’s blind spot: it can’t reproduce the seasonal rise and fall the real test period will show.

Notice, too, that the point forecast is nearly flat (12,871 down to 12,400) — a non-seasonal ARIMA has no way to reproduce the summer peak and winter trough that Cyclepath’s actual 2023 will show. Both the flat mean and the exploding interval are pointing at the same conclusion, which the capstone makes concrete.


Practice Exercises

Exercise 1: Interpret an insignificant term

Your ARIMA summary shows ma.L1 with a coefficient of 0.05 and a p-value of 0.83. What should you consider doing, and why?

Hint

A p-value of 0.83 means the MA term is nowhere near significant — it’s statistically indistinguishable from zero and isn’t contributing. You should consider dropping it: refit as a model with q reduced by one (e.g., ARIMA(1,1,0) instead of ARIMA(1,1,1)) and compare AIC. A simpler model that fits about as well is preferable — fewer parameters means less overfitting and more stable forecasts. This is the same parsimony principle from Lesson 1’s AR(1)-vs-AR(2) comparison, now applied by reading a p-value rather than comparing two fits directly.

Exercise 2: Why does the interval widen?

Explain in your own words why the step-12 confidence interval (~30,000 wide) is so much larger than the step-1 interval (~3,200 wide).

Hint

Each step into the future adds another unknown shock that the model can’t predict, and those uncertainties compound. At step 1, only one future shock is unknown, so the interval is relatively tight. By step 12, twelve future shocks have accumulated, each adding variance, and the interval has widened to reflect all of them — plus, because this is an integrated (d=1) model, the uncertainty in the level keeps growing rather than settling, which makes the fanning especially dramatic. A forecast interval that didn’t widen with the horizon would be lying about how much it actually knows.
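The compounding can be made concrete. Conditioning on the estimated coefficients, the h-step forecast variance of this ARIMA(1,1,1) is sigma2 · (ψ_0² + ψ_1² + … + ψ_{h−1}²). The ψ weights are running sums of the ARMA(1,1) impulse response 1, φ+θ, φ(φ+θ), φ²(φ+θ), …; the running sum is the d=1 integration at work, since a shock’s effect on the level never decays back to zero but settles at a permanent shift. A sketch using the rounded estimates from the summary (the variable names are illustrative):

```python
import math
from itertools import accumulate

phi, theta, sigma2 = 0.6561, 0.0629, 6.735e5  # ar.L1, ma.L1, sigma2 from the summary
z95 = 1.959964                                # two-sided 95% normal quantile
steps = 12

# ARMA(1,1) impulse response of the differenced series: 1, phi+theta, phi*(phi+theta), ...
arma = [1.0] + [phi ** (j - 1) * (phi + theta) for j in range(1, steps)]

# d = 1: each psi weight for the level is the running sum of the ARMA weights
psi = list(accumulate(arma))

# h-step variance = sigma2 * (psi_0^2 + ... + psi_{h-1}^2)
cum_sq = list(accumulate(w * w for w in psi))
half_widths = [z95 * math.sqrt(sigma2 * s) for s in cum_sq]

print(f"step 1 width:  {2 * half_widths[0]:.0f}")
print(f"step 12 width: {2 * half_widths[-1]:.0f}")
print(f"ratio:         {half_widths[-1] / half_widths[0]:.1f}")
```

The widths come out within a few trips of the printed intervals (about 3,217 and 29,650), and their ratio of roughly 9.2 is the "roughly 9×" from the forecast section. The last ψ weights are creeping toward (1+θ)/(1−φ) ≈ 3.09, so each further step keeps adding nearly the same variance and the band never stops widening.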

Exercise 3: A negative lower bound

The step-12 interval was [-2,425, 27,225], and bike trips can’t be negative. Is the model broken?

Hint

Not broken, but straining. ARIMA assumes normally distributed shocks, and a normal distribution is symmetric and unbounded, so a wide enough interval will eventually cross zero even when the quantity (a count) can’t actually be negative. It’s a sign the interval has grown very wide relative to the series level (here the step-12 half-width, about 14,800 trips, exceeds the 12,400-trip forecast itself), which usually means the model is a poor fit for a long horizon on this series. That is exactly the case here, since a non-seasonal model is the wrong shape for Cyclepath. Practical fixes include modeling log(y) so forecasts stay positive, or (better here) using a model that actually fits, which is what SARIMA in Module 6 provides. The negative bound is a symptom worth heeding, not an error to ignore.


Summary

A fitted ARIMA’s .summary() exposes the full picture: a coefficient table with estimates, standard errors, and p-values, plus fit scores (AIC 1365.36, BIC 1375.04). On Cyclepath’s training set, ARIMA(1,1,1)+drift showed a significant AR term (ar.L1 = 0.656, p = 0.000) but an insignificant MA term (p = 0.671) and drift (p = 0.728) — a hint that a non-seasonal model isn’t the right specification for this seasonal series. Forecasting with get_forecast produced point predictions and confidence intervals that widened honestly with the horizon: from about 3,200 wide at step 1 to nearly 30,000 at step 12 (even crossing into impossible negative values), because unknown future shocks accumulate. Both the flat point forecast and the exploding interval point to the same limitation — one the capstone confronts directly.

Key Concepts

  • Coefficient table: coef, std err, and P>|z| (p-value) for each parameter; read p-values to judge which terms are justified.
  • Significance as a diagnostic — insignificant terms (like the MA and drift here) hint at model misspecification.
  • AIC/BIC vs. p-values — AIC/BIC compare whole models; p-values inspect terms within one model.
  • Widening forecast intervals — uncertainty compounds with the horizon; an interval that doesn’t widen is dishonest.

Why This Matters

Producing a point forecast is easy; knowing whether to trust it is the actual skill, and it lives in the summary and the confidence intervals. Reading which coefficients are significant catches a misspecified model before it misleads you, and reading how fast the intervals fan out tells you how far ahead the model can credibly see. These habits — inspect the summary, always forecast with intervals, treat a wide or impossible interval as a warning — are what separate careful forecasting from point-prediction theater. Next, the capstone puts all of Module 5 together on Cyclepath, and confronts the honest result these diagnostics have been foreshadowing: a non-seasonal ARIMA, however carefully fit, can’t cleanly beat a simple seasonal baseline.


Next Steps

Continue to Lesson 5 - Guided Project: An ARIMA Forecast for Cyclepath

Fit ARIMA to Cyclepath, forecast the test year, and confront why a non-seasonal model can't beat the seasonal-naive baseline.

Back to Module Overview

Return to the AR, MA, ARMA, ARIMA module overview


Continue Building Your Skills

You can now fit an ARIMA, read its full summary to judge each coefficient, and forecast with honest, widening uncertainty intervals. Next, the capstone brings all of Module 5 together on Cyclepath — fitting real candidates, forecasting the held-out test year, and comparing against the seasonal-naive baseline from Module 1 — to reach the honest conclusion these diagnostics keep hinting at, and the exact reason the next module exists.