Lesson 1 - The Autoregressive (AR) Model

Welcome to the Autoregressive (AR) Model

Module 4 ended with a shortlist of orders and a promise: the models those orders describe come next. This is the first of them. The autoregressive model is the most intuitive place to start, because it formalizes an idea you’ve been circling since Module 1 — that a time series value is related to its own recent past. AR just makes that precise: today is a weighted sum of yesterday, the day before, and so on, plus fresh randomness.

By the end of this lesson, you will be able to:

  • Write the AR(p) equation and explain each term
  • Fit an AR(1) model with statsmodels and recover a known coefficient
  • Interpret the fitted coefficient and what it says about persistence
  • Describe the characteristic mean-reverting forecast behavior of an AR model

Let’s write the model down.


The AR(p) Equation

An autoregressive model of order p, written AR(p), predicts the current value from the previous p values:

y_t = c + phi_1 * y_{t-1} + phi_2 * y_{t-2} + ... + phi_p * y_{t-p} + e_t

Each phi_i is a coefficient weighting how much the value i steps back contributes; c is a constant that sets the overall level (for a stationary AR(1), the long-run mean is c / (1 - phi_1), not c itself); and e_t is the random shock at time t, the genuinely unpredictable part, assumed to be white noise. The simplest useful version is AR(1): y_t = c + phi_1 * y_{t-1} + e_t, which reads as today equals a constant plus a fraction of yesterday plus noise. This is exactly the process you saw in Module 4’s AR(1) signature demonstration; now you’ll fit it rather than just read its autocorrelation.
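To make the equation concrete, here is a minimal sketch that generates an AR(1) series by applying the recursion one step at a time, using only the standard library. The function name simulate_ar1 and the values c = 15 and phi = 0.7 are illustrative assumptions, not part of the lesson's dataset. It also shows that the series settles around c / (1 - phi) rather than around c.

```python
import random

def simulate_ar1(c, phi, n, seed=0, sigma=1.0):
    """Generate n values of y_t = c + phi * y_{t-1} + e_t with Gaussian shocks."""
    rng = random.Random(seed)
    y_prev = c / (1.0 - phi)   # start at the long-run mean (requires |phi| < 1)
    series = []
    for _ in range(n):
        shock = rng.gauss(0.0, sigma)      # e_t: the white-noise shock
        y_t = c + phi * y_prev + shock     # the AR(1) equation, term by term
        series.append(y_t)
        y_prev = y_t
    return series

# Hypothetical values: c = 15 and phi = 0.7 imply a long-run mean of 15 / 0.3 = 50.
series = simulate_ar1(c=15.0, phi=0.7, n=5000, seed=0)
sample_mean = sum(series) / len(series)
print(round(sample_mean, 2))   # close to 50, not to c = 15
```

The same loop extends to AR(p) by keeping the last p values instead of one.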


Fitting an AR(1) and Recovering the Coefficient

The honest way to trust a fitting procedure is to run it on data where you already know the answer. Build a synthetic AR(1) with a known coefficient of phi = 0.7, then fit an AR(1) and see whether the fit recovers it:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(11)
ar1 = ArmaProcess(np.array([1, -0.7]), np.array([1])).generate_sample(
    nsample=2000, distrvs=lambda size: rng.normal(0, 1, size)
)

res = ARIMA(ar1, order=(1, 0, 0), trend="n").fit()
print(round(res.params[0], 3))    # 0.701  <- the ar.L1 coefficient
print(round(res.aic, 2))          # 5686.89

The fitted ar.L1 coefficient is 0.701 — essentially the true 0.7 the data was built with, off only by sampling noise across 2,000 points. That’s the reassurance you want: ARIMA(order=(1,0,0)) — order (1, 0, 0) meaning one AR term, no differencing, no MA term — correctly recovered the autoregressive structure. The AIC of 5686.89 is a model-quality score you’ll use for comparison shortly; lower is better, exactly as in Module 4’s preview.
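For intuition about what a fit is estimating, the sketch below recovers phi with a simpler technique: a least-squares regression of each value on the one before it. This is a swapped-in illustration, not what ARIMA does internally (statsmodels estimates the model by maximum likelihood by default), and the random draws differ from the lesson's, so the digits will not match 0.701.

```python
import random

# Simulate a zero-mean AR(1) with phi = 0.7 (same design as the lesson, different draws).
rng = random.Random(11)
true_phi = 0.7
y = [0.0]
for _ in range(2000):
    y.append(true_phi * y[-1] + rng.gauss(0.0, 1.0))

# Least-squares slope of y_t on y_{t-1} with no intercept:
# phi_hat = sum(y_t * y_{t-1}) / sum(y_{t-1} ** 2)
num = sum(curr * prev for prev, curr in zip(y[:-1], y[1:]))
den = sum(prev * prev for prev in y[:-1])
phi_hat = num / den
print(round(phi_hat, 3))   # close to 0.7; exact digits depend on the random draws
```

With 2,000 points, both approaches should land within a few hundredths of the true coefficient.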

Why order=(1,0,0)?

statsmodels uses a single unified ARIMA class for the whole family, parameterized by order=(p, d, q): p AR terms, d differences, q MA terms. An AR(1) is just (1, 0, 0) — one AR term and nothing else. You’ll meet d (differencing) and q (MA terms) in the next two lessons; for now, (p, 0, 0) is a pure AR(p) model. Using one class for everything means the fitting, forecasting, and diagnostic tools you learn here work identically for every model in the rest of this module and the next.


Does an Extra Term Help? Comparing AR(1) to AR(2)

If AR(1) recovers the truth, what happens if you over-specify and fit an AR(2) to the same AR(1) data? The second coefficient should come out near zero, and the AIC should get slightly worse (higher), because you’ve added a parameter that isn’t earning its place:

res2 = ARIMA(ar1, order=(2, 0, 0), trend="n").fit()
print(np.round(res2.params[:2], 3))   # [0.707 -0.008]
print(round(res2.aic, 2))             # 5688.75

The second coefficient is -0.008 — indistinguishable from zero, correctly telling you there’s no genuine second-lag structure. And the AIC rose from 5686.89 to 5688.75: adding a useless parameter made the model score slightly worse, because AIC penalizes complexity that doesn’t buy enough improvement in fit. This is the whole logic of order selection in miniature — and exactly why Module 4 worked so hard to shortlist orders rather than throwing every term at the wall. Lesson 4 formalizes using AIC to choose.
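The trade-off can be read straight off the two scores using the standard definition AIC = 2k - 2 ln L, where k is the number of estimated parameters and ln L is the maximized log-likelihood. The sketch below only rearranges that formula with the numbers printed above; the variable names are illustrative.

```python
# AIC = 2k - 2*lnL: a penalty for parameter count minus a reward for fit.
aic_ar1 = 5686.89   # AIC printed for the AR(1) fit
aic_ar2 = 5688.75   # AIC printed for the AR(2) fit
extra_params = 1    # AR(2) estimates one more coefficient than AR(1)

aic_change = aic_ar2 - aic_ar1
penalty_change = 2 * extra_params
# Rearranging: aic_change = penalty_change - 2 * loglik_gain
loglik_gain = (penalty_change - aic_change) / 2

print(round(aic_change, 2))    # 1.86
print(round(loglik_gain, 2))   # 0.07
```

The extra term improved the log-likelihood by only about 0.07, far short of the 1.0 it would need to offset a penalty of 2, so the AIC went up.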


The Signature Behavior: Forecasts Decay to the Mean

The defining feature of an AR model — the thing that separates it from the MA model in the next lesson — is how it forecasts. Build an AR(1) series centered on a mean of 50, fit it, and forecast six steps ahead:

rng = np.random.default_rng(11)   # reset seed so this block is reproducible on its own
ar1b = ArmaProcess(np.array([1, -0.7]), np.array([1])).generate_sample(
    nsample=300, distrvs=lambda size: rng.normal(0, 1, size)
) + 50

res_b = ARIMA(ar1b, order=(1, 0, 0), trend="c").fit()
print(np.round(res_b.forecast(steps=6), 2))
# [49.01 49.32 49.54 49.69 49.8  49.88]

The six-step forecast climbs steadily back toward the series mean of ~50: 49.01 → 49.32 → 49.54 → 49.69 → 49.80 → 49.88. Each step keeps roughly 70% of the remaining gap (that’s the phi = 0.7 at work) and closes the other 30% or so, which means the forecast approaches the mean geometrically but never quite reaches it in finite steps. This is mean reversion, and it’s the hallmark of an AR forecast: the influence of the last observed value fades smoothly with each step into the future, pulling the prediction back toward the long-run average. (trend="c" adds a constant so the model knows the mean isn’t zero.)

A line chart showing a wiggly AR(1) series ending at a value well below its mean line drawn at 50. From the last observed point, a forecast line curves smoothly upward across six future steps, getting closer to the mean line at each step but never quite reaching it, with the gap to the mean shrinking to about 70 percent of its previous size each step. The curve is labeled 'forecast decays geometrically toward the mean'.
The AR(1) forecast (from a series ending below its mean of 50) climbs back toward that mean geometrically (49.01, 49.32, 49.54, and so on), keeping about 70% of the remaining gap at each step. Mean reversion is the signature of an autoregressive forecast.
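The mean-reverting forecast can be reproduced by hand. The sketch below iterates the AR(1) forecast rule, in which each step's gap to the mean is phi times the previous gap. The function name ar1_forecast and the inputs (mean 50, phi 0.7, last observation 48) are hypothetical round numbers chosen for illustration; they are not the fitted values from the model above.

```python
def ar1_forecast(last_value, mean, phi, steps):
    """Iterate the AR(1) forecast: each step keeps a fraction phi of the gap to the mean."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = mean + phi * (value - mean)
        forecasts.append(value)
    return forecasts

# Hypothetical inputs: mean 50, phi = 0.7, last observation 48 (2 below the mean).
path = ar1_forecast(last_value=48.0, mean=50.0, phi=0.7, steps=6)
print([round(v, 2) for v in path])
# [48.6, 49.02, 49.31, 49.52, 49.66, 49.76]
```

The gap shrinks 2 → 1.4 → 0.98 → ..., a constant factor of 0.7 per step, which is the geometric decay the lesson describes.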

Practice Exercises

Exercise 1: Interpret a coefficient

You fit an AR(1) to a series and get phi_1 = 0.95. What does that large coefficient tell you about the series, compared to one where phi_1 = 0.3?

Hint

A phi_1 of 0.95 means the series is highly persistent: today’s deviation from the mean is about 95% of yesterday’s (plus noise), so shocks fade very slowly and the series wanders far from its mean for long stretches before returning. Its forecast would decay toward the mean very slowly (recovering only 5% of the gap per step). A phi_1 of 0.3 is much less persistent: today’s deviation is only 30% of yesterday’s, shocks die out quickly, and forecasts snap back toward the mean fast. The closer phi_1 is to 1, the longer the series’ memory; the closer to 0, the more it behaves like independent noise around its mean.
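A quick way to put numbers on that contrast: in an AR(1), the share of a deviation from the mean that survives h steps is phi_1 raised to the power h. The helper name remaining_share below is illustrative.

```python
def remaining_share(phi, steps):
    """Share of an initial deviation from the mean still present after `steps` periods."""
    return phi ** steps

for phi in (0.95, 0.3):
    shares = [round(remaining_share(phi, h), 3) for h in (1, 5, 10)]
    print(phi, shares)
# 0.95 [0.95, 0.774, 0.599]
# 0.3 [0.3, 0.002, 0.0]
```

After ten steps, the phi_1 = 0.95 series still carries about 60% of the original deviation, while the phi_1 = 0.3 series has effectively forgotten it.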

Exercise 2: Predict the AIC comparison

You have data genuinely generated by an AR(2) process. If you fit both an AR(1) and an AR(2), which should have the lower AIC, and why is this the opposite of what happened in the lesson?

Hint

The AR(2) should have the lower (better) AIC here, because the data genuinely has second-lag structure that an AR(1) can’t capture — so the extra parameter earns its place by improving the fit enough to outweigh the complexity penalty. This is the mirror image of the lesson, where the data was truly AR(1) and the extra AR(2) term was useless, making AIC rise. The principle is symmetric: AIC rewards a parameter when it captures real structure and penalizes it when it doesn’t — which is exactly what makes it a useful order-selection tool.

Exercise 3: Reason about the forecast

An AR(1) series with phi_1 = 0.7 and mean 50 is currently sitting at 80 (well above its mean). Without running code, roughly where will the one-step and two-step forecasts land?

Hint

Working from the mean: the current value is 30 above the mean (80 - 50). Each step keeps phi_1 = 0.7 of the deviation and gives back the other 30%, so the one-step forecast sits at roughly 50 + 0.7 × 30 = 71. The two-step forecast keeps 0.7 of that remaining deviation: 50 + 0.7 × 21 ≈ 64.7. Each step the deviation from the mean shrinks by a factor of 0.7, so the forecast decays 80 → ~71 → ~65 → … toward 50. This is the same geometric mean reversion the lesson showed, just starting from above the mean instead of below it.
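To check the arithmetic, the sketch below uses the closed form of the h-step AR(1) forecast, mean + phi^h × (current - mean), with the exercise's numbers. It also computes how many steps it takes for the forecast to come within one unit of the mean; that one-unit threshold is an arbitrary choice made for illustration.

```python
import math

mean, phi, current = 50.0, 0.7, 80.0
gap = current - mean   # 30 above the mean

# Closed form of the h-step forecast: mean + phi**h * gap
one_step = mean + phi ** 1 * gap
two_step = mean + phi ** 2 * gap
print(round(one_step, 1), round(two_step, 1))   # 71.0 64.7

# Smallest h with phi**h * gap < 1, i.e. forecast within one unit of the mean
steps_needed = math.ceil(math.log(1.0 / gap) / math.log(phi))
print(steps_needed)   # 10
```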


Summary

The autoregressive model AR(p) writes today’s value as a constant plus weighted contributions from the previous p values plus a random shock: y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t. Fit with statsmodels’ unified ARIMA class using order=(p, 0, 0), an AR(1) recovered a known coefficient of 0.7 almost exactly (0.701) from 2,000 synthetic points. Over-specifying to AR(2) produced a near-zero second coefficient (-0.008) and a slightly worse AIC (5688.75 vs. 5686.89), demonstrating in miniature why AIC is a useful order-selection tool. The signature behavior of an AR forecast is mean reversion: from a series ending below its mean of 50, the forecast climbed geometrically back (49.01, 49.32, 49.54, …), keeping about 70% of the remaining gap at each step and closing the other 30% or so.

Key Concepts

  • AR(p) equation — today as a constant plus weighted past values plus a shock.
  • Coefficient recovery — fitting an AR(1) to known-phi data returns that phi, confirming the procedure works.
  • AIC and over-specification — an unneeded extra term comes out near zero and raises AIC, the basis of order selection.
  • Mean-reverting forecasts — an AR forecast decays geometrically back toward the series mean; larger phi means slower decay and longer memory.

Why This Matters

The AR model is the foundation of the entire ARIMA family and, through it, of SARIMA in the next module — the “AR” and the seasonal-AR terms you’ll configure are exactly this machinery. Understanding that an AR forecast reverts to the mean, and that its coefficient measures persistence, is what lets you reason about a model’s behavior before you even run it — and what makes the contrast with the moving-average model, whose forecasts behave completely differently, so instructive. Next, you’ll build that MA model and see a forecast that goes flat almost immediately, the opposite of AR’s gradual decay.


Next Steps

Continue to Lesson 2 - The Moving Average (MA) Model

Build the MA(q) model — today as a sum of recent random shocks — and see its forecast go flat after q steps.


Continue Building Your Skills

You’ve built and fit the autoregressive model, recovered a known coefficient, and seen its forecast revert smoothly to the mean. Next you’ll build its counterpart — the moving-average model, which depends on recent shocks rather than recent values — and discover that its forecast behaves in a completely different way, going flat almost immediately instead of decaying gradually.