Lesson 3 - ARMA and ARIMA: Combining and Differencing

Welcome to ARMA and ARIMA

You’ve built the two halves separately: AR (past values) in Lesson 1, MA (past shocks) in Lesson 2. This lesson assembles them. First, ARMA simply uses both kinds of terms at once. Then the crucial step: ARIMA adds the letter “I” — for integrated — which is nothing more exotic than the differencing you already did by hand in Module 3, now folded directly into the model. That single addition is what lets these models handle the trending, non-stationary series you’ve actually been working with all along.

By the end of this lesson, you will be able to:

  • Write the ARMA(p,q) model and explain when you’d want both term types
  • Explain what the “I” and the d parameter mean in ARIMA(p,d,q)
  • Connect ARIMA’s internal differencing to the manual differencing of Module 3
  • Fit an ARIMA directly to a raw, trending series with no manual differencing

Let’s combine the pieces.


ARMA: Both Kinds of Terms at Once

An ARMA(p, q) model has both p autoregressive terms and q moving-average terms:

y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p}   (the AR part)
          + e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q}   (the MA part)

Today’s value is a weighted sum of recent values and recent shocks. Many real series are best described by a mix — some persistence carried through past values (AR) and some short-term echo of recent surprises (MA). This is exactly the case Module 4 flagged as harder to read: when both the ACF and PACF tail off rather than one cutting off cleanly, that pattern points to a process with both AR and MA components, and an ARMA model is what captures it. Crucially, though, ARMA assumes the series is stationary — it has no mechanism for a trend. That’s the gap ARIMA fills.
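To make the ARMA equation concrete, here is a minimal NumPy sketch that builds y_t term by term from a given sequence of shocks. The helper arma_from_shocks is hypothetical (written for this illustration, not a statsmodels function), and it assumes values and shocks before the first time step are zero so the recursion has somewhere to start.

```python
import numpy as np


def arma_from_shocks(e, c=0.0, phi=(), theta=()):
    """Build y_t = c + sum(phi_i * y_{t-i}) + e_t + sum(theta_j * e_{t-j}).

    Hypothetical helper: values and shocks before t = 0 are assumed to be zero.
    """
    e = np.asarray(e, dtype=float)
    y = np.zeros_like(e)
    for t in range(len(e)):
        # AR part: weighted sum of the p most recent values of y
        ar_part = sum(p * y[t - i] for i, p in enumerate(phi, start=1) if t - i >= 0)
        # MA part: weighted sum of the q most recent shocks
        ma_part = sum(q * e[t - j] for j, q in enumerate(theta, start=1) if t - j >= 0)
        y[t] = c + ar_part + e[t] + ma_part
    return y


# A single unit shock at t = 0, then silence: the impulse response of ARMA(1, 1).
impulse = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
print(arma_from_shocks(impulse, phi=[0.5], theta=[0.3]))  # values: 1.0, 0.8, 0.4, 0.2, 0.1
```

The single shock shows how the two parts divide the work: the MA coefficient adds a one-step echo (0.8 = 0.5 + 0.3 at t = 1), and from then on the AR coefficient alone makes the effect shrink by half each step.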


The “I”: Integration Is Just Differencing

ARMA works on stationary data. But every series you’ve cared about — Cyclepath above all — has a trend, which breaks stationarity (Module 3 confirmed this formally with the ADF test). Module 3’s fix was differencing: model the change rather than the level. ARIMA(p, d, q) builds that fix directly into the model. The d is the order of integration — the number of times the series is differenced before the ARMA(p, q) part is applied:

  • d = 0: no differencing — ARIMA(p, 0, q) is just ARMA(p, q), for already-stationary data.
  • d = 1: difference once — removes a linear trend, the most common case.
  • d = 2: difference twice — removes a quadratic (curving) trend.
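A noise-free NumPy sketch (toy coefficients, not Cyclepath data) of why d = 1 suits a straight-line trend and d = 2 a curving one; np.diff with n=2 applies the difference twice:

```python
import numpy as np

t = np.arange(10)
linear = 5 + 3 * t                  # straight-line trend
quadratic = 5 + 3 * t + 2 * t**2    # curving (quadratic) trend

print(np.diff(linear))              # all 3s: one difference removes a linear trend
print(np.diff(quadratic))           # 4t + 5: still trending after one difference
print(np.diff(quadratic, n=2))      # all 4s: a second difference removes the curve
```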

“Integrated” is the technical term because differencing is the inverse of integration (summing) — the model differences to fit, then sums back up to forecast on the original scale. The practical takeaway is simpler: the d in ARIMA is exactly the differencing decision you made by hand in Module 3, now handed to the model as a parameter.
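The round trip described above can be sketched on a toy series: np.diff produces the changes, and a cumulative sum anchored at a known level restores the original scale. The numbers, including the forecasts of future changes, are made up for illustration; conceptually this is the bookkeeping ARIMA’s integration step does for you.

```python
import numpy as np

y = np.array([100.0, 104.0, 107.0, 113.0, 118.0])  # toy levels (illustrative only)

dy = np.diff(y)  # the changes: what the ARMA part would be fit on

# "Integrate": anchor at the first level and sum the changes back up.
y_back = y[0] + np.concatenate(([0.0], np.cumsum(dy)))
print(np.allclose(y_back, y))  # True: the original series is recovered

# Forecasts made on the differenced scale become levels the same way,
# starting from the last observed level.
diff_forecast = np.array([5.0, 4.0, 6.0])  # hypothetical forecasts of future changes
level_forecast = y[-1] + np.cumsum(diff_forecast)
print(level_forecast)  # levels 123, 127, 133
```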

You already chose d in Module 3

You spent all of Module 3 deciding how much to difference Cyclepath — comparing the raw series (fails ADF), d=1 (passes but leaves seasonality), and seasonal differencing. That work wasn’t separate from modeling; it was choosing the d (and, in Module 6, the seasonal D). When you write ARIMA(order=(p, 1, q)), the 1 is the first-differencing you validated back then. The ACF/PACF reading from Module 4 chose p and q; the stationarity work from Module 3 chose d. ARIMA just collects all three into one tuple.


Because the differencing is internal, you can hand ARIMA the raw, trending Cyclepath series and let the d parameter do the stationarizing — no manual .diff() required:

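import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def cyclepath():
    """Synthetic monthly Cyclepath trips: linear trend + 12-month cycle + noise."""
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")  # 8 years of month starts
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t                                     # linear upward trend
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)        # yearly cycle
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()

# d=1: ARIMA differences the raw series internally, so no manual .diff() is needed.
# trend="n" (no constant) matches statsmodels' default for models with d > 0.
res = ARIMA(y, order=(1, 1, 1), trend="n").fit()
print(res.nobs)                        # 96
print(round(res.arparams[0], 3))       # 0.663   <- ar.L1
print(round(res.maparams[0], 3))       # 0.081   <- ma.L1
print(round(res.aic, 2))               # 1557.27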

ARIMA(y, order=(1, 1, 1)) fit the raw 96-point trending series directly: it differenced once internally (d=1), then fit an ARMA(1, 1) to the result, giving an AR coefficient of 0.663 and an MA coefficient of 0.081. You never called .diff() — the model handled it. This is the payoff of the unified formulation: the stationarity work, the AR terms, and the MA terms all live in one order=(p, d, q) tuple, and the model manages the differencing and the eventual “integrating back” for forecasts on the original scale.

A labeled diagram of the notation ARIMA(p, d, q) with three arrows pointing to boxes. The p arrow points to a box reading 'AR: number of past values, chosen from PACF (Module 4)'. The d arrow points to a box reading 'I: number of differences, chosen for stationarity (Module 3)'. The q arrow points to a box reading 'MA: number of past shocks, chosen from ACF (Module 4)'. Below, a horizontal pipeline shows: raw series, then a differencing box labeled 'difference d times', then an ARMA(p,q) box labeled 'fit AR and MA on the stationary series', then a forecast box labeled 'integrate back to original scale'.
ARIMA(p, d, q) in one picture: d differences the series to stationarity (Module 3's job), then an ARMA(p, q) fits the AR and MA terms (Module 4's job), and forecasts are integrated back to the original scale. Every parameter traces to work you've already done.

A Note on Manual vs. Internal Differencing

You might wonder whether fitting ARIMA(y, order=(1,1,1)) on the raw series is exactly the same as differencing by hand and fitting ARIMA(y.diff().dropna(), order=(1,0,1)). Conceptually yes, and the fitted coefficients come out very close — but not always bit-for-bit identical, because the two approaches handle the first few observations and the constant term slightly differently under the hood. The constant is the easiest difference to trip over: in statsmodels, a d = 0 model includes a constant by default (on a differenced series, that constant acts as a drift in the level), while a model with d ≥ 1 has no constant by default, so pass trend="n" to the manual version for a like-for-like comparison. In practice, prefer letting ARIMA do the differencing (d in the order tuple): it keeps the forecasts automatically on the original scale (no manual “un-differencing” to reverse the .diff()), and it’s less error-prone. Manual differencing remains useful for diagnosis — as you did in Modules 3 and 4 to inspect stationarity and read ACF/PACF — but for the final model, fold it into d.


Practice Exercises

Exercise 1: Translate a description into an order

A series has a clear linear trend, and after differencing once, its ACF cuts off after lag 1 while its PACF tails off. What ARIMA order (p, d, q) does this suggest?

Hint

(0, 1, 1). The linear trend calls for d = 1 (difference once to remove it). After differencing, “ACF cuts off after lag 1, PACF tails off” is the MA(1) signature from Module 4 — so q = 1 and p = 0. Putting it together: ARIMA(0, 1, 1). This is one of the most common real-world specifications, and it is simple exponential smoothing in disguise: without a constant, ARIMA(0, 1, 1) produces the same one-step forecasts as simple exponential smoothing. It’s exactly the kind of reading Module 4 trained you to do — now expressed as an ARIMA order.
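The exponential-smoothing connection can be checked numerically. This sketch runs simple exponential smoothing with a hypothetical helper, ses_one_step (written here, not a library function), then confirms that its one-step forecast errors e_t satisfy the ARIMA(0, 1, 1) equation y_t - y_{t-1} = e_t + theta e_{t-1} with theta = alpha - 1, the MA term written with a plus sign as in this lesson’s equations. The random-walk input and alpha = 0.3 are arbitrary; the identity is algebraic, so any series and any starting level work.

```python
import numpy as np


def ses_one_step(y, alpha, level0):
    """Simple exponential smoothing: return one-step-ahead forecasts and their errors."""
    forecasts = np.empty(len(y))
    level = level0
    for t, obs in enumerate(y):
        forecasts[t] = level                       # forecast made before seeing y[t]
        level = alpha * obs + (1 - alpha) * level  # update the smoothed level
    return forecasts, y - forecasts


rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=50)) + 20.0  # any series works for this identity
alpha = 0.3
_, e = ses_one_step(y, alpha, level0=y[0])

# ARIMA(0, 1, 1) form: y_t - y_{t-1} = e_t + theta * e_{t-1}, with theta = alpha - 1
theta = alpha - 1
lhs = np.diff(y)
rhs = e[1:] + theta * e[:-1]
print(np.allclose(lhs, rhs))  # True
```

Because the smoothing weight alpha lies between 0 and 1, the matching theta lies between -1 and 0.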

Exercise 2: What does d=0 mean?

If you fit ARIMA(order=(2, 0, 1)), what are you assuming about the series, and what simpler name does this model have?

Hint

With d = 0, no differencing happens, so you’re assuming the series is already stationary — no trend to remove. ARIMA(2, 0, 1) is therefore just an ARMA(2, 1) model: two AR terms and one MA term on the raw series. You’d use d = 0 for a series that already passed the ADF test without differencing (or one you’d already differenced by hand and are now modeling). The ARIMA class simply treats ARMA as the special case of ARIMA where the integration order is zero.

Exercise 3: Why not always use a big d?

If differencing removes trends, why not just set d = 2 or d = 3 routinely to be safe?

Hint

Because over-differencing hurts, exactly as Module 3 showed with variance: each unnecessary difference amplifies noise and can introduce artificial structure (like a spurious negative lag-1 autocorrelation) that the ARMA part then has to waste terms explaining. A linear trend needs only d = 1; a curving (quadratic) trend needs d = 2; going beyond what the trend actually requires makes the model worse, not safer. The discipline is the same as Module 3’s: difference exactly as much as the stationarity evidence demands — usually d = 1 — and no more.
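A minimal NumPy sketch of that cost on pure white noise, a series that needs d = 0. In theory one unnecessary difference doubles the variance and creates a lag-1 autocorrelation of exactly -0.5; with a long simulated series the sample values land close to both. lag1_autocorr is a small helper written for this example.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D array."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)


rng = np.random.default_rng(1)
noise = rng.normal(0.0, 1.0, 200_000)  # already stationary: the right choice is d = 0
over = np.diff(noise)                  # one unnecessary difference

print(round(over.var() / noise.var(), 2))  # near 2: the noise variance roughly doubles
print(round(lag1_autocorr(noise), 2))      # near 0: the original has no lag-1 structure
print(round(lag1_autocorr(over), 2))       # near -0.5: spurious negative autocorrelation
```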


Summary

ARMA(p, q) combines autoregressive and moving-average terms in one model — today’s value as a weighted sum of both recent values and recent shocks — but assumes the series is already stationary. ARIMA(p, d, q) adds an integration order d: the number of times the series is differenced internally before the ARMA part is fit, which is exactly Module 3’s differencing folded into the model (d=1 for a linear trend, d=2 for a quadratic one, d=0 for an already-stationary series where ARIMA reduces to ARMA). Because the differencing is internal, ARIMA fits a raw trending series directly: ARIMA(y, order=(1,1,1)) fit the 96-point Cyclepath series with no manual .diff(), differencing once internally and returning AR 0.663, MA 0.081. Every parameter traces to prior work: d from Module 3’s stationarity analysis, p and q from Module 4’s ACF/PACF reading.

Key Concepts

  • ARMA(p, q) — AR and MA terms together, for stationary data.
  • The “I” / integration order d — the number of internal differences; identical to Module 3’s differencing.
  • d maps to trend shape — d=1 for linear, d=2 for quadratic, d=0 for already-stationary (pure ARMA).
  • One unified order tuple: (p, d, q) collects the stationarity decision and the AR/MA reading into a single specification that ARIMA fits directly on the raw series.

Why This Matters

ARIMA is the workhorse of classical forecasting precisely because it unifies everything you’ve built: stationarize (d), model persistence (p), model shocks (q) — one model, one order tuple, fit directly on raw data. Understanding that d is just Module 3’s differencing means you never treat ARIMA as a black box; you know exactly what each of its three numbers does and where it came from. Next, you’ll fit these models properly with statsmodels — reading the full model summary, interpreting the coefficient table, and producing forecasts with the honest confidence intervals that quantify how uncertain a forecast really is.


Next Steps

Continue to Lesson 4 - Fitting and Forecasting ARIMA with statsmodels

Read a full model summary, interpret the coefficient table, and produce forecasts with confidence intervals.

Back to Module Overview

Return to the AR, MA, ARMA, ARIMA module overview


Continue Building Your Skills

You’ve assembled the full ARIMA model — AR terms, MA terms, and the differencing d that handles trends — and fit it directly to a raw trending series. Next you’ll work with a fitted model in depth: reading its summary output, interpreting the coefficient and significance columns, and producing forecasts that come with honest uncertainty intervals rather than bare point predictions.