Lesson 2 - The Moving Average (MA) Model
Welcome to the Moving Average (MA) Model
The autoregressive model in Lesson 1 built today out of past values. The moving-average model builds today out of something you can’t directly observe: past shocks — the random surprises that hit the series at each step. This is a genuinely different idea, and it’s worth being careful with the name: a “moving average” model has nothing to do with the rolling-mean smoothing you did back in Module 1. It’s a model where today’s value is a weighted blend of the most recent unpredictable jolts.
By the end of this lesson, you will be able to:
- Write the MA(q) equation and explain what a “shock” is
- Fit an MA(1) model with statsmodels and recover a known coefficient
- Distinguish the MA model from the rolling-average smoothing of Module 1
- Describe the flat-after-q-steps forecast behavior and contrast it with AR
Let’s write the model down.
The MA(q) Equation
A moving-average model of order q, written MA(q), predicts the current value from the last q random shocks:
y_t = c + e_t + theta_1 * e_{t-1} + theta_2 * e_{t-2} + ... + theta_q * e_{t-q}
Here e_t is the current shock, e_{t-1} through e_{t-q} are the previous q shocks, each theta_i weights how much the shock from i steps back still echoes into today, and c is the constant mean level. A shock is the part of a past value that wasn’t predictable: the residual surprise at that step. The simplest version is MA(1), y_t = c + e_t + theta_1 * e_{t-1}: today is the current shock plus an echo of yesterday’s shock. This is the process behind Module 4’s MA(1) signature; now you’ll fit it.
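To make the MA(1) equation concrete, here is a minimal sketch that builds a series directly from simulated shocks and checks its autocorrelation. It uses only the standard library; the helper names `simulate_ma1` and `autocorr` are illustrative, not part of statsmodels. For an MA(1), the theoretical lag-1 autocorrelation is theta_1 / (1 + theta_1^2) and every higher lag is zero, so the sample values should land near those targets.

```python
import random


def simulate_ma1(n, theta, c=0.0, seed=0):
    """Generate n values of y_t = c + e_t + theta * e_{t-1} from N(0, 1) shocks."""
    rng = random.Random(seed)
    # One extra shock so the first value has an e_{t-1} to echo.
    shocks = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [c + shocks[t] + theta * shocks[t - 1] for t in range(1, n + 1)]


def autocorr(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


y = simulate_ma1(n=20000, theta=0.7)
r1 = autocorr(y, 1)
r2 = autocorr(y, 2)
theory_r1 = 0.7 / (1 + 0.7 ** 2)  # theoretical lag-1 autocorrelation, about 0.47

print("theory lag 1:", round(theory_r1, 3))
print("sample lag 1:", round(r1, 3))
print("sample lag 2:", round(r2, 3))
```

The lag-2 value sitting near zero is the one-lag cutoff in the ACF: a shock echoes for exactly one step and then is gone.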
Not the same as a rolling average
The “moving average” in MA(q) is an unfortunate name collision. Module 1’s rolling mean (y.rolling(12).mean()) averages past observed values to smooth a series for visualization. The MA(q) model is a forecasting model where today depends on past unobserved shocks — the random errors, not the values themselves. They share two words and nothing else. When this course says “MA model” it always means the shock-based forecasting model, never the smoothing operation.
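The difference is easiest to see in the inputs. In the sketch below (plain Python, with hypothetical helper names `rolling_mean` and `ma1_from_shocks`), the rolling mean takes observed values and averages them, while the MA(1) recipe takes shocks and produces values. The numbers are toy inputs chosen for easy arithmetic.

```python
def rolling_mean(values, window):
    """Smoother: average the last `window` OBSERVED values at each position."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def ma1_from_shocks(shocks, theta, c=0.0):
    """Model: build each value from the current SHOCK plus an echo of the previous one."""
    return [c + shocks[t] + theta * shocks[t - 1] for t in range(1, len(shocks))]


observed = [1.0, 2.0, 3.0, 4.0]
smoothed = rolling_mean(observed, window=2)
print(smoothed)  # [1.5, 2.5, 3.5]

shocks = [1.0, -1.0, 2.0]
generated = ma1_from_shocks(shocks, theta=0.5)
print(generated)  # [-0.5, 1.5]
```

The first function never sees a shock and the second never averages an observed value, which is the whole distinction.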
Fitting an MA(1) and Recovering the Coefficient
Same discipline as Lesson 1: build a synthetic MA(1) with a known theta = 0.7, fit an MA(1), and check the recovery:
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA
rng = np.random.default_rng(11)
ma1 = ArmaProcess(np.array([1]), np.array([1, 0.7])).generate_sample(
    nsample=2000, distrvs=lambda size: rng.normal(0, 1, size)
)
res = ARIMA(ma1, order=(0, 0, 1), trend="n").fit()
print(round(res.params[0], 3)) # 0.716 <- the ma.L1 coefficient
print(round(res.aic, 2)) # 5685.87
The fitted ma.L1 coefficient is 0.716, close to the true 0.7 and within sampling noise. The order (0, 0, 1) means zero AR terms, zero differences, one MA term: a pure MA(1). The fitting machinery is identical to the AR case; only the order tuple changed, which is the whole convenience of the unified ARIMA class.
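If it seems mysterious that a coefficient on unobserved shocks can be estimated at all, the following sketch may help. It is not what statsmodels does internally (statsmodels uses maximum likelihood); it swaps in a simpler technique, a conditional-sum-of-squares grid search, written with the standard library only. For a candidate theta, the shocks are rebuilt recursively as e_t = y_t - theta * e_{t-1}, assuming the pre-sample shock is zero, and the theta with the smallest sum of squared shocks wins. The function names are illustrative.

```python
import random


def simulate_ma1(n, theta, seed=0):
    """Generate n values of y_t = e_t + theta * e_{t-1} from N(0, 1) shocks."""
    rng = random.Random(seed)
    shocks = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [shocks[t] + theta * shocks[t - 1] for t in range(1, n + 1)]


def conditional_sum_of_squares(y, theta):
    """Rebuild shocks with e_t = y_t - theta * e_{t-1} (pre-sample shock assumed 0)."""
    prev_shock = 0.0
    total = 0.0
    for value in y:
        shock = value - theta * prev_shock
        total += shock * shock
        prev_shock = shock
    return total


def fit_ma1_css(y):
    """Grid-search theta over (-1, 1) in steps of 0.01; an assumption for simplicity."""
    grid = [i / 100 for i in range(-99, 100)]
    return min(grid, key=lambda theta: conditional_sum_of_squares(y, theta))


y = simulate_ma1(n=2000, theta=0.7)
theta_hat = fit_ma1_css(y)
print(theta_hat)  # expected to land near the true 0.7
```

The wrong theta leaves the rebuilt shocks larger than they need to be, so minimizing their squared size pulls the estimate toward the true coefficient.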
Fitting the Wrong Model
A useful check: what if you fit an AR(1) to data that’s genuinely MA(1)? The AR model is the wrong shape for this data, and AIC should say so loudly:
res_wrong = ARIMA(ma1, order=(1, 0, 0), trend="n").fit()
print(round(res_wrong.aic, 2)) # 6011.41
Fitting an AR(1) to MA(1) data gives an AIC of 6011.41, dramatically worse than the correct MA(1)’s 5685.87: a gap of over 320 points. This is the same lesson as Module 4’s AR/MA signatures, now from the fitting side: the two model types are genuinely different, and using the wrong one leaves real structure unexplained. AIC catches it. (In practice you’d have chosen MA over AR before fitting, by reading the ACF/PACF signatures from Module 4, but it’s reassuring that the fit-quality score agrees.)
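A quick bit of arithmetic shows what that gap means. AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. Both fits here estimate the same number of parameters (one coefficient plus the noise variance, since trend="n"), so the penalty terms cancel and the whole AIC gap comes from the log-likelihood. The sketch below uses the two AIC values reported above; the `aic` helper is illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


aic_ma1 = 5685.87  # correct model, from the fit above
aic_ar1 = 6011.41  # wrong model, from the fit above

gap = aic_ar1 - aic_ma1
# With equal parameter counts, the AIC gap is twice the log-likelihood gap.
implied_loglik_gap = gap / 2

print(round(gap, 2))                 # 325.54
print(round(implied_loglik_gap, 2))  # 162.77
```

So the MA(1) fit is better by roughly 163 log-likelihood units on the same data, with no extra parameters spent to get there.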
The Signature Behavior: Forecasts Go Flat After q Steps
Here’s where MA differs most sharply from AR. Build an MA(1) centered on a mean of 50, fit it, and forecast six steps:
rng = np.random.default_rng(11) # reset seed so this block is reproducible on its own
ma1b = ArmaProcess(np.array([1]), np.array([1, 0.7])).generate_sample(
    nsample=300, distrvs=lambda size: rng.normal(0, 1, size)
) + 50
res_b = ARIMA(ma1b, order=(0, 0, 1), trend="c").fit()
print(np.round(res_b.forecast(steps=6), 2))
# [49.77 50.04 50.04 50.04 50.04 50.04]
The first forecast (49.77) leans slightly off the mean, because the model still has an estimate of the most recent shock and can use its echo. But from the second step onward, the forecast is completely flat at the estimated mean (50.04). It doesn’t decay gradually like AR; it drops to the mean immediately. The reason is structural: an MA(1) forecast can only use shocks it can estimate from the data, and the data run only up through the last observation. One step ahead, it still has the final in-sample shock to work with; two or more steps ahead, every relevant shock is in the unknown future, and since future shocks average zero, the best guess is simply the mean. An MA(q) forecast is informative for exactly q steps, then flat forever after.
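The flat-after-one-step behavior falls straight out of the MA(1) equation once the unknown future shocks are replaced by their expected value of zero. A minimal sketch, using hypothetical values for the mean, coefficient, and last shock (not the fitted values from the model above):

```python
def ma1_forecast(c, theta, last_shock, steps):
    """MA(1) point forecasts.

    Step 1: c + theta * last_shock (the last estimated shock is still usable).
    Step 2 onward: c (every shock involved is in the future, expected value 0).
    """
    return [c + theta * last_shock if h == 1 else c for h in range(1, steps + 1)]


# Hypothetical inputs chosen for easy arithmetic.
path = ma1_forecast(c=50.0, theta=0.7, last_shock=-0.4, steps=6)
print([round(v, 2) for v in path])  # [49.72, 50.0, 50.0, 50.0, 50.0, 50.0]
```

A negative last shock pulls the first forecast below the mean, which is the same pattern as the 49.77 followed by 50.04 in the fitted output.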
Practice Exercises
Exercise 1: How many informative steps?
You fit an MA(3) model. For how many steps ahead will its forecast differ from the series mean, and what happens after that?
Hint
An MA(3) forecast is informative for exactly 3 steps — because it depends on the last 3 shocks, and at forecast steps 1, 2, and 3 it still has one or more observed shocks to work with. From step 4 onward, every shock in the model’s window is in the unobserved future, so the forecast collapses to the mean and stays flat. In general, an MA(q) forecast differs from the mean for q steps and is flat thereafter — which makes pure MA models poorly suited to long-horizon forecasting of anything but short-memory series.
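The same reasoning can be checked for any q. The sketch below generalizes the forecast rule: at horizon h, only the terms theta_j * e_{T+h-j} with j >= h involve shocks that have already happened. The coefficients and shocks are hypothetical, and `ma_forecast` is an illustrative helper, not a statsmodels function.

```python
def ma_forecast(c, thetas, recent_shocks, steps):
    """MA(q) point forecasts.

    thetas:        [theta_1, ..., theta_q]
    recent_shocks: [e_T, e_{T-1}, ..., e_{T-q+1}], most recent first
    """
    q = len(thetas)
    path = []
    for h in range(1, steps + 1):
        # e_{T+h-j} is already known only when j >= h; it sits at index j - h.
        known = sum(thetas[j - 1] * recent_shocks[j - h] for j in range(h, q + 1))
        path.append(c + known)
    return path


c = 10.0
path = ma_forecast(c, thetas=[0.5, 0.3, 0.2], recent_shocks=[1.0, -2.0, 1.5], steps=5)
informative = [h for h, value in enumerate(path, start=1) if abs(value - c) > 1e-12]

print([round(v, 2) for v in path])  # [10.2, 9.9, 10.2, 10.0, 10.0]
print(informative)                  # [1, 2, 3]
```

Each step forward drops one more known shock from the sum, so by step 4 the sum is empty and only the mean remains.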
Exercise 2: AR or MA for a long forecast?
You need to forecast 24 steps into the future and want the forecast to reflect the series’ recent trajectory for as long as possible. Between a pure AR and a pure MA model (setting seasonality aside), which structure gives you more informative long-horizon forecasts?
Hint
The AR model gives more informative long-horizon forecasts. An AR forecast decays gradually toward the mean, so even 24 steps out it still carries some faint influence from the last observed value — the memory fades but never abruptly vanishes. An MA(q) forecast, by contrast, goes completely flat after just q steps, so unless q is very large, a 24-step MA forecast is just the flat mean for most of its horizon. This is one practical reason AR terms (and their seasonal cousins in SARIMA) tend to do the heavy lifting for extended forecasts.
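A side-by-side sketch makes the contrast visible over the full 24 steps. It compares the standard AR(1) point forecast, mean + phi^h * (last value - mean), with the MA(1) forecast, using hypothetical parameter values for both.

```python
def ar1_forecast(mean, phi, last_value, steps):
    """AR(1): the gap from the mean shrinks by a factor of phi at every step."""
    return [mean + phi ** h * (last_value - mean) for h in range(1, steps + 1)]


def ma1_forecast(mean, theta, last_shock, steps):
    """MA(1): one informative step, then the mean."""
    return [mean + theta * last_shock if h == 1 else mean for h in range(1, steps + 1)]


# Hypothetical values: both series have mean 50 and start off above it.
ar_path = ar1_forecast(mean=50.0, phi=0.9, last_value=55.0, steps=24)
ma_path = ma1_forecast(mean=50.0, theta=0.7, last_shock=2.0, steps=24)

ar_gap_24 = ar_path[-1] - 50.0  # 5 * 0.9**24, roughly 0.4
ma_gap_24 = ma_path[-1] - 50.0  # exactly 0

print(round(ar_gap_24, 3), ma_gap_24)
```

With phi = 0.9 the AR forecast is still visibly off the mean at step 24, while the MA forecast has been sitting on the mean since step 2. A smaller phi would fade faster, but never in a single step.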
Exercise 3: The name trap
A colleague says “let’s just use a moving average model” and starts computing y.rolling(6).mean(). What’s the misunderstanding?
Hint
They’ve confused the MA(q) forecasting model with a rolling-average smoother. y.rolling(6).mean() averages the last 6 observed values to smooth the series; it’s the Module 1 visualization tool. The MA(q) model fits weights on the last q unobserved shocks (random errors) and is estimated by maximum likelihood, not computed by averaging values. They share the words “moving average” and nothing else. If the colleague actually wants a moving-average forecasting model, they’d fit ARIMA(y, order=(0, 0, q)).fit(), not compute a rolling mean.
Summary
The moving-average model MA(q) writes today’s value as a constant plus weighted contributions from the last q random shocks: y_t = c + e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q}. Fit with order=(0, 0, q), an MA(1) recovered a known coefficient of 0.7 as 0.716 from 2,000 synthetic points. Fitting the wrong model — an AR(1) on MA(1) data — produced a much worse AIC (6011.41 vs. 5685.87), confirming from the fitting side that AR and MA are genuinely distinct. The signature MA forecast behavior is flat after q steps: an MA(1) forecast leaned slightly off the mean at step 1 (49.77) using the last observed shock, then went completely flat at the mean (50.04) from step 2 on — because beyond q steps there are no observed shocks left to use. This is the opposite of AR’s gradual mean reversion, and the two behaviors are why the model types suit different data.
Key Concepts
- MA(q) equation — today as a constant plus weighted recent shocks (unobserved random errors).
- Shock, not value — MA depends on past prediction errors, not past observed values; this is what distinguishes it from AR.
- Not a rolling average — the MA model is unrelated to rolling().mean() smoothing despite the shared name.
- Flat-after-q forecasts — an MA(q) forecast is informative for exactly q steps, then equals the mean forever.
Why This Matters
Understanding both model types — and especially their opposite forecasting behaviors — is what makes reading Module 4’s ACF/PACF signatures actionable: once you’ve identified a process as AR or MA, you know not just which model to fit but how its forecast will behave. That intuition carries directly into the combined and seasonal models ahead, where AR-like and MA-like terms coexist. Next, you’ll combine AR and MA into ARMA, then add the differencing step that turns ARMA into the full ARIMA — the model you’ll actually fit to Cyclepath.
Next Steps
Continue to Lesson 3 - ARMA and ARIMA: Combining and Differencing
Combine AR and MA into ARMA, then add the integration (differencing) step that makes it ARIMA and lets it handle trends.
Continue Building Your Skills
You’ve now built both halves of the classical model — the autoregressive part that depends on past values, and the moving-average part that depends on past shocks — and seen how differently they forecast. Next you’ll combine them into ARMA and add the differencing step from Module 3 directly into the model, producing the full ARIMA that can handle the trending series you’ve actually been working with.