Lesson 3 - Exploring and Visualizing a Series
Welcome to Exploring and Visualizing a Series
You now have Cyclepath sitting in a pandas Series with a proper datetime index. The temptation is to jump straight to a model — but the most valuable thing you can do first costs one line of code and takes ten seconds: plot it. A line plot of value against time is the single most informative diagnostic in all of forecasting. It instantly reveals whether the series trends, whether it has seasonality, where the outliers are, whether the level suddenly shifts, and whether the variance grows over time — the exact things summary statistics smooth away and hide. Every technique in the rest of this course is a response to something you can see in the plot. So before we decompose, difference, or fit anything, this lesson teaches you to look.
By the end of this lesson, you will be able to:
- Explain why the line plot is the first and most important diagnostic for any series
- Read trend, seasonality, outliers, and variance changes directly off a plot
- Overlay a rolling mean to isolate the trend from the seasonal wiggles
- Use a by-month view to confirm seasonality and read off its period
Let’s start with why a summary table isn’t enough.
Summary Stats Orient You — But Can’t Show Shape
We’ll use the same seeded Cyclepath generator throughout, so every number matches:
import numpy as np
import pandas as pd

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)  # fixed seed so every number matches
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()
The natural first move is y.describe(), and it is worth doing. It tells you the series runs for 96 months, that ridership ranges from a minimum of 5,907 trips up to a maximum of 20,533, and that the average month sees about 13,267 trips:
y.describe()
# count 96
# min 5907
# mean ~13267
# max 20533
# ...
That orients you, giving you the rough scale and spread. But notice what it can’t tell you: whether the low months come early or late, whether the highs cluster in summer, whether the series is climbing or flat. Summary statistics have no notion of order; they’d report exactly the same min, mean, and max if you shuffled all 96 months at random. Series with an identical mean can look completely different over time: one flat and noisy, one climbing steadily, one collapsing then recovering. The mean can’t distinguish them. Shape lives in the sequence, and the only way to see the sequence is to plot it.
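To make the "no notion of order" point concrete, here is a minimal sketch that shuffles the Cyclepath values and compares the result with the original. The shuffle seed and the first-half versus second-half comparison are illustrative choices, not part of the lesson's dataset.

```python
import numpy as np
import pandas as pd

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()

# Shuffle the values but keep the same monthly index (seed 0 is an arbitrary choice).
shuffle_rng = np.random.default_rng(0)
y_shuffled = pd.Series(shuffle_rng.permutation(y.to_numpy()), index=y.index, name="trips")

# Order-free statistics (count, mean, std, min, quartiles, max) survive the shuffle.
same_stats = bool(np.allclose(y.describe().to_numpy(), y_shuffled.describe().to_numpy()))
print("describe() identical after shuffling:", same_stats)

# An order-dependent question: how much higher is the second half than the first?
def half_gap(s):
    half = len(s) // 2
    return float(s.iloc[half:].mean() - s.iloc[:half].mean())

orig_gap = half_gap(y)
shuffled_gap = half_gap(y_shuffled)
print(f"second half minus first half, original: {orig_gap:.0f} trips")
print(f"second half minus first half, shuffled: {shuffled_gap:.0f} trips")
```

The summary table cannot tell the two series apart, while even a crude order-aware comparison shows the climb in the original and very little in the shuffled copy.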
The Line Plot: Reading Trend and Seasonality
So plot it. In pandas this is genuinely one line, because the datetime index becomes the x-axis automatically:
import matplotlib.pyplot as plt
y.plot(figsize=(10,4), title="Cyclepath monthly trips")
plt.ylabel("trips")
plt.show()
Since this is a written lesson we can’t render the image inline, so here is precisely what that plot shows and how to read each feature. This is the vocabulary you’ll apply to every series you meet:
- The overall slope is the trend. The wiggling line climbs steadily from around 6,000 trips in early 2016 up to roughly 20,000 by 2023 — a clear, roughly linear upward drift over the eight years. An upward slope like this means the series is not stationary (Module 3’s word for it); the mean level keeps rising.
- The regular repeating peaks and troughs are seasonality. On top of that climb, the line rises and falls in a smooth wave that repeats every 12 months — a summer high and a winter low, year after year. The distance from one summer peak to the next is the period (12 months here). Once you see a clean repeating cycle, you know a seasonal model is in your future.
- The thickness of the wiggle is the noise. The small jitter around the smooth wave is the random component, the noise term in the generator. Its spread looks roughly constant here, which is worth checking: a fan-shaped series (wiggles growing bigger over time) signals changing variance you’d need to stabilize.
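What the eye reads off the plot can be roughly cross-checked with numbers. The sketch below fits a straight line to estimate the slope and measures the swing left after removing it. Using np.polyfit and a half-range for the swing are illustrative choices, not the lesson's method, and the swing figure includes noise as well as seasonality.

```python
import numpy as np
import pandas as pd

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()
t = np.arange(len(y))

# Trend: slope of a least-squares straight line through the series.
slope, intercept = np.polyfit(t, y.to_numpy(), deg=1)
print(f"fitted slope: {slope:.1f} trips per month")

# Seasonality plus noise: what is left after subtracting the fitted line.
detrended = y - (intercept + slope * t)
swing = float((detrended.max() - detrended.min()) / 2)
print(f"half-range of the detrended series: {swing:.0f} trips")
```

The generator builds the series with a slope of 90 trips per month and a seasonal amplitude of 3,200, so estimates in that neighbourhood are what you should expect to see.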
The one habit: always plot first
Make plotting the first thing you do with any new series, before any model and before any transformation. Run describe() alongside it, but never instead of it. A single line plot catches problems a summary table silently hides: a sudden level shift when a sensor was recalibrated, a run of frozen values from a logging bug, an outlier spike from a data-entry error, or variance that fans out over time. Each of these would quietly wreck a model, and each is obvious the instant you look. Plotting first isn’t a nicety; it’s the cheapest bug-catcher in forecasting.
Overlay a Rolling Mean to Isolate the Trend
The line plot shows trend and seasonality tangled together. To see the trend on its own, overlay a rolling mean — a moving average over a full seasonal cycle. Averaging over exactly 12 months cancels the summer-up/winter-down swings and leaves the slow climb underneath:
ax = y.plot(figsize=(10,4), label="trips", title="Cyclepath with 12-month rolling mean")
y.rolling(12).mean().plot(ax=ax, label="rolling mean (12)")
ax.legend()
ax.set_ylabel("trips")
plt.show()
The smooth line that results is the trend, and it matches what you measured back in Lesson 2: it rises from about 9,446 trips at the start to about 16,958 by the end. Because the window is a full year, the seasonal wave averages out and you’re left with the underlying direction. (The first 11 months are NaN, because a 12-month window needs 12 points before it can produce a value.)
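The NaN behaviour is easy to confirm without a plot. This sketch counts the missing values, finds the first month with a full window, and shows pandas' min_periods option as one way to fill the early months. Whether partial-window averages are acceptable depends on your use, so treat that last part as an option rather than a recommendation.

```python
import numpy as np
import pandas as pd

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()

# Trailing 12-month mean: each value averages the current month and the 11 before it.
trend_est = y.rolling(12).mean()

n_missing = int(trend_est.isna().sum())
first_valid = trend_est.first_valid_index()
print("months without a full window:", n_missing)
print("first month with a value:", first_valid.date())

# First and last smoothed values, i.e. the two ends of the trend line.
print(trend_est.dropna().iloc[[0, -1]].round(0))

# min_periods=1 averages whatever is available, so no NaN remains,
# but the early values cover less than a full cycle and still carry seasonality.
partial = y.rolling(12, min_periods=1).mean()
print("NaN count with min_periods=1:", int(partial.isna().sum()))
```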
You can also overlay a rolling standard deviation to check whether the spread is stable over time:
y.rolling(12).std().plot(figsize=(10,4), title="12-month rolling std")
plt.show()
If that line stays roughly flat, the variance is stable; if it trends up, the swings are growing over time and you’ll eventually need to stabilize them (often with a log transform). Because a 12-month window spans a whole cycle, this line mostly measures the size of the seasonal swing plus the noise around it. This is your first taste of a stationarity check, the question of whether a series’ statistical properties stay constant over time, which we’ll make formal in Module 3. For now, just knowing to look at whether the mean and spread drift is enough.
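Cyclepath’s spread is roughly constant, so it can’t show what a fan shape looks like in this check. The sketch below uses a made-up, noise-free series whose swings grow in proportion to its level (the series and its parameters are assumptions for illustration only) and compares the rolling standard deviation before and after a log transform.

```python
import numpy as np
import pandas as pd

# Hypothetical fan-shaped series: the level grows 2% per month and the
# seasonal swing is a fixed 30% of the level, so the swings widen over time.
idx = pd.date_range("2016-01-01", periods=96, freq="MS")
t = np.arange(96)
level = 1000 * np.exp(0.02 * t)
fan = pd.Series(level * (1 + 0.3 * np.sin(2 * np.pi * t / 12)), index=idx, name="fan")

raw_std = fan.rolling(12).std().dropna()
log_std = np.log(fan).rolling(12).std().dropna()

# Compare the last 12-month window with the first one.
raw_ratio = float(raw_std.iloc[-1] / raw_std.iloc[0])
log_ratio = float(log_std.iloc[-1] / log_std.iloc[0])
print(f"rolling std, last window / first window (raw): {raw_ratio:.2f}")
print(f"rolling std, last window / first window (log): {log_ratio:.2f}")
```

On the raw scale the spread grows severalfold, while on the log scale it stays level. That is why a log transform is the usual first thing to try when swings scale with the level.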
A Seasonal View: Confirming the Period
The line plot suggests a 12-month cycle; a by-month view confirms it and shows the average shape of the year. Group every observation by its calendar month and average:
y.groupby(y.index.month).mean()
This collapses all eight years into twelve numbers (the typical January, the typical February, and so on). Plotted, it traces a clean arc: lowest in the winter months, highest in the summer months, a single smooth hump across the year. That single-peak shape confirms two things at once: seasonality is real (the months differ systematically, not randomly), and its period is 12, one full cycle per year. That number isn’t trivia; it’s the exact value you’ll hand to seasonal models later (the m=12 in SARIMA), so reading it correctly off this view matters. If the arc had shown two identical humps a year, the period would be 6, and everything downstream would change.
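Since the by-month view is just twelve numbers, you can also read the peak and trough off it directly. A minimal sketch (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

def cyclepath():
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()

# Average of every January, every February, ... across the eight years.
by_month = y.groupby(y.index.month).mean()
print(by_month.round(0))

# Calendar month numbers (1 = January) of the highest and lowest averages.
peak_month = int(by_month.idxmax())
trough_month = int(by_month.idxmin())
print("peak month:", peak_month, "| trough month:", trough_month)
```

One peak and one trough, about half a year apart, is the numeric signature of a single yearly cycle.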
Practice Exercises
Exercise 1: Why isn’t the mean enough?
A teammate reports that Cyclepath “averages about 13,000 trips a month” and wants to move straight to modeling. Why do you insist on plotting the series first?
Hint
Because a mean has no notion of order — it would report the same ~13,267 whether the series is flat, climbing, or collapsing and recovering. The plot reveals that Cyclepath actually climbs from ~6,000 to ~20,000 with a strong yearly wave, plus any outliers or variance changes. That shape — trend and seasonality — is the entire raw material of the forecast, and none of it appears in a summary statistic. Plotting first is the cheapest way to see what you’re actually modeling and to catch data problems before they wreck a model.
Exercise 2: Read the plot
Looking at Cyclepath’s line plot, you see the line climbing from ~6k to ~20k and rising and falling in a smooth wave that repeats every 12 months. Name each feature and say what it implies for modeling.
Hint
The overall upward slope is the trend — the series is non-stationary and its mean level keeps rising, so you’ll need to remove or model that drift. The repeating 12-month wave is seasonality with a period of 12, which tells you to reach for a seasonal model and hand it m=12. The small jitter around the wave is noise; because its spread looks roughly constant, the variance is stable and you don’t need a log transform yet. Every one of those modeling decisions comes straight from reading the picture.
Exercise 3: Why average over exactly 12 months?
When you overlay a rolling mean to see the trend, why choose a 12-month window specifically rather than, say, 6 or 20?
Hint
Because the seasonal period is 12. Averaging over exactly one full cycle makes the summer highs and winter lows cancel out, leaving only the slow trend, here rising from ~9,446 to ~16,958. A window of 6 (half a cycle) wouldn’t cancel the season, so the smoothed line would still wobble; a window of 20 covers one and two-thirds cycles, so the leftover partial cycle leaves a smaller wobble and the longer window reacts more slowly to changes in the trend. Match the rolling window to the period (or a whole multiple of it), and the wiggles vanish while the trend survives.
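You can check this reasoning on the seasonal component alone. The sketch below takes the noise-free sine term from the Cyclepath generator and measures how much of it survives each window; the dictionary name is illustrative.

```python
import numpy as np
import pandas as pd

# Seasonal component only, exactly as defined in the generator (no trend, no noise).
t = np.arange(96)
seasonal = pd.Series(3200 * np.sin(2 * np.pi * (t - 3) / 12))

leftover = {}
for window in (6, 12, 20):
    smoothed = seasonal.rolling(window).mean().dropna()
    # Largest absolute value that remains after smoothing with this window.
    leftover[window] = float(smoothed.abs().max())
    print(f"window={window:>2}: largest leftover seasonal swing = {leftover[window]:.1f} trips")
```

Only the 12-month window removes the wave completely. The 6-month window leaves most of it in place, and the 20-month window leaves a smaller but still visible wobble.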
Summary
Before modeling a time series, you look at it. Summary statistics orient you — Cyclepath runs 96 months, ranges from 5,907 to 20,533 trips, and averages about 13,267 — but they have no notion of order and can’t show shape; two series with the same mean can look nothing alike. The line plot is the fix, and the first habit of forecasting: it reveals trend (Cyclepath’s climb from ~6k to ~20k), seasonality (the smooth 12-month wave), noise, outliers, and variance changes, all at a glance. Overlaying a 12-month rolling mean isolates the trend (rising ~9,446 → ~16,958) by cancelling the seasonal swings, and a rolling standard deviation checks whether variance is stable — a first look at stationarity. A by-month view confirms seasonality and reads off its period of 12, the exact value seasonal models will need. Plot first, always.
Key Concepts
- Plot first — the line plot is the cheapest, most informative diagnostic; it shows shape that summary stats can’t.
- Trend, seasonality, noise — the slope, the repeating cycle, and the jitter, all readable directly off the plot.
- Rolling mean — averaging over one full period (12 months) isolates the trend by cancelling the seasonal swings.
- Seasonal period — a by-month view confirms seasonality and gives the period (12) you’ll feed to seasonal models.
Why This Matters
Almost every forecasting decision you’ll make downstream — whether to difference, whether to log-transform, which seasonal period to set, where to suspect a broken sensor — is a response to something you can see in the plot but not in a table. Analysts who skip the plot routinely model outliers as signal, miss a level shift, or feed the wrong period to a seasonal model, and only discover it when the forecast fails. Building the plot-first habit now means every technique in the rest of the course starts from evidence rather than assumption. Next, you’ll turn that understanding into a proper experiment: splitting the series chronologically and building the naive baselines every real model has to beat.
Next Steps
Continue to Lesson 4 - Splitting Time Series and Baselines
Split a series chronologically without leaking the future, then build the naive and seasonal-naive baselines that every real forecast has to beat.
Continue Building Your Skills
You can now read a time series the way a forecaster does — plotting it first, reading trend and seasonality straight off the line, isolating the trend with a rolling mean, and confirming the seasonal period with a by-month view. Next you’ll put that reading to work in a real experiment: splitting Cyclepath by time so you never leak the future, and building the naive baselines that keep every fancier model honest.