Lesson 3 - Exploring and Visualizing a Series

Welcome to Exploring and Visualizing a Series

You now have Cyclepath sitting in a pandas Series with a proper datetime index. The temptation is to jump straight to a model — but the most valuable thing you can do first costs one line of code and takes ten seconds: plot it. A line plot of value against time is the single most informative diagnostic in all of forecasting. It instantly reveals whether the series trends, whether it has seasonality, where the outliers are, whether the level suddenly shifts, and whether the variance grows over time — the exact things summary statistics smooth away and hide. Every technique in the rest of this course is a response to something you can see in the plot. So before we decompose, difference, or fit anything, this lesson teaches you to look.

By the end of this lesson, you will be able to:

  • Explain why the line plot is the first and most important diagnostic for any series
  • Read trend, seasonality, outliers, and variance changes directly off a plot
  • Overlay a rolling mean to isolate the trend from the seasonal wiggles
  • Use a by-month view to confirm seasonality and read off its period

Let’s start with why a summary table isn’t enough.


Summary Stats Orient You — But Can’t Show Shape

We’ll use the same seeded Cyclepath generator throughout, so every number matches:

import numpy as np
import pandas as pd

def cyclepath():
    # 96 month-start timestamps: 2016-01 through 2023-12
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)  # fixed seed so every number matches
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)  # trough in January, peak in July
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()

A tempting first move is y.describe(), and it is worth running. It tells you the series runs for 96 months, that ridership ranges from a minimum of 5,907 trips up to a maximum of 20,533, and that the average month sees about 13,267 trips:

y.describe()
# count       96
# min       5907
# mean     ~13267
# max      20533
# ...

That orients you — you now know the rough scale and spread. But notice what it can’t tell you: whether those 13,267-trip averages come early or late, whether the highs cluster in summer, whether the series is climbing or flat. Summary statistics have no notion of order — they’d report the exact same min, mean, and max if you shuffled all 96 months into random chaos. Two series with an identical mean can look completely different over time: one flat and noisy, one climbing steadily, one collapsing then recovering. The mean can’t distinguish them. Shape lives in the sequence, and the only way to see the sequence is to plot it.
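
The shuffle claim is easy to check directly. The sketch below repeats the lesson's generator so it runs on its own, shuffles the 96 values with an arbitrary seed (an assumption, any seed would do), and compares an order-free summary with a simple order-aware one. The helper first_to_last_gap is a hypothetical name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

def cyclepath():
    # Same seeded generator as the lesson, repeated so this snippet is self-contained.
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    values = 9000 + 90 * t + 3200 * np.sin(2 * np.pi * (t - 3) / 12) + rng.normal(0, 350, 96)
    return pd.Series(np.round(values).astype(int), index=idx, name="trips")

y = cyclepath()

# Shuffle the values but keep the same calendar index (seed 0 is arbitrary).
shuffle_rng = np.random.default_rng(0)
shuffled = pd.Series(shuffle_rng.permutation(y.to_numpy()), index=y.index, name="trips")

# Order-free statistics cannot tell the two series apart.
same_summary = np.allclose(y.describe().to_numpy(), shuffled.describe().to_numpy())
print("describe() matches after shuffling:", same_summary)

def first_to_last_gap(s):
    # Order-aware check: average of the last 12 months minus average of the first 12.
    return s.iloc[-12:].mean() - s.iloc[:12].mean()

print("last year minus first year, original:", round(first_to_last_gap(y)))
print("last year minus first year, shuffled:", round(first_to_last_gap(shuffled)))
```

The summary table is identical for both series, while the year-over-year gap is large only for the original ordering. That gap is the trend, and it exists only in the sequence.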


The Line Plot: Reading Trend and Seasonality

So plot it. In pandas this is genuinely one line, because the datetime index becomes the x-axis automatically:

import matplotlib.pyplot as plt

# The DatetimeIndex becomes the x-axis automatically.
ax = y.plot(figsize=(10, 4), title="Cyclepath monthly trips")
ax.set_ylabel("trips")
plt.show()

Since this is a written lesson we can’t render the image inline, so here is precisely what that plot shows and how to read each feature — this is the vocabulary you’ll apply to every series you ever meet:

  • The overall slope is the trend. The wiggling line climbs steadily from around 6,000 trips in early 2016 up to roughly 20,000 by 2023 — a clear, roughly linear upward drift over the eight years. An upward slope like this means the series is not stationary (Module 3’s word for it); the mean level keeps rising.
  • The regular repeating peaks and troughs are seasonality. On top of that climb, the line rises and falls in a smooth wave that repeats every 12 months — a summer high and a winter low, year after year. The distance from one summer peak to the next is the period (12 months here). Once you see a clean repeating cycle, you know a seasonal model is in your future.
  • The thickness of the wiggle is the noise. The small jitter around the smooth wave is the random component — the noise term in the generator. Its spread looks roughly constant here, which is worth checking: a fan-shaped series (wiggles growing bigger over time) signals changing variance you’d need to stabilize.

Figure: Reading a series off its line plot. The overall upward slope (from about 6,000 to about 20,000 trips, marked with a dashed trend line) is the trend, the smooth wave that repeats every twelve months is the seasonality, and the small jitter around it is noise. All three are visible at a glance, and none of them show up in a summary table.
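
Once the eye has picked out the slope, one way to put a rough number on it is a straight-line fit. This is a minimal sketch, not a step the lesson prescribes: it uses numpy's polyfit with degree 1 and assumes a single straight line is a fair description of the drift. The generator adds 90 trips per month, so the fitted slope should land near that value, though noise and the seasonal wave will nudge it.

```python
import numpy as np
import pandas as pd

def cyclepath():
    # Same seeded generator as the lesson, repeated so this snippet is self-contained.
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    values = 9000 + 90 * t + 3200 * np.sin(2 * np.pi * (t - 3) / 12) + rng.normal(0, 350, 96)
    return pd.Series(np.round(values).astype(int), index=idx, name="trips")

y = cyclepath()
t = np.arange(len(y))  # months since the start of the series

# Least-squares line through the series; polyfit returns the highest power first.
slope, intercept = np.polyfit(t, y.to_numpy(), deg=1)
print(f"fitted slope: {slope:.1f} trips per month")
print(f"fitted level at month 0: {intercept:.0f} trips")
```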

The one habit: always plot first

Make plotting the first thing you do with any new series — before describe(), before any model, before any transformation. A single line plot catches problems a summary table silently hides: a sudden level shift when a sensor was recalibrated, a run of frozen values from a logging bug, an outlier spike from a data-entry error, or variance that fans out over time. Each of these would quietly wreck a model, and each is obvious the instant you look. Plotting first isn’t a nicety — it’s the cheapest bug-catcher in forecasting.
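
To see how quietly a data problem can hide in a summary table, the sketch below corrupts a copy of Cyclepath with a hypothetical logging bug: four months that simply repeat the previous reading. The corruption, its position, and the variable names are all assumptions made for illustration. The summary statistics barely move, while the flat stretch is obvious as soon as the values are examined in order, which is exactly what a line plot does.

```python
import numpy as np
import pandas as pd

def cyclepath():
    # Same seeded generator as the lesson, repeated so this snippet is self-contained.
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    values = 9000 + 90 * t + 3200 * np.sin(2 * np.pi * (t - 3) / 12) + rng.normal(0, 350, 96)
    return pd.Series(np.round(values).astype(int), index=idx, name="trips")

clean = cyclepath()

# Hypothetical corruption: positions 40-43 (May-Aug 2019) repeat the position-39 reading.
frozen = clean.copy()
frozen.iloc[40:44] = frozen.iloc[39]

# The summary table barely moves.
mean_shift = abs(frozen.mean() - clean.mean()) / clean.mean()
print(f"relative change in mean: {mean_shift:.2%}")
print("min and max unchanged:", frozen.min() == clean.min() and frozen.max() == clean.max())

# In sequence the problem is plain: months whose value equals the previous month's.
flat_months = frozen[frozen.diff().eq(0)].index
print("months repeating the previous value:", flat_months.strftime("%Y-%m").tolist())
```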


Overlay a Rolling Mean to Isolate the Trend

The line plot shows trend and seasonality tangled together. To see the trend on its own, overlay a rolling mean — a moving average over a full seasonal cycle. Averaging over exactly 12 months cancels the summer-up/winter-down swings and leaves the slow climb underneath:

ax = y.plot(figsize=(10, 4), label="trips", title="Cyclepath with 12-month rolling mean")
# Trailing 12-month average, drawn on the same axes as the raw series.
y.rolling(12).mean().plot(ax=ax, label="rolling mean (12)")
ax.legend()
ax.set_ylabel("trips")
plt.show()

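The smooth line that results is the trend, and it matches what you measured back in Lesson 2: it rises from about 9,446 trips at its first valid point (December 2016) to about 16,958 at the end (December 2023). Because the window is a full year, the seasonal wave averages out and you’re left with the underlying direction. The first 11 months are NaN, because a 12-month window needs 12 points before it can produce a value. Note that rolling(12) is a trailing window: each value averages the current month and the 11 before it, so the smoothed line sits a few months behind the raw series.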
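
The rolling mean can also be inspected numerically rather than only by eye. The sketch below counts the leading NaN values, finds the first month with a valid average, and compares the smoothed line with the generator's known trend evaluated at the middle of each 12-month window (5.5 months back). That last comparison is only possible because this series is synthetic and its true trend is known; with real data there is nothing to compare against.

```python
import numpy as np
import pandas as pd

def cyclepath():
    # Same seeded generator as the lesson, repeated so this snippet is self-contained.
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    values = 9000 + 90 * t + 3200 * np.sin(2 * np.pi * (t - 3) / 12) + rng.normal(0, 350, 96)
    return pd.Series(np.round(values).astype(int), index=idx, name="trips")

y = cyclepath()
rolling_trend = y.rolling(12).mean()  # trailing window: this month plus the 11 before it

print("leading NaN values:", int(rolling_trend.isna().sum()))
print("first valid month:", rolling_trend.first_valid_index().strftime("%Y-%m"))
print("first and last rolling mean:",
      round(rolling_trend.dropna().iloc[0]), round(rolling_trend.iloc[-1]))

# Generator trend evaluated at the midpoint of each trailing window (5.5 months back).
t = np.arange(len(y))
trend_at_window_midpoint = 9000 + 90 * (t - 5.5)
largest_gap = (rolling_trend - trend_at_window_midpoint).abs().max()
print(f"largest gap from the known trend: {largest_gap:.0f} trips")
```

The gap that remains is averaged noise, not leftover seasonality, which is small compared with a seasonal swing of several thousand trips.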

You can also overlay a rolling standard deviation to check whether the spread is stable over time:

y.rolling(12).std().plot(figsize=(10,4), title="12-month rolling std")
plt.show()

If that line stays roughly flat, the variance is stable; if it trends up, the series is getting noisier over time and you’ll eventually need to stabilize it (often with a log transform). This is your first taste of a stationarity check — the question of whether a series’ statistical properties stay constant over time — which we’ll make formal in Module 3. For now, just knowing to look at whether the mean and spread drift is enough.
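
Cyclepath's spread is stable, so it cannot show what a rising rolling standard deviation looks like. The sketch below builds a hypothetical fan-shaped series (not Cyclepath, and with made-up parameters) whose seasonal swing is a fixed percentage of a growing level, then compares the rolling standard deviation at the start and end, before and after a log transform. The helper spread_growth is an illustrative name, and the start-to-end ratio is just one crude way to summarize the rolling std line.

```python
import numpy as np
import pandas as pd

# Hypothetical fan-shaped series: the level grows 2% per month and the seasonal
# swing is 30% of the level, so the wiggles widen as the series climbs.
idx = pd.date_range("2016-01-01", periods=96, freq="MS")
t = np.arange(96)
fan = pd.Series(1000 * np.exp(0.02 * t) * (1 + 0.3 * np.sin(2 * np.pi * t / 12)), index=idx)

def spread_growth(s, window=12):
    # Last rolling std divided by the first valid one; a value near 1 means stable spread.
    rolling_std = s.rolling(window).std().dropna()
    return rolling_std.iloc[-1] / rolling_std.iloc[0]

raw_growth = spread_growth(fan)
log_growth = spread_growth(np.log(fan))
print(f"spread growth, raw series: {raw_growth:.2f}")
print(f"spread growth, log series: {log_growth:.2f}")
```

On the raw series the spread grows severalfold, while on the logged series it stays essentially constant. The log turns a swing that scales with the level into one of fixed size.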


A Seasonal View: Confirming the Period

The line plot suggests a 12-month cycle; a by-month view confirms it and shows the average shape of the year. Group every observation by its calendar month and average:

y.groupby(y.index.month).mean()

This collapses all eight years into twelve numbers — the typical January, the typical February, and so on. Plotted, it traces a clean arc: lowest in the winter months, highest in the summer months, a single smooth hump across the year. That single-peak shape confirms two things at once: seasonality is real (the months differ systematically, not randomly), and its period is 12 — one full cycle per year. That number isn’t trivia; it’s the exact value you’ll hand to seasonal models later (the m=12 in SARIMA), so reading it correctly off this view matters. If the arc had shown two humps a year, the period would be 6, and everything downstream would change.
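
The twelve numbers can be read directly before plotting them. The sketch below prints the by-month averages and picks out the lowest and highest months. One caveat, which follows from the generator: because the series trends upward, each December sits 11 months further along the climb than the same year's January, so the arc is tilted slightly upward toward the end of the year.

```python
import numpy as np
import pandas as pd

def cyclepath():
    # Same seeded generator as the lesson, repeated so this snippet is self-contained.
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")
    t = np.arange(96)
    rng = np.random.default_rng(42)
    values = 9000 + 90 * t + 3200 * np.sin(2 * np.pi * (t - 3) / 12) + rng.normal(0, 350, 96)
    return pd.Series(np.round(values).astype(int), index=idx, name="trips")

y = cyclepath()

# Average every January together, every February together, and so on.
monthly = y.groupby(y.index.month).mean()  # index 1..12 means Jan..Dec
monthly.index.name = "month"
print(monthly.round(0))

swing = monthly.max() - monthly.min()
print("lowest month: ", monthly.idxmin())
print("highest month:", monthly.idxmax())
print("peak-to-trough swing:", round(swing), "trips")
```

Calling monthly.plot(marker="o") on the result draws the single-hump arc described above.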


Practice Exercises

Exercise 1: Why isn’t the mean enough?

A teammate reports that Cyclepath “averages about 13,000 trips a month” and wants to move straight to modeling. Why do you insist on plotting the series first?

Hint

Because a mean has no notion of order — it would report the same ~13,267 whether the series is flat, climbing, or collapsing and recovering. The plot reveals that Cyclepath actually climbs from ~6,000 to ~20,000 with a strong yearly wave, plus any outliers or variance changes. That shape — trend and seasonality — is the entire raw material of the forecast, and none of it appears in a summary statistic. Plotting first is the cheapest way to see what you’re actually modeling and to catch data problems before they wreck a model.

Exercise 2: Read the plot

Looking at Cyclepath’s line plot, you see the line climbing from ~6k to ~20k and rising and falling in a smooth wave that repeats every 12 months. Name each feature and say what it implies for modeling.

Hint

The overall upward slope is the trend — the series is non-stationary and its mean level keeps rising, so you’ll need to remove or model that drift. The repeating 12-month wave is seasonality with a period of 12, which tells you to reach for a seasonal model and hand it m=12. The small jitter around the wave is noise; because its spread looks roughly constant, the variance is stable and you don’t need a log transform yet. Every one of those modeling decisions comes straight from reading the picture.

Exercise 3: Why average over exactly 12 months?

When you overlay a rolling mean to see the trend, why choose a 12-month window specifically rather than, say, 6 or 20?

Hint

Because the seasonal period is 12. Averaging over exactly one full cycle makes the summer highs and winter lows cancel out, leaving only the slow trend (here rising from ~9,446 to ~16,958). A window of 6 covers only half a cycle, so each average is dominated by mostly-high or mostly-low months and the smoothed line still wobbles. A window of 20 covers one full cycle plus eight extra months; the full cycle cancels but the leftover partial cycle does not, so a smaller seasonal wobble leaks back in. Match the rolling window to the period (or a whole multiple of it), and the wiggles vanish while the trend survives.
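
The window argument can be checked on the seasonal wave alone. The sketch below takes the noise-free seasonal term from the lesson's generator, isolated here purely for illustration, and measures how much of it survives a rolling mean of each window length. A window that truly cancels the season should leave essentially nothing.

```python
import numpy as np
import pandas as pd

# Noise-free seasonal component from the lesson's generator (amplitude 3200, period 12).
t = np.arange(96)
seasonal = pd.Series(3200 * np.sin(2 * np.pi * (t - 3) / 12))

# Largest absolute value left after smoothing the pure wave with each window.
leftover_swing = {
    window: seasonal.rolling(window).mean().dropna().abs().max()
    for window in (6, 12, 20)
}

for window, swing in leftover_swing.items():
    print(f"window {window:>2}: leftover seasonal swing up to {swing:.0f} trips")
```

Only the 12-month window drives the leftover to zero (up to floating-point error); 6 leaves a large wobble and 20 a smaller but still visible one.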


Summary

Before modeling a time series, you look at it. Summary statistics orient you — Cyclepath runs 96 months, ranges from 5,907 to 20,533 trips, and averages about 13,267 — but they have no notion of order and can’t show shape; two series with the same mean can look nothing alike. The line plot is the fix, and the first habit of forecasting: it reveals trend (Cyclepath’s climb from ~6k to ~20k), seasonality (the smooth 12-month wave), noise, outliers, and variance changes, all at a glance. Overlaying a 12-month rolling mean isolates the trend (rising ~9,446 → ~16,958) by cancelling the seasonal swings, and a rolling standard deviation checks whether variance is stable — a first look at stationarity. A by-month view confirms seasonality and reads off its period of 12, the exact value seasonal models will need. Plot first, always.

Key Concepts

  • Plot first — the line plot is the cheapest, most informative diagnostic; it shows shape that summary stats can’t.
  • Trend, seasonality, noise — the slope, the repeating cycle, and the jitter, all readable directly off the plot.
  • Rolling mean — averaging over one full period (12 months) isolates the trend by cancelling the seasonal swings.
  • Seasonal period — a by-month view confirms seasonality and gives the period (12) you’ll feed to seasonal models.

Why This Matters

Almost every forecasting decision you’ll make downstream — whether to difference, whether to log-transform, which seasonal period to set, where to suspect a broken sensor — is a response to something you can see in the plot but not in a table. Analysts who skip the plot routinely model outliers as signal, miss a level shift, or feed the wrong period to a seasonal model, and only discover it when the forecast fails. Building the plot-first habit now means every technique in the rest of the course starts from evidence rather than assumption. Next, you’ll turn that understanding into a proper experiment: splitting the series chronologically and building the naive baselines every real model has to beat.


Next Steps

Continue to Lesson 4 - Splitting Time Series and Baselines

Split a series chronologically without leaking the future, then build the naive and seasonal-naive baselines that every real forecast has to beat.


Continue Building Your Skills

You can now read a time series the way a forecaster does — plotting it first, reading trend and seasonality straight off the line, isolating the trend with a rolling mean, and confirming the seasonal period with a by-month view. Next you’ll put that reading to work in a real experiment: splitting Cyclepath by time so you never leak the future, and building the naive baselines that keep every fancier model honest.