Lesson 2 - The Datetime Index and Resampling

Welcome to The Datetime Index and Resampling

In Lesson 1 you learned why a time series is a different kind of thing. Now you’ll learn how to hold one — the everyday pandas mechanics you’ll reach for in every lesson that follows. The trick is that pandas doesn’t treat dates as ordinary labels: when a Series is indexed by a DatetimeIndex, pandas understands the calendar. It knows that December is followed by January, that twelve months make a year, that a “quarter” and a “week” are real spans. That calendar awareness is what lets you resample a series to a new frequency, roll a window across it, and align two series on real time — none of which would work if your dates were just strings. This lesson builds the Cyclepath series, inspects its index, and puts those two workhorse operations — resampling and rolling — to work.

By the end of this lesson, you will be able to:

  • Build and inspect a DatetimeIndex and read its frequency
  • Reproduce the Cyclepath series that runs through the whole course
  • Resample a series to a coarser frequency with an aggregation like .sum()
  • Smooth a series with a rolling window and explain what a 12-month mean reveals

Let’s start with the index that makes all of this possible.


The DatetimeIndex: Why the Index Matters

In pandas, a time series is simply a Series (or DataFrame) whose index is a DatetimeIndex — an ordered set of timestamps rather than plain integer positions. That index isn’t decoration; it carries a frequency, a code that says how the timestamps are spaced. "MS" means month start, "YS" means year start, "D" means daily. Because the index is made of real calendar dates and pandas knows their spacing, it can do things a normal integer index can’t: aggregate every point in a year, average a trailing twelve months, or line up two series that share dates. Everything in this lesson — and most of this course — leans on that.
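
As a small illustration of the codes above, the sketch below generates three timestamps at each frequency with pd.date_range and then adds two toy series that only partly overlap. The series a and b and their values are made up for this example; they are not part of Cyclepath.

```python
import pandas as pd

# Same start date, three different frequency codes.
monthly = pd.date_range("2016-01-01", periods=3, freq="MS")  # month start
yearly = pd.date_range("2016-01-01", periods=3, freq="YS")   # year start
daily = pd.date_range("2016-01-01", periods=3, freq="D")     # daily

print(monthly.strftime("%Y-%m-%d").tolist())  # ['2016-01-01', '2016-02-01', '2016-03-01']
print(yearly.strftime("%Y-%m-%d").tolist())   # ['2016-01-01', '2017-01-01', '2018-01-01']
print(daily.strftime("%Y-%m-%d").tolist())    # ['2016-01-01', '2016-01-02', '2016-01-03']

# Two hypothetical series that overlap only in Feb and Mar 2016.
a = pd.Series([10, 20, 30], index=pd.date_range("2016-01-01", periods=3, freq="MS"))
b = pd.Series([1, 2, 3], index=pd.date_range("2016-02-01", periods=3, freq="MS"))

# Arithmetic aligns on the timestamps, not on row position.
total = a + b
print(total)
```

The sum is NaN in January and April because only one series has a value there; pandas matched the rows by date, which an integer index could not do.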

You build a datetime index most often with pd.date_range, which generates evenly spaced timestamps at a given frequency. Here is the canonical Cyclepath generator, the exact code that defines the series you’ll model in every lesson. Reproduce it faithfully; with the fixed seed your numbers should match the lesson to the digit. (NumPy does not strictly guarantee identical Generator output across versions, so if a value differs, check your NumPy version before suspecting your code.)

import numpy as np
import pandas as pd

def cyclepath():
    """Monthly bike-share trips: 8 years, upward trend + summer seasonality."""
    idx = pd.date_range("2016-01-01", periods=96, freq="MS")   # 96 months
    t = np.arange(96)
    rng = np.random.default_rng(42)
    trend = 9000 + 90 * t
    seasonal = 3200 * np.sin(2 * np.pi * (t - 3) / 12)
    noise = rng.normal(0, 350, 96)
    return pd.Series(np.round(trend + seasonal + noise).astype(int), index=idx, name="trips")

y = cyclepath()
print(y.index.freqstr, len(y), y.index[0].date(), y.index[-1].date())
print(y.head(3).tolist())

Output:

MS 96 2016-01-01 2023-12-01
[5907, 5955, 7843]

Read that output slowly. y.index.freqstr is MS — pandas knows these are month-start timestamps, one per month. There are 96 of them, running from 2016-01-01 to 2023-12-01: exactly eight years of monthly data, January 2016 through December 2023. The first three trip counts are [5907, 5955, 7843] — low winter months, already climbing toward spring. That single Series, index and values together, is Cyclepath.
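
To get a feel for inspecting an index like this one, here is a minimal sketch on a stand-in series. It uses the same 96-month index, but the values are simply 0 through 95 (an assumption made so the results are easy to check by eye), and the name toy is hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in series on the same index as Cyclepath; values are just 0..95.
idx = pd.date_range("2016-01-01", periods=96, freq="MS")
toy = pd.Series(np.arange(96), index=idx, name="toy")

# The index exposes calendar fields directly.
print(toy.index.year.unique().tolist())   # [2016, 2017, ..., 2023]
print(toy.index.month[:3].tolist())       # [1, 2, 3]

# Partial-string selection: a whole year, or a span of months (both ends inclusive).
first_year = toy.loc["2016"]
summer_2016 = toy.loc["2016-06":"2016-08"]
print(len(first_year), summer_2016.tolist())   # 12 [5, 6, 7]
```

Selecting by "2016" or by a month range works only because the index holds real timestamps; with string labels you would be matching text, not dates.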

Frequency strings you’ll see all course

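Pandas names frequencies with short codes: D = daily, W = weekly, MS = month start, ME = month end, QS = quarter start, QE = quarter end, YS = year start, YE = year end. The older end-of-period spellings M, Q, and Y were deprecated in pandas 2.2 in favor of ME, QE, and YE, so which ones you see depends on your pandas version. The S suffix means the timestamp lands on the start of the period rather than the end; Cyclepath uses MS, so each month is stamped on the 1st. The frequency lives on the index (y.index.freqstr) and tells you the timestamps are evenly spaced with no gaps. That regular spacing is what makes a count-based window like .rolling(12) cover exactly twelve months, and it is what later tools rely on when they assume one observation per period.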
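
Real data often arrives as date strings rather than from pd.date_range, and then the index may have no frequency attached. The sketch below, using made-up values and the hypothetical names gappy and regular, shows one way to check the spacing with pd.infer_freq and to put a series on a regular monthly grid with asfreq.

```python
import pandas as pd

# Dates parsed from strings carry no frequency until you ask for one.
idx = pd.to_datetime(["2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01"])
print(idx.freq)                # None
print(pd.infer_freq(idx))      # MS

# A series with March missing: the spacing is irregular, so nothing is inferred.
gappy = pd.Series(
    [5, 6, 8],
    index=pd.to_datetime(["2016-01-01", "2016-02-01", "2016-04-01"]),
)
print(pd.infer_freq(gappy.index))   # None

# asfreq("MS") reindexes onto a regular monthly grid; the gap shows up as NaN.
regular = gappy.asfreq("MS")
print(regular.index.freqstr)        # MS
print(int(regular.isna().sum()))    # 1
```

A frequency of None is a useful warning sign: it usually means either the index was never given one or the dates have a gap.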


Resampling: Change the Frequency

Resampling changes the frequency of a series — you regroup the timestamps into a coarser or finer calendar bucket. Going to a coarser frequency (monthly → yearly) is downsampling, and because you’re collapsing many points into one, you must tell pandas how to combine them with an aggregation like .sum() or .mean(). Going to a finer frequency (monthly → daily) is upsampling, which instead creates gaps you have to fill or interpolate. Our focus is downsampling with a sum: roll the twelve monthly trip counts in each year up into one yearly total.
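
For contrast with the downsampling this lesson focuses on, here is a minimal upsampling sketch. The three monthly values are invented, and they stand for a level-type reading (something like a price) rather than a monthly total, since filling a total forward would not split it across days.

```python
import pandas as pd

# Three made-up monthly readings on a month-start index.
monthly = pd.Series(
    [100.0, 130.0, 160.0],
    index=pd.date_range("2016-01-01", periods=3, freq="MS"),
)

# Upsample to daily: new rows appear, but only the original dates hold values.
daily_gaps = monthly.resample("D").asfreq()
print(len(daily_gaps), int(daily_gaps.notna().sum()))   # 61 3

# Two common ways to fill the gaps.
daily_ffill = monthly.resample("D").ffill()          # carry the last value forward
daily_linear = monthly.resample("D").interpolate()   # straight line between points

day = pd.Timestamp("2016-01-15")
print(daily_ffill[day], round(daily_linear[day], 2))
```

The 61 rows run from 1 January to 1 March 2016 inclusive (31 days, 29 days in the leap-year February, plus one), and 58 of them are gaps that you chose how to fill.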

yearly = y.resample("YS").sum()
print(yearly.to_string())

Output:

2016-01-01    113347
2017-01-01    127335
2018-01-01    140950
2019-01-01    153225
2020-01-01    165816
2021-01-01    177929
2022-01-01    191528
2023-01-01    203501

resample("YS") groups the index into year-start buckets, and .sum() adds the twelve monthly values in each bucket into a single number. Notice pandas did the calendar bookkeeping for you: it knew which months belong to 2016, which to 2017, and stamped each total on January 1st of its year. Now read down the column — 113347, 127335, 140950, …, 203501. Every year is larger than the one before it, a steady climb from about 113k trips to over 203k. That relentless year-over-year growth is the trend you met in Lesson 1, made unmistakable by collapsing away the within-year wiggle. Resampling to a coarser frequency is one of the fastest ways to see a trend.
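
To see what that bookkeeping amounts to, the sketch below compares resample("YS").sum() with a manual group-by on the calendar year, using a toy series whose values are 1 through 24 so the totals can be checked by hand. The names toy, by_year, and summary are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy monthly series: the values 1..24 over two years (not the Cyclepath data).
toy = pd.Series(
    np.arange(1, 25),
    index=pd.date_range("2016-01-01", periods=24, freq="MS"),
)

# Downsample to yearly totals.
yearly_sum = toy.resample("YS").sum()
print(yearly_sum.tolist())            # [78, 222]

# The same totals via a manual group-by on the calendar year.
by_year = toy.groupby(toy.index.year).sum()
print(by_year.tolist())               # [78, 222]

# Several aggregations at once give one column per statistic.
summary = toy.resample("YS").agg(["sum", "mean", "max"])
print(summary)
```

The totals agree, but the results are indexed differently: resample keeps a DatetimeIndex stamped on 1 January of each year, while the group-by returns plain integer years.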


Rolling Windows: Smooth at the Same Frequency

A rolling window is different from resampling: it keeps the original frequency but slides a fixed-width window across the series, computing a statistic over each window. .rolling(12) counts rows rather than calendar time; because Cyclepath has exactly one row per month with no gaps, 12 rows is a trailing 12-month span. .rolling(12).mean() averages each such span, producing a smoothed series at the same monthly frequency, with one output point per input point (once there’s enough history).

ma = y.rolling(12).mean()
print(round(ma.iloc[11]))                # first non-NaN (needs 12 points)
print(round(ma.dropna().iloc[0]), round(ma.dropna().iloc[-1]))

Output:

9446
9446 16958

Two things to unpack. First, the first 11 values are NaN: a 12-month window needs 12 points to fill, so the earliest month a full window exists is index 11 (the 12th point), where the rolling mean first becomes valid at 9446. That’s why ma.iloc[11] is the first real number and ma.dropna().iloc[0] is the same 9446. Second, and this is the point: a 12-month rolling mean averages exactly one full seasonal cycle, so the summer peaks and winter troughs cancel out. What’s left is the trend, laid bare — the smoothed series rises from about 9,446 early on to about 16,958 at the end. The seasonality is gone; the slow upward drift is all that remains. Choosing a window equal to your seasonal period is the classic move for isolating trend.
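
The cancellation can be checked directly on a noise-free series. The sketch below reuses the trend and seasonal formulas from the Cyclepath generator but drops the noise and shortens the series to 48 months; both changes are assumptions made to keep the arithmetic exact.

```python
import numpy as np
import pandas as pd

# Noise-free version of the lesson's trend and seasonal terms, 4 years long.
t = np.arange(48)
idx = pd.date_range("2016-01-01", periods=48, freq="MS")
trend = pd.Series(9000 + 90 * t, index=idx, dtype=float)
seasonal = pd.Series(3200 * np.sin(2 * np.pi * (t - 3) / 12), index=idx)

# Any 12 consecutive months cover one full sine cycle, so their mean is ~0.
seasonal_ma = seasonal.rolling(12).mean()
print(int(seasonal_ma.isna().sum()))              # 11
print(bool(seasonal_ma.abs().max() < 1e-6))       # True

# What survives is the average of the trend over each window.
ma = (trend + seasonal).rolling(12).mean()
print(round(float(ma.iloc[11]), 6))               # 9495.0
```

The first valid value, 9495, is the trend at t = 5.5, the midpoint of months 0 through 11. A trailing 12-month mean therefore shows the trend with a lag of about five and a half months, which is worth remembering when you read the smoothed line against the raw one.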

Figure (two panels). Left, "Resample changes the frequency": twelve small monthly bars are aggregated by .resample('YS').sum() into a single tall yearly-total bar. Right, "Rolling is a sliding window": a .rolling(12).mean() window slides across the wiggly monthly series and a smooth rising trend line emerges.
Caption: Resampling regroups the timestamps into a coarser frequency and aggregates (twelve months summed into one yearly total), while a rolling window keeps the frequency and slides a 12-month average across the series to cancel seasonality and expose the trend. Both rely on the DatetimeIndex so pandas knows what a year or twelve months actually means.

Practice Exercises

Exercise 1: Read the frequency

You load a series and y.index.freqstr returns MS, while len(y) is 96 and the last index is 2023-12-01. Without printing anything else, how many years of data do you have and how are the months stamped?

Hint

MS is month start, so every timestamp lands on the 1st of its month. 96 monthly points is 96 ÷ 12 = 8 years, and since the last point is December 2023, the series runs January 2016 through December 2023. Reading the frequency and length tells you the span and spacing before you plot a single point — that’s exactly what the index is for.

Exercise 2: Downsample or smooth?

You want a single number for total trips in each calendar year. Do you reach for .resample(...) or .rolling(...), and what aggregation do you use?

Hint

Use y.resample("YS").sum(). You’re changing the frequency — collapsing twelve monthly points into one yearly bucket — which is downsampling, and it needs an aggregation to combine the points; a total means .sum(). .rolling(...) would keep the monthly frequency and just smooth it, giving you one value per month, not one per year. Resample when you want fewer, coarser points; roll when you want the same points, smoothed.

Exercise 3: Why 12, and why the NaNs?

You run y.rolling(12).mean() and notice the first 11 values are NaN, and that the summer bumps have vanished from the smoothed line. Explain both.

Hint

The NaNs appear because a 12-wide window can’t be filled until 12 points exist, so the first valid value is at index 11 (the 12th month). The seasonality vanishes because a 12-month window spans exactly one full yearly cycle: averaging a complete cycle cancels the peaks against the troughs, leaving only the trend. That’s why a window equal to the seasonal period is the go-to choice for isolating trend from a monthly series.
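
If the leading NaNs get in the way, rolling accepts two options that change where they fall. The sketch below is only an illustration on an arbitrary toy series: min_periods=1 averages whatever points are available, and center=True places the window around each point instead of behind it.

```python
import numpy as np
import pandas as pd

# Toy monthly series, 3 years long (values are arbitrary).
toy = pd.Series(
    np.arange(36, dtype=float),
    index=pd.date_range("2016-01-01", periods=36, freq="MS"),
)

trailing = toy.rolling(12).mean()                  # default: needs 12 points
partial = toy.rolling(12, min_periods=1).mean()    # averages whatever is available
centered = toy.rolling(12, center=True).mean()     # window straddles each point

# Count the missing values each variant produces.
print(int(trailing.isna().sum()), int(partial.isna().sum()), int(centered.isna().sum()))
print(trailing.first_valid_index().date())         # 2016-12-01
```

Note the trade-off: with min_periods=1 the early values average less than a full seasonal cycle, so seasonality leaks back into them, while the centered version still has 11 missing values but splits them between the two ends of the series.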


Summary

In pandas, a time series is a Series indexed by a DatetimeIndex, and that index carries a frequency (Cyclepath’s is MS, month start) that makes pandas calendar-aware — the reason resampling, rolling, and alignment work at all. You reproduced the canonical Cyclepath generator: 96 monthly points from January 2016 to December 2023, seeded so every number matches the lesson. Resampling changes the frequency — y.resample("YS").sum() downsampled the monthly series into yearly totals (113,347 up to 203,501), making the year-over-year trend unmistakable, and always needs an aggregation when you coarsen. Rolling windows keep the frequency but slide a fixed span across the series — y.rolling(12).mean() left the first 11 months NaN (a 12-window needs 12 points) and, by averaging one full seasonal cycle, canceled the seasonality to reveal the trend rising from about 9,446 to 16,958.

Key Concepts

  • DatetimeIndex — an index of real timestamps that carries a frequency and makes pandas understand the calendar.
  • Frequency strings — short codes like MS, YS, D that say how the timestamps are spaced.
  • Resampling — changing the frequency; downsampling to a coarser bucket needs an aggregation like .sum().
  • Rolling window — a sliding fixed-width window at the same frequency; a period-length mean cancels seasonality and shows trend.

Why This Matters

Almost every time-series operation you’ll write — decomposition, differencing, backtesting, forecasting — assumes your data sits on a proper DatetimeIndex with a known frequency, and quietly misbehaves when it doesn’t. Resampling and rolling are the two levers you’ll pull constantly: resample to change the granularity of a question (“give me yearly totals”), roll to smooth away noise or seasonality and expose the signal underneath. Getting comfortable with them now means that when you decompose Cyclepath in the next module, or align a forecast against actuals later, the mechanics are second nature and you can focus on the modeling. Next, you’ll use these tools to actually explore and visualize the series and read its structure by eye.


Next Steps

Continue to Lesson 3 - Exploring and Visualizing a Series

Plot Cyclepath and read its structure by eye: line plots, seasonal views, and the visual cues that tell you what to model.


Continue Building Your Skills

You can now hold a real time series in pandas: build and inspect a DatetimeIndex, read its frequency, resample it to a coarser frequency with an aggregation, and smooth it with a rolling window that cancels seasonality to reveal the trend. These are the everyday mechanics behind every technique to come. Next you’ll put them to visual use — plotting Cyclepath, viewing it season by season, and learning to read trend, seasonality, and noise straight off a chart before you model a thing.