Lesson 20 - Window Functions and Rolling Operations

Analyzing Data Over Time Windows

In the previous lesson, you learned to work with hierarchical indexes. Now you will learn window functions—calculating statistics across sliding windows of rows for trend analysis and time series operations.

By the end of this lesson, you will be able to:

  • Use .rolling() for moving averages and windows
  • Apply .shift() for lag and lead values
  • Calculate differences with .diff()
  • Compute percentage changes with .pct_change()
  • Use cumulative functions (.cumsum(), .cummax(), etc.)
  • Apply .expanding() windows

Window functions are essential for financial analysis, trend detection, and any time-based data analysis. Let’s master these powerful tools.


Why Window Functions?

Window functions perform calculations across a sliding window of rows:

  • Moving averages for smoothing trends
  • Lag/lead values for comparing periods
  • Cumulative sums for running totals
  • Percentage changes for growth rates

Common applications:

  • Stock price analysis (moving averages, daily returns)
  • Sales trends (comparing to previous period)
  • Running totals (cumulative revenue)
  • Weather data (rolling temperature averages)
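Each of these is a one-line operation in pandas. As a quick preview (a toy series; every method is covered in detail below):

```python
import pandas as pd

s = pd.Series([10, 12, 11, 15])

print(s.rolling(window=2).mean())  # moving average over each pair of rows
print(s.shift(1))                  # previous row's value (lag)
print(s.cumsum())                  # running total
print(s.pct_change() * 100)        # growth rate in percent
```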

Sample Data

Let’s create daily sales data for demonstration:

import pandas as pd
import numpy as np

# Create daily sales data
dates = pd.date_range('2024-01-01', periods=20, freq='D')
sales = pd.DataFrame({
    'date': dates,
    'daily_sales': [120, 135, 128, 142, 155, 148, 162, 158, 165, 172,
                    168, 180, 175, 188, 192, 185, 195, 202, 198, 210]
})

print("Daily Sales Data:")
print(sales.head(10))

This dataset has 20 days of sales—perfect for demonstrating window operations.


Rolling Windows

Calculate statistics across a sliding window of rows.

Calculate Moving Average

.rolling(window=n) creates a sliding window of n rows:

# 3-day moving average
sales['MA_3'] = sales['daily_sales'].rolling(window=3).mean()

# 7-day moving average
sales['MA_7'] = sales['daily_sales'].rolling(window=7).mean()

print("Sales with Moving Averages:")
print(sales.head(10))

Output:

        date  daily_sales    MA_3    MA_7
0 2024-01-01          120     NaN     NaN
1 2024-01-02          135     NaN     NaN
2 2024-01-03          128  127.67     NaN
3 2024-01-04          142  135.00     NaN
4 2024-01-05          155  141.67     NaN
5 2024-01-06          148  148.33     NaN
6 2024-01-07          162  155.00  141.43
7 2024-01-08          158  156.00  146.86
8 2024-01-09          165  161.67  151.14
9 2024-01-10          172  165.00  157.43

Notice: First 2 rows of MA_3 are NaN (not enough data), first 6 rows of MA_7 are NaN.

Moving averages smooth out short-term fluctuations and highlight longer-term trends.
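If those leading NaN rows are unwanted, .rolling() accepts a min_periods argument: the statistic is computed as soon as that many values are available. A short sketch reusing the first few sales values:

```python
import pandas as pd

sales = pd.DataFrame({'daily_sales': [120, 135, 128, 142, 155]})

# With min_periods=1, early rows average whatever data exists so far
sales['MA_3'] = sales['daily_sales'].rolling(window=3, min_periods=1).mean()

print(sales)  # no NaN: 120.0, 127.5, then regular 3-day averages
```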

Other Rolling Aggregations

Rolling windows support many aggregations:

# Rolling sum (total of last 3 days)
sales['rolling_sum_3'] = sales['daily_sales'].rolling(window=3).sum()

# Rolling max (highest in last 3 days)
sales['rolling_max_3'] = sales['daily_sales'].rolling(window=3).max()

# Rolling min (lowest in last 3 days)
sales['rolling_min_3'] = sales['daily_sales'].rolling(window=3).min()

# Rolling standard deviation (volatility)
sales['rolling_std_3'] = sales['daily_sales'].rolling(window=3).std()

print("Various Rolling Calculations:")
print(sales[['date', 'daily_sales', 'rolling_sum_3', 'rolling_max_3', 'rolling_min_3']].head(10))

Shift, Diff, and Percentage Change

Compare values across time periods.

Shift - Lag and Lead Values

.shift() moves data up or down by n rows:

# Lag: previous day's sales (shift down by 1)
sales['previous_day'] = sales['daily_sales'].shift(1)

# Lead: next day's sales (shift up by 1)
sales['next_day'] = sales['daily_sales'].shift(-1)

# 2 days ago
sales['two_days_ago'] = sales['daily_sales'].shift(2)

print("Sales with Shifted Values:")
print(sales[['date', 'daily_sales', 'previous_day', 'next_day']].head(10))

Output:

        date  daily_sales  previous_day  next_day
0 2024-01-01          120           NaN     135.0
1 2024-01-02          135         120.0     128.0
2 2024-01-03          128         135.0     142.0
3 2024-01-04          142         128.0     155.0
4 2024-01-05          155         142.0     148.0
5 2024-01-06          148         155.0     162.0

Use case: Compare today’s sales with yesterday’s sales.
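One follow-up worth knowing: .shift() accepts a fill_value argument to substitute a default for the boundary NaN, which keeps later arithmetic clean. A small sketch:

```python
import pandas as pd

sales = pd.DataFrame({'daily_sales': [120, 135, 128, 142]})

# Replace the boundary NaN with 0
sales['previous_day'] = sales['daily_sales'].shift(1, fill_value=0)

# Flag days that beat the previous day (first row is False: no prior value)
sales['increased'] = sales['daily_sales'] > sales['daily_sales'].shift(1)

print(sales)
```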

Diff - Calculate Differences

.diff() calculates the difference from the previous row:

# Daily change (difference from previous day)
sales['daily_change'] = sales['daily_sales'].diff()

# 2-day change
sales['change_2days'] = sales['daily_sales'].diff(2)

print("Sales with Differences:")
print(sales[['date', 'daily_sales', 'daily_change', 'change_2days']].head(10))

Output:

        date  daily_sales  daily_change  change_2days
0 2024-01-01          120           NaN           NaN
1 2024-01-02          135          15.0           NaN
2 2024-01-03          128          -7.0           8.0
3 2024-01-04          142          14.0           7.0
4 2024-01-05          155          13.0          27.0
5 2024-01-06          148          -7.0           6.0

Find Days with Biggest Changes

# Find days with biggest increase
biggest_increase = sales.nlargest(3, 'daily_change')[['date', 'daily_sales', 'daily_change']]
print("Top 3 Days with Biggest Increase:")
print(biggest_increase)

Percentage Change

.pct_change() calculates the percentage change from the previous row:

# Daily percentage change
sales['pct_change'] = sales['daily_sales'].pct_change() * 100  # Convert to percentage

print("Sales with Percentage Change:")
print(sales[['date', 'daily_sales', 'pct_change']].head(10))

Output:

        date  daily_sales  pct_change
0 2024-01-01          120         NaN
1 2024-01-02          135       12.50
2 2024-01-03          128       -5.19
3 2024-01-04          142       10.94
4 2024-01-05          155        9.15
5 2024-01-06          148       -4.52
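Like .diff(), .pct_change() takes a periods argument to compare against n rows back—useful, for example, for week-over-week change in daily data. A brief sketch:

```python
import pandas as pd

s = pd.Series([100.0, 110.0, 121.0, 133.1])

# Percentage change versus the value 2 rows earlier
# (first two rows are NaN: nothing to compare against)
print(s.pct_change(periods=2) * 100)
```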

Cumulative Functions

Calculate running totals and cumulative statistics.

Cumulative Sum

.cumsum() creates running totals:

# Cumulative sales (running total)
sales['cumulative_sales'] = sales['daily_sales'].cumsum()

print("Sales with Cumulative Total:")
print(sales[['date', 'daily_sales', 'cumulative_sales']].head(10))

Output:

        date  daily_sales  cumulative_sales
0 2024-01-01          120               120
1 2024-01-02          135               255
2 2024-01-03          128               383
3 2024-01-04          142               525
4 2024-01-05          155               680
5 2024-01-06          148               828

Perfect for tracking total sales to date!

Other Cumulative Functions

# Create test data
performance = pd.DataFrame({
    'employee': ['James', 'Maria', 'David', 'Anna', 'Michael'],
    'score': [85, 92, 78, 95, 88]
})

# Cumulative maximum (best score so far)
performance['best_so_far'] = performance['score'].cummax()

# Cumulative minimum (worst score so far)
performance['worst_so_far'] = performance['score'].cummin()

print("Employee Performance with Cumulative Max/Min:")
print(performance)

Output:

  employee  score  best_so_far  worst_so_far
0    James     85           85            85
1    Maria     92           92            85
2    David     78           92            78
3     Anna     95           95            78
4  Michael     88           95            78
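There is also .cumprod(), the multiplicative counterpart—handy for compounding growth rates into a cumulative growth factor. A small sketch with made-up monthly growth rates:

```python
import pandas as pd

# Monthly growth rates: +10%, +5%, -2%
growth = pd.Series([0.10, 0.05, -0.02])

# Cumulative growth factor: product of (1 + rate) up to each month
factor = (1 + growth).cumprod()

print(factor)  # 1.10, then 1.155, then about 1.132
```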

Practical Use - Revenue Tracking

Track monthly revenue and cumulative annual revenue:

# Monthly revenue
revenue = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'monthly_revenue': [50000, 55000, 52000, 60000, 58000, 65000]
})

# Cumulative revenue (YTD - Year To Date)
revenue['ytd_revenue'] = revenue['monthly_revenue'].cumsum()

# Progress toward annual goal
annual_goal = 400000
revenue['progress_pct'] = (revenue['ytd_revenue'] / annual_goal * 100).round(1)

print("Revenue Tracking:")
print(revenue)

Output:

  month  monthly_revenue  ytd_revenue  progress_pct
0   Jan            50000        50000          12.5
1   Feb            55000       105000          26.2
2   Mar            52000       157000          39.2
3   Apr            60000       217000          54.2
4   May            58000       275000          68.8
5   Jun            65000       340000          85.0

Expanding Windows

Growing windows from start to current row.

Expanding vs Rolling

  • Rolling: Fixed window size (e.g., last 3 days)
  • Expanding: Growing window from start to current row

# Student test scores
tests = pd.DataFrame({
    'test_number': range(1, 11),
    'score': [75, 82, 78, 85, 88, 86, 90, 92, 89, 95]
})

# Rolling average (last 3 tests only)
tests['rolling_avg'] = tests['score'].rolling(window=3).mean()

# Expanding average (all tests from start to current)
tests['expanding_avg'] = tests['score'].expanding().mean()

print("Student Test Scores:")
print(tests)

Output:

   test_number  score  rolling_avg  expanding_avg
0            1     75          NaN          75.00
1            2     82          NaN          78.50
2            3     78        78.33          78.33
3            4     85        81.67          80.00
4            5     88        83.67          81.60
5            6     86        86.33          82.33
6            7     90        88.00          83.43
7            8     92        89.33          84.50
8            9     89        90.33          85.00
9           10     95        92.00          86.00

Interpretation:

  • Rolling average: Recent performance (last 3 tests)
  • Expanding average: Overall GPA-like average (all tests so far)

Expanding Min/Max

Track best and worst performances over time:

# Best score so far (expanding max)
tests['best_score_so_far'] = tests['score'].expanding().max()

# Worst score so far (expanding min)
tests['worst_score_so_far'] = tests['score'].expanding().min()

print("Test Scores with Expanding Min/Max:")
print(tests[['test_number', 'score', 'best_score_so_far', 'worst_score_so_far']])

Combining Window Functions

Apply multiple window functions for comprehensive analysis.

Stock Price Analysis Example

# Stock price data
stock = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=15, freq='D'),
    'price': [100, 102, 101, 105, 107, 106, 108, 110, 109, 112, 115, 113, 116, 118, 117]
})

# 5-day moving average
stock['MA_5'] = stock['price'].rolling(window=5).mean()

# Daily change
stock['daily_change'] = stock['price'].diff()

# Daily percentage change
stock['pct_change'] = stock['price'].pct_change() * 100

# Cumulative return from start
stock['cumulative_return'] = ((stock['price'] / stock['price'].iloc[0]) - 1) * 100

# 5-day high
stock['high_5d'] = stock['price'].rolling(window=5).max()

# 5-day low
stock['low_5d'] = stock['price'].rolling(window=5).min()

print("Stock Analysis:")
print(stock[['date', 'price', 'MA_5', 'daily_change', 'pct_change', 'cumulative_return']].tail(10))

This combines multiple techniques for comprehensive stock analysis!


Practice Exercises

Apply window functions with these exercises.

Exercise 1: Temperature Trend Analysis

Analyze temperature trends using window functions:

temps = pd.DataFrame({
    'day': range(1, 11),
    'temperature': [25, 27, 26, 24, 22, 23, 25, 28, 27, 26]
})

  1. Calculate 3-day moving average
  2. Find the daily temperature change
  3. Identify the day with the biggest temperature drop
  4. Calculate cumulative average temperature

# Your code here

Hint

Use .diff() to find daily change, then .idxmin() to find the day with the biggest drop (most negative value)!

Exercise 2: Cumulative Sales Target

Track progress toward a quarterly sales target:

weekly = pd.DataFrame({
    'week': range(1, 13),
    'sales': [7500, 8200, 7800, 9100, 8500, 8900, 9200, 8700, 9500, 9300, 8800, 9400]
})

  1. Calculate cumulative sales
  2. Calculate percentage of quarterly target (100,000)
  3. Find the week when 50% target was reached
  4. Calculate week-over-week growth rate

# Your code here

Exercise 3: Student Grade Improvement

Analyze student improvement using window functions:

exams = pd.DataFrame({
    'exam': range(1, 9),
    'score': [68, 72, 70, 75, 78, 80, 82, 85]
})

  1. Calculate the change from previous exam
  2. Calculate overall average using expanding window
  3. Find if the student is improving (compare last 3 vs first 3 exams)
  4. Calculate the best score achieved so far at each exam

# Your code here

Summary

You now master window functions and rolling operations. Let’s review the key concepts.

Key Concepts

Rolling Windows

  • Fixed-size sliding window
  • Calculate statistics over last n rows
  • Perfect for smoothing trends
  • First n-1 values are NaN

Shift Operations

  • Move data up (lead) or down (lag)
  • Compare with previous/next periods
  • Essential for period-over-period analysis
  • Creates NaN at boundaries

Diff and Percentage Change

  • .diff(): Absolute change from previous
  • .pct_change(): Percentage change from previous
  • Useful for growth rates and trends
  • First value is always NaN

Cumulative Functions

  • Running totals and cumulative statistics
  • No NaN values (starts from first row)
  • Perfect for YTD tracking
  • Multiple variants: sum, max, min, product

Expanding Windows

  • Growing window from start
  • Overall statistics (all-time averages)
  • Different from rolling (which is fixed size)
  • No NaN values

Syntax Reference

# Rolling windows
df['MA_3'] = df['value'].rolling(window=3).mean()
df['rolling_sum'] = df['value'].rolling(window=3).sum()
df['rolling_max'] = df['value'].rolling(window=3).max()
df['rolling_std'] = df['value'].rolling(window=3).std()

# Shift - lag and lead
df['previous'] = df['value'].shift(1)    # Lag
df['next'] = df['value'].shift(-1)       # Lead
df['lag_2'] = df['value'].shift(2)       # 2 rows ago

# Diff - differences
df['change'] = df['value'].diff()
df['change_2'] = df['value'].diff(2)

# Percentage change
df['pct_change'] = df['value'].pct_change() * 100

# Cumulative functions
df['cumsum'] = df['value'].cumsum()
df['cummax'] = df['value'].cummax()
df['cummin'] = df['value'].cummin()
df['cumprod'] = df['value'].cumprod()

# Expanding windows
df['expanding_avg'] = df['value'].expanding().mean()
df['expanding_max'] = df['value'].expanding().max()

Common Patterns

Pattern 1: Moving Average for Smoothing

df['MA_7'] = df['sales'].rolling(window=7).mean()

Pattern 2: Compare with Previous Period

df['prev_month'] = df['revenue'].shift(1)
df['month_over_month'] = df['revenue'] - df['prev_month']
df['growth_rate'] = df['revenue'].pct_change() * 100

Pattern 3: Cumulative Progress

df['ytd_sales'] = df['monthly_sales'].cumsum()
df['progress_pct'] = (df['ytd_sales'] / annual_target) * 100

Pattern 4: Stock Price Analysis

df['MA_50'] = df['price'].rolling(window=50).mean()
df['MA_200'] = df['price'].rolling(window=200).mean()
df['daily_return'] = df['price'].pct_change() * 100
df['cumulative_return'] = ((df['price'] / df['price'].iloc[0]) - 1) * 100

Rolling vs Expanding

Feature         Rolling                    Expanding
Window Size     Fixed (last n rows)        Growing (start to current)
Use Case        Recent trends              Overall trends
Example         Last 7 days average        Overall GPA
First Values    NaN until window filled    No NaN

Important Reminders

  • Rolling creates NaN: First n-1 values are NaN (not enough data)
  • Shift creates NaN: At start (lag) or end (lead) of data
  • Diff creates NaN: .diff() keeps the row count, but the first value (first n for .diff(n)) is NaN
  • Cumulative starts immediately: No NaN values
  • Window size matters: Larger windows smooth more but lose detail
  • Order matters: Sort your data by time before applying window functions
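That last point is worth a concrete sketch—window functions follow row order, not timestamps (time-offset windows aside), so unsorted data quietly gives wrong answers:

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-03', '2024-01-01', '2024-01-02']),
    'sales': [128, 120, 135]
})

# Sort chronologically first; otherwise diff()/rolling() compare the wrong rows
df = df.sort_values('date').reset_index(drop=True)
df['daily_change'] = df['sales'].diff()

print(df)  # changes: NaN, 15.0, -7.0
```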

When to Use Each

  • .rolling(): Smoothing noisy data, detecting recent trends
  • .shift(): Comparing periods (month-over-month, year-over-year)
  • .diff(): Finding changes or differences
  • .pct_change(): Growth rates, returns
  • .cumsum(): Running totals, YTD metrics
  • .expanding(): Overall averages, all-time highs/lows

Next Steps

You can now apply window functions for time series analysis and trend detection. In the next lesson, you will complete a Final Project—applying all 20 lessons to a comprehensive real-world data analysis.

Continue to Lesson 21 - Final Project

Apply all Pandas skills to a comprehensive real-world project

Back to Lesson 19 - MultiIndex and Hierarchical Data

Review working with multi-level indexes


Master Time Series Analysis

Window functions are essential tools for any time-based analysis—from financial data to sales trends to scientific measurements.

Use window functions to uncover trends and patterns in your time series data!