Lesson 20 - Window Functions and Rolling Operations
Analyzing Data Over Time Windows
In the previous lesson, you learned to work with hierarchical indexes. Now you will learn window functions: calculating statistics across sliding windows of rows for trend analysis and time series operations.
By the end of this lesson, you will be able to:
- Use .rolling() for moving averages and windows
- Apply .shift() for lag and lead values
- Calculate differences with .diff()
- Compute percentage changes with .pct_change()
- Use cumulative functions (.cumsum(), .cummax(), etc.)
- Apply .expanding() windows
Window functions are essential for financial analysis, trend detection, and any time-based data analysis. Let’s master these powerful tools.
Why Window Functions?
Window functions perform calculations across a sliding window of rows:
- Moving averages for smoothing trends
- Lag/lead values for comparing periods
- Cumulative sums for running totals
- Percentage changes for growth rates
Common applications:
- Stock price analysis (moving averages, daily returns)
- Sales trends (comparing to previous period)
- Running totals (cumulative revenue)
- Weather data (rolling temperature averages)
Sample Data
Let’s create daily sales data for demonstration:
import pandas as pd
import numpy as np
# Create daily sales data
dates = pd.date_range('2024-01-01', periods=20, freq='D')
sales = pd.DataFrame({
'date': dates,
'daily_sales': [120, 135, 128, 142, 155, 148, 162, 158, 165, 172,
168, 180, 175, 188, 192, 185, 195, 202, 198, 210]
})
print("Daily Sales Data:")
print(sales.head(10))

This dataset has 20 days of sales, perfect for demonstrating window operations.
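A variation worth knowing before we start (not used in the rest of this lesson): when the dates are the index, .rolling() also accepts a time offset string instead of a row count. A minimal sketch with made-up numbers:

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=6, freq='D')
s = pd.Series([120, 135, 128, 142, 155, 148], index=dates)

# With a DatetimeIndex, the window can be a time span rather than a row count.
# '3D' means "all rows within the last 3 days", so no leading NaN appears:
# early windows simply contain fewer rows.
ma_3d = s.rolling('3D').mean()
print(ma_3d)
```

This matters when observations are irregularly spaced: a row-count window would mix very different time spans, while a time-offset window always covers the same span.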
Rolling Windows
Calculate statistics across a sliding window of rows.
Calculate Moving Average
.rolling(window=n) creates a sliding window of n rows:
# 3-day moving average
sales['MA_3'] = sales['daily_sales'].rolling(window=3).mean()
# 7-day moving average
sales['MA_7'] = sales['daily_sales'].rolling(window=7).mean()
print("Sales with Moving Averages:")
print(sales.head(10))

Output:
date daily_sales MA_3 MA_7
0 2024-01-01 120 NaN NaN
1 2024-01-02 135 NaN NaN
2 2024-01-03 128 127.67 NaN
3 2024-01-04 142 135.00 NaN
4 2024-01-05 155 141.67 NaN
5 2024-01-06 148 148.33 NaN
6 2024-01-07 162 155.00 141.43
7 2024-01-08 158 156.00 146.86
8 2024-01-09 165 161.67 151.14
9 2024-01-10 172 165.00 155.71

Notice: the first 2 rows of MA_3 are NaN (not enough data), and the first 6 rows of MA_7 are NaN.
Moving averages smooth out short-term fluctuations and highlight longer-term trends.
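If those leading NaN values get in the way, .rolling() accepts a min_periods argument that lets early windows use however many rows are available. A quick sketch:

```python
import pandas as pd

s = pd.Series([120, 135, 128, 142, 155])

# With min_periods=1, the window shrinks at the start instead of producing NaN:
# row 0 averages 1 value, row 1 averages 2 values, then a full 3-row window.
ma = s.rolling(window=3, min_periods=1).mean()
print(ma)
```

The first value is simply 120.0 (a mean over one row) and the second is 127.5, so be aware that the early values are averages over fewer observations.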
Other Rolling Aggregations
Rolling windows support many aggregations:
# Rolling sum (total of last 3 days)
sales['rolling_sum_3'] = sales['daily_sales'].rolling(window=3).sum()
# Rolling max (highest in last 3 days)
sales['rolling_max_3'] = sales['daily_sales'].rolling(window=3).max()
# Rolling min (lowest in last 3 days)
sales['rolling_min_3'] = sales['daily_sales'].rolling(window=3).min()
# Rolling standard deviation (volatility)
sales['rolling_std_3'] = sales['daily_sales'].rolling(window=3).std()
print("Various Rolling Calculations:")
print(sales[['date', 'daily_sales', 'rolling_sum_3', 'rolling_max_3', 'rolling_min_3']].head(10))

Shift, Diff, and Percentage Change
Compare values across time periods.
Shift - Lag and Lead Values
.shift() moves data up or down by n rows:
# Lag: previous day's sales (shift down by 1)
sales['previous_day'] = sales['daily_sales'].shift(1)
# Lead: next day's sales (shift up by 1)
sales['next_day'] = sales['daily_sales'].shift(-1)
# 2 days ago
sales['two_days_ago'] = sales['daily_sales'].shift(2)
print("Sales with Shifted Values:")
print(sales[['date', 'daily_sales', 'previous_day', 'next_day']].head(10))

Output:
date daily_sales previous_day next_day
0 2024-01-01 120 NaN 135.0
1 2024-01-02 135 120.0 128.0
2 2024-01-03 128 135.0 142.0
3 2024-01-04 142 128.0 155.0
4 2024-01-05 155 142.0 148.0
5 2024-01-06 148 155.0 162.0

Use case: compare today’s sales with yesterday’s sales.
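For instance, a shifted column makes it easy to flag which days beat the previous day. A small sketch with made-up numbers:

```python
import pandas as pd

daily = pd.Series([120, 135, 128, 142, 155])

# Compare each day with the day before it.
prev = daily.shift(1)
beat_yesterday = daily > prev  # first element is False: comparing with NaN
print(beat_yesterday.tolist())
# [False, True, False, True, True]
```

Note the first comparison: any comparison against the NaN produced by .shift() evaluates to False, so the first row never counts as an increase.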
Diff - Calculate Differences
.diff() calculates the difference from the previous row:
# Daily change (difference from previous day)
sales['daily_change'] = sales['daily_sales'].diff()
# 2-day change
sales['change_2days'] = sales['daily_sales'].diff(2)
print("Sales with Differences:")
print(sales[['date', 'daily_sales', 'daily_change', 'change_2days']].head(10))

Output:
date daily_sales daily_change change_2days
0 2024-01-01 120 NaN NaN
1 2024-01-02 135 15.0 NaN
2 2024-01-03 128 -7.0 8.0
3 2024-01-04 142 14.0 7.0
4 2024-01-05 155 13.0 27.0
5 2024-01-06 148 -7.0 6.0

Find Days with Biggest Changes
# Find days with biggest increase
biggest_increase = sales.nlargest(3, 'daily_change')[['date', 'daily_sales', 'daily_change']]
print("Top 3 Days with Biggest Increase:")
print(biggest_increase)

Percentage Change
.pct_change() calculates percentage change from previous row:
# Daily percentage change
sales['pct_change'] = sales['daily_sales'].pct_change() * 100 # Convert to percentage
print("Sales with Percentage Change:")
print(sales[['date', 'daily_sales', 'pct_change']].head(10))

Output:
date daily_sales pct_change
0 2024-01-01 120 NaN
1 2024-01-02 135 12.50
2 2024-01-03 128 -5.19
3 2024-01-04 142 10.94
4 2024-01-05 155 9.15
5 2024-01-06 148 -4.52

Cumulative Functions
Calculate running totals and cumulative statistics.
Cumulative Sum
.cumsum() creates running totals:
# Cumulative sales (running total)
sales['cumulative_sales'] = sales['daily_sales'].cumsum()
print("Sales with Cumulative Total:")
print(sales[['date', 'daily_sales', 'cumulative_sales']].head(10))

Output:
date daily_sales cumulative_sales
0 2024-01-01 120 120
1 2024-01-02 135 255
2 2024-01-03 128 383
3 2024-01-04 142 525
4 2024-01-05 155 680
5 2024-01-06 148 828

Perfect for tracking total sales to date!
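While .cumsum() adds values, its sibling .cumprod() multiplies them, which is handy for compounding growth rates from .pct_change(). A sketch with made-up numbers:

```python
import pandas as pd

sales = pd.Series([100.0, 110.0, 99.0, 108.9])

# Daily growth factors: NaN, 1.10, 0.90, 1.10
growth = 1 + sales.pct_change()

# Multiplying the factors together gives total growth relative to day 1:
# the last value is 1.089, matching 108.9 / 100 directly.
cumulative_growth = growth.cumprod()
print(cumulative_growth)
```

This pattern (cumulative product of 1 + return) is how cumulative returns are typically built from per-period returns.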
Other Cumulative Functions
# Create test data
performance = pd.DataFrame({
'employee': ['James', 'Maria', 'David', 'Anna', 'Michael'],
'score': [85, 92, 78, 95, 88]
})
# Cumulative maximum (best score so far)
performance['best_so_far'] = performance['score'].cummax()
# Cumulative minimum (worst score so far)
performance['worst_so_far'] = performance['score'].cummin()
print("Employee Performance with Cumulative Max/Min:")
print(performance)

Output:
employee score best_so_far worst_so_far
0 James 85 85 85
1 Maria 92 92 85
2 David 78 92 78
3 Anna 95 95 78
4 Michael 88 95 78

Practical Use - Revenue Tracking
Track monthly revenue and cumulative annual revenue:
# Monthly revenue
revenue = pd.DataFrame({
'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
'monthly_revenue': [50000, 55000, 52000, 60000, 58000, 65000]
})
# Cumulative revenue (YTD - Year To Date)
revenue['ytd_revenue'] = revenue['monthly_revenue'].cumsum()
# Progress toward annual goal
annual_goal = 400000
revenue['progress_pct'] = (revenue['ytd_revenue'] / annual_goal * 100).round(1)
print("Revenue Tracking:")
print(revenue)

Output:
month monthly_revenue ytd_revenue progress_pct
0 Jan 50000 50000 12.5
1 Feb 55000 105000 26.2
2 Mar 52000 157000 39.2
3 Apr 60000 217000 54.2
4 May 58000 275000 68.8
5 Jun 65000 340000 85.0

Expanding Windows
Growing windows from start to current row.
Expanding vs Rolling
- Rolling: Fixed window size (e.g., last 3 days)
- Expanding: Growing window from start to current row
# Student test scores
tests = pd.DataFrame({
'test_number': range(1, 11),
'score': [75, 82, 78, 85, 88, 86, 90, 92, 89, 95]
})
# Rolling average (last 3 tests only)
tests['rolling_avg'] = tests['score'].rolling(window=3).mean()
# Expanding average (all tests from start to current)
tests['expanding_avg'] = tests['score'].expanding().mean()
print("Student Test Scores:")
print(tests)

Output:
test_number score rolling_avg expanding_avg
0 1 75 NaN 75.00
1 2 82 NaN 78.50
2 3 78 78.33 78.33
3 4 85 81.67 80.00
4 5 88 83.67 81.60
5 6 86 86.33 82.33
6 7 90 88.00 83.43
7 8 92 89.33 84.50
8 9 89 90.33 85.00
9 10 95 92.00 86.00

Interpretation:
- Rolling average: Recent performance (last 3 tests)
- Expanding average: Overall GPA-like average (all tests so far)
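One way to see the relationship between the two: an expanding window behaves like a rolling window as wide as the entire Series with min_periods=1, so it always reaches back to the first row. A small sketch (with hypothetical scores):

```python
import pandas as pd

scores = pd.Series([75, 82, 78, 85, 88])

expanding_avg = scores.expanding().mean()

# Equivalent: a rolling window covering the whole Series that is allowed
# to start with a single row instead of waiting to fill up.
rolling_equivalent = scores.rolling(window=len(scores), min_periods=1).mean()

print((expanding_avg - rolling_equivalent).abs().max())  # effectively zero
```

This is only a conceptual equivalence; in practice .expanding() is the clearer (and cheaper) way to say "from the start up to here".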
Expanding Min/Max
Track best and worst performances over time:
# Best score so far (expanding max)
tests['best_score_so_far'] = tests['score'].expanding().max()
# Worst score so far (expanding min)
tests['worst_score_so_far'] = tests['score'].expanding().min()
print("Test Scores with Expanding Min/Max:")
print(tests[['test_number', 'score', 'best_score_so_far', 'worst_score_so_far']])

Combining Window Functions
Apply multiple window functions for comprehensive analysis.
Stock Price Analysis Example
# Stock price data
stock = pd.DataFrame({
'date': pd.date_range('2024-01-01', periods=15, freq='D'),
'price': [100, 102, 101, 105, 107, 106, 108, 110, 109, 112, 115, 113, 116, 118, 117]
})
# 5-day moving average
stock['MA_5'] = stock['price'].rolling(window=5).mean()
# Daily change
stock['daily_change'] = stock['price'].diff()
# Daily percentage change
stock['pct_change'] = stock['price'].pct_change() * 100
# Cumulative return from start
stock['cumulative_return'] = ((stock['price'] / stock['price'].iloc[0]) - 1) * 100
# 5-day high
stock['high_5d'] = stock['price'].rolling(window=5).max()
# 5-day low
stock['low_5d'] = stock['price'].rolling(window=5).min()
print("Stock Analysis:")
print(stock[['date', 'price', 'MA_5', 'daily_change', 'pct_change', 'cumulative_return']].tail(10))

This combines multiple techniques for comprehensive stock analysis!
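One more metric you could layer on top (an extra idea, not part of the analysis above): the rolling standard deviation of daily returns is a simple volatility gauge. A sketch using a short price series:

```python
import pandas as pd

price = pd.Series([100, 102, 101, 105, 107, 106, 108, 110, 109, 112])

# Daily returns in percent.
returns = price.pct_change() * 100

# Rolling std of returns: higher values mean the price swung more
# over the last 5 days. The first valid value appears once the window
# holds 5 non-NaN returns.
volatility_5d = returns.rolling(window=5).std()
print(volatility_5d)
```

This combines two tools from this lesson (.pct_change() and .rolling().std()) into one derived indicator.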
Practice Exercises
Apply window functions with these exercises.
Exercise 1: Temperature Trend Analysis
Analyze temperature trends using window functions:
temps = pd.DataFrame({
'day': range(1, 11),
'temperature': [25, 27, 26, 24, 22, 23, 25, 28, 27, 26]
})

- Calculate the 3-day moving average
- Find the daily temperature change
- Identify the day with the biggest temperature drop
- Calculate cumulative average temperature
# Your code here

Hint
Use .diff() to find daily change, then .idxmin() to find the day with the biggest drop (most negative value)!
Exercise 2: Cumulative Sales Target
Track progress toward a quarterly sales target:
weekly = pd.DataFrame({
'week': range(1, 13),
'sales': [7500, 8200, 7800, 9100, 8500, 8900, 9200, 8700, 9500, 9300, 8800, 9400]
})

- Calculate cumulative sales
- Calculate percentage of quarterly target (100,000)
- Find the week when 50% target was reached
- Calculate week-over-week growth rate
# Your code here

Exercise 3: Student Grade Improvement
Analyze student improvement using window functions:
exams = pd.DataFrame({
'exam': range(1, 9),
'score': [68, 72, 70, 75, 78, 80, 82, 85]
})

- Calculate the change from previous exam
- Calculate overall average using expanding window
- Find if the student is improving (compare last 3 vs first 3 exams)
- Calculate the best score achieved so far at each exam
# Your code here

Summary
You have now learned window functions and rolling operations. Let’s review the key concepts.
Key Concepts
Rolling Windows
- Fixed-size sliding window
- Calculate statistics over last n rows
- Perfect for smoothing trends
- First n-1 values are NaN
Shift Operations
- Move data up (lead) or down (lag)
- Compare with previous/next periods
- Essential for period-over-period analysis
- Creates NaN at boundaries
Diff and Percentage Change
- .diff(): Absolute change from previous row
- .pct_change(): Percentage change from previous row
- Useful for growth rates and trends
- First value is always NaN
Cumulative Functions
- Running totals and cumulative statistics
- No NaN values (starts from first row)
- Perfect for YTD tracking
- Multiple variants: sum, max, min, product
Expanding Windows
- Growing window from start
- Overall statistics (all-time averages)
- Different from rolling (which is fixed size)
- No NaN values
Syntax Reference
# Rolling windows
df['MA_3'] = df['value'].rolling(window=3).mean()
df['rolling_sum'] = df['value'].rolling(window=3).sum()
df['rolling_max'] = df['value'].rolling(window=3).max()
df['rolling_std'] = df['value'].rolling(window=3).std()
# Shift - lag and lead
df['previous'] = df['value'].shift(1) # Lag
df['next'] = df['value'].shift(-1) # Lead
df['lag_2'] = df['value'].shift(2) # 2 rows ago
# Diff - differences
df['change'] = df['value'].diff()
df['change_2'] = df['value'].diff(2)
# Percentage change
df['pct_change'] = df['value'].pct_change() * 100
# Cumulative functions
df['cumsum'] = df['value'].cumsum()
df['cummax'] = df['value'].cummax()
df['cummin'] = df['value'].cummin()
df['cumprod'] = df['value'].cumprod()
# Expanding windows
df['expanding_avg'] = df['value'].expanding().mean()
df['expanding_max'] = df['value'].expanding().max()

Common Patterns
Pattern 1: Moving Average for Smoothing
df['MA_7'] = df['sales'].rolling(window=7).mean()

Pattern 2: Compare with Previous Period
df['prev_month'] = df['revenue'].shift(1)
df['month_over_month'] = df['revenue'] - df['prev_month']
df['growth_rate'] = df['revenue'].pct_change() * 100

Pattern 3: Cumulative Progress
df['ytd_sales'] = df['monthly_sales'].cumsum()
df['progress_pct'] = (df['ytd_sales'] / annual_target) * 100

Pattern 4: Stock Price Analysis
df['MA_50'] = df['price'].rolling(window=50).mean()
df['MA_200'] = df['price'].rolling(window=200).mean()
df['daily_return'] = df['price'].pct_change() * 100
df['cumulative_return'] = ((df['price'] / df['price'].iloc[0]) - 1) * 100

Rolling vs Expanding
| Feature | Rolling | Expanding |
|---|---|---|
| Window Size | Fixed (last n rows) | Growing (start to current) |
| Use Case | Recent trends | Overall trends |
| Example | Last 7 days average | Overall GPA |
| First Values | NaN until window filled | No NaN |
Important Reminders
- Rolling creates NaN: First n-1 values are NaN (not enough data)
- Shift creates NaN: At start (lag) or end (lead) of data
- Diff creates NaN: No rows are dropped, but the first value is always NaN (no previous row to subtract)
- Cumulative starts immediately: No NaN values
- Window size matters: Larger windows smooth more but lose detail
- Order matters: Sort your data by time before applying window functions
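The "order matters" reminder can be sketched like this, using a small out-of-order frame:

```python
import pandas as pd

# Rows arrive out of chronological order, common after merges or appends.
df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-03', '2024-01-01', '2024-01-02']),
    'sales': [128, 120, 135],
})

# Sort first so that shift()/diff()/rolling() follow real time order.
df = df.sort_values('date').reset_index(drop=True)
df['prev_day'] = df['sales'].shift(1)
print(df)
```

Without the sort, shift(1) would pair each row with whatever happened to sit above it in the frame, producing meaningless "previous day" values.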
When to Use Each
- .rolling(): Smoothing noisy data, detecting recent trends
- .shift(): Comparing periods (month-over-month, year-over-year)
- .diff(): Finding changes or differences
- .pct_change(): Growth rates, returns
- .cumsum(): Running totals, YTD metrics
- .expanding(): Overall averages, all-time highs/lows
Next Steps
You can now apply window functions for time series analysis and trend detection. In the next lesson, you will complete a Final Project—applying all 20 lessons to a comprehensive real-world data analysis.
Continue to Lesson 21 - Final Project
Apply all Pandas skills to a comprehensive real-world project
Back to Lesson 19 - MultiIndex and Hierarchical Data
Review working with multi-level indexes
Master Time Series Analysis
Window functions are essential tools for any time-based analysis—from financial data to sales trends to scientific measurements.
Use window functions to uncover trends and patterns in your time series data!