Lesson 20 - Window Functions and Rolling Operations
Analyzing Data Over Time Windows
In the previous lesson, you learned to work with hierarchical indexes. Now you will learn window functions: calculating statistics across sliding windows of rows for trend analysis and time series operations.
By the end of this lesson, you will be able to:
- Use .rolling() for moving averages and windows
- Apply .shift() for lag and lead values
- Calculate differences with .diff()
- Compute percentage changes with .pct_change()
- Use cumulative functions (.cumsum(), .cummax(), etc.)
- Apply .expanding() windows
Window functions are essential for financial analysis, trend detection, and any time-based data analysis. Let’s master these powerful tools.
Why Window Functions?
Window functions perform calculations across a sliding window of rows:
- Moving averages for smoothing trends
- Lag/lead values for comparing periods
- Cumulative sums for running totals
- Percentage changes for growth rates
Common applications:
- Stock price analysis (moving averages, daily returns)
- Sales trends (comparing to previous period)
- Running totals (cumulative revenue)
- Weather data (rolling temperature averages)
Sample Data
Let’s create daily sales data for demonstration:
import pandas as pd
import numpy as np
# Create daily sales data
dates = pd.date_range('2024-01-01', periods=20, freq='D')
sales = pd.DataFrame({
'date': dates,
'daily_sales': [120, 135, 128, 142, 155, 148, 162, 158, 165, 172,
168, 180, 175, 188, 192, 185, 195, 202, 198, 210]
})
print("Daily Sales Data:")
print(sales.head(10))

This dataset has 20 days of sales, perfect for demonstrating window operations.
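A variation worth knowing before we start (not used in the rest of this lesson): when the dates are the index, .rolling() also accepts a time offset string instead of a row count. A minimal sketch with made-up numbers:

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=6, freq='D')
s = pd.Series([120, 135, 128, 142, 155, 148], index=dates)

# With a DatetimeIndex, the window can be a time span rather than a row count.
# '3D' means "all rows within the last 3 days", so no leading NaN appears:
# early windows simply contain fewer rows.
ma_3d = s.rolling('3D').mean()
print(ma_3d)
```

This matters when observations are irregularly spaced: a row-count window would mix very different time spans, while a time-offset window always covers the same span.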
Rolling Windows
Calculate statistics across a sliding window of rows.
Calculate Moving Average
.rolling(window=n) creates a sliding window of n rows:
# 3-day moving average
sales['MA_3'] = sales['daily_sales'].rolling(window=3).mean()
# 7-day moving average
sales['MA_7'] = sales['daily_sales'].rolling(window=7).mean()
print("Sales with Moving Averages:")
print(sales.head(10))

Output:
date daily_sales MA_3 MA_7
0 2024-01-01 120 NaN NaN
1 2024-01-02 135 NaN NaN
2 2024-01-03 128 127.67 NaN
3 2024-01-04 142 135.00 NaN
4 2024-01-05 155 141.67 NaN
5 2024-01-06 148 148.33 NaN
6 2024-01-07 162 155.00 141.43
7 2024-01-08 158 156.00 146.86
8 2024-01-09 165 161.67 151.14
9 2024-01-10 172 165.00 155.71

Notice: the first 2 rows of MA_3 are NaN (not enough data), and the first 6 rows of MA_7 are NaN.
Moving averages smooth out short-term fluctuations and highlight longer-term trends.
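If those leading NaN values get in the way, .rolling() accepts a min_periods argument that lets early windows use however many rows are available. A quick sketch:

```python
import pandas as pd

s = pd.Series([120, 135, 128, 142, 155])

# With min_periods=1, the window shrinks at the start instead of producing NaN:
# row 0 averages 1 value, row 1 averages 2 values, then a full 3-row window.
ma = s.rolling(window=3, min_periods=1).mean()
print(ma)
```

The first value is simply 120.0 (a mean over one row) and the second is 127.5, so be aware that the early values are averages over fewer observations.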
Other Rolling Aggregations
Rolling windows support many aggregations:
# Rolling sum (total of last 3 days)
sales['rolling_sum_3'] = sales['daily_sales'].rolling(window=3).sum()
# Rolling max (highest in last 3 days)
sales['rolling_max_3'] = sales['daily_sales'].rolling(window=3).max()
# Rolling min (lowest in last 3 days)
sales['rolling_min_3'] = sales['daily_sales'].rolling(window=3).min()
# Rolling standard deviation (volatility)
sales['rolling_std_3'] = sales['daily_sales'].rolling(window=3).std()
print("Various Rolling Calculations:")
print(sales[['date', 'daily_sales', 'rolling_sum_3', 'rolling_max_3', 'rolling_min_3']].head(10))

Shift, Diff, and Percentage Change
Compare values across time periods.
Shift - Lag and Lead Values
.shift() moves data up or down by n rows:
# Lag: previous day's sales (shift down by 1)
sales['previous_day'] = sales['daily_sales'].shift(1)
# Lead: next day's sales (shift up by 1)
sales['next_day'] = sales['daily_sales'].shift(-1)
# 2 days ago
sales['two_days_ago'] = sales['daily_sales'].shift(2)
print("Sales with Shifted Values:")
print(sales[['date', 'daily_sales', 'previous_day', 'next_day']].head(10))

Output:
date daily_sales previous_day next_day
0 2024-01-01 120 NaN 135.0
1 2024-01-02 135 120.0 128.0
2 2024-01-03 128 135.0 142.0
3 2024-01-04 142 128.0 155.0
4 2024-01-05 155 142.0 148.0
5 2024-01-06 148 155.0 162.0

Use case: compare today’s sales with yesterday’s sales.
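For instance, a shifted column makes it easy to flag which days beat the previous day. A small sketch with made-up numbers:

```python
import pandas as pd

daily = pd.Series([120, 135, 128, 142, 155])

# Compare each day with the day before it.
prev = daily.shift(1)
beat_yesterday = daily > prev  # first element is False: comparing with NaN
print(beat_yesterday.tolist())
# [False, True, False, True, True]
```

Note the first comparison: any comparison against the NaN produced by .shift() evaluates to False, so the first row never counts as an increase.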
Diff - Calculate Differences
.diff() calculates the difference from the previous row:
# Daily change (difference from previous day)
sales['daily_change'] = sales['daily_sales'].diff()
# 2-day change
sales['change_2days'] = sales['daily_sales'].diff(2)
print("Sales with Differences:")
print(sales[['date', 'daily_sales', 'daily_change', 'change_2days']].head(10))

Output:
date daily_sales daily_change change_2days
0 2024-01-01 120 NaN NaN
1 2024-01-02 135 15.0 NaN
2 2024-01-03 128 -7.0 8.0
3 2024-01-04 142 14.0 7.0
4 2024-01-05 155 13.0 27.0
5 2024-01-06 148 -7.0 6.0

Find Days with Biggest Changes
# Find days with biggest increase
biggest_increase = sales.nlargest(3, 'daily_change')[['date', 'daily_sales', 'daily_change']]
print("Top 3 Days with Biggest Increase:")
print(biggest_increase)

Percentage Change
.pct_change() calculates percentage change from previous row:
# Daily percentage change
sales['pct_change'] = sales['daily_sales'].pct_change() * 100 # Convert to percentage
print("Sales with Percentage Change:")
print(sales[['date', 'daily_sales', 'pct_change']].head(10))

Output:
date daily_sales pct_change
0 2024-01-01 120 NaN
1 2024-01-02 135 12.50
2 2024-01-03 128 -5.19
3 2024-01-04 142 10.94
4 2024-01-05 155 9.15
5 2024-01-06 148 -4.52

Cumulative Functions
Calculate running totals and cumulative statistics.
Cumulative Sum
.cumsum() creates running totals:
# Cumulative sales (running total)
sales['cumulative_sales'] = sales['daily_sales'].cumsum()
print("Sales with Cumulative Total:")
print(sales[['date', 'daily_sales', 'cumulative_sales']].head(10))

Output:
date daily_sales cumulative_sales
0 2024-01-01 120 120
1 2024-01-02 135 255
2 2024-01-03 128 383
3 2024-01-04 142 525
4 2024-01-05 155 680
5 2024-01-06 148 828

Perfect for tracking total sales to date!
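While .cumsum() adds values, its sibling .cumprod() multiplies them, which is handy for compounding growth rates from .pct_change(). A sketch with made-up numbers:

```python
import pandas as pd

sales = pd.Series([100.0, 110.0, 99.0, 108.9])

# Daily growth factors: NaN, 1.10, 0.90, 1.10
growth = 1 + sales.pct_change()

# Multiplying the factors together gives total growth relative to day 1:
# the last value is 1.089, matching 108.9 / 100 directly.
cumulative_growth = growth.cumprod()
print(cumulative_growth)
```

This pattern (cumulative product of 1 + return) is how cumulative returns are typically built from per-period returns.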
Other Cumulative Functions
# Create test data
performance = pd.DataFrame({
'employee': ['James', 'Maria', 'David', 'Anna', 'Michael'],
'score': [85, 92, 78, 95, 88]
})
# Cumulative maximum (best score so far)
performance['best_so_far'] = performance['score'].cummax()
# Cumulative minimum (worst score so far)
performance['worst_so_far'] = performance['score'].cummin()
print("Employee Performance with Cumulative Max/Min:")
print(performance)

Output:
employee score best_so_far worst_so_far
0 James 85 85 85
1 Maria 92 92 85
2 David 78 92 78
3 Anna 95 95 78
4 Michael 88 95 78

Practical Use - Revenue Tracking
Track monthly revenue and cumulative annual revenue:
# Monthly revenue
revenue = pd.DataFrame({
'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
'monthly_revenue': [50000, 55000, 52000, 60000, 58000, 65000]
})
# Cumulative revenue (YTD - Year To Date)
revenue['ytd_revenue'] = revenue['monthly_revenue'].cumsum()
# Progress toward annual goal
annual_goal = 400000
revenue['progress_pct'] = (revenue['ytd_revenue'] / annual_goal * 100).round(1)
print("Revenue Tracking:")
print(revenue)

Output:
month monthly_revenue ytd_revenue progress_pct
0 Jan 50000 50000 12.5
1 Feb 55000 105000 26.2
2 Mar 52000 157000 39.2
3 Apr 60000 217000 54.2
4 May 58000 275000 68.8
5 Jun 65000 340000 85.0

Expanding Windows
Growing windows from start to current row.
Expanding vs Rolling
- Rolling: Fixed window size (e.g., last 3 days)
- Expanding: Growing window from start to current row
# Student test scores
tests = pd.DataFrame({
'test_number': range(1, 11),
'score': [75, 82, 78, 85, 88, 86, 90, 92, 89, 95]
})
# Rolling average (last 3 tests only)
tests['rolling_avg'] = tests['score'].rolling(window=3).mean()
# Expanding average (all tests from start to current)
tests['expanding_avg'] = tests['score'].expanding().mean()
print("Student Test Scores:")
print(tests)

Output:
test_number score rolling_avg expanding_avg
0 1 75 NaN 75.00
1 2 82 NaN 78.50
2 3 78 78.33 78.33
3 4 85 81.67 80.00
4 5 88 83.67 81.60
5 6 86 86.33 82.33
6 7 90 88.00 83.43
7 8 92 89.33 84.50
8 9 89 90.33 85.00
9 10 95 92.00 86.00

Interpretation:
- Rolling average: Recent performance (last 3 tests)
- Expanding average: Overall GPA-like average (all tests so far)
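One way to see the relationship between the two: an expanding window behaves like a rolling window as wide as the entire Series with min_periods=1, so it always reaches back to the first row. A small sketch (with hypothetical scores):

```python
import pandas as pd

scores = pd.Series([75, 82, 78, 85, 88])

expanding_avg = scores.expanding().mean()

# Equivalent: a rolling window covering the whole Series that is allowed
# to start with a single row instead of waiting to fill up.
rolling_equivalent = scores.rolling(window=len(scores), min_periods=1).mean()

print((expanding_avg - rolling_equivalent).abs().max())  # effectively zero
```

This is only a conceptual equivalence; in practice .expanding() is the clearer (and cheaper) way to say "from the start up to here".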
Expanding Min/Max
Track best and worst performances over time:
# Best score so far (expanding max)
tests['best_score_so_far'] = tests['score'].expanding().max()
# Worst score so far (expanding min)
tests['worst_score_so_far'] = tests['score'].expanding().min()
print("Test Scores with Expanding Min/Max:")
print(tests[['test_number', 'score', 'best_score_so_far', 'worst_score_so_far']])

Combining Window Functions
Apply multiple window functions for comprehensive analysis.
Stock Price Analysis Example
# Stock price data
stock = pd.DataFrame({
'date': pd.date_range('2024-01-01', periods=15, freq='D'),
'price': [100, 102, 101, 105, 107, 106, 108, 110, 109, 112, 115, 113, 116, 118, 117]
})
# 5-day moving average
stock['MA_5'] = stock['price'].rolling(window=5).mean()
# Daily change
stock['daily_change'] = stock['price'].diff()
# Daily percentage change
stock['pct_change'] = stock['price'].pct_change() * 100
# Cumulative return from start
stock['cumulative_return'] = ((stock['price'] / stock['price'].iloc[0]) - 1) * 100
# 5-day high
stock['high_5d'] = stock['price'].rolling(window=5).max()
# 5-day low
stock['low_5d'] = stock['price'].rolling(window=5).min()
print("Stock Analysis:")
print(stock[['date', 'price', 'MA_5', 'daily_change', 'pct_change', 'cumulative_return']].tail(10))

This combines multiple techniques for comprehensive stock analysis!
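One more metric you could layer on top (an extra idea, not part of the analysis above): the rolling standard deviation of daily returns is a simple volatility gauge. A sketch using a short price series:

```python
import pandas as pd

price = pd.Series([100, 102, 101, 105, 107, 106, 108, 110, 109, 112])

# Daily returns in percent.
returns = price.pct_change() * 100

# Rolling std of returns: higher values mean the price swung more
# over the last 5 days. The first valid value appears once the window
# holds 5 non-NaN returns.
volatility_5d = returns.rolling(window=5).std()
print(volatility_5d)
```

This combines two tools from this lesson (.pct_change() and .rolling().std()) into one derived indicator.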
Practice Exercises
Apply window functions with these exercises.
Exercise 1: Temperature Trend Analysis
Analyze temperature trends using window functions:
temps = pd.DataFrame({
'day': range(1, 11),
'temperature': [25, 27, 26, 24, 22, 23, 25, 28, 27, 26]
})

- Calculate the 3-day moving average
- Find the daily temperature change
- Identify the day with the biggest temperature drop
- Calculate cumulative average temperature
# Your code here

Hint
Use .diff() to find daily change, then .idxmin() to find the day with the biggest drop (most negative value)!
Exercise 2: Cumulative Sales Target
Track progress toward a quarterly sales target:
weekly = pd.DataFrame({
'week': range(1, 13),
'sales': [7500, 8200, 7800, 9100, 8500, 8900, 9200, 8700, 9500, 9300, 8800, 9400]
})

- Calculate cumulative sales
- Calculate percentage of quarterly target (100,000)
- Find the week when 50% target was reached
- Calculate week-over-week growth rate
# Your code here

Exercise 3: Student Grade Improvement
Analyze student improvement using window functions:
exams = pd.DataFrame({
'exam': range(1, 9),
'score': [68, 72, 70, 75, 78, 80, 82, 85]
})

- Calculate the change from previous exam
- Calculate overall average using expanding window
- Find if the student is improving (compare last 3 vs first 3 exams)
- Calculate the best score achieved so far at each exam
# Your code here

Summary
You have now learned window functions and rolling operations. Let’s review the key concepts.
Key Concepts
Rolling Windows
- Fixed-size sliding window
- Calculate statistics over last n rows
- Perfect for smoothing trends
- First n-1 values are NaN
Shift Operations
- Move data up (lead) or down (lag)
- Compare with previous/next periods
- Essential for period-over-period analysis
- Creates NaN at boundaries
Diff and Percentage Change
- .diff(): Absolute change from previous row
- .pct_change(): Percentage change from previous row
- Useful for growth rates and trends
- First value is always NaN
Cumulative Functions
- Running totals and cumulative statistics
- No NaN values (starts from first row)
- Perfect for YTD tracking
- Multiple variants: sum, max, min, product
Expanding Windows
- Growing window from start
- Overall statistics (all-time averages)
- Different from rolling (which is fixed size)
- No NaN values
Syntax Reference
# Rolling windows
df['MA_3'] = df['value'].rolling(window=3).mean()
df['rolling_sum'] = df['value'].rolling(window=3).sum()
df['rolling_max'] = df['value'].rolling(window=3).max()
df['rolling_std'] = df['value'].rolling(window=3).std()
# Shift - lag and lead
df['previous'] = df['value'].shift(1) # Lag
df['next'] = df['value'].shift(-1) # Lead
df['lag_2'] = df['value'].shift(2) # 2 rows ago
# Diff - differences
df['change'] = df['value'].diff()
df['change_2'] = df['value'].diff(2)
# Percentage change
df['pct_change'] = df['value'].pct_change() * 100
# Cumulative functions
df['cumsum'] = df['value'].cumsum()
df['cummax'] = df['value'].cummax()
df['cummin'] = df['value'].cummin()
df['cumprod'] = df['value'].cumprod()
# Expanding windows
df['expanding_avg'] = df['value'].expanding().mean()
df['expanding_max'] = df['value'].expanding().max()

Common Patterns
Pattern 1: Moving Average for Smoothing
df['MA_7'] = df['sales'].rolling(window=7).mean()

Pattern 2: Compare with Previous Period
df['prev_month'] = df['revenue'].shift(1)
df['month_over_month'] = df['revenue'] - df['prev_month']
df['growth_rate'] = df['revenue'].pct_change() * 100

Pattern 3: Cumulative Progress
df['ytd_sales'] = df['monthly_sales'].cumsum()
df['progress_pct'] = (df['ytd_sales'] / annual_target) * 100

Pattern 4: Stock Price Analysis
df['MA_50'] = df['price'].rolling(window=50).mean()
df['MA_200'] = df['price'].rolling(window=200).mean()
df['daily_return'] = df['price'].pct_change() * 100
df['cumulative_return'] = ((df['price'] / df['price'].iloc[0]) - 1) * 100

Rolling vs Expanding
| Feature | Rolling | Expanding |
|---|---|---|
| Window Size | Fixed (last n rows) | Growing (start to current) |
| Use Case | Recent trends | Overall trends |
| Example | Last 7 days average | Overall GPA |
| First Values | NaN until window filled | No NaN |
Important Reminders
- Rolling creates NaN: First n-1 values are NaN (not enough data)
- Shift creates NaN: At start (lag) or end (lead) of data
- Diff creates NaN: No rows are dropped, but the first value is always NaN (no previous row to subtract)
- Cumulative starts immediately: No NaN values
- Window size matters: Larger windows smooth more but lose detail
- Order matters: Sort your data by time before applying window functions
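The "order matters" reminder can be sketched like this, using a small out-of-order frame:

```python
import pandas as pd

# Rows arrive out of chronological order, common after merges or appends.
df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-03', '2024-01-01', '2024-01-02']),
    'sales': [128, 120, 135],
})

# Sort first so that shift()/diff()/rolling() follow real time order.
df = df.sort_values('date').reset_index(drop=True)
df['prev_day'] = df['sales'].shift(1)
print(df)
```

Without the sort, shift(1) would pair each row with whatever happened to sit above it in the frame, producing meaningless "previous day" values.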
When to Use Each
- .rolling(): Smoothing noisy data, detecting recent trends
- .shift(): Comparing periods (month-over-month, year-over-year)
- .diff(): Finding changes or differences
- .pct_change(): Growth rates, returns
- .cumsum(): Running totals, YTD metrics
- .expanding(): Overall averages, all-time highs/lows
Next Steps
You can now apply window functions for time series analysis and trend detection. In the next lesson, you will complete a Final Project—applying all 20 lessons to a comprehensive real-world data analysis.
Continue to Lesson 21 - Final Project
Apply all Pandas skills to a comprehensive real-world project
Back to Lesson 19 - MultiIndex and Hierarchical Data
Review working with multi-level indexes
Master Time Series Analysis
Window functions are essential tools for any time-based analysis—from financial data to sales trends to scientific measurements.
Use window functions to uncover trends and patterns in your time series data!