Lesson 16 - Pivot Tables

Reshaping Data for Analysis

You can group data and calculate statistics. Now you will learn Pivot Tables—the powerful spreadsheet-like tool for reshaping and summarizing data in a cross-tabulation format.

By the end of this lesson, you will be able to:

Create pivot tables with pd.pivot_table()
Understand index, columns, and values parameters
Apply different aggregation functions
Add margins for row/column totals
Distinguish between .pivot() and .pivot_table()
Create multi-level pivot tables

Pivot tables are essential for business intelligence, data exploration, and creating summary reports. Let’s master this crucial skill.

Understanding Pivot Tables

A pivot table summarizes data by grouping and aggregating, then reshapes it from long format to a cross-tabulation (wide format).

Key Components:

Index: Row labels (what goes down the left side)
Columns: Column headers (what goes across the top)
Values: Data to aggregate (numbers to summarize)
Aggfunc: How to aggregate (sum, mean, count, etc.)

Pivot tables answer questions like “What is total revenue by region and product?” or “How many sales per month by category?”

Sample Data

Let’s create sales data for demonstration:

import pandas as pd
import numpy as np

# Sales data
sales = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=12, freq='M'),
    'region': ['North', 'South', 'North', 'South', 'East', 'West',
               'North', 'South', 'East', 'West', 'North', 'South'],
    'product': ['Laptop', 'Laptop', 'Mouse', 'Mouse', 'Laptop', 'Mouse',
                'Keyboard', 'Keyboard', 'Laptop', 'Laptop', 'Mouse', 'Keyboard'],
    'quantity': [10, 15, 50, 30, 12, 40, 25, 20, 18, 14, 45, 22],
    'revenue': [12000, 18000, 1500, 900, 14400, 1200, 2000, 1600, 21600, 16800, 1350, 1760]
})

print("Sales Data:")
print(sales)
print(f"\nShape: {sales.shape}")

This dataset has 12 transactions across multiple regions and products—perfect for pivot table demonstrations.

Basic Pivot Tables

Create a basic pivot table showing total revenue by region and product.

Creating Your First Pivot Table

# Basic pivot: Revenue by region and product
pivot_basic = pd.pivot_table(
    sales,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum'
)

print("Revenue by Region and Product:")
print(pivot_basic)

Output:

product     Keyboard   Laptop    Mouse
region
East             NaN  36000.0      NaN
North         2000.0  12000.0   2850.0
South         3360.0  18000.0    900.0
West             NaN  16800.0   1200.0

Notice: NaN means no sales for that combination (e.g., East region sold no Keyboards).

Filling Missing Values

# Fill NaN with 0 for clearer presentation
pivot_filled = pd.pivot_table(
    sales,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum',
    fill_value=0
)

print("Revenue with zeros:")
print(pivot_filled)

Output:

product     Keyboard   Laptop   Mouse
region
East               0  36000.0       0
North         2000.0  12000.0  2850.0
South         3360.0  18000.0   900.0
West               0  16800.0  1200.0

Much cleaner! Zero clearly indicates “no sales” rather than missing data.

Different Aggregation Functions

# Average revenue instead of total
pivot_avg = pd.pivot_table(
    sales,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='mean',
    fill_value=0
)

print("Average Revenue by Region and Product:")
print(pivot_avg.round(2))

Common aggregation functions: 'sum', 'mean', 'count', 'min', 'max', 'std', 'median'

Multiple Aggregations

Apply multiple aggregation functions at once to see different perspectives.

Multiple Functions on Single Column

# Multiple aggregations
pivot_multi = pd.pivot_table(
    sales,
    values='revenue',
    index='region',
    columns='product',
    aggfunc=['sum', 'mean', 'count'],
    fill_value=0
)

print("Multiple Aggregations:")
print(pivot_multi)

This creates a multi-level column structure showing sum, mean, and count for each region-product combination.

Aggregating Multiple Columns

# Aggregate multiple values
pivot_multi_vals = pd.pivot_table(
    sales,
    values=['revenue', 'quantity'],
    index='region',
    columns='product',
    aggfunc='sum',
    fill_value=0
)

print("Revenue and Quantity:")
print(pivot_multi_vals)

Different Functions for Different Columns

# Different aggfunc for different columns
pivot_diff_agg = pd.pivot_table(
    sales,
    values=['revenue', 'quantity'],
    index='region',
    aggfunc={'revenue': 'sum', 'quantity': 'mean'},
    fill_value=0
)

print("Total Revenue & Average Quantity by Region:")
print(pivot_diff_agg.round(2))

This calculates total revenue and average quantity per region—useful when different metrics need different aggregations.

Adding Margins and Totals

The margins parameter adds row and column totals (subtotals and grand total).

Adding Row and Column Totals

# Add margins (totals)
pivot_margins = pd.pivot_table(
    sales,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum',
    fill_value=0,
    margins=True,
    margins_name='Total'
)

print("Pivot Table with Totals:")
print(pivot_margins)

Output:

product     Keyboard   Laptop   Mouse    Total
region
East               0  36000.0       0  36000.0
North         2000.0  12000.0  2850.0  16850.0
South         3360.0  18000.0   900.0  22260.0
West               0  16800.0  1200.0  18000.0
Total         5360.0  82800.0  4950.0  93110.0

The ‘Total’ row and column show sums across all categories—extremely useful for reports!

Margins with Multiple Aggregations

# Margins with multiple aggregations
pivot_margins_multi = pd.pivot_table(
    sales,
    values=['revenue', 'quantity'],
    index='region',
    aggfunc='sum',
    fill_value=0,
    margins=True
)

print("Revenue & Quantity with Totals:")
print(pivot_margins_multi)

Understanding .pivot() vs .pivot_table()

Both reshape data, but they have important differences.

When to Use .pivot()

.pivot() works only when there are NO duplicate index-column combinations. It does not aggregate.

# Simple data with unique combinations
simple_data = pd.DataFrame({
    'date': ['2023-01', '2023-01', '2023-02', '2023-02'],
    'product': ['Laptop', 'Mouse', 'Laptop', 'Mouse'],
    'revenue': [12000, 1500, 15000, 1800]
})

# Using .pivot() - no aggregation needed
pivoted = simple_data.pivot(
    index='date',
    columns='product',
    values='revenue'
)

print("Using .pivot():")
print(pivoted)

Output:

product   Laptop  Mouse
date
2023-01    12000   1500
2023-02    15000   1800

Works perfectly when each date-product combination appears only once!

When .pivot() Fails

# Data with duplicates
duplicate_data = pd.DataFrame({
    'date': ['2023-01', '2023-01', '2023-01'],  # Duplicate!
    'product': ['Laptop', 'Laptop', 'Mouse'],
    'revenue': [12000, 15000, 1500]
})

# Try .pivot() - will error!
try:
    duplicate_data.pivot(index='date', columns='product', values='revenue')
except ValueError as e:
    print(f".pivot() error: {e}")

# Use .pivot_table() instead - aggregates automatically
pivoted_dups = pd.pivot_table(
    duplicate_data,
    index='date',
    columns='product',
    values='revenue',
    aggfunc='sum'
)

print("\n.pivot_table() handles duplicates:")
print(pivoted_dups)

.pivot_table() automatically sums the two Laptop sales (12000 + 15000 = 27000).

Key Differences

Feature	`.pivot()`	`.pivot_table()`
Handles duplicates	No (raises error)	Yes (aggregates)
Aggregation required	No	Yes
Margins/totals	No	Yes
Multiple aggfuncs	No	Yes
Use case	Simple reshape	Complex analysis

Rule of thumb: Use .pivot_table() when in doubt—it’s more flexible and safer!

Practice Exercises

Apply pivot table operations with these exercises.

Exercise 1: Basic Pivot Table

Given employee data:

employees = pd.DataFrame({
    'name': ['James', 'Maria', 'David', 'Anna', 'Michael', 'Sophie'],
    'department': ['IT', 'HR', 'IT', 'Sales', 'Sales', 'HR'],
    'city': ['New York', 'London', 'New York', 'London', 'New York', 'London'],
    'salary': [50000, 45000, 55000, 48000, 52000, 47000]
})

Create a pivot table showing average salary by department and city
Add margins to see overall averages
Count employees in each department-city combination

# Your code here

Exercise 2: Multiple Aggregations

Given product sales data:

product_sales = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'] * 2,
    'product': ['A', 'B', 'A', 'B', 'A', 'B'] * 2,
    'region': ['North'] * 6 + ['South'] * 6,
    'sales': [100, 150, 120, 140, 110, 160, 90, 130, 95, 125, 88, 145],
    'profit': [20, 30, 24, 28, 22, 32, 18, 26, 19, 25, 17.6, 29]
})

Create a pivot showing total sales by product and region
Show both sum and mean for sales and profit
Calculate profit margin (profit/sales) for each combination

# Your code here

Hint

For profit margin, create separate pivot tables for sales and profit, then divide them!

Exercise 3: Time Series Pivot

Given monthly data:

monthly_data = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=24, freq='M'),
    'category': ['Electronics', 'Clothing'] * 12,
    'revenue': np.random.randint(5000, 15000, 24)
})
monthly_data['month'] = monthly_data['date'].dt.month
monthly_data['quarter'] = monthly_data['date'].dt.quarter

Create a pivot showing revenue by category and quarter
Show revenue by category and month
Add margins to see quarterly and category totals

# Your code here

Summary

You now master pivot tables in pandas. Let’s review the key concepts.

Key Concepts

Basic Pivot Table

Reshapes data from long to wide format
Groups by index and columns
Aggregates values with specified function
Creates cross-tabulation layout

Components

values: Column to aggregate
index: Row labels
columns: Column headers
aggfunc: Aggregation function
fill_value: Replace NaN with specific value
margins: Add row/column totals

Multiple Aggregations

Pass list to aggfunc for multiple statistics
Specify different functions per column with dictionary
Creates multi-level column structure

Syntax Reference

# Basic pivot table
pd.pivot_table(df, values='col', index='row_col', columns='col_col', aggfunc='sum')

# With fill_value
pd.pivot_table(df, values='col', index='row', columns='col',
               aggfunc='sum', fill_value=0)

# Multiple aggregations
pd.pivot_table(df, values='col', index='row', columns='col',
               aggfunc=['sum', 'mean', 'count'])

# Multiple values
pd.pivot_table(df, values=['col1', 'col2'], index='row',
               columns='col', aggfunc='sum')

# Different aggfunc per column
pd.pivot_table(df, values=['col1', 'col2'], index='row',
               aggfunc={'col1': 'sum', 'col2': 'mean'})

# With margins (totals)
pd.pivot_table(df, values='col', index='row', columns='col',
               aggfunc='sum', margins=True, margins_name='Total')

Common Aggregation Functions

Function	Description	Use Case
`'sum'`	Total of all values	Total revenue, total sales
`'mean'`	Average value	Average price, average score
`'count'`	Number of non-null values	Count transactions
`'min'` / `'max'`	Minimum/maximum value	Lowest/highest price
`'median'`	Middle value	Median income
`'std'`	Standard deviation	Revenue variability
Custom function	Any function	Domain-specific calculations

.pivot() vs .pivot_table()

Use .pivot() when:

Data has unique index-column combinations
No aggregation needed
Simple reshape operation
Slightly faster for simple cases

Use .pivot_table() when:

Data may have duplicates
Need aggregation (sum, mean, etc.)
Want margins/totals
More flexible and safer (recommended default)

Important Reminders

Fill NaN: Use fill_value=0 for clearer presentation
Margins: Set margins=True to add totals
Aggregation: Choose appropriate function for your data type
Duplicates: .pivot_table() handles them, .pivot() doesn’t
Multi-level: Multiple aggregations create hierarchical columns
Performance: For very large datasets, consider GroupBy instead

Next Steps

You can now create powerful pivot tables for data summarization. In the next lesson, you will learn Concatenating DataFrames—combining multiple DataFrames by stacking rows or columns.

Continue to Lesson 17 - Concatenating DataFrames

Learn to combine multiple DataFrames by stacking

Back to Lesson 15 - GroupBy and Aggregation

Review splitting data and calculating category-based statistics

Master Data Summarization

Pivot tables are the spreadsheet analyst’s favorite tool—now you have them in pandas with even more power and flexibility.

Use pivot tables to create insightful summary reports and cross-tabulations!

Lesson 15 - GroupBy and Aggregation

Lesson 17 - Concatenating DataFrames

Courses

DATATWEETS

Title here

Lesson 16 - Pivot Tables

Reshaping Data for Analysis

Understanding Pivot Tables

Sample Data

Basic Pivot Tables

Creating Your First Pivot Table

Filling Missing Values

Different Aggregation Functions

Multiple Aggregations

Multiple Functions on Single Column

Aggregating Multiple Columns

Different Functions for Different Columns

Adding Margins and Totals

Adding Row and Column Totals

Margins with Multiple Aggregations

Understanding .pivot() vs .pivot_table()

When to Use .pivot()

When .pivot() Fails

Key Differences

Practice Exercises

Exercise 1: Basic Pivot Table

Exercise 2: Multiple Aggregations

Exercise 3: Time Series Pivot

Summary

Key Concepts

Syntax Reference

Common Aggregation Functions

.pivot() vs .pivot_table()

Important Reminders

Next Steps

Continue to Lesson 17 - Concatenating DataFrames

Back to Lesson 15 - GroupBy and Aggregation

Master Data Summarization

Lesson 16 - Pivot Tables

Reshaping Data for Analysis#

Understanding Pivot Tables#

Sample Data#

Basic Pivot Tables#

Creating Your First Pivot Table#

Filling Missing Values#

Different Aggregation Functions#

Multiple Aggregations#

Multiple Functions on Single Column#

Aggregating Multiple Columns#

Different Functions for Different Columns#

Adding Margins and Totals#

Adding Row and Column Totals#

Margins with Multiple Aggregations#

Understanding .pivot() vs .pivot_table()#

When to Use .pivot()#

When .pivot() Fails#

Key Differences#

Practice Exercises#

Exercise 1: Basic Pivot Table#

Exercise 2: Multiple Aggregations#

Exercise 3: Time Series Pivot#

Summary#

Key Concepts#

Syntax Reference#

Common Aggregation Functions#

.pivot() vs .pivot_table()#

Important Reminders#

Next Steps#

Continue to Lesson 17 - Concatenating DataFrames

Back to Lesson 15 - GroupBy and Aggregation

Master Data Summarization#

Reshaping Data for Analysis

Understanding Pivot Tables

Sample Data

Basic Pivot Tables

Creating Your First Pivot Table

Filling Missing Values

Different Aggregation Functions

Multiple Aggregations

Multiple Functions on Single Column

Aggregating Multiple Columns

Different Functions for Different Columns

Adding Margins and Totals

Adding Row and Column Totals

Margins with Multiple Aggregations

Understanding .pivot() vs .pivot_table()

When to Use .pivot()

When .pivot() Fails

Key Differences

Practice Exercises

Exercise 1: Basic Pivot Table

Exercise 2: Multiple Aggregations

Exercise 3: Time Series Pivot

Summary

Key Concepts

Syntax Reference

Common Aggregation Functions

.pivot() vs .pivot_table()

Important Reminders

Next Steps

Master Data Summarization