Lesson 16 - Pivot Tables
Reshaping Data for Analysis
You can group data and calculate statistics. Now you will learn Pivot Tables—the powerful spreadsheet-like tool for reshaping and summarizing data in a cross-tabulation format.
By the end of this lesson, you will be able to:
- Create pivot tables with
pd.pivot_table() - Understand index, columns, and values parameters
- Apply different aggregation functions
- Add margins for row/column totals
- Distinguish between
.pivot()and.pivot_table() - Create multi-level pivot tables
Pivot tables are essential for business intelligence, data exploration, and creating summary reports. Let’s master this crucial skill.
Understanding Pivot Tables
A pivot table summarizes data by grouping and aggregating, then reshapes it from long format to a cross-tabulation (wide format).
Key Components:
- Index: Row labels (what goes down the left side)
- Columns: Column headers (what goes across the top)
- Values: Data to aggregate (numbers to summarize)
- Aggfunc: How to aggregate (sum, mean, count, etc.)
Pivot tables answer questions like “What is total revenue by region and product?” or “How many sales per month by category?”
Sample Data
Let’s create sales data for demonstration:
import pandas as pd
import numpy as np
# Sales data
sales = pd.DataFrame({
'date': pd.date_range('2023-01-01', periods=12, freq='M'),
'region': ['North', 'South', 'North', 'South', 'East', 'West',
'North', 'South', 'East', 'West', 'North', 'South'],
'product': ['Laptop', 'Laptop', 'Mouse', 'Mouse', 'Laptop', 'Mouse',
'Keyboard', 'Keyboard', 'Laptop', 'Laptop', 'Mouse', 'Keyboard'],
'quantity': [10, 15, 50, 30, 12, 40, 25, 20, 18, 14, 45, 22],
'revenue': [12000, 18000, 1500, 900, 14400, 1200, 2000, 1600, 21600, 16800, 1350, 1760]
})
print("Sales Data:")
print(sales)
print(f"\nShape: {sales.shape}")This dataset has 12 transactions across multiple regions and products—perfect for pivot table demonstrations.
Basic Pivot Tables
Create a basic pivot table showing total revenue by region and product.
Creating Your First Pivot Table
# Basic pivot: Revenue by region and product
pivot_basic = pd.pivot_table(
sales,
values='revenue',
index='region',
columns='product',
aggfunc='sum'
)
print("Revenue by Region and Product:")
print(pivot_basic)Output:
product Keyboard Laptop Mouse
region
East NaN 36000.0 NaN
North 2000.0 12000.0 2850.0
South 3360.0 18000.0 900.0
West NaN 16800.0 1200.0Notice: NaN means no sales for that combination (e.g., East region sold no Keyboards).
Filling Missing Values
# Fill NaN with 0 for clearer presentation
pivot_filled = pd.pivot_table(
sales,
values='revenue',
index='region',
columns='product',
aggfunc='sum',
fill_value=0
)
print("Revenue with zeros:")
print(pivot_filled)Output:
product Keyboard Laptop Mouse
region
East 0 36000.0 0
North 2000.0 12000.0 2850.0
South 3360.0 18000.0 900.0
West 0 16800.0 1200.0Much cleaner! Zero clearly indicates “no sales” rather than missing data.
Different Aggregation Functions
# Average revenue instead of total
pivot_avg = pd.pivot_table(
sales,
values='revenue',
index='region',
columns='product',
aggfunc='mean',
fill_value=0
)
print("Average Revenue by Region and Product:")
print(pivot_avg.round(2))Common aggregation functions: 'sum', 'mean', 'count', 'min', 'max', 'std', 'median'
Multiple Aggregations
Apply multiple aggregation functions at once to see different perspectives.
Multiple Functions on Single Column
# Multiple aggregations
pivot_multi = pd.pivot_table(
sales,
values='revenue',
index='region',
columns='product',
aggfunc=['sum', 'mean', 'count'],
fill_value=0
)
print("Multiple Aggregations:")
print(pivot_multi)This creates a multi-level column structure showing sum, mean, and count for each region-product combination.
Aggregating Multiple Columns
# Aggregate multiple values
pivot_multi_vals = pd.pivot_table(
sales,
values=['revenue', 'quantity'],
index='region',
columns='product',
aggfunc='sum',
fill_value=0
)
print("Revenue and Quantity:")
print(pivot_multi_vals)Different Functions for Different Columns
# Different aggfunc for different columns
pivot_diff_agg = pd.pivot_table(
sales,
values=['revenue', 'quantity'],
index='region',
aggfunc={'revenue': 'sum', 'quantity': 'mean'},
fill_value=0
)
print("Total Revenue & Average Quantity by Region:")
print(pivot_diff_agg.round(2))This calculates total revenue and average quantity per region—useful when different metrics need different aggregations.
Adding Margins and Totals
The margins parameter adds row and column totals (subtotals and grand total).
Adding Row and Column Totals
# Add margins (totals)
pivot_margins = pd.pivot_table(
sales,
values='revenue',
index='region',
columns='product',
aggfunc='sum',
fill_value=0,
margins=True,
margins_name='Total'
)
print("Pivot Table with Totals:")
print(pivot_margins)Output:
product Keyboard Laptop Mouse Total
region
East 0 36000.0 0 36000.0
North 2000.0 12000.0 2850.0 16850.0
South 3360.0 18000.0 900.0 22260.0
West 0 16800.0 1200.0 18000.0
Total 5360.0 82800.0 4950.0 93110.0The ‘Total’ row and column show sums across all categories—extremely useful for reports!
Margins with Multiple Aggregations
# Margins with multiple aggregations
pivot_margins_multi = pd.pivot_table(
sales,
values=['revenue', 'quantity'],
index='region',
aggfunc='sum',
fill_value=0,
margins=True
)
print("Revenue & Quantity with Totals:")
print(pivot_margins_multi)Understanding .pivot() vs .pivot_table()
Both reshape data, but they have important differences.
When to Use .pivot()
.pivot() works only when there are NO duplicate index-column combinations. It does not aggregate.
# Simple data with unique combinations
simple_data = pd.DataFrame({
'date': ['2023-01', '2023-01', '2023-02', '2023-02'],
'product': ['Laptop', 'Mouse', 'Laptop', 'Mouse'],
'revenue': [12000, 1500, 15000, 1800]
})
# Using .pivot() - no aggregation needed
pivoted = simple_data.pivot(
index='date',
columns='product',
values='revenue'
)
print("Using .pivot():")
print(pivoted)Output:
product Laptop Mouse
date
2023-01 12000 1500
2023-02 15000 1800Works perfectly when each date-product combination appears only once!
When .pivot() Fails
# Data with duplicates
duplicate_data = pd.DataFrame({
'date': ['2023-01', '2023-01', '2023-01'], # Duplicate!
'product': ['Laptop', 'Laptop', 'Mouse'],
'revenue': [12000, 15000, 1500]
})
# Try .pivot() - will error!
try:
duplicate_data.pivot(index='date', columns='product', values='revenue')
except ValueError as e:
print(f".pivot() error: {e}")
# Use .pivot_table() instead - aggregates automatically
pivoted_dups = pd.pivot_table(
duplicate_data,
index='date',
columns='product',
values='revenue',
aggfunc='sum'
)
print("\n.pivot_table() handles duplicates:")
print(pivoted_dups).pivot_table() automatically sums the two Laptop sales (12000 + 15000 = 27000).
Key Differences
| Feature | .pivot() | .pivot_table() |
|---|---|---|
| Handles duplicates | No (raises error) | Yes (aggregates) |
| Aggregation required | No | Yes |
| Margins/totals | No | Yes |
| Multiple aggfuncs | No | Yes |
| Use case | Simple reshape | Complex analysis |
Rule of thumb: Use .pivot_table() when in doubt—it’s more flexible and safer!
Practice Exercises
Apply pivot table operations with these exercises.
Exercise 1: Basic Pivot Table
Given employee data:
employees = pd.DataFrame({
'name': ['James', 'Maria', 'David', 'Anna', 'Michael', 'Sophie'],
'department': ['IT', 'HR', 'IT', 'Sales', 'Sales', 'HR'],
'city': ['New York', 'London', 'New York', 'London', 'New York', 'London'],
'salary': [50000, 45000, 55000, 48000, 52000, 47000]
})- Create a pivot table showing average salary by department and city
- Add margins to see overall averages
- Count employees in each department-city combination
# Your code hereExercise 2: Multiple Aggregations
Given product sales data:
product_sales = pd.DataFrame({
'month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'] * 2,
'product': ['A', 'B', 'A', 'B', 'A', 'B'] * 2,
'region': ['North'] * 6 + ['South'] * 6,
'sales': [100, 150, 120, 140, 110, 160, 90, 130, 95, 125, 88, 145],
'profit': [20, 30, 24, 28, 22, 32, 18, 26, 19, 25, 17.6, 29]
})- Create a pivot showing total sales by product and region
- Show both sum and mean for sales and profit
- Calculate profit margin (profit/sales) for each combination
# Your code hereHint
For profit margin, create separate pivot tables for sales and profit, then divide them!
Exercise 3: Time Series Pivot
Given monthly data:
monthly_data = pd.DataFrame({
'date': pd.date_range('2023-01-01', periods=24, freq='M'),
'category': ['Electronics', 'Clothing'] * 12,
'revenue': np.random.randint(5000, 15000, 24)
})
monthly_data['month'] = monthly_data['date'].dt.month
monthly_data['quarter'] = monthly_data['date'].dt.quarter- Create a pivot showing revenue by category and quarter
- Show revenue by category and month
- Add margins to see quarterly and category totals
# Your code hereSummary
You now master pivot tables in pandas. Let’s review the key concepts.
Key Concepts
Basic Pivot Table
- Reshapes data from long to wide format
- Groups by index and columns
- Aggregates values with specified function
- Creates cross-tabulation layout
Components
values: Column to aggregateindex: Row labelscolumns: Column headersaggfunc: Aggregation functionfill_value: Replace NaN with specific valuemargins: Add row/column totals
Multiple Aggregations
- Pass list to
aggfuncfor multiple statistics - Specify different functions per column with dictionary
- Creates multi-level column structure
Syntax Reference
# Basic pivot table
pd.pivot_table(df, values='col', index='row_col', columns='col_col', aggfunc='sum')
# With fill_value
pd.pivot_table(df, values='col', index='row', columns='col',
aggfunc='sum', fill_value=0)
# Multiple aggregations
pd.pivot_table(df, values='col', index='row', columns='col',
aggfunc=['sum', 'mean', 'count'])
# Multiple values
pd.pivot_table(df, values=['col1', 'col2'], index='row',
columns='col', aggfunc='sum')
# Different aggfunc per column
pd.pivot_table(df, values=['col1', 'col2'], index='row',
aggfunc={'col1': 'sum', 'col2': 'mean'})
# With margins (totals)
pd.pivot_table(df, values='col', index='row', columns='col',
aggfunc='sum', margins=True, margins_name='Total')Common Aggregation Functions
| Function | Description | Use Case |
|---|---|---|
'sum' | Total of all values | Total revenue, total sales |
'mean' | Average value | Average price, average score |
'count' | Number of non-null values | Count transactions |
'min' / 'max' | Minimum/maximum value | Lowest/highest price |
'median' | Middle value | Median income |
'std' | Standard deviation | Revenue variability |
| Custom function | Any function | Domain-specific calculations |
.pivot() vs .pivot_table()
Use .pivot() when:
- Data has unique index-column combinations
- No aggregation needed
- Simple reshape operation
- Slightly faster for simple cases
Use .pivot_table() when:
- Data may have duplicates
- Need aggregation (sum, mean, etc.)
- Want margins/totals
- More flexible and safer (recommended default)
Important Reminders
- Fill NaN: Use
fill_value=0for clearer presentation - Margins: Set
margins=Trueto add totals - Aggregation: Choose appropriate function for your data type
- Duplicates:
.pivot_table()handles them,.pivot()doesn’t - Multi-level: Multiple aggregations create hierarchical columns
- Performance: For very large datasets, consider GroupBy instead
Next Steps
You can now create powerful pivot tables for data summarization. In the next lesson, you will learn Concatenating DataFrames—combining multiple DataFrames by stacking rows or columns.
Continue to Lesson 17 - Concatenating DataFrames
Learn to combine multiple DataFrames by stacking
Back to Lesson 15 - GroupBy and Aggregation
Review splitting data and calculating category-based statistics
Master Data Summarization
Pivot tables are the spreadsheet analyst’s favorite tool—now you have them in pandas with even more power and flexibility.
Use pivot tables to create insightful summary reports and cross-tabulations!