Lesson 16 - Pivot Tables

Reshaping Data for Analysis

You can group data and calculate statistics. Now you will learn Pivot Tables—the powerful spreadsheet-like tool for reshaping and summarizing data in a cross-tabulation format.

By the end of this lesson, you will be able to:

  • Create pivot tables with pd.pivot_table()
  • Understand index, columns, and values parameters
  • Apply different aggregation functions
  • Add margins for row/column totals
  • Distinguish between .pivot() and .pivot_table()
  • Create multi-level pivot tables

Pivot tables are essential for business intelligence, data exploration, and creating summary reports. Let’s master this crucial skill.


Understanding Pivot Tables

A pivot table summarizes data by grouping and aggregating, then reshapes it from long format to a cross-tabulation (wide format).

Key Components:

  • Index: Row labels (what goes down the left side)
  • Columns: Column headers (what goes across the top)
  • Values: Data to aggregate (numbers to summarize)
  • Aggfunc: How to aggregate (sum, mean, count, etc.)

Pivot tables answer questions like “What is total revenue by region and product?” or “How many sales per month by category?”


Sample Data

Let’s create sales data for demonstration:

import pandas as pd
import numpy as np

# Sales data
sales = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=12, freq='M'),
    'region': ['North', 'South', 'North', 'South', 'East', 'West',
               'North', 'South', 'East', 'West', 'North', 'South'],
    'product': ['Laptop', 'Laptop', 'Mouse', 'Mouse', 'Laptop', 'Mouse',
                'Keyboard', 'Keyboard', 'Laptop', 'Laptop', 'Mouse', 'Keyboard'],
    'quantity': [10, 15, 50, 30, 12, 40, 25, 20, 18, 14, 45, 22],
    'revenue': [12000, 18000, 1500, 900, 14400, 1200, 2000, 1600, 21600, 16800, 1350, 1760]
})

print("Sales Data:")
print(sales)
print(f"\nShape: {sales.shape}")

This dataset has 12 transactions across multiple regions and products—perfect for pivot table demonstrations.


Basic Pivot Tables

Create a basic pivot table showing total revenue by region and product.

Creating Your First Pivot Table

# Basic pivot: Revenue by region and product
pivot_basic = pd.pivot_table(
    sales,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum'
)

print("Revenue by Region and Product:")
print(pivot_basic)

Output:

product     Keyboard   Laptop    Mouse
region
East             NaN  36000.0      NaN
North         2000.0  12000.0   2850.0
South         3360.0  18000.0    900.0
West             NaN  16800.0   1200.0

Notice: NaN means no sales for that combination (e.g., East region sold no Keyboards).

Filling Missing Values

# Fill NaN with 0 for clearer presentation
pivot_filled = pd.pivot_table(
    sales,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum',
    fill_value=0
)

print("Revenue with zeros:")
print(pivot_filled)

Output:

product     Keyboard   Laptop   Mouse
region
East               0  36000.0       0
North         2000.0  12000.0  2850.0
South         3360.0  18000.0   900.0
West               0  16800.0  1200.0

Much cleaner! Zero clearly indicates “no sales” rather than missing data.

Different Aggregation Functions

# Average revenue instead of total
pivot_avg = pd.pivot_table(
    sales,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='mean',
    fill_value=0
)

print("Average Revenue by Region and Product:")
print(pivot_avg.round(2))

Common aggregation functions: 'sum', 'mean', 'count', 'min', 'max', 'std', 'median'


Multiple Aggregations

Apply multiple aggregation functions at once to see different perspectives.

Multiple Functions on Single Column

# Multiple aggregations
pivot_multi = pd.pivot_table(
    sales,
    values='revenue',
    index='region',
    columns='product',
    aggfunc=['sum', 'mean', 'count'],
    fill_value=0
)

print("Multiple Aggregations:")
print(pivot_multi)

This creates a multi-level column structure showing sum, mean, and count for each region-product combination.

Aggregating Multiple Columns

# Aggregate multiple values
pivot_multi_vals = pd.pivot_table(
    sales,
    values=['revenue', 'quantity'],
    index='region',
    columns='product',
    aggfunc='sum',
    fill_value=0
)

print("Revenue and Quantity:")
print(pivot_multi_vals)

Different Functions for Different Columns

# Different aggfunc for different columns
pivot_diff_agg = pd.pivot_table(
    sales,
    values=['revenue', 'quantity'],
    index='region',
    aggfunc={'revenue': 'sum', 'quantity': 'mean'},
    fill_value=0
)

print("Total Revenue & Average Quantity by Region:")
print(pivot_diff_agg.round(2))

This calculates total revenue and average quantity per region—useful when different metrics need different aggregations.


Adding Margins and Totals

The margins parameter adds row and column totals (subtotals and grand total).

Adding Row and Column Totals

# Add margins (totals)
pivot_margins = pd.pivot_table(
    sales,
    values='revenue',
    index='region',
    columns='product',
    aggfunc='sum',
    fill_value=0,
    margins=True,
    margins_name='Total'
)

print("Pivot Table with Totals:")
print(pivot_margins)

Output:

product     Keyboard   Laptop   Mouse    Total
region
East               0  36000.0       0  36000.0
North         2000.0  12000.0  2850.0  16850.0
South         3360.0  18000.0   900.0  22260.0
West               0  16800.0  1200.0  18000.0
Total         5360.0  82800.0  4950.0  93110.0

The ‘Total’ row and column show sums across all categories—extremely useful for reports!

Margins with Multiple Aggregations

# Margins with multiple aggregations
pivot_margins_multi = pd.pivot_table(
    sales,
    values=['revenue', 'quantity'],
    index='region',
    aggfunc='sum',
    fill_value=0,
    margins=True
)

print("Revenue & Quantity with Totals:")
print(pivot_margins_multi)

Understanding .pivot() vs .pivot_table()

Both reshape data, but they have important differences.

When to Use .pivot()

.pivot() works only when there are NO duplicate index-column combinations. It does not aggregate.

# Simple data with unique combinations
simple_data = pd.DataFrame({
    'date': ['2023-01', '2023-01', '2023-02', '2023-02'],
    'product': ['Laptop', 'Mouse', 'Laptop', 'Mouse'],
    'revenue': [12000, 1500, 15000, 1800]
})

# Using .pivot() - no aggregation needed
pivoted = simple_data.pivot(
    index='date',
    columns='product',
    values='revenue'
)

print("Using .pivot():")
print(pivoted)

Output:

product   Laptop  Mouse
date
2023-01    12000   1500
2023-02    15000   1800

Works perfectly when each date-product combination appears only once!

When .pivot() Fails

# Data with duplicates
duplicate_data = pd.DataFrame({
    'date': ['2023-01', '2023-01', '2023-01'],  # Duplicate!
    'product': ['Laptop', 'Laptop', 'Mouse'],
    'revenue': [12000, 15000, 1500]
})

# Try .pivot() - will error!
try:
    duplicate_data.pivot(index='date', columns='product', values='revenue')
except ValueError as e:
    print(f".pivot() error: {e}")

# Use .pivot_table() instead - aggregates automatically
pivoted_dups = pd.pivot_table(
    duplicate_data,
    index='date',
    columns='product',
    values='revenue',
    aggfunc='sum'
)

print("\n.pivot_table() handles duplicates:")
print(pivoted_dups)

.pivot_table() automatically sums the two Laptop sales (12000 + 15000 = 27000).

Key Differences

Feature.pivot().pivot_table()
Handles duplicatesNo (raises error)Yes (aggregates)
Aggregation requiredNoYes
Margins/totalsNoYes
Multiple aggfuncsNoYes
Use caseSimple reshapeComplex analysis

Rule of thumb: Use .pivot_table() when in doubt—it’s more flexible and safer!


Practice Exercises

Apply pivot table operations with these exercises.

Exercise 1: Basic Pivot Table

Given employee data:

employees = pd.DataFrame({
    'name': ['James', 'Maria', 'David', 'Anna', 'Michael', 'Sophie'],
    'department': ['IT', 'HR', 'IT', 'Sales', 'Sales', 'HR'],
    'city': ['New York', 'London', 'New York', 'London', 'New York', 'London'],
    'salary': [50000, 45000, 55000, 48000, 52000, 47000]
})
  1. Create a pivot table showing average salary by department and city
  2. Add margins to see overall averages
  3. Count employees in each department-city combination
# Your code here

Exercise 2: Multiple Aggregations

Given product sales data:

product_sales = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'] * 2,
    'product': ['A', 'B', 'A', 'B', 'A', 'B'] * 2,
    'region': ['North'] * 6 + ['South'] * 6,
    'sales': [100, 150, 120, 140, 110, 160, 90, 130, 95, 125, 88, 145],
    'profit': [20, 30, 24, 28, 22, 32, 18, 26, 19, 25, 17.6, 29]
})
  1. Create a pivot showing total sales by product and region
  2. Show both sum and mean for sales and profit
  3. Calculate profit margin (profit/sales) for each combination
# Your code here

Hint

For profit margin, create separate pivot tables for sales and profit, then divide them!

Exercise 3: Time Series Pivot

Given monthly data:

monthly_data = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=24, freq='M'),
    'category': ['Electronics', 'Clothing'] * 12,
    'revenue': np.random.randint(5000, 15000, 24)
})
monthly_data['month'] = monthly_data['date'].dt.month
monthly_data['quarter'] = monthly_data['date'].dt.quarter
  1. Create a pivot showing revenue by category and quarter
  2. Show revenue by category and month
  3. Add margins to see quarterly and category totals
# Your code here

Summary

You now master pivot tables in pandas. Let’s review the key concepts.

Key Concepts

Basic Pivot Table

  • Reshapes data from long to wide format
  • Groups by index and columns
  • Aggregates values with specified function
  • Creates cross-tabulation layout

Components

  • values: Column to aggregate
  • index: Row labels
  • columns: Column headers
  • aggfunc: Aggregation function
  • fill_value: Replace NaN with specific value
  • margins: Add row/column totals

Multiple Aggregations

  • Pass list to aggfunc for multiple statistics
  • Specify different functions per column with dictionary
  • Creates multi-level column structure

Syntax Reference

# Basic pivot table
pd.pivot_table(df, values='col', index='row_col', columns='col_col', aggfunc='sum')

# With fill_value
pd.pivot_table(df, values='col', index='row', columns='col',
               aggfunc='sum', fill_value=0)

# Multiple aggregations
pd.pivot_table(df, values='col', index='row', columns='col',
               aggfunc=['sum', 'mean', 'count'])

# Multiple values
pd.pivot_table(df, values=['col1', 'col2'], index='row',
               columns='col', aggfunc='sum')

# Different aggfunc per column
pd.pivot_table(df, values=['col1', 'col2'], index='row',
               aggfunc={'col1': 'sum', 'col2': 'mean'})

# With margins (totals)
pd.pivot_table(df, values='col', index='row', columns='col',
               aggfunc='sum', margins=True, margins_name='Total')

Common Aggregation Functions

FunctionDescriptionUse Case
'sum'Total of all valuesTotal revenue, total sales
'mean'Average valueAverage price, average score
'count'Number of non-null valuesCount transactions
'min' / 'max'Minimum/maximum valueLowest/highest price
'median'Middle valueMedian income
'std'Standard deviationRevenue variability
Custom functionAny functionDomain-specific calculations

.pivot() vs .pivot_table()

Use .pivot() when:

  • Data has unique index-column combinations
  • No aggregation needed
  • Simple reshape operation
  • Slightly faster for simple cases

Use .pivot_table() when:

  • Data may have duplicates
  • Need aggregation (sum, mean, etc.)
  • Want margins/totals
  • More flexible and safer (recommended default)

Important Reminders

  • Fill NaN: Use fill_value=0 for clearer presentation
  • Margins: Set margins=True to add totals
  • Aggregation: Choose appropriate function for your data type
  • Duplicates: .pivot_table() handles them, .pivot() doesn’t
  • Multi-level: Multiple aggregations create hierarchical columns
  • Performance: For very large datasets, consider GroupBy instead

Next Steps

You can now create powerful pivot tables for data summarization. In the next lesson, you will learn Concatenating DataFrames—combining multiple DataFrames by stacking rows or columns.

Continue to Lesson 17 - Concatenating DataFrames

Learn to combine multiple DataFrames by stacking

Back to Lesson 15 - GroupBy and Aggregation

Review splitting data and calculating category-based statistics


Master Data Summarization

Pivot tables are the spreadsheet analyst’s favorite tool—now you have them in pandas with even more power and flexibility.

Use pivot tables to create insightful summary reports and cross-tabulations!