Lesson 19 - MultiIndex and Hierarchical Data

Working with Hierarchical Indexes

You can merge related DataFrames. Now you will learn MultiIndex—using hierarchical indexes to store and manipulate higher-dimensional data in a 2D DataFrame.

By the end of this lesson, you will be able to:

Create multi-level indexes with set_index()
Select data from multi-indexed DataFrames
Use .stack() and .unstack() to reshape data
Reset and manipulate hierarchical indexes
Understand when multi-index is useful
Aggregate data by specific index levels

Multi-index enables elegant representation of complex data hierarchies like Country → City → Store or Year → Quarter → Month. Let’s master this advanced technique.

Why MultiIndex?

Multi-index (hierarchical indexing) allows you to:

Store higher-dimensional data in 2D DataFrame
Group data by multiple levels (e.g., Country → City → Store)
Perform efficient operations on grouped data
Reshape data between wide and long formats

Common use cases:

Time series with multiple locations
Sales data by Region → City → Product
Scientific data with multiple experimental factors
Panel data (multiple metrics over time for different entities)

Sample Data

Let’s create sales data across different regions and products:

import pandas as pd
import numpy as np

# Sales data
sales = pd.DataFrame({
    'region': ['North', 'North', 'North', 'South', 'South', 'South',
               'East', 'East', 'East', 'West', 'West', 'West'],
    'city': ['New York', 'New York', 'New York', 'London', 'London', 'London',
             'Paris', 'Paris', 'Paris', 'Berlin', 'Berlin', 'Berlin'],
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Mouse', 'Keyboard',
                'Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Mouse', 'Keyboard'],
    'quantity': [15, 120, 45, 20, 150, 60, 12, 90, 30, 18, 110, 40],
    'revenue': [18000, 3600, 3600, 24000, 4500, 4800, 14400, 2700, 2400, 21600, 3300, 3200]
})

print("Original Sales Data:")
print(sales)
print(f"\nShape: {sales.shape}")

This dataset has 12 records across multiple regions, cities, and products—perfect for demonstrating hierarchical indexing.

Creating Multi-Index

Transform regular columns into hierarchical index levels.

Create Multi-Index with set_index()

# Create multi-index: Region → City → Product
sales_multi = sales.set_index(['region', 'city', 'product'])

print("Multi-Indexed DataFrame:")
print(sales_multi)
print(f"\nIndex levels: {sales_multi.index.names}")
print(f"Number of levels: {sales_multi.index.nlevels}")

Output:

                                    quantity  revenue
region city     product
North  New York Laptop                   15    18000
                Mouse                   120     3600
                Keyboard                 45     3600
South  London   Laptop                   20    24000
                Mouse                   150     4500
                Keyboard                 60     4800
East   Paris    Laptop                   12    14400
                Mouse                    90     2700
                Keyboard                 30     2400
West   Berlin   Laptop                   18    21600
                Mouse                   110     3300
                Keyboard                 40     3200

Notice the hierarchical structure:

Level 0: region
Level 1: city
Level 2: product

Selecting from Multi-Index

Navigate hierarchical data using .loc[] with different approaches.

Select by Single Level

# Select all data for North region
north_sales = sales_multi.loc['North']

print("North Region Sales:")
print(north_sales)

Output:

                         quantity  revenue
city     product
New York Laptop                15    18000
         Mouse                120     3600
         Keyboard              45     3600

Select by Multiple Levels

Use tuples to select specific combinations:

# Select North → New York
north_newyork = sales_multi.loc[('North', 'New York')]

print("North Region, New York City:")
print(north_newyork)

Output:

          quantity  revenue
product
Laptop          15    18000
Mouse          120     3600
Keyboard        45     3600

Select Specific Item

# Select specific product: North → New York → Laptop
specific = sales_multi.loc[('North', 'New York', 'Laptop')]

print("North → New York → Laptop:")
print(specific)
print(f"\nType: {type(specific)}")

Output:

quantity     15
revenue   18000
Name: (North, New York, Laptop), dtype: int64

Slice Across Levels

Use IndexSlice for cleaner syntax when slicing:

# Select all Laptops across all regions/cities
idx = pd.IndexSlice
all_laptops = sales_multi.loc[idx[:, :, 'Laptop'], :]

print("All Laptop Sales:")
print(all_laptops)

Output:

                               quantity  revenue
region city     product
North  New York Laptop               15    18000
South  London   Laptop               20    24000
East   Paris    Laptop               12    14400
West   Berlin   Laptop               18    21600

Aggregating Multi-Index Data

Perform calculations across specific index levels.

Aggregate by Level

# Total revenue by region (level 0)
revenue_by_region = sales_multi.groupby(level=0)['revenue'].sum()

print("Revenue by Region:")
print(revenue_by_region)
print(f"\nBest region: {revenue_by_region.idxmax()} (${revenue_by_region.max():,})")

Output:

region
East     19500
North    25200
South    33300
West     28100
Name: revenue, dtype: int64

Best region: South ($33,300)

Aggregate by Multiple Levels

# Total quantity by product (level 2)
quantity_by_product = sales_multi.groupby(level=2)['quantity'].sum().sort_values(ascending=False)

print("Quantity by Product:")
print(quantity_by_product)

Output:

product
Mouse       470
Keyboard    175
Laptop       65
Name: quantity, dtype: int64

Aggregate by Multiple Levels Simultaneously

# Revenue by region and product
revenue_by_region_product = sales_multi.groupby(level=[0, 2])['revenue'].sum()

print("Revenue by Region and Product:")
print(revenue_by_region_product)

Output:

region  product
East    Keyboard     2400
        Laptop      14400
        Mouse        2700
North   Keyboard     3600
        Laptop      18000
        Mouse        3600
South   Keyboard     4800
        Laptop      24000
        Mouse        4500
West    Keyboard     3200
        Laptop      21600
        Mouse        3300
Name: revenue, dtype: int64

Stack and Unstack

Reshape data by moving index levels to columns and vice versa.

Unstack - Move Index to Columns

.unstack() pivots the innermost index level to columns:

# Unstack product level (innermost)
unstacked = sales_multi.unstack()

print("Unstacked (Products as Columns):")
print(unstacked)

Output:

                quantity                       revenue
product          Keyboard Laptop Mouse Keyboard  Laptop  Mouse
region city
East   Paris           30     12    90     2400   14400   2700
North  New York        45     15   120     3600   18000   3600
South  London          60     20   150     4800   24000   4500
West   Berlin          40     18   110     3200   21600   3300

This creates a wider DataFrame with products as column headers.

Unstack Specific Level

# Unstack region level
unstacked_region = sales_multi.unstack(level='region')

print("Unstacked by Region:")
print(unstacked_region.head())

Stack - Move Columns to Index

.stack() does the opposite—moves column level to index:

# Stack back to original shape
stacked_back = unstacked.stack()

print("Stacked Back:")
print(stacked_back.head())

Practical Use - Creating Pivot-like Views

# Revenue by Region (rows) and Product (columns)
revenue_pivot = sales_multi['revenue'].unstack(level='product')

print("Revenue: Regions vs Products:")
print(revenue_pivot)

Output:

product        Keyboard   Laptop  Mouse
region city
East   Paris       2400  14400.0   2700
North  New York    3600  18000.0   3600
South  London      4800  24000.0   4500
West   Berlin      3200  21600.0   3300

This creates a pivot table-like view without using pivot_table()!

Resetting Index

Convert multi-index back to regular columns.

Reset All Index Levels

# Reset index to make it a regular DataFrame
flat_df = sales_multi.reset_index()

print("Flattened DataFrame:")
print(flat_df)

Output:

  region      city   product  quantity  revenue
0  North  New York    Laptop        15    18000
1  North  New York     Mouse       120     3600
2  North  New York  Keyboard        45     3600
...

Reset Specific Level

# Reset only the product level
partial_reset = sales_multi.reset_index(level='product')

print("Reset Product Level Only:")
print(partial_reset.head())

Practice Exercises

Apply multi-index operations with these exercises.

Exercise 1: Student Performance Multi-Index

Create and analyze student performance across subjects and semesters:

students = pd.DataFrame({
    'student': ['James', 'James', 'James', 'Maria', 'Maria', 'Maria', 'David', 'David', 'David'],
    'semester': ['Fall', 'Fall', 'Spring', 'Fall', 'Fall', 'Spring', 'Fall', 'Fall', 'Spring'],
    'subject': ['Math', 'Physics', 'Math', 'Math', 'Physics', 'Math', 'Math', 'Physics', 'Math'],
    'score': [85, 78, 92, 90, 88, 95, 76, 82, 80]
})

Set multi-index on student and semester
Find James’s scores for all semesters
Calculate average score by student
Unstack to show subjects as columns

# Your code here

Hint

Use .loc['James'] to select all data for James across all index levels!

Exercise 2: Temperature Data Reshaping

Work with temperature data across cities and seasons:

temps = pd.DataFrame({
    'city': ['New York', 'New York', 'New York', 'London', 'London', 'London', 'Paris', 'Paris', 'Paris'],
    'season': ['Winter', 'Spring', 'Summer', 'Winter', 'Spring', 'Summer', 'Winter', 'Spring', 'Summer'],
    'temp_celsius': [8, 18, 35, 20, 30, 42, 5, 15, 28]
})

Create multi-index on city and season
Use unstack to create a wide format (seasons as columns)
Find the warmest city in Summer
Calculate average temperature per city

# Your code here

Summary

You now master multi-index and hierarchical data. Let’s review the key concepts.

Key Concepts

Creating Multi-Index

Use set_index() with multiple columns
Creates hierarchical index structure
Each level can be referenced by number or name
Enables elegant representation of complex hierarchies

Selecting Data

Single level: df.loc['level0_value']
Multiple levels: df.loc[('level0', 'level1')]
Slicing with IndexSlice: df.loc[idx[:, :, 'value'], :]
Returns DataFrame or Series depending on selection

Aggregating by Level

Group by level number: df.groupby(level=0)
Group by level name: df.groupby(level='name')
Multiple levels: df.groupby(level=[0, 2])

Reshaping

Unstack: Index → Columns (creates wide format)
Stack: Columns → Index (creates long format)
Specify level to control which level moves
Useful for pivot-like transformations

Syntax Reference

# Create multi-index
df.set_index(['col1', 'col2', 'col3'])

# Select by levels
df.loc['level0']
df.loc[('level0', 'level1')]
df.loc[('level0', 'level1', 'level2')]

# IndexSlice for complex slicing
idx = pd.IndexSlice
df.loc[idx[:, :, 'value'], :]

# Aggregate by level
df.groupby(level=0).sum()
df.groupby(level='name').mean()
df.groupby(level=[0, 2]).sum()

# Stack and unstack
df.unstack()              # Innermost to columns
df.unstack(level=0)       # Specific level
df.unstack(level='name')  # By name
df.stack()                # Columns to index

# Reset index
df.reset_index()          # All levels
df.reset_index(level=0)   # Specific level
df.reset_index(level='name')

When to Use Multi-Index?

Good Use Cases:

Time series data with multiple locations
Data with natural hierarchy (Country → State → City)
Results of GroupBy operations (keep the multi-index)
Panel data (multiple metrics over time for entities)
Scientific experiments with multiple factors

Avoid When:

Simple, flat data structure is sufficient
Frequent need to reset index for operations
Sharing data with users unfamiliar with hierarchical indexing
Performance is critical (multi-index has overhead)

Important Reminders

Tuples for selection: Use df.loc[('A', 'B')] for multiple levels
IndexSlice: Use pd.IndexSlice for complex slicing with : syntax
Unstack creates NaN: Missing combinations become NaN in wide format
Level parameter: Can use numbers (0, 1, 2) or names
GroupBy creates multi-index: Many operations naturally create it
Reset when needed: Use reset_index() to flatten for export

Common Patterns

Pattern 1: Create and Aggregate

df_multi = df.set_index(['category', 'subcategory'])
result = df_multi.groupby(level=0).sum()

Pattern 2: Reshape for Analysis

df_multi = df.set_index(['date', 'metric'])
wide = df_multi.unstack(level='metric')

Pattern 3: Select and Compute

idx = pd.IndexSlice
subset = df_multi.loc[idx[:, 'SpecificValue'], :]
stats = subset.groupby(level=0).mean()

Next Steps

You can now work with hierarchical indexes for complex data structures. In the next lesson, you will learn Window Functions and Rolling Operations—calculating moving averages, cumulative sums, and time-based window calculations.

Continue to Lesson 20 - Window Functions and Rolling

Learn rolling averages, cumulative sums, and window calculations

Back to Lesson 18 - Merging and Joining

Review SQL-style joins for combining DataFrames

Master Hierarchical Data

Multi-index unlocks powerful hierarchical data representations—essential for complex analyses and elegant data structures.

Use multi-index to organize and analyze multi-dimensional data efficiently!

Lesson 18 - Merging and Joining DataFrames

Lesson 20 - Window Functions and Rolling Operations

Courses

DATATWEETS

Title here

Lesson 19 - MultiIndex and Hierarchical Data

Working with Hierarchical Indexes

Why MultiIndex?

Sample Data

Creating Multi-Index

Create Multi-Index with set_index()

Selecting from Multi-Index

Select by Single Level

Select by Multiple Levels

Select Specific Item

Slice Across Levels

Aggregating Multi-Index Data

Aggregate by Level

Aggregate by Multiple Levels

Aggregate by Multiple Levels Simultaneously

Stack and Unstack

Unstack - Move Index to Columns

Unstack Specific Level

Stack - Move Columns to Index

Practical Use - Creating Pivot-like Views

Resetting Index

Reset All Index Levels

Reset Specific Level

Practice Exercises

Exercise 1: Student Performance Multi-Index

Exercise 2: Temperature Data Reshaping

Summary

Key Concepts

Syntax Reference

When to Use Multi-Index?

Important Reminders

Common Patterns

Next Steps

Continue to Lesson 20 - Window Functions and Rolling

Back to Lesson 18 - Merging and Joining

Master Hierarchical Data

Lesson 19 - MultiIndex and Hierarchical Data

Working with Hierarchical Indexes#

Why MultiIndex?#

Sample Data#

Creating Multi-Index#

Create Multi-Index with set_index()#

Selecting from Multi-Index#

Select by Single Level#

Select by Multiple Levels#

Select Specific Item#

Slice Across Levels#

Aggregating Multi-Index Data#

Aggregate by Level#

Aggregate by Multiple Levels#

Aggregate by Multiple Levels Simultaneously#

Stack and Unstack#

Unstack - Move Index to Columns#

Unstack Specific Level#

Stack - Move Columns to Index#

Practical Use - Creating Pivot-like Views#

Resetting Index#

Reset All Index Levels#

Reset Specific Level#

Practice Exercises#

Exercise 1: Student Performance Multi-Index#

Exercise 2: Temperature Data Reshaping#

Summary#

Key Concepts#

Syntax Reference#

When to Use Multi-Index?#

Important Reminders#

Common Patterns#

Next Steps#

Continue to Lesson 20 - Window Functions and Rolling

Back to Lesson 18 - Merging and Joining

Master Hierarchical Data#

Working with Hierarchical Indexes

Why MultiIndex?

Sample Data

Creating Multi-Index

Create Multi-Index with set_index()

Selecting from Multi-Index

Select by Single Level

Select by Multiple Levels

Select Specific Item

Slice Across Levels

Aggregating Multi-Index Data

Aggregate by Level

Aggregate by Multiple Levels

Aggregate by Multiple Levels Simultaneously

Stack and Unstack

Unstack - Move Index to Columns

Unstack Specific Level

Stack - Move Columns to Index

Practical Use - Creating Pivot-like Views

Resetting Index

Reset All Index Levels

Reset Specific Level

Practice Exercises

Exercise 1: Student Performance Multi-Index

Exercise 2: Temperature Data Reshaping

Summary

Key Concepts

Syntax Reference

When to Use Multi-Index?

Important Reminders

Common Patterns

Next Steps

Master Hierarchical Data