Lesson 10 - Apply, Map, and Transform Functions
Advanced Transformation Techniques
You learned basic column operations. Now you will master advanced transformation techniques—replacing values with .map(), applying functions at different levels with .apply(), and reshaping data between wide and long formats.
By the end of this lesson, you will be able to:
- Replace values using
.map()with dictionaries - Apply functions at element, row, and column levels
- Reshape wide data to long format with
.melt() - Reshape long data to wide format with
.pivot() - Understand when to use each transformation method
These techniques handle complex transformations that simple arithmetic cannot—recoding values, multi-column logic, and restructuring data for different analysis needs.
Using .map() to Replace Values
The .map() method replaces values in a Series based on a dictionary or function.
Sample Data
import pandas as pd
import numpy as np
# Create companies dataset
companies = pd.DataFrame({
'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines'],
'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
'Healthcare', 'Retail', 'Transportation'],
'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
'USA', 'UK', 'Germany'],
'revenues': [125000, 89000, 156000, 234000, 178000,
98000, 112000, 187000],
'profits': [12000, 8500, -3000, 45000, 23000,
15000, 9800, 21000]
})
print("Companies dataset:")
companies.head()Map with Dictionary
# Map country codes using dictionary
country_codes = {
'USA': 'IR',
'UK': 'AE',
'Germany': 'QA'
}
companies['country_code'] = companies['country'].map(country_codes)
print("With country codes:")
companies[['company', 'country', 'country_code']]Output:
company country country_code
0 TechCorp Global USA IR
1 FreshMart Inc USA IR
2 AutoDrive Motors USA IR
3 FirstBank Holdings UK AE
4 PowerGen Energy Germany QA
...Each value in ‘country’ is replaced by its corresponding code.
Another Mapping Example
# Map sector to industry category
industry_category = {
'Technology': 'Tech & Telecom',
'Food': 'Consumer Goods',
'Automotive': 'Industrials',
'Financials': 'Financial Services',
'Energy': 'Energy & Utilities',
'Healthcare': 'Healthcare',
'Retail': 'Consumer Goods',
'Transportation': 'Industrials'
}
companies['industry'] = companies['sector'].map(industry_category)
print("With industry categories:")
companies[['company', 'sector', 'industry']].head()Output:
company sector industry
0 TechCorp Global Technology Tech & Telecom
1 FreshMart Inc Food Consumer Goods
2 AutoDrive Motors Automotive Industrials
3 FirstBank Holdings Financials Financial Services
4 PowerGen Energy Energy Energy & UtilitiesMap with Function
# Map using a function
def categorize_revenue(revenue):
if revenue >= 150000:
return 'Large'
elif revenue >= 100000:
return 'Medium'
else:
return 'Small'
companies['size_category'] = companies['revenues'].map(categorize_revenue)
print("Revenue-based size:")
companies[['company', 'revenues', 'size_category']]Output:
company revenues size_category
0 TechCorp Global 125000 Medium
1 FreshMart Inc 89000 Small
2 AutoDrive Motors 156000 Large
3 FirstBank Holdings 234000 Large
....map() vs .replace()
# .replace() works on entire DataFrame or specific values
companies_replaced = companies.replace({
'USA': 'Islamic Republic of USA',
'UK': 'United Arab Emirates'
})
print("Using .replace():")
print(companies_replaced[['company', 'country']].head()).replace() finds and replaces values anywhere in the DataFrame, while .map() transforms an entire Series.
Advanced .apply() Usage
The .apply() method is versatile—it works at different levels for different transformations.
Apply on Series (Element-wise)
# Format revenue values
companies['revenue_k'] = companies['revenues'].apply(lambda x: f"{x/1000:.0f}K")
print("Formatted revenues:")
companies[['company', 'revenues', 'revenue_k']].head()Output:
company revenues revenue_k
0 TechCorp Global 125000 125K
1 FreshMart Inc 89000 89K
2 AutoDrive Motors 156000 156K
...Apply on DataFrame Rows (axis=1)
Apply a function to each row, accessing multiple columns:
# Calculate multiple metrics at once
def calculate_metrics(row):
"""Calculate metrics from row data"""
margin = (row['profits'] / row['revenues']) * 100
status = 'Profitable' if row['profits'] > 0 else 'Loss'
return pd.Series({
'profit_margin': round(margin, 2),
'status': status
})
# Apply returns multiple columns!
metrics = companies.apply(calculate_metrics, axis=1)
print("Calculated metrics:")
print(metrics.head())Output:
profit_margin status
0 9.60 Profitable
1 9.55 Profitable
2 -1.92 Loss
3 19.23 Profitable
4 12.92 ProfitableCombine with original DataFrame:
# Add the new columns
companies_extended = pd.concat([companies, metrics], axis=1)
print("Extended DataFrame:")
companies_extended[['company', 'revenues', 'profits', 'profit_margin', 'status']].head()Apply on DataFrame Columns (axis=0)
Aggregate entire columns:
# Calculate range (max - min) for each numeric column
numeric_cols = companies[['revenues', 'profits']]
column_stats = numeric_cols.apply(lambda col: col.max() - col.min())
print("Range (max - min) for each column:")
print(column_stats)Output:
revenues 145000
profits 48000
dtype: int64Modern Element-wise Operations
For applying functions to every element in a DataFrame:
# Example: double every value
sample_data = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6]
})
# Modern way: use .map() on DataFrame
doubled = sample_data.map(lambda x: x * 2)
print("Every element doubled:")
print(doubled)Output:
A B
0 2 8
1 4 10
2 6 12Reshaping with .melt()
The .melt() function transforms wide format to long format—converting columns to rows.
Wide Format Example
# Wide format: Sales data with quarters as columns
sales_wide = pd.DataFrame({
'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors'],
'Q1': [30000, 25000, 40000],
'Q2': [32000, 26000, 38000],
'Q3': [35000, 27000, 42000],
'Q4': [38000, 28000, 45000]
})
print("Wide format (each quarter is a column):")
print(sales_wide)Output:
company Q1 Q2 Q3 Q4
0 TechCorp Global 30000 32000 35000 38000
1 FreshMart Inc 25000 26000 27000 28000
2 AutoDrive Motors 40000 38000 42000 45000One row per company, quarters as columns.
Melt to Long Format
# Melt to long format
sales_long = sales_wide.melt(
id_vars=['company'], # Column(s) to keep as identifier
var_name='quarter', # Name for the variable column
value_name='sales' # Name for the values column
)
print("Long format (each row is one observation):")
print(sales_long)Output:
company quarter sales
0 TechCorp Global Q1 30000
1 FreshMart Inc Q1 25000
2 AutoDrive Motors Q1 40000
3 TechCorp Global Q2 32000
4 FreshMart Inc Q2 26000
5 AutoDrive Motors Q2 38000
6 TechCorp Global Q3 35000
...Now each row represents one company-quarter observation.
Why Use Long Format?
# 1. Easier to group and aggregate
avg_by_quarter = sales_long.groupby('quarter')['sales'].mean()
print("Average sales by quarter:")
print(avg_by_quarter)Output:
quarter
Q1 31666.666667
Q2 32000.000000
Q3 34666.666667
Q4 37000.000000
Name: sales, dtype: float64Long format works better for groupby, plotting, and many analysis tools.
Melting Multiple Metrics
# Wide format with multiple metrics
financials = pd.DataFrame({
'company': ['TechCorp Global', 'FreshMart Inc'],
'revenue_2023': [100000, 80000],
'revenue_2024': [125000, 89000],
'profit_2023': [10000, 7000],
'profit_2024': [12000, 8500]
})
print("Wide format with multiple metrics:")
print(financials)Melt to long format:
# Melt
financials_long = financials.melt(
id_vars=['company'],
var_name='metric_year',
value_name='amount'
)
print("Long format:")
print(financials_long)Split into separate columns:
# Split metric and year
financials_long[['metric', 'year']] = financials_long['metric_year'].str.split('_', expand=True)
financials_long = financials_long.drop('metric_year', axis=1)
financials_long = financials_long[['company', 'year', 'metric', 'amount']]
print("Cleaned long format:")
print(financials_long)Output:
company year metric amount
0 TechCorp Global 2023 revenue 100000
1 FreshMart Inc 2023 revenue 80000
2 TechCorp Global 2024 revenue 125000
3 FreshMart Inc 2024 revenue 89000
4 TechCorp Global 2023 profit 10000
5 FreshMart Inc 2023 profit 7000
6 TechCorp Global 2024 profit 12000
7 FreshMart Inc 2024 profit 8500Reshaping with .pivot()
The .pivot() function transforms long format to wide format—the opposite of melt.
Pivot Back to Wide
# Start with long format
print("Long format:")
print(sales_long.head())
# Pivot back to wide format
sales_wide_again = sales_long.pivot(
index='company', # Row identifier
columns='quarter', # What becomes column names
values='sales' # What fills the cells
)
print("\nBack to wide format:")
print(sales_wide_again)Output:
quarter Q1 Q2 Q3 Q4
company
TechCorp Global 30000 32000 35000 38000
FreshMart Inc 25000 26000 27000 28000
AutoDrive Motors 40000 38000 42000 45000Reset index for cleaner format:
# Reset index to make company a regular column
sales_wide_clean = sales_wide_again.reset_index()
print("Wide format with reset index:")
print(sales_wide_clean)Pivot with Multiple Levels
# Pivot financials to show metrics as columns
financials_pivoted = financials_long.pivot(
index=['company', 'year'],
columns='metric',
values='amount'
)
print("Pivoted by metric:")
print(financials_pivoted)Output:
metric profit revenue
company year
TechCorp Global 2023 10000 100000
2024 12000 125000
FreshMart Inc 2023 7000 80000
2024 8500 89000Metrics are now columns!
Alternative Pivot Structure
# Pivot to show years as columns
financials_by_year = financials_long.pivot(
index=['company', 'metric'],
columns='year',
values='amount'
)
print("Pivoted by year:")
print(financials_by_year)Output:
year 2023 2024
company metric
TechCorp Global profit 10000 12000
revenue 100000 125000
FreshMart Inc profit 7000 8500
revenue 80000 89000Practice Exercises
Apply advanced transformations with these exercises.
Exercise 1: Using .map()
- Create a
regioncolumn by mapping: USA→‘Middle East’, UK→‘Gulf’, Germany→‘Gulf’ - Map first 4 letters of sector to create
sector_short(use lambda) - Map profit values to categories: Negative→‘Loss’, 0-15000→‘Low’, 15000+→‘High’
# Your code hereExercise 2: Using .apply()
- Use
.apply()to create arevenue_category(Small < 100k, Medium 100-150k, Large > 150k) - Use
.apply(axis=1)to create description: “{company} is a {sector} company in {country}” - Calculate sum of absolute profit values using
.apply()
# Your code hereHint
For row-wise operations accessing multiple columns, use axis=1!
Exercise 3: Melt and Pivot
Create this dataset:
monthly_sales = pd.DataFrame({
'product': ['Laptop', 'Phone', 'Tablet'],
'Jan': [100, 250, 80],
'Feb': [120, 240, 85],
'Mar': [110, 260, 90]
})Tasks:
- Melt to long format with columns: product, month, sales
- Calculate total sales per product from long format
- Pivot back to wide format
# Your code hereSummary
You now master advanced transformation techniques. Let’s review the key concepts.
Key Concepts
.map() for Value Replacement
- Replace values using dictionary
- Transform Series with function
- Cleaner than multiple conditions
.apply() at Different Levels
- Element-wise on Series
- Row-wise with
axis=1 - Column-wise with
axis=0 - Can return multiple values
.melt() for Wide → Long
- Converts columns to rows
- Better for grouping and plotting
- Standard format for analysis
.pivot() for Long → Wide
- Converts rows to columns
- Better for human readability
- Cross-tabulation format
Syntax Reference
# MAP: Replace values
df['new'] = df['col'].map({'old': 'new', ...})
df['new'] = df['col'].map(function)
# APPLY: Transform with function
df['col'].apply(func) # Element-wise on Series
df.apply(func, axis=1) # Row-wise
df.apply(func, axis=0) # Column-wise
# MELT: Wide → Long
df.melt(
id_vars=['id_col'],
var_name='variable',
value_name='value'
)
# PIVOT: Long → Wide
df.pivot(
index='row_id',
columns='col_names',
values='values'
)When to Use What
| Task | Method |
|---|---|
| Replace specific values | .map() with dict |
| Transform each element | .apply() or .map() |
| Multi-column calculation | .apply(axis=1) |
| Quarters/months as columns → rows | .melt() |
| Categories as rows → columns | .pivot() |
| Value lookup/translation | .map() |
| Complex row logic | .apply(axis=1) |
Wide vs Long Format
Wide format (good for humans):
Product | Jan | Feb | Mar
Laptop | 100 | 120 | 110
Phone | 250 | 240 | 260Long format (good for analysis):
Product | Month | Sales
Laptop | Jan | 100
Laptop | Feb | 120
Laptop | Mar | 110
Phone | Jan | 250
...Important Reminders
.map()vs.replace():.map()transforms entire Series,.replace()replaces specific valuesaxis=1for rows: When using.apply()on rows- Long format advantages: Better for groupby, plotting, statistical analysis
- Wide format advantages: Easier to read, better for reports
- Reset index after pivot: Use
.reset_index()for clean output .melt()and.pivot()are opposites: They reverse each other
Next Steps
You now master advanced transformations for complex data manipulation. In the next lesson, you will learn to handle missing data—detecting, removing, and filling missing values in real-world datasets.
Continue to Lesson 11 - Handling Missing Data
Learn to detect, remove, and fill missing values
Back to Lesson 9 - Adding and Modifying Columns
Review creating and transforming columns
Master Advanced Transformations
These transformation techniques handle complex real-world scenarios—value recoding, multi-column calculations, and data restructuring. Combined with your filtering and selection skills, you can now perform sophisticated data manipulations.
Use these tools to transform data into the exact format your analysis requires!