Lesson 10 - Apply, Map, and Transform Functions

Advanced Transformation Techniques

You learned basic column operations. Now you will master advanced transformation techniques—replacing values with .map(), applying functions at different levels with .apply(), and reshaping data between wide and long formats.

By the end of this lesson, you will be able to:

  • Replace values using .map() with dictionaries
  • Apply functions at element, row, and column levels
  • Reshape wide data to long format with .melt()
  • Reshape long data to wide format with .pivot()
  • Understand when to use each transformation method

These techniques handle complex transformations that simple arithmetic cannot—recoding values, multi-column logic, and restructuring data for different analysis needs.


Using .map() to Replace Values

The .map() method replaces values in a Series based on a dictionary or function.

Sample Data

import pandas as pd
import numpy as np

# Create companies dataset
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation'],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000]
})

print("Companies dataset:")
companies.head()

Map with Dictionary

# Map country codes using dictionary
country_codes = {
    'USA': 'IR',
    'UK': 'AE',
    'Germany': 'QA'
}

companies['country_code'] = companies['country'].map(country_codes)

print("With country codes:")
companies[['company', 'country', 'country_code']]

Output:

           company country country_code
0         TechCorp Global    USA           IR
1       FreshMart Inc    USA           IR
2      AutoDrive Motors    USA           IR
3      FirstBank Holdings     UK           AE
4     PowerGen Energy   Germany           QA
...

Each value in ‘country’ is replaced by its corresponding code.

Another Mapping Example

# Map sector to industry category
industry_category = {
    'Technology': 'Tech & Telecom',
    'Food': 'Consumer Goods',
    'Automotive': 'Industrials',
    'Financials': 'Financial Services',
    'Energy': 'Energy & Utilities',
    'Healthcare': 'Healthcare',
    'Retail': 'Consumer Goods',
    'Transportation': 'Industrials'
}

companies['industry'] = companies['sector'].map(industry_category)

print("With industry categories:")
companies[['company', 'sector', 'industry']].head()

Output:

        company       sector           industry
0      TechCorp Global   Technology    Tech & Telecom
1    FreshMart Inc         Food    Consumer Goods
2   AutoDrive Motors   Automotive        Industrials
3   FirstBank Holdings   Financials  Financial Services
4 PowerGen Energy       Energy    Energy & Utilities

Map with Function

# Map using a function
def categorize_revenue(revenue):
    if revenue >= 150000:
        return 'Large'
    elif revenue >= 100000:
        return 'Medium'
    else:
        return 'Small'

companies['size_category'] = companies['revenues'].map(categorize_revenue)

print("Revenue-based size:")
companies[['company', 'revenues', 'size_category']]

Output:

           company  revenues size_category
0         TechCorp Global    125000        Medium
1       FreshMart Inc     89000         Small
2      AutoDrive Motors    156000         Large
3      FirstBank Holdings    234000         Large
...

.map() vs .replace()

# .replace() works on entire DataFrame or specific values
companies_replaced = companies.replace({
    'USA': 'Islamic Republic of USA',
    'UK': 'United Arab Emirates'
})

print("Using .replace():")
print(companies_replaced[['company', 'country']].head())

.replace() finds and replaces values anywhere in the DataFrame, while .map() transforms an entire Series.


Advanced .apply() Usage

The .apply() method is versatile—it works at different levels for different transformations.

Apply on Series (Element-wise)

# Format revenue values
companies['revenue_k'] = companies['revenues'].apply(lambda x: f"{x/1000:.0f}K")

print("Formatted revenues:")
companies[['company', 'revenues', 'revenue_k']].head()

Output:

        company  revenues revenue_k
0      TechCorp Global    125000      125K
1    FreshMart Inc     89000       89K
2   AutoDrive Motors    156000      156K
...

Apply on DataFrame Rows (axis=1)

Apply a function to each row, accessing multiple columns:

# Calculate multiple metrics at once
def calculate_metrics(row):
    """Calculate metrics from row data"""
    margin = (row['profits'] / row['revenues']) * 100
    status = 'Profitable' if row['profits'] > 0 else 'Loss'
    return pd.Series({
        'profit_margin': round(margin, 2),
        'status': status
    })

# Apply returns multiple columns!
metrics = companies.apply(calculate_metrics, axis=1)

print("Calculated metrics:")
print(metrics.head())

Output:

   profit_margin      status
0           9.60  Profitable
1           9.55  Profitable
2          -1.92        Loss
3          19.23  Profitable
4          12.92  Profitable

Combine with original DataFrame:

# Add the new columns
companies_extended = pd.concat([companies, metrics], axis=1)

print("Extended DataFrame:")
companies_extended[['company', 'revenues', 'profits', 'profit_margin', 'status']].head()

Apply on DataFrame Columns (axis=0)

Aggregate entire columns:

# Calculate range (max - min) for each numeric column
numeric_cols = companies[['revenues', 'profits']]
column_stats = numeric_cols.apply(lambda col: col.max() - col.min())

print("Range (max - min) for each column:")
print(column_stats)

Output:

revenues    145000
profits      48000
dtype: int64

Modern Element-wise Operations

For applying functions to every element in a DataFrame:

# Example: double every value
sample_data = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Modern way: use .map() on DataFrame
doubled = sample_data.map(lambda x: x * 2)

print("Every element doubled:")
print(doubled)

Output:

   A   B
0  2   8
1  4  10
2  6  12

Reshaping with .melt()

The .melt() function transforms wide format to long format—converting columns to rows.

Wide Format Example

# Wide format: Sales data with quarters as columns
sales_wide = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors'],
    'Q1': [30000, 25000, 40000],
    'Q2': [32000, 26000, 38000],
    'Q3': [35000, 27000, 42000],
    'Q4': [38000, 28000, 45000]
})

print("Wide format (each quarter is a column):")
print(sales_wide)

Output:

       company     Q1     Q2     Q3     Q4
0     TechCorp Global  30000  32000  35000  38000
1   FreshMart Inc  25000  26000  27000  28000
2  AutoDrive Motors  40000  38000  42000  45000

One row per company, quarters as columns.

Melt to Long Format

# Melt to long format
sales_long = sales_wide.melt(
    id_vars=['company'],           # Column(s) to keep as identifier
    var_name='quarter',             # Name for the variable column
    value_name='sales'              # Name for the values column
)

print("Long format (each row is one observation):")
print(sales_long)

Output:

       company quarter  sales
0     TechCorp Global      Q1  30000
1   FreshMart Inc      Q1  25000
2  AutoDrive Motors      Q1  40000
3     TechCorp Global      Q2  32000
4   FreshMart Inc      Q2  26000
5  AutoDrive Motors      Q2  38000
6     TechCorp Global      Q3  35000
...

Now each row represents one company-quarter observation.

Why Use Long Format?

# 1. Easier to group and aggregate
avg_by_quarter = sales_long.groupby('quarter')['sales'].mean()
print("Average sales by quarter:")
print(avg_by_quarter)

Output:

quarter
Q1    31666.666667
Q2    32000.000000
Q3    34666.666667
Q4    37000.000000
Name: sales, dtype: float64

Long format works better for groupby, plotting, and many analysis tools.

Melting Multiple Metrics

# Wide format with multiple metrics
financials = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc'],
    'revenue_2023': [100000, 80000],
    'revenue_2024': [125000, 89000],
    'profit_2023': [10000, 7000],
    'profit_2024': [12000, 8500]
})

print("Wide format with multiple metrics:")
print(financials)

Melt to long format:

# Melt
financials_long = financials.melt(
    id_vars=['company'],
    var_name='metric_year',
    value_name='amount'
)

print("Long format:")
print(financials_long)

Split into separate columns:

# Split metric and year
financials_long[['metric', 'year']] = financials_long['metric_year'].str.split('_', expand=True)
financials_long = financials_long.drop('metric_year', axis=1)
financials_long = financials_long[['company', 'year', 'metric', 'amount']]

print("Cleaned long format:")
print(financials_long)

Output:

       company    year   metric  amount
0     TechCorp Global    2023  revenue  100000
1   FreshMart Inc    2023  revenue   80000
2     TechCorp Global    2024  revenue  125000
3   FreshMart Inc    2024  revenue   89000
4     TechCorp Global    2023   profit   10000
5   FreshMart Inc    2023   profit    7000
6     TechCorp Global    2024   profit   12000
7   FreshMart Inc    2024   profit    8500

Reshaping with .pivot()

The .pivot() function transforms long format to wide format—the opposite of melt.

Pivot Back to Wide

# Start with long format
print("Long format:")
print(sales_long.head())

# Pivot back to wide format
sales_wide_again = sales_long.pivot(
    index='company',      # Row identifier
    columns='quarter',    # What becomes column names
    values='sales'        # What fills the cells
)

print("\nBack to wide format:")
print(sales_wide_again)

Output:

quarter         Q1     Q2     Q3     Q4
company
TechCorp Global     30000  32000  35000  38000
FreshMart Inc   25000  26000  27000  28000
AutoDrive Motors  40000  38000  42000  45000

Reset index for cleaner format:

# Reset index to make company a regular column
sales_wide_clean = sales_wide_again.reset_index()

print("Wide format with reset index:")
print(sales_wide_clean)

Pivot with Multiple Levels

# Pivot financials to show metrics as columns
financials_pivoted = financials_long.pivot(
    index=['company', 'year'],
    columns='metric',
    values='amount'
)

print("Pivoted by metric:")
print(financials_pivoted)

Output:

metric             profit  revenue
company    year
TechCorp Global   2023     10000   100000
           2024     12000   125000
FreshMart Inc 2023      7000    80000
           2024      8500    89000

Metrics are now columns!

Alternative Pivot Structure

# Pivot to show years as columns
financials_by_year = financials_long.pivot(
    index=['company', 'metric'],
    columns='year',
    values='amount'
)

print("Pivoted by year:")
print(financials_by_year)

Output:

year                2023    2024
company    metric
TechCorp Global   profit  10000   12000
           revenue 100000  125000
FreshMart Inc profit   7000    8500
           revenue  80000   89000

Practice Exercises

Apply advanced transformations with these exercises.

Exercise 1: Using .map()

  1. Create a region column by mapping: USA→‘Middle East’, UK→‘Gulf’, Germany→‘Gulf’
  2. Map first 4 letters of sector to create sector_short (use lambda)
  3. Map profit values to categories: Negative→‘Loss’, 0-15000→‘Low’, 15000+→‘High’
# Your code here

Exercise 2: Using .apply()

  1. Use .apply() to create a revenue_category (Small < 100k, Medium 100-150k, Large > 150k)
  2. Use .apply(axis=1) to create description: “{company} is a {sector} company in {country}”
  3. Calculate sum of absolute profit values using .apply()
# Your code here

Hint

For row-wise operations accessing multiple columns, use axis=1!

Exercise 3: Melt and Pivot

Create this dataset:

monthly_sales = pd.DataFrame({
    'product': ['Laptop', 'Phone', 'Tablet'],
    'Jan': [100, 250, 80],
    'Feb': [120, 240, 85],
    'Mar': [110, 260, 90]
})

Tasks:

  1. Melt to long format with columns: product, month, sales
  2. Calculate total sales per product from long format
  3. Pivot back to wide format
# Your code here

Summary

You now master advanced transformation techniques. Let’s review the key concepts.

Key Concepts

.map() for Value Replacement

  • Replace values using dictionary
  • Transform Series with function
  • Cleaner than multiple conditions

.apply() at Different Levels

  • Element-wise on Series
  • Row-wise with axis=1
  • Column-wise with axis=0
  • Can return multiple values

.melt() for Wide → Long

  • Converts columns to rows
  • Better for grouping and plotting
  • Standard format for analysis

.pivot() for Long → Wide

  • Converts rows to columns
  • Better for human readability
  • Cross-tabulation format

Syntax Reference

# MAP: Replace values
df['new'] = df['col'].map({'old': 'new', ...})
df['new'] = df['col'].map(function)

# APPLY: Transform with function
df['col'].apply(func)              # Element-wise on Series
df.apply(func, axis=1)             # Row-wise
df.apply(func, axis=0)             # Column-wise

# MELT: Wide → Long
df.melt(
    id_vars=['id_col'],
    var_name='variable',
    value_name='value'
)

# PIVOT: Long → Wide
df.pivot(
    index='row_id',
    columns='col_names',
    values='values'
)

When to Use What

TaskMethod
Replace specific values.map() with dict
Transform each element.apply() or .map()
Multi-column calculation.apply(axis=1)
Quarters/months as columns → rows.melt()
Categories as rows → columns.pivot()
Value lookup/translation.map()
Complex row logic.apply(axis=1)

Wide vs Long Format

Wide format (good for humans):

  Product | Jan | Feb | Mar
  Laptop  | 100 | 120 | 110
  Phone   | 250 | 240 | 260

Long format (good for analysis):

  Product | Month | Sales
  Laptop  | Jan   | 100
  Laptop  | Feb   | 120
  Laptop  | Mar   | 110
  Phone   | Jan   | 250
  ...

Important Reminders

  • .map() vs .replace(): .map() transforms entire Series, .replace() replaces specific values
  • axis=1 for rows: When using .apply() on rows
  • Long format advantages: Better for groupby, plotting, statistical analysis
  • Wide format advantages: Easier to read, better for reports
  • Reset index after pivot: Use .reset_index() for clean output
  • .melt() and .pivot() are opposites: They reverse each other

Next Steps

You now master advanced transformations for complex data manipulation. In the next lesson, you will learn to handle missing data—detecting, removing, and filling missing values in real-world datasets.

Continue to Lesson 11 - Handling Missing Data

Learn to detect, remove, and fill missing values

Back to Lesson 9 - Adding and Modifying Columns

Review creating and transforming columns


Master Advanced Transformations

These transformation techniques handle complex real-world scenarios—value recoding, multi-column calculations, and data restructuring. Combined with your filtering and selection skills, you can now perform sophisticated data manipulations.

Use these tools to transform data into the exact format your analysis requires!