Lesson 9 - Adding and Modifying Columns

Transform Your Data

You can select, filter, and sort existing data. Now you will learn to create new columns and modify existing ones—deriving insights through calculations, categorizations, and transformations.

By the end of this lesson, you will be able to:

  • Create new columns from arithmetic calculations
  • Build conditional columns with np.where()
  • Apply custom functions with .apply()
  • Delete unwanted columns with .drop()
  • Rename columns for clarity
  • Create complex transformations

Adding and modifying columns is how you derive new insights from existing data—profit margins from revenues and profits, size categories from employee counts, and performance ratings from multiple metrics.


Creating Calculated Columns

Create new columns using arithmetic operations on existing columns.

Sample Data

import pandas as pd
import numpy as np

# Create companies dataset
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000],
    'employees': [1200, 890, 2300, 5600, 3400,
                  2100, 4500, 8900],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany']
})

print("Original DataFrame:")
companies.head()

Simple Calculations

# Add profit margin column (profits / revenues * 100)
companies['profit_margin'] = (companies['profits'] / companies['revenues']) * 100

print("With profit margin:")
companies[['company', 'revenues', 'profits', 'profit_margin']]

Output:

           company  revenues  profits  profit_margin
0         TechCorp Global    125000    12000       9.600000
1       FreshMart Inc     89000     8500       9.550562
2      AutoDrive Motors    156000    -3000      -1.923077
3      FirstBank Holdings    234000    45000      19.230769
...

New column created from existing columns!

# Add revenue per employee
companies['revenue_per_employee'] = companies['revenues'] / companies['employees']

print("Revenue efficiency:")
companies[['company', 'revenues', 'employees', 'revenue_per_employee']]

Multiple Columns at Once

# Create several calculated columns
companies['profit_per_employee'] = companies['profits'] / companies['employees']
companies['total_costs'] = companies['revenues'] - companies['profits']

print("Multiple new columns:")
print(companies.columns.tolist())

Output shows all columns including new ones.

Rounding for Readability

# Round to 2 decimal places
companies['profit_margin'] = companies['profit_margin'].round(2)
companies['revenue_per_employee'] = companies['revenue_per_employee'].round(2)
companies['profit_per_employee'] = companies['profit_per_employee'].round(2)

print("Rounded values:")
companies[['company', 'profit_margin', 'revenue_per_employee']]

Output:

           company  profit_margin  revenue_per_employee
0         TechCorp Global           9.60                104.17
1       FreshMart Inc           9.55                100.00
2      AutoDrive Motors          -1.92                 67.83
...

Conditional Columns with np.where()

Use np.where() to create columns based on conditions.

Syntax: np.where(condition, value_if_true, value_if_false)

Simple If/Else

# Create profitability status column
companies['status'] = np.where(
    companies['profits'] > 0,
    'Profitable',
    'Loss'
)

print("Profitability status:")
companies[['company', 'profits', 'status']]

Output:

           company  profits      status
0         TechCorp Global    12000  Profitable
1       FreshMart Inc     8500  Profitable
2      AutoDrive Motors    -3000        Loss
3      FirstBank Holdings    45000  Profitable
...

If profits > 0, status is ‘Profitable’, otherwise ‘Loss’.

Another Example

# Create size category based on employees
companies['size'] = np.where(
    companies['employees'] >= 5000,
    'Large',
    'Small/Medium'
)

print("Company size:")
companies[['company', 'employees', 'size']]

Nested np.where() for Multiple Categories

For 3+ categories, nest np.where() calls:

# Revenue category (3 levels)
companies['revenue_category'] = np.where(
    companies['revenues'] >= 150000,
    'High',
    np.where(
        companies['revenues'] >= 100000,
        'Medium',
        'Low'
    )
)

print("Revenue categories:")
companies[['company', 'revenues', 'revenue_category']]

Output:

           company  revenues revenue_category
0         TechCorp Global    125000           Medium
1       FreshMart Inc     89000              Low
2      AutoDrive Motors    156000             High
3      FirstBank Holdings    234000             High
...

Logic:

  • If revenues >= 150000 → ‘High’
  • Else if revenues >= 100000 → ‘Medium’
  • Else → ‘Low’

Four-Level Example

# Performance rating based on profit margin
companies['performance'] = np.where(
    companies['profit_margin'] >= 15,
    'Excellent',
    np.where(
        companies['profit_margin'] >= 8,
        'Good',
        np.where(
            companies['profit_margin'] >= 0,
            'Fair',
            'Poor'
        )
    )
)

print("Performance rating:")
companies[['company', 'profit_margin', 'performance']]

Output:

           company  profit_margin performance
0         TechCorp Global           9.60        Good
1       FreshMart Inc           9.55        Good
2      AutoDrive Motors          -1.92        Poor
3      FirstBank Holdings          19.23   Excellent
...

Using .apply() for Complex Transformations

For complex logic that doesn’t fit np.where(), use .apply() with custom functions.

Custom Function on Rows

# Define custom function
def categorize_company(row):
    """Categorize company based on multiple criteria"""
    if row['profits'] < 0:
        return 'Struggling'
    elif row['profit_margin'] >= 15 and row['employees'] >= 3000:
        return 'Top Performer'
    elif row['profit_margin'] >= 10:
        return 'Strong'
    else:
        return 'Average'

# Apply function to each row (axis=1)
companies['category'] = companies.apply(categorize_company, axis=1)

print("Company categories:")
companies[['company', 'profit_margin', 'employees', 'category']]

Output:

           company  profit_margin  employees       category
0         TechCorp Global           9.60       1200        Average
1       FreshMart Inc           9.55        890        Average
2      AutoDrive Motors          -1.92       2300     Struggling
3      FirstBank Holdings          19.23       5600  Top Performer
...

The function receives each row as a Series and returns a value.

Lambda Functions for Simple Transformations

# Convert company names to uppercase
companies['company_upper'] = companies['company'].apply(lambda x: x.upper())

print("Uppercase names:")
companies[['company', 'company_upper']].head()

Output:

        company company_upper
0      TechCorp Global      ALI TECH
1    FreshMart Inc    SARA FOODS
2   AutoDrive Motors   REZA MOTORS
...

Extract Information

# Extract first word from company name
companies['owner'] = companies['company'].apply(lambda x: x.split()[0])

print("Extracted owner names:")
companies[['company', 'owner']]

Output:

           company     owner
0         TechCorp Global       Ali
1       FreshMart Inc      Sara
2      AutoDrive Motors      Reza
3      FirstBank Holdings    Maryam
...

Format Values

# Format revenues with 'k' suffix
def format_revenue(value):
    """Format revenue as XXXk"""
    return f"{value/1000:.0f}k"

companies['revenue_formatted'] = companies['revenues'].apply(format_revenue)

print("Formatted revenues:")
companies[['company', 'revenues', 'revenue_formatted']]

Output:

           company  revenues revenue_formatted
0         TechCorp Global    125000              125k
1       FreshMart Inc     89000               89k
2      AutoDrive Motors    156000              156k
...

Renaming and Dropping Columns

Manage column names and remove unwanted columns.

Rename Columns

# Rename columns
companies_renamed = companies.rename(columns={
    'company': 'company_name',
    'revenues': 'total_revenue',
    'profits': 'net_profit'
})

print("Renamed columns:")
print(companies_renamed.columns.tolist())

Output:

['company_name', 'sector', 'total_revenue', 'net_profit', 'employees', 'country', ...]

Drop Single Column

# Drop one column
companies_clean = companies.drop('company_upper', axis=1)

print("After dropping company_upper:")
print(companies_clean.columns.tolist())

Drop Multiple Columns

# Drop several columns
companies_minimal = companies.drop(
    ['company_upper', 'revenue_formatted', 'owner'],
    axis=1
)

print("After dropping multiple columns:")
print(companies_minimal.columns.tolist())

Drop with inplace

# Drop columns inplace (modify original)
companies_copy = companies.copy()
companies_copy.drop(['total_costs', 'profit_per_employee'], axis=1, inplace=True)

print("After inplace drop:")
print(companies_copy.columns.tolist())

Select Specific Columns (Alternative to Drop)

# Keep only specific columns
companies_essential = companies[[
    'company', 'sector', 'country',
    'revenues', 'profits', 'employees',
    'profit_margin'
]]

print("Essential columns only:")
companies_essential.head()

Practice Exercises

Apply column operations with these exercises.

Exercise 1: Calculated Columns

Create these new columns:

  1. cost_ratio = (revenues - profits) / revenues
  2. efficiency_score = profits / employees (round to 2 decimals)
  3. revenue_millions = revenues / 1,000,000 (round to 3 decimals)
# Your code here

Exercise 2: Conditional Columns

Using np.where(), create:

  1. profit_status = ‘High’ if profits >= 20000, ‘Medium’ if >= 10000, else ‘Low’
  2. alert = ‘Check!’ if profit_margin < 5, else ‘OK’
# Your code here

Hint

For 3 categories, nest one np.where() inside another!

Exercise 3: Using .apply()

  1. Create company_code = first 3 letters of company name in uppercase
  2. Create grade using a custom function:
    • ‘A’ if profit_margin >= 15
    • ‘B’ if profit_margin >= 10
    • ‘C’ if profit_margin >= 5
    • ‘F’ otherwise
# Your code here

Exercise 4: Challenge - Complete Transformation

Starting with original data:

  1. Add profit_margin and revenue_per_employee columns
  2. Add performance column: ‘Top’ if both profit_margin >= 10 AND revenue_per_employee >= 100, else ‘Standard’
  3. Rename company to company_name
  4. Show only companies with ‘Top’ performance
# Your code here

Summary

You now master adding and modifying columns. Let’s review the key concepts.

Key Concepts

Calculated Columns

  • Create from arithmetic on existing columns
  • Assignment: df['new'] = calculation
  • Round for readability

Conditional Columns

  • Use np.where() for if/else logic
  • Nest for 3+ categories
  • Simpler than loops

Custom Transformations

  • Use .apply() for complex logic
  • Functions receive row/column as input
  • Lambda for simple operations

Column Management

  • Rename with .rename()
  • Drop with .drop(axis=1)
  • Select subset of columns

Syntax Reference

# Simple calculation
df['new_col'] = df['col1'] / df['col2']

# Conditional (2 options)
df['category'] = np.where(df['value'] > 100, 'High', 'Low')

# Nested conditions (3+ options)
df['tier'] = np.where(
    df['score'] >= 90, 'A',
    np.where(df['score'] >= 80, 'B', 'C')
)

# Custom function (complex logic)
def categorize(row):
    if row['x'] > 10 and row['y'] < 5:
        return 'Type A'
    else:
        return 'Type B'

df['type'] = df.apply(categorize, axis=1)

# Lambda (simple transformation)
df['upper'] = df['name'].apply(lambda x: x.upper())

# Rename columns
df = df.rename(columns={'old': 'new', 'old2': 'new2'})

# Drop columns
df = df.drop(['col1', 'col2'], axis=1)
df.drop('col', axis=1, inplace=True)  # Modify original

# Round values
df['col'] = df['col'].round(2)

When to Use What

TaskMethod
Simple calculationdf['new'] = df['a'] + df['b']
If/else (2 options)np.where()
If/elif/else (3+ options)Nested np.where()
Complex logic.apply() with function
String transformation.apply() with lambda
Multiple criteriaCustom function with .apply()

Important Reminders

  • Create new columns: Use assignment df['new_col'] = ...
  • Vectorized operations: No loops needed
  • np.where() nesting: Inner np.where() is the else clause
  • axis=1 for rows: When using .apply() on rows
  • Round for display: Use .round(decimals)
  • Drop axis=1: For columns (axis=0 for rows)
  • Rename returns new: Unless inplace=True

Next Steps

You can now create and transform columns to derive insights. In the next lesson, you will learn advanced transformation techniques with .apply(), .map(), and .melt()—powerful tools for complex data reshaping.

Continue to Lesson 10 - Apply, Map, and Transform

Learn advanced transformation and reshaping techniques

Back to Lesson 8 - Sorting and Ranking

Review ordering data and finding top values


Master Data Transformation

Adding and modifying columns is how you create insights from raw data. Profit margins, efficiency ratios, performance categories—all derived from combining and transforming existing columns.

Use these techniques to turn data into insights!