Lesson 9 - Adding and Modifying Columns

Transform Your Data

You can select, filter, and sort existing data. Now you will learn to create new columns and modify existing ones—deriving insights through calculations, categorizations, and transformations.

By the end of this lesson, you will be able to:

Create new columns from arithmetic calculations
Build conditional columns with np.where()
Apply custom functions with .apply()
Delete unwanted columns with .drop()
Rename columns for clarity
Create complex transformations

Adding and modifying columns is how you derive new insights from existing data—profit margins from revenues and profits, size categories from employee counts, and performance ratings from multiple metrics.

Creating Calculated Columns

Create new columns using arithmetic operations on existing columns.

Sample Data

import pandas as pd
import numpy as np

# Create companies dataset
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000],
    'employees': [1200, 890, 2300, 5600, 3400,
                  2100, 4500, 8900],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany']
})

print("Original DataFrame:")
companies.head()

Simple Calculations

# Add profit margin column (profits / revenues * 100)
companies['profit_margin'] = (companies['profits'] / companies['revenues']) * 100

print("With profit margin:")
companies[['company', 'revenues', 'profits', 'profit_margin']]

Output:

           company  revenues  profits  profit_margin
0         TechCorp Global    125000    12000       9.600000
1       FreshMart Inc     89000     8500       9.550562
2      AutoDrive Motors    156000    -3000      -1.923077
3      FirstBank Holdings    234000    45000      19.230769
...

New column created from existing columns!

# Add revenue per employee
companies['revenue_per_employee'] = companies['revenues'] / companies['employees']

print("Revenue efficiency:")
companies[['company', 'revenues', 'employees', 'revenue_per_employee']]

Multiple Columns at Once

# Create several calculated columns
companies['profit_per_employee'] = companies['profits'] / companies['employees']
companies['total_costs'] = companies['revenues'] - companies['profits']

print("Multiple new columns:")
print(companies.columns.tolist())

Output shows all columns including new ones.

Rounding for Readability

# Round to 2 decimal places
companies['profit_margin'] = companies['profit_margin'].round(2)
companies['revenue_per_employee'] = companies['revenue_per_employee'].round(2)
companies['profit_per_employee'] = companies['profit_per_employee'].round(2)

print("Rounded values:")
companies[['company', 'profit_margin', 'revenue_per_employee']]

Output:

           company  profit_margin  revenue_per_employee
0         TechCorp Global           9.60                104.17
1       FreshMart Inc           9.55                100.00
2      AutoDrive Motors          -1.92                 67.83
...

Conditional Columns with np.where()

Use np.where() to create columns based on conditions.

Syntax: np.where(condition, value_if_true, value_if_false)

Simple If/Else

# Create profitability status column
companies['status'] = np.where(
    companies['profits'] > 0,
    'Profitable',
    'Loss'
)

print("Profitability status:")
companies[['company', 'profits', 'status']]

Output:

           company  profits      status
0         TechCorp Global    12000  Profitable
1       FreshMart Inc     8500  Profitable
2      AutoDrive Motors    -3000        Loss
3      FirstBank Holdings    45000  Profitable
...

If profits > 0, status is ‘Profitable’, otherwise ‘Loss’.

Another Example

# Create size category based on employees
companies['size'] = np.where(
    companies['employees'] >= 5000,
    'Large',
    'Small/Medium'
)

print("Company size:")
companies[['company', 'employees', 'size']]

Nested np.where() for Multiple Categories

For 3+ categories, nest np.where() calls:

# Revenue category (3 levels)
companies['revenue_category'] = np.where(
    companies['revenues'] >= 150000,
    'High',
    np.where(
        companies['revenues'] >= 100000,
        'Medium',
        'Low'
    )
)

print("Revenue categories:")
companies[['company', 'revenues', 'revenue_category']]

Output:

           company  revenues revenue_category
0         TechCorp Global    125000           Medium
1       FreshMart Inc     89000              Low
2      AutoDrive Motors    156000             High
3      FirstBank Holdings    234000             High
...

Logic:

If revenues >= 150000 → ‘High’
Else if revenues >= 100000 → ‘Medium’
Else → ‘Low’

Four-Level Example

# Performance rating based on profit margin
companies['performance'] = np.where(
    companies['profit_margin'] >= 15,
    'Excellent',
    np.where(
        companies['profit_margin'] >= 8,
        'Good',
        np.where(
            companies['profit_margin'] >= 0,
            'Fair',
            'Poor'
        )
    )
)

print("Performance rating:")
companies[['company', 'profit_margin', 'performance']]

Output:

           company  profit_margin performance
0         TechCorp Global           9.60        Good
1       FreshMart Inc           9.55        Good
2      AutoDrive Motors          -1.92        Poor
3      FirstBank Holdings          19.23   Excellent
...

Using .apply() for Complex Transformations

For complex logic that doesn’t fit np.where(), use .apply() with custom functions.

Custom Function on Rows

# Define custom function
def categorize_company(row):
    """Categorize company based on multiple criteria"""
    if row['profits'] < 0:
        return 'Struggling'
    elif row['profit_margin'] >= 15 and row['employees'] >= 3000:
        return 'Top Performer'
    elif row['profit_margin'] >= 10:
        return 'Strong'
    else:
        return 'Average'

# Apply function to each row (axis=1)
companies['category'] = companies.apply(categorize_company, axis=1)

print("Company categories:")
companies[['company', 'profit_margin', 'employees', 'category']]

Output:

           company  profit_margin  employees       category
0         TechCorp Global           9.60       1200        Average
1       FreshMart Inc           9.55        890        Average
2      AutoDrive Motors          -1.92       2300     Struggling
3      FirstBank Holdings          19.23       5600  Top Performer
...

The function receives each row as a Series and returns a value.

Lambda Functions for Simple Transformations

# Convert company names to uppercase
companies['company_upper'] = companies['company'].apply(lambda x: x.upper())

print("Uppercase names:")
companies[['company', 'company_upper']].head()

Output:

        company company_upper
0      TechCorp Global      ALI TECH
1    FreshMart Inc    SARA FOODS
2   AutoDrive Motors   REZA MOTORS
...

Extract Information

# Extract first word from company name
companies['owner'] = companies['company'].apply(lambda x: x.split()[0])

print("Extracted owner names:")
companies[['company', 'owner']]

Output:

           company     owner
0         TechCorp Global       Ali
1       FreshMart Inc      Sara
2      AutoDrive Motors      Reza
3      FirstBank Holdings    Maryam
...

Format Values

# Format revenues with 'k' suffix
def format_revenue(value):
    """Format revenue as XXXk"""
    return f"{value/1000:.0f}k"

companies['revenue_formatted'] = companies['revenues'].apply(format_revenue)

print("Formatted revenues:")
companies[['company', 'revenues', 'revenue_formatted']]

Output:

           company  revenues revenue_formatted
0         TechCorp Global    125000              125k
1       FreshMart Inc     89000               89k
2      AutoDrive Motors    156000              156k
...

Renaming and Dropping Columns

Manage column names and remove unwanted columns.

Rename Columns

# Rename columns
companies_renamed = companies.rename(columns={
    'company': 'company_name',
    'revenues': 'total_revenue',
    'profits': 'net_profit'
})

print("Renamed columns:")
print(companies_renamed.columns.tolist())

Output:

['company_name', 'sector', 'total_revenue', 'net_profit', 'employees', 'country', ...]

Drop Single Column

# Drop one column
companies_clean = companies.drop('company_upper', axis=1)

print("After dropping company_upper:")
print(companies_clean.columns.tolist())

Drop Multiple Columns

# Drop several columns
companies_minimal = companies.drop(
    ['company_upper', 'revenue_formatted', 'owner'],
    axis=1
)

print("After dropping multiple columns:")
print(companies_minimal.columns.tolist())

Drop with inplace

# Drop columns inplace (modify original)
companies_copy = companies.copy()
companies_copy.drop(['total_costs', 'profit_per_employee'], axis=1, inplace=True)

print("After inplace drop:")
print(companies_copy.columns.tolist())

Select Specific Columns (Alternative to Drop)

# Keep only specific columns
companies_essential = companies[[
    'company', 'sector', 'country',
    'revenues', 'profits', 'employees',
    'profit_margin'
]]

print("Essential columns only:")
companies_essential.head()

Practice Exercises

Apply column operations with these exercises.

Exercise 1: Calculated Columns

Create these new columns:

cost_ratio = (revenues - profits) / revenues
efficiency_score = profits / employees (round to 2 decimals)
revenue_millions = revenues / 1,000,000 (round to 3 decimals)

# Your code here

Exercise 2: Conditional Columns

Using np.where(), create:

profit_status = ‘High’ if profits >= 20000, ‘Medium’ if >= 10000, else ‘Low’
alert = ‘Check!’ if profit_margin < 5, else ‘OK’

# Your code here

Hint

For 3 categories, nest one np.where() inside another!

Exercise 3: Using .apply()

Create company_code = first 3 letters of company name in uppercase
Create grade using a custom function:
- ‘A’ if profit_margin >= 15
- ‘B’ if profit_margin >= 10
- ‘C’ if profit_margin >= 5
- ‘F’ otherwise

# Your code here

Exercise 4: Challenge - Complete Transformation

Starting with original data:

Add profit_margin and revenue_per_employee columns
Add performance column: ‘Top’ if both profit_margin >= 10 AND revenue_per_employee >= 100, else ‘Standard’
Rename company to company_name
Show only companies with ‘Top’ performance

# Your code here

Summary

You now master adding and modifying columns. Let’s review the key concepts.

Key Concepts

Calculated Columns

Create from arithmetic on existing columns
Assignment: df['new'] = calculation
Round for readability

Conditional Columns

Use np.where() for if/else logic
Nest for 3+ categories
Simpler than loops

Custom Transformations

Use .apply() for complex logic
Functions receive row/column as input
Lambda for simple operations

Column Management

Rename with .rename()
Drop with .drop(axis=1)
Select subset of columns

Syntax Reference

# Simple calculation
df['new_col'] = df['col1'] / df['col2']

# Conditional (2 options)
df['category'] = np.where(df['value'] > 100, 'High', 'Low')

# Nested conditions (3+ options)
df['tier'] = np.where(
    df['score'] >= 90, 'A',
    np.where(df['score'] >= 80, 'B', 'C')
)

# Custom function (complex logic)
def categorize(row):
    if row['x'] > 10 and row['y'] < 5:
        return 'Type A'
    else:
        return 'Type B'

df['type'] = df.apply(categorize, axis=1)

# Lambda (simple transformation)
df['upper'] = df['name'].apply(lambda x: x.upper())

# Rename columns
df = df.rename(columns={'old': 'new', 'old2': 'new2'})

# Drop columns
df = df.drop(['col1', 'col2'], axis=1)
df.drop('col', axis=1, inplace=True)  # Modify original

# Round values
df['col'] = df['col'].round(2)

When to Use What

Task	Method
Simple calculation	`df['new'] = df['a'] + df['b']`
If/else (2 options)	`np.where()`
If/elif/else (3+ options)	Nested `np.where()`
Complex logic	`.apply()` with function
String transformation	`.apply()` with lambda
Multiple criteria	Custom function with `.apply()`

Important Reminders

Create new columns: Use assignment df['new_col'] = ...
Vectorized operations: No loops needed
np.where() nesting: Inner np.where() is the else clause
axis=1 for rows: When using .apply() on rows
Round for display: Use .round(decimals)
Drop axis=1: For columns (axis=0 for rows)
Rename returns new: Unless inplace=True

Next Steps

You can now create and transform columns to derive insights. In the next lesson, you will learn advanced transformation techniques with .apply(), .map(), and .melt()—powerful tools for complex data reshaping.

Continue to Lesson 10 - Apply, Map, and Transform

Learn advanced transformation and reshaping techniques

Back to Lesson 8 - Sorting and Ranking

Review ordering data and finding top values

Master Data Transformation

Adding and modifying columns is how you create insights from raw data. Profit margins, efficiency ratios, performance categories—all derived from combining and transforming existing columns.

Use these techniques to turn data into insights!

Lesson 8 - Sorting and Ranking

Lesson 10 - Apply, Map, and Transform Functions

Courses

DATATWEETS

Title here

Lesson 9 - Adding and Modifying Columns

Transform Your Data

Creating Calculated Columns

Sample Data

Simple Calculations

Multiple Columns at Once

Rounding for Readability

Conditional Columns with np.where()

Simple If/Else

Another Example

Nested np.where() for Multiple Categories

Four-Level Example

Using .apply() for Complex Transformations

Custom Function on Rows

Lambda Functions for Simple Transformations

Extract Information

Format Values

Renaming and Dropping Columns

Rename Columns

Drop Single Column

Drop Multiple Columns

Drop with inplace

Select Specific Columns (Alternative to Drop)

Practice Exercises

Exercise 1: Calculated Columns

Exercise 2: Conditional Columns

Exercise 3: Using .apply()

Exercise 4: Challenge - Complete Transformation

Summary

Key Concepts

Syntax Reference

When to Use What

Important Reminders

Next Steps

Continue to Lesson 10 - Apply, Map, and Transform

Back to Lesson 8 - Sorting and Ranking

Master Data Transformation

Lesson 9 - Adding and Modifying Columns

Transform Your Data#

Creating Calculated Columns#

Sample Data#

Simple Calculations#

Multiple Columns at Once#

Rounding for Readability#

Conditional Columns with np.where()#

Simple If/Else#

Another Example#

Nested np.where() for Multiple Categories#

Four-Level Example#

Using .apply() for Complex Transformations#

Custom Function on Rows#

Lambda Functions for Simple Transformations#

Extract Information#

Format Values#

Renaming and Dropping Columns#

Rename Columns#

Drop Single Column#

Drop Multiple Columns#

Drop with inplace#

Select Specific Columns (Alternative to Drop)#

Practice Exercises#

Exercise 1: Calculated Columns#

Exercise 2: Conditional Columns#

Exercise 3: Using .apply()#

Exercise 4: Challenge - Complete Transformation#

Summary#

Key Concepts#

Syntax Reference#

When to Use What#

Important Reminders#

Next Steps#

Continue to Lesson 10 - Apply, Map, and Transform

Back to Lesson 8 - Sorting and Ranking

Master Data Transformation#

Transform Your Data

Creating Calculated Columns

Sample Data

Simple Calculations

Multiple Columns at Once

Rounding for Readability

Conditional Columns with np.where()

Simple If/Else

Another Example

Nested np.where() for Multiple Categories

Four-Level Example

Using .apply() for Complex Transformations

Custom Function on Rows

Lambda Functions for Simple Transformations

Extract Information

Format Values

Renaming and Dropping Columns

Rename Columns

Drop Single Column

Drop Multiple Columns

Drop with inplace

Select Specific Columns (Alternative to Drop)

Practice Exercises

Exercise 1: Calculated Columns

Exercise 2: Conditional Columns

Exercise 3: Using .apply()

Exercise 4: Challenge - Complete Transformation

Summary

Key Concepts

Syntax Reference

When to Use What

Important Reminders

Next Steps

Master Data Transformation