Lesson 9 - Adding and Modifying Columns
Transform Your Data
You can select, filter, and sort existing data. Now you will learn to create new columns and modify existing ones—deriving insights through calculations, categorizations, and transformations.
By the end of this lesson, you will be able to:
- Create new columns from arithmetic calculations
- Build conditional columns with
np.where() - Apply custom functions with
.apply() - Delete unwanted columns with
.drop() - Rename columns for clarity
- Create complex transformations
Adding and modifying columns is how you derive new insights from existing data—profit margins from revenues and profits, size categories from employee counts, and performance ratings from multiple metrics.
Creating Calculated Columns
Create new columns using arithmetic operations on existing columns.
Sample Data
import pandas as pd
import numpy as np
# Create companies dataset
companies = pd.DataFrame({
'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines'],
'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
'Healthcare', 'Retail', 'Transportation'],
'revenues': [125000, 89000, 156000, 234000, 178000,
98000, 112000, 187000],
'profits': [12000, 8500, -3000, 45000, 23000,
15000, 9800, 21000],
'employees': [1200, 890, 2300, 5600, 3400,
2100, 4500, 8900],
'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
'USA', 'UK', 'Germany']
})
print("Original DataFrame:")
companies.head()Simple Calculations
# Add profit margin column (profits / revenues * 100)
companies['profit_margin'] = (companies['profits'] / companies['revenues']) * 100
print("With profit margin:")
companies[['company', 'revenues', 'profits', 'profit_margin']]Output:
company revenues profits profit_margin
0 TechCorp Global 125000 12000 9.600000
1 FreshMart Inc 89000 8500 9.550562
2 AutoDrive Motors 156000 -3000 -1.923077
3 FirstBank Holdings 234000 45000 19.230769
...New column created from existing columns!
# Add revenue per employee
companies['revenue_per_employee'] = companies['revenues'] / companies['employees']
print("Revenue efficiency:")
companies[['company', 'revenues', 'employees', 'revenue_per_employee']]Multiple Columns at Once
# Create several calculated columns
companies['profit_per_employee'] = companies['profits'] / companies['employees']
companies['total_costs'] = companies['revenues'] - companies['profits']
print("Multiple new columns:")
print(companies.columns.tolist())Output shows all columns including new ones.
Rounding for Readability
# Round to 2 decimal places
companies['profit_margin'] = companies['profit_margin'].round(2)
companies['revenue_per_employee'] = companies['revenue_per_employee'].round(2)
companies['profit_per_employee'] = companies['profit_per_employee'].round(2)
print("Rounded values:")
companies[['company', 'profit_margin', 'revenue_per_employee']]Output:
company profit_margin revenue_per_employee
0 TechCorp Global 9.60 104.17
1 FreshMart Inc 9.55 100.00
2 AutoDrive Motors -1.92 67.83
...Conditional Columns with np.where()
Use np.where() to create columns based on conditions.
Syntax: np.where(condition, value_if_true, value_if_false)
Simple If/Else
# Create profitability status column
companies['status'] = np.where(
companies['profits'] > 0,
'Profitable',
'Loss'
)
print("Profitability status:")
companies[['company', 'profits', 'status']]Output:
company profits status
0 TechCorp Global 12000 Profitable
1 FreshMart Inc 8500 Profitable
2 AutoDrive Motors -3000 Loss
3 FirstBank Holdings 45000 Profitable
...If profits > 0, status is ‘Profitable’, otherwise ‘Loss’.
Another Example
# Create size category based on employees
companies['size'] = np.where(
companies['employees'] >= 5000,
'Large',
'Small/Medium'
)
print("Company size:")
companies[['company', 'employees', 'size']]Nested np.where() for Multiple Categories
For 3+ categories, nest np.where() calls:
# Revenue category (3 levels)
companies['revenue_category'] = np.where(
companies['revenues'] >= 150000,
'High',
np.where(
companies['revenues'] >= 100000,
'Medium',
'Low'
)
)
print("Revenue categories:")
companies[['company', 'revenues', 'revenue_category']]Output:
company revenues revenue_category
0 TechCorp Global 125000 Medium
1 FreshMart Inc 89000 Low
2 AutoDrive Motors 156000 High
3 FirstBank Holdings 234000 High
...Logic:
- If revenues >= 150000 → ‘High’
- Else if revenues >= 100000 → ‘Medium’
- Else → ‘Low’
Four-Level Example
# Performance rating based on profit margin
companies['performance'] = np.where(
companies['profit_margin'] >= 15,
'Excellent',
np.where(
companies['profit_margin'] >= 8,
'Good',
np.where(
companies['profit_margin'] >= 0,
'Fair',
'Poor'
)
)
)
print("Performance rating:")
companies[['company', 'profit_margin', 'performance']]Output:
company profit_margin performance
0 TechCorp Global 9.60 Good
1 FreshMart Inc 9.55 Good
2 AutoDrive Motors -1.92 Poor
3 FirstBank Holdings 19.23 Excellent
...Using .apply() for Complex Transformations
For complex logic that doesn’t fit np.where(), use .apply() with custom functions.
Custom Function on Rows
# Define custom function
def categorize_company(row):
"""Categorize company based on multiple criteria"""
if row['profits'] < 0:
return 'Struggling'
elif row['profit_margin'] >= 15 and row['employees'] >= 3000:
return 'Top Performer'
elif row['profit_margin'] >= 10:
return 'Strong'
else:
return 'Average'
# Apply function to each row (axis=1)
companies['category'] = companies.apply(categorize_company, axis=1)
print("Company categories:")
companies[['company', 'profit_margin', 'employees', 'category']]Output:
company profit_margin employees category
0 TechCorp Global 9.60 1200 Average
1 FreshMart Inc 9.55 890 Average
2 AutoDrive Motors -1.92 2300 Struggling
3 FirstBank Holdings 19.23 5600 Top Performer
...The function receives each row as a Series and returns a value.
Lambda Functions for Simple Transformations
# Convert company names to uppercase
companies['company_upper'] = companies['company'].apply(lambda x: x.upper())
print("Uppercase names:")
companies[['company', 'company_upper']].head()Output:
company company_upper
0 TechCorp Global ALI TECH
1 FreshMart Inc SARA FOODS
2 AutoDrive Motors REZA MOTORS
...Extract Information
# Extract first word from company name
companies['owner'] = companies['company'].apply(lambda x: x.split()[0])
print("Extracted owner names:")
companies[['company', 'owner']]Output:
company owner
0 TechCorp Global Ali
1 FreshMart Inc Sara
2 AutoDrive Motors Reza
3 FirstBank Holdings Maryam
...Format Values
# Format revenues with 'k' suffix
def format_revenue(value):
"""Format revenue as XXXk"""
return f"{value/1000:.0f}k"
companies['revenue_formatted'] = companies['revenues'].apply(format_revenue)
print("Formatted revenues:")
companies[['company', 'revenues', 'revenue_formatted']]Output:
company revenues revenue_formatted
0 TechCorp Global 125000 125k
1 FreshMart Inc 89000 89k
2 AutoDrive Motors 156000 156k
...Renaming and Dropping Columns
Manage column names and remove unwanted columns.
Rename Columns
# Rename columns
companies_renamed = companies.rename(columns={
'company': 'company_name',
'revenues': 'total_revenue',
'profits': 'net_profit'
})
print("Renamed columns:")
print(companies_renamed.columns.tolist())Output:
['company_name', 'sector', 'total_revenue', 'net_profit', 'employees', 'country', ...]Drop Single Column
# Drop one column
companies_clean = companies.drop('company_upper', axis=1)
print("After dropping company_upper:")
print(companies_clean.columns.tolist())Drop Multiple Columns
# Drop several columns
companies_minimal = companies.drop(
['company_upper', 'revenue_formatted', 'owner'],
axis=1
)
print("After dropping multiple columns:")
print(companies_minimal.columns.tolist())Drop with inplace
# Drop columns inplace (modify original)
companies_copy = companies.copy()
companies_copy.drop(['total_costs', 'profit_per_employee'], axis=1, inplace=True)
print("After inplace drop:")
print(companies_copy.columns.tolist())Select Specific Columns (Alternative to Drop)
# Keep only specific columns
companies_essential = companies[[
'company', 'sector', 'country',
'revenues', 'profits', 'employees',
'profit_margin'
]]
print("Essential columns only:")
companies_essential.head()Practice Exercises
Apply column operations with these exercises.
Exercise 1: Calculated Columns
Create these new columns:
cost_ratio= (revenues - profits) / revenuesefficiency_score= profits / employees (round to 2 decimals)revenue_millions= revenues / 1,000,000 (round to 3 decimals)
# Your code hereExercise 2: Conditional Columns
Using np.where(), create:
profit_status= ‘High’ if profits >= 20000, ‘Medium’ if >= 10000, else ‘Low’alert= ‘Check!’ if profit_margin < 5, else ‘OK’
# Your code hereHint
For 3 categories, nest one np.where() inside another!
Exercise 3: Using .apply()
- Create
company_code= first 3 letters of company name in uppercase - Create
gradeusing a custom function:- ‘A’ if profit_margin >= 15
- ‘B’ if profit_margin >= 10
- ‘C’ if profit_margin >= 5
- ‘F’ otherwise
# Your code hereExercise 4: Challenge - Complete Transformation
Starting with original data:
- Add
profit_marginandrevenue_per_employeecolumns - Add
performancecolumn: ‘Top’ if both profit_margin >= 10 AND revenue_per_employee >= 100, else ‘Standard’ - Rename
companytocompany_name - Show only companies with ‘Top’ performance
# Your code hereSummary
You now master adding and modifying columns. Let’s review the key concepts.
Key Concepts
Calculated Columns
- Create from arithmetic on existing columns
- Assignment:
df['new'] = calculation - Round for readability
Conditional Columns
- Use
np.where()for if/else logic - Nest for 3+ categories
- Simpler than loops
Custom Transformations
- Use
.apply()for complex logic - Functions receive row/column as input
- Lambda for simple operations
Column Management
- Rename with
.rename() - Drop with
.drop(axis=1) - Select subset of columns
Syntax Reference
# Simple calculation
df['new_col'] = df['col1'] / df['col2']
# Conditional (2 options)
df['category'] = np.where(df['value'] > 100, 'High', 'Low')
# Nested conditions (3+ options)
df['tier'] = np.where(
df['score'] >= 90, 'A',
np.where(df['score'] >= 80, 'B', 'C')
)
# Custom function (complex logic)
def categorize(row):
if row['x'] > 10 and row['y'] < 5:
return 'Type A'
else:
return 'Type B'
df['type'] = df.apply(categorize, axis=1)
# Lambda (simple transformation)
df['upper'] = df['name'].apply(lambda x: x.upper())
# Rename columns
df = df.rename(columns={'old': 'new', 'old2': 'new2'})
# Drop columns
df = df.drop(['col1', 'col2'], axis=1)
df.drop('col', axis=1, inplace=True) # Modify original
# Round values
df['col'] = df['col'].round(2)When to Use What
| Task | Method |
|---|---|
| Simple calculation | df['new'] = df['a'] + df['b'] |
| If/else (2 options) | np.where() |
| If/elif/else (3+ options) | Nested np.where() |
| Complex logic | .apply() with function |
| String transformation | .apply() with lambda |
| Multiple criteria | Custom function with .apply() |
Important Reminders
- Create new columns: Use assignment
df['new_col'] = ... - Vectorized operations: No loops needed
- np.where() nesting: Inner
np.where()is the else clause - axis=1 for rows: When using
.apply()on rows - Round for display: Use
.round(decimals) - Drop axis=1: For columns (axis=0 for rows)
- Rename returns new: Unless
inplace=True
Next Steps
You can now create and transform columns to derive insights. In the next lesson, you will learn advanced transformation techniques with .apply(), .map(), and .melt()—powerful tools for complex data reshaping.
Continue to Lesson 10 - Apply, Map, and Transform
Learn advanced transformation and reshaping techniques
Back to Lesson 8 - Sorting and Ranking
Review ordering data and finding top values
Master Data Transformation
Adding and modifying columns is how you create insights from raw data. Profit margins, efficiency ratios, performance categories—all derived from combining and transforming existing columns.
Use these techniques to turn data into insights!