Lesson 5 - Series Operations and Value Counts
Analyzing Data with Series Operations
You already know how to select data from DataFrames. Now you will learn to perform operations on Series: calculating new values, counting categories, and extracting statistics.
By the end of this lesson, you will be able to:
- Perform arithmetic operations on Series
- Create calculated columns from existing data
- Count unique values with .value_counts()
- Find top categories and most frequent values
- Calculate statistics on Series data
- Use .idxmax() and .idxmin() to find the labels of maximum and minimum values
Series operations are vectorized like NumPy, making data analysis fast and efficient. Let’s master these essential skills.
Series Arithmetic Operations
Series support vectorized arithmetic—operations apply to all elements automatically, just like NumPy arrays.
Let’s create our sample dataset:
import pandas as pd
import numpy as np
# Create companies dataset
companies = pd.DataFrame({
'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom'],
'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology'],
'revenues': [125000, 89000, 156000, 234000, 178000,
98000, 112000, 187000, 145000, 165000],
'profits': [12000, 8500, -3000, 45000, 23000,
15000, 9800, 21000, 18000, 28000],
'employees': [1200, 890, 2300, 5600, 3400,
2100, 4500, 8900, 3200, 6700],
'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
'USA', 'UK', 'Germany', 'USA', 'UK']
})
companies = companies.set_index('company')
print(f"Shape: {companies.shape}")
companies.head()
Basic Arithmetic with Scalars
# Extract revenues as a Series
revenues = companies['revenues']
print("Revenues Series:")
print(revenues)
Output:
company
TechCorp Global 125000
FreshMart Inc 89000
AutoDrive Motors 156000
FirstBank Holdings 234000
PowerGen Energy 178000
...
Name: revenues, dtype: int64
Now convert to thousands:
# Convert to thousands (divide all values)
revenues_thousands = revenues / 1000
print("Revenues in thousands:")
print(revenues_thousands)
Output:
company
TechCorp Global 125.0
FreshMart Inc 89.0
AutoDrive Motors 156.0
FirstBank Holdings 234.0
PowerGen Energy 178.0
...
Name: revenues, dtype: float64
The division applies to every element automatically!
Arithmetic Between Series
Calculate profit margin for each company:
# Profit Margin = (Profits / Revenues) * 100
profit_margin = (companies['profits'] / companies['revenues']) * 100
print("Profit Margin (%) for each company:")
print(profit_margin.round(2))
Output:
company
TechCorp Global 9.60
FreshMart Inc 9.55
AutoDrive Motors -1.92
FirstBank Holdings 19.23
PowerGen Energy 12.92
MediPharm Solutions 15.31
RetailHub Ltd 8.75
SkyWings Airlines 11.23
SteelCore Industries 12.41
NetLink Telecom 16.97
dtype: float64
The calculation is element-wise division between two Series, followed by multiplication by 100.
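One detail worth knowing: pandas matches the two Series by their index labels, not by row position. A minimal sketch (the reversal below is purely for illustration) shows that each result still belongs to the right company even when one Series is reordered:
# Reverse one Series - pandas still lines the values up by label
profits_reversed = companies['profits'].iloc[::-1]
margin_check = (profits_reversed / companies['revenues']) * 100
print(round(margin_check['AutoDrive Motors'], 2))  # -1.92, same result as before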
Creating New Columns
Add the calculated profit margin as a new column:
# Add profit margin as a new column
companies['profit_margin'] = profit_margin
print("DataFrame with new column:")
companies.head()
Output:
sector revenues profits employees country profit_margin
company
TechCorp Global Technology 125000 12000 1200 USA 9.60
FreshMart Inc Food 89000 8500 890 USA 9.55
AutoDrive Motors Automotive 156000 -3000 2300 USA -1.92
FirstBank Holdings Financials 234000 45000 5600 UK 19.23
PowerGen Energy Energy 178000 23000 3400 Germany 12.92
Another example:
# Revenue per employee
companies['revenue_per_employee'] = companies['revenues'] / companies['employees']
print("Revenue per employee:")
print(companies['revenue_per_employee'].round(2))
Output:
company
TechCorp Global 104.17
FreshMart Inc 100.00
AutoDrive Motors 67.83
FirstBank Holdings 41.79
PowerGen Energy 52.35
...
dtype: float64
Key Point: Operations vectorize automatically, just like NumPy!
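For comparison, here is a loop-based version of the same conversion, shown only to make the idea of vectorization concrete. It produces the same Series, but the vectorized expression above is shorter and much faster:
# Loop version (for illustration only - prefer the vectorized form)
looped = pd.Series({name: value / 1000 for name, value in revenues.items()},
                   name='revenues')
print(looped.equals(revenues / 1000))  # True - same values, same labels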
Counting Values with .value_counts()
The .value_counts() method counts how many times each unique value appears. This is essential for analyzing categorical data.
Basic Value Counts
# Count companies by country
country_counts = companies['country'].value_counts()
print("Companies per country:")
print(country_counts)
print(f"\nType: {type(country_counts)}")Output:
Companies per country:
USA 5
UK 3
Germany 2
Name: country, dtype: int64
Type: <class 'pandas.core.series.Series'>
Result is a Series with counts for each unique value, sorted by count (descending).
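Because the result is an ordinary Series sorted by count, the most frequent category is simply its first label. A small sketch reusing the country_counts Series from above:
# Top category and its count
print(country_counts.index[0])   # 'USA' - first label after the descending sort
print(country_counts.iloc[0])    # 5
# .idxmax() gives the same answer without relying on sort order
print(country_counts.idxmax())   # 'USA'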
Counting by Sector
# Count companies by sector
sector_counts = companies['sector'].value_counts()
print("Companies per sector:")
print(sector_counts)
Output:
Technology 2
Automotive 1
Energy 1
Financials 1
Food 1
Healthcare 1
Materials 1
Retail 1
Transportation 1
Name: sector, dtype: int64
Technology has 2 companies; all other sectors have 1.
Sorting Alphabetically
# Sort alphabetically instead of by count
sector_counts_sorted = companies['sector'].value_counts().sort_index()
print("Companies per sector (alphabetically):")
print(sector_counts_sorted)
Output:
Automotive 1
Energy 1
Financials 1
Food 1
Healthcare 1
Materials 1
Retail 1
Technology 2
Transportation 1
Name: sector, dtype: int64
Use .sort_index() to sort by the category names instead of counts.
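If you prefer the rarest categories first instead, value_counts() also accepts an ascending=True argument, so no separate sort call is needed:
# Least common sectors first
print(companies['sector'].value_counts(ascending=True).head(3))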
Getting Proportions
# Get proportions instead of counts
country_proportions = companies['country'].value_counts(normalize=True)
print("Country proportions:")
print(country_proportions)
print("\nAs percentages:")
print((country_proportions * 100).round(1))
Output:
Country proportions:
USA 0.5
UK 0.3
Germany 0.2
Name: country, dtype: float64
As percentages:
USA 50.0
UK 30.0
Germany 20.0
Name: country, dtype: float64
The normalize=True parameter gives proportions that sum to 1.0.
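A quick sanity check confirms this:
# Proportions from normalize=True add up to 1.0
print(country_proportions.sum())  # 1.0 (up to tiny floating-point rounding)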
Counting Unique Values
# How many unique values?
print(f"Number of unique countries: {companies['country'].nunique()}")
print(f"Number of unique sectors: {companies['sector'].nunique()}")
# What are the unique values?
print(f"\nUnique countries: {companies['country'].unique()}")
print(f"Unique sectors: {companies['sector'].unique()}")Output:
Number of unique countries: 3
Number of unique sectors: 9
Unique countries: ['USA' 'UK' 'Germany']
Unique sectors: ['Technology' 'Food' 'Automotive' 'Financials' 'Energy'
'Healthcare' 'Retail' 'Transportation' 'Materials']
Methods:
- .nunique() - count the number of unique values
- .unique() - get an array of the unique values
Series Statistical Methods
Series provide the same statistical methods as NumPy arrays:
Basic Statistics
# Statistics on revenues
print("Revenue Statistics:")
print(f"Mean: ${companies['revenues'].mean():,.0f}")
print(f"Median: ${companies['revenues'].median():,.0f}")
print(f"Min: ${companies['revenues'].min():,.0f}")
print(f"Max: ${companies['revenues'].max():,.0f}")
print(f"Std Dev: ${companies['revenues'].std():,.0f}")Output:
Revenue Statistics:
Mean: $148,900
Median: $150,500
Min: $89,000
Max: $234,000
Std Dev: $44,693
Summary Statistics
# Get all statistics at once
print("Revenue Statistics Summary:")
print(companies['revenues'].describe())
Output:
count 10.000000
mean 148900.000000
std 44692.654132
min 89000.000000
25% 115250.000000
50% 150500.000000
75% 174750.000000
max 234000.000000
Name: revenues, dtype: float64
The .describe() method provides count, mean, std, min, quartiles, and max.
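The same method also works on text columns, where it reports the count, the number of unique values, the most frequent value, and how often it appears. On this dataset the sector column should show 10 values, 9 unique, with Technology as the top value (frequency 2):
# describe() on a categorical (text) column
print(companies['sector'].describe())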
Sum and Average
# Total and average profits
total_profits = companies['profits'].sum()
avg_profit = companies['profits'].mean()
print(f"Total profits across all companies: ${total_profits:,}")
print(f"Average profit per company: ${avg_profit:,.0f}")Output:
Total profits across all companies: $177,300
Average profit per company: $17,730
Finding Maximum and Minimum Labels
# Which company has maximum/minimum?
max_revenue_company = companies['revenues'].idxmax()
min_profit_company = companies['profits'].idxmin()
print(f"Highest revenue: {max_revenue_company} (${companies.loc[max_revenue_company, 'revenues']:,})")
print(f"Lowest profit: {min_profit_company} (${companies.loc[min_profit_company, 'profits']:,})")Output:
Highest revenue: FirstBank Holdings ($234,000)
Lowest profit: AutoDrive Motors ($-3,000)
Methods:
- .idxmax() - returns the label (index) of the maximum value
- .idxmin() - returns the label (index) of the minimum value
These are different from .max() and .min() which return the values themselves.
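A quick side-by-side comparison makes the difference clear:
# .max() returns the value, .idxmax() returns the index label
print(companies['revenues'].max())      # 234000
print(companies['revenues'].idxmax())   # FirstBank Holdings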
Statistics for Multiple Columns
# Statistics for all numeric columns
print("Statistics for all numeric columns:")
print(companies.describe())
This gives summary statistics for every numeric column in the DataFrame.
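By default, DataFrame .describe() summarizes only the numeric columns. If you also want the text columns (sector and country) included, you can pass include='all'; statistics that do not apply to a column are shown as NaN.
# Summarize numeric and text columns together
print(companies.describe(include='all'))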
Series Methods Reference
Complete reference for common Series operations:
Arithmetic Operations
# Scalar operations (apply to all elements)
series + 10 # Add 10 to all values
series - 5 # Subtract 5 from all values
series * 100 # Multiply all values by 100
series / 1000 # Divide all values by 1000
series ** 2 # Square all values
# Operations between Series (element-wise)
df['col1'] + df['col2'] # Add two columns
df['col1'] / df['col2'] # Divide one column by another
df['col1'] * 100 # Multiply column by scalar
Counting Methods
# Counting unique values
.value_counts() # Count occurrences of each value
.value_counts(normalize=True) # Get proportions (sum to 1.0)
.value_counts().sort_index() # Sort alphabetically
.nunique() # Count unique values
.unique() # Get array of unique values
.count() # Count non-null values
Statistical Methods
# Central tendency
.mean() # Average
.median() # Middle value
.mode() # Most common value
# Spread
.std() # Standard deviation
.var() # Variance
.min() # Minimum value
.max() # Maximum value
# Totals
.sum() # Sum of all values
.cumsum() # Cumulative sum
# Position of extremes
.idxmax() # Label of maximum value
.idxmin() # Label of minimum value
# Summary
.describe() # All statistics at once
Comparison to NumPy
All these methods work exactly like NumPy:
| Operation | NumPy | Pandas Series |
|---|---|---|
| Mean | arr.mean() | series.mean() |
| Sum | arr.sum() | series.sum() |
| Max | arr.max() | series.max() |
| Std Dev | arr.std() | series.std() |
| Arithmetic | arr / 1000 | series / 1000 |
Your NumPy skills transfer directly!
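And if you ever need a plain NumPy array back, for example to hand the data to a library that expects arrays rather than Series, the conversion is one call (note that the index labels are dropped):
# Convert a Series back to a NumPy array
revenue_array = companies['revenues'].to_numpy()
print(type(revenue_array))   # <class 'numpy.ndarray'>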
Practice Exercises
Apply Series operations with these exercises.
Exercise 1: Arithmetic Operations
- Create a new column profit_per_employee (profits divided by employees)
- Convert all revenues to thousands
- Calculate what a 10% increase in profits would be for each company
# Your code here
Exercise 2: Value Counts
- Count how many companies are in each sector
- What percentage of companies are from the USA?
- How many unique countries are in the dataset?
# Your code here
Hint
For percentages, use value_counts(normalize=True) and multiply by 100!
Exercise 3: Statistics
- Find the average number of employees across all companies
- Find the total revenues across all companies
- Which company has the most employees?
- What is the median profit?
# Your code here
Exercise 4: Combined Operations
- Create a column for revenue in millions (divide revenues by 1,000,000)
- Find which sector has the highest average revenue
- Calculate the total profit for USA companies only
# Your code here
Summary
You have now mastered Series operations for data analysis. Let’s review the key concepts.
Key Concepts
Vectorized Operations
- Operations apply to all elements automatically
- No loops needed—fast and efficient
- Works just like NumPy arrays
Value Counts
- Essential for categorical data analysis
- Counts occurrences of each unique value
- Can get proportions with normalize=True
Statistical Methods
- Calculate mean, median, sum, min, max, std
- .describe() gives a complete summary
- .idxmax() and .idxmin() find the labels of extremes
Creating Columns
- Use arithmetic to calculate new columns
- Assignment syntax: df['new_col'] = calculation
- Calculations are vectorized
Syntax Reference
# Arithmetic (vectorized)
series / 1000 # Divide all values
series * 100 # Multiply all values
df['col1'] / df['col2'] # Element-wise division
(df['a'] + df['b']) / 2 # Complex calculations
# Counting
.value_counts() # Count each unique value
.value_counts(normalize=True) # Get proportions
.nunique() # Count unique values
.unique() # Get unique values
.count() # Count non-null
# Statistics
.mean() # Average
.median() # Middle value
.sum() # Total
.min() / .max() # Minimum/Maximum
.std() # Standard deviation
.describe() # All statistics
.idxmax() / .idxmin() # Label of max/min
# Creating columns
df['new_col'] = calculation # Add calculated column
df['margin'] = df['profit'] / df['revenue']
NumPy Skills Transfer
Everything you learned in NumPy applies:
# NumPy # Pandas
arr.mean() → series.mean()
arr.sum() → series.sum()
arr / 1000 → series / 1000
arr[arr > 50] → series[series > 50]
arr.max() → series.max()
Plus pandas adds:
- .value_counts() for categorical analysis
- .idxmax() / .idxmin() for finding labels
- Automatic label alignment
- Missing data handling
Important Reminders
- Operations are vectorized: No need for loops
- Labels are preserved: Results maintain index labels
- Create new columns: Use arithmetic and assignment
- Count categories: Use .value_counts()
- Find labels of extremes: Use .idxmax() and .idxmin()
- Get summaries: Use .describe() for a quick overview
Next Steps
You now understand Series operations and value counting. In the next lesson, you will learn to work with dates and times—essential for time-series analysis.
Continue to Lesson 6 - DateTime Fundamentals
Learn to work with dates, times, and time-series data
Back to Lesson 4 - Selecting with .iloc[]
Review position-based selection using integer indices
Master Data Analysis Operations
Series operations are the foundation of data analysis. You can now calculate new values, count categories, and extract statistics—the everyday tasks of data analysis.
These vectorized operations make pandas fast and your code concise. Use them instead of loops!