Lesson 5 - Series Operations and Value Counts

Analyzing Data with Series Operations

In the previous lessons, you learned to select data from DataFrames. Now you will learn to perform operations on Series—calculating new values, counting categories, and extracting statistics.

By the end of this lesson, you will be able to:

  • Perform arithmetic operations on Series
  • Create calculated columns from existing data
  • Count unique values with .value_counts()
  • Find top categories and most frequent values
  • Calculate statistics on Series data
  • Use .idxmax() and .idxmin() to find positions

Series operations are vectorized like NumPy, making data analysis fast and efficient. Let’s master these essential skills.


Series Arithmetic Operations

Series support vectorized arithmetic—operations apply to all elements automatically, just like NumPy arrays.

Let’s create our sample dataset:

import pandas as pd
import numpy as np

# Create companies dataset
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000, 145000, 165000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000, 18000, 28000],
    'employees': [1200, 890, 2300, 5600, 3400,
                  2100, 4500, 8900, 3200, 6700],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany', 'USA', 'UK']
})

companies = companies.set_index('company')

print(f"Shape: {companies.shape}")
companies.head()

Basic Arithmetic with Scalars

# Extract revenues as a Series
revenues = companies['revenues']

print("Revenues Series:")
print(revenues)

Output:

company
TechCorp Global         125000
FreshMart Inc            89000
AutoDrive Motors        156000
FirstBank Holdings      234000
PowerGen Energy         178000
...
Name: revenues, dtype: int64

Now convert to thousands:

# Convert to thousands (divide all values)
revenues_thousands = revenues / 1000

print("Revenues in thousands:")
print(revenues_thousands)

Output:

company
TechCorp Global         125.0
FreshMart Inc            89.0
AutoDrive Motors        156.0
FirstBank Holdings      234.0
PowerGen Energy         178.0
...
Name: revenues, dtype: float64

The division applies to every element automatically!
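One detail worth noting: scalar arithmetic returns a new Series and leaves the original untouched. A minimal sketch using a three-company slice of the lesson's revenue figures:

```python
import pandas as pd

# Small stand-in for the revenues column (values from the lesson's dataset)
revenues = pd.Series([125000, 89000, 156000],
                     index=['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors'])

# Division returns a NEW Series; the original Series is unchanged
in_thousands = revenues / 1000
print(in_thousands['TechCorp Global'])  # 125.0
print(revenues['TechCorp Global'])      # 125000 (untouched)
```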

Arithmetic Between Series

Calculate profit margin for each company:

# Profit Margin = (Profits / Revenues) * 100
profit_margin = (companies['profits'] / companies['revenues']) * 100

print("Profit Margin (%) for each company:")
print(profit_margin.round(2))

Output:

company
TechCorp Global          9.60
FreshMart Inc            9.55
AutoDrive Motors        -1.92
FirstBank Holdings      19.23
PowerGen Energy         12.92
MediPharm Solutions     15.31
RetailHub Ltd            8.75
SkyWings Airlines       11.23
SteelCore Industries    12.41
NetLink Telecom         16.97
dtype: float64

Element-wise division between two Series, then multiply by 100.
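A pandas-specific detail worth knowing here: arithmetic between two Series matches elements by index label, not by position. A toy sketch with hypothetical labels and values:

```python
import pandas as pd

# Two Series with partially overlapping labels (hypothetical values)
a = pd.Series([100, 200, 300], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])

# pandas pairs 'y' with 'y' and 'z' with 'z'; 'x' has no partner in b,
# so its result is NaN rather than an error
result = a / b
print(result)
# x     NaN
# y    20.0
# z    15.0
```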

Creating New Columns

Add the calculated profit margin as a new column:

# Add profit margin as a new column
companies['profit_margin'] = profit_margin

print("DataFrame with new column:")
companies.head()

Output:

                        sector  revenues  profits  employees  country  profit_margin
company
TechCorp Global     Technology    125000    12000       1200      USA           9.60
FreshMart Inc             Food     89000     8500        890      USA           9.55
AutoDrive Motors    Automotive    156000    -3000       2300      USA          -1.92
FirstBank Holdings  Financials    234000    45000       5600       UK          19.23
PowerGen Energy         Energy    178000    23000       3400  Germany          12.92

Another example:

# Revenue per employee
companies['revenue_per_employee'] = companies['revenues'] / companies['employees']

print("Revenue per employee:")
print(companies['revenue_per_employee'].round(2))

Output:

company
TechCorp Global        104.17
FreshMart Inc          100.00
AutoDrive Motors        67.83
FirstBank Holdings      41.79
PowerGen Energy         52.35
...
Name: revenue_per_employee, dtype: float64

Key Point: Operations vectorize automatically, just like NumPy!


Counting Values with .value_counts()

The .value_counts() method counts how many times each unique value appears. This is essential for analyzing categorical data.

Basic Value Counts

# Count companies by country
country_counts = companies['country'].value_counts()

print("Companies per country:")
print(country_counts)
print(f"\nType: {type(country_counts)}")

Output:

Companies per country:
USA        5
UK         3
Germany    2
Name: country, dtype: int64

Type: <class 'pandas.core.series.Series'>

Result is a Series with counts for each unique value, sorted by count (descending).

Counting by Sector

# Count companies by sector
sector_counts = companies['sector'].value_counts()

print("Companies per sector:")
print(sector_counts)

Output:

Technology      2
Automotive      1
Energy          1
Financials      1
Food            1
Healthcare      1
Materials       1
Retail          1
Transportation  1
Name: sector, dtype: int64

Technology has 2 companies, all others have 1.
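Because the result is sorted by count, the top categories are easy to pull out with .head(), and .idxmax() returns the single most frequent value. A minimal sketch with a hypothetical sector Series:

```python
import pandas as pd

# Hypothetical sector labels
sectors = pd.Series(['Technology', 'Food', 'Technology',
                     'Retail', 'Technology', 'Food'])

counts = sectors.value_counts()

# Top 2 most frequent categories (value_counts sorts by count, descending)
print(counts.head(2))

# Single most frequent value: the label with the largest count
print(counts.idxmax())  # 'Technology'
```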

Sorting Alphabetically

# Sort alphabetically instead of by count
sector_counts_sorted = companies['sector'].value_counts().sort_index()

print("Companies per sector (alphabetically):")
print(sector_counts_sorted)

Output:

Automotive      1
Energy          1
Financials      1
Food            1
Healthcare      1
Materials       1
Retail          1
Technology      2
Transportation  1
Name: sector, dtype: int64

Use .sort_index() to sort by the category names instead of counts.

Getting Proportions

# Get proportions instead of counts
country_proportions = companies['country'].value_counts(normalize=True)

print("Country proportions:")
print(country_proportions)

print("\nAs percentages:")
print((country_proportions * 100).round(1))

Output:

Country proportions:
USA        0.5
UK         0.3
Germany    0.2
Name: country, dtype: float64

As percentages:
USA        50.0
UK         30.0
Germany    20.0
Name: country, dtype: float64

The normalize=True parameter gives proportions that sum to 1.0.
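You can verify that directly by summing the result. A toy sketch (hypothetical country Series, not the lesson dataset):

```python
import pandas as pd

countries = pd.Series(['USA', 'USA', 'UK', 'UK', 'Germany'])

# normalize=True divides each count by the total number of values
props = countries.value_counts(normalize=True)
print(props.sum())  # 1.0
```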

Counting Unique Values

# How many unique values?
print(f"Number of unique countries: {companies['country'].nunique()}")
print(f"Number of unique sectors: {companies['sector'].nunique()}")

# What are the unique values?
print(f"\nUnique countries: {companies['country'].unique()}")
print(f"Unique sectors: {companies['sector'].unique()}")

Output:

Number of unique countries: 3
Number of unique sectors: 9

Unique countries: ['USA' 'UK' 'Germany']
Unique sectors: ['Technology' 'Food' 'Automotive' 'Financials' 'Energy'
 'Healthcare' 'Retail' 'Transportation' 'Materials']

Methods:

  • .nunique() - count unique values
  • .unique() - get array of unique values

Series Statistical Methods

Series provide the same statistical methods as NumPy arrays:

Basic Statistics

# Statistics on revenues
print("Revenue Statistics:")
print(f"Mean: ${companies['revenues'].mean():,.0f}")
print(f"Median: ${companies['revenues'].median():,.0f}")
print(f"Min: ${companies['revenues'].min():,.0f}")
print(f"Max: ${companies['revenues'].max():,.0f}")
print(f"Std Dev: ${companies['revenues'].std():,.0f}")

Output:

Revenue Statistics:
Mean: $148,900
Median: $150,500
Min: $89,000
Max: $234,000
Std Dev: $44,693

Summary Statistics

# Get all statistics at once
print("Revenue Statistics Summary:")
print(companies['revenues'].describe())

Output:

count        10.000000
mean     148900.000000
std       44692.654132
min       89000.000000
25%      115250.000000
50%      150500.000000
75%      174750.000000
max      234000.000000
Name: revenues, dtype: float64

The .describe() method provides count, mean, std, min, quartiles, and max.
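Note that .describe() also adapts to non-numeric data: on a text column it reports the count, the number of unique values, the most frequent value (top), and its frequency (freq). A small sketch with hypothetical sector labels:

```python
import pandas as pd

sectors = pd.Series(['Technology', 'Food', 'Technology', 'Retail'])

# For object (text) data, .describe() switches to categorical-style stats
summary = sectors.describe()
print(summary)
# count              4
# unique             3
# top       Technology
# freq               2
```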

Sum and Average

# Total and average profits
total_profits = companies['profits'].sum()
avg_profit = companies['profits'].mean()

print(f"Total profits across all companies: ${total_profits:,}")
print(f"Average profit per company: ${avg_profit:,.0f}")

Output:

Total profits across all companies: $177,300
Average profit per company: $17,730

Finding Maximum and Minimum Labels

# Which company has maximum/minimum?
max_revenue_company = companies['revenues'].idxmax()
min_profit_company = companies['profits'].idxmin()

print(f"Highest revenue: {max_revenue_company} (${companies.loc[max_revenue_company, 'revenues']:,})")
print(f"Lowest profit: {min_profit_company} (${companies.loc[min_profit_company, 'profits']:,})")

Output:

Highest revenue: FirstBank Holdings ($234,000)
Lowest profit: AutoDrive Motors ($-3,000)

Methods:

  • .idxmax() - returns the label (index) of maximum value
  • .idxmin() - returns the label (index) of minimum value

These are different from .max() and .min() which return the values themselves.
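The contrast in one small sketch (values taken from the lesson's dataset):

```python
import pandas as pd

# Revenues keyed by company name
revenues = pd.Series({'TechCorp Global': 125000,
                      'FreshMart Inc': 89000,
                      'FirstBank Holdings': 234000})

print(revenues.max())     # 234000 -- the value itself
print(revenues.idxmax())  # 'FirstBank Holdings' -- the label where it occurs
```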

Statistics for Multiple Columns

# Statistics for all numeric columns
print("Statistics for all numeric columns:")
print(companies.describe())

This gives summary statistics for every numeric column in the DataFrame.


Series Methods Reference

Complete reference for common Series operations:

Arithmetic Operations

# Scalar operations (apply to all elements)
series + 10          # Add 10 to all values
series - 5           # Subtract 5 from all values
series * 100         # Multiply all values by 100
series / 1000        # Divide all values by 1000
series ** 2          # Square all values

# Operations between Series (element-wise)
df['col1'] + df['col2']    # Add two columns
df['col1'] / df['col2']    # Divide one column by another
df['col1'] * 100           # Multiply column by scalar

Counting Methods

# Counting unique values
.value_counts()                # Count occurrences of each value
.value_counts(normalize=True)  # Get proportions (sum to 1.0)
.value_counts().sort_index()   # Sort alphabetically
.nunique()                     # Count unique values
.unique()                      # Get array of unique values
.count()                       # Count non-null values

Statistical Methods

# Central tendency
.mean()              # Average
.median()            # Middle value
.mode()              # Most common value

# Spread
.std()               # Standard deviation
.var()               # Variance
.min()               # Minimum value
.max()               # Maximum value

# Totals
.sum()               # Sum of all values
.cumsum()            # Cumulative sum

# Position of extremes
.idxmax()            # Label of maximum value
.idxmin()            # Label of minimum value

# Summary
.describe()          # All statistics at once

Comparison to NumPy

All these methods work just like their NumPy counterparts, with one caveat: pandas .std() defaults to the sample standard deviation (ddof=1), while NumPy's arr.std() defaults to the population standard deviation (ddof=0):

Operation     NumPy         Pandas Series
Mean          arr.mean()    series.mean()
Sum           arr.sum()     series.sum()
Max           arr.max()     series.max()
Std Dev       arr.std()     series.std()
Arithmetic    arr / 1000    series / 1000

Your NumPy skills transfer directly!
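A quick side-by-side sketch (toy values) showing the same operations on a NumPy array and a pandas Series—note that the Series results keep their index labels:

```python
import numpy as np
import pandas as pd

arr = np.array([100, 200, 300])
series = pd.Series([100, 200, 300], index=['a', 'b', 'c'])

# Same computations, same numeric results
print(arr.mean(), series.mean())   # 200.0 200.0
print(arr.sum(), series.sum())     # 600 600

# Scalar arithmetic is vectorized in both; the Series keeps its labels
print(arr / 100)      # [1. 2. 3.]
print(series / 100)   # same values, with labels 'a', 'b', 'c' attached

# Boolean filtering transfers too
print(series[series > 150])        # rows 'b' and 'c'
```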


Practice Exercises

Apply Series operations with these exercises.

Exercise 1: Arithmetic Operations

  1. Create a new column profit_per_employee (profits divided by employees)
  2. Convert all revenues to thousands
  3. Calculate what a 10% increase in profits would be for each company
# Your code here

Exercise 2: Value Counts

  1. Count how many companies are in each sector
  2. What percentage of companies are from USA?
  3. How many unique countries are in the dataset?
# Your code here

Hint

For percentages, use value_counts(normalize=True) and multiply by 100!

Exercise 3: Statistics

  1. Find the average number of employees across all companies
  2. Find the total revenues across all companies
  3. Which company has the most employees?
  4. What is the median profit?
# Your code here

Exercise 4: Combined Operations

  1. Create a column for revenue in millions (divide by 1,000,000)
  2. Find which sector has the highest average revenue
  3. Calculate the total profit for USA companies only
# Your code here

Summary

You now master Series operations for data analysis. Let’s review the key concepts.

Key Concepts

Vectorized Operations

  • Operations apply to all elements automatically
  • No loops needed—fast and efficient
  • Works just like NumPy arrays

Value Counts

  • Essential for categorical data analysis
  • Counts occurrences of each unique value
  • Can get proportions with normalize=True

Statistical Methods

  • Calculate mean, median, sum, min, max, std
  • .describe() gives complete summary
  • .idxmax() and .idxmin() find labels of extremes

Creating Columns

  • Use arithmetic to calculate new columns
  • Assignment syntax: df['new_col'] = calculation
  • Calculations are vectorized

Syntax Reference

# Arithmetic (vectorized)
series / 1000                    # Divide all values
series * 100                     # Multiply all values
df['col1'] / df['col2']          # Element-wise division
(df['a'] + df['b']) / 2          # Complex calculations

# Counting
.value_counts()                  # Count each unique value
.value_counts(normalize=True)    # Get proportions
.nunique()                       # Count unique values
.unique()                        # Get unique values
.count()                         # Count non-null

# Statistics
.mean()                          # Average
.median()                        # Middle value
.sum()                           # Total
.min() / .max()                  # Minimum/Maximum
.std()                           # Standard deviation
.describe()                      # All statistics
.idxmax() / .idxmin()            # Label of max/min

# Creating columns
df['new_col'] = calculation      # Add calculated column
df['margin'] = df['profit'] / df['revenue']

NumPy Skills Transfer

Everything you learned in NumPy applies:

# NumPy                          # Pandas
arr.mean()                      series.mean()
arr.sum()                       series.sum()
arr / 1000                      series / 1000
arr[arr > 50]                   series[series > 50]
arr.max()                       series.max()

Plus pandas adds:

  • .value_counts() for categorical analysis
  • .idxmax() / .idxmin() for finding labels
  • Automatic label alignment
  • Missing data handling

Important Reminders

  • Operations are vectorized: No need for loops
  • Labels are preserved: Results maintain index labels
  • Create new columns: Use arithmetic and assignment
  • Count categories: Use .value_counts()
  • Find positions: Use .idxmax() and .idxmin()
  • Get summaries: Use .describe() for quick overview

Next Steps

You now understand Series operations and value counting. In the next lesson, you will learn to work with dates and times—essential for time-series analysis.

Continue to Lesson 6 - DateTime Fundamentals

Learn to work with dates, times, and time-series data

Back to Lesson 4 - Selecting with .iloc[]

Review position-based selection using integer indices


Master Data Analysis Operations

Series operations are the foundation of data analysis. You can now calculate new values, count categories, and extract statistics—the everyday tasks of data analysis.

These vectorized operations make pandas fast and your code concise. Use them instead of loops!