Lesson 5 - Series Operations and Value Counts

Analyzing Data with Series Operations

In the previous lessons, you learned to select data from DataFrames. Now you will learn to perform operations on Series—calculating new values, counting categories, and extracting statistics.

By the end of this lesson, you will be able to:

  • Perform arithmetic operations on Series
  • Create calculated columns from existing data
  • Count unique values with .value_counts()
  • Find top categories and most frequent values
  • Calculate statistics on Series data
  • Use .idxmax() and .idxmin() to find positions

Series operations are vectorized like NumPy, making data analysis fast and efficient. Let’s master these essential skills.


Series Arithmetic Operations

Series support vectorized arithmetic—operations apply to all elements automatically, just like NumPy arrays.

Let’s create our sample dataset:

import pandas as pd
import numpy as np

# Create companies dataset
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000, 145000, 165000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000, 18000, 28000],
    'employees': [1200, 890, 2300, 5600, 3400,
                  2100, 4500, 8900, 3200, 6700],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany', 'USA', 'UK']
})

companies = companies.set_index('company')

print(f"Shape: {companies.shape}")
companies.head()

Basic Arithmetic with Scalars

# Extract revenues as a Series
revenues = companies['revenues']

print("Revenues Series:")
print(revenues)

Output:

company
TechCorp Global         125000
FreshMart Inc            89000
AutoDrive Motors        156000
FirstBank Holdings      234000
PowerGen Energy         178000
...
Name: revenues, dtype: int64

Now convert to thousands:

# Convert to thousands (divide all values)
revenues_thousands = revenues / 1000

print("Revenues in thousands:")
print(revenues_thousands)

Output:

company
TechCorp Global         125.0
FreshMart Inc            89.0
AutoDrive Motors        156.0
FirstBank Holdings      234.0
PowerGen Energy         178.0
...
Name: revenues, dtype: float64

The division applies to every element automatically!
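One detail worth noting: scalar arithmetic returns a new Series and leaves the original untouched. A minimal sketch using a three-company slice of the lesson's revenue figures:

```python
import pandas as pd

# Small stand-in for the revenues column (values from the lesson's dataset)
revenues = pd.Series([125000, 89000, 156000],
                     index=['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors'])

# Division returns a NEW Series; the original Series is unchanged
in_thousands = revenues / 1000
print(in_thousands['TechCorp Global'])  # 125.0
print(revenues['TechCorp Global'])      # 125000 (untouched)
```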

Arithmetic Between Series

Calculate profit margin for each company:

# Profit Margin = (Profits / Revenues) * 100
profit_margin = (companies['profits'] / companies['revenues']) * 100

print("Profit Margin (%) for each company:")
print(profit_margin.round(2))

Output:

company
TechCorp Global          9.60
FreshMart Inc            9.55
AutoDrive Motors        -1.92
FirstBank Holdings      19.23
PowerGen Energy         12.92
MediPharm Solutions     15.31
RetailHub Ltd            8.75
SkyWings Airlines       11.23
SteelCore Industries    12.41
NetLink Telecom         16.97
dtype: float64

Element-wise division between two Series, then multiply by 100.
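A pandas-specific detail worth knowing here: arithmetic between two Series matches elements by index label, not by position. A toy sketch with hypothetical labels and values:

```python
import pandas as pd

# Two Series with partially overlapping labels (hypothetical values)
a = pd.Series([100, 200, 300], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])

# pandas pairs 'y' with 'y' and 'z' with 'z'; 'x' has no partner in b,
# so its result is NaN rather than an error
result = a / b
print(result)
# x     NaN
# y    20.0
# z    15.0
```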

Creating New Columns

Add the calculated profit margin as a new column:

# Add profit margin as a new column
companies['profit_margin'] = profit_margin

print("DataFrame with new column:")
companies.head()

Output:

                        sector  revenues  profits  employees  country  profit_margin
company
TechCorp Global     Technology    125000    12000       1200      USA           9.60
FreshMart Inc             Food     89000     8500        890      USA           9.55
AutoDrive Motors    Automotive    156000    -3000       2300      USA          -1.92
FirstBank Holdings  Financials    234000    45000       5600       UK          19.23
PowerGen Energy         Energy    178000    23000       3400  Germany          12.92

Another example:

# Revenue per employee
companies['revenue_per_employee'] = companies['revenues'] / companies['employees']

print("Revenue per employee:")
print(companies['revenue_per_employee'].round(2))

Output:

company
TechCorp Global        104.17
FreshMart Inc          100.00
AutoDrive Motors        67.83
FirstBank Holdings      41.79
PowerGen Energy         52.35
...
Name: revenue_per_employee, dtype: float64

Key Point: Operations vectorize automatically, just like NumPy!


Counting Values with .value_counts()

The .value_counts() method counts how many times each unique value appears. This is essential for analyzing categorical data.

Basic Value Counts

# Count companies by country
country_counts = companies['country'].value_counts()

print("Companies per country:")
print(country_counts)
print(f"\nType: {type(country_counts)}")

Output:

Companies per country:
USA        5
UK         3
Germany    2
Name: country, dtype: int64

Type: <class 'pandas.core.series.Series'>

Result is a Series with counts for each unique value, sorted by count (descending).

Counting by Sector

# Count companies by sector
sector_counts = companies['sector'].value_counts()

print("Companies per sector:")
print(sector_counts)

Output:

Technology      2
Automotive      1
Energy          1
Financials      1
Food            1
Healthcare      1
Materials       1
Retail          1
Transportation  1
Name: sector, dtype: int64

Technology has 2 companies, all others have 1.
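Because the result is sorted by count, the top categories are easy to pull out with .head(), and .idxmax() returns the single most frequent value. A minimal sketch with a hypothetical sector Series:

```python
import pandas as pd

# Hypothetical sector labels
sectors = pd.Series(['Technology', 'Food', 'Technology',
                     'Retail', 'Technology', 'Food'])

counts = sectors.value_counts()

# Top 2 most frequent categories (value_counts sorts by count, descending)
print(counts.head(2))

# Single most frequent value: the label with the largest count
print(counts.idxmax())  # 'Technology'
```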

Sorting Alphabetically

# Sort alphabetically instead of by count
sector_counts_sorted = companies['sector'].value_counts().sort_index()

print("Companies per sector (alphabetically):")
print(sector_counts_sorted)

Output:

Automotive      1
Energy          1
Financials      1
Food            1
Healthcare      1
Materials       1
Retail          1
Technology      2
Transportation  1
Name: sector, dtype: int64

Use .sort_index() to sort by the category names instead of counts.

Getting Proportions

# Get proportions instead of counts
country_proportions = companies['country'].value_counts(normalize=True)

print("Country proportions:")
print(country_proportions)

print("\nAs percentages:")
print((country_proportions * 100).round(1))

Output:

Country proportions:
USA        0.5
UK         0.3
Germany    0.2
Name: country, dtype: float64

As percentages:
USA        50.0
UK         30.0
Germany    20.0
Name: country, dtype: float64

The normalize=True parameter gives proportions that sum to 1.0.
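You can verify that directly by summing the result. A toy sketch (hypothetical country Series, not the lesson dataset):

```python
import pandas as pd

countries = pd.Series(['USA', 'USA', 'UK', 'UK', 'Germany'])

# normalize=True divides each count by the total number of values
props = countries.value_counts(normalize=True)
print(props.sum())  # 1.0
```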

Counting Unique Values

# How many unique values?
print(f"Number of unique countries: {companies['country'].nunique()}")
print(f"Number of unique sectors: {companies['sector'].nunique()}")

# What are the unique values?
print(f"\nUnique countries: {companies['country'].unique()}")
print(f"Unique sectors: {companies['sector'].unique()}")

Output:

Number of unique countries: 3
Number of unique sectors: 9

Unique countries: ['USA' 'UK' 'Germany']
Unique sectors: ['Technology' 'Food' 'Automotive' 'Financials' 'Energy'
 'Healthcare' 'Retail' 'Transportation' 'Materials']

Methods:

  • .nunique() - count unique values
  • .unique() - get array of unique values

Series Statistical Methods

Series provide the same statistical methods as NumPy arrays:

Basic Statistics

# Statistics on revenues
print("Revenue Statistics:")
print(f"Mean: ${companies['revenues'].mean():,.0f}")
print(f"Median: ${companies['revenues'].median():,.0f}")
print(f"Min: ${companies['revenues'].min():,.0f}")
print(f"Max: ${companies['revenues'].max():,.0f}")
print(f"Std Dev: ${companies['revenues'].std():,.0f}")

Output:

Revenue Statistics:
Mean: $148,900
Median: $150,500
Min: $89,000
Max: $234,000
Std Dev: $44,693

Summary Statistics

# Get all statistics at once
print("Revenue Statistics Summary:")
print(companies['revenues'].describe())

Output:

count        10.000000
mean     148900.000000
std       44692.654132
min       89000.000000
25%      115250.000000
50%      150500.000000
75%      174750.000000
max      234000.000000
Name: revenues, dtype: float64

The .describe() method provides count, mean, std, min, quartiles, and max.
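Note that .describe() also adapts to non-numeric data: on a text column it reports the count, the number of unique values, the most frequent value (top), and its frequency (freq). A small sketch with hypothetical sector labels:

```python
import pandas as pd

sectors = pd.Series(['Technology', 'Food', 'Technology', 'Retail'])

# For object (text) data, .describe() switches to categorical-style stats
summary = sectors.describe()
print(summary)
# count              4
# unique             3
# top       Technology
# freq               2
```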

Sum and Average

# Total and average profits
total_profits = companies['profits'].sum()
avg_profit = companies['profits'].mean()

print(f"Total profits across all companies: ${total_profits:,}")
print(f"Average profit per company: ${avg_profit:,.0f}")

Output:

Total profits across all companies: $177,300
Average profit per company: $17,730

Finding Maximum and Minimum Labels

# Which company has maximum/minimum?
max_revenue_company = companies['revenues'].idxmax()
min_profit_company = companies['profits'].idxmin()

print(f"Highest revenue: {max_revenue_company} (${companies.loc[max_revenue_company, 'revenues']:,})")
print(f"Lowest profit: {min_profit_company} (${companies.loc[min_profit_company, 'profits']:,})")

Output:

Highest revenue: FirstBank Holdings ($234,000)
Lowest profit: AutoDrive Motors ($-3,000)

Methods:

  • .idxmax() - returns the label (index) of maximum value
  • .idxmin() - returns the label (index) of minimum value

These are different from .max() and .min() which return the values themselves.
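The contrast in one small sketch (values taken from the lesson's dataset):

```python
import pandas as pd

# Revenues keyed by company name
revenues = pd.Series({'TechCorp Global': 125000,
                      'FreshMart Inc': 89000,
                      'FirstBank Holdings': 234000})

print(revenues.max())     # 234000 -- the value itself
print(revenues.idxmax())  # 'FirstBank Holdings' -- the label where it occurs
```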

Statistics for Multiple Columns

# Statistics for all numeric columns
print("Statistics for all numeric columns:")
print(companies.describe())

This gives summary statistics for every numeric column in the DataFrame.


Series Methods Reference

Complete reference for common Series operations:

Arithmetic Operations

# Scalar operations (apply to all elements)
series + 10          # Add 10 to all values
series - 5           # Subtract 5 from all values
series * 100         # Multiply all values by 100
series / 1000        # Divide all values by 1000
series ** 2          # Square all values

# Operations between Series (element-wise)
df['col1'] + df['col2']    # Add two columns
df['col1'] / df['col2']    # Divide one column by another
df['col1'] * 100           # Multiply column by scalar

Counting Methods

# Counting unique values
.value_counts()                # Count occurrences of each value
.value_counts(normalize=True)  # Get proportions (sum to 1.0)
.value_counts().sort_index()   # Sort alphabetically
.nunique()                     # Count unique values
.unique()                      # Get array of unique values
.count()                       # Count non-null values

Statistical Methods

# Central tendency
.mean()              # Average
.median()            # Middle value
.mode()              # Most common value

# Spread
.std()               # Standard deviation
.var()               # Variance
.min()               # Minimum value
.max()               # Maximum value

# Totals
.sum()               # Sum of all values
.cumsum()            # Cumulative sum

# Position of extremes
.idxmax()            # Label of maximum value
.idxmin()            # Label of minimum value

# Summary
.describe()          # All statistics at once

Comparison to NumPy

All these methods work just like their NumPy counterparts, with one caveat: pandas .std() defaults to the sample standard deviation (ddof=1), while NumPy's arr.std() defaults to the population standard deviation (ddof=0):

Operation     NumPy         Pandas Series
Mean          arr.mean()    series.mean()
Sum           arr.sum()     series.sum()
Max           arr.max()     series.max()
Std Dev       arr.std()     series.std()
Arithmetic    arr / 1000    series / 1000

Your NumPy skills transfer directly!
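A quick side-by-side sketch (toy values) showing the same operations on a NumPy array and a pandas Series—note that the Series results keep their index labels:

```python
import numpy as np
import pandas as pd

arr = np.array([100, 200, 300])
series = pd.Series([100, 200, 300], index=['a', 'b', 'c'])

# Same computations, same numeric results
print(arr.mean(), series.mean())   # 200.0 200.0
print(arr.sum(), series.sum())     # 600 600

# Scalar arithmetic is vectorized in both; the Series keeps its labels
print(arr / 100)      # [1. 2. 3.]
print(series / 100)   # same values, with labels 'a', 'b', 'c' attached

# Boolean filtering transfers too
print(series[series > 150])        # rows 'b' and 'c'
```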


Practice Exercises

Apply Series operations with these exercises.

Exercise 1: Arithmetic Operations

  1. Create a new column profit_per_employee (profits divided by employees)
  2. Convert all revenues to thousands
  3. Calculate what a 10% increase in profits would be for each company
# Your code here

Exercise 2: Value Counts

  1. Count how many companies are in each sector
  2. What percentage of companies are from USA?
  3. How many unique countries are in the dataset?
# Your code here

Hint

For percentages, use value_counts(normalize=True) and multiply by 100!

Exercise 3: Statistics

  1. Find the average number of employees across all companies
  2. Find the total revenues across all companies
  3. Which company has the most employees?
  4. What is the median profit?
# Your code here

Exercise 4: Combined Operations

  1. Create a column for revenue in millions (divide by 1,000,000)
  2. Find which sector has the highest average revenue
  3. Calculate the total profit for USA companies only
# Your code here

Summary

You now master Series operations for data analysis. Let’s review the key concepts.

Key Concepts

Vectorized Operations

  • Operations apply to all elements automatically
  • No loops needed—fast and efficient
  • Works just like NumPy arrays

Value Counts

  • Essential for categorical data analysis
  • Counts occurrences of each unique value
  • Can get proportions with normalize=True

Statistical Methods

  • Calculate mean, median, sum, min, max, std
  • .describe() gives complete summary
  • .idxmax() and .idxmin() find labels of extremes

Creating Columns

  • Use arithmetic to calculate new columns
  • Assignment syntax: df['new_col'] = calculation
  • Calculations are vectorized

Syntax Reference

# Arithmetic (vectorized)
series / 1000                    # Divide all values
series * 100                     # Multiply all values
df['col1'] / df['col2']          # Element-wise division
(df['a'] + df['b']) / 2          # Complex calculations

# Counting
.value_counts()                  # Count each unique value
.value_counts(normalize=True)    # Get proportions
.nunique()                       # Count unique values
.unique()                        # Get unique values
.count()                         # Count non-null

# Statistics
.mean()                          # Average
.median()                        # Middle value
.sum()                           # Total
.min() / .max()                  # Minimum/Maximum
.std()                           # Standard deviation
.describe()                      # All statistics
.idxmax() / .idxmin()            # Label of max/min

# Creating columns
df['new_col'] = calculation      # Add calculated column
df['margin'] = df['profit'] / df['revenue']

NumPy Skills Transfer

Everything you learned in NumPy applies:

# NumPy                          # Pandas
arr.mean()                      series.mean()
arr.sum()                       series.sum()
arr / 1000                      series / 1000
arr[arr > 50]                   series[series > 50]
arr.max()                       series.max()

Plus pandas adds:

  • .value_counts() for categorical analysis
  • .idxmax() / .idxmin() for finding labels
  • Automatic label alignment
  • Missing data handling

Important Reminders

  • Operations are vectorized: No need for loops
  • Labels are preserved: Results maintain index labels
  • Create new columns: Use arithmetic and assignment
  • Count categories: Use .value_counts()
  • Find positions: Use .idxmax() and .idxmin()
  • Get summaries: Use .describe() for quick overview

Next Steps

You now understand Series operations and value counting. In the next lesson, you will learn to work with dates and times—essential for time-series analysis.

Continue to Lesson 6 - DateTime Fundamentals

Learn to work with dates, times, and time-series data

Back to Lesson 4 - Selecting with .iloc[]

Review position-based selection using integer indices


Master Data Analysis Operations

Series operations are the foundation of data analysis. You can now calculate new values, count categories, and extract statistics—the everyday tasks of data analysis.

These vectorized operations make pandas fast and your code concise. Use them instead of loops!