Pandas Data Analysis Module Overview

Your Complete Pandas Learning Path

This module teaches you pandas—Python’s most powerful library for data analysis. Through 21 comprehensive lessons, you will progress from pandas basics to advanced data manipulation techniques used by professional data analysts worldwide.


Module Structure

The Pandas Data Analysis module is organized into six thematic sections plus a final project, each building on the one before it:

Foundation (Lessons 1-6)
    ↓
Filtering & Transformation (Lessons 7-10)
    ↓
Data Cleaning (Lessons 11-13)
    ↓
String Operations (Lesson 14)
    ↓
Aggregation & Combining (Lessons 15-18)
    ↓
Advanced Topics (Lessons 19-20)
    ↓
Real-World Project (Lesson 21)

Each lesson includes theory, code examples, practical exercises, and real-world applications.


Section 1: Foundation Skills

Lesson 1: Introduction to Pandas and Series

Duration: 15 minutes | Weight: 610

Learn what pandas is, why it matters, and understand the Series data structure—pandas’ one-dimensional labeled array.

What You Will Learn:

  • What pandas is and why it enhances NumPy
  • The difference between Series and NumPy arrays
  • How to create Series from lists, arrays, and dictionaries
  • Series properties: index, values, dtype, name
  • Basic Series operations and indexing
  • When to use Series vs DataFrame

Key Concepts:

  • Labels provide context and meaning to data
  • Series align data by index automatically
  • Series support mixed operations with scalars and other Series

Practical Application: Working with single columns of data like temperatures, prices, or counts.
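
As a quick illustration of these Series ideas (the values and labels below are made up for demonstration, not taken from the course datasets):

    import pandas as pd

    # Create a Series from a dictionary: keys become index labels
    temps = pd.Series({"Mon": 21.5, "Tue": 23.0, "Wed": 19.8}, name="temp_c")

    print(temps.index)    # Index(['Mon', 'Tue', 'Wed'], dtype='object')
    print(temps.values)   # [21.5 23.  19.8]
    print(temps.dtype)    # float64

    # Scalar operations broadcast to every element, and two Series
    # align by index label before combining
    temps_f = temps * 9 / 5 + 32
    print(temps_f["Mon"])  # 70.7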


Lesson 2: DataFrames and Reading Data

Duration: 20 minutes | Weight: 620

Master the DataFrame—pandas’ two-dimensional labeled data structure that represents tables with rows and columns.

What You Will Learn:

  • What a DataFrame is and how it relates to Series
  • Loading CSV files with pd.read_csv()
  • Exploring DataFrames with .head(), .tail(), .info(), .shape
  • Understanding DataFrame anatomy: index, columns, values
  • Inspecting data types with .dtypes
  • Getting statistical summaries with .describe()

Key Concepts:

  • DataFrames are collections of Series sharing the same index
  • Each column can have a different data type
  • Index and column labels make data self-documenting

Practical Application: Loading and exploring datasets like Fortune 500 companies or sales data.
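
A minimal sketch of a typical first look at a dataset; the file name fortune500.csv is hypothetical, but any CSV with a header row behaves the same way:

    import pandas as pd

    df = pd.read_csv("fortune500.csv")   # hypothetical file name

    print(df.shape)        # (number of rows, number of columns)
    print(df.head())       # first five rows
    print(df.tail(3))      # last three rows
    print(df.dtypes)       # data type of each column
    df.info()              # columns, non-null counts, dtypes, memory usage
    print(df.describe())   # summary statistics for numeric columns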


Lesson 3: Selecting Data with .loc[]

Duration: 20 minutes | Weight: 630

Master label-based selection—the pandas way of accessing data using meaningful names instead of numeric positions.

What You Will Learn:

  • Selecting single values with .loc[row, column]
  • Selecting entire rows by label
  • Selecting multiple rows with lists or slices
  • Selecting columns by name
  • Combining row and column selections
  • Understanding inclusive slicing (different from Python!)

Key Concepts:

  • .loc[] uses labels (row names and column names)
  • Label-based slicing includes both start and end
  • Code becomes self-documenting when you use labels

Practical Application: Extracting specific company data, financial metrics for particular companies, or regional subsets.
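
A small, self-contained sketch of .loc[] selection; the companies and numbers are illustrative only:

    import pandas as pd

    df = pd.DataFrame(
        {"revenue": [500, 260, 280], "sector": ["Retail", "Tech", "Tech"]},
        index=["Walmart", "Apple", "Amazon"],
    )

    df.loc["Apple", "revenue"]             # single value: 260
    df.loc["Apple"]                        # one row as a Series
    df.loc[["Apple", "Amazon"]]            # several rows by label
    df.loc["Walmart":"Amazon", "sector"]   # label slices include BOTH endpoints
    df.loc[:, ["sector", "revenue"]]       # all rows, chosen columns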


Lesson 4: Selecting Data with .iloc[]

Duration: 15 minutes | Weight: 640

Learn position-based selection for when you need to access data by numeric location rather than labels.

What You Will Learn:

  • Selecting by integer position with .iloc[]
  • Differences between .loc[] and .iloc[]
  • When to use position-based vs label-based selection
  • Selecting rows and columns by position
  • Using negative indices to count from the end

Key Concepts:

  • .iloc[] uses integer positions (like NumPy)
  • Position-based slicing excludes the end (like Python)
  • Choose .loc[] for clarity, .iloc[] for position-based logic

Practical Application: Getting first n rows, last n rows, every nth row, or specific position-based samples.
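
The position-based counterpart, sketched on the same kind of small illustrative DataFrame:

    import pandas as pd

    df = pd.DataFrame(
        {"revenue": [500, 260, 280], "sector": ["Retail", "Tech", "Tech"]},
        index=["Walmart", "Apple", "Amazon"],
    )

    df.iloc[0, 0]      # first row, first column: 500
    df.iloc[:2]        # first two rows (end position is EXCLUDED)
    df.iloc[-1]        # last row
    df.iloc[::2, 1]    # every second row, second column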


Lesson 5: Series Operations and Value Counts

Duration: 15 minutes | Weight: 650

Learn to perform operations on Series and analyze categorical data with value counts.

What You Will Learn:

  • Arithmetic operations on Series
  • Creating calculated Series from existing columns
  • Counting unique values with .value_counts()
  • Finding top categories or most frequent values
  • Calculating percentages and proportions
  • Series statistical methods

Key Concepts:

  • Series operations vectorize like NumPy arrays
  • .value_counts() is essential for categorical analysis
  • Series methods return Series, maintaining labels

Practical Application: Calculating profit margins, counting companies by country, finding most common categories.
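
A short sketch with made-up company data showing vectorized arithmetic and .value_counts():

    import pandas as pd

    df = pd.DataFrame({
        "country": ["USA", "USA", "China", "Japan"],
        "revenue": [500.0, 260.0, 150.0, 120.0],
        "profit": [15.0, 57.0, 10.0, 8.0],
    })

    margin = df["profit"] / df["revenue"] * 100           # element-wise arithmetic
    counts = df["country"].value_counts()                 # USA 2, China 1, Japan 1
    shares = df["country"].value_counts(normalize=True)   # proportions instead of counts
    print(margin.mean(), margin.max())                    # Series statistical methods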


Lesson 6: DateTime Fundamentals

Duration: 20 minutes | Weight: 660

Master working with dates and times—essential for time-series analysis and temporal data.

What You Will Learn:

  • Converting strings to datetime objects
  • Extracting date components (year, month, day, weekday)
  • Performing date arithmetic and calculations
  • Filtering data by date ranges
  • Understanding pandas datetime index
  • Working with time zones

Key Concepts:

  • Dates are first-class data types in pandas
  • DateTime index enables powerful time-based operations
  • Date arithmetic works intuitively (add days, find differences)

Practical Application: Analyzing sales trends over time, filtering by date ranges, calculating time periods.
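
A minimal datetime sketch; the dates are arbitrary examples:

    import pandas as pd

    dates = pd.to_datetime(pd.Series(["2017-01-15", "2017-02-03", "2017-03-20"]))

    dates.dt.year                       # 2017, 2017, 2017
    dates.dt.day_name()                 # weekday names
    dates + pd.Timedelta(days=30)       # date arithmetic
    (dates.max() - dates.min()).days    # span in days: 64

    # Filter to a date range with a Boolean mask
    february = dates[(dates >= "2017-02-01") & (dates < "2017-03-01")]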


Section 2: Filtering and Transformation

Lesson 7: Boolean Filtering in Pandas

Duration: 15 minutes | Weight: 670

Apply NumPy Boolean indexing concepts to DataFrames for powerful data filtering.

What You Will Learn:

  • Creating Boolean masks on DataFrame columns
  • Filtering rows based on conditions
  • Combining conditions with & (AND), | (OR), ~ (NOT)
  • Using .isin() for membership testing
  • Filtering and selecting columns simultaneously
  • Counting filtered results

Key Concepts:

  • Each condition needs parentheses when combining
  • Boolean filtering creates new DataFrames
  • Filters can reference multiple columns

Practical Application: Finding profitable tech companies, filtering by country and sector, identifying outliers.
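
A compact filtering sketch on made-up data:

    import pandas as pd

    df = pd.DataFrame({
        "company": ["Apple", "Walmart", "Toyota", "Samsung"],
        "sector": ["Tech", "Retail", "Auto", "Tech"],
        "country": ["USA", "USA", "Japan", "South Korea"],
        "profit": [57.0, 13.6, 16.9, 19.3],
    })

    tech = df[df["sector"] == "Tech"]

    # Each condition needs its own parentheses when combining with & or |
    us_tech = df[(df["sector"] == "Tech") & (df["country"] == "USA")]

    asia = df[df["country"].isin(["Japan", "South Korea"])]
    not_usa = df[~(df["country"] == "USA")]
    print(len(us_tech))   # number of rows that passed the filter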


Lesson 8: Sorting and Ranking

Duration: 10 minutes | Weight: 680

Learn to order data for analysis and identify top performers.

What You Will Learn:

  • Sorting by single columns with .sort_values()
  • Sorting by multiple columns with priority
  • Ascending vs descending order
  • Understanding the inplace parameter
  • Sorting by index with .sort_index()
  • Finding largest and smallest values with .nlargest() and .nsmallest()

Key Concepts:

  • Sorting creates new DataFrames unless inplace=True
  • Multi-column sorting uses lists for columns and ascending
  • Rankings help identify top and bottom performers

Practical Application: Finding top revenue companies, identifying worst performers, sorting by country then revenue.
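
Sorting sketched on the same kind of toy data:

    import pandas as pd

    df = pd.DataFrame({
        "company": ["Apple", "Walmart", "Toyota"],
        "country": ["USA", "USA", "Japan"],
        "revenue": [260.0, 500.0, 265.0],
    })

    df.sort_values("revenue", ascending=False)                       # largest first
    df.sort_values(["country", "revenue"], ascending=[True, False])  # multi-column priority
    df.nlargest(2, "revenue")                                        # top two without a full sort
    df.sort_index()                                                  # order by the index instead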


Lesson 9: Adding and Modifying Columns

Duration: 15 minutes | Weight: 690

Create new calculated columns and modify existing ones to derive insights.

What You Will Learn:

  • Adding columns from calculations
  • Modifying existing columns in place
  • Using np.where() for conditional columns
  • Dropping unwanted columns with .drop()
  • Renaming columns with .rename()
  • Reordering columns

Key Concepts:

  • New columns are calculated from existing ones
  • Column assignment syntax mirrors dictionary assignment
  • Conditional columns enable categorization

Practical Application: Calculating profit margins, creating size categories, flagging conditions, removing unnecessary columns.
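
A sketch of column creation and modification, with np.where() for the conditional case (thresholds and names are invented):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"revenue": [500.0, 260.0, 150.0], "profit": [13.6, 57.0, 10.0]})

    df["margin"] = df["profit"] / df["revenue"] * 100             # new calculated column
    df["size"] = np.where(df["revenue"] > 200, "large", "small")  # conditional category
    df = df.rename(columns={"revenue": "revenue_billions"})       # rename a column
    df = df.drop(columns=["profit"])                              # drop what you no longer need
    df = df[["size", "revenue_billions", "margin"]]               # reorder columns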


Lesson 10: Apply, Map, and Transform Functions

Duration: 20 minutes | Weight: 700

Apply custom functions to Series and DataFrames for complex transformations.

What You Will Learn:

  • Using .apply() to apply functions to columns or rows
  • Using .map() for Series transformations
  • Applying custom functions with arguments
  • Understanding .applymap() for element-wise DataFrame operations
  • Lambda functions for quick transformations
  • Reshaping data with pd.melt() (wide to long format)

Key Concepts:

  • .apply() enables complex custom transformations
  • Functions can take additional parameters
  • Reshaping changes data organization for analysis

Practical Application: Categorizing values with custom logic, transforming data formats, normalizing values.
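
A brief .apply()/.map() sketch; the threshold, mapping, and column names are invented for illustration:

    import pandas as pd

    df = pd.DataFrame({"company": ["Apple", "Walmart", "Toyota"],
                       "revenue": [260.0, 500.0, 265.0],
                       "profit": [57.0, 13.6, 16.9]})

    # Custom function with an extra argument, applied element-wise to a column
    def size_label(value, threshold):
        return "large" if value > threshold else "small"

    df["size"] = df["revenue"].apply(size_label, args=(300,))

    # .map() with a lambda (a dictionary works too)
    df["revenue_rounded"] = df["revenue"].map(lambda r: round(r, -1))

    # Wide to long: one row per (company, metric) observation
    long = pd.melt(df, id_vars=["company"], value_vars=["revenue", "profit"],
                   var_name="metric", value_name="amount")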


Section 3: Data Cleaning

Lesson 11: Handling Missing Data

Duration: 15 minutes | Weight: 710

Master techniques for dealing with missing values—a reality in all real datasets.

What You Will Learn:

  • Detecting missing values with .isnull() and .notnull()
  • Counting nulls per column
  • Dropping rows or columns with .dropna()
  • Filling missing values with .fillna()
  • Forward fill and backward fill methods
  • Different strategies for different columns

Key Concepts:

  • NaN represents missing numeric data
  • Different strategies: drop, fill with mean/median, forward fill
  • Consider why data is missing before choosing a strategy

Practical Application: Cleaning survey data, handling missing sales figures, preparing data for analysis.
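
A small sketch of the detect/drop/fill options on made-up data with gaps:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"region": ["N", "S", None, "E"],
                       "sales": [100.0, np.nan, 90.0, np.nan]})

    df.isnull().sum()                              # nulls per column
    df.dropna()                                    # drop rows with any missing value
    df["sales"].fillna(df["sales"].mean())         # fill with the column mean
    df["sales"].ffill()                            # forward fill from the previous value
    df.fillna({"region": "unknown", "sales": 0})   # a different strategy per column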


Lesson 12: Data Type Conversion and Cleaning

Duration: 20 minutes | Weight: 720

Clean messy data by converting types, removing formatting, and extracting information.

What You Will Learn:

  • Converting between data types with .astype()
  • Cleaning column names (removing spaces, special characters)
  • Removing units from numeric strings (“8GB” → 8)
  • Handling European number formats
  • Extracting numeric values from text
  • Fixing encoding issues when loading data

Key Concepts:

  • Real data comes with formatting that must be removed
  • Type conversion prepares data for analysis
  • Column names should be clean and consistent

Practical Application: Cleaning laptop specifications, converting currency strings to numbers, standardizing formats.
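
A cleaning sketch in the spirit of the laptop dataset; the column names and messy formats here are assumptions, not the actual file:

    import pandas as pd

    df = pd.DataFrame({" Ram": ["8GB", "16GB"], "Price ": ["1.299,00", "2.499,00"]})

    # Clean column names: strip whitespace, lowercase
    df.columns = df.columns.str.strip().str.lower()

    # Remove units, then convert the dtype
    df["ram"] = df["ram"].str.replace("GB", "", regex=False).astype(int)

    # European format: drop the thousands dot, turn the decimal comma into a dot
    df["price"] = (df["price"]
                   .str.replace(".", "", regex=False)
                   .str.replace(",", ".", regex=False)
                   .astype(float))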


Lesson 13: Removing Duplicates and Handling Outliers

Duration: 15 minutes | Weight: 730

Ensure data quality by identifying and handling duplicates and extreme values.

What You Will Learn:

  • Detecting duplicates with .duplicated()
  • Removing duplicates with .drop_duplicates()
  • Specifying columns for duplicate checking
  • Identifying outliers using IQR method
  • Deciding when to remove vs cap outliers
  • Using .clip() to cap extreme values

Key Concepts:

  • Duplicates can bias analysis if not removed
  • Outliers may be errors or legitimate extreme values
  • Context determines whether to remove or keep outliers

Practical Application: Removing duplicate entries, handling data entry errors, managing extreme values in revenue or prices.
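
Duplicates and an IQR-based outlier check, sketched on invented numbers:

    import pandas as pd

    df = pd.DataFrame({"company": ["A", "A", "B", "C", "D"],
                       "revenue": [100.0, 100.0, 120.0, 95.0, 900.0]})

    print(df.duplicated().sum())                 # rows identical to an earlier row
    df = df.drop_duplicates(subset=["company"])  # keep the first row per company

    q1, q3 = df["revenue"].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    outliers = df[(df["revenue"] < lower) | (df["revenue"] > upper)]
    df["revenue_capped"] = df["revenue"].clip(lower, upper)   # cap instead of drop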


Section 4: String Operations

Lesson 14: Working with String Data

Duration: 20 minutes | Weight: 740

Use vectorized string methods to clean and analyze text data efficiently.

What You Will Learn:

  • Accessing string methods with .str accessor
  • Case conversion (.str.lower(), .str.upper())
  • Removing whitespace with .str.strip()
  • Replacing text with .str.replace()
  • Checking if strings contain patterns
  • Splitting strings and extracting parts
  • Basic regular expression patterns

Key Concepts:

  • .str enables vectorized string operations
  • Operations apply to entire columns at once
  • String methods are much faster than loops

Practical Application: Standardizing country names, extracting information from text, filtering by text patterns.
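
A few vectorized string operations on an invented column of country spellings:

    import pandas as pd

    countries = pd.Series([" USA", "usa ", "United States", "U.S.A."])

    countries.str.strip().str.upper()               # normalize whitespace and case
    countries.str.replace(".", "", regex=False)     # literal replacement, no regex
    countries.str.contains("united", case=False)    # Boolean mask for filtering

    names = pd.Series(["Apple Inc", "Alphabet Inc"])
    names.str.split(" ").str[0]                     # first word of each value
    names.str.extract(r"^(\w+)")                    # same idea with a regex capture group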


Section 5: Aggregation and Combining

Lesson 15: GroupBy and Aggregation

Duration: 20 minutes | Weight: 750

Master the most powerful pandas feature—grouping data by categories and calculating statistics.

What You Will Learn:

  • Grouping data with .groupby()
  • Calculating aggregate statistics per group
  • Using .agg() with multiple functions
  • Grouping by multiple columns
  • Understanding split-apply-combine pattern
  • Applying different functions to different columns

Key Concepts:

  • GroupBy splits data into groups, applies functions, combines results
  • Essential for category-based analysis
  • Enables answering “by group” questions

Practical Application: Average revenue by sector, total employees by country, sales statistics by region and year.
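
A split-apply-combine sketch on made-up company rows:

    import pandas as pd

    df = pd.DataFrame({
        "sector": ["Tech", "Tech", "Retail", "Retail"],
        "country": ["USA", "South Korea", "USA", "Japan"],
        "revenue": [260.0, 200.0, 500.0, 60.0],
        "profit": [57.0, 19.3, 13.6, 1.5],
    })

    df.groupby("sector")["revenue"].mean()          # one statistic per group

    # Several statistics, with different functions for different columns
    df.groupby(["sector", "country"]).agg(
        total_revenue=("revenue", "sum"),
        avg_profit=("profit", "mean"),
        companies=("revenue", "count"),
    )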


Lesson 16: Pivot Tables and Reshaping

Duration: 15 minutes | Weight: 760

Create Excel-style pivot tables and reshape data between wide and long formats.

What You Will Learn:

  • Creating pivot tables with .pivot_table()
  • Specifying index, columns, and values
  • Using multiple aggregation functions
  • Reshaping wide data to long with pd.melt()
  • Reshaping long data to wide with .pivot()
  • Understanding when to use each format

Key Concepts:

  • Pivot tables summarize data in cross-tabulated form
  • Wide format: one row per entity, columns for variables
  • Long format: one row per observation

Practical Application: Summarizing sales by product and month, creating summary tables, preparing data for visualization.
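
Pivoting and melting sketched on an invented sales table:

    import pandas as pd

    sales = pd.DataFrame({
        "region": ["North", "North", "South", "South"],
        "year": [2016, 2017, 2016, 2017],
        "sales": [100, 120, 80, 95],
    })

    # Wide summary: one row per region, one column per year
    wide = sales.pivot_table(index="region", columns="year",
                             values="sales", aggfunc="mean")

    # Back to long format: one row per (region, year) observation
    long = wide.reset_index().melt(id_vars="region",
                                   var_name="year", value_name="avg_sales")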


Lesson 17: Concatenating DataFrames

Duration: 10 minutes | Weight: 770

Combine multiple DataFrames by stacking them vertically or horizontally.

What You Will Learn:

  • Vertical concatenation with pd.concat() (axis=0)
  • Horizontal concatenation (axis=1)
  • Handling mismatched columns or indexes
  • Resetting indexes after concatenation
  • Adding identifying columns before combining
  • Inner vs outer joins when concatenating

Key Concepts:

  • Concatenation stacks DataFrames without matching on keys, like gluing tables together
  • Vertical: combines rows (same columns)
  • Horizontal: combines columns (same rows)

Practical Application: Combining multi-year datasets, appending new data, side-by-side comparisons.
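
A concatenation sketch with two tiny year slices of invented data:

    import pandas as pd

    df_2016 = pd.DataFrame({"country": ["USA", "Japan"], "score": [7.1, 5.9]})
    df_2017 = pd.DataFrame({"country": ["USA", "Japan"], "score": [7.0, 6.0]})

    # Add an identifying column before combining
    df_2016["year"] = 2016
    df_2017["year"] = 2017

    stacked = pd.concat([df_2016, df_2017], ignore_index=True)  # rows on top of rows
    side_by_side = pd.concat([df_2016, df_2017], axis=1)        # columns next to columns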


Lesson 18: Merging and Joining DataFrames

Duration: 20 minutes | Weight: 780

Perform database-style joins to combine datasets based on common keys.

What You Will Learn:

  • Merging DataFrames with pd.merge()
  • Understanding join types: inner, outer, left, right
  • Joining on single or multiple columns
  • Handling duplicate column names with suffixes
  • Merging on indexes vs columns
  • Validating merge results

Key Concepts:

  • Inner join: keeps only matching rows
  • Outer join: keeps all rows from both
  • Left/right join: keeps all from one side
  • Choose join type based on analysis needs

Practical Application: Combining happiness data across years, joining sales with product information, enriching datasets.
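
A join-type sketch with two small invented tables sharing a key:

    import pandas as pd

    sales = pd.DataFrame({"product_id": [1, 2, 3], "units": [10, 5, 8]})
    products = pd.DataFrame({"product_id": [1, 2, 4], "name": ["A", "B", "D"]})

    pd.merge(sales, products, on="product_id", how="inner")  # only ids 1 and 2
    pd.merge(sales, products, on="product_id", how="left")   # keeps id 3, name is NaN
    pd.merge(sales, products, on="product_id", how="outer")  # all ids from both sides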


Section 6: Advanced Topics

Lesson 19: MultiIndex and Hierarchical Data

Duration: 15 minutes | Weight: 790

Work with hierarchical indexes for multi-level data organization.

What You Will Learn:

  • Creating MultiIndex DataFrames
  • Selecting data from hierarchical indexes
  • Using .xs() for cross-sections
  • Stacking and unstacking levels
  • Resetting and setting indexes
  • GroupBy with MultiIndex results

Key Concepts:

  • MultiIndex enables multi-dimensional indexing
  • Useful for grouped data and time series
  • Provides flexible data organization

Practical Application: Organizing data by country and city, multi-level time series, hierarchical categories.
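
A MultiIndex sketch on invented country/city rows:

    import pandas as pd

    df = pd.DataFrame({
        "country": ["USA", "USA", "Japan", "Japan"],
        "city": ["New York", "Los Angeles", "Tokyo", "Osaka"],
        "population": [8.4, 3.9, 13.9, 2.7],
    }).set_index(["country", "city"])

    df.loc["USA"]                    # every row under the outer label
    df.loc[("Japan", "Tokyo")]       # one row by its full key
    df.xs("Tokyo", level="city")     # cross-section on an inner level
    df.unstack()                     # move the inner level into columns
    df.reset_index()                 # back to ordinary columns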


Lesson 20: Window Functions and Rolling Calculations

Duration: 15 minutes | Weight: 800

Calculate moving averages and cumulative statistics for time-series analysis.

What You Will Learn:

  • Computing rolling means with .rolling()
  • Window-based statistics (sum, min, max)
  • Expanding windows for cumulative calculations
  • Calculating percentage changes
  • Shift operations for lagged values
  • Working with time-based windows

Key Concepts:

  • Rolling windows smooth data and identify trends
  • Window size determines smoothing level
  • Essential for time-series analysis

Practical Application: Moving averages for stock prices, smoothing noisy data, trend analysis.
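
A rolling-window sketch on a short invented price series:

    import pandas as pd

    prices = pd.Series([10, 12, 11, 15, 14, 18, 17], name="price")

    prices.rolling(window=3).mean()   # 3-period moving average (NaN until the window fills)
    prices.rolling(window=3).max()    # any window statistic works the same way
    prices.expanding().mean()         # cumulative ("so far") average
    prices.pct_change()               # period-over-period percentage change
    prices.shift(1)                   # previous value, for lagged comparisons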


Final Project

Lesson 21: Complete Data Analysis Project

Duration: 30 minutes | Weight: 810

Apply everything you learned to a comprehensive real-world data analysis project.

Project: Analyze Global Happiness Trends (2015-2017)

Tasks:

Part 1: Loading and Combining

  • Load multiple years of happiness data
  • Add year identifiers
  • Combine into single DataFrame
  • Handle encoding issues

Part 2: Data Cleaning

  • Standardize column names
  • Handle missing values
  • Remove duplicates
  • Verify data types

Part 3: Analysis Questions

  • Which region had the highest happiness score each year?
  • Did global happiness increase or decrease?
  • Which countries improved the most?
  • What factors correlate with happiness?
  • Count countries by region and year

Part 4: Export Results

  • Create summary DataFrames
  • Export to CSV for further use
  • Save cleaned dataset

Skills Applied: All skills from Lessons 1-20 combined in a realistic workflow.
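
A rough end-to-end skeleton of this workflow. The file names, encoding, and column names (region, happiness_score) are assumptions; the real World Happiness files may differ:

    import pandas as pd

    # Part 1: load, tag, and combine the yearly files
    frames = []
    for year in (2015, 2016, 2017):
        df = pd.read_csv(f"{year}.csv", encoding="latin-1")   # hypothetical file names
        df["year"] = year
        frames.append(df)
    happiness = pd.concat(frames, ignore_index=True)

    # Part 2: clean
    happiness.columns = (happiness.columns.str.strip()
                                          .str.lower()
                                          .str.replace(" ", "_"))
    happiness = happiness.drop_duplicates()

    # Part 3: one example question - mean score per region and year
    summary = happiness.groupby(["region", "year"])["happiness_score"].mean()

    # Part 4: export
    summary.to_csv("happiness_summary.csv")
    happiness.to_csv("happiness_clean.csv", index=False)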


Learning Progression

This module follows a carefully designed progression:

Foundation → Understand pandas data structures

Selection → Access exactly the data you need

Filtering → Extract subsets based on conditions

Transformation → Calculate and modify data

Cleaning → Prepare messy data for analysis

Aggregation → Summarize by groups

Combination → Merge multiple datasets

Advanced → Handle complex scenarios

Application → Complete real projects

Each lesson builds on previous ones, ensuring you master concepts before advancing.


Real Datasets Throughout

You will work with actual datasets used in professional data analysis:

  • Fortune 500 Companies: Revenue, profits, employees, sectors
  • World Happiness Reports: Multi-year international data
  • Laptop Specifications: Messy real-world e-commerce data
  • Time Series Data: Sales, stock prices, trends

These datasets contain real challenges: missing values, inconsistent formatting, encoding issues, duplicates, and outliers. You will learn to handle them all.


Get Started

Ready to begin your pandas journey? Start with Lesson 1 to build your foundation.

Start Lesson 1 - Introduction to Pandas and Series

Learn what pandas is and master the Series data structure

Back to Module Home

Return to the Pandas Data Analysis module introduction


Your Path to Data Mastery

This module transforms you from someone who understands arrays to someone who can clean, analyze, and derive insights from real-world datasets. Pandas is the most important Python library for working with tabular data.

The skills you learn here will serve you every day as a data professional.

Complete these 21 lessons, and you will have the data manipulation expertise that employers demand and projects require. Let’s begin!