Pandas Data Analysis Module Overview
On this page
- Your Complete Pandas Learning Path
- Module Structure
- Section 1: Foundation Skills
- Section 2: Filtering and Transformation
- Section 3: Data Cleaning
- Section 4: String Operations
- Section 5: Aggregation and Combining
- Section 6: Advanced Topics
- Final Project
- Learning Progression
- Real Datasets Throughout
- Get Started
- Your Path to Data Mastery
Your Complete Pandas Learning Path
This module teaches you pandas—Python’s most powerful library for data analysis. Through 21 comprehensive lessons, you will progress from pandas basics to advanced data manipulation techniques used by professional data analysts worldwide.
Module Structure
The Pandas Data Analysis module is organized into six thematic sections, each building on the previous one:
Foundation (Lessons 1-6)
↓
Filtering & Transformation (Lessons 7-10)
↓
Data Cleaning (Lessons 11-13)
↓
String Operations (Lesson 14)
↓
Aggregation & Combining (Lessons 15-18)
↓
Advanced Topics (Lessons 19-20)
↓
Real-World Project (Lesson 21)
Each lesson includes theory, code examples, practical exercises, and real-world applications.
Section 1: Foundation Skills
Lesson 1: Introduction to Pandas and Series
Duration: 15 minutes
Learn what pandas is, why it matters, and understand the Series data structure—pandas’ one-dimensional labeled array.
What You Will Learn:
- What pandas is and why it enhances NumPy
- The difference between Series and NumPy arrays
- How to create Series from lists, arrays, and dictionaries
- Series properties: index, values, dtype, name
- Basic Series operations and indexing
- When to use Series vs DataFrame
Key Concepts:
- Labels provide context and meaning to data
- Series align data by index automatically
- Series support mixed operations with scalars and other Series
Practical Application: Working with single columns of data like temperatures, prices, or counts.
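To preview what this looks like, here is a minimal sketch (the temperature values are invented for illustration):

```python
import pandas as pd

# A Series built from a dictionary: keys become index labels
temps = pd.Series({"Mon": 21.5, "Tue": 23.0, "Wed": 19.8}, name="temp_c")

print(temps["Tue"])    # label-based access -> 23.0
print(temps.mean())    # built-in statistical methods
print(temps + 1.0)     # operations with scalars are vectorized
```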
Lesson 2: DataFrames and Reading Data
Duration: 20 minutes
Master the DataFrame—pandas’ two-dimensional labeled data structure that represents tables with rows and columns.
What You Will Learn:
- What a DataFrame is and how it relates to Series
- Loading CSV files with pd.read_csv()
- Exploring DataFrames with .head(), .tail(), .info(), .shape
- Understanding DataFrame anatomy: index, columns, values
- Inspecting data types with .dtypes
- Getting statistical summaries with .describe()
Key Concepts:
- DataFrames are collections of Series sharing the same index
- Each column can have a different data type
- Index and column labels make data self-documenting
Practical Application: Loading and exploring datasets like Fortune 500 companies or sales data.
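A minimal sketch of the load-and-explore workflow (the file name f500.csv is a placeholder; any CSV works):

```python
import pandas as pd

# The file name is a placeholder; point this at any CSV file
f500 = pd.read_csv("f500.csv")

print(f500.shape)        # (rows, columns)
print(f500.dtypes)       # data type of each column
f500.info()              # columns, non-null counts, memory usage
print(f500.head())       # first five rows; f500.tail() shows the last five
print(f500.describe())   # summary statistics for numeric columns
```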
Lesson 3: Selecting Data with .loc[]
Duration: 20 minutes
Master label-based selection—the pandas way of accessing data using meaningful names instead of numeric positions.
What You Will Learn:
- Selecting single values with .loc[row, column]
- Selecting entire rows by label
- Selecting multiple rows with lists or slices
- Selecting columns by name
- Combining row and column selections
- Understanding inclusive slicing (different from Python!)
Key Concepts:
- .loc[] uses labels (row names and column names)
- Label-based slicing includes both start and end
- Code becomes self-documenting when you use labels
Practical Application: Extracting specific company data, financial metrics for particular companies, or regional subsets.
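For example, a small illustrative DataFrame makes the label-based patterns concrete (the figures are invented):

```python
import pandas as pd

df = pd.DataFrame({"revenue": [274.5, 559.2, 365.8],
                   "sector": ["Tech", "Retail", "Tech"]},
                  index=["Apple", "Walmart", "Amazon"])

print(df.loc["Apple", "revenue"])            # single value by labels
print(df.loc[["Apple", "Amazon"]])           # multiple rows by label
print(df.loc["Apple":"Walmart", "revenue"])  # label slice: both ends included
```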
Lesson 4: Selecting Data with .iloc[]
Duration: 15 minutes
Learn position-based selection for when you need to access data by numeric location rather than labels.
What You Will Learn:
- Selecting by integer position with .iloc[]
- Differences between .loc[] and .iloc[]
- When to use position-based vs label-based selection
- Selecting rows and columns by position
- Using negative indices to count from the end
Key Concepts:
- .iloc[] uses integer positions (like NumPy)
- Position-based slicing excludes the end (like Python)
- Choose .loc[] for clarity, .iloc[] for position-based logic
Practical Application: Getting first n rows, last n rows, every nth row, or specific position-based samples.
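A quick position-based sketch, using the same invented data as the Lesson 3 example:

```python
import pandas as pd

df = pd.DataFrame({"revenue": [274.5, 559.2, 365.8],
                   "sector": ["Tech", "Retail", "Tech"]},
                  index=["Apple", "Walmart", "Amazon"])

print(df.iloc[0])      # first row by position
print(df.iloc[-1])     # last row: negative positions count from the end
print(df.iloc[:2])     # first two rows; the end position is excluded
print(df.iloc[0, 1])   # row 0, column 1 -> "Tech"
```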
Lesson 5: Series Operations and Value Counts
Duration: 15 minutes
Learn to perform operations on Series and analyze categorical data with value counts.
What You Will Learn:
- Arithmetic operations on Series
- Creating calculated Series from existing columns
- Counting unique values with .value_counts()
- Finding top categories or most frequent values
- Calculating percentages and proportions
- Series statistical methods
Key Concepts:
- Series operations vectorize like NumPy arrays
- .value_counts() is essential for categorical analysis
- Series methods return Series, maintaining labels
Practical Application: Calculating profit margins, counting companies by country, finding most common categories.
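A short sketch of both ideas, with invented values:

```python
import pandas as pd

countries = pd.Series(["USA", "China", "USA", "Japan", "USA"])

print(countries.value_counts())                 # counts per unique value
print(countries.value_counts(normalize=True))   # proportions instead of counts

revenue = pd.Series([100.0, 80.0], index=["A", "B"])
profit = pd.Series([10.0, 20.0], index=["A", "B"])
print(profit / revenue * 100)                   # vectorized: profit margin in %
```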
Lesson 6: DateTime Fundamentals
Duration: 20 minutes
Master working with dates and times—essential for time-series analysis and temporal data.
What You Will Learn:
- Converting strings to datetime objects
- Extracting date components (year, month, day, weekday)
- Performing date arithmetic and calculations
- Filtering data by date ranges
- Understanding pandas datetime index
- Working with time zones
Key Concepts:
- Dates are first-class data types in pandas
- DateTime index enables powerful time-based operations
- Date arithmetic works intuitively (add days, find differences)
Practical Application: Analyzing sales trends over time, filtering by date ranges, calculating time periods.
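For instance, a self-contained sketch of the datetime workflow (dates invented):

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2024-01-15", "2024-02-20", "2024-03-05"]))

print(dates.dt.year)                  # extract components
print(dates.dt.day_name())            # weekday names
print(dates.max() - dates.min())      # date arithmetic -> a Timedelta
print(dates[dates >= "2024-02-01"])   # filter by a date range
```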
Section 2: Filtering and Transformation
Lesson 7: Boolean Filtering in Pandas
Duration: 15 minutes
Apply NumPy Boolean indexing concepts to DataFrames for powerful data filtering.
What You Will Learn:
- Creating Boolean masks on DataFrame columns
- Filtering rows based on conditions
- Combining conditions with & (AND), | (OR), ~ (NOT)
- Using .isin() for membership testing
- Filtering and selecting columns simultaneously
- Counting filtered results
Key Concepts:
- Each condition needs parentheses when combining
- Boolean filtering creates new DataFrames
- Filters can reference multiple columns
Practical Application: Finding profitable tech companies, filtering by country and sector, identifying outliers.
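A compact sketch of these filtering patterns (data invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"sector": ["Tech", "Energy", "Tech"],
                   "country": ["USA", "China", "Japan"],
                   "profit": [57.4, -2.1, 4.9]})

profitable_tech = df[(df["sector"] == "Tech") & (df["profit"] > 0)]
asia = df[df["country"].isin(["China", "Japan"])]   # membership test
not_usa = df[~(df["country"] == "USA")]             # negation with ~
print(len(profitable_tech))                         # count filtered results
```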
Lesson 8: Sorting and Ranking
Duration: 10 minutes
Learn to order data for analysis and identify top performers.
What You Will Learn:
- Sorting by single columns with .sort_values()
- Sorting by multiple columns with priority
- Ascending vs descending order
- Understanding the inplace parameter
- Sorting by index with .sort_index()
- Finding largest and smallest values with .nlargest() and .nsmallest()
Key Concepts:
- Sorting creates new DataFrames unless inplace=True
- Multi-column sorting uses lists for both the columns and the ascending flags
- Rankings help identify top and bottom performers
Practical Application: Finding top revenue companies, identifying worst performers, sorting by country then revenue.
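For example (invented data):

```python
import pandas as pd

df = pd.DataFrame({"country": ["USA", "USA", "China"],
                   "revenue": [500, 250, 300]})

by_rev = df.sort_values("revenue", ascending=False)   # single column
by_both = df.sort_values(["country", "revenue"],      # multi-column: the
                         ascending=[True, False])     # lists line up
print(df.nlargest(2, "revenue"))                      # top 2 rows by revenue
```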
Lesson 9: Adding and Modifying Columns
Duration: 15 minutes
Create new calculated columns and modify existing ones to derive insights.
What You Will Learn:
- Adding columns from calculations
- Modifying existing columns in place
- Using np.where() for conditional columns
- Dropping unwanted columns with .drop()
- Renaming columns with .rename()
- Reordering columns
Key Concepts:
- New columns are calculated from existing ones
- Column assignment syntax mirrors dictionary assignment
- Conditional columns enable categorization
Practical Application: Calculating profit margins, creating size categories, flagging conditions, removing unnecessary columns.
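A brief sketch of column creation, renaming, and dropping (values invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"revenue": [500.0, 250.0], "profit": [50.0, -5.0]})

df["margin"] = df["profit"] / df["revenue"]                # new calculated column
df["status"] = np.where(df["profit"] > 0, "gain", "loss")  # conditional column
df = df.rename(columns={"revenue": "revenue_musd"})        # rename a column
df = df.drop(columns=["status"])                           # drop an unwanted column
print(df)
```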
Lesson 10: Apply, Map, and Transform Functions
Duration: 20 minutes
Apply custom functions to Series and DataFrames for complex transformations.
What You Will Learn:
- Using .apply() to apply functions to columns or rows
- Using .map() for Series transformations
- Applying custom functions with arguments
- Understanding .applymap() for element-wise DataFrame operations
- Lambda functions for quick transformations
- Reshaping data with pd.melt() (wide to long format)
Key Concepts:
- .apply() enables complex custom transformations
- Functions can take additional parameters
- Reshaping changes data organization for analysis
Practical Application: Categorizing values with custom logic, transforming data formats, normalizing values.
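A minimal sketch of .apply() with an extra argument and pd.melt(), on invented data:

```python
import pandas as pd

df = pd.DataFrame({"company": ["A", "B"],
                   "revenues": [100, 200],
                   "profits": [10, 40]})

def size_label(value, threshold):
    return "big" if value > threshold else "small"

# .apply() passes extra keyword arguments through to the function
df["size"] = df["revenues"].apply(size_label, threshold=150)

# Wide -> long: one row per (company, metric) observation
long_df = pd.melt(df, id_vars=["company"],
                  value_vars=["revenues", "profits"],
                  var_name="metric", value_name="value")
print(long_df)
```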
Section 3: Data Cleaning
Lesson 11: Handling Missing Data
Duration: 15 minutes
Master techniques for dealing with missing values—a reality in all real datasets.
What You Will Learn:
- Detecting missing values with .isnull() and .notnull()
- Counting nulls per column
- Dropping rows or columns with .dropna()
- Filling missing values with .fillna()
- Forward fill and backward fill methods
- Different strategies for different columns
Key Concepts:
- NaN represents missing numeric data
- Different strategies: drop, fill with mean/median, forward fill
- Consider why data is missing before choosing a strategy
Practical Application: Cleaning survey data, handling missing sales figures, preparing data for analysis.
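For instance, on a tiny frame with deliberate gaps:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, np.nan, 12.5],
                   "rating": [4.0, 3.5, np.nan]})

print(df.isnull().sum())                                  # nulls per column
cleaned = df.dropna(subset=["price"])                     # drop rows missing price
df["rating"] = df["rating"].fillna(df["rating"].mean())   # fill with the mean
df["price"] = df["price"].ffill()                         # forward fill
```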
Lesson 12: Data Type Conversion and Cleaning
Duration: 20 minutes
Clean messy data by converting types, removing formatting, and extracting information.
What You Will Learn:
- Converting between data types with .astype()
- Cleaning column names (removing spaces, special characters)
- Removing units from numeric strings (“8GB” → 8)
- Handling European number formats
- Extracting numeric values from text
- Fixing encoding issues when loading data
Key Concepts:
- Real data comes with formatting that must be removed
- Type conversion prepares data for analysis
- Column names should be clean and consistent
Practical Application: Cleaning laptop specifications, converting currency strings to numbers, standardizing formats.
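A small sketch of these cleaning steps on invented laptop-style data:

```python
import pandas as pd

df = pd.DataFrame({"Ram ": ["8GB", "16GB"], "Weight": ["1,5kg", "2,0kg"]})

df.columns = df.columns.str.strip().str.lower()            # clean column names
df["ram"] = df["ram"].str.replace("GB", "").astype(int)    # "8GB" -> 8
df["weight"] = (df["weight"].str.replace("kg", "")
                .str.replace(",", ".")                     # European decimal comma
                .astype(float))
print(df.dtypes)
```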
Lesson 13: Removing Duplicates and Handling Outliers
Duration: 15 minutes
Ensure data quality by identifying and handling duplicates and extreme values.
What You Will Learn:
- Detecting duplicates with .duplicated()
- Removing duplicates with .drop_duplicates()
- Specifying columns for duplicate checking
- Identifying outliers using the IQR method
- Deciding when to remove vs cap outliers
- Using .clip() to cap extreme values
Key Concepts:
- Duplicates can bias analysis if not removed
- Outliers may be errors or legitimate extreme values
- Context determines whether to remove or keep outliers
Practical Application: Removing duplicate entries, handling data entry errors, managing extreme values in revenue or prices.
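A sketch of the duplicate and IQR-outlier workflow (values invented):

```python
import pandas as pd

df = pd.DataFrame({"company": ["A", "A", "B", "C", "D", "E"],
                   "revenue": [100, 100, 90, 110, 95, 1_000_000]})

df = df.drop_duplicates()   # the repeated "A" row would bias any average

# IQR rule: values beyond 1.5 * IQR from the quartiles are flagged as outliers
q1, q3 = df["revenue"].quantile([0.25, 0.75])
upper = q3 + 1.5 * (q3 - q1)
print(df[df["revenue"] > upper])                  # inspect before deciding
df["revenue"] = df["revenue"].clip(upper=upper)   # or cap instead of removing
```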
Section 4: String Operations
Lesson 14: Working with String Data
Duration: 20 minutes
Use vectorized string methods to clean and analyze text data efficiently.
What You Will Learn:
- Accessing string methods with the .str accessor
- Case conversion (.str.lower(), .str.upper())
- Removing whitespace with .str.strip()
- Replacing text with .str.replace()
- Checking if strings contain patterns
- Splitting strings and extracting parts
- Basic regular expression patterns
Key Concepts:
- The .str accessor enables vectorized string operations
- Operations apply to entire columns at once
- String methods are much faster than loops
Practical Application: Standardizing country names, extracting information from text, filtering by text patterns.
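For example, normalizing invented country strings:

```python
import pandas as pd

s = pd.Series([" usa", "USA ", "United States"])

s = s.str.strip().str.lower()                # trim whitespace, normalize case
s = s.str.replace("united states", "usa")    # standardize variants
print(s.str.contains("usa"))                 # Boolean Series of matches
print(s.str.split(" ").str[0])               # split and take the first token
```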
Section 5: Aggregation and Combining
Lesson 15: GroupBy and Aggregation
Duration: 20 minutes
Master the most powerful pandas feature—grouping data by categories and calculating statistics.
What You Will Learn:
- Grouping data with .groupby()
- Calculating aggregate statistics per group
- Using .agg() with multiple functions
- Grouping by multiple columns
- Understanding the split-apply-combine pattern
- Applying different functions to different columns
Key Concepts:
- GroupBy splits data into groups, applies functions, combines results
- Essential for category-based analysis
- Enables answering “by group” questions
Practical Application: Average revenue by sector, total employees by country, sales statistics by region and year.
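A minimal split-apply-combine sketch on invented data:

```python
import pandas as pd

df = pd.DataFrame({"sector": ["Tech", "Tech", "Retail"],
                   "country": ["USA", "China", "USA"],
                   "revenue": [250, 180, 500]})

# Split by sector, apply aggregations, combine into one summary frame
per_sector = df.groupby("sector").agg(
    mean_revenue=("revenue", "mean"),
    companies=("revenue", "size"),
)
print(per_sector)

print(df.groupby(["country", "sector"])["revenue"].sum())  # multiple keys
```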
Lesson 16: Pivot Tables and Reshaping
Duration: 15 minutes
Create Excel-style pivot tables and reshape data between wide and long formats.
What You Will Learn:
- Creating pivot tables with .pivot_table()
- Specifying index, columns, and values
- Using multiple aggregation functions
- Reshaping wide data to long with pd.melt()
- Reshaping long data to wide with .pivot()
- Understanding when to use each format
Key Concepts:
- Pivot tables summarize data in cross-tabulated form
- Wide format: one row per entity, columns for variables
- Long format: one row per observation
Practical Application: Summarizing sales by product and month, creating summary tables, preparing data for visualization.
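A short sketch of pivoting and melting (sales figures invented):

```python
import pandas as pd

sales = pd.DataFrame({"product": ["A", "A", "B", "B"],
                      "month": ["Jan", "Feb", "Jan", "Feb"],
                      "units": [10, 12, 7, 9]})

# Excel-style pivot table: products as rows, months as columns
table = sales.pivot_table(index="product", columns="month",
                          values="units", aggfunc="sum")
print(table)

# And back from wide to long with .melt()
print(table.reset_index().melt(id_vars="product", value_name="units"))
```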
Lesson 17: Concatenating DataFrames
Duration: 10 minutes
Combine multiple DataFrames by stacking them vertically or horizontally.
What You Will Learn:
- Vertical concatenation with pd.concat() (axis=0)
- Horizontal concatenation (axis=1)
- Handling mismatched columns or indexes
- Resetting indexes after concatenation
- Adding identifying columns before combining
- Inner vs outer joins when concatenating
Key Concepts:
- Concatenation stacks DataFrames together, like gluing tables edge to edge
- Vertical: combines rows (same columns)
- Horizontal: combines columns (same rows)
Practical Application: Combining multi-year datasets, appending new data, side-by-side comparisons.
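For instance, stacking two invented yearly frames:

```python
import pandas as pd

s2015 = pd.DataFrame({"region": ["East", "West"], "total": [100, 150]})
s2016 = pd.DataFrame({"region": ["East", "West"], "total": [120, 160]})

s2015["year"] = 2015                 # add identifiers before combining
s2016["year"] = 2016

stacked = pd.concat([s2015, s2016], ignore_index=True)   # rows (axis=0)

side_by_side = pd.concat(                                # columns (axis=1)
    [s2015.set_index("region")["total"], s2016.set_index("region")["total"]],
    axis=1, keys=["total_2015", "total_2016"])
print(side_by_side)
```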
Lesson 18: Merging and Joining DataFrames
Duration: 20 minutes
Perform database-style joins to combine datasets based on common keys.
What You Will Learn:
- Merging DataFrames with pd.merge()
- Understanding join types: inner, outer, left, right
- Joining on single or multiple columns
- Handling duplicate column names with suffixes
- Merging on indexes vs columns
- Validating merge results
Key Concepts:
- Inner join: keeps only matching rows
- Outer join: keeps all rows from both
- Left/right join: keeps all from one side
- Choose join type based on analysis needs
Practical Application: Combining happiness data across years, joining sales with product information, enriching datasets.
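A small merge sketch with invented happiness scores:

```python
import pandas as pd

h2015 = pd.DataFrame({"country": ["Norway", "Chad"], "score": [7.5, 3.7]})
h2016 = pd.DataFrame({"country": ["Norway", "Chad", "Togo"], "score": [7.4, 3.8, 3.3]})

both = pd.merge(h2015, h2016, on="country", how="inner",
                suffixes=("_2015", "_2016"))     # only countries present in both

all_rows = pd.merge(h2015, h2016, on="country", how="outer",
                    suffixes=("_2015", "_2016"))  # keep everything; NaN where missing
print(all_rows)
```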
Section 6: Advanced Topics
Lesson 19: MultiIndex and Hierarchical Data
Duration: 15 minutes
Work with hierarchical indexes for multi-level data organization.
What You Will Learn:
- Creating MultiIndex DataFrames
- Selecting data from hierarchical indexes
- Using .xs() for cross-sections
- Stacking and unstacking levels
- Resetting and setting indexes
- GroupBy with MultiIndex results
Key Concepts:
- MultiIndex enables multi-dimensional indexing
- Useful for grouped data and time series
- Provides flexible data organization
Practical Application: Organizing data by country and city, multi-level time series, hierarchical categories.
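A minimal MultiIndex sketch (population figures invented):

```python
import pandas as pd

df = pd.DataFrame({"country": ["USA", "USA", "China"],
                   "city": ["NYC", "LA", "Beijing"],
                   "pop_m": [8.5, 4.0, 21.5]}).set_index(["country", "city"])

print(df.loc["USA"])               # all rows under the outer level
print(df.xs("LA", level="city"))   # cross-section on the inner level
print(df.unstack("city"))          # move a level into the columns
```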
Lesson 20: Window Functions and Rolling Calculations
Duration: 15 minutes
Calculate moving averages and cumulative statistics for time-series analysis.
What You Will Learn:
- Computing rolling means with .rolling()
- Window-based statistics (sum, min, max)
- Expanding windows for cumulative calculations
- Calculating percentage changes
- Shift operations for lagged values
- Working with time-based windows
Key Concepts:
- Rolling windows smooth data and identify trends
- Window size determines smoothing level
- Essential for time-series analysis
Practical Application: Moving averages for stock prices, smoothing noisy data, trend analysis.
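For example, on an invented price series:

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 13.0, 12.0, 15.0, 14.0, 16.0])

print(prices.rolling(window=3).mean())   # 3-period moving average
print(prices.expanding().max())          # cumulative (expanding-window) maximum
print(prices.pct_change())               # period-over-period percentage change
print(prices.shift(1))                   # lagged values for comparisons
```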
Final Project
Lesson 21: Complete Data Analysis Project
Duration: 30 minutes
Apply everything you learned to a comprehensive real-world data analysis project.
Project: Analyze Global Happiness Trends (2015-2017)
Tasks:
Part 1: Loading and Combining
- Load multiple years of happiness data
- Add year identifiers
- Combine into single DataFrame
- Handle encoding issues
Part 2: Data Cleaning
- Standardize column names
- Handle missing values
- Remove duplicates
- Verify data types
Part 3: Analysis Questions
- Which region had the highest happiness each year?
- Did global happiness increase or decrease?
- Which countries improved the most?
- What factors correlate with happiness?
- Count countries by region and year
Part 4: Export Results
- Create summary DataFrames
- Export to CSV for further use
- Save cleaned dataset
Skills Applied: All skills from Lessons 1-20 combined in a realistic workflow.
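As a rough preview, here is a condensed sketch of the project workflow; the file and column names (2015.csv, region, happiness_score) are hypothetical stand-ins for the actual course files:

```python
import pandas as pd

# Hypothetical file names; the actual course datasets may differ
paths = {2015: "2015.csv", 2016: "2016.csv", 2017: "2017.csv"}
frames = []
for year, path in paths.items():
    df = pd.read_csv(path, encoding="latin-1")  # some files need an explicit encoding
    df["year"] = year                           # identify the source year
    frames.append(df)

happiness = pd.concat(frames, ignore_index=True)

# Standardize column names, then basic cleaning
happiness.columns = happiness.columns.str.strip().str.lower().str.replace(" ", "_")
happiness = happiness.drop_duplicates().dropna(subset=["happiness_score"])

# One of the analysis questions: average score per region per year
top = happiness.groupby(["year", "region"])["happiness_score"].mean()

happiness.to_csv("happiness_clean.csv", index=False)  # export the cleaned data
```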
Learning Progression
This module follows a carefully designed progression:
Foundation → Understand pandas data structures
Selection → Access exactly the data you need
Filtering → Extract subsets based on conditions
Transformation → Calculate and modify data
Cleaning → Prepare messy data for analysis
Aggregation → Summarize by groups
Combination → Merge multiple datasets
Advanced → Handle complex scenarios
Application → Complete real projects
Each lesson builds on previous ones, ensuring you master concepts before advancing.
Real Datasets Throughout
You will work with actual datasets used in professional data analysis:
- Fortune 500 Companies: Revenue, profits, employees, sectors
- World Happiness Reports: Multi-year international data
- Laptop Specifications: Messy real-world e-commerce data
- Time Series Data: Sales, stock prices, trends
These datasets contain real challenges: missing values, inconsistent formatting, encoding issues, duplicates, and outliers. You will learn to handle them all.
Get Started
Ready to begin your pandas journey? Start with Lesson 1 to build your foundation.
Start Lesson 1 - Introduction to Pandas and Series
Learn what pandas is and master the Series data structure
Back to Module Home
Return to the Pandas Data Analysis module introduction
Your Path to Data Mastery
This module transforms you from someone who understands arrays to someone who can clean, analyze, and derive insights from real-world datasets. Pandas is the most important Python library for working with tabular data.
The skills you learn here will serve you every day as a data professional.
Complete these 21 lessons, and you will have the data manipulation expertise that employers demand and projects require. Let’s begin!