Pandas Data Analysis Module Overview
On this page
- Your Complete Pandas Learning Path
- Module Structure
- Section 1: Foundation Skills
- Section 2: Filtering and Transformation
- Section 3: Data Cleaning
- Section 4: String Operations
- Section 5: Aggregation and Combining
- Section 6: Advanced Topics
- Final Project
- Learning Progression
- Real Datasets Throughout
- Get Started
- Your Path to Data Mastery
Your Complete Pandas Learning Path
This module teaches you pandas—Python’s most powerful library for data analysis. Through 21 comprehensive lessons, you will progress from pandas basics to advanced data manipulation techniques used by professional data analysts worldwide.
Module Structure
The Pandas Data Analysis module is organized into six thematic sections, each building on the previous one:
Foundation (Lessons 1-6)
↓
Filtering & Transformation (Lessons 7-10)
↓
Data Cleaning (Lessons 11-13)
↓
String Operations (Lesson 14)
↓
Aggregation & Combining (Lessons 15-18)
↓
Advanced Topics (Lessons 19-20)
↓
Real-World Project (Lesson 21)
Each lesson includes theory, code examples, practical exercises, and real-world applications.
Section 1: Foundation Skills
Lesson 1: Introduction to Pandas and Series
Duration: 15 minutes
Learn what pandas is, why it matters, and understand the Series data structure—pandas’ one-dimensional labeled array.
What You Will Learn:
- What pandas is and why it enhances NumPy
- The difference between Series and NumPy arrays
- How to create Series from lists, arrays, and dictionaries
- Series properties: index, values, dtype, name
- Basic Series operations and indexing
- When to use Series vs DataFrame
Key Concepts:
- Labels provide context and meaning to data
- Series align data by index automatically
- Series support mixed operations with scalars and other Series
Practical Application: Working with single columns of data like temperatures, prices, or counts.
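To preview what this looks like, here is a minimal sketch (the temperature values are invented for illustration):

```python
import pandas as pd

# A Series built from a dictionary: keys become index labels
temps = pd.Series({"Mon": 21.5, "Tue": 23.0, "Wed": 19.8}, name="temp_c")

print(temps["Tue"])    # label-based access -> 23.0
print(temps.mean())    # built-in statistical methods
print(temps + 1.0)     # operations with scalars are vectorized
```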
Lesson 2: DataFrames and Reading Data
Duration: 20 minutes
Master the DataFrame—pandas’ two-dimensional labeled data structure that represents tables with rows and columns.
What You Will Learn:
- What a DataFrame is and how it relates to Series
- Loading CSV files with pd.read_csv()
- Exploring DataFrames with .head(), .tail(), .info(), .shape
- Understanding DataFrame anatomy: index, columns, values
- Inspecting data types with .dtypes
- Getting statistical summaries with .describe()
Key Concepts:
- DataFrames are collections of Series sharing the same index
- Each column can have a different data type
- Index and column labels make data self-documenting
Practical Application: Loading and exploring datasets like Fortune 500 companies or sales data.
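A minimal sketch of the load-and-explore workflow (the file name f500.csv is a placeholder; any CSV works):

```python
import pandas as pd

# The file name is a placeholder; point this at any CSV file
f500 = pd.read_csv("f500.csv")

print(f500.shape)        # (rows, columns)
print(f500.dtypes)       # data type of each column
f500.info()              # columns, non-null counts, memory usage
print(f500.head())       # first five rows; f500.tail() shows the last five
print(f500.describe())   # summary statistics for numeric columns
```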
Lesson 3: Selecting Data with .loc[]
Duration: 20 minutes
Master label-based selection—the pandas way of accessing data using meaningful names instead of numeric positions.
What You Will Learn:
- Selecting single values with .loc[row, column]
- Selecting entire rows by label
- Selecting multiple rows with lists or slices
- Selecting columns by name
- Combining row and column selections
- Understanding inclusive slicing (different from Python!)
Key Concepts:
- .loc[] uses labels (row names and column names)
- Label-based slicing includes both start and end
- Code becomes self-documenting when you use labels
Practical Application: Extracting specific company data, financial metrics for particular companies, or regional subsets.
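For example, a small illustrative DataFrame makes the label-based patterns concrete (the figures are invented):

```python
import pandas as pd

df = pd.DataFrame({"revenue": [274.5, 559.2, 365.8],
                   "sector": ["Tech", "Retail", "Tech"]},
                  index=["Apple", "Walmart", "Amazon"])

print(df.loc["Apple", "revenue"])            # single value by labels
print(df.loc[["Apple", "Amazon"]])           # multiple rows by label
print(df.loc["Apple":"Walmart", "revenue"])  # label slice: both ends included
```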
Lesson 4: Selecting Data with .iloc[]
Duration: 15 minutes
Learn position-based selection for when you need to access data by numeric location rather than labels.
What You Will Learn:
- Selecting by integer position with .iloc[]
- Differences between .loc[] and .iloc[]
- When to use position-based vs label-based selection
- Selecting rows and columns by position
- Using negative indices to count from the end
Key Concepts:
- .iloc[] uses integer positions (like NumPy)
- Position-based slicing excludes the end (like Python)
- Choose .loc[] for clarity, .iloc[] for position-based logic
Practical Application: Getting first n rows, last n rows, every nth row, or specific position-based samples.
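A quick position-based sketch, using the same invented data as the Lesson 3 example:

```python
import pandas as pd

df = pd.DataFrame({"revenue": [274.5, 559.2, 365.8],
                   "sector": ["Tech", "Retail", "Tech"]},
                  index=["Apple", "Walmart", "Amazon"])

print(df.iloc[0])      # first row by position
print(df.iloc[-1])     # last row: negative positions count from the end
print(df.iloc[:2])     # first two rows; the end position is excluded
print(df.iloc[0, 1])   # row 0, column 1 -> "Tech"
```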
Lesson 5: Series Operations and Value Counts
Duration: 15 minutes
Learn to perform operations on Series and analyze categorical data with value counts.
What You Will Learn:
- Arithmetic operations on Series
- Creating calculated Series from existing columns
- Counting unique values with .value_counts()
- Finding top categories or most frequent values
- Calculating percentages and proportions
- Series statistical methods
Key Concepts:
- Series operations vectorize like NumPy arrays
- .value_counts() is essential for categorical analysis
- Series methods return Series, maintaining labels
Practical Application: Calculating profit margins, counting companies by country, finding most common categories.
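A short sketch of both ideas, with invented values:

```python
import pandas as pd

countries = pd.Series(["USA", "China", "USA", "Japan", "USA"])

print(countries.value_counts())                 # counts per unique value
print(countries.value_counts(normalize=True))   # proportions instead of counts

revenue = pd.Series([100.0, 80.0], index=["A", "B"])
profit = pd.Series([10.0, 20.0], index=["A", "B"])
print(profit / revenue * 100)                   # vectorized: profit margin in %
```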
Lesson 6: DateTime Fundamentals
Duration: 20 minutes
Master working with dates and times—essential for time-series analysis and temporal data.
What You Will Learn:
- Converting strings to datetime objects
- Extracting date components (year, month, day, weekday)
- Performing date arithmetic and calculations
- Filtering data by date ranges
- Understanding pandas datetime index
- Working with time zones
Key Concepts:
- Dates are first-class data types in pandas
- DateTime index enables powerful time-based operations
- Date arithmetic works intuitively (add days, find differences)
Practical Application: Analyzing sales trends over time, filtering by date ranges, calculating time periods.
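For instance, a self-contained sketch of the datetime workflow (dates invented):

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2024-01-15", "2024-02-20", "2024-03-05"]))

print(dates.dt.year)                  # extract components
print(dates.dt.day_name())            # weekday names
print(dates.max() - dates.min())      # date arithmetic -> a Timedelta
print(dates[dates >= "2024-02-01"])   # filter by a date range
```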
Section 2: Filtering and Transformation
Lesson 7: Boolean Filtering in Pandas
Duration: 15 minutes
Apply NumPy Boolean indexing concepts to DataFrames for powerful data filtering.
What You Will Learn:
- Creating Boolean masks on DataFrame columns
- Filtering rows based on conditions
- Combining conditions with & (AND), | (OR), ~ (NOT)
- Using .isin() for membership testing
- Filtering and selecting columns simultaneously
- Counting filtered results
Key Concepts:
- Each condition needs parentheses when combining
- Boolean filtering creates new DataFrames
- Filters can reference multiple columns
Practical Application: Finding profitable tech companies, filtering by country and sector, identifying outliers.
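A compact sketch of these filtering patterns (data invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"sector": ["Tech", "Energy", "Tech"],
                   "country": ["USA", "China", "Japan"],
                   "profit": [57.4, -2.1, 4.9]})

profitable_tech = df[(df["sector"] == "Tech") & (df["profit"] > 0)]
asia = df[df["country"].isin(["China", "Japan"])]   # membership test
not_usa = df[~(df["country"] == "USA")]             # negation with ~
print(len(profitable_tech))                         # count filtered results
```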
Lesson 8: Sorting and Ranking
Duration: 10 minutes
Learn to order data for analysis and identify top performers.
What You Will Learn:
- Sorting by single columns with .sort_values()
- Sorting by multiple columns with priority
- Ascending vs descending order
- Understanding the inplace parameter
- Sorting by index with .sort_index()
- Finding largest and smallest values with .nlargest() and .nsmallest()
Key Concepts:
- Sorting creates new DataFrames unless inplace=True
- Multi-column sorting uses lists for both the columns and the ascending flags
- Rankings help identify top and bottom performers
Practical Application: Finding top revenue companies, identifying worst performers, sorting by country then revenue.
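For example (invented data):

```python
import pandas as pd

df = pd.DataFrame({"country": ["USA", "USA", "China"],
                   "revenue": [500, 250, 300]})

by_rev = df.sort_values("revenue", ascending=False)   # single column
by_both = df.sort_values(["country", "revenue"],      # multi-column: the
                         ascending=[True, False])     # lists line up
print(df.nlargest(2, "revenue"))                      # top 2 rows by revenue
```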
Lesson 9: Adding and Modifying Columns
Duration: 15 minutes
Create new calculated columns and modify existing ones to derive insights.
What You Will Learn:
- Adding columns from calculations
- Modifying existing columns in place
- Using np.where() for conditional columns
- Dropping unwanted columns with .drop()
- Renaming columns with .rename()
- Reordering columns
Key Concepts:
- New columns are calculated from existing ones
- Column assignment syntax mirrors dictionary assignment
- Conditional columns enable categorization
Practical Application: Calculating profit margins, creating size categories, flagging conditions, removing unnecessary columns.
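A brief sketch of column creation, renaming, and dropping (values invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"revenue": [500.0, 250.0], "profit": [50.0, -5.0]})

df["margin"] = df["profit"] / df["revenue"]                # new calculated column
df["status"] = np.where(df["profit"] > 0, "gain", "loss")  # conditional column
df = df.rename(columns={"revenue": "revenue_musd"})        # rename a column
df = df.drop(columns=["status"])                           # drop an unwanted column
print(df)
```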
Lesson 10: Apply, Map, and Transform Functions
Duration: 20 minutes
Apply custom functions to Series and DataFrames for complex transformations.
What You Will Learn:
- Using .apply() to apply functions to columns or rows
- Using .map() for Series transformations
- Applying custom functions with arguments
- Understanding .applymap() for element-wise DataFrame operations
- Lambda functions for quick transformations
- Reshaping data with pd.melt() (wide to long format)
Key Concepts:
- .apply() enables complex custom transformations
- Functions can take additional parameters
- Reshaping changes data organization for analysis
Practical Application: Categorizing values with custom logic, transforming data formats, normalizing values.
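A minimal sketch of .apply() with an extra argument and pd.melt(), on invented data:

```python
import pandas as pd

df = pd.DataFrame({"company": ["A", "B"],
                   "revenues": [100, 200],
                   "profits": [10, 40]})

def size_label(value, threshold):
    return "big" if value > threshold else "small"

# .apply() passes extra keyword arguments through to the function
df["size"] = df["revenues"].apply(size_label, threshold=150)

# Wide -> long: one row per (company, metric) observation
long_df = pd.melt(df, id_vars=["company"],
                  value_vars=["revenues", "profits"],
                  var_name="metric", value_name="value")
print(long_df)
```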
Section 3: Data Cleaning
Lesson 11: Handling Missing Data
Duration: 15 minutes
Master techniques for dealing with missing values—a reality in all real datasets.
What You Will Learn:
- Detecting missing values with .isnull() and .notnull()
- Counting nulls per column
- Dropping rows or columns with .dropna()
- Filling missing values with .fillna()
- Forward fill and backward fill methods
- Different strategies for different columns
Key Concepts:
- NaN represents missing numeric data
- Different strategies: drop, fill with mean/median, forward fill
- Consider why data is missing before choosing a strategy
Practical Application: Cleaning survey data, handling missing sales figures, preparing data for analysis.
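For instance, on a tiny frame with deliberate gaps:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [10.0, np.nan, 12.5],
                   "rating": [4.0, 3.5, np.nan]})

print(df.isnull().sum())                                  # nulls per column
cleaned = df.dropna(subset=["price"])                     # drop rows missing price
df["rating"] = df["rating"].fillna(df["rating"].mean())   # fill with the mean
df["price"] = df["price"].ffill()                         # forward fill
```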
Lesson 12: Data Type Conversion and Cleaning
Duration: 20 minutes
Clean messy data by converting types, removing formatting, and extracting information.
What You Will Learn:
- Converting between data types with .astype()
- Cleaning column names (removing spaces, special characters)
- Removing units from numeric strings (“8GB” → 8)
- Handling European number formats
- Extracting numeric values from text
- Fixing encoding issues when loading data
Key Concepts:
- Real data comes with formatting that must be removed
- Type conversion prepares data for analysis
- Column names should be clean and consistent
Practical Application: Cleaning laptop specifications, converting currency strings to numbers, standardizing formats.
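A small sketch of these cleaning steps on invented laptop-style data:

```python
import pandas as pd

df = pd.DataFrame({"Ram ": ["8GB", "16GB"], "Weight": ["1,5kg", "2,0kg"]})

df.columns = df.columns.str.strip().str.lower()            # clean column names
df["ram"] = df["ram"].str.replace("GB", "").astype(int)    # "8GB" -> 8
df["weight"] = (df["weight"].str.replace("kg", "")
                .str.replace(",", ".")                     # European decimal comma
                .astype(float))
print(df.dtypes)
```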
Lesson 13: Removing Duplicates and Handling Outliers
Duration: 15 minutes
Ensure data quality by identifying and handling duplicates and extreme values.
What You Will Learn:
- Detecting duplicates with .duplicated()
- Removing duplicates with .drop_duplicates()
- Specifying columns for duplicate checking
- Identifying outliers using the IQR method
- Deciding when to remove vs cap outliers
- Using .clip() to cap extreme values
Key Concepts:
- Duplicates can bias analysis if not removed
- Outliers may be errors or legitimate extreme values
- Context determines whether to remove or keep outliers
Practical Application: Removing duplicate entries, handling data entry errors, managing extreme values in revenue or prices.
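A sketch of the duplicate and IQR-outlier workflow (values invented):

```python
import pandas as pd

df = pd.DataFrame({"company": ["A", "A", "B", "C", "D", "E"],
                   "revenue": [100, 100, 90, 110, 95, 1_000_000]})

df = df.drop_duplicates()   # the repeated "A" row would bias any average

# IQR rule: values beyond 1.5 * IQR from the quartiles are flagged as outliers
q1, q3 = df["revenue"].quantile([0.25, 0.75])
upper = q3 + 1.5 * (q3 - q1)
print(df[df["revenue"] > upper])                  # inspect before deciding
df["revenue"] = df["revenue"].clip(upper=upper)   # or cap instead of removing
```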
Section 4: String Operations
Lesson 14: Working with String Data
Duration: 20 minutes
Use vectorized string methods to clean and analyze text data efficiently.
What You Will Learn:
- Accessing string methods with the .str accessor
- Case conversion (.str.lower(), .str.upper())
- Removing whitespace with .str.strip()
- Replacing text with .str.replace()
- Checking if strings contain patterns
- Splitting strings and extracting parts
- Basic regular expression patterns
Key Concepts:
- The .str accessor enables vectorized string operations
- Operations apply to entire columns at once
- String methods are much faster than loops
Practical Application: Standardizing country names, extracting information from text, filtering by text patterns.
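For example, normalizing invented country strings:

```python
import pandas as pd

s = pd.Series([" usa", "USA ", "United States"])

s = s.str.strip().str.lower()                # trim whitespace, normalize case
s = s.str.replace("united states", "usa")    # standardize variants
print(s.str.contains("usa"))                 # Boolean Series of matches
print(s.str.split(" ").str[0])               # split and take the first token
```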
Section 5: Aggregation and Combining
Lesson 15: GroupBy and Aggregation
Duration: 20 minutes
Master the most powerful pandas feature—grouping data by categories and calculating statistics.
What You Will Learn:
- Grouping data with .groupby()
- Calculating aggregate statistics per group
- Using .agg() with multiple functions
- Grouping by multiple columns
- Understanding the split-apply-combine pattern
- Applying different functions to different columns
Key Concepts:
- GroupBy splits data into groups, applies functions, combines results
- Essential for category-based analysis
- Enables answering “by group” questions
Practical Application: Average revenue by sector, total employees by country, sales statistics by region and year.
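A minimal split-apply-combine sketch on invented data:

```python
import pandas as pd

df = pd.DataFrame({"sector": ["Tech", "Tech", "Retail"],
                   "country": ["USA", "China", "USA"],
                   "revenue": [250, 180, 500]})

# Split by sector, apply aggregations, combine into one summary frame
per_sector = df.groupby("sector").agg(
    mean_revenue=("revenue", "mean"),
    companies=("revenue", "size"),
)
print(per_sector)

print(df.groupby(["country", "sector"])["revenue"].sum())  # multiple keys
```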
Lesson 16: Pivot Tables and Reshaping
Duration: 15 minutes
Create Excel-style pivot tables and reshape data between wide and long formats.
What You Will Learn:
- Creating pivot tables with .pivot_table()
- Specifying index, columns, and values
- Using multiple aggregation functions
- Reshaping wide data to long with pd.melt()
- Reshaping long data to wide with .pivot()
- Understanding when to use each format
Key Concepts:
- Pivot tables summarize data in cross-tabulated form
- Wide format: one row per entity, columns for variables
- Long format: one row per observation
Practical Application: Summarizing sales by product and month, creating summary tables, preparing data for visualization.
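A short sketch of pivoting and melting (sales figures invented):

```python
import pandas as pd

sales = pd.DataFrame({"product": ["A", "A", "B", "B"],
                      "month": ["Jan", "Feb", "Jan", "Feb"],
                      "units": [10, 12, 7, 9]})

# Excel-style pivot table: products as rows, months as columns
table = sales.pivot_table(index="product", columns="month",
                          values="units", aggfunc="sum")
print(table)

# And back from wide to long with .melt()
print(table.reset_index().melt(id_vars="product", value_name="units"))
```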
Lesson 17: Concatenating DataFrames
Duration: 10 minutes
Combine multiple DataFrames by stacking them vertically or horizontally.
What You Will Learn:
- Vertical concatenation with pd.concat() (axis=0)
- Horizontal concatenation (axis=1)
- Handling mismatched columns or indexes
- Resetting indexes after concatenation
- Adding identifying columns before combining
- Inner vs outer joins when concatenating
Key Concepts:
- Concatenation stacks DataFrames together, like gluing tables edge to edge
- Vertical: combines rows (same columns)
- Horizontal: combines columns (same rows)
Practical Application: Combining multi-year datasets, appending new data, side-by-side comparisons.
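For instance, stacking two invented yearly frames:

```python
import pandas as pd

s2015 = pd.DataFrame({"region": ["East", "West"], "total": [100, 150]})
s2016 = pd.DataFrame({"region": ["East", "West"], "total": [120, 160]})

s2015["year"] = 2015                 # add identifiers before combining
s2016["year"] = 2016

stacked = pd.concat([s2015, s2016], ignore_index=True)   # rows (axis=0)

side_by_side = pd.concat(                                # columns (axis=1)
    [s2015.set_index("region")["total"], s2016.set_index("region")["total"]],
    axis=1, keys=["total_2015", "total_2016"])
print(side_by_side)
```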
Lesson 18: Merging and Joining DataFrames
Duration: 20 minutes
Perform database-style joins to combine datasets based on common keys.
What You Will Learn:
- Merging DataFrames with pd.merge()
- Understanding join types: inner, outer, left, right
- Joining on single or multiple columns
- Handling duplicate column names with suffixes
- Merging on indexes vs columns
- Validating merge results
Key Concepts:
- Inner join: keeps only matching rows
- Outer join: keeps all rows from both
- Left/right join: keeps all from one side
- Choose join type based on analysis needs
Practical Application: Combining happiness data across years, joining sales with product information, enriching datasets.
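A small merge sketch with invented happiness scores:

```python
import pandas as pd

h2015 = pd.DataFrame({"country": ["Norway", "Chad"], "score": [7.5, 3.7]})
h2016 = pd.DataFrame({"country": ["Norway", "Chad", "Togo"], "score": [7.4, 3.8, 3.3]})

both = pd.merge(h2015, h2016, on="country", how="inner",
                suffixes=("_2015", "_2016"))     # only countries present in both

all_rows = pd.merge(h2015, h2016, on="country", how="outer",
                    suffixes=("_2015", "_2016"))  # keep everything; NaN where missing
print(all_rows)
```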
Section 6: Advanced Topics
Lesson 19: MultiIndex and Hierarchical Data
Duration: 15 minutes
Work with hierarchical indexes for multi-level data organization.
What You Will Learn:
- Creating MultiIndex DataFrames
- Selecting data from hierarchical indexes
- Using .xs() for cross-sections
- Stacking and unstacking levels
- Resetting and setting indexes
- GroupBy with MultiIndex results
Key Concepts:
- MultiIndex enables multi-dimensional indexing
- Useful for grouped data and time series
- Provides flexible data organization
Practical Application: Organizing data by country and city, multi-level time series, hierarchical categories.
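A minimal MultiIndex sketch (population figures invented):

```python
import pandas as pd

df = pd.DataFrame({"country": ["USA", "USA", "China"],
                   "city": ["NYC", "LA", "Beijing"],
                   "pop_m": [8.5, 4.0, 21.5]}).set_index(["country", "city"])

print(df.loc["USA"])               # all rows under the outer level
print(df.xs("LA", level="city"))   # cross-section on the inner level
print(df.unstack("city"))          # move a level into the columns
```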
Lesson 20: Window Functions and Rolling Calculations
Duration: 15 minutes
Calculate moving averages and cumulative statistics for time-series analysis.
What You Will Learn:
- Computing rolling means with .rolling()
- Window-based statistics (sum, min, max)
- Expanding windows for cumulative calculations
- Calculating percentage changes
- Shift operations for lagged values
- Working with time-based windows
Key Concepts:
- Rolling windows smooth data and identify trends
- Window size determines smoothing level
- Essential for time-series analysis
Practical Application: Moving averages for stock prices, smoothing noisy data, trend analysis.
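For example, on an invented price series:

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 13.0, 12.0, 15.0, 14.0, 16.0])

print(prices.rolling(window=3).mean())   # 3-period moving average
print(prices.expanding().max())          # cumulative (expanding-window) maximum
print(prices.pct_change())               # period-over-period percentage change
print(prices.shift(1))                   # lagged values for comparisons
```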
Final Project
Lesson 21: Complete Data Analysis Project
Duration: 30 minutes
Apply everything you learned to a comprehensive real-world data analysis project.
Project: Analyze Global Happiness Trends (2015-2017)
Tasks:
Part 1: Loading and Combining
- Load multiple years of happiness data
- Add year identifiers
- Combine into single DataFrame
- Handle encoding issues
Part 2: Data Cleaning
- Standardize column names
- Handle missing values
- Remove duplicates
- Verify data types
Part 3: Analysis Questions
- Which region had the highest happiness each year?
- Did global happiness increase or decrease?
- Which countries improved the most?
- What factors correlate with happiness?
- Count countries by region and year
Part 4: Export Results
- Create summary DataFrames
- Export to CSV for further use
- Save cleaned dataset
Skills Applied: All skills from Lessons 1-20 combined in a realistic workflow.
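As a rough preview, here is a condensed sketch of the project workflow; the file and column names (2015.csv, region, happiness_score) are hypothetical stand-ins for the actual course files:

```python
import pandas as pd

# Hypothetical file names; the actual course datasets may differ
paths = {2015: "2015.csv", 2016: "2016.csv", 2017: "2017.csv"}
frames = []
for year, path in paths.items():
    df = pd.read_csv(path, encoding="latin-1")  # some files need an explicit encoding
    df["year"] = year                           # identify the source year
    frames.append(df)

happiness = pd.concat(frames, ignore_index=True)

# Standardize column names, then basic cleaning
happiness.columns = happiness.columns.str.strip().str.lower().str.replace(" ", "_")
happiness = happiness.drop_duplicates().dropna(subset=["happiness_score"])

# One of the analysis questions: average score per region per year
top = happiness.groupby(["year", "region"])["happiness_score"].mean()

happiness.to_csv("happiness_clean.csv", index=False)  # export the cleaned data
```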
Learning Progression
This module follows a carefully designed progression:
Foundation → Understand pandas data structures
Selection → Access exactly the data you need
Filtering → Extract subsets based on conditions
Transformation → Calculate and modify data
Cleaning → Prepare messy data for analysis
Aggregation → Summarize by groups
Combination → Merge multiple datasets
Advanced → Handle complex scenarios
Application → Complete real projects
Each lesson builds on previous ones, ensuring you master concepts before advancing.
Real Datasets Throughout
You will work with actual datasets used in professional data analysis:
- Fortune 500 Companies: Revenue, profits, employees, sectors
- World Happiness Reports: Multi-year international data
- Laptop Specifications: Messy real-world e-commerce data
- Time Series Data: Sales, stock prices, trends
These datasets contain real challenges: missing values, inconsistent formatting, encoding issues, duplicates, and outliers. You will learn to handle them all.
Get Started
Ready to begin your pandas journey? Start with Lesson 1 to build your foundation.
Start Lesson 1 - Introduction to Pandas and Series
Learn what pandas is and master the Series data structure
Back to Module Home
Return to the Pandas Data Analysis module introduction
Your Path to Data Mastery
This module transforms you from someone who understands arrays to someone who can clean, analyze, and derive insights from real-world datasets. Pandas is the most important Python library for working with tabular data.
The skills you learn here will serve you every day as a data professional.
Complete these 21 lessons, and you will have the data manipulation expertise that employers demand and projects require. Let’s begin!