Data Visualization with Python - Course Overview
Course Overview
This course teaches you to create professional data visualizations using Matplotlib and Pandas plotting. You will learn to transform raw data into clear, informative charts that reveal patterns and communicate insights effectively.
Lessons: 16
Difficulty: Beginner to Intermediate
Prerequisites: Python basics, NumPy, Pandas
Datasets Required:
- day.csv - Capital Bikeshare daily data (56 KB)
- hour.csv - Capital Bikeshare hourly data (1.1 MB)
- i94_traffic.csv - I-94 highway traffic data (3.1 MB)
What You Will Learn
The course is organized into five modules:
Module 1: Line Plots and Time Series
- Graph anatomy and coordinate systems
- Creating line plots with pyplot
- Customizing plot appearance
- Plotting multiple series for comparison
- Time series visualization
Module 2: Scatter Plots and Correlation
- Creating scatter plots
- Customizing markers and colors
- Calculating correlation coefficients
- Adding trendlines and regression
- Creating scatter plot matrices
Module 3: Bar Plots and Histograms
- Creating bar charts for categorical data
- Grouped and stacked bar plots
- Histogram fundamentals
- Comparing distributions
- Choosing appropriate bin sizes
Module 4: Pandas Plotting and Grid Charts
- Using Pandas plotting methods
- Creating subplot grids
- Controlling figure size and layout
- Publication-quality formatting
Module 5: Final Project
- Real-world traffic analysis project
- Applying all visualization techniques
- Professional report creation
Dataset Documentation
This course uses three datasets: two real-world datasets and one small simulated dataset. Complete data dictionaries are provided below.
Dataset 1: Capital Bikeshare
Source: Capital Bikeshare, Washington DC
Time Period: January 1, 2011 - December 31, 2012 (2 years)
Records: 731 daily records
Description: Bike rental data combined with weather and seasonal information
Files:
- day.csv - Daily aggregated data (731 rows)
- hour.csv - Hourly data (17,379 rows)
Data Dictionary:
| Column | Type | Description | Values/Range |
|---|---|---|---|
| instant | int | Record index | 1 to 731 |
| dteday | date | Date | 2011-01-01 to 2012-12-31 |
| season | int | Season | 1=Spring, 2=Summer, 3=Fall, 4=Winter |
| yr | int | Year | 0=2011, 1=2012 |
| mnth | int | Month | 1 to 12 |
| holiday | int | Is holiday? | 0=No, 1=Yes |
| weekday | int | Day of week | 0=Sunday to 6=Saturday |
| workingday | int | Is working day? | 0=No, 1=Yes (not weekend/holiday) |
| weathersit | int | Weather situation | 1=Clear, 2=Mist, 3=Light Snow/Rain, 4=Heavy Rain/Storm |
| temp | float | Normalized temperature | 0.0 to 1.0 (actual: -8°C to +39°C) |
| atemp | float | Normalized feels-like temp | 0.0 to 1.0 (actual: -16°C to +50°C) |
| hum | float | Normalized humidity | 0.0 to 1.0 (actual: 0% to 100%) |
| windspeed | float | Normalized wind speed | 0.0 to 1.0 (actual: 0 to 67 km/h) |
| casual | int | Casual user rentals | Count of rentals |
| registered | int | Registered user rentals | Count of rentals |
| cnt | int | Total rentals | casual + registered |
Normalization Formulas:
- Temperature: temp = (t - t_min) / (t_max - t_min), where t_min = -8°C and t_max = +39°C
- Humidity: hum = h / 100, where h is a percentage
- Wind speed: windspeed = w / 67, where w is in km/h
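To label axes in real units, the normalization formulas above can be inverted. The following is a minimal sketch rather than course code; the function and constant names are hypothetical, and the bounds are the ones stated in the formulas.

```python
# Bounds taken from the normalization formulas above.
T_MIN_C = -8.0        # t_min in degrees Celsius
T_MAX_C = 39.0        # t_max in degrees Celsius
WIND_MAX_KMH = 67.0   # divisor used for windspeed


def denormalize_temp(temp_norm):
    """Invert temp = (t - t_min) / (t_max - t_min) to recover degrees Celsius."""
    return temp_norm * (T_MAX_C - T_MIN_C) + T_MIN_C


def denormalize_humidity(hum_norm):
    """Invert hum = h / 100 to recover a percentage."""
    return hum_norm * 100.0


def denormalize_windspeed(wind_norm):
    """Invert windspeed = w / 67 to recover km/h."""
    return wind_norm * WIND_MAX_KMH


print(denormalize_temp(0.5))  # 15.5, the midpoint of the -8°C to +39°C range
```

With a DataFrame loaded from day.csv, the same arithmetic applies column-wise, for example day['temp'] * 47 - 8.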
Usage in Course:
- Lessons 2-4: Line plots and time series
- Lessons 5-8: Scatter plots and correlation
- Lessons 11-15: Distributions and Pandas plotting
Download:
# Load the dataset
import pandas as pd
day = pd.read_csv('day.csv')
Dataset Source
This dataset is derived from Capital Bikeshare system data. Original data available from the UCI Machine Learning Repository.
Dataset 2: COVID-19 Monthly Data
Source: Simulated data for educational purposes
Time Period: January - July 2020 (7 months)
Records: 7 monthly observations
Description: Monthly new COVID-19 cases and deaths for visualization practice
Data Dictionary:
| Column | Type | Description | Values/Range |
|---|---|---|---|
| month | str | Month name | Jan, Feb, Mar, Apr, May, Jun, Jul |
| new_cases | int | New cases in month | Count (thousands) |
| new_deaths | float | New deaths in month | Count (thousands) |
Sample Data:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul']
new_cases = [2, 5, 15, 30, 25, 20, 18] # in thousands
new_deaths = [0.1, 0.3, 0.8, 2.0, 1.5, 1.0, 0.8] # in thousands
Usage in Course:
- Lessons 1-2: Introduction to graphs and basic line plots
- Simple arrays for learning fundamentals
Note: This is simplified data for teaching visualization concepts. Not actual COVID-19 statistics.
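As a preview of how these arrays are used, here is a minimal sketch of a line plot built from the sample data with pyplot. It is an illustration, not the lesson's own code; the title and label text are assumptions, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt

# Simulated sample data from the section above (not real statistics)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul']
new_cases = [2, 5, 15, 30, 25, 20, 18]  # in thousands

# pyplot keeps track of the current figure, so each call adds to the same plot
(line,) = plt.plot(months, new_cases, marker="o", label="New cases")
plt.title("Simulated new cases per month")
plt.xlabel("Month")
plt.ylabel("New cases (thousands)")
plt.legend()
```

In an interactive session, calling plt.show() after these lines displays the figure.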
Dataset 3: I-94 Highway Traffic Data
Source: I-94 westbound traffic monitoring, Minneapolis-St Paul, Minnesota
Time Period: October 2012 - September 2018 (6 years)
Records: 48,205 hourly measurements
Description: Highway traffic volume combined with weather conditions
File: i94_traffic.csv
Data Dictionary:
| Column | Type | Description | Values/Range |
|---|---|---|---|
| holiday | str | US National holidays + regional | “None”, “New Years Day”, etc. |
| temp | float | Average temperature | Kelvin (243.5 to 310.07) |
| rain_1h | float | Rain in last hour | mm (0 if no rain, max 55.63) |
| snow_1h | float | Snow in last hour | mm (0 if no snow) |
| clouds_all | int | Cloud coverage | 0 to 100 (percentage) |
| weather_main | str | Weather condition | Clear, Clouds, Rain, Snow, Mist, etc. |
| weather_description | str | Detailed weather | light rain, few clouds, etc. |
| date_time | datetime | Date and hour | 2012-10-02 09:00:00 to 2018-09-30 23:00:00 |
| traffic_volume | int | Hourly I-94 westbound traffic | 0 to 7,280 vehicles |
Traffic Volume Patterns:
- Average: ~3,260 vehicles/hour
- Peak: ~7,280 vehicles/hour (rush hours)
- Low: 0-500 vehicles/hour (late night/early morning)
- Typical rush hours: 7-9 AM, 4-6 PM on weekdays
Weather Conditions:
The dataset covers the following weather conditions:
- Clear: Clear sky conditions
- Clouds: Various cloud coverage (few, scattered, broken, overcast)
- Rain: Light rain, moderate rain, heavy rain
- Snow: Light snow, snow
- Mist/Fog: Reduced visibility conditions
- Drizzle: Light precipitation
- Thunderstorm: Severe weather events
Usage in Course:
- Lesson 16: Final project - comprehensive traffic analysis
- Apply all visualization techniques learned
- Analyze weather impact on traffic patterns
- Explore temporal trends and rush hour patterns
Download:
# Load the dataset
import pandas as pd
traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
Dataset Source
This dataset combines I-94 highway traffic volume data with weather observations from the Minneapolis-St Paul area, providing rich data for analyzing how weather affects traffic patterns.
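Once date_time is parsed, rush-hour patterns can be explored by grouping on the hour, and the Kelvin temperatures can be converted to Celsius for readability. The sketch below uses a few made-up rows in the shape of i94_traffic.csv so that it runs without the file; the values and the temp_c and hour column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like i94_traffic.csv; the values are made up
traffic = pd.DataFrame({
    "date_time": ["2016-03-01 08:00:00", "2016-03-01 23:00:00",
                  "2016-03-02 08:00:00", "2016-03-02 23:00:00"],
    "temp": [273.15, 270.15, 275.15, 271.15],   # Kelvin
    "traffic_volume": [5800, 900, 6000, 1100],
})
traffic["date_time"] = pd.to_datetime(traffic["date_time"])

# Convert Kelvin to Celsius
traffic["temp_c"] = traffic["temp"] - 273.15

# Extract the hour of day and average the volume for each hour
traffic["hour"] = traffic["date_time"].dt.hour
hourly_mean = traffic.groupby("hour")["traffic_volume"].mean()
print(hourly_mean)
```

The resulting Series is indexed by hour, so it can be passed directly to a line or bar plot.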
Lesson Structure
Each lesson follows a consistent structure:
- Introduction - Context and motivation
- Learning Objectives - Clear, measurable goals
- Concepts - Theory with visual examples
- Code Examples - Practical implementations
- Practice Exercises - Hands-on problems
- Summary - Key concepts and syntax reference
- Navigation - Links to previous and next lessons
Detailed Lesson Descriptions
Lesson 1: Understanding Graphs and Coordinates
Learn graph fundamentals before coding. Understand coordinate systems, axes, data points, and how to read different types of charts.
Key Topics:
- Graph anatomy (axes, title, labels, legend)
- Cartesian coordinate system
- Reading line plots, scatter plots, bar charts
- When to use each chart type
Learning Outcomes:
- Identify components of a graph
- Understand x and y coordinates
- Choose appropriate chart types for different data
Lesson 2: Introduction to Matplotlib
Create your first plots using pyplot. Learn the visualization workflow and create simple line plots.
Key Topics:
- Installing and importing matplotlib
- Creating basic line plots with plt.plot()
- Displaying plots with plt.show()
- Understanding the state machine interface
Learning Outcomes:
- Import matplotlib.pyplot
- Create and display line plots
- Plot COVID-19 data
Lesson 3: Customizing Plots
Transform basic plots into professional visualizations with titles, labels, colors, and styles.
Key Topics:
- Adding titles and axis labels
- Customizing colors and line styles
- Adding legends
- Gridlines and formatting
Learning Outcomes:
- Add descriptive titles and labels
- Choose effective colors
- Create publication-ready plots
Lesson 4: Multiple Lines and Series
Plot multiple datasets on one graph for comparison and trend analysis.
Key Topics:
- Plotting multiple lines
- Distinguishing series with colors and styles
- Creating effective legends
- Comparing time series data
Learning Outcomes:
- Plot multiple series on one chart
- Compare trends visually
- Use bikeshare data for analysis
Lesson 5: Scatter Plots Basics
Create scatter plots to visualize relationships between two continuous variables.
Key Topics:
- Creating scatter plots with plt.scatter()
- Understanding correlation from scatter plots
- When to use scatter vs line plots
- Basic marker customization
Learning Outcomes:
- Create scatter plots
- Identify correlations visually
- Explore temperature vs rentals relationship
Lesson 6: Customizing Scatter Plots
Control marker appearance, size, color, and transparency for advanced scatter plots.
Key Topics:
- Marker sizes based on data values
- Color mapping with colormaps
- Adding colorbars
- Transparency with alpha
Learning Outcomes:
- Create multi-dimensional scatter plots
- Use color to encode additional variables
- Add colorbars for scale reference
Lesson 7: Correlation and Trendlines
Calculate correlation coefficients and add regression lines to visualize trends.
Key Topics:
- Calculating correlation with NumPy
- Understanding correlation coefficients
- Adding trendlines with numpy.polyfit()
- Interpreting linear relationships
Learning Outcomes:
- Calculate correlation values
- Add trendlines to scatter plots
- Interpret strength and direction of relationships
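A minimal sketch of the two NumPy calls this lesson centers on, assuming a degree-1 (straight-line) fit. The arrays are made-up stand-ins for normalized temperature and daily rentals, not values from day.csv.

```python
import numpy as np

# Made-up values standing in for normalized temperature and daily rentals
temp = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
rentals = np.array([1500, 2300, 3100, 3800, 4700, 5400])

# Pearson correlation coefficient: off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(temp, rentals)[0, 1]

# Least-squares straight line: polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(temp, rentals, deg=1)
trend = slope * temp + intercept  # y-values of the trendline at each x

print(f"r = {r:.3f}, slope = {slope:.1f}, intercept = {intercept:.1f}")
```

To draw the result, the points would go to plt.scatter(temp, rentals) and the trendline to plt.plot(temp, trend).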
Lesson 8: Scatter Matrix
Create pairwise scatter plots to explore relationships among multiple variables simultaneously.
Key Topics:
- Creating scatter matrices with pandas
- Using the scatter_matrix() function
- Interpreting pairwise relationships
- Identifying multivariate patterns
Learning Outcomes:
- Create scatter plot matrices
- Explore multiple variable relationships
- Identify interesting correlations
Lesson 9: Bar Plots
Create vertical and horizontal bar charts for categorical data comparison.
Key Topics:
- Creating bar plots with plt.bar() and plt.barh()
- Customizing bar colors and width
- Adding value labels
- Sorting categories effectively
Learning Outcomes:
- Create vertical and horizontal bar charts
- Compare categorical data visually
- Customize bar appearance
Lesson 10: Grouped and Stacked Bars
Compare multiple categories with grouped and stacked bar charts.
Key Topics:
- Creating grouped bar plots
- Creating stacked bar plots
- Positioning bars correctly
- Choosing between grouped and stacked
Learning Outcomes:
- Create grouped bar charts
- Create stacked bar charts
- Compare multiple categories simultaneously
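The difference between the two layouts comes down to where each bar is placed: grouped bars are shifted sideways around the tick, while stacked bars share an x position and the second series starts where the first ends. This is a rough sketch with made-up counts; the variable names and the bar width of 0.4 are assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt

seasons = ["Spring", "Summer", "Fall", "Winter"]
casual = np.array([300, 900, 1000, 500])          # made-up counts
registered = np.array([1800, 3200, 3500, 2600])   # made-up counts

x = np.arange(len(seasons))  # one tick position per category
width = 0.4

fig, (ax_grouped, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4))

# Grouped: shift each series half a bar width to either side of the tick
ax_grouped.bar(x - width / 2, casual, width, label="casual")
ax_grouped.bar(x + width / 2, registered, width, label="registered")

# Stacked: same x position, second series starts on top of the first
ax_stacked.bar(x, casual, width, label="casual")
ax_stacked.bar(x, registered, width, bottom=casual, label="registered")

for ax, title in ((ax_grouped, "Grouped"), (ax_stacked, "Stacked")):
    ax.set_xticks(x)
    ax.set_xticklabels(seasons)
    ax.set_title(title)
    ax.legend()
```

Grouped bars make it easier to compare the two series against each other; stacked bars emphasize the total per category.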
Lesson 11: Histograms
Visualize data distributions with histograms and understand binning concepts.
Key Topics:
- Creating histograms with plt.hist()
- Understanding bins and bin width
- Choosing appropriate bin counts
- Interpreting distribution shapes
Learning Outcomes:
- Create histograms
- Choose effective bin sizes
- Identify distribution characteristics
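To see how the bin count changes a histogram, the sketch below bins the same synthetic values three ways with np.histogram, which performs the same binning that plt.hist() draws. The data is randomly generated as a stand-in for daily rental counts, and the bin counts 5, 20, and 50 are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic stand-in for 731 daily rental counts (not values from day.csv)
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=4500, scale=1900, size=731)

for bins in (5, 20, 50):
    # counts has one entry per bin; edges has bins + 1 entries
    counts, edges = np.histogram(values, bins=bins)
    bin_width = edges[1] - edges[0]
    print(f"{bins:>2} bins: width about {bin_width:.0f}, "
          f"tallest bin holds {counts.max()} values")
```

Too few bins hide the shape of the distribution, while too many leave each bin with only a handful of values; plt.hist(values, bins=20) accepts the same bins argument.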
Lesson 12: Distribution Comparison
Compare multiple distributions using overlapping and side-by-side histograms.
Key Topics:
- Overlaying multiple histograms
- Using transparency for overlap
- Creating side-by-side histograms
- Comparing distribution shapes
Learning Outcomes:
- Compare multiple distributions
- Choose appropriate comparison methods
- Analyze distribution differences
Lesson 13: Pandas Plotting
Create visualizations directly from DataFrames using Pandas plotting methods.
Key Topics:
- Using the .plot() method
- Different plot types: line, bar, scatter, hist
- Plotting from grouped data
- Quick exploratory visualizations
Learning Outcomes:
- Use Pandas plotting methods
- Create quick visualizations
- Plot directly from DataFrames
Lesson 14: Subplot Grids
Create multi-panel figures with plt.subplots() for comprehensive analysis.
Key Topics:
- Creating subplot grids
- Accessing individual subplots
- Sharing axes across subplots
- Creating dashboard-style figures
Learning Outcomes:
- Create multi-panel figures
- Organize multiple plots effectively
- Build comprehensive dashboards
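A minimal sketch of a 2x2 grid, assuming small made-up lists as data. plt.subplots() returns the figure and a 2D array of axes, and each panel is reached by its row and column index.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

# sharex=True makes all panels use the same x-axis limits
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, figsize=(8, 6))

axes[0, 0].plot(x, [1, 4, 9, 16])
axes[0, 0].set_title("Line")

axes[0, 1].scatter(x, [2, 1, 4, 3])
axes[0, 1].set_title("Scatter")

axes[1, 0].bar(x, [3, 5, 2, 6])
axes[1, 0].set_title("Bar")

axes[1, 1].hist([1, 1, 2, 3, 3, 3, 4], bins=4)
axes[1, 1].set_title("Histogram")

fig.tight_layout()  # reduce overlap between titles and tick labels
```

With a single row or column, axes is a 1D array instead, so panels are indexed with one number.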
Lesson 15: Figure Sizing and Layout
Control figure dimensions, aspect ratios, and spacing for publication-quality output.
Key Topics:
- Setting figure size with figsize
- Controlling aspect ratios
- Adjusting subplot spacing
- Saving figures with plt.savefig()
Learning Outcomes:
- Control figure dimensions
- Optimize layout for different outputs
- Save publication-quality figures
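The sketch below shows one way figsize and saving fit together. The size, dpi value, and output file name are assumptions chosen for illustration, and the file is written to the system temporary directory so the example leaves the working directory untouched. fig.savefig() is the figure-level counterpart of plt.savefig().

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; 8 x 4.5 gives a 16:9 aspect ratio
fig, ax = plt.subplots(figsize=(8, 4.5))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title("Figure sizing sketch")
fig.tight_layout()

width_in, height_in = fig.get_size_inches()

# dpi sets pixels per inch; bbox_inches="tight" trims surrounding whitespace
out_path = os.path.join(tempfile.gettempdir(), "figure_sizing_sketch.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
print(f"Saved {width_in:.1f} x {height_in:.1f} inch figure to {out_path}")
```

Raising dpi increases the pixel dimensions of the saved image without changing the layout, which is set by figsize.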
Lesson 16: Final Project - Traffic Analysis
Apply all visualization techniques to analyze I-94 traffic patterns in a comprehensive project.
Key Topics:
- Loading and exploring traffic data
- Time series analysis of congestion
- Weather impact on traffic
- Creating comprehensive reports
- Combining multiple visualization types
Learning Outcomes:
- Complete end-to-end analysis project
- Apply all learned techniques
- Create professional analysis report
- Draw insights from real-world data
Project Deliverables:
- Traffic pattern analysis over time
- Weather vs congestion correlation
- Hourly and daily trend visualization
- Multi-panel comprehensive dashboard
Learning Approach
Best Practices for This Course
- Follow lesson order - Each lesson builds on previous concepts
- Type code yourself - Don’t copy-paste; build muscle memory
- Complete practice exercises - Essential for retention
- Experiment with data - Try different parameters and datasets
- Create your own visualizations - Apply to your own data
Time Commitment
- Minimum: Complete all 16 lessons (4 hours)
- Recommended: Add 2-4 hours for practice exercises
- Mastery: Additional 4-6 hours experimenting with own data
Support Resources
- Matplotlib documentation: https://matplotlib.org/
- Pandas plotting documentation: https://pandas.pydata.org/docs/user_guide/visualization.html
- Course datasets provided in each lesson
Prerequisites Review
Required Knowledge
Python Basics:
- Variables and data types
- Lists and dictionaries
- Functions and control flow
- Importing libraries
NumPy Fundamentals:
- Creating arrays
- Array operations
- Basic statistical functions
Pandas Fundamentals:
- Creating and loading DataFrames
- Selecting data with .loc[] and .iloc[]
- Basic data manipulation
Installation Requirements
# Install required libraries
pip install matplotlib pandas numpy
Verify installation:
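import matplotlib  # needed for matplotlib.__version__ below
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Print the installed versions to confirm that each library imports correctly
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")
Next Steps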
Ready to start visualizing data? Begin with Lesson 1 to understand graph fundamentals, then progress through the lessons systematically.
- Start with Lesson 1: Understanding Graphs and Coordinates
- Review Pandas Module: Refresh data manipulation skills
Transform Data into Visual Stories
Data visualization is a critical skill for data scientists, analysts, and researchers. Master these techniques to communicate insights effectively and make data-driven decisions.
Start creating compelling visualizations today!