Data Visualization with Python - Course Overview

Course Overview

This course teaches you to create professional data visualizations using Matplotlib and Pandas plotting. You will learn to transform raw data into clear, informative charts that reveal patterns and communicate insights effectively.

  • Lessons: 16
  • Difficulty: Beginner to Intermediate
  • Prerequisites: Python basics, NumPy, Pandas

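Datasets Required:

  • Capital Bikeshare (day.csv, hour.csv)
  • COVID-19 Monthly Data (simulated, defined inline as Python lists)
  • I-94 Highway Traffic Data (i94_traffic.csv)
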
What You Will Learn

The course is organized into five modules:

Module 1: Line Plots and Time Series

  • Graph anatomy and coordinate systems
  • Creating line plots with pyplot
  • Customizing plot appearance
  • Plotting multiple series for comparison
  • Time series visualization

Module 2: Scatter Plots and Correlation

  • Creating scatter plots
  • Customizing markers and colors
  • Calculating correlation coefficients
  • Adding trendlines and regression
  • Creating scatter plot matrices

Module 3: Bar Plots and Histograms

  • Creating bar charts for categorical data
  • Grouped and stacked bar plots
  • Histogram fundamentals
  • Comparing distributions
  • Choosing appropriate bin sizes

Module 4: Pandas Plotting and Grid Charts

  • Using Pandas plotting methods
  • Creating subplot grids
  • Controlling figure size and layout
  • Publication-quality formatting

Module 5: Final Project

  • Real-world traffic analysis project
  • Applying all visualization techniques
  • Professional report creation

Dataset Documentation

This course uses three datasets: two real-world datasets and one simulated dataset. Complete data dictionaries are provided below.

Dataset 1: Capital Bikeshare

  • Source: Capital Bikeshare, Washington DC
  • Time Period: January 1, 2011 - December 31, 2012 (2 years)
  • Records: 731 daily records
  • Description: Bike rental data combined with weather and seasonal information

Files:

  • day.csv - Daily aggregated data (731 rows)
  • hour.csv - Hourly data (17,379 rows)

Data Dictionary:

| Column     | Type  | Description                | Values/Range                                            |
|------------|-------|----------------------------|---------------------------------------------------------|
| instant    | int   | Record index               | 1 to 731                                                |
| dteday     | date  | Date                       | 2011-01-01 to 2012-12-31                                |
| season     | int   | Season                     | 1=Spring, 2=Summer, 3=Fall, 4=Winter                    |
| yr         | int   | Year                       | 0=2011, 1=2012                                          |
| mnth       | int   | Month                      | 1 to 12                                                 |
| holiday    | int   | Is holiday?                | 0=No, 1=Yes                                             |
| weekday    | int   | Day of week                | 0=Sunday to 6=Saturday                                  |
| workingday | int   | Is working day?            | 0=No, 1=Yes (not weekend/holiday)                       |
| weathersit | int   | Weather situation          | 1=Clear, 2=Mist, 3=Light Snow/Rain, 4=Heavy Rain/Storm  |
| temp       | float | Normalized temperature     | 0.0 to 1.0 (actual: -8°C to +39°C)                      |
| atemp      | float | Normalized feels-like temp | 0.0 to 1.0 (actual: -16°C to +50°C)                     |
| hum        | float | Normalized humidity        | 0.0 to 1.0 (actual: 0% to 100%)                         |
| windspeed  | float | Normalized wind speed      | 0.0 to 1.0 (actual: 0 to 67 km/h)                       |
| casual     | int   | Casual user rentals        | Count of rentals                                        |
| registered | int   | Registered user rentals    | Count of rentals                                        |
| cnt        | int   | Total rentals              | casual + registered                                     |

Normalization Formulas:

  • Temperature: temp = (t - t_min) / (t_max - t_min) where t_min = -8°C, t_max = +39°C
  • Humidity: hum = h / 100 where h is percentage
  • Wind speed: windspeed = w / 67 where w is in km/h
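
To read the normalized columns in physical units, the formulas above can be inverted. The following is a minimal sketch, assuming the constants from the data dictionary; the helper name `denormalize` and the output columns `temp_c`, `hum_pct`, and `windspeed_kmh` are hypothetical, and the three sample rows are made up for illustration rather than taken from day.csv.

```python
import pandas as pd

# Constants taken from the normalization formulas above
T_MIN, T_MAX = -8.0, 39.0   # temperature range in °C
WIND_MAX = 67.0             # wind speed divisor in km/h


def denormalize(day: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with extra columns in physical units (hypothetical names)."""
    out = day.copy()
    out["temp_c"] = out["temp"] * (T_MAX - T_MIN) + T_MIN   # invert (t - t_min) / (t_max - t_min)
    out["hum_pct"] = out["hum"] * 100                       # invert h / 100
    out["windspeed_kmh"] = out["windspeed"] * WIND_MAX      # invert w / 67
    return out


# Made-up rows for illustration only (not values from day.csv)
sample = pd.DataFrame({
    "temp": [0.0, 0.5, 1.0],
    "hum": [0.25, 0.5, 0.75],
    "windspeed": [0.0, 0.5, 1.0],
})
actual = denormalize(sample)
print(actual[["temp_c", "hum_pct", "windspeed_kmh"]])
```

A normalized temperature of 0.5 maps to 15.5°C here, the midpoint of the -8°C to +39°C range.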

Usage in Course:

  • Lessons 2-4: Line plots and time series
  • Lessons 6-8: Scatter plots and correlation
  • Lessons 11-15: Distributions and Pandas plotting

Loading the Data:

# Load the daily dataset and parse the date column as datetime
import pandas as pd
day = pd.read_csv('day.csv', parse_dates=['dteday'])

Dataset Source

This dataset is derived from Capital Bikeshare system data. Original data available from the UCI Machine Learning Repository.


Dataset 2: COVID-19 Monthly Data

  • Source: Simulated data for educational purposes
  • Time Period: January - July 2020 (7 months)
  • Records: 7 monthly observations
  • Description: Monthly new COVID-19 cases and deaths for visualization practice

Data Dictionary:

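| Column     | Type  | Description         | Values/Range                      |
|------------|-------|---------------------|-----------------------------------|
| month      | str   | Month name          | Jan, Feb, Mar, Apr, May, Jun, Jul |
| new_cases  | int   | New cases in month  | Count (thousands)                 |
| new_deaths | float | New deaths in month | Count (thousands)                 |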

Sample Data:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul']
new_cases = [2, 5, 15, 30, 25, 20, 18]  # in thousands
new_deaths = [0.1, 0.3, 0.8, 2.0, 1.5, 1.0, 0.8]  # in thousands
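
As a rough preview of how these arrays are used in the first lessons, the sketch below plots both series with pyplot. The marker styles, labels, and title are illustrative choices, not taken from the lesson material.

```python
import matplotlib.pyplot as plt

# Simulated sample data from this section (values in thousands)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul']
new_cases = [2, 5, 15, 30, 25, 20, 18]
new_deaths = [0.1, 0.3, 0.8, 2.0, 1.5, 1.0, 0.8]

# Each plt.plot() call adds one line to the current axes
plt.plot(months, new_cases, marker='o', label='New cases')
plt.plot(months, new_deaths, marker='s', label='New deaths')

plt.title('Simulated monthly COVID-19 data, Jan-Jul 2020')
plt.xlabel('Month')
plt.ylabel('Count (thousands)')
plt.legend()

ax = plt.gca()  # handle to the current axes, useful for inspection
```

Call plt.show() afterwards to display the figure in an interactive session.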

Usage in Course:

  • Lessons 1-2: Introduction to graphs and basic line plots
  • Simple arrays for learning fundamentals

Note: This is simplified data for teaching visualization concepts. Not actual COVID-19 statistics.


Dataset 3: I-94 Highway Traffic Data

  • Source: I-94 westbound traffic monitoring, Minneapolis-St Paul, Minnesota
  • Time Period: October 2012 - September 2018 (6 years)
  • Records: 48,205 hourly measurements
  • Description: Highway traffic volume combined with weather conditions

File: i94_traffic.csv

Data Dictionary:

| Column              | Type     | Description                                | Values/Range                               |
|---------------------|----------|--------------------------------------------|--------------------------------------------|
| holiday             | str      | US national holidays plus regional holidays | “None”, “New Years Day”, etc.              |
| temp                | float    | Average temperature                        | Kelvin (243.5 to 310.07)                   |
| rain_1h             | float    | Rain in last hour                          | mm (0 if no rain, max 55.63)               |
| snow_1h             | float    | Snow in last hour                          | mm (0 if no snow)                          |
| clouds_all          | int      | Cloud coverage                             | 0 to 100 (percentage)                      |
| weather_main        | str      | Weather condition                          | Clear, Clouds, Rain, Snow, Mist, etc.      |
| weather_description | str      | Detailed weather                           | light rain, few clouds, etc.               |
| date_time           | datetime | Date and hour                              | 2012-10-02 09:00:00 to 2018-09-30 23:00:00 |
| traffic_volume      | int      | Hourly I-94 westbound traffic volume       | 0 to 7,280 vehicles                        |

Traffic Volume Patterns:

  • Average: ~3,260 vehicles/hour
  • Peak: ~7,280 vehicles/hour (rush hours)
  • Low: 0-500 vehicles/hour (late night/early morning)
  • Typical rush hours: 7-9 AM, 4-6 PM on weekdays
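
Patterns like these can be checked by averaging traffic_volume per hour of day, split by weekday and weekend. The sketch below runs on a small synthetic frame so that it is self-contained: the two-week date range and the volumes (2000 off-peak, 5000 in weekday rush hours) are made up, and the helper columns `hour` and `is_weekday` are hypothetical names. With the real file, the same groupby would apply to the DataFrame loaded from i94_traffic.csv.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for i94_traffic.csv: two weeks of hourly rows.
# The volumes (2000 off-peak, 5000 in weekday rush hours) are made up.
date_time = pd.date_range("2018-01-01", periods=24 * 14, freq=pd.Timedelta(hours=1))
is_rush = np.isin(date_time.hour, [7, 8, 16, 17])
is_weekday = date_time.dayofweek < 5
traffic = pd.DataFrame({
    "date_time": date_time,
    "traffic_volume": np.where(is_rush & is_weekday, 5000, 2000),
})

# Derive grouping keys from the timestamp (hypothetical column names)
traffic["hour"] = traffic["date_time"].dt.hour
traffic["is_weekday"] = traffic["date_time"].dt.dayofweek < 5

# Mean volume per hour of day, one column for weekends and one for weekdays
hourly = (
    traffic.groupby(["hour", "is_weekday"])["traffic_volume"]
    .mean()
    .unstack("is_weekday")
    .rename(columns={False: "weekend", True: "weekday"})
)
print(hourly.loc[[3, 8, 12, 17]])
```

Calling hourly.plot() on the result would draw one line per column, which is one way to make the rush-hour peaks visible.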

Weather Conditions: The dataset records the following conditions:

  • Clear: Clear sky conditions
  • Clouds: Various cloud coverage (few, scattered, broken, overcast)
  • Rain: Light rain, moderate rain, heavy rain
  • Snow: Light snow, snow
  • Mist/Fog: Reduced visibility conditions
  • Drizzle: Light precipitation
  • Thunderstorm: Severe weather events

Usage in Course:

  • Lesson 16: Final project - comprehensive traffic analysis
  • Apply all visualization techniques learned
  • Analyze weather impact on traffic patterns
  • Explore temporal trends and rush hour patterns

Loading the Data:

# Load the dataset
import pandas as pd
traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])

Dataset Source

This dataset combines I-94 highway traffic volume data with weather observations from the Minneapolis-St Paul area, providing rich data for analyzing how weather affects traffic patterns.


Lesson Structure

Each lesson follows a consistent structure:

  1. Introduction - Context and motivation
  2. Learning Objectives - Clear, measurable goals
  3. Concepts - Theory with visual examples
  4. Code Examples - Practical implementations
  5. Practice Exercises - Hands-on problems
  6. Summary - Key concepts and syntax reference
  7. Navigation - Links to previous and next lessons

Detailed Lesson Descriptions

Lesson 1: Understanding Graphs and Coordinates

Learn graph fundamentals before coding. Understand coordinate systems, axes, data points, and how to read different types of charts.

Key Topics:

  • Graph anatomy (axes, title, labels, legend)
  • Cartesian coordinate system
  • Reading line plots, scatter plots, bar charts
  • When to use each chart type

Learning Outcomes:

  • Identify components of a graph
  • Understand x and y coordinates
  • Choose appropriate chart types for different data

Lesson 2: Introduction to Matplotlib

Create your first plots using pyplot. Learn the visualization workflow and create simple line plots.

Key Topics:

  • Installing and importing matplotlib
  • Creating basic line plots with plt.plot()
  • Displaying plots with plt.show()
  • Understanding the state machine interface

Learning Outcomes:

  • Import matplotlib.pyplot
  • Create and display line plots
  • Plot COVID-19 data

Lesson 3: Customizing Plots

Transform basic plots into professional visualizations with titles, labels, colors, and styles.

Key Topics:

  • Adding titles and axis labels
  • Customizing colors and line styles
  • Adding legends
  • Gridlines and formatting

Learning Outcomes:

  • Add descriptive titles and labels
  • Choose effective colors
  • Create publication-ready plots

Lesson 4: Multiple Lines and Series

Plot multiple datasets on one graph for comparison and trend analysis.

Key Topics:

  • Plotting multiple lines
  • Distinguishing series with colors and styles
  • Creating effective legends
  • Comparing time series data

Learning Outcomes:

  • Plot multiple series on one chart
  • Compare trends visually
  • Use bikeshare data for analysis

Lesson 5: Scatter Plots Basics

Create scatter plots to visualize relationships between two continuous variables.

Key Topics:

  • Creating scatter plots with plt.scatter()
  • Understanding correlation from scatter plots
  • When to use scatter vs line plots
  • Basic marker customization

Learning Outcomes:

  • Create scatter plots
  • Identify correlations visually
  • Explore temperature vs rentals relationship

Lesson 6: Customizing Scatter Plots

Control marker appearance, size, color, and transparency for advanced scatter plots.

Key Topics:

  • Marker sizes based on data values
  • Color mapping with colormaps
  • Adding colorbars
  • Transparency with alpha

Learning Outcomes:

  • Create multi-dimensional scatter plots
  • Use color to encode additional variables
  • Add colorbars for scale reference
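
The key topics of this lesson can be combined in a single ax.scatter() call. The following is a hedged sketch on synthetic data: the arrays `temp`, `hum`, and `cnt` only borrow the bikeshare column names, and the size scaling and colormap are arbitrary illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for three bikeshare columns (not real data)
rng = np.random.default_rng(0)
temp = rng.uniform(0, 1, 100)
hum = rng.uniform(0, 1, 100)
cnt = 1000 + 6000 * temp + rng.normal(0, 500, 100)

fig, ax = plt.subplots()
points = ax.scatter(
    temp, cnt,
    s=20 + 80 * hum,   # marker area grows with humidity
    c=hum,             # color encodes the same variable
    cmap="viridis",
    alpha=0.6,         # transparency reveals overlapping points
)
# The colorbar maps colors back to humidity values
fig.colorbar(points, ax=ax, label="Normalized humidity")
ax.set_xlabel("Normalized temperature")
ax.set_ylabel("Total rentals")
```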

Lesson 7: Correlation and Trendlines

Calculate correlation coefficients and add regression lines to visualize trends.

Key Topics:

  • Calculating correlation with NumPy
  • Understanding correlation coefficients
  • Adding trendlines with numpy.polyfit()
  • Interpreting linear relationships

Learning Outcomes:

  • Calculate correlation values
  • Add trendlines to scatter plots
  • Interpret strength and direction of relationships
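
A minimal sketch of the two NumPy calls named above, np.corrcoef() and np.polyfit(). The five (x, y) pairs are made up for illustration and are not taken from the bikeshare data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up (normalized temperature, rentals) pairs for illustration
x = np.array([0.2, 0.3, 0.5, 0.6, 0.8])
y = np.array([1500, 2200, 3900, 4100, 5800])

# Pearson correlation coefficient: off-diagonal entry of the 2x2 matrix
r = np.corrcoef(x, y)[0, 1]

# A degree-1 polynomial fit returns the coefficients [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
trend = slope * x + intercept

fig, ax = plt.subplots()
ax.scatter(x, y, label="Observations")
ax.plot(x, trend, color="red", label=f"Trend (r = {r:.2f})")
ax.legend()
```

A value of r close to +1 indicates a strong positive linear relationship, close to -1 a strong negative one, and close to 0 little linear relationship.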

Lesson 8: Scatter Matrix

Create pairwise scatter plots to explore relationships among multiple variables simultaneously.

Key Topics:

  • Creating scatter matrices with pandas
  • Using scatter_matrix() function
  • Interpreting pairwise relationships
  • Identifying multivariate patterns

Learning Outcomes:

  • Create scatter plot matrices
  • Explore multiple variable relationships
  • Identify interesting correlations
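
As an illustration, pandas.plotting.scatter_matrix() draws every pairwise scatter plot for the columns of a DataFrame and returns the grid of axes. The sketch below uses synthetic columns that only borrow the bikeshare names `temp`, `hum`, and `cnt`.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data: cnt depends on temp, hum is independent noise
rng = np.random.default_rng(1)
temp = rng.uniform(0, 1, 200)
df = pd.DataFrame({
    "temp": temp,
    "hum": rng.uniform(0, 1, 200),
    "cnt": 1000 + 6000 * temp + rng.normal(0, 500, 200),
})

# One row and one column of panels per variable; histograms on the diagonal
axes = scatter_matrix(df, figsize=(7, 7), diagonal="hist", alpha=0.5)
print(axes.shape)
```

In this synthetic example the temp/cnt panels show a clear upward trend while the hum panels look like unstructured clouds.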

Lesson 9: Bar Plots

Create vertical and horizontal bar charts for categorical data comparison.

Key Topics:

  • Creating bar plots with plt.bar() and plt.barh()
  • Customizing bar colors and width
  • Adding value labels
  • Sorting categories effectively

Learning Outcomes:

  • Create vertical and horizontal bar charts
  • Compare categorical data visually
  • Customize bar appearance

Lesson 10: Grouped and Stacked Bars

Compare multiple categories with grouped and stacked bar charts.

Key Topics:

  • Creating grouped bar plots
  • Creating stacked bar plots
  • Positioning bars correctly
  • Choosing between grouped and stacked

Learning Outcomes:

  • Create grouped bar charts
  • Create stacked bar charts
  • Compare multiple categories simultaneously
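
The difference between the two layouts comes down to bar positioning: grouped bars are offset along x, stacked bars are offset along y with the bottom parameter. The sketch below shows both side by side; the rental numbers are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

seasons = ["Spring", "Summer", "Fall", "Winter"]
casual = np.array([40, 90, 70, 30])          # made-up values
registered = np.array([160, 210, 230, 150])  # made-up values

x = np.arange(len(seasons))  # one integer position per category
width = 0.4

fig, (ax_grouped, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4))

# Grouped: shift each series half a bar width left or right of the tick
ax_grouped.bar(x - width / 2, casual, width, label="Casual")
ax_grouped.bar(x + width / 2, registered, width, label="Registered")
ax_grouped.set_title("Grouped")

# Stacked: same x positions, second series starts where the first ends
ax_stacked.bar(x, casual, label="Casual")
ax_stacked.bar(x, registered, bottom=casual, label="Registered")
ax_stacked.set_title("Stacked")

for ax in (ax_grouped, ax_stacked):
    ax.set_xticks(x)
    ax.set_xticklabels(seasons)
    ax.legend()
```

Grouped bars make it easier to compare the individual series; stacked bars emphasize the total per category.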

Lesson 11: Histograms

Visualize data distributions with histograms and understand binning concepts.

Key Topics:

  • Creating histograms with plt.hist()
  • Understanding bins and bin width
  • Choosing appropriate bin counts
  • Interpreting distribution shapes

Learning Outcomes:

  • Create histograms
  • Choose effective bin sizes
  • Identify distribution characteristics
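
To illustrate how the bin count changes what a histogram shows, the sketch below draws the same synthetic sample with 5, 20, and 100 bins. The sample is randomly generated, not course data, and the use of np.histogram_bin_edges() with the "auto" rule is one possible starting point rather than the method the lesson prescribes.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic sample of 731 values (not course data)
rng = np.random.default_rng(42)
values = rng.normal(4500, 1900, 731)

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
results = {}
for ax, bins in zip(axes, [5, 20, 100]):
    # plt.hist / ax.hist returns the bin counts and the bin edges
    counts, edges, _ = ax.hist(values, bins=bins, edgecolor="white")
    ax.set_title(f"bins={bins}")
    results[bins] = (counts, edges)

# NumPy can suggest bin edges from the data itself
auto_edges = np.histogram_bin_edges(values, bins="auto")
print(f"'auto' rule suggests {len(auto_edges) - 1} bins")
```

Too few bins hide the shape of the distribution, while too many turn it into noise.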

Lesson 12: Distribution Comparison

Compare multiple distributions using overlapping and side-by-side histograms.

Key Topics:

  • Overlaying multiple histograms
  • Using transparency for overlap
  • Creating side-by-side histograms
  • Comparing distribution shapes

Learning Outcomes:

  • Compare multiple distributions
  • Choose appropriate comparison methods
  • Analyze distribution differences

Lesson 13: Pandas Plotting

Create visualizations directly from DataFrames using Pandas plotting methods.

Key Topics:

  • Using .plot() method
  • Different plot types: line, bar, scatter, hist
  • Plotting from grouped data
  • Quick exploratory visualizations

Learning Outcomes:

  • Use Pandas plotting methods
  • Create quick visualizations
  • Plot directly from DataFrames
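
A short sketch of plotting from grouped data with the .plot() method. The six rows are made up; only the column names `mnth` and `cnt` follow the bikeshare data dictionary.

```python
import pandas as pd

# Made-up daily rows using the bikeshare column names mnth and cnt
day = pd.DataFrame({
    "mnth": [1, 1, 2, 2, 3, 3],
    "cnt": [900, 1100, 1500, 1700, 2400, 2600],
})

# Aggregate first, then plot the resulting Series directly
monthly_mean = day.groupby("mnth")["cnt"].mean()
ax = monthly_mean.plot(kind="bar", title="Mean daily rentals by month (made-up data)")
ax.set_xlabel("Month")
ax.set_ylabel("Mean rentals")
```

The .plot() method returns a Matplotlib Axes object, so the customization calls from earlier lessons still apply.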

Lesson 14: Subplot Grids

Create multi-panel figures with plt.subplots() for comprehensive analysis.

Key Topics:

  • Creating subplot grids
  • Accessing individual subplots
  • Sharing axes across subplots
  • Creating dashboard-style figures

Learning Outcomes:

  • Create multi-panel figures
  • Organize multiple plots effectively
  • Build comprehensive dashboards
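
As a rough sketch of the workflow, plt.subplots() returns a figure plus an array of axes that can be indexed by row and column. The data below is arbitrary demo data and the panel contents are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Arbitrary demo data
rng = np.random.default_rng(7)
x = np.linspace(0, 2 * np.pi, 50)
samples = rng.uniform(0, 2 * np.pi, 200)

# 2x2 grid; sharex=True keeps the x-axis range identical in every panel
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6), sharex=True)

# axes is a 2-D NumPy array, indexed as axes[row, column]
axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title("Line")
axes[0, 1].scatter(x, np.cos(x), s=10)
axes[0, 1].set_title("Scatter")
axes[1, 0].bar(x[::10], np.abs(np.sin(x[::10])), width=0.4)
axes[1, 0].set_title("Bar")
axes[1, 1].hist(samples, bins=10)
axes[1, 1].set_title("Histogram")

fig.tight_layout()  # reduce overlap between titles and tick labels
```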

Lesson 15: Figure Sizing and Layout

Control figure dimensions, aspect ratios, and spacing for publication-quality output.

Key Topics:

  • Setting figure size with figsize
  • Controlling aspect ratios
  • Adjusting subplot spacing
  • Saving figures with plt.savefig()

Learning Outcomes:

  • Control figure dimensions
  • Optimize layout for different outputs
  • Save publication-quality figures
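
The relationship between figsize, dpi, and the saved image can be checked directly: the pixel dimensions are the size in inches multiplied by the dpi. The sketch below is illustrative; the file name `layout_demo.png` and the plotted values are arbitrary.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(6, 4))
ax_left.plot([0, 1, 2], [0, 1, 4])
ax_right.plot([0, 1, 2], [4, 1, 0])

ax_left.set_aspect("equal")      # one x unit is drawn as long as one y unit
fig.subplots_adjust(wspace=0.4)  # horizontal gap between the two panels

# 6 x 4 inches at 200 dpi gives a 1200 x 800 pixel image
fig.savefig("layout_demo.png", dpi=200)

img = plt.imread("layout_demo.png")
print(img.shape[:2])  # (height, width) in pixels
```

Passing bbox_inches="tight" to savefig() trims surrounding whitespace, at the cost of the output no longer matching figsize exactly.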

Lesson 16: Final Project - Traffic Analysis

Apply all visualization techniques to analyze I-94 traffic patterns in a comprehensive project.

Key Topics:

  • Loading and exploring traffic data
  • Time series analysis of congestion
  • Weather impact on traffic
  • Creating comprehensive reports
  • Combining multiple visualization types

Learning Outcomes:

  • Complete end-to-end analysis project
  • Apply all learned techniques
  • Create professional analysis report
  • Draw insights from real-world data

Project Deliverables:

  • Traffic pattern analysis over time
  • Weather vs congestion correlation
  • Hourly and daily trend visualization
  • Multi-panel comprehensive dashboard

Learning Approach

Best Practices for This Course

  1. Follow lesson order - Each lesson builds on previous concepts
  2. Type code yourself - Don’t copy-paste; build muscle memory
  3. Complete practice exercises - Essential for retention
  4. Experiment with data - Try different parameters and datasets
  5. Create your own visualizations - Apply to your own data

Time Commitment

  • Minimum: Complete all 16 lessons (4 hours)
  • Recommended: Add 2-4 hours for practice exercises
  • Mastery: Additional 4-6 hours experimenting with own data

Support Resources

  • Matplotlib documentation: https://matplotlib.org/
  • Pandas plotting documentation: https://pandas.pydata.org/docs/user_guide/visualization.html
  • Course datasets provided in each lesson

Prerequisites Review

Required Knowledge

Python Basics:

  • Variables and data types
  • Lists and dictionaries
  • Functions and control flow
  • Importing libraries

NumPy Fundamentals:

  • Creating arrays
  • Array operations
  • Basic statistical functions

Pandas Fundamentals:

  • Creating and loading DataFrames
  • Selecting data with .loc[] and .iloc[]
  • Basic data manipulation

Installation Requirements

# Install required libraries
pip install matplotlib pandas numpy

Verify installation:

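import matplotlib  # needed so that matplotlib.__version__ resolves
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Each library exposes its version string as __version__
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")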

Next Steps

Ready to start visualizing data? Begin with Lesson 1 to understand graph fundamentals, then progress through the lessons systematically.

  • Start with Lesson 1: Understanding Graphs and Coordinates
  • Review the Pandas module: Refresh data manipulation skills


Transform Data into Visual Stories

Data visualization is a critical skill for data scientists, analysts, and researchers. Master these techniques to communicate insights effectively and make data-driven decisions.

Start creating compelling visualizations today!