Lesson 2 - Introduction to Matplotlib
Creating Your First Plot
You understand how graphs work. Now you will learn Matplotlib—Python’s foundational visualization library for creating professional charts and graphs.
By the end of this lesson, you will be able to:
- Import Matplotlib using the standard convention
- Create basic line plots with plt.plot()
- Display plots in Jupyter with plt.show()
- Understand how data lists become visual coordinates
- Read scientific notation on axes (1e6 = 1,000,000)
- Create line plots from lists, arrays, and Pandas Series
This lesson transforms your data into visual stories with just a few lines of code.
What is Matplotlib?
The Standard Plotting Library
Matplotlib is Python’s foundational visualization library, created in 2003.
Think of it as:
- The “Excel charts” of Python
- A digital canvas for creating graphs
- The tool underlying many other plotting libraries
Why Matplotlib?
Industry Standard:
- Used by millions of data scientists worldwide
- Powers other libraries (Seaborn, Pandas plotting)
- Employers expect you to know it
Powerful and Flexible:
- Creates publication-quality figures
- Full control over every element
- Can create any type of plot
Well-Documented:
- Extensive documentation and examples
- Large community support
- Abundant tutorials and Stack Overflow answers
The pyplot Module
We use matplotlib.pyplot - a collection of functions that make Matplotlib work like MATLAB:
- Simple function calls create plots
- Stateful interface (remembers current figure)
- Easy to get started, powerful when you need it
You will use pyplot for most of your everyday visualization work.
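To make the "stateful interface" idea concrete, here is a minimal sketch that is not part of the original lesson. It assumes a non-interactive Agg backend so that it runs without a display. plt.figure(), plt.gcf(), and plt.gca() are standard pyplot functions; the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Create a figure; pyplot remembers it as the "current" figure
fig = plt.figure()

# plt.plot() does not need to be told which figure to draw on
plt.plot([1, 2, 3], [4, 5, 6])

# plt.gcf() ("get current figure") returns the figure pyplot is tracking
same_figure = plt.gcf() is fig

# plt.gca() ("get current axes") returns the axes the line was drawn on
line_count = len(plt.gca().get_lines())

print("Current figure is the one we created:", same_figure)
print("Lines on the current axes:", line_count)

plt.close(fig)
```

This is why the short plt.plot(...) followed by plt.show() pattern works: pyplot keeps track of the figure for you.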
Importing Matplotlib
Standard Import Convention
ALWAYS import Matplotlib this way:
import matplotlib.pyplot as plt
Breaking It Down
import matplotlib.pyplot
- Imports the pyplot submodule from the matplotlib package
- pyplot is the plotting interface; you rarely need to call the rest of the library directly
as plt
- Creates the alias plt (shorter, easier to type)
- Universal convention: everyone uses plt
- Makes your code readable to other data scientists
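As a small illustration of what the alias does (a sketch, not part of the original lesson), the snippet below checks that plt is simply another name for the matplotlib.pyplot module. The variable names are hypothetical and chosen only for this example.

```python
import sys
import matplotlib.pyplot          # the long form
import matplotlib.pyplot as plt   # the conventional alias

# Both names refer to the very same module object
same_module = plt is matplotlib.pyplot

# The module keeps its full name regardless of the alias
module_name = plt.__name__

# Importing the submodule also loads its parent package
parent_loaded = "matplotlib" in sys.modules

print("plt is matplotlib.pyplot:", same_module)
print("Module name:", module_name)
print("Parent package loaded:", parent_loaded)
```

The alias changes only what you type, not what gets imported.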
Why This Convention Matters
Good (Standard):
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
Bad (Non-standard):
import matplotlib.pyplot as p # Don't use 'p'
import matplotlib.pyplot # Too long to type
from matplotlib.pyplot import plot # Pollutes namespace
Following conventions makes your code professional and collaborative.
Import Libraries
# Import Matplotlib (standard way)
import matplotlib  # top-level package, used here only to read the version
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Check versions
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
Your First Line Plot
The Simplest Plot: Two Lists
Let’s visualize COVID-19 new cases by month (January to July 2020, globally).
Step 1: Prepare the Data
# COVID-19 new cases by month (January to July 2020)
months = [1, 2, 3, 4, 5, 6, 7] # Month numbers
new_cases = [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042] # New cases
print("Month Numbers:", months)
print("New Cases:", new_cases)
print(f"\nData points: {len(months)}")
Step 2: Create the Plot
The Magic Command: plt.plot(x, y)
# Create line plot
plt.plot(months, new_cases)
plt.show()
# That's it! Just two lines of code for a graph!
Understanding What Happened
Behind the scenes:
plt.plot(months, new_cases) creates the plot
- First argument = x-coordinates (months)
- Second argument = y-coordinates (new_cases)
- Matplotlib connects the points with lines
plt.show() displays the plot
- Shows the figure in Jupyter notebook
- Necessary in some environments (scripts, terminals)
- Not always needed in Jupyter, but good practice
The Data-to-Visual Translation
Each pair of values becomes a coordinate:
months[0]=1, new_cases[0]=9926 → Point (1, 9926)
months[1]=2, new_cases[1]=76246 → Point (2, 76246)
months[2]=3, new_cases[2]=681488 → Point (3, 681488)
...and so on
Matplotlib automatically:
- Creates axes with appropriate scales
- Connects points with lines
- Adds tick marks
- Chooses reasonable limits
Key Insight: Two lists + two lines of code = A complete graph!
Understanding Scientific Notation
Reading the Y-Axis
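Notice the 1e6 printed at the top of the y-axis. By default, Matplotlib shows the tick labels as small numbers (1, 2, 4, 6, ...) and uses this multiplier to tell you to read them as 1e6, 2e6, 4e6, 6e6.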
What does this mean?
Scientific Notation Explained
Format: 1e6 = 1 × 10^6 = 1,000,000
Breaking it down:
- e means “times 10 to the power of”
- The number after e is the exponent
- e6 = × 1,000,000 (six zeros)
Examples:
1e6 = 1 × 10^6 = 1,000,000 (1 million)
2e6 = 2 × 10^6 = 2,000,000 (2 million)
4e6 = 4 × 10^6 = 4,000,000 (4 million)
7e6 = 7 × 10^6 = 7,000,000 (7 million)
1e3 = 1 × 10^3 = 1,000 (1 thousand)
1e9 = 1 × 10^9 = 1,000,000,000 (1 billion)
Why Use Scientific Notation?
Readability: Compare these:
Bad: 4000000 (hard to count zeros)
Good: 4e6 (immediately clear: 4 million)
Space: Large numbers do not fit well on axes
0, 1000000, 2000000, 3000000 ← Crowded!
0, 1e6, 2e6, 3e6 ← Clean!
You will see scientific notation in many plots with large values.
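If you would rather see full numbers on the axis, one option is plt.ticklabel_format(style="plain", axis="y"). The sketch below is an optional addition to the lesson, and it assumes the Agg backend and Matplotlib's default tick formatter. It reads the multiplier text before and after switching styles. With default settings the first value is typically '1e6', but the exact string can depend on your configuration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6, 7]
new_cases = [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042]

plt.plot(months, new_cases)
fig = plt.gcf()

# Render once so Matplotlib computes the tick labels and the multiplier
fig.canvas.draw()
offset_default = plt.gca().yaxis.get_major_formatter().get_offset()

# Ask for plain (non-scientific) tick labels on the y-axis, then render again
plt.ticklabel_format(style="plain", axis="y")
fig.canvas.draw()
offset_plain = plt.gca().yaxis.get_major_formatter().get_offset()

print("Multiplier with default formatting:", repr(offset_default))
print("Multiplier with plain formatting:", repr(offset_plain))

plt.close(fig)
```

Plain labels are easier for newcomers to read, at the cost of longer tick labels on the axis.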
Verify Understanding
# Let's verify our understanding
print("Y-axis values in scientific notation:")
print("1e6 =", 1e6, "= 1,000,000")
print("2e6 =", 2e6, "= 2,000,000")
print("4e6 =", 4e6, "= 4,000,000")
print("6e6 =", 6e6, "= 6,000,000")
print("\nSo our peak (month 7) is about:")
print(f"{new_cases[-1]:,} new cases")
print(f"Or approximately {new_cases[-1]/1e6:.1f} million cases")
Different Data Sources
Matplotlib Works with Multiple Data Types
You can plot from:
- Python lists (what we just did)
- NumPy arrays
- Pandas Series
All work the same way!
Example: NumPy Arrays
# Using NumPy arrays
months_array = np.array([1, 2, 3, 4, 5, 6, 7])
cases_array = np.array([9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042])
plt.plot(months_array, cases_array)
plt.show()
# Same result! Matplotlib handles NumPy arrays natively
Example: Pandas Series
# Using Pandas Series
months_series = pd.Series([1, 2, 3, 4, 5, 6, 7])
cases_series = pd.Series([9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042])
plt.plot(months_series, cases_series)
plt.show()
# Also works perfectly!
Working with DataFrames
When working with Pandas DataFrames (most common in real work):
# Create a DataFrame
covid_data = pd.DataFrame({
'month': [1, 2, 3, 4, 5, 6, 7],
'new_cases': [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042]
})
print(covid_data)
print("\nNow we can plot columns:")
# Plot DataFrame columns
plt.plot(covid_data['month'], covid_data['new_cases'])
plt.show()
# Access columns, then plot - very common pattern!
Key Insight: Whether your data is in lists, arrays, or Pandas columns, plt.plot() handles it all!
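To check the claim that all data sources behave the same way, the sketch below (an addition, assuming the Agg backend) plots the same values from a list, a NumPy array, and a Pandas Series, then compares the y-values Matplotlib stored for each line. The dictionary names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

months = [1, 2, 3, 4, 5, 6, 7]
new_cases = [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042]

# The same data in three different containers
sources = {
    "list": (months, new_cases),
    "array": (np.array(months), np.array(new_cases)),
    "series": (pd.Series(months), pd.Series(new_cases)),
}

plotted = {}
for name, (x, y) in sources.items():
    (line,) = plt.plot(x, y)
    # orig=False returns the processed NumPy array Matplotlib actually draws
    plotted[name] = np.asarray(line.get_ydata(orig=False))

# Every container should produce identical plotted values
all_equal = all(np.array_equal(plotted["list"], values) for values in plotted.values())
print("All three sources produce the same line:", all_equal)

plt.close("all")
```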
Practice Exercises
Apply your new skills with these exercises.
Exercise 1: COVID Deaths
Plot COVID-19 deaths by month (same time period):
# Deaths data (January-July 2020)
months = [1, 2, 3, 4, 5, 6, 7]
deaths = [213, 2465, 34822, 175893, 292801, 416086, 550164]
# YOUR CODE HERE:
# Create a line plot of months vs deaths
Hint
Use plt.plot(months, deaths) followed by plt.show()
Exercise 2: Temperature Trend
Create a line plot showing daily high temperature for a week:
# Temperature data
days = [1, 2, 3, 4, 5, 6, 7] # Monday to Sunday
temperature_high = [22, 24, 26, 25, 23, 21, 20] # Celsius
# YOUR CODE HERE:
# 1. Import matplotlib if not already done
# 2. Create line plot
# 3. Display with plt.show()
Exercise 3: Sales Data
Plot monthly sales for a small business:
# Create a DataFrame with sales data
sales_data = pd.DataFrame({
'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
'revenue': [45000, 52000, 49000, 61000, 58000, 67000]
})
# For now, we'll use month numbers instead of names
month_numbers = [1, 2, 3, 4, 5, 6]
# YOUR CODE HERE:
# Plot month_numbers vs sales_data['revenue']
# Note: We'll learn how to use month names on x-axis in next lesson!
Summary
You can now create line plots with Matplotlib. Let’s review the key concepts.
Key Concepts
Importing Matplotlib
- Always use: import matplotlib.pyplot as plt
- plt is the universal convention
- Import pyplot submodule, not entire matplotlib
Creating Line Plots
- plt.plot(x, y) creates the plot
- plt.show() displays it
- First argument = x-coordinates
- Second argument = y-coordinates
Scientific Notation
- 1e6 = 1,000,000 (1 million)
- e means “× 10 to the power of”
- Common for large numbers on axes
Data Sources
- Works with lists, NumPy arrays, Pandas Series
- Access DataFrame columns with df['column']
- Same plot function for all data types
The Essential Pattern
import matplotlib.pyplot as plt
# Prepare data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]
# Create and display plot
plt.plot(x, y)
plt.show()
This is the foundation of all Matplotlib visualizations!
Common Questions
Q: Do I always need plt.show()?
A: In Jupyter notebooks, often not needed (plots appear automatically). But it is good practice and required in Python scripts.
Q: Can I plot without connecting the lines?
A: Yes! We will learn about markers and scatter plots in Module 2.
Q: What if my x and y lists are different lengths?
A: Matplotlib raises a ValueError. The x and y lists must have equal length!
Q: Can I plot multiple lines on one graph?
A: Yes! Coming up in Lesson 4.
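For the question about lists of different lengths, here is a minimal sketch (an addition, assuming the Agg backend) that triggers the error on purpose and catches it, so you can recognize it when it appears in your own work. The example values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3]   # three x-values
y = [10, 20]    # only two y-values: a mismatch

error_message = None
try:
    plt.plot(x, y)
except ValueError as error:
    # Matplotlib refuses to pair up lists of unequal length
    error_message = str(error)

print("Error raised:", error_message)

plt.close("all")
```

A quick len(x) == len(y) check before plotting is an easy way to catch this early.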
Next Steps
Your plots work, but they are plain and hard to understand. In the next lesson, you will learn to add titles, labels, colors, and styles to create professional-looking graphs.
Continue to Lesson 3 - Customizing Plots
Add titles, labels, legends, colors, and styles for professional charts
Back to Lesson 1 - Understanding Graphs
Review graph anatomy and coordinate systems
Create Visualizations from Any Data
You can now create line plots from any numeric data. Next, we make them beautiful and informative.
Transform your data into visual insights with Matplotlib!