Lesson 2 - Introduction to Matplotlib

Creating Your First Plot

You understand how graphs work. Now you will learn Matplotlib—Python’s foundational visualization library for creating professional charts and graphs.

By the end of this lesson, you will be able to:

  • Import Matplotlib using the standard convention
  • Create basic line plots with plt.plot()
  • Display plots in Jupyter with plt.show()
  • Understand how data lists become visual coordinates
  • Read scientific notation on axes (1e6 = 1,000,000)
  • Create line plots from lists, arrays, and Pandas Series

This lesson transforms your data into visual stories with just a few lines of code.


What is Matplotlib?

The Standard Plotting Library

Matplotlib is Python’s foundational visualization library, created in 2003.

Think of it as:

  • The “Excel charts” of Python
  • A digital canvas for creating graphs
  • The tool underlying many other plotting libraries

Why Matplotlib?

Industry Standard:

  • Used by millions of data scientists worldwide
  • Powers other libraries (Seaborn, Pandas plotting)
  • Employers expect you to know it

Powerful and Flexible:

  • Creates publication-quality figures
  • Full control over every element
  • Supports a wide range of plot types (line, bar, scatter, histogram, and more)

Well-Documented:

  • Extensive documentation and examples
  • Large community support
  • Abundant tutorials and Stack Overflow answers

The pyplot Module

We use matplotlib.pyplot, a collection of functions that gives Matplotlib a MATLAB-style interface:

  • Simple function calls create plots
  • Stateful interface (remembers current figure)
  • Easy to get started, powerful when you need it

You will use pyplot for most of your visualization work.
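To make "stateful interface" concrete, here is a minimal sketch of how pyplot keeps track of a current figure and axes. The variable names fig and ax are illustrative, and plt.close("all") is added only so the sketch starts from a clean slate.

```python
import matplotlib.pyplot as plt

plt.close("all")  # assumption: start from a clean slate for this demo

# The first pyplot call creates a figure and axes behind the scenes.
plt.plot([1, 2, 3], [4, 5, 6])

# pyplot remembers them as the "current" figure and axes.
fig = plt.gcf()  # get current figure
ax = plt.gca()   # get current axes

# A second call draws onto the same axes instead of starting over.
plt.plot([1, 2, 3], [6, 5, 4])

print(len(ax.lines))    # number of lines drawn on the current axes
print(plt.gca() is ax)  # pyplot still points at the same axes
```

Because pyplot tracks the current axes for you, a sequence of plt calls builds up one picture step by step.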


Importing Matplotlib

Standard Import Convention

ALWAYS import Matplotlib this way:

import matplotlib.pyplot as plt

Breaking It Down

  1. import matplotlib.pyplot

    • Imports the pyplot submodule of the matplotlib package
    • pyplot holds the plotting functions used in this lesson, such as plot() and show()
  2. as plt

    • Create alias plt (shorter, easier to type)
    • Universal convention - everyone uses plt
    • Makes your code readable to other data scientists
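The breakdown above can be checked directly. This short sketch imports pyplot both with and without the alias to show that plt is only a shorter name for the same module, not a separate copy.

```python
import sys

import matplotlib.pyplot
import matplotlib.pyplot as plt

# Both names refer to the same module object; plt is only a shorter label.
print(plt is matplotlib.pyplot)

# The functions reached through either name are the same objects too.
print(plt.plot is matplotlib.pyplot.plot)

# Importing the submodule also loads its parent package.
print("matplotlib" in sys.modules)
```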

Why This Convention Matters

Good (Standard):

import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])

Bad (Non-standard):

import matplotlib.pyplot as p  # Don't use 'p'
import matplotlib.pyplot       # Too long to type
from matplotlib.pyplot import plot  # Pollutes namespace

Following conventions makes your code professional and collaborative.

Import Libraries

# Import Matplotlib (standard way)
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Check versions (the version lives on the top-level matplotlib package)
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

Your First Line Plot

The Simplest Plot: Two Lists

Let’s visualize COVID-19 new cases by month (January to July 2020, globally).

Step 1: Prepare the Data

# COVID-19 new cases by month (January to July 2020)
months = [1, 2, 3, 4, 5, 6, 7]  # Month numbers
new_cases = [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042]  # New cases

print("Month Numbers:", months)
print("New Cases:", new_cases)
print(f"\nData points: {len(months)}")

Step 2: Create the Plot

The Magic Command: plt.plot(x, y)

# Create line plot
plt.plot(months, new_cases)
plt.show()

# That's it! Just two lines of code for a graph!

Understanding What Happened

Behind the scenes:

  1. plt.plot(months, new_cases) creates the plot

    • First argument = x-coordinates (months)
    • Second argument = y-coordinates (new_cases)
    • Matplotlib connects the points with lines
  2. plt.show() displays the plot

    • Shows the figure in Jupyter notebook
    • Necessary in some environments (scripts, terminals)
    • Not always needed in Jupyter, but good practice
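One way to see what plt.plot() actually created is to inspect its return value. The sketch below is for exploration only; the names lines, line, stored_x, and stored_y are illustrative. plt.show() is left out so the sketch also runs without a display.

```python
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6, 7]
new_cases = [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042]

# plt.plot returns a list with one line object per line drawn.
lines = plt.plot(months, new_cases)
line = lines[0]

# The line object keeps the coordinates it was given.
stored_x = [int(v) for v in line.get_xdata()]
stored_y = [int(v) for v in line.get_ydata()]

print(type(line).__name__)
print(stored_x)
print(stored_y)
```

The line object holds exactly the x and y values passed in, which is why the first argument ends up on the horizontal axis and the second on the vertical axis.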

The Data-to-Visual Translation

Each pair of values becomes a coordinate:

months[0]=1, new_cases[0]=9926     → Point (1, 9926)
months[1]=2, new_cases[1]=76246    → Point (2, 76246)
months[2]=3, new_cases[2]=681488   → Point (3, 681488)
...and so on

Matplotlib automatically:

  • Creates axes with appropriate scales
  • Connects points with lines
  • Adds tick marks
  • Chooses reasonable limits

Key Insight: Two lists + two lines of code = A complete graph!
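The pairing and the automatic limits described above can be sketched as follows. The variable names are illustrative, and the exact limits Matplotlib picks depend on its default margins, so no specific values are promised here.

```python
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6, 7]
new_cases = [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042]

# zip pairs each month with its case count, the same pairing plt.plot uses.
points = list(zip(months, new_cases))
for x, y in points[:3]:
    print(f"Point ({x}, {y})")

# After plotting, the axes report the limits Matplotlib chose automatically.
plt.plot(months, new_cases)
ax = plt.gca()
x_min, x_max = ax.get_xlim()
y_min, y_max = ax.get_ylim()
print(f"x-axis runs from {x_min} to {x_max}")
print(f"y-axis runs from {y_min} to {y_max}")
```

The limits extend slightly past the smallest and largest data values so that the line does not touch the edge of the plot.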


Understanding Scientific Notation

Reading the Y-Axis

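Notice the 1e6 printed at the top of the y-axis. By default, Matplotlib shows short tick labels and one shared multiplier, so ticks labeled 2, 4, and 6 stand for 2e6, 4e6, and 6e6.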

What does this mean?

Scientific Notation Explained

Format: 1e6 = 1 × 10^6 = 1,000,000

Breaking it down:

  • e means “times 10 to the power of”
  • Number after e is the exponent
  • e6 = × 1,000,000 (six zeros)

Examples:

1e6 = 1 × 10^6 = 1,000,000 (1 million)
2e6 = 2 × 10^6 = 2,000,000 (2 million)
4e6 = 4 × 10^6 = 4,000,000 (4 million)
7e6 = 7 × 10^6 = 7,000,000 (7 million)

1e3 = 1 × 10^3 = 1,000 (1 thousand)
1e9 = 1 × 10^9 = 1,000,000,000 (1 billion)

Why Use Scientific Notation?

Readability: Compare these:

Bad:  4000000 (hard to count zeros)
Good: 4e6     (immediately clear: 4 million)

Space: Large numbers do not fit well on axes

0, 1000000, 2000000, 3000000  ← Crowded!
0,  1e6,    2e6,     3e6      ← Clean!

You will see scientific notation in many plots with large values.
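The same notation works in plain Python, outside of any plot. This small sketch shows Python reading and writing numbers in e notation; the variable name value is illustrative.

```python
value = 4_000_000

# Python reads "e" notation in source code and in strings.
print(4e6 == value)            # True
print(float("4e6") == value)   # True

# The "e" format code writes a number in scientific notation.
print(f"{value:.0e}")          # 4e+06
print(f"{6942042:.2e}")        # 6.94e+06
```

Python writes the exponent with a sign and two digits (e+06), while Matplotlib's axis multiplier uses the shorter 1e6 form. Both mean the same thing.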

Verify Understanding

# Let's verify our understanding
print("Y-axis values in scientific notation:")
print("1e6 =", 1e6, "= 1,000,000")
print("2e6 =", 2e6, "= 2,000,000")
print("4e6 =", 4e6, "= 4,000,000")
print("6e6 =", 6e6, "= 6,000,000")

print("\nSo our peak (month 7) is about:")
print(f"{new_cases[-1]:,} new cases")
print(f"Or approximately {new_cases[-1]/1e6:.1f} million cases")
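To see where the multiplier lives on the plot itself, the sketch below reads the text Matplotlib places on the y-axis. It assumes Matplotlib's default formatting settings; the names multiplier and tick_labels are illustrative, and the canvas is drawn manually because tick text is only filled in once the figure is rendered.

```python
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6, 7]
new_cases = [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042]

plt.plot(months, new_cases)

# Tick text is only filled in once the figure has been drawn.
plt.gcf().canvas.draw()
ax = plt.gca()

# The shared multiplier printed at the top of the y-axis.
multiplier = ax.yaxis.get_offset_text().get_text()

# The short labels printed next to each tick.
tick_labels = [label.get_text() for label in ax.get_yticklabels()]

print("Multiplier:", multiplier)
print("Tick labels:", tick_labels)
```

Multiplying a tick label by the multiplier gives the real value at that tick.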

Different Data Sources

Matplotlib Works with Multiple Data Types

You can plot from:

  1. Python lists (what we just did)
  2. NumPy arrays
  3. Pandas Series

All work the same way!

Example: NumPy Arrays

# Using NumPy arrays
months_array = np.array([1, 2, 3, 4, 5, 6, 7])
cases_array = np.array([9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042])

plt.plot(months_array, cases_array)
plt.show()

# Same result! Matplotlib handles NumPy arrays natively

Example: Pandas Series

# Using Pandas Series
months_series = pd.Series([1, 2, 3, 4, 5, 6, 7])
cases_series = pd.Series([9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042])

plt.plot(months_series, cases_series)
plt.show()

# Also works perfectly!
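As a quick check that the three data types really behave the same, this sketch plots each one and compares the values Matplotlib stored with the original list. The sources and results dictionaries are illustrative helpers, not part of any library.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

months = [1, 2, 3, 4, 5, 6, 7]
cases = [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042]

# The same data wrapped in three different containers.
sources = {
    "list": (months, cases),
    "array": (np.array(months), np.array(cases)),
    "series": (pd.Series(months), pd.Series(cases)),
}

results = {}
for name, (x, y) in sources.items():
    (line,) = plt.plot(x, y)
    # Compare what Matplotlib stored with the original list.
    results[name] = bool(np.array_equal(line.get_ydata(), cases))
    print(f"{name}: stored y-values match the original list -> {results[name]}")
```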

Working with DataFrames

When working with Pandas DataFrames (most common in real work):

# Create a DataFrame
covid_data = pd.DataFrame({
    'month': [1, 2, 3, 4, 5, 6, 7],
    'new_cases': [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042]
})

print(covid_data)
print("\nNow we can plot columns:")

# Plot DataFrame columns
plt.plot(covid_data['month'], covid_data['new_cases'])
plt.show()

# Access columns, then plot - very common pattern!
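plt.plot() also accepts a data argument, which lets you pass column names as strings instead of indexing the DataFrame yourself. The sketch below is an alternative spelling of the same plot, not a replacement for the pattern above.

```python
import matplotlib.pyplot as plt
import pandas as pd

covid_data = pd.DataFrame({
    'month': [1, 2, 3, 4, 5, 6, 7],
    'new_cases': [9926, 76246, 681488, 2336640, 2835147, 4226655, 6942042]
})

# With data=..., the strings are looked up as column names in the DataFrame.
lines = plt.plot('month', 'new_cases', data=covid_data)

# The line holds the values from the 'new_cases' column.
stored_y = [int(v) for v in lines[0].get_ydata()]
print(stored_y)
```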

Key Insight: Whether your data is in lists, arrays, or Pandas columns, plt.plot() handles it all!


Practice Exercises

Apply your new skills with these exercises.

Exercise 1: COVID Deaths

Plot COVID-19 deaths by month (same time period):

# Deaths data (January-July 2020)
months = [1, 2, 3, 4, 5, 6, 7]
deaths = [213, 2465, 34822, 175893, 292801, 416086, 550164]

# YOUR CODE HERE:
# Create a line plot of months vs deaths

Hint

Use plt.plot(months, deaths) followed by plt.show()


Exercise 2: Temperature Trend

Create a line plot showing daily high temperature for a week:

# Temperature data
days = [1, 2, 3, 4, 5, 6, 7]  # Monday to Sunday
temperature_high = [22, 24, 26, 25, 23, 21, 20]  # Celsius

# YOUR CODE HERE:
# 1. Import matplotlib if not already done
# 2. Create line plot
# 3. Display with plt.show()

Exercise 3: Sales Data

Plot monthly sales for a small business:

# Create a DataFrame with sales data
sales_data = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'revenue': [45000, 52000, 49000, 61000, 58000, 67000]
})

# For now, we'll use month numbers instead of names
month_numbers = [1, 2, 3, 4, 5, 6]

# YOUR CODE HERE:
# Plot month_numbers vs sales_data['revenue']

# Note: We'll learn how to use month names on x-axis in next lesson!

Summary

You can now create line plots with Matplotlib. Let’s review the key concepts.

Key Concepts

Importing Matplotlib

  • Always use: import matplotlib.pyplot as plt
  • plt is the universal convention
  • Import pyplot submodule, not entire matplotlib

Creating Line Plots

  • plt.plot(x, y) creates the plot
  • plt.show() displays it
  • First argument = x-coordinates
  • Second argument = y-coordinates

Scientific Notation

  • 1e6 = 1,000,000 (1 million)
  • e means “× 10 to the power of”
  • Common for large numbers on axes

Data Sources

  • Works with lists, NumPy arrays, Pandas Series
  • Access DataFrame columns with df['column']
  • Same plot function for all data types

The Essential Pattern

import matplotlib.pyplot as plt

# Prepare data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]

# Create and display plot
plt.plot(x, y)
plt.show()

This is the foundation of all Matplotlib visualizations!

Common Questions

Q: Do I always need plt.show()? A: Not always. In Jupyter notebooks, plots usually appear automatically, but calling plt.show() is good practice and is required in Python scripts.

Q: Can I plot without connecting the lines? A: Yes! We will learn about markers and scatter plots in Module 2.

Q: What if my x and y lists are different lengths? A: Matplotlib raises a ValueError. The x and y lists must have the same length!

Q: Can I plot multiple lines on one graph? A: Yes! Coming up in Lesson 4.
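The question about mismatched lengths is easy to try for yourself. This sketch deliberately passes lists of different lengths and catches the resulting error; the variable error_message is illustrative.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [10, 20]  # one value short on purpose

error_message = None
try:
    plt.plot(x, y)
except ValueError as err:
    # Matplotlib checks the lengths before drawing anything.
    error_message = str(err)
    print("Matplotlib refused to plot:", error_message)
```

If you see this error in your own work, compare len(x) and len(y) first.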


Next Steps

Your plots work, but they are plain and hard to understand. In the next lesson, you will learn to add titles, labels, colors, and styles to create professional-looking graphs.


Create Visualizations from Any Data

You can now create line plots from any numeric data. Next, we make them beautiful and informative.

Transform your data into visual insights with Matplotlib!