Lesson 4 - Multiple Lines and Series

Comparing Multiple Datasets

You can already create and customize single-line plots. In this lesson you will plot multiple lines on the same graph, load real-world time series data, and compare trends visually.

By the end of this lesson, you will be able to:

Load time series data from CSV files using Pandas
Convert string dates to datetime objects
Plot time series data with proper date formatting
Create multiple lines on the same graph
Add legends to distinguish different lines
Work with real-world bike-sharing data

Comparing multiple series on one graph reveals relationships, differences, and patterns that single plots cannot show.

What is Time Series Data?

Understanding Time Series

Time series data is any data recorded over time, usually at regular intervals:

Stock prices (every second, minute, or day)
Weather measurements (hourly, daily)
Sales figures (daily, monthly, yearly)
Website traffic (hourly, daily)
Bike rentals (hourly, daily)
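As a minimal sketch of what such data looks like in code, the snippet below builds a small daily series with pandas. The rental counts are invented for illustration and do not come from any dataset.

```python
import pandas as pd

# One timestamp per day for a week (a regular daily interval)
dates = pd.date_range(start="2011-01-01", periods=7, freq="D")

# Invented values, one observation per timestamp
rentals = pd.Series([120, 95, 140, 150, 160, 155, 170], index=dates, name="rentals")

print(rentals)
print("Interval between observations:", rentals.index[1] - rentals.index[0])
```

The index holds the timestamps and the values hold the measurements, which is the shape most time series tools expect.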

Why It Matters

Most business data is time series
Helps identify trends and patterns
Critical for forecasting and planning
Reveals seasonal effects
Shows growth or decline over time

Time series analysis is one of the most common tasks in data science.

Loading Data with Pandas

We will use the Bike Sharing Dataset - real data from Washington D.C.’s Capital Bikeshare system (2011-2012).

About the Dataset

The dataset contains:

Date of each day (731 days total)
Number of bike rentals
Weather conditions (temperature, humidity, windspeed)
Whether it was a holiday or working day
Casual vs registered users

Load the Data

import pandas as pd
import matplotlib.pyplot as plt

# Load the bike sharing data
bikes = pd.read_csv('day.csv')

# Look at the first few rows
print(bikes.head())

Key Columns

dteday - The date (currently a string)
cnt - Total bike rentals that day
casual - Rentals by casual (non-registered) users
registered - Rentals by registered users
temp - Normalized temperature
weathersit - Weather situation (1=Clear, 2=Mist, 3=Light Rain/Snow)

Notice: The dteday column contains dates like “2011-01-01”, but they are stored as text (strings), not dates.
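You can confirm this by checking the type of a single value. The sketch below reads a three-row stand-in for day.csv from an in-memory string so that it runs without the file; the counts are placeholders, not real rows.

```python
import io
import pandas as pd

# Three-row stand-in for day.csv; the counts are placeholders
sample_csv = io.StringIO(
    "dteday,cnt\n"
    "2011-01-01,100\n"
    "2011-01-02,110\n"
    "2011-01-03,120\n"
)
bikes = pd.read_csv(sample_csv)

# Each value in the column is a plain Python string, not a date
first_value = bikes["dteday"].iloc[0]
print(repr(first_value))
print(type(first_value))
```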

Converting Strings to Datetime

To properly work with dates, we need to convert them from strings to datetime objects.

Why Convert?

Matplotlib can automatically format date axes
You can do date math (“what was 30 days ago?”)
You can extract parts like month, year, day of week
Enables proper time-based operations
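The points above can be sketched on a few hand-picked dates. The example uses pd.to_datetime(), which the next step introduces, plus pd.Timedelta and the .dt accessor; the dates themselves are arbitrary.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2011-01-31", "2011-07-04", "2012-12-25"]))

# Date math: what was 30 days before each date?
thirty_days_earlier = dates - pd.Timedelta(days=30)
print(thirty_days_earlier)

# Extract parts through the .dt accessor
print(dates.dt.year.tolist())
print(dates.dt.month.tolist())
print(dates.dt.day_name().tolist())
```

None of these operations work on the raw strings, which is why the conversion comes first.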

Use pd.to_datetime()

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')

# Convert dteday column from string to datetime
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

# Check the data type
print("Data type:", bikes['dteday'].dtype)
print("\nFirst 5 dates:")
print(bikes['dteday'].head())

What Changed?

Before: dtype: object (string)
After: dtype: datetime64[ns] (datetime)

Now pandas treats these values as dates, not as plain text.
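If you prefer to convert while loading, read_csv accepts a parse_dates argument. The sketch below compares both routes on a two-row in-memory stand-in for day.csv (placeholder values), so treat it as an illustration rather than a required step.

```python
import io
import pandas as pd

csv_text = "dteday,cnt\n2011-01-01,100\n2011-01-02,110\n"

# Two-step approach: load, then convert
two_step = pd.read_csv(io.StringIO(csv_text))
two_step["dteday"] = pd.to_datetime(two_step["dteday"])

# One-step approach: ask read_csv to parse the column while loading
one_step = pd.read_csv(io.StringIO(csv_text), parse_dates=["dteday"])

print(two_step["dteday"].dtype)
print(one_step["dteday"].dtype)
```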

Plotting Time Series Data

Now we can plot bike rentals over time. Let’s see how demand changed across 2 years:

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

# Plot total rentals over time
plt.plot(bikes['dteday'], bikes['cnt'])
plt.title('Daily Bike Rentals in Washington D.C. (2011-2012)')
plt.xlabel('Date')
plt.ylabel('Number of Rentals')

# Rotate x-axis labels for better readability
plt.xticks(rotation=45)

plt.show()

What Do You Notice?

Clear seasonal pattern - high in summer, low in winter
Overall upward trend - more rentals in 2012 than 2011
Lots of daily variation (the “noise”)
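One common way to see the trend through the daily noise is a rolling average. The sketch below is an addition to the lesson rather than part of the plot above: the numbers are invented, and the 3-day window is an arbitrary choice to keep the example small (for daily data a 7-day or 30-day window is more usual).

```python
import pandas as pd

# Invented daily counts: a rising trend with noise on top
dates = pd.date_range("2011-01-01", periods=8, freq="D")
cnt = pd.Series([10, 14, 9, 15, 13, 19, 14, 21], index=dates)

# Each value becomes the mean of itself and the two previous days;
# the first two entries are NaN because the window is not yet full
smoothed = cnt.rolling(window=3).mean()

print(smoothed)
```

On the real data you could plot the smoothed series as a second line on top of the raw one, for example with bikes['cnt'].rolling(window=7).mean().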

About plt.xticks(rotation=45)

Dates are long text (“2011-01-01”)
Rotating by 45 degrees prevents overlap
Makes the graph much more readable

Common rotation values:

0° (horizontal) - default
45° (diagonal) - good for dates
90° (vertical) - extreme cases
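Rotation is not the only option. As a hedged sketch, the example below shortens the labels with matplotlib.dates.DateFormatter and rotates them with the figure's autofmt_xdate() method. The data is invented, and the "Agg" backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Invented values, one per day for two months
dates = pd.date_range("2011-01-01", periods=60, freq="D")
values = list(range(60))

plt.figure()
plt.plot(dates, values)

# Show ticks as e.g. "Jan 2011" instead of "2011-01-01"
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Rotate and right-align the date labels in one call
plt.gcf().autofmt_xdate(rotation=45)

# In an interactive session, finish with plt.show()
```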

Multiple Lines on One Graph

Let’s compare casual users vs registered users. Do they have different patterns?

Plotting Two Lines

To plot multiple lines, call plt.plot() multiple times before plt.show():

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

# Plot both casual and registered users
plt.plot(bikes['dteday'], bikes['casual'], label='Casual Users')
plt.plot(bikes['dteday'], bikes['registered'], label='Registered Users')

plt.title('Casual vs Registered Bike Users (2011-2012)')
plt.xlabel('Date')
plt.ylabel('Number of Rentals')
plt.xticks(rotation=45)

# Add legend to show which line is which
plt.legend()

plt.show()

Understanding the Code

label='Casual Users' - Names the first line
label='Registered Users' - Names the second line
plt.legend() - Shows a box with the labels
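To see how label and plt.legend() interact, the sketch below plots three invented lines but labels only two of them. The "Agg" backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

days = [1, 2, 3, 4]

plt.figure()
plt.plot(days, [30, 45, 40, 60], label="Casual Users")
plt.plot(days, [90, 95, 100, 110], label="Registered Users")
plt.plot(days, [120, 140, 140, 170])  # no label, so it is left out of the legend

legend = plt.legend()

# The legend lists only the labeled lines, in plotting order
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

A line without a label is still drawn; it simply has no legend entry.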

What Patterns Do You See?

Registered users are consistently higher (the orange line under Matplotlib's default colors, because it is plotted second)
Both show seasonal patterns
Casual users have more extreme peaks (summer weekends?)
Registered users are more stable year-round

Hypothesis: Casual users are tourists/recreational riders (weather-dependent). Registered users are commuters (ride year-round regardless of weather).
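One way to probe this hypothesis is to compare average rentals on working and non-working days. The sketch below assumes the working-day flag is stored in a column named workingday (1 for working days, 0 otherwise). The rows are invented to show the mechanics, so the output is not evidence for or against the hypothesis.

```python
import pandas as pd

# Invented rows standing in for day.csv; workingday is assumed to be 1 or 0
sample = pd.DataFrame({
    "workingday": [1, 1, 1, 0, 0, 1],
    "casual":     [120, 100, 110, 600, 650, 130],
    "registered": [2500, 2600, 2550, 1500, 1400, 2450],
})

# Average rentals per user type, split by working vs non-working days
by_day_type = sample.groupby("workingday")[["casual", "registered"]].mean()
print(by_day_type)
```

Running the same groupby on the real bikes DataFrame would show whether casual rentals actually rise on non-working days.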

Customizing Multiple Lines

Make each line distinctive with colors, line styles, and line widths:

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

# Plot with custom styling
plt.plot(bikes['dteday'], bikes['casual'],
         color='orange',
         linestyle='--',
         linewidth=2,
         label='Casual Users')

plt.plot(bikes['dteday'], bikes['registered'],
         color='blue',
         linestyle='-',
         linewidth=2,
         label='Registered Users')

plt.title('Casual vs Registered Bike Users (2011-2012)')
plt.xlabel('Date')
plt.ylabel('Number of Rentals')
plt.xticks(rotation=45)
plt.legend()

plt.show()

Styling Tips for Multiple Lines

Use different colors:

Makes lines easy to distinguish
Choose contrasting colors

Use different line styles:

Solid vs dashed helps colorblind readers
Provides redundant encoding

Consistent line width:

Use same width for equal importance
Use thicker lines for emphasis
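These tips can be collected in one place by keeping the styles in a dictionary and looping over the series. This is one possible arrangement, not the only way to organize it; the data is invented and the "Agg" backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
series = {
    "Casual Users": [30, 45, 40, 60, 55],
    "Registered Users": [90, 95, 100, 110, 105],
}

# One style entry per line: both the color and the line style differ
styles = {
    "Casual Users": {"color": "orange", "linestyle": "--"},
    "Registered Users": {"color": "blue", "linestyle": "-"},
}

plt.figure()
lines = []
for name, values in series.items():
    # The line width is shared so neither line looks more important
    (line,) = plt.plot(days, values, linewidth=2, label=name, **styles[name])
    lines.append(line)

plt.legend()
```

Adding a third series then only needs one new entry in each dictionary.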

Practice Exercises

Apply time series and multiple line techniques.

Exercise 1: First 3 Months

Create a line plot showing only the first 3 months of 2011 (January - March).

Hint: Use .head(90) to get the first 90 rows. January, February, and March 2011 have 31 + 28 + 31 = 90 days, so with one row per day this covers the quarter exactly.

bikes_3months = bikes.head(90)
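Selecting by position works here because the file has one row per day in date order. A date-based filter gives the same rows without counting days. The sketch below checks this on a stand-in frame with invented counts, assuming dteday has already been converted to datetime.

```python
import pandas as pd

# Stand-in frame: one row per day from 2011-01-01, with invented counts
bikes = pd.DataFrame({"dteday": pd.date_range("2011-01-01", periods=120, freq="D")})
bikes["cnt"] = range(120)

# Position-based: the first 90 rows
by_position = bikes.head(90)

# Date-based: everything before April 1, 2011
by_date = bikes[bikes["dteday"] < "2011-04-01"]

print(by_position["dteday"].max())
print(len(by_date))
```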

Requirements:

Plot total rentals (cnt) for the first 90 days
Title: “Bike Rentals - First Quarter 2011”
Proper axis labels
Rotated x-axis labels

# Your code here

Solution

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

bikes_3months = bikes.head(90)

plt.plot(bikes_3months['dteday'], bikes_3months['cnt'])
plt.title('Bike Rentals - First Quarter 2011')
plt.xlabel('Date')
plt.ylabel('Number of Rentals')
plt.xticks(rotation=45)

plt.show()

Exercise 2: Seasonal Comparison

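The dataset has a season column. The dataset's documentation describes the codes as:

1 = Spring
2 = Summer
3 = Fall
4 = Winter

Note: In day.csv the rows with season 1 begin on 2011-01-01, so check which dates each code covers before relying on these labels.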

Task: Plot Summer and Winter on the same graph to compare them.

Hint: Filter the data like this:

summer = bikes[bikes['season'] == 2]
winter = bikes[bikes['season'] == 4]
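The filter works in two steps: the comparison produces one True/False value per row, and indexing with that result keeps only the True rows. The sketch below shows both steps on a few invented rows; the last line is one way to check which dates each season code covers.

```python
import pandas as pd

# Invented rows for illustration; the counts are placeholders
sample = pd.DataFrame({
    "dteday": pd.to_datetime(
        ["2011-05-01", "2011-05-02", "2011-11-01", "2011-11-02", "2011-08-01"]
    ),
    "season": [2, 2, 4, 4, 3],
    "cnt": [4000, 4200, 3500, 3300, 5000],
})

# Step 1: the comparison yields one True/False per row
mask = sample["season"] == 2
print(mask.tolist())

# Step 2: indexing with the mask keeps only the True rows
season_2 = sample[mask]
print(len(season_2))

# Date range covered by each season code
print(sample.groupby("season")["dteday"].agg(["min", "max"]))
```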

Requirements:

Plot both seasons on the same graph with different colors
Add a legend
Use meaningful title and labels

# Your code here

Solution

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

summer = bikes[bikes['season'] == 2]
winter = bikes[bikes['season'] == 4]

plt.plot(summer['dteday'], summer['cnt'],
         color='orange', label='Summer', marker='o', markersize=3)
plt.plot(winter['dteday'], winter['cnt'],
         color='blue', label='Winter', marker='s', markersize=3)

plt.title('Bike Rentals: Summer vs Winter')
plt.xlabel('Date')
plt.ylabel('Number of Rentals')
plt.xticks(rotation=45)
plt.legend()

plt.show()

Exercise 3: Growth Analysis

Compare the same month across two years to see year-over-year growth.

Task: Plot July 2011 vs July 2012 on the same graph.

Hint: Filter by month:

bikes['month'] = bikes['dteday'].dt.month
july_2011 = bikes[(bikes['month'] == 7) & (bikes['dteday'].dt.year == 2011)]
july_2012 = bikes[(bikes['month'] == 7) & (bikes['dteday'].dt.year == 2012)]

Challenge: Can you calculate the average growth percentage?

# Your code here

Solution

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])
bikes['month'] = bikes['dteday'].dt.month

july_2011 = bikes[(bikes['month'] == 7) & (bikes['dteday'].dt.year == 2011)]
july_2012 = bikes[(bikes['month'] == 7) & (bikes['dteday'].dt.year == 2012)]

plt.plot(july_2011['dteday'].dt.day, july_2011['cnt'],
         label='July 2011', marker='o')
plt.plot(july_2012['dteday'].dt.day, july_2012['cnt'],
         label='July 2012', marker='s')

plt.title('Year-over-Year Growth: July Comparison')
plt.xlabel('Day of Month')
plt.ylabel('Number of Rentals')
plt.legend()

# Calculate growth
avg_2011 = july_2011['cnt'].mean()
avg_2012 = july_2012['cnt'].mean()
growth = ((avg_2012 - avg_2011) / avg_2011) * 100
print(f"Average growth from July 2011 to July 2012: {growth:.1f}%")

plt.show()

Summary

You can now visualize and compare time series data. Let’s review the key concepts.

Key Concepts

Load CSV Data

Use pd.read_csv('filename.csv')
Returns a DataFrame
Check with .head() to see structure

Convert Dates

Use pd.to_datetime(df['date_column'])
Enables proper date handling
Automatic axis formatting in plots

Rotate Labels

Use plt.xticks(rotation=45) for readability
Essential for date labels
Prevents overlapping text

Multiple Lines

Call plt.plot() multiple times
Use label= parameter for each line
Add plt.legend() to show labels
Use different colors and styles

Time Series Patterns

Trends: Overall direction (upward/downward)
Seasonality: Repeating patterns (yearly, weekly, daily)
Noise: Daily/random fluctuations

Filter Data

Focus on specific time periods
.head(n) for first n rows
df[df['column'] == value] for conditions
Extract date parts: .dt.year, .dt.month, .dt.day

Syntax Reference

# Load and prepare data
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'])

# Plot multiple lines
plt.plot(df['date'], df['series1'], label='Series 1')
plt.plot(df['date'], df['series2'], label='Series 2')

# Formatting
plt.title('Title Here')
plt.xlabel('Date')
plt.ylabel('Value')
plt.xticks(rotation=45)
plt.legend()

plt.show()

Common Questions

Q: My dates show weird formats like “2011-01-01 00:00:00”. How do I fix this? A: This is normal. Datetime objects include time. Matplotlib usually formats them nicely automatically.
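If you do need the dates as short text, for example in a printed table, the .dt.strftime() method formats them without changing the original column. A minimal sketch on two arbitrary dates:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2011-01-01", "2011-01-02"]))

# A single timestamp prints with its time component
print(dates.iloc[0])

# Format as text for display; the original datetime column is unchanged
as_text = dates.dt.strftime("%Y-%m-%d")
print(as_text.tolist())
```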

Q: What if my CSV has dates in a different format like “01/15/2023”? A: pd.to_datetime() recognizes many common formats automatically. Ambiguous dates such as “01/02/2023” are read month-first by default, so state the format explicitly when in doubt: pd.to_datetime(df['date'], format='%m/%d/%Y')
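The sketch below shows why the explicit format matters: the same text gives two different dates depending on whether it is read month-first or day-first. The date strings are arbitrary examples.

```python
import pandas as pd

# Explicit format: month/day/year
us_style = pd.to_datetime(pd.Series(["01/15/2023", "02/03/2023"]), format="%m/%d/%Y")
print(us_style.dt.month.tolist())

# The same text read as day/month/year is a different date
day_first = pd.to_datetime("02/03/2023", format="%d/%m/%Y")
print(day_first)
```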

Q: Can I plot more than 2 lines? A: Yes! Plot as many as needed, but more than 5-6 lines gets hard to read. Consider using multiple subplots instead (coming later).

Next Steps

You have completed Module 1: Line Plots and Time Series. You now know how to create, customize, and compare line plots with real-world data.

In Module 2: Scatter Plots and Correlation, you will learn to explore relationships between variables, identify correlations, and understand causation.

Continue to Lesson 5 - Scatter Plots Basics

Create scatter plots to visualize relationships between two variables

Back to Lesson 3 - Customizing Plots

Review plot customization techniques

Master Time Series Visualization

You can now load, process, and visualize time series data from real-world sources. These skills apply to sales data, website traffic, stock prices, and any metric that changes over time.

Use these techniques to reveal trends and patterns in your data!
