Lesson 4 - Multiple Lines and Series

Comparing Multiple Datasets

You can already create and customize single-line plots. In this lesson you will plot multiple lines on the same graph, load real-world time series data, and compare trends visually.

By the end of this lesson, you will be able to:

  • Load time series data from CSV files using Pandas
  • Convert string dates to datetime objects
  • Plot time series data with proper date formatting
  • Create multiple lines on the same graph
  • Add legends to distinguish different lines
  • Work with real-world bike-sharing data

Comparing multiple series on one graph reveals relationships, differences, and patterns that single plots cannot show.


What is Time Series Data?

Understanding Time Series

Time series data is data recorded over time, usually at regular intervals:

  • Stock prices (every second, minute, or day)
  • Weather measurements (hourly, daily)
  • Sales figures (daily, monthly, yearly)
  • Website traffic (hourly, daily)
  • Bike rentals (hourly, daily)

Why It Matters

  • Much business data is recorded over time
  • Helps identify trends and patterns
  • Critical for forecasting and planning
  • Reveals seasonal effects
  • Shows growth or decline over time

Time series analysis is one of the most common tasks in data science.


Loading Data with Pandas

We will use the Bike Sharing Dataset - real data from Washington D.C.’s Capital Bikeshare system (2011-2012).

About the Dataset

The dataset contains:

  • Date of each day (731 days total)
  • Number of bike rentals
  • Weather conditions (temperature, humidity, windspeed)
  • Whether it was a holiday or working day
  • Casual vs registered users

Load the Data

import pandas as pd
import matplotlib.pyplot as plt

# Load the bike sharing data
bikes = pd.read_csv('day.csv')

# Look at the first few rows
print(bikes.head())

Key Columns

  • dteday - The date (currently a string)
  • cnt - Total bike rentals that day
  • casual - Rentals by casual (non-registered) users
  • registered - Rentals by registered users
  • temp - Normalized temperature
  • weathersit - Weather situation (1=Clear, 2=Mist, 3=Light Rain/Snow)

Notice: The dteday column contains dates like “2011-01-01”, but they are stored as text (strings), not dates.
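
One way to confirm that dteday is text is to inspect the column after loading. The sketch below reads a three-row stand-in for day.csv from an in-memory string, so it runs without the real file. The column names match the lesson, but the numbers are made up for illustration and are not taken from the dataset.

```python
import io
import pandas as pd

# Made-up stand-in for day.csv: same column names, illustrative values only
csv_text = """dteday,casual,registered,cnt
2011-01-01,300,600,900
2011-01-02,100,700,800
2011-01-03,150,1200,1350
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Each value in dteday is a plain Python string at this point
first_value = sample['dteday'].iloc[0]
print(type(first_value))        # <class 'str'>
print(sample['dteday'].dtype)   # a text dtype (object in many pandas versions), not datetime64
print(sample['cnt'].dtype)      # int64
```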


Converting Strings to Datetime

To properly work with dates, we need to convert them from strings to datetime objects.

Why Convert?

  • Matplotlib can automatically format date axes
  • You can do date math (“what was 30 days ago?”)
  • You can extract parts like month, year, day of week
  • Enables proper time-based operations
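
The date math and part extraction listed above can be sketched on three hand-picked dates. This is a minimal illustration rather than part of the lesson's dataset workflow. pd.Timedelta and the .dt.day_name() method are pandas features the lesson does not otherwise use.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2011-01-01', '2011-07-04', '2012-12-31']))

# Date math: subtract a fixed number of days from every date
thirty_days_earlier = dates - pd.Timedelta(days=30)
print(thirty_days_earlier.iloc[1])     # 2011-06-04 00:00:00

# Extract parts through the .dt accessor
print(dates.dt.year.tolist())          # [2011, 2011, 2012]
print(dates.dt.month.tolist())         # [1, 7, 12]
print(dates.dt.day_name().tolist())    # ['Saturday', 'Monday', 'Monday']
```

None of these operations work on the original string column, which is why the conversion comes first.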

Use pd.to_datetime()

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')

# Convert dteday column from string to datetime
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

# Check the data type
print("Data type:", bikes['dteday'].dtype)
print("\nFirst 5 dates:")
print(bikes['dteday'].head())

What Changed?

  • Before: dtype: object (string)
  • After: dtype: datetime64[ns] (datetime)

Now pandas treats these values as dates, not as arbitrary text.


Plotting Time Series Data

Now we can plot bike rentals over time. Let’s see how demand changed across the two years:

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

# Plot total rentals over time
plt.plot(bikes['dteday'], bikes['cnt'])
plt.title('Daily Bike Rentals in Washington D.C. (2011-2012)')
plt.xlabel('Date')
plt.ylabel('Number of Rentals')

# Rotate x-axis labels for better readability
plt.xticks(rotation=45)

plt.show()

What Do You Notice?

  • Clear seasonal pattern - high in summer, low in winter
  • Overall upward trend - more rentals in 2012 than 2011
  • Lots of daily variation (the “noise”)
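
One common way to see the trend beneath the daily noise is a rolling average. The lesson does not cover this technique, so treat the following as an optional sketch. It uses made-up numbers in place of bikes['cnt'], and the 7-day window is an arbitrary choice.

```python
import pandas as pd

# Made-up daily counts standing in for bikes['cnt']
cnt = pd.Series([100, 140, 90, 160, 110, 150, 130, 170, 120, 180])

# Average each day together with the six days before it
weekly_avg = cnt.rolling(window=7).mean()

print(weekly_avg.iloc[5])   # nan, because fewer than 7 days are available so far
print(weekly_avg.iloc[6])   # mean of the first 7 values
```

On the real data, bikes['cnt'].rolling(window=7).mean() could be passed to a second plt.plot() call so the smoothed line sits on top of the raw one.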

About plt.xticks(rotation=45)

  • Dates are long text (“2011-01-01”)
  • Rotating by 45 degrees prevents overlap
  • Makes the graph much more readable

Common rotation values:

  • 0° (horizontal) - default
  • 45° (diagonal) - good for dates
  • 90° (vertical) - extreme cases
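
Rotation controls the angle of the labels, but not which dates get a label or how they are written. A possible refinement, sketched below under assumptions, uses matplotlib.dates to place one tick every three months and label it like "Jan 2011". MonthLocator, DateFormatter, and the ha='right' alignment are not used elsewhere in this lesson, and the plotted values are made up.

```python
import matplotlib
matplotlib.use('Agg')   # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Made-up values on a real two-year daily date range
dates = pd.date_range('2011-01-01', '2012-12-31', freq='D')
values = list(range(len(dates)))

plt.plot(dates, values)

ax = plt.gca()
# One tick every third month, labelled like "Jan 2011"
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))

# Right-align the rotated labels so each one ends under its tick
plt.xticks(rotation=45, ha='right')
```

To view the figure interactively, remove the matplotlib.use('Agg') line and finish with plt.show() as in the lesson's other examples.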

Multiple Lines on One Graph

Let’s compare casual users vs registered users. Do they have different patterns?

Plotting Two Lines

To plot multiple lines, call plt.plot() multiple times before plt.show():

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

# Plot both casual and registered users
plt.plot(bikes['dteday'], bikes['casual'], label='Casual Users')
plt.plot(bikes['dteday'], bikes['registered'], label='Registered Users')

plt.title('Casual vs Registered Bike Users (2011-2012)')
plt.xlabel('Date')
plt.ylabel('Number of Rentals')
plt.xticks(rotation=45)

# Add legend to show which line is which
plt.legend()

plt.show()

Understanding the Code

  • label='Casual Users' - Names the first line
  • label='Registered Users' - Names the second line
  • plt.legend() - Shows a box with the labels
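
To see how label= and plt.legend() fit together, the sketch below plots three lines from made-up values and then asks the axes which entries the legend will contain. It also prints the colors Matplotlib assigned, since no color= was given.

```python
import matplotlib
matplotlib.use('Agg')   # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

days = [1, 2, 3, 4]
casual = [30, 45, 25, 60]            # made-up values
registered = [120, 130, 125, 140]    # made-up values

plt.plot(days, casual, label='Casual Users')
plt.plot(days, registered, label='Registered Users')
plt.plot(days, [80, 80, 80, 80])     # no label, so the legend skips this line

plt.legend()

# The legend is built from the labelled lines, in plotting order
handles, labels = plt.gca().get_legend_handles_labels()
print(labels)                             # ['Casual Users', 'Registered Users']

# With the default style the first line is blue and the second is orange
print([h.get_color() for h in handles])   # ['#1f77b4', '#ff7f0e']
```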

What Patterns Do You See?

  • Registered users (orange with Matplotlib’s default colors, because that line is plotted second) are consistently higher
  • Both show seasonal patterns
  • Casual users have more extreme peaks (summer weekends?)
  • Registered users are more stable year-round

Hypothesis: Casual users are tourists/recreational riders (weather-dependent). Registered users are commuters (ride year-round regardless of weather).
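
One way to start checking a hypothesis like this is to compare average rentals on working and non-working days. The sketch below assumes the dataset's workingday column (1 on working days, 0 on weekends and holidays), which the lesson mentions only in passing. The rows are made up, so the printed numbers say nothing about the real data.

```python
import pandas as pd

# Made-up rows using the dataset's column names; workingday is 1 on
# working days and 0 on weekends and holidays
sample = pd.DataFrame({
    'workingday': [1, 1, 1, 0, 0, 1],
    'casual':     [100, 120, 110, 400, 450, 130],
    'registered': [900, 950, 920, 500, 520, 940],
})

# Average rentals per user type, split by working vs non-working days
averages = sample.groupby('workingday')[['casual', 'registered']].mean()
print(averages)
```

Running the same groupby on the real bikes DataFrame would show whether casual rentals actually rise on non-working days while registered rentals fall.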


Customizing Multiple Lines

Make each line distinctive with colors, line styles, and line widths:

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

# Plot with custom styling
plt.plot(bikes['dteday'], bikes['casual'],
         color='orange',
         linestyle='--',
         linewidth=2,
         label='Casual Users')

plt.plot(bikes['dteday'], bikes['registered'],
         color='blue',
         linestyle='-',
         linewidth=2,
         label='Registered Users')

plt.title('Casual vs Registered Bike Users (2011-2012)')
plt.xlabel('Date')
plt.ylabel('Number of Rentals')
plt.xticks(rotation=45)
plt.legend()

plt.show()

Styling Tips for Multiple Lines

Use different colors:

  • Makes lines easy to distinguish
  • Choose contrasting colors

Use different line styles:

  • Solid vs dashed helps colorblind readers
  • Provides redundant encoding

Consistent line width:

  • Use same width for equal importance
  • Use thicker lines for emphasis

Practice Exercises

Apply time series and multiple line techniques.

Exercise 1: First 3 Months

Create a line plot showing only the first 3 months of 2011 (January - March).

Hint: Use .head(90) to take the first 90 rows. January through March 2011 spans exactly 90 days (31 + 28 + 31), and the file has one row per day.

bikes_3months = bikes.head(90)

Requirements:

  1. Plot total rentals (cnt) for the first 90 days
  2. Title: “Bike Rentals - First Quarter 2011”
  3. Proper axis labels
  4. Rotated x-axis labels

# Your code here

Solution

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

bikes_3months = bikes.head(90)

plt.plot(bikes_3months['dteday'], bikes_3months['cnt'])
plt.title('Bike Rentals - First Quarter 2011')
plt.xlabel('Date')
plt.ylabel('Number of Rentals')
plt.xticks(rotation=45)

plt.show()
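
An alternative to counting rows is to filter by date, which keeps working even if rows are missing or the data starts on a different day. The sketch below builds a made-up one-row-per-day frame for 2011 so it runs without day.csv. On the real data, the same comparison can be applied to bikes['dteday'] after the datetime conversion.

```python
import pandas as pd

# Made-up frame: one row per day for 2011, standing in for the real data
bikes = pd.DataFrame({'dteday': pd.date_range('2011-01-01', '2011-12-31', freq='D')})
bikes['cnt'] = range(len(bikes))

# Keep every row dated before 1 April 2011
first_quarter = bikes[bikes['dteday'] < '2011-04-01']

print(len(first_quarter))               # 90
print(first_quarter['dteday'].max())    # 2011-03-31 00:00:00
```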

Exercise 2: Seasonal Comparison

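The dataset has a season column. Its documentation lists the codes below. Note that in day.csv the January rows carry season 1, so check the dates in each group before relying on the names: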

  • 1 = Spring
  • 2 = Summer
  • 3 = Fall
  • 4 = Winter

Task: Plot Summer and Winter as two lines on the same graph to compare them.

Hint: Filter the data like this:

summer = bikes[bikes['season'] == 2]
winter = bikes[bikes['season'] == 4]

Requirements:

  1. Plot both seasons on the same graph with different colors
  2. Add a legend
  3. Use meaningful title and labels

# Your code here

Solution

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])

summer = bikes[bikes['season'] == 2]
winter = bikes[bikes['season'] == 4]

plt.plot(summer['dteday'], summer['cnt'],
         color='orange', label='Summer', marker='o', markersize=3)
plt.plot(winter['dteday'], winter['cnt'],
         color='blue', label='Winter', marker='s', markersize=3)

plt.title('Bike Rentals: Summer vs Winter')
plt.xlabel('Date')
plt.ylabel('Number of Rentals')
plt.xticks(rotation=45)
plt.legend()

plt.show()
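
Because each season occurs in both 2011 and 2012, a single plt.plot() call per season draws a straight segment across the months in between. One possible way to avoid that, sketched here on made-up data, is to plot each year as its own segment and label only the first one so the legend shows a single entry.

```python
import matplotlib
matplotlib.use('Agg')   # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up frame with the same two-block shape as one season across two years
dates = pd.date_range('2011-06-01', periods=5).append(
    pd.date_range('2012-06-01', periods=5))
summer = pd.DataFrame({'dteday': dates,
                       'cnt': [5, 6, 7, 6, 5, 8, 9, 10, 9, 8]})

# Plot each year as its own segment so no line is drawn across the gap
for i, (year, group) in enumerate(summer.groupby(summer['dteday'].dt.year)):
    plt.plot(group['dteday'], group['cnt'], color='orange',
             label='Summer' if i == 0 else None)   # label only once

plt.legend()
```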

Exercise 3: Growth Analysis

Compare the same month across two years to see year-over-year growth.

Task: Plot July 2011 vs July 2012 on the same graph.

Hint: Filter by month:

bikes['month'] = bikes['dteday'].dt.month
july_2011 = bikes[(bikes['month'] == 7) & (bikes['dteday'].dt.year == 2011)]
july_2012 = bikes[(bikes['month'] == 7) & (bikes['dteday'].dt.year == 2012)]

Challenge: Can you calculate the average growth percentage?

# Your code here

Solution

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
bikes['dteday'] = pd.to_datetime(bikes['dteday'])
bikes['month'] = bikes['dteday'].dt.month

july_2011 = bikes[(bikes['month'] == 7) & (bikes['dteday'].dt.year == 2011)]
july_2012 = bikes[(bikes['month'] == 7) & (bikes['dteday'].dt.year == 2012)]

plt.plot(july_2011['dteday'].dt.day, july_2011['cnt'],
         label='July 2011', marker='o')
plt.plot(july_2012['dteday'].dt.day, july_2012['cnt'],
         label='July 2012', marker='s')

plt.title('Year-over-Year Growth: July Comparison')
plt.xlabel('Day of Month')
plt.ylabel('Number of Rentals')
plt.legend()

# Calculate growth
avg_2011 = july_2011['cnt'].mean()
avg_2012 = july_2012['cnt'].mean()
growth = ((avg_2012 - avg_2011) / avg_2011) * 100
print(f"Average growth from July 2011 to July 2012: {growth:.1f}%")

plt.show()

Summary

You can now visualize and compare time series data. Let’s review the key concepts.

Key Concepts

Load CSV Data

  • Use pd.read_csv('filename.csv')
  • Returns a DataFrame
  • Check with .head() to see structure

Convert Dates

  • Use pd.to_datetime(df['date_column'])
  • Enables proper date handling
  • Automatic axis formatting in plots

Rotate Labels

  • Use plt.xticks(rotation=45) for readability
  • Essential for date labels
  • Prevents overlapping text

Multiple Lines

  • Call plt.plot() multiple times
  • Use label= parameter for each line
  • Add plt.legend() to show labels
  • Use different colors and styles

Time Series Patterns

  • Trends: Overall direction (upward/downward)
  • Seasonality: Repeating patterns (yearly, weekly, daily)
  • Noise: Daily/random fluctuations

Filter Data

  • Focus on specific time periods
  • .head(n) for first n rows
  • df[df['column'] == value] for conditions
  • Extract date parts: .dt.year, .dt.month, .dt.day

Syntax Reference

# Load and prepare data
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'])

# Plot multiple lines
plt.plot(df['date'], df['series1'], label='Series 1')
plt.plot(df['date'], df['series2'], label='Series 2')

# Formatting
plt.title('Title Here')
plt.xlabel('Date')
plt.ylabel('Value')
plt.xticks(rotation=45)
plt.legend()

plt.show()

Common Questions

Q: My dates show up in a format like “2011-01-01 00:00:00”. How do I fix this?
A: This is normal. Datetime values always carry a time component, which defaults to midnight when the source has only a date. Matplotlib usually formats them sensibly on the axis.

Q: What if my CSV has dates in a different format like “01/15/2023”?
A: pd.to_datetime() recognizes many common formats automatically, but a string such as “01/02/2023” is ambiguous (2 January or 1 February). To be explicit, pass the format: pd.to_datetime(df['date'], format='%m/%d/%Y')

Q: Can I plot more than 2 lines?
A: Yes. Plot as many as you need, but more than five or six lines become hard to read. Consider using multiple subplots instead (coming later).
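
The date-format question above is easiest to see with a string that can be read two ways. This short sketch parses the same text with a month-first and a day-first format string. The sample dates are invented for illustration.

```python
import pandas as pd

raw = pd.Series(['01/02/2023', '01/15/2023'])

# Month first: 01/02 is read as 2 January
us_style = pd.to_datetime(raw, format='%m/%d/%Y')
print(us_style.iloc[0])    # 2023-01-02 00:00:00

# Day first: the same text 01/02 is read as 1 February
day_first = pd.to_datetime(pd.Series(['01/02/2023']), format='%d/%m/%Y')
print(day_first.iloc[0])   # 2023-02-01 00:00:00
```

Passing format explicitly removes the guesswork and makes a mismatched file fail loudly instead of producing wrong dates.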


Next Steps

You have completed Module 1: Line Plots and Time Series. You now know how to create, customize, and compare line plots with real-world data.

In Module 2: Scatter Plots and Correlation, you will learn to explore relationships between variables, identify correlations, and understand causation.


Master Time Series Visualization

You can now load, process, and visualize time series data from real-world sources. These skills apply to sales data, website traffic, stock prices, and any metric that changes over time.

Use these techniques to reveal trends and patterns in your data!