Lesson 16 - Final Project - Traffic Analysis

Final Project Overview

Congratulations! You have now learned the core data visualization techniques. In this final project you will apply them together in a comprehensive traffic analysis.

Project Goal: Analyze I-94 highway traffic patterns to understand:

  • How traffic volume changes over time
  • Impact of weather conditions on traffic
  • Rush hour patterns and temporal trends
  • Correlation between weather variables and traffic volume

Dataset: i94_traffic.csv contains 48,205 hourly records from I-94 westbound (Minneapolis-St Paul) with 9 columns.

Skills You Will Use:

  • Data loading and datetime conversion
  • Line plots for time series
  • Scatter plots for correlations
  • Bar charts for categorical comparisons
  • Histograms for distributions
  • Subplots for multi-panel dashboards
  • Statistical analysis with correlation coefficients

This project simulates a real-world analysis you might present to transportation planners.


Understanding the Dataset

Data Dictionary

The i94_traffic.csv dataset contains these columns:

Column               Description                                Type         Example
holiday              US National holidays + regional holidays   Categorical  “None”, “New Years Day”
temp                 Average temperature (Kelvin)               Numeric      288.28
rain_1h              Rain amount in last hour (mm)              Numeric      0.0, 0.51
snow_1h              Snow amount in last hour (mm)              Numeric      0.0
clouds_all           Cloud cover percentage                     Numeric      40, 90
weather_main         Weather condition category                 Categorical  “Clear”, “Clouds”, “Rain”
weather_description  Detailed weather description               Categorical  “light rain”, “few clouds”
date_time            Date and time of observation               DateTime     “2012-10-02 09:00:00”
traffic_volume       Hourly I-94 westbound traffic              Numeric      0 to 7280

Time Period: October 2012 to September 2018 (6 years)

Location: I-94 westbound between Minneapolis and St Paul, Minnesota
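
Before plotting, it can help to confirm that a loaded file actually matches the data dictionary above. The sketch below is one possible check, not part of the lesson's required code: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here, and the two sample rows are illustrative stand-ins for the real CSV so the snippet runs on its own.

```python
import io
import pandas as pd

# Column names taken from the data dictionary above.
EXPECTED_COLUMNS = [
    'holiday', 'temp', 'rain_1h', 'snow_1h', 'clouds_all',
    'weather_main', 'weather_description', 'date_time', 'traffic_volume',
]

def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]

# Tiny stand-in for i94_traffic.csv (illustrative rows, not the full dataset).
sample_csv = io.StringIO(
    "holiday,temp,rain_1h,snow_1h,clouds_all,weather_main,"
    "weather_description,date_time,traffic_volume\n"
    "None,288.28,0.0,0.0,40,Rain,light rain,2012-10-02 09:00:00,5545\n"
    "None,289.36,0.0,0.0,75,Rain,light rain,2012-10-02 10:00:00,4516\n"
)
sample = pd.read_csv(sample_csv)

print("Missing columns:", missing_columns(sample))
print("Missing after dropping 'temp':",
      missing_columns(sample.drop(columns='temp')))
```

With the real file, the same helper could be called on `traffic` right after `pd.read_csv('i94_traffic.csv')`.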


Part 1: Data Loading and Initial Exploration

Load and Inspect Data

import pandas as pd
import matplotlib.pyplot as plt

# Load traffic data
traffic = pd.read_csv('i94_traffic.csv')

# Display basic information
print("Dataset Shape:", traffic.shape)
print("\nFirst 5 Rows:")
print(traffic.head())

print("\nColumn Data Types:")
print(traffic.dtypes)

print("\nBasic Statistics:")
print(traffic.describe())

Expected Output:

Dataset Shape: (48205, 9)

First 5 Rows:
           holiday    temp  rain_1h  ...        weather_description              date_time  traffic_volume
0             None  288.28      0.0  ...                  light rain  2012-10-02 09:00:00            5545
1             None  289.36      0.0  ...                  light rain  2012-10-02 10:00:00            4516
2             None  289.58      0.0  ...                  light rain  2012-10-02 11:00:00            4767
...

Basic Statistics:
              temp      rain_1h  ...  traffic_volume
count  48205.000000  48205.000000  ...   48205.000000
mean     281.201955      0.319519  ...    3259.818355
std        11.297139      1.703283  ...    1986.860670
min       243.500000      0.000000  ...       0.000000
25%       272.380000      0.000000  ...    1193.000000
50%       281.310000      0.000000  ...    3380.000000
75%       289.530000      0.000000  ...    4933.000000
max       310.070000     55.630000  ...    7280.000000

Convert DateTime

import pandas as pd

traffic = pd.read_csv('i94_traffic.csv')

# Convert to datetime
traffic['date_time'] = pd.to_datetime(traffic['date_time'])

# Extract time components
traffic['year'] = traffic['date_time'].dt.year
traffic['month'] = traffic['date_time'].dt.month
traffic['day_of_week'] = traffic['date_time'].dt.dayofweek  # 0=Monday, 6=Sunday
traffic['hour'] = traffic['date_time'].dt.hour

print("\nNew columns added:")
print(traffic[['date_time', 'year', 'month', 'day_of_week', 'hour']].head())
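
Once `date_time` is a real datetime column, it becomes easy to check whether the hourly series is complete. The following is a minimal sketch on made-up timestamps. Whether i94_traffic.csv contains duplicated or missing hours is an assumption to verify, not something stated in this lesson.

```python
import pandas as pd

# Illustrative timestamps: one duplicated hour (10:00) and one missing hour (12:00).
sample = pd.DataFrame({
    'date_time': pd.to_datetime([
        '2012-10-02 09:00:00', '2012-10-02 10:00:00', '2012-10-02 10:00:00',
        '2012-10-02 11:00:00', '2012-10-02 13:00:00',
    ]),
    'traffic_volume': [5545, 4516, 4516, 4767, 5026],
})

# Rows whose timestamp already appeared earlier in the data.
n_duplicates = int(sample['date_time'].duplicated().sum())

# Every hour that should exist between the first and last timestamp.
full_range = pd.date_range(sample['date_time'].min(),
                           sample['date_time'].max(),
                           freq=pd.Timedelta(hours=1))

# Hours in the full range that never occur in the data.
missing_hours = full_range[~full_range.isin(sample['date_time'])]

print("Duplicate timestamps:", n_duplicates)
print("Missing hours:", list(missing_hours))
```

If duplicates show up in the real data, sums per day will overcount those hours, while averages are affected much less.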

Part 2: Traffic Volume Over Time

Daily Traffic Pattern

import pandas as pd
import matplotlib.pyplot as plt

traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])

# Calculate daily total traffic
# (summing assumes one row per hour; duplicated or missing hours would distort a day's total)
daily_traffic = traffic.groupby(traffic['date_time'].dt.date)['traffic_volume'].sum()

# Plot
plt.figure(figsize=(14, 6))
plt.plot(daily_traffic.index, daily_traffic.values, linewidth=1, color='steelblue', alpha=0.7)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Daily Total Traffic Volume', fontsize=12)
plt.title('I-94 Westbound Daily Traffic Volume (2012-2018)', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Analysis Question: Do you see any overall trends? Seasonal patterns?
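
Daily totals swing strongly between weekdays and weekends, which can hide slower trends. One common way to answer the question above is to smooth the series with a 7-day rolling mean. The sketch below uses synthetic numbers (a repeating weekly cycle plus a slow upward trend), not the real traffic data.

```python
import numpy as np
import pandas as pd

# Synthetic daily totals: a weekly cycle on top of a slow upward trend.
dates = pd.date_range('2013-01-01', periods=28, freq='D')
weekly_cycle = np.tile([10, 12, 12, 12, 11, 6, 5], 4)
trend = np.arange(28) * 0.5
daily_total = pd.Series(weekly_cycle + trend, index=dates)

# A centred 7-day window always covers exactly one full weekly cycle,
# so the weekday/weekend swing averages out and only the trend remains.
smoothed = daily_total.rolling(window=7, center=True).mean()

print(smoothed.dropna().head())
```

Applied to the lesson's `daily_traffic` series, the smoothed line could be drawn on top of the raw one with a second `plt.plot` call.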

Monthly Average Traffic

import pandas as pd
import matplotlib.pyplot as plt

traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['month'] = traffic['date_time'].dt.month

# Calculate monthly average
monthly_avg = traffic.groupby('month')['traffic_volume'].mean()
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Plot
plt.figure(figsize=(12, 6))
bars = plt.bar(month_names, monthly_avg.values, color='coral', edgecolor='black')
plt.xlabel('Month', fontsize=12)
plt.ylabel('Average Hourly Traffic Volume', fontsize=12)
plt.title('Average Traffic Volume by Month', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='y')

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=10)

plt.tight_layout()
plt.show()

Insight: Traffic is typically lower in the winter months (Jan-Feb), most likely because of winter weather.
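
The bar chart pairs a fixed list of 12 month names with whatever months appear in the grouped result. If you ever plot a filtered subset that lacks a month, the labels and values no longer line up. A small sketch of one way to guard against that with `reindex`, using made-up numbers:

```python
import pandas as pd

# Illustrative subset with observations only in January and March.
sample = pd.DataFrame({
    'month': [1, 1, 3, 3],
    'traffic_volume': [3000, 3200, 3400, 3600],
})

monthly_avg = sample.groupby('month')['traffic_volume'].mean()

# Reindex to all 12 months so positions match the month labels;
# months without data become NaN instead of silently shifting the bars.
monthly_avg = monthly_avg.reindex(range(1, 13))

month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
labelled = pd.Series(monthly_avg.values, index=month_names)

print(labelled.head(4))
```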


Part 3: Hourly Patterns

Average Hourly Traffic

import pandas as pd
import matplotlib.pyplot as plt

traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['hour'] = traffic['date_time'].dt.hour

# Calculate hourly average
hourly_avg = traffic.groupby('hour')['traffic_volume'].mean()

# Plot
plt.figure(figsize=(14, 6))
plt.plot(hourly_avg.index, hourly_avg.values, linewidth=3,
         color='darkblue', marker='o', markersize=6)
plt.xlabel('Hour of Day', fontsize=12)
plt.ylabel('Average Traffic Volume', fontsize=12)
plt.title('Average Hourly Traffic Pattern on I-94', fontsize=14, fontweight='bold')
plt.xticks(range(0, 24))
plt.grid(True, alpha=0.3)

# Highlight rush hours
plt.axvspan(7, 9, alpha=0.2, color='red', label='Morning Rush')
plt.axvspan(16, 18, alpha=0.2, color='orange', label='Evening Rush')
plt.legend(fontsize=11)

plt.tight_layout()
plt.show()

Expected Pattern: Two peaks, a morning rush (7-9 AM) and an evening rush (4-6 PM).
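
Rather than reading the peaks off the chart, you can locate them in the grouped series. The sketch below uses hypothetical hourly averages shaped like a commute pattern; the values are invented for illustration and are not results from the dataset.

```python
import pandas as pd

# Hypothetical hourly averages for hours 0-23 (illustrative values only).
hourly_avg = pd.Series(
    [800, 500, 400, 400, 700, 2000, 4000, 4700, 4600, 4400, 4200, 4300,
     4500, 4600, 4800, 5100, 5600, 5300, 4300, 3300, 2800, 2300, 1700, 1200],
    index=range(24), name='traffic_volume',
)

# .loc slices by label and includes both endpoints.
morning_peak = hourly_avg.loc[0:11].idxmax()    # busiest hour before noon
evening_peak = hourly_avg.loc[12:23].idxmax()   # busiest hour from noon onward

print(f"Morning peak: {morning_peak}:00, evening peak: {evening_peak}:00")
```

Splitting the day at noon is a simplifying assumption; it works when there is one clear peak in each half of the day.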

Weekday vs Weekend

import pandas as pd
import matplotlib.pyplot as plt

traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['hour'] = traffic['date_time'].dt.hour
traffic['day_of_week'] = traffic['date_time'].dt.dayofweek

# Separate weekday and weekend
weekday = traffic[traffic['day_of_week'] < 5]  # Mon-Fri
weekend = traffic[traffic['day_of_week'] >= 5]  # Sat-Sun

weekday_hourly = weekday.groupby('hour')['traffic_volume'].mean()
weekend_hourly = weekend.groupby('hour')['traffic_volume'].mean()

# Plot comparison
plt.figure(figsize=(14, 6))
plt.plot(weekday_hourly.index, weekday_hourly.values,
         linewidth=3, marker='o', label='Weekday', color='steelblue')
plt.plot(weekend_hourly.index, weekend_hourly.values,
         linewidth=3, marker='s', label='Weekend', color='coral')
plt.xlabel('Hour of Day', fontsize=12)
plt.ylabel('Average Traffic Volume', fontsize=12)
plt.title('Hourly Traffic Pattern: Weekday vs Weekend', fontsize=14, fontweight='bold')
plt.xticks(range(0, 24))
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Key Difference: Weekdays have sharp commute peaks; weekends have a broader midday pattern.
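
As an alternative to filtering twice, the weekday/weekend split can be done in one grouped operation. This is a sketch on four invented rows; the `day_type` column name is an assumption introduced here.

```python
import numpy as np
import pandas as pd

# Illustrative 8 AM observations on four consecutive days.
sample = pd.DataFrame({
    'date_time': pd.to_datetime([
        '2012-10-05 08:00:00',  # Friday
        '2012-10-06 08:00:00',  # Saturday
        '2012-10-07 08:00:00',  # Sunday
        '2012-10-08 08:00:00',  # Monday
    ]),
    'traffic_volume': [5000, 2000, 1600, 5400],
})

sample['hour'] = sample['date_time'].dt.hour
# dayofweek: 0=Monday ... 6=Sunday, so values below 5 are weekdays.
sample['day_type'] = np.where(sample['date_time'].dt.dayofweek < 5,
                              'Weekday', 'Weekend')

# One row per hour, one column per day type.
by_type = (sample.groupby(['hour', 'day_type'])['traffic_volume']
                 .mean()
                 .unstack('day_type'))

print(by_type)
```

Calling `by_type.plot()` on a table like this draws one line per column, which gives the same comparison as the two `plt.plot` calls above.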


Part 4: Weather Impact Analysis

Temperature vs Traffic

import pandas as pd
import matplotlib.pyplot as plt

traffic = pd.read_csv('i94_traffic.csv')

# Convert Kelvin to Celsius
traffic['temp_celsius'] = traffic['temp'] - 273.15

# Scatter plot
plt.figure(figsize=(12, 6))
plt.scatter(traffic['temp_celsius'], traffic['traffic_volume'],
            alpha=0.3, s=10, color='red')
plt.xlabel('Temperature (°C)', fontsize=12)
plt.ylabel('Hourly Traffic Volume', fontsize=12)
plt.title('Temperature vs Traffic Volume', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)

# Add correlation
corr = traffic['temp_celsius'].corr(traffic['traffic_volume'])
plt.text(0.05, 0.95, f'Correlation: r = {corr:.3f}',
         transform=plt.gca().transAxes, fontsize=12,
         bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

plt.tight_layout()
plt.show()

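Expected Correlation: Weak positive (warmer weather tends to coincide with slightly more traffic). Temperature alone explains little of the hour-to-hour variation, so read r alongside the scatter plot rather than on its own.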
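
To see what `Series.corr` reports, it helps to compute the Pearson coefficient by hand: the sum of the products of the deviations from each mean, divided by the square root of the product of the summed squared deviations. The sketch below uses five invented temperature/volume pairs and also shows that shifting Kelvin to Celsius does not change r.

```python
import numpy as np
import pandas as pd

# Illustrative values only (not taken from the dataset).
temp_c = pd.Series([-10.0, 0.0, 5.0, 15.0, 25.0])
volume = pd.Series([3000, 3100, 3500, 3300, 3600])

# Deviations from the mean of each variable.
dx = temp_c - temp_c.mean()
dy = volume - volume.mean()

# Pearson r computed from its definition.
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same quantity from pandas (Pearson is the default method).
r_pandas = temp_c.corr(volume)

# Adding a constant (Celsius -> Kelvin) leaves the correlation unchanged.
r_kelvin = (temp_c + 273.15).corr(volume)

print(f"manual: {r_manual:.4f}, pandas: {r_pandas:.4f}, kelvin: {r_kelvin:.4f}")
```

The Celsius conversion in the lesson therefore only affects the axis labels, not the reported correlation.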

Weather Condition Impact

import pandas as pd
import matplotlib.pyplot as plt

traffic = pd.read_csv('i94_traffic.csv')

# Calculate average traffic by weather
weather_avg = traffic.groupby('weather_main')['traffic_volume'].mean().sort_values()

# Plot
plt.figure(figsize=(10, 6))
plt.barh(weather_avg.index, weather_avg.values, color='skyblue', edgecolor='black')
plt.xlabel('Average Hourly Traffic Volume', fontsize=12)
plt.ylabel('Weather Condition', fontsize=12)
plt.title('Traffic Volume by Weather Condition', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='x')

# Add value labels
for i, (condition, value) in enumerate(weather_avg.items()):
    plt.text(value + 50, i, f'{int(value)}', va='center', fontsize=10)

plt.tight_layout()
plt.show()

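Insight: Traffic is generally lower during rain, snow, or fog, though part of the gap may reflect when those conditions occur (fog, for example, is most common overnight and in the early morning, when traffic is light anyway).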

Rain Impact

import pandas as pd
import matplotlib.pyplot as plt

traffic = pd.read_csv('i94_traffic.csv')

# Categorize rain levels
traffic['rain_category'] = pd.cut(traffic['rain_1h'],
                                   bins=[-0.1, 0, 1, 5, 100],
                                   labels=['No Rain', 'Light', 'Moderate', 'Heavy'])

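# Calculate average traffic by rain category
# (observed=False keeps every category and avoids a pandas FutureWarning for categorical keys)
rain_avg = traffic.groupby('rain_category', observed=False)['traffic_volume'].mean()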

# Plot
plt.figure(figsize=(10, 6))
colors = ['lightgreen', 'yellow', 'orange', 'red']
bars = plt.bar(rain_avg.index, rain_avg.values, color=colors, edgecolor='black')
plt.ylabel('Average Hourly Traffic Volume', fontsize=12)
plt.xlabel('Rain Intensity', fontsize=12)
plt.title('Traffic Volume by Rain Intensity', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='y')

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=11)

plt.tight_layout()
plt.show()

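Pattern to look for: Traffic decreasing as rain intensity increases. The Moderate and Heavy categories usually contain far fewer hours than No Rain, so treat their averages with more caution.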
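
The bin edges passed to `pd.cut` decide which category each reading lands in. By default each bin includes its right edge and excludes its left edge, and values outside all bins become NaN. A short sketch with hand-picked values at the boundaries:

```python
import pandas as pd

# Hand-picked rainfall values (mm) that sit on or near the bin edges.
rain_mm = pd.Series([0.0, 0.51, 1.0, 5.0, 5.01, 150.0])

# Same bins and labels as in the lesson: (-0.1, 0], (0, 1], (1, 5], (5, 100].
rain_category = pd.cut(rain_mm,
                       bins=[-0.1, 0, 1, 5, 100],
                       labels=['No Rain', 'Light', 'Moderate', 'Heavy'])

print(pd.DataFrame({'rain_1h': rain_mm, 'rain_category': rain_category}))

# How many readings fall in each category (NaN rows are not counted).
print(rain_category.value_counts(sort=False))
```

Here 1.0 mm counts as Light and 5.0 mm as Moderate, while 150.0 mm falls outside the last bin and is left uncategorized. Checking `value_counts` on the real data shows how many hours support each bar.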


Part 5: Traffic Distribution Analysis

Overall Distribution

import pandas as pd
import matplotlib.pyplot as plt

traffic = pd.read_csv('i94_traffic.csv')

# Plot histogram
plt.figure(figsize=(12, 6))
plt.hist(traffic['traffic_volume'], bins=50, color='steelblue',
         edgecolor='black', alpha=0.7)
plt.xlabel('Hourly Traffic Volume', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.title('Distribution of Hourly Traffic Volume', fontsize=14, fontweight='bold')

# Add statistical lines
mean_val = traffic['traffic_volume'].mean()
median_val = traffic['traffic_volume'].median()
plt.axvline(mean_val, color='red', linestyle='--', linewidth=2, label=f'Mean = {int(mean_val)}')
plt.axvline(median_val, color='orange', linestyle='--', linewidth=2, label=f'Median = {int(median_val)}')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

Distribution Shape: Bimodal (two peaks), reflecting quiet overnight hours versus busy daytime and rush-hour traffic.

Weekday vs Weekend Distribution

import pandas as pd
import matplotlib.pyplot as plt

traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['day_of_week'] = traffic['date_time'].dt.dayofweek

weekday = traffic[traffic['day_of_week'] < 5]['traffic_volume']
weekend = traffic[traffic['day_of_week'] >= 5]['traffic_volume']

# Plot overlapping histograms
plt.figure(figsize=(12, 6))
plt.hist(weekday, bins=50, alpha=0.6, color='steelblue',
         edgecolor='black', label='Weekday', density=True)
plt.hist(weekend, bins=50, alpha=0.6, color='coral',
         edgecolor='black', label='Weekend', density=True)
plt.xlabel('Hourly Traffic Volume', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.title('Traffic Distribution: Weekday vs Weekend', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

Observation: The weekend distribution is more right-skewed, with more low-volume hours and lower traffic overall.
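
Skewness can also be checked numerically instead of judged by eye. The sketch below calls `Series.skew()` on two small invented samples; positive values indicate a longer right tail, negative values a longer left tail.

```python
import pandas as pd

# Illustrative hourly volumes (invented, not taken from the dataset).
weekday = pd.Series([1000, 3000, 4500, 5000, 5200, 5500, 6000])
weekend = pd.Series([500, 700, 900, 1200, 1500, 3000, 4500])

# Sample skewness: > 0 means right-skewed, < 0 means left-skewed.
weekday_skew = weekday.skew()
weekend_skew = weekend.skew()

print(f"Weekday skew: {weekday_skew:.2f}")
print(f"Weekend skew: {weekend_skew:.2f}")
```

On the real data, the same two calls on the `weekday` and `weekend` series from the code above would show whether the observation holds.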


Part 6: Comprehensive Dashboard

Multi-Panel Analysis

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['hour'] = traffic['date_time'].dt.hour
traffic['day_of_week'] = traffic['date_time'].dt.dayofweek
traffic['month'] = traffic['date_time'].dt.month
traffic['temp_celsius'] = traffic['temp'] - 273.15

# Create dashboard
fig = plt.figure(figsize=(16, 12))
gs = GridSpec(3, 2, figure=fig, hspace=0.35, wspace=0.3)

# ===== Plot 1: Hourly Pattern (top-left) =====
ax1 = fig.add_subplot(gs[0, 0])
hourly_avg = traffic.groupby('hour')['traffic_volume'].mean()
ax1.plot(hourly_avg.index, hourly_avg.values, linewidth=3,
         color='darkblue', marker='o', markersize=5)
ax1.set_xlabel('Hour of Day', fontsize=11)
ax1.set_ylabel('Avg Traffic Volume', fontsize=11)
ax1.set_title('Average Hourly Traffic Pattern', fontsize=12, fontweight='bold')
ax1.set_xticks(range(0, 24, 3))
ax1.grid(True, alpha=0.3)

# ===== Plot 2: Monthly Pattern (top-right) =====
ax2 = fig.add_subplot(gs[0, 1])
monthly_avg = traffic.groupby('month')['traffic_volume'].mean()
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax2.bar(month_names, monthly_avg.values, color='coral', edgecolor='black')
ax2.set_ylabel('Avg Traffic Volume', fontsize=11)
ax2.set_title('Average Traffic by Month', fontsize=12, fontweight='bold')
ax2.tick_params(axis='x', rotation=45)
ax2.grid(True, alpha=0.3, axis='y')

# ===== Plot 3: Temperature Correlation (middle-left) =====
ax3 = fig.add_subplot(gs[1, 0])
ax3.scatter(traffic['temp_celsius'], traffic['traffic_volume'],
            alpha=0.2, s=10, color='red')
corr_temp = traffic['temp_celsius'].corr(traffic['traffic_volume'])
ax3.set_xlabel('Temperature (°C)', fontsize=11)
ax3.set_ylabel('Traffic Volume', fontsize=11)
ax3.set_title(f'Temperature vs Traffic (r = {corr_temp:.3f})',
              fontsize=12, fontweight='bold')
ax3.grid(True, alpha=0.3)

# ===== Plot 4: Weather Impact (middle-right) =====
ax4 = fig.add_subplot(gs[1, 1])
weather_avg = traffic.groupby('weather_main')['traffic_volume'].mean().sort_values()
ax4.barh(weather_avg.index, weather_avg.values, color='skyblue', edgecolor='black')
ax4.set_xlabel('Avg Traffic Volume', fontsize=11)
ax4.set_title('Traffic by Weather Condition', fontsize=12, fontweight='bold')
ax4.grid(True, alpha=0.3, axis='x')

# ===== Plot 5: Traffic Distribution (bottom-left) =====
ax5 = fig.add_subplot(gs[2, 0])
ax5.hist(traffic['traffic_volume'], bins=50, color='steelblue',
         edgecolor='black', alpha=0.7)
mean_val = traffic['traffic_volume'].mean()
ax5.axvline(mean_val, color='red', linestyle='--', linewidth=2,
            label=f'Mean = {int(mean_val)}')
ax5.set_xlabel('Hourly Traffic Volume', fontsize=11)
ax5.set_ylabel('Frequency', fontsize=11)
ax5.set_title('Traffic Volume Distribution', fontsize=12, fontweight='bold')
ax5.legend(fontsize=10)
ax5.grid(True, alpha=0.3, axis='y')

# ===== Plot 6: Weekday vs Weekend (bottom-right) =====
ax6 = fig.add_subplot(gs[2, 1])
weekday = traffic[traffic['day_of_week'] < 5]['traffic_volume']
weekend = traffic[traffic['day_of_week'] >= 5]['traffic_volume']
ax6.hist(weekday, bins=40, alpha=0.6, color='steelblue',
         edgecolor='black', label='Weekday', density=True)
ax6.hist(weekend, bins=40, alpha=0.6, color='coral',
         edgecolor='black', label='Weekend', density=True)
ax6.set_xlabel('Hourly Traffic Volume', fontsize=11)
ax6.set_ylabel('Density', fontsize=11)
ax6.set_title('Weekday vs Weekend Distribution', fontsize=12, fontweight='bold')
ax6.legend(fontsize=10)
ax6.grid(True, alpha=0.3, axis='y')

# Overall title
fig.suptitle('I-94 Westbound Traffic Analysis Dashboard (2012-2018)',
             fontsize=16, fontweight='bold')

plt.show()
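
A dashboard intended for transportation planners usually needs to be exported, not just shown on screen. The sketch below builds an empty 3x2 grid with `plt.subplots` (a simpler alternative to GridSpec when all panels are the same size) and saves it as PNG. It writes to an in-memory buffer and uses the non-interactive Agg backend only so the example runs anywhere. The panel titles are placeholders.

```python
import io
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Same 3x2 layout as the dashboard; constrained_layout handles spacing.
fig, axes = plt.subplots(3, 2, figsize=(16, 12), constrained_layout=True)

# Placeholder titles standing in for the six real plots.
for i, ax in enumerate(axes.flat, start=1):
    ax.set_title(f'Panel {i}')
fig.suptitle('Dashboard layout sketch')

# Save as PNG; pass a file name instead of the buffer to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=150)
plt.close(fig)

png_bytes = buffer.getvalue()
print(f"Saved {len(png_bytes):,} bytes")
```

For the GridSpec figure above, calling `fig.savefig(...)` before `plt.show()` would export the same figure that is displayed.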

Part 7: Key Findings Summary

Statistical Summary

import pandas as pd

traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['hour'] = traffic['date_time'].dt.hour
traffic['day_of_week'] = traffic['date_time'].dt.dayofweek
traffic['temp_celsius'] = traffic['temp'] - 273.15

print("=" * 60)
print("I-94 TRAFFIC ANALYSIS - KEY FINDINGS")
print("=" * 60)

# Overall statistics
print(f"\nTotal observations: {len(traffic):,}")
print(f"Time period: {traffic['date_time'].min()} to {traffic['date_time'].max()}")
print(f"\nAverage hourly traffic: {traffic['traffic_volume'].mean():.0f}")
print(f"Median hourly traffic: {traffic['traffic_volume'].median():.0f}")
print(f"Max hourly traffic: {traffic['traffic_volume'].max():.0f}")
print(f"Min hourly traffic: {traffic['traffic_volume'].min():.0f}")

# Peak hours
hourly_avg = traffic.groupby('hour')['traffic_volume'].mean()
peak_hour = hourly_avg.idxmax()
peak_volume = hourly_avg.max()
print(f"\nPeak hour: {peak_hour}:00 ({peak_volume:.0f} vehicles)")

# Weekday vs Weekend
weekday_avg = traffic[traffic['day_of_week'] < 5]['traffic_volume'].mean()
weekend_avg = traffic[traffic['day_of_week'] >= 5]['traffic_volume'].mean()
print(f"\nWeekday average: {weekday_avg:.0f}")
print(f"Weekend average: {weekend_avg:.0f}")
print(f"Difference: {weekday_avg - weekend_avg:.0f} ({(weekday_avg/weekend_avg - 1)*100:.1f}% higher)")

# Weather correlations
print("\n" + "=" * 60)
print("WEATHER CORRELATIONS")
print("=" * 60)
print(f"Temperature: r = {traffic['temp_celsius'].corr(traffic['traffic_volume']):.3f}")
print(f"Rain: r = {traffic['rain_1h'].corr(traffic['traffic_volume']):.3f}")
print(f"Cloud cover: r = {traffic['clouds_all'].corr(traffic['traffic_volume']):.3f}")

# Weather impact
print("\n" + "=" * 60)
print("AVERAGE TRAFFIC BY WEATHER")
print("=" * 60)
weather_avg = traffic.groupby('weather_main')['traffic_volume'].mean().sort_values(ascending=False)
for condition, avg in weather_avg.items():
    print(f"{condition:15s}: {avg:6.0f}")

print("\n" + "=" * 60)

Challenge Extensions

Want to go deeper? Try these extensions:

1. Holiday Impact

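# Compare traffic on holidays vs non-holidays.
# Non-holiday rows hold the string 'None' in the file; some pandas versions may
# read that string as a missing value (NaN), so treat both cases as "no holiday".
is_holiday = traffic['holiday'].notna() & (traffic['holiday'] != 'None')
holiday_traffic = traffic.loc[is_holiday, 'traffic_volume'].mean()
normal_traffic = traffic.loc[~is_holiday, 'traffic_volume'].mean()
print(f"Holiday: {holiday_traffic:.0f}, Normal: {normal_traffic:.0f}")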

2. Season Analysis

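# Define seasons and compare average traffic
def get_season(month):
    """Map a month number (1-12) to a season name."""
    if month in (12, 1, 2):
        return 'Winter'
    elif month in (3, 4, 5):
        return 'Spring'
    elif month in (6, 7, 8):
        return 'Summer'
    else:
        return 'Fall'

# Requires the 'month' column created in Part 1 (traffic['date_time'].dt.month)
traffic['season'] = traffic['month'].apply(get_season)
season_avg = traffic.groupby('season')['traffic_volume'].mean()
print(season_avg)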

3. Rush Hour Detection

# Identify rush hours programmatically
hourly_avg = traffic.groupby('hour')['traffic_volume'].mean()
threshold = hourly_avg.quantile(0.75)
rush_hours = hourly_avg[hourly_avg > threshold].index.tolist()
print(f"Rush hours: {rush_hours}")

Summary and Reflection

You completed a comprehensive traffic analysis project using:

  • Line plots: Time series trends, hourly patterns, weekday vs weekend
  • Scatter plots: Temperature vs traffic correlation
  • Bar charts: Monthly comparison, weather conditions, rain intensity
  • Histograms: Traffic distribution, density comparisons
  • Subplots: Multi-panel dashboards with GridSpec
  • Statistical analysis: Correlation coefficients, mean/median
  • Pandas operations: Grouping, filtering, datetime manipulation

Real-World Skills Applied:

  • Loading and cleaning datasets
  • Converting datetime formats
  • Exploratory data analysis
  • Creating publication-quality visualizations
  • Building comprehensive dashboards
  • Communicating insights through charts

Congratulations! You now have the data visualization skills to analyze and present data professionally. Apply these techniques to your own datasets and continue exploring the rich ecosystem of Python visualization libraries.

Next Steps:

  • Explore seaborn for statistical visualizations
  • Learn plotly for interactive charts
  • Study advanced matplotlib customization
  • Practice with diverse datasets
  • Build your portfolio with visualization projects