Lesson 16 - Final Project - Traffic Analysis
Final Project Overview
Congratulations! You have learned all core data visualization techniques. Now you will apply everything in a comprehensive traffic analysis project.
Project Goal: Analyze I-94 highway traffic patterns to understand:
- How traffic volume changes over time
- Impact of weather conditions on traffic
- Rush hour patterns and temporal trends
- Correlation between weather variables and traffic volume
Dataset: i94_traffic.csv contains 48,205 hourly records from I-94 westbound (Minneapolis-St Paul) with 9 columns.
Skills You Will Use:
- Data loading and datetime conversion
- Line plots for time series
- Scatter plots for correlations
- Bar charts for categorical comparisons
- Histograms for distributions
- Subplots for multi-panel dashboards
- Statistical analysis with correlation coefficients
This project simulates a real-world analysis you might present to transportation planners.
Understanding the Dataset
Data Dictionary
The i94_traffic.csv dataset contains these columns:
| Column | Description | Type | Example |
|---|---|---|---|
| holiday | US national holidays plus regional holidays | Categorical | “None”, “New Years Day” |
| temp | Average temperature (Kelvin) | Numeric | 288.28 |
| rain_1h | Rain amount in the last hour (mm) | Numeric | 0.0, 0.51 |
| snow_1h | Snow amount in the last hour (mm) | Numeric | 0.0 |
| clouds_all | Cloud cover (%) | Numeric | 40, 90 |
| weather_main | Weather condition category | Categorical | “Clear”, “Clouds”, “Rain” |
| weather_description | Detailed weather description | Categorical | “light rain”, “few clouds” |
| date_time | Date and time of observation | DateTime | “2012-10-02 09:00:00” |
| traffic_volume | Hourly I-94 westbound traffic | Numeric | 0 to 7280 |
Time Period: October 2012 to September 2018 (6 years)
Location: I-94 westbound between Minneapolis and St Paul, Minnesota
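One detail in the data dictionary is worth checking at load time: ordinary days carry the text “None” in the holiday column. Depending on the pandas version, read_csv may interpret that text as a missing value. The sketch below uses a made-up three-row sample (not rows from the real file) and passes keep_default_na=False so the text is kept as written. Whether i94_traffic.csv needs this on your setup is an assumption to verify.

```python
import io
import pandas as pd

# Hypothetical sample with the same 'holiday' convention as the data dictionary
SAMPLE_CSV = """holiday,date_time,traffic_volume
None,2012-10-02 09:00:00,5545
Columbus Day,2012-10-08 00:00:00,455
None,2012-10-08 01:00:00,336
"""

# keep_default_na=False stops pandas from converting marker strings to NaN,
# so "None" stays an ordinary string regardless of pandas version
raw = pd.read_csv(io.StringIO(SAMPLE_CSV), keep_default_na=False)

# A row counts as a holiday only if the column holds a real holiday name
is_holiday = raw["holiday"].ne("None") & raw["holiday"].ne("")

print(raw["holiday"].tolist())
print(raw.loc[is_holiday, "holiday"].tolist())
```

If you load the file without this option, checking traffic['holiday'].isna().sum() right after loading shows whether the marker was converted.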
Part 1: Data Loading and Initial Exploration
Load and Inspect Data
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Load traffic data
traffic = pd.read_csv('i94_traffic.csv')
# Display basic information
print("Dataset Shape:", traffic.shape)
print("\nFirst 5 Rows:")
print(traffic.head())
print("\nColumn Data Types:")
print(traffic.dtypes)
print("\nBasic Statistics:")
print(traffic.describe())
Expected Output:
Dataset Shape: (48205, 9)
First 5 Rows:
holiday temp rain_1h ... weather_description date_time traffic_volume
0 None 288.28 0.0 ... light rain 2012-10-02 09:00:00 5545
1 None 289.36 0.0 ... light rain 2012-10-02 10:00:00 4516
2 None 289.58 0.0 ... light rain 2012-10-02 11:00:00 4767
...
Basic Statistics:
temp rain_1h ... traffic_volume
count 48205.000000 48205.000000 ... 48205.000000
mean 281.201955 0.319519 ... 3259.818355
std 11.297139 1.703283 ... 1986.860670
min 243.500000 0.000000 ... 0.000000
25% 272.380000 0.000000 ... 1193.000000
50% 281.310000 0.000000 ... 3380.000000
75% 289.530000 0.000000 ... 4933.000000
max 310.070000 55.630000 ... 7280.000000
Convert DateTime
import pandas as pd
traffic = pd.read_csv('i94_traffic.csv')
# Convert to datetime
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
# Extract time components
traffic['year'] = traffic['date_time'].dt.year
traffic['month'] = traffic['date_time'].dt.month
traffic['day_of_week'] = traffic['date_time'].dt.dayofweek # 0=Monday, 6=Sunday
traffic['hour'] = traffic['date_time'].dt.hour
print("\nNew columns added:")
print(traffic[['date_time', 'year', 'month', 'day_of_week', 'hour']].head())
Part 2: Traffic Volume Over Time
Daily Traffic Pattern
import pandas as pd
import matplotlib.pyplot as plt
traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
# Calculate daily total traffic
daily_traffic = traffic.groupby(traffic['date_time'].dt.date)['traffic_volume'].sum()
# Plot
plt.figure(figsize=(14, 6))
plt.plot(daily_traffic.index, daily_traffic.values, linewidth=1, color='steelblue', alpha=0.7)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Daily Total Traffic Volume', fontsize=12)
plt.title('I-94 Westbound Daily Traffic Volume (2012-2018)', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Analysis Question: Do you see any overall trends? Seasonal patterns?
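Daily totals swing with the weekly commute cycle, which can make slower trends hard to see. One common way to look past that is a 7-day rolling mean. The sketch below uses synthetic daily totals (100,000 on weekdays and 60,000 on weekends, both invented numbers), not the real file, to show that a 7-day window averages the weekly cycle away.

```python
import numpy as np
import pandas as pd

# Synthetic daily totals: a pure weekday/weekend cycle with no trend
days = pd.date_range("2013-01-01", periods=28, freq="D")
values = np.where(days.dayofweek < 5, 100_000, 60_000)
daily = pd.Series(values, index=days, dtype="float64")

# Each window of 7 consecutive days holds 5 weekdays and 2 weekend days,
# so the rolling mean is flat once the first full window is available
weekly_mean = daily.rolling(window=7).mean()

print(daily.std())                 # spread of the raw series
print(weekly_mean.dropna().std())  # spread after smoothing (close to zero)
```

On the project data, the same .rolling(window=7).mean() call could be applied to daily_traffic and drawn on top of the raw line to separate the weekly rhythm from longer-term movement.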
Monthly Average Traffic
import pandas as pd
import matplotlib.pyplot as plt
traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['month'] = traffic['date_time'].dt.month
# Calculate monthly average
monthly_avg = traffic.groupby('month')['traffic_volume'].mean()
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# Plot
plt.figure(figsize=(12, 6))
bars = plt.bar(month_names, monthly_avg.values, color='coral', edgecolor='black')
plt.xlabel('Month', fontsize=12)
plt.ylabel('Average Hourly Traffic Volume', fontsize=12)
plt.title('Average Traffic Volume by Month', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='y')
# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=10)
plt.tight_layout()
plt.show()
Insight: Traffic is typically lower in the winter months (Jan-Feb), most likely because of weather.
Part 3: Hourly Patterns
Average Hourly Traffic
import pandas as pd
import matplotlib.pyplot as plt
traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['hour'] = traffic['date_time'].dt.hour
# Calculate hourly average
hourly_avg = traffic.groupby('hour')['traffic_volume'].mean()
# Plot
plt.figure(figsize=(14, 6))
plt.plot(hourly_avg.index, hourly_avg.values, linewidth=3,
color='darkblue', marker='o', markersize=6)
plt.xlabel('Hour of Day', fontsize=12)
plt.ylabel('Average Traffic Volume', fontsize=12)
plt.title('Average Hourly Traffic Pattern on I-94', fontsize=14, fontweight='bold')
plt.xticks(range(0, 24))
plt.grid(True, alpha=0.3)
# Highlight rush hours
plt.axvspan(7, 9, alpha=0.2, color='red', label='Morning Rush')
plt.axvspan(16, 18, alpha=0.2, color='orange', label='Evening Rush')
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()
Expected Pattern: Two peaks, a morning rush (7-9 AM) and an evening rush (4-6 PM).
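To confirm the two peaks numerically rather than by eye, one option is to look for local maxima in the hourly averages, meaning hours whose value is higher than both neighbours. The sketch below is one possible approach, run on an invented 24-value profile rather than the real hourly_avg. The helper name find_peak_hours is hypothetical.

```python
import pandas as pd

def find_peak_hours(hourly):
    """Return the hours whose value exceeds both the previous and next hour."""
    higher_than_prev = hourly > hourly.shift(1)
    higher_than_next = hourly > hourly.shift(-1)
    # Comparisons against the NaN created by shift() are False,
    # so the first and last hour are never reported as peaks
    return hourly[higher_than_prev & higher_than_next].index.tolist()

# Invented hourly averages, indexed by hour of day (0-23)
profile = pd.Series(
    [800, 500, 400, 350, 700, 2000, 4000, 4800, 4600, 4400, 4200, 4500,
     4600, 4700, 5000, 5300, 5600, 5200, 4300, 3300, 2800, 2200, 1500, 1000],
    index=range(24),
)

peak_hours = find_peak_hours(profile)
print(peak_hours)
```

Real averages can contain small bumps that this simple rule would also flag, so treat its output as a starting point for inspection.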
Weekday vs Weekend
import pandas as pd
import matplotlib.pyplot as plt
traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['hour'] = traffic['date_time'].dt.hour
traffic['day_of_week'] = traffic['date_time'].dt.dayofweek
# Separate weekday and weekend
weekday = traffic[traffic['day_of_week'] < 5] # Mon-Fri
weekend = traffic[traffic['day_of_week'] >= 5] # Sat-Sun
weekday_hourly = weekday.groupby('hour')['traffic_volume'].mean()
weekend_hourly = weekend.groupby('hour')['traffic_volume'].mean()
# Plot comparison
plt.figure(figsize=(14, 6))
plt.plot(weekday_hourly.index, weekday_hourly.values,
linewidth=3, marker='o', label='Weekday', color='steelblue')
plt.plot(weekend_hourly.index, weekend_hourly.values,
linewidth=3, marker='s', label='Weekend', color='coral')
plt.xlabel('Hour of Day', fontsize=12)
plt.ylabel('Average Traffic Volume', fontsize=12)
plt.title('Hourly Traffic Pattern: Weekday vs Weekend', fontsize=14, fontweight='bold')
plt.xticks(range(0, 24))
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Key Difference: Weekdays have sharp commute peaks; weekends have a broader midday pattern.
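The weekday/weekend split depends on the dayofweek convention, where Monday is 0 and Sunday is 6. A quick way to convince yourself that the `< 5` and `>= 5` filters select the intended days is to print the day names next to the numbers. The sketch below does this for three hand-picked timestamps.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime([
    "2012-10-02 09:00:00",  # a Tuesday
    "2012-10-06 09:00:00",  # a Saturday
    "2012-10-07 23:00:00",  # a Sunday
]))

parts = pd.DataFrame({
    "day_of_week": stamps.dt.dayofweek,       # 0=Monday ... 6=Sunday
    "day_name": stamps.dt.day_name(),         # human-readable check
    "is_weekend": stamps.dt.dayofweek >= 5,   # same rule as the filter above
})
print(parts)
```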
Part 4: Weather Impact Analysis
Temperature vs Traffic
import pandas as pd
import matplotlib.pyplot as plt
traffic = pd.read_csv('i94_traffic.csv')
# Convert Kelvin to Celsius
traffic['temp_celsius'] = traffic['temp'] - 273.15
# Scatter plot
plt.figure(figsize=(12, 6))
plt.scatter(traffic['temp_celsius'], traffic['traffic_volume'],
alpha=0.3, s=10, color='red')
plt.xlabel('Temperature (°C)', fontsize=12)
plt.ylabel('Hourly Traffic Volume', fontsize=12)
plt.title('Temperature vs Traffic Volume', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
# Add correlation
corr = traffic['temp_celsius'].corr(traffic['traffic_volume'])
plt.text(0.05, 0.95, f'Correlation: r = {corr:.3f}',
transform=plt.gca().transAxes, fontsize=12,
bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
plt.tight_layout()
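plt.show()
Expected Correlation: Positive (warmer weather → more traffic). Read the r value on the plot to judge its strength; values near 0 indicate only a weak linear relationship.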
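It can help to see what the r value actually measures. The sketch below computes the Pearson correlation coefficient by hand on five invented (temperature, traffic) pairs and compares it with the result of pandas' .corr(), which uses Pearson by default. The numbers are illustrative only and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Invented values for illustration
df = pd.DataFrame({
    "temp_celsius": [-10.0, 0.0, 10.0, 20.0, 30.0],
    "traffic_volume": [3000, 3400, 3100, 3600, 3500],
})

# Deviations of each variable from its own mean
dx = df["temp_celsius"] - df["temp_celsius"].mean()
dy = df["traffic_volume"] - df["traffic_volume"].mean()

# Pearson r: sum of deviation products, scaled so the result lies in [-1, 1]
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
r_pandas = df["temp_celsius"].corr(df["traffic_volume"])

print(f"manual: {r_manual:.3f}, pandas: {r_pandas:.3f}")
```

Because r only captures linear association, a low value does not rule out other kinds of relationship, which is one reason to keep the scatter plot alongside the number.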
Weather Condition Impact
import pandas as pd
import matplotlib.pyplot as plt
traffic = pd.read_csv('i94_traffic.csv')
# Calculate average traffic by weather
weather_avg = traffic.groupby('weather_main')['traffic_volume'].mean().sort_values()
# Plot
plt.figure(figsize=(10, 6))
plt.barh(weather_avg.index, weather_avg.values, color='skyblue', edgecolor='black')
plt.xlabel('Average Hourly Traffic Volume', fontsize=12)
plt.ylabel('Weather Condition', fontsize=12)
plt.title('Traffic Volume by Weather Condition', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='x')
# Add value labels
for i, (condition, value) in enumerate(weather_avg.items()):
    plt.text(value + 50, i, f'{int(value)}', va='center', fontsize=10)
plt.tight_layout()
plt.show()
Insight: Traffic is generally lower during rain, snow, or fog.
Rain Impact
import pandas as pd
import matplotlib.pyplot as plt
traffic = pd.read_csv('i94_traffic.csv')
# Categorize rain levels
traffic['rain_category'] = pd.cut(traffic['rain_1h'],
bins=[-0.1, 0, 1, 5, 100],
labels=['No Rain', 'Light', 'Moderate', 'Heavy'])
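# Calculate average traffic by rain category
# (observed=False is stated explicitly; newer pandas versions warn if it is omitted for categorical keys)
rain_avg = traffic.groupby('rain_category', observed=False)['traffic_volume'].mean()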
# Plot
plt.figure(figsize=(10, 6))
colors = ['lightgreen', 'yellow', 'orange', 'red']
bars = plt.bar(rain_avg.index, rain_avg.values, color=colors, edgecolor='black')
plt.ylabel('Average Hourly Traffic Volume', fontsize=12)
plt.xlabel('Rain Intensity', fontsize=12)
plt.title('Traffic Volume by Rain Intensity', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='y')
# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{int(height)}', ha='center', va='bottom', fontsize=11)
plt.tight_layout()
plt.show()
Pattern: Traffic decreases with increasing rain intensity.
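The bin edges passed to pd.cut decide which category each reading falls into, so it is worth checking a few boundary values. By default the intervals are closed on the right, which means exactly 1.0 mm counts as Light and exactly 5.0 mm as Moderate. The sketch below applies the same bins and labels to a handful of invented readings.

```python
import pandas as pd

# Invented rain readings chosen to sit on or near the bin edges
rain = pd.Series([0.0, 0.25, 1.0, 1.01, 5.0, 55.63, 150.0])

categories = pd.cut(
    rain,
    bins=[-0.1, 0, 1, 5, 100],   # intervals: (-0.1, 0], (0, 1], (1, 5], (5, 100]
    labels=['No Rain', 'Light', 'Moderate', 'Heavy'],
)

# Values outside every interval (here 150.0) become NaN
print(pd.DataFrame({'rain_1h': rain, 'rain_category': categories}))
```

The first edge is -0.1 rather than 0 so that readings of exactly 0.0 land in the No Rain bin instead of being left out.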
Part 5: Traffic Distribution Analysis
Overall Distribution
import pandas as pd
import matplotlib.pyplot as plt
traffic = pd.read_csv('i94_traffic.csv')
# Plot histogram
plt.figure(figsize=(12, 6))
plt.hist(traffic['traffic_volume'], bins=50, color='steelblue',
edgecolor='black', alpha=0.7)
plt.xlabel('Hourly Traffic Volume', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.title('Distribution of Hourly Traffic Volume', fontsize=14, fontweight='bold')
# Add statistical lines
mean_val = traffic['traffic_volume'].mean()
median_val = traffic['traffic_volume'].median()
plt.axvline(mean_val, color='red', linestyle='--', linewidth=2, label=f'Mean = {int(mean_val)}')
plt.axvline(median_val, color='orange', linestyle='--', linewidth=2, label=f'Median = {int(median_val)}')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
Distribution Shape: Bimodal (two peaks) due to rush hour vs non-rush hour traffic.
Weekday vs Weekend Distribution
import pandas as pd
import matplotlib.pyplot as plt
traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['day_of_week'] = traffic['date_time'].dt.dayofweek
weekday = traffic[traffic['day_of_week'] < 5]['traffic_volume']
weekend = traffic[traffic['day_of_week'] >= 5]['traffic_volume']
# Plot overlapping histograms
plt.figure(figsize=(12, 6))
plt.hist(weekday, bins=50, alpha=0.6, color='steelblue',
edgecolor='black', label='Weekday', density=True)
plt.hist(weekend, bins=50, alpha=0.6, color='coral',
edgecolor='black', label='Weekend', density=True)
plt.xlabel('Hourly Traffic Volume', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.title('Traffic Distribution: Weekday vs Weekend', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
Observation: The weekend distribution is more right-skewed (lower traffic overall).
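The density=True argument matters here because there are far more weekday hours than weekend hours, so raw counts would make the weekend histogram look small regardless of its shape. With density=True, bar heights are scaled so that each histogram's total area is 1. The sketch below checks this with np.histogram, which matplotlib's hist relies on for its bar heights, using two synthetic samples of different sizes. The distributions are invented and the helper name histogram_area is hypothetical.

```python
import numpy as np

# Synthetic samples of unequal size (not the real traffic data)
rng = np.random.default_rng(0)
weekday = rng.normal(4000, 1500, size=5000)
weekend = rng.normal(2500, 1200, size=2000)

def histogram_area(values, bins=40):
    """Total area of a density-normalized histogram (bar height x bar width)."""
    heights, edges = np.histogram(values, bins=bins, density=True)
    return float((heights * np.diff(edges)).sum())

print(histogram_area(weekday))
print(histogram_area(weekend))
```

Both areas come out as 1 (up to floating-point rounding), which is what makes the two shapes directly comparable on one axis.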
Part 6: Comprehensive Dashboard
Multi-Panel Analysis
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['hour'] = traffic['date_time'].dt.hour
traffic['day_of_week'] = traffic['date_time'].dt.dayofweek
traffic['month'] = traffic['date_time'].dt.month
traffic['temp_celsius'] = traffic['temp'] - 273.15
# Create dashboard
fig = plt.figure(figsize=(16, 12))
gs = GridSpec(3, 2, figure=fig, hspace=0.35, wspace=0.3)
# ===== Plot 1: Hourly Pattern (top-left) =====
ax1 = fig.add_subplot(gs[0, 0])
hourly_avg = traffic.groupby('hour')['traffic_volume'].mean()
ax1.plot(hourly_avg.index, hourly_avg.values, linewidth=3,
color='darkblue', marker='o', markersize=5)
ax1.set_xlabel('Hour of Day', fontsize=11)
ax1.set_ylabel('Avg Traffic Volume', fontsize=11)
ax1.set_title('Average Hourly Traffic Pattern', fontsize=12, fontweight='bold')
ax1.set_xticks(range(0, 24, 3))
ax1.grid(True, alpha=0.3)
# ===== Plot 2: Monthly Pattern (top-right) =====
ax2 = fig.add_subplot(gs[0, 1])
monthly_avg = traffic.groupby('month')['traffic_volume'].mean()
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
ax2.bar(month_names, monthly_avg.values, color='coral', edgecolor='black')
ax2.set_ylabel('Avg Traffic Volume', fontsize=11)
ax2.set_title('Average Traffic by Month', fontsize=12, fontweight='bold')
ax2.tick_params(axis='x', rotation=45)
ax2.grid(True, alpha=0.3, axis='y')
# ===== Plot 3: Temperature Correlation (middle-left) =====
ax3 = fig.add_subplot(gs[1, 0])
ax3.scatter(traffic['temp_celsius'], traffic['traffic_volume'],
alpha=0.2, s=10, color='red')
corr_temp = traffic['temp_celsius'].corr(traffic['traffic_volume'])
ax3.set_xlabel('Temperature (°C)', fontsize=11)
ax3.set_ylabel('Traffic Volume', fontsize=11)
ax3.set_title(f'Temperature vs Traffic (r = {corr_temp:.3f})',
fontsize=12, fontweight='bold')
ax3.grid(True, alpha=0.3)
# ===== Plot 4: Weather Impact (middle-right) =====
ax4 = fig.add_subplot(gs[1, 1])
weather_avg = traffic.groupby('weather_main')['traffic_volume'].mean().sort_values()
ax4.barh(weather_avg.index, weather_avg.values, color='skyblue', edgecolor='black')
ax4.set_xlabel('Avg Traffic Volume', fontsize=11)
ax4.set_title('Traffic by Weather Condition', fontsize=12, fontweight='bold')
ax4.grid(True, alpha=0.3, axis='x')
# ===== Plot 5: Traffic Distribution (bottom-left) =====
ax5 = fig.add_subplot(gs[2, 0])
ax5.hist(traffic['traffic_volume'], bins=50, color='steelblue',
edgecolor='black', alpha=0.7)
mean_val = traffic['traffic_volume'].mean()
ax5.axvline(mean_val, color='red', linestyle='--', linewidth=2,
label=f'Mean = {int(mean_val)}')
ax5.set_xlabel('Hourly Traffic Volume', fontsize=11)
ax5.set_ylabel('Frequency', fontsize=11)
ax5.set_title('Traffic Volume Distribution', fontsize=12, fontweight='bold')
ax5.legend(fontsize=10)
ax5.grid(True, alpha=0.3, axis='y')
# ===== Plot 6: Weekday vs Weekend (bottom-right) =====
ax6 = fig.add_subplot(gs[2, 1])
weekday = traffic[traffic['day_of_week'] < 5]['traffic_volume']
weekend = traffic[traffic['day_of_week'] >= 5]['traffic_volume']
ax6.hist(weekday, bins=40, alpha=0.6, color='steelblue',
edgecolor='black', label='Weekday', density=True)
ax6.hist(weekend, bins=40, alpha=0.6, color='coral',
edgecolor='black', label='Weekend', density=True)
ax6.set_xlabel('Hourly Traffic Volume', fontsize=11)
ax6.set_ylabel('Density', fontsize=11)
ax6.set_title('Weekday vs Weekend Distribution', fontsize=12, fontweight='bold')
ax6.legend(fontsize=10)
ax6.grid(True, alpha=0.3, axis='y')
# Overall title
fig.suptitle('I-94 Westbound Traffic Analysis Dashboard (2012-2018)',
fontsize=16, fontweight='bold')
plt.show()
Part 7: Key Findings Summary
Statistical Summary
import pandas as pd
traffic = pd.read_csv('i94_traffic.csv')
traffic['date_time'] = pd.to_datetime(traffic['date_time'])
traffic['hour'] = traffic['date_time'].dt.hour
traffic['day_of_week'] = traffic['date_time'].dt.dayofweek
traffic['temp_celsius'] = traffic['temp'] - 273.15
print("=" * 60)
print("I-94 TRAFFIC ANALYSIS - KEY FINDINGS")
print("=" * 60)
# Overall statistics
print(f"\nTotal observations: {len(traffic):,}")
print(f"Time period: {traffic['date_time'].min()} to {traffic['date_time'].max()}")
print(f"\nAverage hourly traffic: {traffic['traffic_volume'].mean():.0f}")
print(f"Median hourly traffic: {traffic['traffic_volume'].median():.0f}")
print(f"Max hourly traffic: {traffic['traffic_volume'].max():.0f}")
print(f"Min hourly traffic: {traffic['traffic_volume'].min():.0f}")
# Peak hours
hourly_avg = traffic.groupby('hour')['traffic_volume'].mean()
peak_hour = hourly_avg.idxmax()
peak_volume = hourly_avg.max()
print(f"\nPeak hour: {peak_hour}:00 ({peak_volume:.0f} vehicles)")
# Weekday vs Weekend
weekday_avg = traffic[traffic['day_of_week'] < 5]['traffic_volume'].mean()
weekend_avg = traffic[traffic['day_of_week'] >= 5]['traffic_volume'].mean()
print(f"\nWeekday average: {weekday_avg:.0f}")
print(f"Weekend average: {weekend_avg:.0f}")
print(f"Difference: {weekday_avg - weekend_avg:.0f} ({(weekday_avg/weekend_avg - 1)*100:.1f}% higher)")
# Weather correlations
print("\n" + "=" * 60)
print("WEATHER CORRELATIONS")
print("=" * 60)
print(f"Temperature: r = {traffic['temp_celsius'].corr(traffic['traffic_volume']):.3f}")
print(f"Rain: r = {traffic['rain_1h'].corr(traffic['traffic_volume']):.3f}")
print(f"Cloud cover: r = {traffic['clouds_all'].corr(traffic['traffic_volume']):.3f}")
# Weather impact
print("\n" + "=" * 60)
print("AVERAGE TRAFFIC BY WEATHER")
print("=" * 60)
weather_avg = traffic.groupby('weather_main')['traffic_volume'].mean().sort_values(ascending=False)
for condition, avg in weather_avg.items():
print(f"{condition:15s}: {avg:6.0f}")
print("\n" + "=" * 60)
Challenge Extensions
Want to go deeper? Try these extensions:
1. Holiday Impact
# Compare traffic on holidays vs non-holidays
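# Depending on the pandas version, read_csv may load the "None" marker as NaN,
# so treat both the string 'None' and missing values as non-holidays
is_holiday = traffic['holiday'].notna() & (traffic['holiday'] != 'None')
holiday_traffic = traffic[is_holiday]['traffic_volume'].mean()
normal_traffic = traffic[~is_holiday]['traffic_volume'].mean()
print(f"Holiday: {holiday_traffic:.0f}, Normal: {normal_traffic:.0f}")
2. Season Analysis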
# Define seasons and compare
def get_season(month):
    if month in [12, 1, 2]: return 'Winter'
    elif month in [3, 4, 5]: return 'Spring'
    elif month in [6, 7, 8]: return 'Summer'
    else: return 'Fall'
# Assumes the 'month' column created in Part 1
traffic['season'] = traffic['month'].apply(get_season)
season_avg = traffic.groupby('season')['traffic_volume'].mean()
3. Rush Hour Detection
# Identify rush hours programmatically
hourly_avg = traffic.groupby('hour')['traffic_volume'].mean()
threshold = hourly_avg.quantile(0.75)
rush_hours = hourly_avg[hourly_avg > threshold].index.tolist()
print(f"Rush hours: {rush_hours}")
Summary and Reflection
You completed a comprehensive traffic analysis project using:
- Line plots: Time series trends, hourly patterns, weekday vs weekend comparison
- Scatter plots: Temperature correlation
- Bar charts: Monthly comparison, weather conditions, rain intensity
- Histograms: Traffic distribution, density comparisons
- Subplots: Multi-panel dashboards with GridSpec
- Statistical analysis: Correlation coefficients, mean/median
- Pandas operations: Grouping, filtering, datetime manipulation
Real-World Skills Applied:
- Loading and cleaning datasets
- Converting datetime formats
- Exploratory data analysis
- Creating publication-quality visualizations
- Building comprehensive dashboards
- Communicating insights through charts
Congratulations! You now have the data visualization skills to analyze and present data professionally. Apply these techniques to your own datasets and continue exploring the rich ecosystem of Python visualization libraries.
Next Steps:
- Explore seaborn for statistical visualizations
- Learn plotly for interactive charts
- Study advanced matplotlib customization
- Practice with diverse datasets
- Build your portfolio with visualization projects