Lesson 10 - Creating Bar Plots

Visualizing Categories

You have learned to plot continuous data with line plots and scatter plots. Now you will learn bar plots, a standard way to compare categories and display summary statistics.

By the end of this lesson, you will be able to:

  • Create vertical bar plots with plt.bar()
  • Create horizontal bar plots with plt.barh()
  • Customize bar colors, width, and appearance
  • Compare categories effectively
  • Add value labels to bars
  • Choose appropriate sorting for clarity

Bar charts excel at showing comparisons: Which day has the most rentals? Which season is busiest? Which weather condition affects usage most?


When to Use Bar Plots

Bar Plots vs Other Chart Types

Use bar plots when:

  • Comparing categories (days, seasons, products, countries)
  • Showing totals, counts, or averages by group
  • Displaying rankings or top/bottom performers
  • Data has distinct categories (not continuous)

Do NOT use bar plots for:

  • Time series trends (use line plots)
  • Relationships between continuous variables (use scatter plots)
  • Distributions (use histograms)

Bar Plot Anatomy

        Revenue by Product
  400 ──── ┌─────┐
           │     │
  300 ──── ├─────┤  ┌─────┐
           │     │  │     │
  200 ──── ├─────┤  ├─────┤  ┌─────┐
           │     │  │     │  │     │
  100 ──── ├─────┤  ├─────┤  ├─────┤
           │     │  │     │  │     │
    0 ──── └─────┴──┴─────┴──┴─────┴──
              A      B      C
           Category Labels

Components:

  • Bars: Height represents value
  • Categories: X-axis labels
  • Values: Y-axis scale
  • Spacing: Gaps between bars for clarity
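
To connect these components to code, here is a minimal sketch. The product names and revenue figures are made up to match the diagram above; they are not taken from any dataset. It shows that each bar is a rectangle you can inspect for its height and width.

```python
import matplotlib.pyplot as plt

# Made-up figures matching the diagram above (illustrative only)
products = ['A', 'B', 'C']
revenue = [400, 300, 200]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(products, revenue)   # one rectangle per category
ax.set_title('Revenue by Product')
ax.set_xlabel('Product')           # categories on the x-axis
ax.set_ylabel('Revenue')           # values on the y-axis

# Each rectangle stores its own geometry
heights = [bar.get_height() for bar in bars]   # bar height = value
widths = [bar.get_width() for bar in bars]     # 0.8 by default, which leaves gaps

print(heights)
print(widths)
plt.show()
```

Because the default width is 0.8 and the bars sit one unit apart, a gap of 0.2 remains between neighbors.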

Creating Basic Bar Plots

Rentals by Day of Week

import pandas as pd
import matplotlib.pyplot as plt

# Load data
bikes = pd.read_csv('day.csv')

# Calculate average rentals by day of week
# weekday: 0=Sunday, 1=Monday, ..., 6=Saturday
day_avg = bikes.groupby('weekday')['cnt'].mean()

# Day names for labels
day_names = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']

# Create bar plot
plt.figure(figsize=(10, 6))
plt.bar(day_names, day_avg.values)
plt.xlabel('Day of Week')
plt.ylabel('Average Bike Rentals')
plt.title('Average Daily Bike Rentals by Day of Week')
plt.grid(True, alpha=0.3, axis='y')
plt.show()

What the plot shows:

  • Weekdays (Mon-Fri) have similar average rental levels
  • Weekend averages (Sat-Sun) differ slightly from weekday averages
  • Friday is often among the highest, possibly because people start weekend activities
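
The example above pairs day_names with the averages by position, which works only when all seven weekday codes are present and in order. As a hedged alternative, the sketch below maps each code to its name explicitly. The small sample DataFrame and the code_to_name dictionary are assumptions made for illustration; they stand in for day.csv and are not real data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small made-up sample standing in for day.csv (values are illustrative only)
sample = pd.DataFrame({
    'weekday': [6, 0, 1, 1, 5, 6, 0],
    'cnt':     [900, 800, 1300, 1500, 1600, 1100, 1000],
})

# Map each weekday code to its name instead of relying on list position
code_to_name = {0: 'Sun', 1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat'}

day_avg = sample.groupby('weekday')['cnt'].mean()
day_avg.index = day_avg.index.map(code_to_name)   # labels now follow the data

plt.figure(figsize=(8, 5))
plt.bar(day_avg.index, day_avg.values)
plt.xlabel('Day of Week')
plt.ylabel('Average Bike Rentals')
plt.title('Average Rentals by Day (sample data)')
plt.grid(True, alpha=0.3, axis='y')
plt.show()
```

With this approach a missing day simply produces no bar, rather than shifting every label after it.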

Horizontal Bar Plots

When to Use Horizontal Bars

Use horizontal bars when:

  • Category names are long
  • Many categories to display
  • Comparing magnitudes where horizontal layout is clearer

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')

# Calculate average by season
season_avg = bikes.groupby('season')['cnt'].mean()
season_names = ['Spring', 'Summer', 'Fall', 'Winter']

# Create horizontal bar plot
plt.figure(figsize=(10, 6))
plt.barh(season_names, season_avg.values)
plt.xlabel('Average Bike Rentals')
plt.ylabel('Season')
plt.title('Average Bike Rentals by Season')
plt.grid(True, alpha=0.3, axis='x')
plt.show()

Notice:

  • plt.barh() instead of plt.bar()
  • Categories on y-axis, values on x-axis
  • Grid on x-axis instead of y-axis
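
One detail worth knowing: barh() draws the first category at the bottom of the chart. If you want the list to read top to bottom, inverting the y-axis is one option. The sketch below uses made-up values rather than numbers computed from day.csv.

```python
import matplotlib.pyplot as plt

# Made-up averages for illustration (not computed from day.csv)
names = ['Clear', 'Mist', 'Light Rain/Snow']
values = [4800, 4000, 1800]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(names, values)

# barh() places the first category at the bottom;
# inverting the y-axis puts it at the top instead
ax.invert_yaxis()

ax.set_xlabel('Average Bike Rentals')
ax.set_title('Horizontal Bars, First Category on Top')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()
```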

Customizing Bar Colors

Single Color

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
season_avg = bikes.groupby('season')['cnt'].mean()
season_names = ['Spring', 'Summer', 'Fall', 'Winter']

plt.figure(figsize=(10, 6))
plt.bar(season_names, season_avg.values, color='steelblue', alpha=0.8, edgecolor='black')
plt.xlabel('Season')
plt.ylabel('Average Bike Rentals')
plt.title('Average Bike Rentals by Season')
plt.grid(True, alpha=0.3, axis='y')
plt.show()

Parameters:

  • color='steelblue': Set bar color
  • alpha=0.8: Opacity (0=fully transparent, 1=fully opaque)
  • edgecolor='black': Border color around bars

Different Colors per Category

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
season_avg = bikes.groupby('season')['cnt'].mean()
season_names = ['Spring', 'Summer', 'Fall', 'Winter']

# Define colors for each season
colors = ['lightgreen', 'gold', 'orange', 'lightblue']

plt.figure(figsize=(10, 6))
plt.bar(season_names, season_avg.values, color=colors, edgecolor='black')
plt.xlabel('Season')
plt.ylabel('Average Bike Rentals')
plt.title('Average Bike Rentals by Season')
plt.grid(True, alpha=0.3, axis='y')
plt.show()

Why different colors?

  • Visual association (green=spring, orange=fall)
  • Helps distinguish categories
  • Makes charts more engaging
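
A list of colors is matched to the bars by position, so the association can break if you reorder the bars. A possible safeguard, sketched below with made-up values, is to keep the colors in a dictionary keyed by category name. The season_colors dictionary is a name introduced here for illustration.

```python
import matplotlib.pyplot as plt

# Made-up seasonal averages for illustration (not computed from day.csv)
season_avg = {'Spring': 2600, 'Summer': 5000, 'Fall': 5600, 'Winter': 4700}

# Tie each color to a category name rather than to a list position
season_colors = {'Spring': 'lightgreen', 'Summer': 'gold',
                 'Fall': 'orange', 'Winter': 'lightblue'}

# Reorder the categories (here: highest value first)
ordered = sorted(season_avg, key=season_avg.get, reverse=True)
values = [season_avg[name] for name in ordered]
colors = [season_colors[name] for name in ordered]   # colors follow the names

plt.figure(figsize=(8, 5))
plt.bar(ordered, values, color=colors, edgecolor='black')
plt.xlabel('Season')
plt.ylabel('Average Bike Rentals')
plt.title('Seasons Sorted by Value, Colors Preserved')
plt.grid(True, alpha=0.3, axis='y')
plt.show()
```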

Sorting for Clarity

Sorted by Value

Sorting bars by value (ascending or descending) makes comparisons easier.

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')

# Average rentals by weather situation
weather_avg = bikes.groupby('weathersit')['cnt'].mean()
weather_names = {1: 'Clear', 2: 'Mist', 3: 'Light Rain/Snow', 4: 'Heavy Rain'}

# Create sorted data
weather_data = [(weather_names[i], weather_avg[i]) for i in weather_avg.index if i in weather_names]
weather_data.sort(key=lambda x: x[1], reverse=True)  # Sort by value descending

categories = [x[0] for x in weather_data]
values = [x[1] for x in weather_data]

# Plot
plt.figure(figsize=(10, 6))
plt.bar(categories, values, color='skyblue', edgecolor='black')
plt.xlabel('Weather Condition')
plt.ylabel('Average Bike Rentals')
plt.title('Average Bike Rentals by Weather (Sorted by Value)')
plt.grid(True, alpha=0.3, axis='y')
plt.xticks(rotation=15)
plt.show()

Benefits of sorting:

  • Immediately see highest and lowest
  • Easier to identify patterns
  • Professional appearance
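
The same sorting can also be done inside pandas, which avoids building the list of tuples by hand. This is a sketch under the assumption that a small made-up DataFrame stands in for day.csv; rename() and sort_values() are standard pandas Series methods.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small made-up sample standing in for day.csv (values are illustrative only)
sample = pd.DataFrame({
    'weathersit': [1, 1, 2, 2, 3, 1],
    'cnt':        [5000, 4600, 4100, 3900, 1800, 4800],
})
weather_names = {1: 'Clear', 2: 'Mist', 3: 'Light Rain/Snow', 4: 'Heavy Rain'}

weather_avg = (
    sample.groupby('weathersit')['cnt'].mean()
          .rename(index=weather_names)      # codes -> readable names
          .sort_values(ascending=False)     # highest average first
)

plt.figure(figsize=(8, 5))
plt.bar(weather_avg.index, weather_avg.values, color='skyblue', edgecolor='black')
plt.xlabel('Weather Condition')
plt.ylabel('Average Bike Rentals')
plt.title('Average Rentals by Weather (sample data, sorted)')
plt.grid(True, alpha=0.3, axis='y')
plt.xticks(rotation=15)
plt.show()
```

Passing ascending=True instead would put the lowest bar first, which can suit horizontal charts.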

Adding Value Labels

Display Values on Bars

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
season_avg = bikes.groupby('season')['cnt'].mean()
season_names = ['Spring', 'Summer', 'Fall', 'Winter']

plt.figure(figsize=(10, 6))
bars = plt.bar(season_names, season_avg.values, color='steelblue', edgecolor='black')
plt.xlabel('Season')
plt.ylabel('Average Bike Rentals')
plt.title('Average Bike Rentals by Season')
plt.grid(True, alpha=0.3, axis='y')

# Add value labels on top of bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.0f}',
             ha='center', va='bottom', fontsize=10, fontweight='bold')

plt.show()

What it does:

  • Loops through each bar
  • Gets bar height (the value)
  • Places text above bar center
  • Displays value rounded to whole number
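
If your matplotlib version is 3.4 or newer (an assumption about your environment), the bar_label() method can replace the manual loop. The sketch below uses illustrative values rather than numbers read from day.csv.

```python
import matplotlib.pyplot as plt

# Illustrative seasonal averages (not read from day.csv)
season_names = ['Spring', 'Summer', 'Fall', 'Winter']
season_values = [2604.1, 4992.3, 5644.3, 4728.2]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(season_names, season_values, color='steelblue', edgecolor='black')

# bar_label() places one text label per bar; fmt controls the number format,
# padding is the gap in points between the bar top and the label
labels = ax.bar_label(bars, fmt='%.0f', padding=3, fontsize=10, fontweight='bold')

ax.set_xlabel('Season')
ax.set_ylabel('Average Bike Rentals')
ax.set_title('Value Labels with bar_label()')
ax.grid(True, alpha=0.3, axis='y')
plt.show()
```

The manual plt.text() loop remains useful when you need full control over each label's position or content.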

Customizing Bar Width

Adjusting Spacing

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')
season_avg = bikes.groupby('season')['cnt'].mean()
season_names = ['Spring', 'Summer', 'Fall', 'Winter']

# Create figure with two subplots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Default width
axes[0].bar(season_names, season_avg.values, color='steelblue')
axes[0].set_title('Default Width (0.8)')
axes[0].set_ylabel('Average Rentals')
axes[0].grid(True, alpha=0.3, axis='y')

# Custom narrow width
axes[1].bar(season_names, season_avg.values, width=0.4, color='coral')
axes[1].set_title('Narrow Width (0.4)')
axes[1].set_ylabel('Average Rentals')
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

Width parameter:

  • Default = 0.8 (80% of the space between neighboring bar positions)
  • Smaller values create narrower bars with more spacing
  • Larger values (up to 1.0) create wider bars with less spacing; at 1.0 adjacent bars touch
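
The width parameter also makes it possible to place two bars side by side for each category. The sketch below is one common way to do this, using numeric positions and an offset of half the bar width. The category names, group names, and numbers are all made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up numbers for two hypothetical groups
categories = ['Q1', 'Q2', 'Q3', 'Q4']
group_a = [120, 150, 170, 140]
group_b = [100, 160, 150, 155]

x = np.arange(len(categories))   # bar slot centers at 0, 1, 2, 3
width = 0.4                      # two bars of 0.4 fill the default 0.8 slot

fig, ax = plt.subplots(figsize=(8, 5))
# Shift each group by half a bar width so the pair is centered on the slot
bars_a = ax.bar(x - width / 2, group_a, width=width, label='Group A', color='steelblue')
bars_b = ax.bar(x + width / 2, group_b, width=width, label='Group B', color='coral')

ax.set_xticks(x)                 # put ticks at the slot centers
ax.set_xticklabels(categories)
ax.set_ylabel('Value')
ax.set_title('Grouped Bars Using width and an Offset')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.show()
```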

Practical Examples

Yearly Comparison

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')

# Average rentals by year
yearly_avg = bikes.groupby('yr')['cnt'].mean()
year_labels = ['2011', '2012']

plt.figure(figsize=(8, 6))
bars = plt.bar(year_labels, yearly_avg.values, color=['#3498db', '#e74c3c'],
               edgecolor='black', width=0.6)
plt.xlabel('Year')
plt.ylabel('Average Daily Bike Rentals')
plt.title('Bike Sharing Growth: 2011 vs 2012')
plt.grid(True, alpha=0.3, axis='y')

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.0f}',
             ha='center', va='bottom', fontsize=12, fontweight='bold')

# Calculate and display growth
growth = ((yearly_avg[1] - yearly_avg[0]) / yearly_avg[0]) * 100
plt.text(0.5, max(yearly_avg.values) * 0.5,
         f'Growth: {growth:+.1f}%',  # the + flag prints the sign, so a decline shows a minus
         ha='center', fontsize=14, bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

plt.show()
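
The growth figure is a percent change: the difference between the two averages divided by the earlier one. Here is a small worked sketch with made-up averages; the percent_change helper is a name introduced for illustration and is not part of pandas or matplotlib.

```python
def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100

# Made-up yearly averages for illustration (not computed from day.csv)
avg_first, avg_second = 3400.0, 5600.0

growth = percent_change(avg_first, avg_second)

# The + flag in the format spec prints a sign for both gains and declines
label = f'Growth: {growth:+.1f}%'
print(label)   # prints: Growth: +64.7%
```

Using the + flag avoids a label such as "+-10.0%" if the second value turns out to be lower than the first.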

Holiday vs Non-Holiday

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')

# Compare holiday vs non-holiday
holiday_avg = bikes.groupby('holiday')['cnt'].mean()
categories = ['Non-Holiday', 'Holiday']

plt.figure(figsize=(8, 6))
colors = ['green' if v > holiday_avg.mean() else 'red' for v in holiday_avg.values]
bars = plt.bar(categories, holiday_avg.values, color=colors, alpha=0.7, edgecolor='black')
plt.xlabel('Day Type')
plt.ylabel('Average Bike Rentals')
plt.title('Bike Rentals: Holiday vs Non-Holiday')
plt.grid(True, alpha=0.3, axis='y')

# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.0f}',
             ha='center', va='bottom', fontsize=12, fontweight='bold')

plt.show()

Complete Analysis Example

Weather Impact Report

import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv('day.csv')

# Calculate stats by weather
weather_stats = bikes.groupby('weathersit')['cnt'].agg(['mean', 'count'])
weather_names = {1: 'Clear', 2: 'Mist/Cloudy', 3: 'Light Rain/Snow'}

# Prepare data
categories = [weather_names[i] for i in weather_stats.index if i in weather_names]
avg_rentals = [weather_stats.loc[i, 'mean'] for i in weather_stats.index if i in weather_names]
num_days = [weather_stats.loc[i, 'count'] for i in weather_stats.index if i in weather_names]

# Create plot
fig, ax = plt.subplots(figsize=(12, 6))

colors = ['#2ecc71', '#f39c12', '#e74c3c']  # Green, orange, red
bars = ax.bar(categories, avg_rentals, color=colors, alpha=0.8, edgecolor='black')

ax.set_xlabel('Weather Condition', fontsize=12)
ax.set_ylabel('Average Daily Bike Rentals', fontsize=12)
ax.set_title('Impact of Weather on Bike Rentals', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')

# Add average-rental and day-count labels above each bar
for bar, count in zip(bars, num_days):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{height:.0f}\n({count} days)',
            ha='center', va='bottom', fontsize=10, fontweight='bold')

plt.tight_layout()
plt.show()

# Print summary
print("WEATHER IMPACT ANALYSIS")
print("=" * 50)
for cat, avg, days in zip(categories, avg_rentals, num_days):
    pct_of_best = (avg / max(avg_rentals)) * 100
    print(f"{cat:20s}: {avg:6.0f} rentals  ({days:3d} days, {pct_of_best:5.1f}% of best)")

Output:

WEATHER IMPACT ANALYSIS
==================================================
Clear               :   4876 rentals  (463 days, 100.0% of best)
Mist/Cloudy         :   4035 rentals  (247 days,  82.8% of best)
Light Rain/Snow     :   1803 rentals  ( 21 days,  37.0% of best)

Summary

You learned to create and customize bar plots:

  • plt.bar() creates vertical bar charts
  • plt.barh() creates horizontal bar charts
  • Color, width, edgecolor customize appearance
  • Sorting by value improves clarity
  • Value labels add precise numbers to bars
  • Bar plots excel at categorical comparisons
  • Choose bar plots for distinct categories, not continuous data

Next Steps: In the next lesson, you will learn to create histograms to visualize data distributions and frequency patterns.

Practice: Create a bar plot comparing casual vs registered user averages by season. Which user type varies more across seasons?