Lesson 10 - Creating Bar Plots
Visualizing Categories
You have learned to plot continuous data with line plots and scatter plots. Now you will learn bar plots, a standard way to compare categories and display summary statistics.
By the end of this lesson, you will be able to:
- Create vertical bar plots with plt.bar()
- Create horizontal bar plots with plt.barh()
- Customize bar colors, width, and appearance
- Compare categories effectively
- Add value labels to bars
- Choose appropriate sorting for clarity
Bar charts excel at showing comparisons: Which day has the most rentals? Which season is busiest? Which weather condition affects usage the most?
When to Use Bar Plots
Bar Plots vs Other Chart Types
Use bar plots when:
- Comparing categories (days, seasons, products, countries)
- Showing totals, counts, or averages by group
- Displaying rankings or top/bottom performers
- Data has distinct categories (not continuous)
Do NOT use bar plots for:
- Time series trends (use line plots)
- Relationships between continuous variables (use scatter plots)
- Distributions (use histograms)
Bar Plot Anatomy
Revenue by Product

400 ┤ ┌─────┐
    │ │     │
300 ┤ │     │  ┌─────┐
    │ │     │  │     │
200 ┤ │     │  │     │  ┌─────┐
    │ │     │  │     │  │     │
100 ┤ │     │  │     │  │     │
    │ │     │  │     │  │     │
  0 └─┴─────┴──┴─────┴──┴─────┴──
         A        B        C
           Category Labels
Components:
- Bars: Height represents value
- Categories: X-axis labels
- Values: Y-axis scale
- Spacing: Gaps between bars for clarity
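The components listed above can be inspected directly in Matplotlib. The following is a minimal sketch, assuming the hypothetical values 400, 300, and 200 read off the diagram; it is not part of the bike rental dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical values matching the diagram above (not real data)
products = ['A', 'B', 'C']
revenue = [400, 300, 200]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(products, revenue)   # one rectangle per category
ax.set_xlabel('Product')           # categories on the x-axis
ax.set_ylabel('Revenue')           # values on the y-axis
ax.set_title('Revenue by Product')

# Each bar's height is the value it represents
bar_heights = [bar.get_height() for bar in bars]

# Bars sit one unit apart and are 0.8 wide by default, which leaves the gaps
bar_widths = [bar.get_width() for bar in bars]
```

Call plt.show() afterwards to display the figure.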
Creating Basic Bar Plots
Rentals by Day of Week
import pandas as pd
import matplotlib.pyplot as plt
# Load data
bikes = pd.read_csv('day.csv')
# Calculate average rentals by day of week
# weekday: 0=Sunday, 1=Monday, ..., 6=Saturday
day_avg = bikes.groupby('weekday')['cnt'].mean()
# Day names for labels
day_names = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
# Create bar plot
plt.figure(figsize=(10, 6))
plt.bar(day_names, day_avg.values)
plt.xlabel('Day of Week')
plt.ylabel('Average Bike Rentals')
plt.title('Average Daily Bike Rentals by Day of Week')
plt.grid(True, alpha=0.3, axis='y')
plt.show()
What the plot shows:
- Weekdays (Mon-Fri) have similar rental levels
- Weekends (Sat-Sun) show slightly different patterns
- Friday often peaks as people start weekend activities
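The example above pairs day_names with the grouped averages by position, which works when all seven weekday codes are present. As a sketch of a more defensive variant, the names can be attached through the index instead, so each label follows its own value. The small DataFrame and the names sample, day_lookup, and sample_avg are hypothetical stand-ins for day.csv.

```python
import pandas as pd

# Hypothetical mini-dataset using the same column names as day.csv
sample = pd.DataFrame({
    'weekday': [6, 0, 1, 6, 0, 1],
    'cnt': [100, 200, 300, 300, 400, 500],
})

day_lookup = {0: 'Sun', 1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat'}

# groupby sorts by weekday code, so the result is ordered 0, 1, 6 here
sample_avg = sample.groupby('weekday')['cnt'].mean()

# Replace each code with its name; labels stay correct even if a day is missing
sample_avg.index = sample_avg.index.map(day_lookup)

labels = list(sample_avg.index)
heights = list(sample_avg.values)
```

The resulting labels and heights can be passed straight to plt.bar(labels, heights).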
Horizontal Bar Plots
When to Use Horizontal Bars
Use horizontal bars when:
- Category names are long
- Many categories to display
- Comparing magnitudes where horizontal layout is clearer
import pandas as pd
import matplotlib.pyplot as plt
bikes = pd.read_csv('day.csv')
# Calculate average by season
season_avg = bikes.groupby('season')['cnt'].mean()
season_names = ['Spring', 'Summer', 'Fall', 'Winter']
# Create horizontal bar plot
plt.figure(figsize=(10, 6))
plt.barh(season_names, season_avg.values)
plt.xlabel('Average Bike Rentals')
plt.ylabel('Season')
plt.title('Average Bike Rentals by Season')
plt.grid(True, alpha=0.3, axis='x')
plt.show()
Notice:
- plt.barh() instead of plt.bar()
- Categories on y-axis, values on x-axis
- Grid on x-axis instead of y-axis
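Two details of horizontal bars are easy to miss: the first category is drawn at the bottom, and the value is stored in each bar's width rather than its height. The sketch below illustrates both with hypothetical station names and values chosen only because they are long; they do not come from the bike dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical long category names, listed largest first
stations = ['Downtown Transit Center', 'University Main Entrance', 'Riverside Park North Gate']
avg_rentals = [320, 250, 180]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(stations, avg_rentals, color='steelblue')
ax.set_xlabel('Average Bike Rentals')
ax.grid(True, alpha=0.3, axis='x')

# barh places the first category at the bottom; inverting the y-axis moves it to the top
ax.invert_yaxis()
fig.tight_layout()

# For horizontal bars the value is the bar's width
bar_lengths = [bar.get_width() for bar in bars]
```

Long labels like these would overlap or need rotation on a vertical chart, which is why the horizontal layout is the better fit here.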
Customizing Bar Colors
Single Color
import pandas as pd
import matplotlib.pyplot as plt
bikes = pd.read_csv('day.csv')
season_avg = bikes.groupby('season')['cnt'].mean()
season_names = ['Spring', 'Summer', 'Fall', 'Winter']
plt.figure(figsize=(10, 6))
plt.bar(season_names, season_avg.values, color='steelblue', alpha=0.8, edgecolor='black')
plt.xlabel('Season')
plt.ylabel('Average Bike Rentals')
plt.title('Average Bike Rentals by Season')
plt.grid(True, alpha=0.3, axis='y')
plt.show()
Parameters:
- color='steelblue': Set bar color
- alpha=0.8: Transparency (0=invisible, 1=solid)
- edgecolor='black': Border color around bars
Different Colors per Category
import pandas as pd
import matplotlib.pyplot as plt
bikes = pd.read_csv('day.csv')
season_avg = bikes.groupby('season')['cnt'].mean()
season_names = ['Spring', 'Summer', 'Fall', 'Winter']
# Define colors for each season
colors = ['lightgreen', 'gold', 'orange', 'lightblue']
plt.figure(figsize=(10, 6))
plt.bar(season_names, season_avg.values, color=colors, edgecolor='black')
plt.xlabel('Season')
plt.ylabel('Average Bike Rentals')
plt.title('Average Bike Rentals by Season')
plt.grid(True, alpha=0.3, axis='y')
plt.show()
Why different colors?
- Visual association (green=spring, orange=fall)
- Helps distinguish categories
- Makes charts more engaging
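A color list is matched to bars by position, so reordering the categories would silently change which season gets which color. One way to keep the association is to look colors up by name. This is a sketch with made-up values; season_values, season_colors, and ordered are hypothetical names.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration only; in the lesson these come from season_avg
season_values = {'Spring': 10, 'Summer': 30, 'Fall': 40, 'Winter': 20}
season_colors = {'Spring': 'lightgreen', 'Summer': 'gold',
                 'Fall': 'orange', 'Winter': 'lightblue'}

# Reorder the categories, largest value first
ordered = sorted(season_values, key=season_values.get, reverse=True)

# Look up each color by category name so it follows the bar
bar_colors = [season_colors[name] for name in ordered]

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(ordered, [season_values[name] for name in ordered],
       color=bar_colors, edgecolor='black')
ax.set_ylabel('Value')
```

With this approach Fall stays orange wherever it ends up on the axis.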
Sorting for Clarity
Sorted by Value
Sorting bars by value (ascending or descending) makes comparisons easier.
import pandas as pd
import matplotlib.pyplot as plt
bikes = pd.read_csv('day.csv')
# Average rentals by weather situation
weather_avg = bikes.groupby('weathersit')['cnt'].mean()
weather_names = {1: 'Clear', 2: 'Mist', 3: 'Light Rain/Snow', 4: 'Heavy Rain'}
# Create sorted data
weather_data = [(weather_names[i], weather_avg[i]) for i in weather_avg.index if i in weather_names]
weather_data.sort(key=lambda x: x[1], reverse=True) # Sort by value descending
categories = [x[0] for x in weather_data]
values = [x[1] for x in weather_data]
# Plot
plt.figure(figsize=(10, 6))
plt.bar(categories, values, color='skyblue', edgecolor='black')
plt.xlabel('Weather Condition')
plt.ylabel('Average Bike Rentals')
plt.title('Average Bike Rentals by Weather (Sorted by Value)')
plt.grid(True, alpha=0.3, axis='y')
plt.xticks(rotation=15)
plt.show()
Benefits of sorting:
- Immediately see highest and lowest
- Easier to identify patterns
- Professional appearance
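The same sorting can be done in pandas without building a list of tuples. The sketch below uses a small hypothetical Series (weather_avg_demo) in place of the real weather_avg. One difference from the list-based version: a code missing from weather_names keeps its numeric label here instead of being dropped.

```python
import pandas as pd

# Hypothetical averages indexed by weathersit code, standing in for weather_avg
weather_avg_demo = pd.Series([150.0, 400.0, 300.0], index=[3, 1, 2])
weather_names = {1: 'Clear', 2: 'Mist', 3: 'Light Rain/Snow', 4: 'Heavy Rain'}

# Swap codes for names, then sort by value in descending order
sorted_avg = weather_avg_demo.rename(index=weather_names).sort_values(ascending=False)

categories = list(sorted_avg.index)
values = list(sorted_avg.values)
```

Passing categories and values to plt.bar() gives the same left-to-right ordering as the example above.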
Adding Value Labels
Display Values on Bars
import pandas as pd
import matplotlib.pyplot as plt
bikes = pd.read_csv('day.csv')
season_avg = bikes.groupby('season')['cnt'].mean()
season_names = ['Spring', 'Summer', 'Fall', 'Winter']
plt.figure(figsize=(10, 6))
bars = plt.bar(season_names, season_avg.values, color='steelblue', edgecolor='black')
plt.xlabel('Season')
plt.ylabel('Average Bike Rentals')
plt.title('Average Bike Rentals by Season')
plt.grid(True, alpha=0.3, axis='y')
# Add value labels on top of bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.0f}',
             ha='center', va='bottom', fontsize=10, fontweight='bold')
plt.show()
What it does:
- Loops through each bar
- Gets bar height (the value)
- Places text above bar center
- Displays value rounded to whole number
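Recent Matplotlib releases (3.4 and later) also offer Axes.bar_label(), which does the loop above in one call. The following is a sketch with hypothetical values, assuming a Matplotlib version that includes this method.

```python
import matplotlib.pyplot as plt

season_names = ['Spring', 'Summer', 'Fall', 'Winter']
demo_values = [10.4, 30.6, 40.2, 20.7]  # hypothetical values, not from day.csv

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(season_names, demo_values, color='steelblue', edgecolor='black')

# bar_label places one text label per bar; padding is the gap in points
labels = ax.bar_label(bars, fmt='%.0f', padding=3, fontsize=10, fontweight='bold')

# The label text is the bar value formatted with fmt
label_texts = [label.get_text() for label in labels]
```

The manual plt.text() loop remains useful when each label needs custom content, such as a value plus a count.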
Customizing Bar Width
Adjusting Spacing
import pandas as pd
import matplotlib.pyplot as plt
bikes = pd.read_csv('day.csv')
season_avg = bikes.groupby('season')['cnt'].mean()
season_names = ['Spring', 'Summer', 'Fall', 'Winter']
# Create figure with two subplots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Default width
axes[0].bar(season_names, season_avg.values, color='steelblue')
axes[0].set_title('Default Width (0.8)')
axes[0].set_ylabel('Average Rentals')
axes[0].grid(True, alpha=0.3, axis='y')
# Custom narrow width
axes[1].bar(season_names, season_avg.values, width=0.4, color='coral')
axes[1].set_title('Narrow Width (0.4)')
axes[1].set_ylabel('Average Rentals')
axes[1].grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
Width parameter:
- Default = 0.8 (80% of space)
- Smaller values create narrower bars with more spacing
- Larger values create wider bars with less spacing; at 1.0 adjacent bars touch, and above 1.0 they overlap
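The width parameter also makes it possible to fit two bars into each category slot. The sketch below places two hypothetical groups side by side by shifting each set half a bar width from the category position; the values and the names group_a and group_b are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

season_names = ['Spring', 'Summer', 'Fall', 'Winter']
group_a = [10, 30, 40, 20]  # hypothetical values
group_b = [15, 25, 35, 30]  # hypothetical values

x = np.arange(len(season_names))  # category positions 0, 1, 2, 3
width = 0.4                       # two bars of 0.4 fill the default 0.8 slot

fig, ax = plt.subplots(figsize=(8, 5))
left = ax.bar(x - width / 2, group_a, width=width, label='Group A')
right = ax.bar(x + width / 2, group_b, width=width, label='Group B')

# Put the tick labels back at the category centers
ax.set_xticks(x)
ax.set_xticklabels(season_names)
ax.legend()

# Left edge of each bar, used to confirm the pairs sit flush against each other
left_edges = [bar.get_x() for bar in left]
right_edges = [bar.get_x() for bar in right]
```

Because numeric positions are used instead of category names, the tick labels have to be set explicitly.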
Practical Examples
Yearly Comparison
import pandas as pd
import matplotlib.pyplot as plt
bikes = pd.read_csv('day.csv')
# Average rentals by year
yearly_avg = bikes.groupby('yr')['cnt'].mean()
year_labels = ['2011', '2012']
plt.figure(figsize=(8, 6))
bars = plt.bar(year_labels, yearly_avg.values, color=['#3498db', '#e74c3c'],
               edgecolor='black', width=0.6)
plt.xlabel('Year')
plt.ylabel('Average Daily Bike Rentals')
plt.title('Bike Sharing Growth: 2011 vs 2012')
plt.grid(True, alpha=0.3, axis='y')
# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.0f}',
             ha='center', va='bottom', fontsize=12, fontweight='bold')
# Calculate and display growth
growth = ((yearly_avg[1] - yearly_avg[0]) / yearly_avg[0]) * 100
plt.text(0.5, max(yearly_avg.values) * 0.5,
         f'Growth: +{growth:.1f}%',
         ha='center', fontsize=14, bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
plt.show()
Holiday vs Non-Holiday
import pandas as pd
import matplotlib.pyplot as plt
bikes = pd.read_csv('day.csv')
# Compare holiday vs non-holiday
holiday_avg = bikes.groupby('holiday')['cnt'].mean()
categories = ['Non-Holiday', 'Holiday']
plt.figure(figsize=(8, 6))
colors = ['green' if v > holiday_avg.mean() else 'red' for v in holiday_avg.values]
bars = plt.bar(categories, holiday_avg.values, color=colors, alpha=0.7, edgecolor='black')
plt.xlabel('Day Type')
plt.ylabel('Average Bike Rentals')
plt.title('Bike Rentals: Holiday vs Non-Holiday')
plt.grid(True, alpha=0.3, axis='y')
# Add value labels
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.0f}',
             ha='center', va='bottom', fontsize=12, fontweight='bold')
plt.show()
Complete Analysis Example
Weather Impact Report
import pandas as pd
import matplotlib.pyplot as plt
bikes = pd.read_csv('day.csv')
# Calculate stats by weather
weather_stats = bikes.groupby('weathersit')['cnt'].agg(['mean', 'count'])
weather_names = {1: 'Clear', 2: 'Mist/Cloudy', 3: 'Light Rain/Snow'}
# Prepare data
categories = [weather_names[i] for i in weather_stats.index if i in weather_names]
avg_rentals = [weather_stats.loc[i, 'mean'] for i in weather_stats.index if i in weather_names]
num_days = [weather_stats.loc[i, 'count'] for i in weather_stats.index if i in weather_names]
# Create plot
fig, ax = plt.subplots(figsize=(12, 6))
colors = ['#2ecc71', '#f39c12', '#e74c3c'] # Green, orange, red
bars = ax.bar(categories, avg_rentals, color=colors, alpha=0.8, edgecolor='black')
ax.set_xlabel('Weather Condition', fontsize=12)
ax.set_ylabel('Average Daily Bike Rentals', fontsize=12)
ax.set_title('Impact of Weather on Bike Rentals', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')
# Label each bar with its average rentals and the number of days observed
for bar, count in zip(bars, num_days):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{height:.0f}\n({count} days)',
            ha='center', va='bottom', fontsize=10, fontweight='bold')
plt.tight_layout()
plt.show()
# Print summary
print("WEATHER IMPACT ANALYSIS")
print("=" * 50)
for cat, avg, days in zip(categories, avg_rentals, num_days):
    pct_of_best = (avg / max(avg_rentals)) * 100
    print(f"{cat:20s}: {avg:6.0f} rentals ({days:3d} days, {pct_of_best:5.1f}% of best)")
Output:
WEATHER IMPACT ANALYSIS
==================================================
Clear               :   4876 rentals (463 days, 100.0% of best)
Mist/Cloudy         :   4035 rentals (247 days,  82.8% of best)
Light Rain/Snow     :   1803 rentals ( 21 days,  37.0% of best)
Summary
You learned to create and customize bar plots:
- plt.bar() creates vertical bar charts
- plt.barh() creates horizontal bar charts
- Color, width, edgecolor customize appearance
- Sorting by value improves clarity
- Value labels add precise numbers to bars
- Bar plots excel at categorical comparisons
- Choose bar plots for distinct categories, not continuous data
Next Steps: In the next lesson, you will learn to create histograms to visualize data distributions and frequency patterns.
Practice: Create a bar plot comparing casual vs registered user averages by season. Which user type varies more across seasons?