Lesson 11 - Advanced Collections and Data Structures
Introduction
Python’s built-in types (list, dict, set, tuple) are powerful, but the collections module provides specialized data structures that solve specific problems more elegantly and efficiently.
In this lesson, we’ll explore:
deque: Double-ended queue for fast appends/pops from both endsCounter: Count hashable objects easilydefaultdict: Dictionary with default valuesnamedtuple: Lightweight objects with named fieldsChainMap: Combine multiple dictionaries
These tools make your code more readable, efficient, and Pythonic.
deque: Double-Ended Queue
A deque (pronounced “deck”) is like a list but optimized for adding/removing items from both ends.
Why Use deque?
# List: slow for operations at the beginning
my_list = [1, 2, 3, 4, 5]
my_list.insert(0, 0) # O(n) - slow, shifts all elements
# deque: fast for both ends
from collections import deque
my_deque = deque([1, 2, 3, 4, 5])
my_deque.appendleft(0) # O(1) - fast!Basic deque Operations
from collections import deque
# Create a deque
books = deque(['Python Basics', 'Python Advanced'])
# Add to right end
books.append('Data Science 101')
print(books) # deque(['Python Basics', 'Python Advanced', 'Data Science 101'])
# Add to left end
books.appendleft('Programming 101')
print(books) # deque(['Programming 101', 'Python Basics', 'Python Advanced', 'Data Science 101'])
# Remove from right end
last = books.pop()
print(f"Removed: {last}") # Data Science 101
# Remove from left end
first = books.popleft()
print(f"Removed: {first}") # Programming 101
print(books) # deque(['Python Basics', 'Python Advanced'])Practical Example: Recently Viewed Books
from collections import deque
class RecentlyViewed:
def __init__(self, max_size=5):
self.recent = deque(maxlen=max_size) # Auto-removes oldest when full
def view(self, book):
# Remove if already present (to move to front)
if book in self.recent:
self.recent.remove(book)
self.recent.append(book)
def get_recent(self):
return list(self.recent)
# Use it
viewer = RecentlyViewed(max_size=3)
viewer.view("Python Basics")
viewer.view("Python Advanced")
viewer.view("Data Science 101")
print(viewer.get_recent()) # All 3 books
viewer.view("Machine Learning") # Oldest (Python Basics) is auto-removed
print(viewer.get_recent()) # Only last 3
viewer.view("Python Advanced") # Move to front
print(viewer.get_recent())Output:
['Python Basics', 'Python Advanced', 'Data Science 101']
['Python Advanced', 'Data Science 101', 'Machine Learning']
['Data Science 101', 'Machine Learning', 'Python Advanced']Rotating Elements
from collections import deque
books = deque(['Book 1', 'Book 2', 'Book 3', 'Book 4'])
# Rotate right
books.rotate(1)
print(books) # deque(['Book 4', 'Book 1', 'Book 2', 'Book 3'])
# Rotate left
books.rotate(-2)
print(books) # deque(['Book 2', 'Book 3', 'Book 4', 'Book 1'])Counter: Counting Made Easy
Counter is a dictionary subclass for counting hashable objects.
Basic Counter Usage
from collections import Counter
# Count items in a list
books = ['Python', 'Java', 'Python', 'C++', 'Python', 'Java']
book_counts = Counter(books)
print(book_counts) # Counter({'Python': 3, 'Java': 2, 'C++': 1})
print(book_counts['Python']) # 3
print(book_counts['Ruby']) # 0 (default for missing keys)Counting Characters
from collections import Counter
title = "Python Programming"
char_counts = Counter(title.lower())
print(char_counts)
print(f"Most common: {char_counts.most_common(3)}")Output:
Counter({'p': 2, 'r': 2, 'o': 2, 'm': 2, 'y': 1, 't': 1, 'h': 1, 'n': 1, ' ': 1, 'g': 1, 'i': 1, 'a': 1})
Most common: [('p', 2), ('r', 2), ('o', 2)]Practical Example: Word Frequency Analysis
from collections import Counter
import re
text = """
Python is a great programming language. Python is easy to learn.
Many people use Python for data science. Python is powerful.
"""
# Extract words (lowercase, alphabetic only)
words = re.findall(r'\b[a-z]+\b', text.lower())
# Count word frequencies
word_freq = Counter(words)
print("Top 5 most common words:")
for word, count in word_freq.most_common(5):
print(f" {word}: {count}")
# Total unique words
print(f"\nTotal unique words: {len(word_freq)}")
# Total words
print(f"Total words: {sum(word_freq.values())}")Output:
Top 5 most common words:
python: 4
is: 3
to: 1
a: 1
great: 1
Total unique words: 16
Total words: 23Counter Arithmetic
from collections import Counter
# Sales from Store 1 and Store 2
store1 = Counter({'Python Basics': 10, 'Python Advanced': 5, 'Data Science': 8})
store2 = Counter({'Python Basics': 7, 'Python Advanced': 12, 'Machine Learning': 6})
# Combine sales
total_sales = store1 + store2
print("Total sales:", total_sales)
# Difference
difference = store1 - store2
print("Store 1 sold more:", difference)
# Intersection (minimum)
common = store1 & store2
print("Minimum from both:", common)
# Union (maximum)
max_sales = store1 | store2
print("Maximum from either:", max_sales)Output:
Total sales: Counter({'Python Advanced': 17, 'Python Basics': 17, 'Data Science': 8, 'Machine Learning': 6})
Store 1 sold more: Counter({'Data Science': 8})
Minimum from both: Counter({'Python Basics': 7, 'Python Advanced': 5})
Maximum from either: Counter({'Python Advanced': 12, 'Python Basics': 10, 'Data Science': 8, 'Machine Learning': 6})defaultdict: Dictionary with Default Values
defaultdict provides default values for missing keys, avoiding KeyError.
Basic defaultdict Usage
from collections import defaultdict
# Regular dict - raises KeyError
regular_dict = {}
# regular_dict['unknown'] += 1 # KeyError!
# defaultdict with default int (0)
book_counts = defaultdict(int)
book_counts['Python Basics'] += 1
book_counts['Python Basics'] += 1
book_counts['Python Advanced'] += 1
print(book_counts) # defaultdict(<class 'int'>, {'Python Basics': 2, 'Python Advanced': 1})Grouping with defaultdict(list)
from collections import defaultdict
books = [
{'title': 'Python Basics', 'category': 'Programming'},
{'title': 'Java Intro', 'category': 'Programming'},
{'title': 'Data Science 101', 'category': 'Data Science'},
{'title': 'Statistics', 'category': 'Data Science'},
{'title': 'Python Advanced', 'category': 'Programming'},
]
# Group books by category
by_category = defaultdict(list)
for book in books:
by_category[book['category']].append(book['title'])
for category, titles in by_category.items():
print(f"\n{category}:")
for title in titles:
print(f" - {title}")Output:
Programming:
- Python Basics
- Java Intro
- Python Advanced
Data Science:
- Data Science 101
- StatisticsPractical Example: Inverted Index
from collections import defaultdict
# Build an inverted index (which books contain which words)
books = {
'Book 1': 'Python programming basics',
'Book 2': 'Advanced Python techniques',
'Book 3': 'Data science with Python',
}
inverted_index = defaultdict(list)
for book_id, text in books.items():
for word in text.lower().split():
inverted_index[word].append(book_id)
# Find books containing specific words
def search_books(word):
return inverted_index.get(word.lower(), [])
print("Books with 'python':", search_books('python'))
print("Books with 'data':", search_books('data'))
print("Books with 'java':", search_books('java'))Output:
Books with 'python': ['Book 1', 'Book 2', 'Book 3']
Books with 'data': ['Book 3']
Books with 'java': []namedtuple: Lightweight Objects
namedtuple creates tuple subclasses with named fields—combining tuple efficiency with attribute access.
Basic namedtuple Usage
from collections import namedtuple
# Define a Book type
Book = namedtuple('Book', ['title', 'author', 'price'])
# Create instances
book1 = Book('Python Basics', 'John Doe', 29.99)
book2 = Book(title='Python Advanced', author='Jane Smith', price=49.99)
# Access by name (like an object)
print(f"{book1.title} by {book1.author}: ${book1.price}")
# Access by index (like a tuple)
print(f"{book1[0]} by {book1[1]}: ${book1[2]}")
# Immutable (like a tuple)
# book1.price = 24.99 # AttributeError!Output:
Python Basics by John Doe: $29.99
Python Basics by John Doe: $29.99Practical Example: Book Catalog
from collections import namedtuple
Book = namedtuple('Book', ['title', 'author', 'price', 'year'])
catalog = [
Book('Python Crash Course', 'Eric Matthes', 39.99, 2019),
Book('Automate the Boring Stuff', 'Al Sweigart', 29.99, 2020),
Book('Learning Python', 'Mark Lutz', 64.99, 2013),
]
# Sort by price
by_price = sorted(catalog, key=lambda b: b.price)
print("Books sorted by price:")
for book in by_price:
print(f" ${book.price:6.2f} - {book.title} ({book.year})")
# Filter by year
recent_books = [b for b in catalog if b.year >= 2019]
print("\nRecent books (2019+):")
for book in recent_books:
print(f" {book.title} by {book.author}")Output:
Books sorted by price:
$ 29.99 - Automate the Boring Stuff (2020)
$ 39.99 - Python Crash Course (2019)
$ 64.99 - Learning Python (2013)
Recent books (2019+):
Python Crash Course by Eric Matthes
Automate the Boring Stuff by Al Sweigartnamedtuple Methods
from collections import namedtuple
Book = namedtuple('Book', ['title', 'author', 'price'])
book = Book('Python Basics', 'John Doe', 29.99)
# Convert to dictionary
print(book._asdict()) # {'title': 'Python Basics', 'author': 'John Doe', 'price': 29.99}
# Replace fields (returns new instance)
cheaper_book = book._replace(price=19.99)
print(cheaper_book) # Book(title='Python Basics', author='John Doe', price=19.99)
# Create from sequence
book_data = ['Python Advanced', 'Jane Smith', 49.99]
book2 = Book._make(book_data)
print(book2) # Book(title='Python Advanced', author='Jane Smith', price=49.99)ChainMap: Combine Multiple Dictionaries
ChainMap groups multiple dictionaries into a single view.
Basic ChainMap Usage
from collections import ChainMap
# Default prices
default_prices = {'Python Basics': 29.99, 'Java Intro': 34.99}
# Sale prices (override defaults)
sale_prices = {'Python Basics': 19.99}
# Member prices (further override)
member_prices = {}
# Chain them (searches left to right)
all_prices = ChainMap(member_prices, sale_prices, default_prices)
print(all_prices['Python Basics']) # 19.99 (from sale_prices)
print(all_prices['Java Intro']) # 34.99 (from default_prices)
# Add member price
member_prices['Python Basics'] = 14.99
print(all_prices['Python Basics']) # 14.99 (from member_prices now)Output:
19.99
34.99
14.99Practical Example: Configuration Hierarchy
from collections import ChainMap
# System defaults
system_config = {
'theme': 'light',
'font_size': 12,
'auto_save': True,
}
# User preferences
user_config = {
'theme': 'dark',
'font_size': 14,
}
# Session overrides
session_config = {
'font_size': 16,
}
# Combine (session > user > system)
config = ChainMap(session_config, user_config, system_config)
print(f"Theme: {config['theme']}") # dark (from user_config)
print(f"Font size: {config['font_size']}") # 16 (from session_config)
print(f"Auto-save: {config['auto_save']}") # True (from system_config)
# All keys from all dicts
print(f"\nAll settings: {dict(config)}")Output:
Theme: dark
Font size: 16
Auto-save: True
All settings: {'font_size': 16, 'theme': 'dark', 'auto_save': True}Comprehensive Example: Book Store Analytics
Let’s combine multiple collection types:
from collections import Counter, defaultdict, namedtuple, deque
# Define Book type
Book = namedtuple('Book', ['title', 'author', 'category', 'price'])
# Sample sales data
sales = [
Book('Python Basics', 'John Doe', 'Programming', 29.99),
Book('Python Advanced', 'Jane Smith', 'Programming', 49.99),
Book('Python Basics', 'John Doe', 'Programming', 29.99),
Book('Data Science 101', 'Bob Johnson', 'Data Science', 39.99),
Book('Python Basics', 'John Doe', 'Programming', 29.99),
Book('Machine Learning', 'Alice Williams', 'Data Science', 59.99),
Book('Data Science 101', 'Bob Johnson', 'Data Science', 39.99),
]
# 1. Count best-selling books (Counter)
book_sales = Counter(book.title for book in sales)
print("=== Best-Selling Books ===")
for title, count in book_sales.most_common(3):
print(f" {count}x {title}")
# 2. Group by category (defaultdict)
by_category = defaultdict(list)
for book in sales:
by_category[book.category].append(book.title)
print("\n=== Sales by Category ===")
for category, titles in by_category.items():
print(f" {category}: {len(titles)} sales")
# 3. Calculate revenue by category
category_revenue = defaultdict(float)
for book in sales:
category_revenue[book.category] += book.price
print("\n=== Revenue by Category ===")
for category, revenue in sorted(category_revenue.items(), key=lambda x: x[1], reverse=True):
print(f" {category}: ${revenue:.2f}")
# 4. Recently viewed (deque)
recently_viewed = deque(maxlen=3)
for book in sales[-5:]: # Last 5 sales
if book.title not in recently_viewed:
recently_viewed.append(book.title)
print("\n=== Recently Viewed ===")
for title in recently_viewed:
print(f" - {title}")
# 5. Total statistics
total_revenue = sum(book.price for book in sales)
unique_books = len(set(book.title for book in sales))
print("\n=== Summary ===")
print(f" Total sales: {len(sales)}")
print(f" Unique books: {unique_books}")
print(f" Total revenue: ${total_revenue:.2f}")
print(f" Average sale: ${total_revenue/len(sales):.2f}")Output:
=== Best-Selling Books ===
3x Python Basics
2x Data Science 101
1x Python Advanced
=== Sales by Category ===
Programming: 5 sales
Data Science: 2 sales
=== Revenue by Category ===
Programming: $189.95
Data Science: $139.97
=== Recently Viewed ===
- Python Basics
- Machine Learning
- Data Science 101
=== Summary ===
Total sales: 7
Unique books: 4
Total revenue: $329.92
Average sale: $47.13Summary
In this lesson, we learned about advanced collection types:
deque: Fast appends/pops from both ends, great for queues and recent itemsCounter: Count hashable objects, find most common, arithmetic operationsdefaultdict: Dictionary with automatic default values for missing keysnamedtuple: Immutable objects with named fields, memory-efficientChainMap: Combine multiple dictionaries with priority
When to use each:
deque: Queue-like operations, limited-size collections, fast operations at both endsCounter: Counting, frequency analysis, finding most/least commondefaultdict: Grouping, counting without checking if key existsnamedtuple: Lightweight data objects, CSV rows, API responsesChainMap: Configuration hierarchies, scope chains, multiple contexts
These specialized collections make code more readable, efficient, and Pythonic!
In the next and final lesson, we’ll explore working with dates, times, and timezones in Python.