Lesson 11 - Advanced Collections and Data Structures

Introduction

Python’s built-in types (list, dict, set, tuple) are powerful, but the collections module provides specialized data structures that solve specific problems more elegantly and efficiently.

In this lesson, we’ll explore:

  • deque: Double-ended queue for fast appends/pops from both ends
  • Counter: Count hashable objects easily
  • defaultdict: Dictionary with default values
  • namedtuple: Lightweight objects with named fields
  • ChainMap: Combine multiple dictionaries

These tools make your code more readable, efficient, and Pythonic.

deque: Double-Ended Queue

A deque (pronounced “deck”) is like a list but optimized for adding/removing items from both ends.

Why Use deque?

# List: slow for operations at the beginning
my_list = [1, 2, 3, 4, 5]
my_list.insert(0, 0)  # O(n) - slow, shifts all elements

# deque: fast for both ends
from collections import deque

my_deque = deque([1, 2, 3, 4, 5])
my_deque.appendleft(0)  # O(1) - fast!

Basic deque Operations

from collections import deque

# Create a deque
books = deque(['Python Basics', 'Python Advanced'])

# Add to right end
books.append('Data Science 101')
print(books)  # deque(['Python Basics', 'Python Advanced', 'Data Science 101'])

# Add to left end
books.appendleft('Programming 101')
print(books)  # deque(['Programming 101', 'Python Basics', 'Python Advanced', 'Data Science 101'])

# Remove from right end
last = books.pop()
print(f"Removed: {last}")  # Data Science 101

# Remove from left end
first = books.popleft()
print(f"Removed: {first}")  # Programming 101

print(books)  # deque(['Python Basics', 'Python Advanced'])

Practical Example: Recently Viewed Books

from collections import deque

class RecentlyViewed:
    def __init__(self, max_size=5):
        self.recent = deque(maxlen=max_size)  # Auto-removes oldest when full

    def view(self, book):
        # Remove if already present (to move to front)
        if book in self.recent:
            self.recent.remove(book)

        self.recent.append(book)

    def get_recent(self):
        return list(self.recent)

# Use it
viewer = RecentlyViewed(max_size=3)

viewer.view("Python Basics")
viewer.view("Python Advanced")
viewer.view("Data Science 101")
print(viewer.get_recent())  # All 3 books

viewer.view("Machine Learning")  # Oldest (Python Basics) is auto-removed
print(viewer.get_recent())  # Only last 3

viewer.view("Python Advanced")  # Move to front
print(viewer.get_recent())

Output:

['Python Basics', 'Python Advanced', 'Data Science 101']
['Python Advanced', 'Data Science 101', 'Machine Learning']
['Data Science 101', 'Machine Learning', 'Python Advanced']

Rotating Elements

from collections import deque

books = deque(['Book 1', 'Book 2', 'Book 3', 'Book 4'])

# Rotate right
books.rotate(1)
print(books)  # deque(['Book 4', 'Book 1', 'Book 2', 'Book 3'])

# Rotate left
books.rotate(-2)
print(books)  # deque(['Book 2', 'Book 3', 'Book 4', 'Book 1'])

Counter: Counting Made Easy

Counter is a dictionary subclass for counting hashable objects.

Basic Counter Usage

from collections import Counter

# Count items in a list
books = ['Python', 'Java', 'Python', 'C++', 'Python', 'Java']
book_counts = Counter(books)

print(book_counts)  # Counter({'Python': 3, 'Java': 2, 'C++': 1})
print(book_counts['Python'])  # 3
print(book_counts['Ruby'])  # 0 (default for missing keys)

Counting Characters

from collections import Counter

title = "Python Programming"
char_counts = Counter(title.lower())

print(char_counts)
print(f"Most common: {char_counts.most_common(3)}")

Output:

Counter({'p': 2, 'r': 2, 'o': 2, 'm': 2, 'y': 1, 't': 1, 'h': 1, 'n': 1, ' ': 1, 'g': 1, 'i': 1, 'a': 1})
Most common: [('p', 2), ('r', 2), ('o', 2)]

Practical Example: Word Frequency Analysis

from collections import Counter
import re

text = """
Python is a great programming language. Python is easy to learn.
Many people use Python for data science. Python is powerful.
"""

# Extract words (lowercase, alphabetic only)
words = re.findall(r'\b[a-z]+\b', text.lower())

# Count word frequencies
word_freq = Counter(words)

print("Top 5 most common words:")
for word, count in word_freq.most_common(5):
    print(f"  {word}: {count}")

# Total unique words
print(f"\nTotal unique words: {len(word_freq)}")

# Total words
print(f"Total words: {sum(word_freq.values())}")

Output:

Top 5 most common words:
  python: 4
  is: 3
  to: 1
  a: 1
  great: 1

Total unique words: 16
Total words: 23

Counter Arithmetic

from collections import Counter

# Sales from Store 1 and Store 2
store1 = Counter({'Python Basics': 10, 'Python Advanced': 5, 'Data Science': 8})
store2 = Counter({'Python Basics': 7, 'Python Advanced': 12, 'Machine Learning': 6})

# Combine sales
total_sales = store1 + store2
print("Total sales:", total_sales)

# Difference
difference = store1 - store2
print("Store 1 sold more:", difference)

# Intersection (minimum)
common = store1 & store2
print("Minimum from both:", common)

# Union (maximum)
max_sales = store1 | store2
print("Maximum from either:", max_sales)

Output:

Total sales: Counter({'Python Advanced': 17, 'Python Basics': 17, 'Data Science': 8, 'Machine Learning': 6})
Store 1 sold more: Counter({'Data Science': 8})
Minimum from both: Counter({'Python Basics': 7, 'Python Advanced': 5})
Maximum from either: Counter({'Python Advanced': 12, 'Python Basics': 10, 'Data Science': 8, 'Machine Learning': 6})

defaultdict: Dictionary with Default Values

defaultdict provides default values for missing keys, avoiding KeyError.

Basic defaultdict Usage

from collections import defaultdict

# Regular dict - raises KeyError
regular_dict = {}
# regular_dict['unknown'] += 1  # KeyError!

# defaultdict with default int (0)
book_counts = defaultdict(int)
book_counts['Python Basics'] += 1
book_counts['Python Basics'] += 1
book_counts['Python Advanced'] += 1

print(book_counts)  # defaultdict(<class 'int'>, {'Python Basics': 2, 'Python Advanced': 1})

Grouping with defaultdict(list)

from collections import defaultdict

books = [
    {'title': 'Python Basics', 'category': 'Programming'},
    {'title': 'Java Intro', 'category': 'Programming'},
    {'title': 'Data Science 101', 'category': 'Data Science'},
    {'title': 'Statistics', 'category': 'Data Science'},
    {'title': 'Python Advanced', 'category': 'Programming'},
]

# Group books by category
by_category = defaultdict(list)

for book in books:
    by_category[book['category']].append(book['title'])

for category, titles in by_category.items():
    print(f"\n{category}:")
    for title in titles:
        print(f"  - {title}")

Output:

Programming:
  - Python Basics
  - Java Intro
  - Python Advanced

Data Science:
  - Data Science 101
  - Statistics

Practical Example: Inverted Index

from collections import defaultdict

# Build an inverted index (which books contain which words)
books = {
    'Book 1': 'Python programming basics',
    'Book 2': 'Advanced Python techniques',
    'Book 3': 'Data science with Python',
}

inverted_index = defaultdict(list)

for book_id, text in books.items():
    for word in text.lower().split():
        inverted_index[word].append(book_id)

# Find books containing specific words
def search_books(word):
    return inverted_index.get(word.lower(), [])

print("Books with 'python':", search_books('python'))
print("Books with 'data':", search_books('data'))
print("Books with 'java':", search_books('java'))

Output:

Books with 'python': ['Book 1', 'Book 2', 'Book 3']
Books with 'data': ['Book 3']
Books with 'java': []

namedtuple: Lightweight Objects

namedtuple creates tuple subclasses with named fields—combining tuple efficiency with attribute access.

Basic namedtuple Usage

from collections import namedtuple

# Define a Book type
Book = namedtuple('Book', ['title', 'author', 'price'])

# Create instances
book1 = Book('Python Basics', 'John Doe', 29.99)
book2 = Book(title='Python Advanced', author='Jane Smith', price=49.99)

# Access by name (like an object)
print(f"{book1.title} by {book1.author}: ${book1.price}")

# Access by index (like a tuple)
print(f"{book1[0]} by {book1[1]}: ${book1[2]}")

# Immutable (like a tuple)
# book1.price = 24.99  # AttributeError!

Output:

Python Basics by John Doe: $29.99
Python Basics by John Doe: $29.99

Practical Example: Book Catalog

from collections import namedtuple

Book = namedtuple('Book', ['title', 'author', 'price', 'year'])

catalog = [
    Book('Python Crash Course', 'Eric Matthes', 39.99, 2019),
    Book('Automate the Boring Stuff', 'Al Sweigart', 29.99, 2020),
    Book('Learning Python', 'Mark Lutz', 64.99, 2013),
]

# Sort by price
by_price = sorted(catalog, key=lambda b: b.price)

print("Books sorted by price:")
for book in by_price:
    print(f"  ${book.price:6.2f} - {book.title} ({book.year})")

# Filter by year
recent_books = [b for b in catalog if b.year >= 2019]

print("\nRecent books (2019+):")
for book in recent_books:
    print(f"  {book.title} by {book.author}")

Output:

Books sorted by price:
  $ 29.99 - Automate the Boring Stuff (2020)
  $ 39.99 - Python Crash Course (2019)
  $ 64.99 - Learning Python (2013)

Recent books (2019+):
  Python Crash Course by Eric Matthes
  Automate the Boring Stuff by Al Sweigart

namedtuple Methods

from collections import namedtuple

Book = namedtuple('Book', ['title', 'author', 'price'])

book = Book('Python Basics', 'John Doe', 29.99)

# Convert to dictionary
print(book._asdict())  # {'title': 'Python Basics', 'author': 'John Doe', 'price': 29.99}

# Replace fields (returns new instance)
cheaper_book = book._replace(price=19.99)
print(cheaper_book)  # Book(title='Python Basics', author='John Doe', price=19.99)

# Create from sequence
book_data = ['Python Advanced', 'Jane Smith', 49.99]
book2 = Book._make(book_data)
print(book2)  # Book(title='Python Advanced', author='Jane Smith', price=49.99)

ChainMap: Combine Multiple Dictionaries

ChainMap groups multiple dictionaries into a single view.

Basic ChainMap Usage

from collections import ChainMap

# Default prices
default_prices = {'Python Basics': 29.99, 'Java Intro': 34.99}

# Sale prices (override defaults)
sale_prices = {'Python Basics': 19.99}

# Member prices (further override)
member_prices = {}

# Chain them (searches left to right)
all_prices = ChainMap(member_prices, sale_prices, default_prices)

print(all_prices['Python Basics'])  # 19.99 (from sale_prices)
print(all_prices['Java Intro'])  # 34.99 (from default_prices)

# Add member price
member_prices['Python Basics'] = 14.99
print(all_prices['Python Basics'])  # 14.99 (from member_prices now)

Output:

19.99
34.99
14.99

Practical Example: Configuration Hierarchy

from collections import ChainMap

# System defaults
system_config = {
    'theme': 'light',
    'font_size': 12,
    'auto_save': True,
}

# User preferences
user_config = {
    'theme': 'dark',
    'font_size': 14,
}

# Session overrides
session_config = {
    'font_size': 16,
}

# Combine (session > user > system)
config = ChainMap(session_config, user_config, system_config)

print(f"Theme: {config['theme']}")  # dark (from user_config)
print(f"Font size: {config['font_size']}")  # 16 (from session_config)
print(f"Auto-save: {config['auto_save']}")  # True (from system_config)

# All keys from all dicts
print(f"\nAll settings: {dict(config)}")

Output:

Theme: dark
Font size: 16
Auto-save: True

All settings: {'font_size': 16, 'theme': 'dark', 'auto_save': True}

Comprehensive Example: Book Store Analytics

Let’s combine multiple collection types:

from collections import Counter, defaultdict, namedtuple, deque

# Define Book type
Book = namedtuple('Book', ['title', 'author', 'category', 'price'])

# Sample sales data
sales = [
    Book('Python Basics', 'John Doe', 'Programming', 29.99),
    Book('Python Advanced', 'Jane Smith', 'Programming', 49.99),
    Book('Python Basics', 'John Doe', 'Programming', 29.99),
    Book('Data Science 101', 'Bob Johnson', 'Data Science', 39.99),
    Book('Python Basics', 'John Doe', 'Programming', 29.99),
    Book('Machine Learning', 'Alice Williams', 'Data Science', 59.99),
    Book('Data Science 101', 'Bob Johnson', 'Data Science', 39.99),
]

# 1. Count best-selling books (Counter)
book_sales = Counter(book.title for book in sales)
print("=== Best-Selling Books ===")
for title, count in book_sales.most_common(3):
    print(f"  {count}x {title}")

# 2. Group by category (defaultdict)
by_category = defaultdict(list)
for book in sales:
    by_category[book.category].append(book.title)

print("\n=== Sales by Category ===")
for category, titles in by_category.items():
    print(f"  {category}: {len(titles)} sales")

# 3. Calculate revenue by category
category_revenue = defaultdict(float)
for book in sales:
    category_revenue[book.category] += book.price

print("\n=== Revenue by Category ===")
for category, revenue in sorted(category_revenue.items(), key=lambda x: x[1], reverse=True):
    print(f"  {category}: ${revenue:.2f}")

# 4. Recently viewed (deque)
recently_viewed = deque(maxlen=3)
for book in sales[-5:]:  # Last 5 sales
    if book.title not in recently_viewed:
        recently_viewed.append(book.title)

print("\n=== Recently Viewed ===")
for title in recently_viewed:
    print(f"  - {title}")

# 5. Total statistics
total_revenue = sum(book.price for book in sales)
unique_books = len(set(book.title for book in sales))

print("\n=== Summary ===")
print(f"  Total sales: {len(sales)}")
print(f"  Unique books: {unique_books}")
print(f"  Total revenue: ${total_revenue:.2f}")
print(f"  Average sale: ${total_revenue/len(sales):.2f}")

Output:

=== Best-Selling Books ===
  3x Python Basics
  2x Data Science 101
  1x Python Advanced

=== Sales by Category ===
  Programming: 5 sales
  Data Science: 2 sales

=== Revenue by Category ===
  Programming: $189.95
  Data Science: $139.97

=== Recently Viewed ===
  - Python Basics
  - Machine Learning
  - Data Science 101

=== Summary ===
  Total sales: 7
  Unique books: 4
  Total revenue: $329.92
  Average sale: $47.13

Summary

In this lesson, we learned about advanced collection types:

  • deque: Fast appends/pops from both ends, great for queues and recent items
  • Counter: Count hashable objects, find most common, arithmetic operations
  • defaultdict: Dictionary with automatic default values for missing keys
  • namedtuple: Immutable objects with named fields, memory-efficient
  • ChainMap: Combine multiple dictionaries with priority

When to use each:

  • deque: Queue-like operations, limited-size collections, fast operations at both ends
  • Counter: Counting, frequency analysis, finding most/least common
  • defaultdict: Grouping, counting without checking if key exists
  • namedtuple: Lightweight data objects, CSV rows, API responses
  • ChainMap: Configuration hierarchies, scope chains, multiple contexts

These specialized collections make code more readable, efficient, and Pythonic!

In the next and final lesson, we’ll explore working with dates, times, and timezones in Python.