Lesson 8 - Generators and Memory-Efficient Processing
On this page
- Introduction
- Generator Functions
- Comparing Class Iterator vs Generator
- Practical Example: Reading Large Files
- Generator Expressions
- Chaining Generators
- Infinite Generators
- Sending Values into Generators
- Generator Methods: send(), throw(), close()
- Using yield from
- Practical Example: Data Processing Pipeline
- The itertools Module
- Summary
Introduction
In the previous lesson, we learned about iterators. Creating iterators with classes requires writing __iter__ and __next__ methods, which can be verbose. Python provides a simpler way: generators.
Generators are a special kind of iterator, most often created by a function that contains the yield keyword. Generators are:
- Easier to write than class-based iterators
- Memory-efficient (values are produced on demand instead of being stored up front)
- Well suited to processing large datasets
- Able to represent infinite sequences
In this lesson, we’ll explore:
- Generator functions and the yield keyword
- Generator expressions
- Sending values into generators
- Using generators for data pipelines
- The itertools module for generator utilities
Generator Functions
A generator function looks like a normal function but uses yield instead of return:
```python
def count_up(start, end):
    """Generator that counts from start to end"""
    current = start
    while current <= end:
        yield current
        current += 1

# Create a generator
counter = count_up(1, 5)

# It's an iterator!
print(next(counter))  # 1
print(next(counter))  # 2

# Use in a for loop
for num in count_up(10, 13):
    print(num)
```
Output:
```
1
2
10
11
12
13
```
How yield Works
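When a function contains yield:
- Calling the function returns a generator object; none of the function body runs yet
- Calling next() on the generator runs the body until the first yield
- The value after yield becomes the result of that next() call
- The function pauses, and its local variables and current position are saved
- The next call to next() resumes right after the yield where it left off
- When the body finishes (or reaches a return), next() raises StopIteration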
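The steps above can be observed directly. Below is a minimal sketch using the standard library's inspect module, whose getgeneratorstate() and getgeneratorlocals() functions report where a generator is in its lifecycle and which local variables it has saved. It redefines count_up so the snippet runs on its own.

```python
import inspect

def count_up(start, end):
    """Same generator as above: counts from start to end"""
    current = start
    while current <= end:
        yield current
        current += 1

gen = count_up(1, 2)
print(inspect.getgeneratorstate(gen))  # GEN_CREATED: none of the body has run yet

print(next(gen))                       # runs up to the first yield: 1
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: paused at the yield
print(dict(inspect.getgeneratorlocals(gen)))  # saved locals: start, end, current

print(next(gen))                       # resumes after the yield: 2
print(next(gen, "done"))               # body finishes; the default replaces StopIteration
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED
```

Passing a default to next() is a convenient way to detect the end of a generator without catching StopIteration yourself.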
```python
def simple_generator():
    print("Starting")
    yield 1
    print("Between yields")
    yield 2
    print("Ending")
    yield 3

gen = simple_generator()
print("Generator created")
print(next(gen))
print(next(gen))
print(next(gen))
```
Output:
```
Generator created
Starting
1
Between yields
2
Ending
3
```
Comparing Class Iterator vs Generator
Here’s the same functionality as a class-based iterator and a generator:
```python
# Class-based iterator (verbose)
class CounterIterator:
    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        return self

    def __next__(self):
        if self.current > self.end:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

# Generator (simple!)
def counter_generator(start, end):
    current = start
    while current <= end:
        yield current
        current += 1

# Both work the same way
for num in CounterIterator(1, 3):
    print(num, end=" ")
print()
for num in counter_generator(1, 3):
    print(num, end=" ")
```
Output:
```
1 2 3
1 2 3
```
The generator is much more concise: Python supplies __iter__ and __next__ for you and raises StopIteration automatically when the function ends.
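One practical consequence of generators being iterators is that they are single-pass: iter() on a generator returns the generator itself, and once it has been exhausted it stays exhausted. A short sketch, redefining counter_generator so it runs standalone:

```python
def counter_generator(start, end):
    current = start
    while current <= end:
        yield current
        current += 1

gen = counter_generator(1, 3)
print(iter(gen) is gen)   # True: a generator is its own iterator

first_pass = list(gen)    # consumes every value
second_pass = list(gen)   # nothing left: the generator is exhausted
print(first_pass, second_pass)  # [1, 2, 3] []

# To iterate again, call the generator function again for a fresh generator
print(list(counter_generator(1, 3)))  # [1, 2, 3]
```

If you need the values more than once, either call the generator function again or store the values in a list, at the cost of the memory savings.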
Practical Example: Reading Large Files
Generators are perfect for processing large files line-by-line without loading everything into memory:
```python
def read_books(filename):
    """Generator that yields book info from a CSV file"""
    try:
        with open(filename, 'r') as f:
            # Skip the header row. The default of None avoids an unhandled
            # StopIteration (turned into RuntimeError inside a generator) on an empty file.
            next(f, None)
            for line in f:
                line = line.strip()
                if line:
                    # Naive split: assumes no field contains a comma
                    parts = line.split(',')
                    yield {
                        'title': parts[0],
                        'author': parts[1],
                        'price': float(parts[2])
                    }
    except FileNotFoundError:
        print(f"File {filename} not found")

# Process books one at a time (memory-efficient)
for book in read_books('books.csv'):
    if book['price'] < 40:
        print(f"{book['title']} by {book['author']} - ${book['price']}")
```
This reads one line at a time, never loading the entire file into memory.
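The naive line.split(',') in read_books() breaks as soon as a field contains a comma, for example a quoted title. Below is a sketch of a variant built on the standard csv module, which handles quoting. The function name read_books_csv, the column names, and the sample rows are hypothetical; the sample is written to a temporary file only so the snippet runs on its own. csv.DictReader still reads one row at a time, so the generator stays memory-efficient.

```python
import csv
import os
import tempfile

def read_books_csv(filename):
    """Hypothetical variant of read_books(): yield one dict per row using the csv module."""
    with open(filename, newline='', encoding='utf-8') as f:
        # DictReader takes column names from the header row and yields rows lazily
        for row in csv.DictReader(f):
            yield {
                'title': row['title'],
                'author': row['author'],
                'price': float(row['price']),
            }

# Hypothetical sample data: the first title contains a comma inside quotes,
# which line.split(',') would split into two fields
sample = (
    'title,author,price\n'
    '"Python Basics, 2nd Edition",John Doe,29.99\n'
    'Python Advanced,Jane Smith,49.99\n'
    'Data Science 101,Bob Johnson,39.99\n'
)

# Write the sample to a temporary file so the snippet is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='', encoding='utf-8') as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    cheap = [book['title'] for book in read_books_csv(path) if book['price'] < 40]
    print(cheap)
finally:
    os.remove(path)
```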
Generator Expressions
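Generator expressions look like list comprehensions, but they use parentheses and produce a generator instead of a list:
```python
import sys

# List comprehension - builds the entire list in memory
squares_list = [x**2 for x in range(1000000)]

# Generator expression - produces values on demand
squares_gen = (x**2 for x in range(1000000))

# The generator object stays small no matter how many values it will produce
print(f"List size: {sys.getsizeof(squares_list)} bytes")
print(f"Generator size: {sys.getsizeof(squares_gen)} bytes")

# Both work in for loops
for num in (x**2 for x in range(5)):
    print(num, end=" ")
```
Output (exact sizes vary by Python version and platform):
```
List size: 8000056 bytes
Generator size: 112 bytes
0 1 4 9 16
```
Note that sys.getsizeof() reports only the list object itself (its array of references), not the integer objects it holds, so the list's real memory footprint is even larger.
Practical Use: Filtering Books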
```python
books = [
    {'title': 'Python Basics', 'price': 29.99},
    {'title': 'Python Advanced', 'price': 49.99},
    {'title': 'Data Science 101', 'price': 39.99},
    {'title': 'Machine Learning', 'price': 59.99},
]

# Generator expression for affordable books
affordable = (book for book in books if book['price'] < 40)

# Memory-efficient iteration
for book in affordable:
    print(f"{book['title']}: ${book['price']}")
```
Output:
```
Python Basics: $29.99
Data Science 101: $39.99
```
Chaining Generators
Generators can be chained to create data processing pipelines:
```python
def read_books():
    """Simulate reading books from a database"""
    books = [
        {'title': 'Python Basics', 'author': 'John Doe', 'price': 29.99, 'rating': 4.5},
        {'title': 'Python Advanced', 'author': 'Jane Smith', 'price': 49.99, 'rating': 4.8},
        {'title': 'Data Science 101', 'author': 'Bob Johnson', 'price': 39.99, 'rating': 4.2},
        {'title': 'Machine Learning', 'author': 'Alice Williams', 'price': 59.99, 'rating': 4.9},
        {'title': 'Web Development', 'author': 'Charlie Brown', 'price': 34.99, 'rating': 3.8},
    ]
    for book in books:
        yield book

def filter_by_price(books, max_price):
    """Filter books by maximum price"""
    for book in books:
        if book['price'] <= max_price:
            yield book

def filter_by_rating(books, min_rating):
    """Filter books by minimum rating"""
    for book in books:
        if book['rating'] >= min_rating:
            yield book

def format_book(books):
    """Format book information"""
    for book in books:
        yield f"'{book['title']}' by {book['author']} - ${book['price']} (⭐ {book['rating']})"

# Create a processing pipeline
all_books = read_books()
affordable_books = filter_by_price(all_books, 40)
highly_rated = filter_by_rating(affordable_books, 4.3)
formatted = format_book(highly_rated)

# Process the pipeline
print("Affordable, highly-rated books:")
for book_info in formatted:
    print(f"  {book_info}")
```
Output:
```
Affordable, highly-rated books:
  'Python Basics' by John Doe - $29.99 (⭐ 4.5)
```
Only 'Python Basics' survives both filters: 'Python Advanced' costs more than $40, while 'Data Science 101' and 'Web Development' pass the price filter but fall below the 4.3 rating threshold.
Each generator pulls one book at a time from the stage before it, so only one book is in flight at any moment. Here the source is a small in-memory list, but if read_books() streamed from a file or database instead, memory use would stay flat even for millions of books.
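For simple filter-and-transform stages, the same pipeline can also be sketched with generator expressions instead of separate generator functions. This version repeats the sample book list from above so it runs standalone. It also shows that a generator expression passed as the only argument to a function such as sum() needs no extra parentheses.

```python
books = [
    {'title': 'Python Basics', 'author': 'John Doe', 'price': 29.99, 'rating': 4.5},
    {'title': 'Python Advanced', 'author': 'Jane Smith', 'price': 49.99, 'rating': 4.8},
    {'title': 'Data Science 101', 'author': 'Bob Johnson', 'price': 39.99, 'rating': 4.2},
    {'title': 'Machine Learning', 'author': 'Alice Williams', 'price': 59.99, 'rating': 4.9},
    {'title': 'Web Development', 'author': 'Charlie Brown', 'price': 34.99, 'rating': 3.8},
]

# Each stage is a generator expression over the previous stage; nothing runs yet
affordable = (b for b in books if b['price'] <= 40)
highly_rated = (b for b in affordable if b['rating'] >= 4.3)
formatted = (
    f"'{b['title']}' by {b['author']} - ${b['price']} (⭐ {b['rating']})"
    for b in highly_rated
)

# Consuming the last stage pulls books through the whole chain
results = list(formatted)
print(results)

# Like any generator, the chain is single-use
print(list(formatted))  # []

# A generator expression as the sole argument needs no extra parentheses
cheap_count = sum(1 for b in books if b['price'] <= 40)
print(cheap_count)  # 3
```

Generator functions remain the better choice once a stage needs several statements, state, or error handling; generator expressions suit one-line filters and transforms.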
Infinite Generators
Generators can produce infinite sequences:
```python
def infinite_counter(start=0):
    """Count forever"""
    num = start
    while True:
        yield num
        num += 1

# Use with break to avoid an infinite loop
counter = infinite_counter(1)
for num in counter:
    if num > 5:
        break
    print(num)
```
Output:
```
1
2
3
4
5
```
Practical: Infinite Book ID Generator
```python
def book_id_generator(prefix="BOOK"):
    """Generate unique book IDs forever"""
    counter = 1
    while True:
        yield f"{prefix}-{counter:05d}"
        counter += 1

# Use it to assign IDs to books
id_gen = book_id_generator("LIB")
books = ["Python Basics", "Python Advanced", "Data Science 101"]
for book in books:
    book_id = next(id_gen)
    print(f"{book_id}: {book}")
```
Output:
```
LIB-00001: Python Basics
LIB-00002: Python Advanced
LIB-00003: Data Science 101
```
Sending Values into Generators
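Generators can also receive values through the send() method. Inside the generator, yield becomes an expression: the value passed to send() is what the paused yield evaluates to, and send() returns the next value the generator yields. A freshly created generator has not reached any yield yet, so it must first be advanced with next() (or send(None)); this is called priming.
```python
def running_average():
    """Calculate running average"""
    total = 0
    count = 0
    average = None
    while True:
        # Receive a value
        value = yield average
        if value is not None:
            total += value
            count += 1
            average = total / count

# Use the generator
avg = running_average()
next(avg)  # Prime the generator
print(avg.send(100))  # 100.0
print(avg.send(150))  # 125.0
print(avg.send(200))  # 150.0
print(avg.send(50))   # 125.0
```
Output:
```
100.0
125.0
150.0
125.0
```
Generator Methods: send(), throw(), close()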
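Generators have three methods for controlling them from outside:
- send(value) resumes the generator and makes value the result of the paused yield expression
- throw(exception) raises exception inside the generator, at the paused yield
- close() raises GeneratorExit at the paused yield so the generator can clean up, then finishes it
```python
def book_processor():
    """Process books, recover from errors thrown in, and report on close"""
    books_processed = 0
    try:
        while True:
            try:
                book = yield books_processed
            except ValueError as e:
                # throw() raises the exception here, at the paused yield
                print(f"Error: {e}")
                continue
            if book:
                print(f"Processing: {book}")
                books_processed += 1
    except GeneratorExit:
        # close() raises GeneratorExit at the paused yield
        print(f"Processed {books_processed} books total")

processor = book_processor()
next(processor)  # Prime it
print(processor.send("Python Basics"))
print(processor.send("Python Advanced"))

# Throw an exception into the generator; it recovers and yields its count again
print(processor.throw(ValueError("missing title")))

# Close the generator
processor.close()
# processor.send("Another book")  # Would raise StopIteration
```
Output:
```
Processing: Python Basics
1
Processing: Python Advanced
2
Error: missing title
2
Processed 2 books total
```
Using yield from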
The yield from expression delegates to another generator (or any iterable), yielding each of its values in turn:
```python
def read_programming_books():
    """Yield programming books"""
    yield "Python Basics"
    yield "JavaScript Intro"
    yield "Go Programming"

def read_data_books():
    """Yield data science books"""
    yield "Data Science 101"
    yield "Statistics Fundamentals"
    yield "Machine Learning"

def read_all_books():
    """Yield all books using yield from"""
    yield from read_programming_books()
    yield from read_data_books()

print("All books:")
for book in read_all_books():
    print(f"  - {book}")
```
Output:
```
All books:
  - Python Basics
  - JavaScript Intro
  - Go Programming
  - Data Science 101
  - Statistics Fundamentals
  - Machine Learning
```
This is shorter than writing a loop such as for book in read_programming_books(): yield book, and unlike that loop, yield from also passes send() and throw() calls through to the inner generator.
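yield from does more than forward values outward: it also passes send() calls through to the inner generator, and the whole yield from expression evaluates to the inner generator's return value. A minimal sketch with hypothetical averager() and collect_average() functions:

```python
def averager():
    """Hypothetical subgenerator: averages sent values until None arrives."""
    total = 0.0
    count = 0
    while True:
        value = yield          # receives whatever the caller sends
        if value is None:
            break
        total += value
        count += 1
    return total / count if count else None  # becomes the value of 'yield from'

def collect_average(results):
    """Hypothetical delegating generator."""
    # send() calls on this generator go straight through to averager()
    result = yield from averager()
    results.append(result)

results = []
gen = collect_average(results)
next(gen)           # prime: runs into averager() up to its first yield
gen.send(100)
gen.send(150)
try:
    gen.send(None)  # averager() returns; collect_average() then finishes
except StopIteration:
    pass
print(results)      # [125.0]
```

The final send(None) ends the inner loop, so averager() returns 125.0; collect_average() stores it and then finishes, which is why the caller sees StopIteration.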
Practical Example: Data Processing Pipeline
Let’s build a realistic data processing pipeline using generators:
```python
import csv
from io import StringIO

# Simulated CSV data
csv_data = """title,author,price,pages
Python Crash Course,Eric Matthes,39.99,544
Automate the Boring Stuff,Al Sweigart,29.99,504
Learning Python,Mark Lutz,64.99,1648
Python Basics,John Doe,24.99,320
Data Science 101,Jane Smith,44.99,420"""

def read_csv_data(csv_text):
    """Generator: Read CSV data"""
    reader = csv.DictReader(StringIO(csv_text))
    for row in reader:
        yield row

def parse_numeric_fields(books):
    """Generator: Convert numeric strings to numbers"""
    for book in books:
        book['price'] = float(book['price'])
        book['pages'] = int(book['pages'])
        yield book

def calculate_price_per_page(books):
    """Generator: Add price per page field"""
    for book in books:
        book['price_per_page'] = book['price'] / book['pages']
        yield book

def filter_good_deals(books, max_price_per_page=0.08):
    """Generator: Filter books with good price per page ratio"""
    for book in books:
        if book['price_per_page'] <= max_price_per_page:
            yield book

def format_output(books):
    """Generator: Format books for display"""
    for book in books:
        yield (
            f"'{book['title']}' by {book['author']}\n"
            f"  ${book['price']:.2f} for {book['pages']} pages "
            f"(${book['price_per_page']:.4f}/page)"
        )

# Build the pipeline
pipeline = read_csv_data(csv_data)
pipeline = parse_numeric_fields(pipeline)
pipeline = calculate_price_per_page(pipeline)
pipeline = filter_good_deals(pipeline, max_price_per_page=0.10)
pipeline = format_output(pipeline)

# Execute the pipeline
print("Best Value Books:")
for book_info in pipeline:
    print(book_info)
```
Output:
```
Best Value Books:
'Python Crash Course' by Eric Matthes
  $39.99 for 544 pages ($0.0735/page)
'Automate the Boring Stuff' by Al Sweigart
  $29.99 for 504 pages ($0.0595/page)
'Learning Python' by Mark Lutz
  $64.99 for 1648 pages ($0.0394/page)
'Python Basics' by John Doe
  $24.99 for 320 pages ($0.0781/page)
```
Only 'Data Science 101' ($0.1071/page) is filtered out. Each stage handles one book at a time, so memory use stays flat no matter how many rows the source produces. Here the source is a small string held in memory; pointing csv.DictReader at an open file instead would stream a large file through the same pipeline row by row.
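To see what "one book at a time" means in practice, here is a small hypothetical two-stage pipeline (source() and double() are illustrative names, not part of the example above) that records the order in which work happens. Each item travels through every stage before the next item is read.

```python
log = []

def source(items):
    """First stage: hands out items one at a time."""
    for item in items:
        log.append(f"read {item}")
        yield item

def double(values):
    """Second stage: transforms each item as it arrives."""
    for v in values:
        log.append(f"double {v}")
        yield v * 2

for result in double(source([1, 2])):
    log.append(f"got {result}")

print("\n".join(log))
```

The log reads read 1, double 1, got 2, read 2, double 2, got 4. With lists in place of generators, every read would happen before any doubling.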
The itertools Module
Python’s itertools module provides many powerful generator utilities:
```python
import itertools

books = ["Python Basics", "Python Advanced", "Data Science 101"]
prices = [29.99, 49.99, 39.99]

# chain - combine iterables
all_items = itertools.chain(books, ["Statistics", "ML"])
print("Chained:", list(all_items))

# zip_longest - like zip(), but continues until the longest input is exhausted,
# filling gaps with fillvalue (these two lists happen to be the same length)
for book, price in itertools.zip_longest(books, prices, fillvalue=0):
    print(f"{book}: ${price}")

# cycle - repeat infinitely
print("\nCycle (first 5):")
for i, book in enumerate(itertools.cycle(books)):
    if i >= 5:
        break
    print(f"  {book}")

# islice - slice an iterable
print("\nFirst 2 books:")
for book in itertools.islice(books, 2):
    print(f"  {book}")

# count - infinite counter
print("\nWith IDs:")
for book_id, book in zip(itertools.count(1), books):
    print(f"  {book_id}. {book}")
```
Output:
```
Chained: ['Python Basics', 'Python Advanced', 'Data Science 101', 'Statistics', 'ML']
Python Basics: $29.99
Python Advanced: $49.99
Data Science 101: $39.99

Cycle (first 5):
  Python Basics
  Python Advanced
  Data Science 101
  Python Basics
  Python Advanced

First 2 books:
  Python Basics
  Python Advanced

With IDs:
  1. Python Basics
  2. Python Advanced
  3. Data Science 101
```
Summary
In this lesson, we learned about generators:
- Generator functions use yield instead of return
- yield pauses execution and returns a value
- Generators are iterators but much easier to write
- Generator expressions are like list comprehensions but memory-efficient
- Generators can be chained to create data pipelines
- Generators can be infinite
- send(), throw(), and close() provide advanced control
- yield from delegates to another generator
- The itertools module provides powerful generator utilities
Generators are essential for:
- Processing large files or datasets
- Creating memory-efficient data pipelines
- Working with infinite sequences
- Lazy evaluation (compute only when needed)
When to use generators:
- Processing large amounts of data
- Data doesn’t need to be in memory all at once
- Creating sequences (finite or infinite)
- Building data processing pipelines
In the next lesson, we’ll learn about context managers and the with statement—tools for managing resources safely.