Generators and Yield in Python
Generators create iterators using the yield keyword instead of return. They generate values on-the-fly, saving memory by not storing all values at once.
List vs Generator Comparison
# List: Stores all values in memory
def get_numbers_list(n):
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result
numbers = get_numbers_list(1000000) # Uses ~40MB memory
# Generator: Produces values one at a time
def get_numbers_gen(n):
    for i in range(n):
        yield i ** 2
numbers = get_numbers_gen(1000000) # Uses ~120 bytes!
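The size difference is easy to check with `sys.getsizeof` (exact byte counts vary by CPython version, so treat the numbers as ballpark figures):

```python
import sys

def get_numbers_gen(n):
    for i in range(n):
        yield i ** 2

squares_list = [i ** 2 for i in range(1_000_000)]
squares_gen = get_numbers_gen(1_000_000)

# getsizeof reports the list's own storage (its pointer array),
# not the int objects it references -- already several megabytes.
print(sys.getsizeof(squares_list))
# The generator object stays a fixed, tiny size no matter how large n is.
print(sys.getsizeof(squares_gen))
```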
Basic Generator
def countdown(n):
    while n > 0:
        yield n
        n -= 1

for num in countdown(5):
    print(num)
# Output: 5, 4, 3, 2, 1
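A for loop drives a generator by calling next() behind the scenes; you can do the same by hand, and an exhausted generator raises StopIteration:

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(2)
print(next(gen))  # 2
print(next(gen))  # 1
# The generator is now exhausted; one more next() raises StopIteration,
# which is exactly the signal a for loop uses to stop.
try:
    next(gen)
except StopIteration:
    print("exhausted")
```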
Reading Large Files
def read_large_file(file_path):
    """Read a file line by line without loading the entire file into memory."""
    with open(file_path) as f:
        for line in f:
            yield line.strip()
# Memory efficient even for 10GB files
for line in read_large_file('huge_file.txt'):
    process(line)
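A self-contained run of the same idea, substituting a small temporary file for the hypothetical huge_file.txt (and a list for process(), just to show the output):

```python
import os
import tempfile

def read_large_file(file_path):
    with open(file_path) as f:
        for line in f:
            yield line.strip()

# Write a small stand-in file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

# Collected into a list here only to show the result; in real use
# you would process each line and let it go out of scope.
lines = list(read_large_file(path))
print(lines)  # ['alpha', 'beta', 'gamma']
os.remove(path)
```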
Fibonacci Generator
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
# Generate first 10 Fibonacci numbers
fib = fibonacci()
for _ in range(10):
    print(next(fib))
# Output: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
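For infinite generators like this one, `itertools.islice` is a convenient way to take a bounded slice without writing the loop yourself:

```python
from itertools import islice

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice stops the infinite generator after 10 values.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```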
Generator Expression
# List comprehension (stores all)
squares = [x**2 for x in range(1000000)]
# Generator expression (lazy evaluation)
squares = (x**2 for x in range(1000000))
# Use in functions that accept iterables
total = sum(x**2 for x in range(1000000)) # Memory efficient!
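Generator expressions also pair well with short-circuiting built-ins like any() and all(), which stop consuming values as soon as the answer is known:

```python
# any() stops at the first match, so later squares are never computed.
found = any(x ** 2 > 100 for x in range(1_000_000))
print(found)  # True (stops once x reaches 11)

# sum() consumes the whole generator, one value at a time.
total = sum(x ** 2 for x in range(10))
print(total)  # 285
```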
Data Pipeline with Generators
def read_csv(filename):
    with open(filename) as f:
        for line in f:
            yield line.strip().split(',')

def filter_valid(rows):
    for row in rows:
        if len(row) == 3 and row[0]:
            yield row

def extract_names(rows):
    for row in rows:
        yield row[0]
# Chain generators efficiently
names = extract_names(filter_valid(read_csv('data.csv')))
for name in names:
    print(name)
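A self-contained run of the same pipeline, using a throwaway temporary CSV with made-up rows in place of data.csv:

```python
import os
import tempfile

def read_csv(filename):
    with open(filename) as f:
        for line in f:
            yield line.strip().split(',')

def filter_valid(rows):
    for row in rows:
        if len(row) == 3 and row[0]:
            yield row

def extract_names(rows):
    for row in rows:
        yield row[0]

# Demo data: one empty name and one short row get filtered out.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("alice,30,NYC\n,99,??\nbob,25,LA\nbad,row\n")
    path = tmp.name

names = list(extract_names(filter_valid(read_csv(path))))
print(names)  # ['alice', 'bob']
os.remove(path)
```

Each row flows through all three stages before the next row is read, so only one row is ever in memory at a time.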
When to Use Generators
- Processing large files or datasets
- Streaming data that doesn't fit in memory
- Infinite sequences (like Fibonacci)
- Data pipelines with multiple transformations
Pro Tip: Generators are lazy—they only compute values when requested. Use them for large datasets to save memory. Convert to list only when you need all values at once.
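One caveat behind that last point: a generator can be consumed only once, so materialize it with list() up front if you need to iterate more than one time:

```python
gen = (x ** 2 for x in range(5))
print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # [] -- the generator is already exhausted

values = list(x ** 2 for x in range(5))  # materialized once
print(values[2], values[-1])  # now reusable and indexable
```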