Useful Data Tips

Generators and Yield in Python

A generator function uses the yield keyword instead of return, and calling it returns an iterator (a generator object). Values are produced on the fly, one per request, so memory use stays small because the full sequence is never stored at once.
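To make "on the fly" concrete, here is a minimal sketch that records when the generator body actually runs. The function `traced_squares` and the `log` list are illustrative names, not part of any library.

```python
def traced_squares(n, log):
    """Yield i*i for i in range(n), recording in `log` when the body runs."""
    log.append("started")
    for i in range(n):
        log.append(f"computing {i}")
        yield i * i
    log.append("finished")


log = []
gen = traced_squares(3, log)  # Creates a generator object; no body code runs yet
print(log)                    # []

first = next(gen)             # Runs up to the first yield, then pauses
print(first, log)             # 0 ['started', 'computing 0']

rest = list(gen)              # Resumes where it paused and runs to the end
print(rest, log[-1])          # [1, 4] finished
```

Nothing inside the function executes until a value is requested, and execution pauses at each yield until the next request.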

List vs Generator Comparison

# List: Stores all values in memory
def get_numbers_list(n):
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

numbers = get_numbers_list(1000000)  # ~40 MB: the list plus a million int objects

# Generator: Produces values one at a time
def get_numbers_gen(n):
    for i in range(n):
        yield i ** 2

numbers = get_numbers_gen(1000000)  # A small, fixed-size generator object, whatever n is
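One way to check the memory claim yourself is `sys.getsizeof`. It reports only the container's own size: for a list that is the array of pointers, not the int objects they point to, so the list's true footprint is larger still. Exact byte counts vary across Python versions and platforms, so this sketch compares sizes rather than relying on specific numbers.

```python
import sys


def get_numbers_list(n):
    return [i ** 2 for i in range(n)]  # Same result as the loop version above


def get_numbers_gen(n):
    for i in range(n):
        yield i ** 2


for n in (10, 1_000_000):
    lst = get_numbers_list(n)
    gen = get_numbers_gen(n)
    # getsizeof(lst) grows with n; getsizeof(gen) stays the same
    print(f"n={n:>9,}: list={sys.getsizeof(lst):>10,} bytes, "
          f"generator={sys.getsizeof(gen):>4} bytes")
```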

Basic Generator

def countdown(n):
    while n > 0:
        yield n
        n -= 1

for num in countdown(5):
    print(num)
# Output (one per line): 5 4 3 2 1
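A `for` loop drives a generator by calling `next()` on it until the generator raises `StopIteration`. The sketch below spells that out by hand for `countdown`; it shows roughly what the loop does, not a literal copy of CPython's internals.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
collected = []
while True:
    try:
        value = next(gen)      # Resume the generator until its next yield
    except StopIteration:      # Raised when the function body finishes
        break
    collected.append(value)

print(collected)               # [3, 2, 1]

# An exhausted generator stays exhausted; next() with a default avoids the exception
print(next(gen, "done"))       # done
```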

Reading Large Files

def read_large_file(file_path):
    """Read file line by line without loading entire file"""
    with open(file_path) as f:
        for line in f:
            yield line.strip()

# Memory use stays flat even for a 10 GB file: only one line is held at a time
for line in read_large_file('huge_file.txt'):
    process(line)  # process() is a placeholder for your own per-line logic
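To try `read_large_file` without a 10 GB file, the sketch below writes a small temporary file and computes two aggregates over it, each in a single pass. The file contents and the `count_nonblank` helper are made up for illustration; the same code would run unchanged on a much larger file because no list of lines is ever built.

```python
import os
import tempfile


def read_large_file(file_path):
    """Read file line by line without loading entire file"""
    with open(file_path) as f:
        for line in f:
            yield line.strip()


def count_nonblank(file_path):
    # sum() pulls one line at a time from the generator; no list is built
    return sum(1 for line in read_large_file(file_path) if line)


# Hypothetical sample data in a temporary file, standing in for 'huge_file.txt'
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\n\n  second  \nthird\n")
    path = tmp.name

try:
    nonblank = count_nonblank(path)
    longest = max(read_large_file(path), key=len)  # Another single-pass aggregate
    print(nonblank, longest)  # 3 second
finally:
    os.remove(path)
```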

Fibonacci Generator

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Generate first 10 Fibonacci numbers
fib = fibonacci()
for _ in range(10):
    print(next(fib))
# Output (one per line): 0 1 1 2 3 5 8 13 21 34
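Because `fibonacci()` never stops on its own, every consumer has to limit it. The standard library's `itertools.islice` and `itertools.takewhile` do this without a manual counter. The sketch below shows both; the cutoff of 100 is arbitrary.

```python
from itertools import islice, takewhile


def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# First 10 values, equivalent to the next()-in-a-loop version above
first_ten = list(islice(fibonacci(), 10))
print(first_ten)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# All values below 100; takewhile stops at the first value that fails the test
below_100 = list(takewhile(lambda x: x < 100, fibonacci()))
print(below_100)   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```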

Generator Expression

# List comprehension (stores all)
squares = [x**2 for x in range(1000000)]

# Generator expression (lazy evaluation)
squares = (x**2 for x in range(1000000))

# Use in functions that accept iterables
total = sum(x**2 for x in range(1000000))  # Memory efficient!
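One difference from a list comprehension is worth knowing: a generator expression can be iterated only once. After it is exhausted it silently yields nothing, which can produce a wrong result rather than an error. A small sketch:

```python
squares_list = [x**2 for x in range(10)]
squares_gen = (x**2 for x in range(10))

print(sum(squares_list), sum(squares_list))  # 285 285  (a list can be reused)
print(sum(squares_gen), sum(squares_gen))    # 285 0    (second pass sees an exhausted generator)

# When a generator expression is the only argument, its parentheses can be dropped
print(max(x % 7 for x in range(10)))         # 6
```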

Data Pipeline with Generators

def read_csv(filename):
    with open(filename) as f:
        for line in f:
            # Naive split on commas; does not handle quoted fields like "Smith, Bob"
            yield line.strip().split(',')

def filter_valid(rows):
    for row in rows:
        if len(row) == 3 and row[0]:  # Exactly 3 columns and a non-empty first field
            yield row

def extract_names(rows):
    for row in rows:
        yield row[0]

# Chain the stages lazily: each row passes through all three generators before the next row is read
names = extract_names(filter_valid(read_csv('data.csv')))
for name in names:
    print(name)
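Splitting on commas breaks on quoted fields such as `"Smith, Bob"`. A variant of the first stage can use the standard `csv` module instead while the other stages stay the same. This sketch reads from an in-memory `io.StringIO` so it runs without a file; `read_csv_rows` is a hypothetical name and the sample rows are invented.

```python
import csv
import io


def read_csv_rows(f):
    # csv.reader handles quoted fields that contain commas
    yield from csv.reader(f)


def filter_valid(rows):
    for row in rows:
        if len(row) == 3 and row[0]:  # Exactly 3 columns and a non-empty first field
            yield row


def extract_names(rows):
    for row in rows:
        yield row[0]


sample = io.StringIO('alice,30,NY\n,25,LA\n"Smith, Bob",40,SF\nbad,row\n')
names = list(extract_names(filter_valid(read_csv_rows(sample))))
print(names)  # ['alice', 'Smith, Bob']
```

With a plain `split(',')`, the `"Smith, Bob"` row would become four fields and be dropped by `filter_valid`.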

When to Use Generators

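Pro Tip: Generators are lazy, computing each value only when it is requested. Use them for large or unbounded datasets to save memory. A generator can be consumed only once and supports neither len() nor indexing, so convert it to a list when you need all values at once or need to iterate more than once.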
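As an illustration: computing a mean and then a variance takes two passes over the data plus a count, which a generator alone cannot provide. Materializing it with `list()` once is the simple fix when the data fits in memory. The `readings` generator is a hypothetical stand-in for any data source.

```python
def readings():
    # Hypothetical data source; imagine values streamed from a sensor or file
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


values = list(readings())          # Materialize once so the data can be read twice
mean = sum(values) / len(values)   # len() works on the list, not on a generator
variance = sum((v - mean) ** 2 for v in values) / len(values)
print(mean, variance)              # 5.0 4.0
```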
