Useful Data Tips

Python List Comprehension


What is List Comprehension?

A list comprehension builds a list in a single expression. It is more concise than the equivalent for loop and often faster for simple transformations.

Basic Syntax

[expression for item in iterable]

Simple Examples

# Traditional way
squares = []
for x in range(10):
    squares.append(x**2)

# List comprehension (better!)
squares = [x**2 for x in range(10)]
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Create list from existing list
names = ['alice', 'bob', 'charlie']
upper_names = [name.upper() for name in names]
# ['ALICE', 'BOB', 'CHARLIE']

# Apply an expression to each element
numbers = [1, 2, 3, 4, 5]
doubled = [x * 2 for x in numbers]
# [2, 4, 6, 8, 10]

With Conditionals (if)

# Filter: only include if condition is True
[expression for item in iterable if condition]

# Example: Only even numbers
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
evens = [x for x in numbers if x % 2 == 0]
# [2, 4, 6, 8]

# Example: Only positive numbers
values = [-2, -1, 0, 1, 2, 3]
positive = [x for x in values if x > 0]
# [1, 2, 3]

# Example: Filter strings
words = ['apple', 'bat', 'banana', 'cat']
long_words = [word for word in words if len(word) > 3]
# ['apple', 'banana']

With If-Else (Ternary)

# Transform based on condition
[expression_if_true if condition else expression_if_false for item in iterable]

# Example: Replace negative with 0
numbers = [-2, -1, 0, 1, 2]
result = [x if x > 0 else 0 for x in numbers]
# [0, 0, 0, 1, 2]

# Example: Categorize numbers
numbers = [1, 5, 10, 15, 20]
categories = ['small' if x < 10 else 'large' for x in numbers]
# ['small', 'small', 'large', 'large', 'large']

# Example: Absolute values
values = [-5, -2, 0, 3, 7]
absolute = [x if x >= 0 else -x for x in values]
# [5, 2, 0, 3, 7]
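
The filter form and the if-else form can also appear in the same comprehension: the if-else sits before for and picks the value, while the trailing if decides which items are kept. A minimal sketch, where the readings list is made-up sample data:

```python
# Hypothetical sample readings; None marks a missing value
readings = [12, -3, None, 7, -8, 0]

# The trailing `if` drops None first; the if-else then clamps non-positive values to 0
clamped = [r if r > 0 else 0 for r in readings if r is not None]
print(clamped)  # [12, 0, 7, 0, 0]
```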

Nested Loops

# Flatten 2D list
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [num for row in matrix for num in row]
# [1, 2, 3, 4, 5, 6]

# Cartesian product (all combinations)
colors = ['red', 'blue']
sizes = ['S', 'M', 'L']
products = [(color, size) for color in colors for size in sizes]
# [('red', 'S'), ('red', 'M'), ('red', 'L'),
#  ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]

# Equivalent to:
products = []
for color in colors:
    for size in sizes:
        products.append((color, size))
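
The for clauses read left to right in the same order as the nested for statements, and a trailing if can follow them. A comprehension can also be nested inside another one to build a list of lists. A short sketch with an illustrative matrix:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Outer loop first (row), then inner loop (num), then the filter
even_values = [num for row in matrix for num in row if num % 2 == 0]
print(even_values)  # [2, 4, 6]

# A comprehension inside a comprehension: swap rows and columns
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```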

Dictionary Comprehension

{key: value for item in iterable}

# Example: Square dictionary
squares = {x: x**2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# Example: Invert dictionary (assumes the values are unique and hashable)
original = {'a': 1, 'b': 2, 'c': 3}
inverted = {value: key for key, value in original.items()}
# {1: 'a', 2: 'b', 3: 'c'}

# Example: Filter dictionary
prices = {'apple': 1.50, 'banana': 0.50, 'orange': 2.00}
expensive = {fruit: price for fruit, price in prices.items() if price > 1}
# {'apple': 1.5, 'orange': 2.0}
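
Inverting a dictionary only round-trips cleanly when the values are unique. The sketch below uses a made-up dictionary with a repeated value to show what happens otherwise, and one possible way to keep every key by grouping them into lists:

```python
original = {'a': 1, 'b': 2, 'c': 1}  # the value 1 appears twice

# Later keys silently overwrite earlier ones that share a value
inverted = {value: key for key, value in original.items()}
print(inverted)  # {1: 'c', 2: 'b'}

# Grouping keeps every key; a plain loop is clearer than a comprehension here
grouped = {}
for key, value in original.items():
    grouped.setdefault(value, []).append(key)
print(grouped)  # {1: ['a', 'c'], 2: ['b']}
```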

Set Comprehension

{expression for item in iterable}

# Example: Unique squares
numbers = [1, 2, 2, 3, 3, 3, 4]
unique_squares = {x**2 for x in numbers}
# {1, 4, 9, 16}  (set removes duplicates)

# Example: Unique lengths
words = ['cat', 'dog', 'bird', 'fish']
lengths = {len(word) for word in words}
# {3, 4}  ('cat' and 'dog' share length 3; 'bird' and 'fish' share length 4)

Real-World Examples

Data Cleaning

# Remove whitespace
data = ['  alice  ', 'bob', '  charlie']
cleaned = [name.strip() for name in data]
# ['alice', 'bob', 'charlie']

# Convert to numeric
strings = ['10', '20', '30']
numbers = [int(s) for s in strings]
# [10, 20, 30]

# Handle missing values
values = ['10', '', '20', 'N/A', '30']
clean_values = [int(v) for v in values if v and v != 'N/A']
# [10, 20, 30]
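
A comprehension cannot contain try/except, so the filter above has to list every bad value explicitly. One alternative, sketched here with a hypothetical helper named to_int, is to move the error handling into a function and call it from the comprehension:

```python
def to_int(text):
    """Hypothetical helper: return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

values = ['10', '', '20', 'N/A', '30', ' 40 ', 'abc']

# First convert everything, then keep only the values that parsed
parsed = [to_int(v) for v in values]
clean_values = [n for n in parsed if n is not None]
print(clean_values)  # [10, 20, 30, 40]
```

Note that int() tolerates surrounding whitespace, so ' 40 ' parses without a separate strip() call.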

Data Transformation

# Extract fields from a list of dictionaries
users = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Charlie', 'age': 35}
]

names = [user['name'] for user in users]
# ['Alice', 'Bob', 'Charlie']

aged_30_plus = [user for user in users if user['age'] >= 30]
# [{'name': 'Bob', 'age': 30}, {'name': 'Charlie', 'age': 35}]

File Processing

# Read and process file lines
with open('data.txt') as f:
    numbers = [int(line.strip()) for line in f if line.strip().isdigit()]

# Parse simple comma-separated lines (for real CSV files, prefer the csv module)
lines = ['1,2,3', '4,5,6', '7,8,9']
rows = [list(map(int, line.split(','))) for line in lines]
# [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
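
The file-reading pattern above can be tried without a real data.txt. In this sketch io.StringIO stands in for the open file, and the sample contents are invented for illustration. It also shows a limitation of the isdigit() filter: negative numbers are skipped, because '-5'.isdigit() is False.

```python
import io

# io.StringIO behaves like an open text file; the contents are made-up sample data
fake_file = io.StringIO("10\n\n-5\nabc\n 20 \n")

# Same comprehension as in the file example: keep lines that are all digits
numbers = [int(line.strip()) for line in fake_file if line.strip().isdigit()]
print(numbers)  # [10, 20]
```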

When to Use List Comprehension

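✅ Good Use Cases

  • Simple transformations of each element
  • Filtering with a single, readable condition
  • Flattening or combining small sequences

❌ Avoid When

  • The logic needs several branches or chained conditionals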

# Too complex (BAD)
result = [x**2 if x > 0 else abs(x)**2 if x < -10 else 0 for x in numbers if x != 5]

# Better: Use regular loop for complex logic
result = []
for x in numbers:
    if x == 5:
        continue
    if x > 0:
        result.append(x**2)
    elif x < -10:
        result.append(abs(x)**2)
    else:
        result.append(0)
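
A middle ground is to move the branching into a small named function and keep the comprehension itself simple. This is only a sketch: the helper name transform and the sample numbers are hypothetical.

```python
def transform(x):
    """Hypothetical helper carrying the same branching logic as the loop version."""
    if x > 0:
        return x**2
    if x < -10:
        return abs(x)**2
    return 0

numbers = [-20, -3, 0, 4, 5, 6]

# The comprehension now reads as "transform every x except 5"
result = [transform(x) for x in numbers if x != 5]
print(result)  # [400, 0, 0, 16, 36]
```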

Performance

import timeit

# The comprehension is often faster because it avoids the repeated result.append lookup and call
def with_comprehension():
    return [x**2 for x in range(1000)]

def with_loop():
    result = []
    for x in range(1000):
        result.append(x**2)
    return result

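# Benchmark: total seconds for 10,000 calls of each function
comp_time = timeit.timeit(with_comprehension, number=10000)
loop_time = timeit.timeit(with_loop, number=10000)
print(f"comprehension: {comp_time:.3f}s, loop: {loop_time:.3f}s")

# List comprehension is typically 20-30% faster here, though the margin varies by Python version and machine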

Generator Expression (Memory Efficient)

# List comprehension: Creates entire list in memory
squares_list = [x**2 for x in range(1000000)]  # Uses lots of memory

# Generator expression: Lazy evaluation
squares_gen = (x**2 for x in range(1000000))   # Uses minimal memory

# Use a generator expression when you only need to iterate once
total = sum(x**2 for x in range(1000000))  # Memory efficient
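
The difference can be checked with sys.getsizeof, which reports the size of the container object itself (not the integers it refers to). The sketch below uses a smaller range than the example above to keep it quick, and also shows that a generator is exhausted after one pass:

```python
import sys

squares_list = [x**2 for x in range(100_000)]
squares_gen = (x**2 for x in range(100_000))

# The list holds 100,000 references; the generator holds only its current state
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# A generator can be consumed only once
first_total = sum(squares_gen)
second_total = sum(squares_gen)  # nothing left to iterate over
print(second_total)  # 0
```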

Common Patterns

# Flatten nested lists
nested = [[1, 2], [3, 4], [5, 6]]
flat = [item for sublist in nested for item in sublist]

# Remove duplicates while preserving order
items = [1, 2, 2, 3, 1, 4]
unique = list(dict.fromkeys(items))  # Or use set if order doesn't matter

# Apply function
def double(x):
    return x * 2

numbers = [1, 2, 3]
result = [double(x) for x in numbers]  # [2, 4, 6]

# Zip lists together
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
people = [f"{name}: {age}" for name, age in zip(names, ages)]
# ['Alice: 25', 'Bob: 30', 'Charlie: 35']

Key Takeaways:

  • List comprehensions are shorter and often faster than equivalent for loops
  • Basic syntax: [expression for item in iterable]
  • Add if at the end to filter
  • Put if-else before for to transform
  • Works with dictionaries and sets too
  • Don't sacrifice readability for brevity