Useful Data Tips

Sets in Python: When and Why

Sets are unordered collections of unique elements. They're optimized for membership testing, removing duplicates, and mathematical set operations like unions and intersections.

Creating Sets

# Using curly braces
fruits = {'apple', 'banana', 'orange'}

# Using set() constructor
numbers = set([1, 2, 3, 2, 1])  # {1, 2, 3} - duplicates removed

# Empty set (can't use {} - that's a dict!)
empty = set()

Fast Membership Testing

# Sets use hash tables: O(1) average lookup
# Lists use linear search: O(n) lookup

# Slow with list (checks each element in turn)
allowed_list = list(range(1, 10001))
if 9999 in allowed_list:  # Slow for large lists
    pass

# Fast with set (hash lookup)
allowed_set = set(range(1, 10001))
if 9999 in allowed_set:  # Roughly constant time, regardless of size
    pass
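
To see the difference on your own machine, a rough benchmark with the standard-library timeit module can help. This is only a sketch: the collection size and the repeat count are arbitrary choices, and absolute timings will vary from machine to machine.

```python
import timeit

# Build the same 10,000 values as a list and as a set.
values = range(1, 10001)
allowed_list = list(values)
allowed_set = set(values)

# A value near the end of the list is close to the worst case for a linear scan.
target = 9999
repeats = 1000  # arbitrary repeat count for this sketch

list_time = timeit.timeit(lambda: target in allowed_list, number=repeats)
set_time = timeit.timeit(lambda: target in allowed_set, number=repeats)

print(f"list: {list_time:.6f}s for {repeats} lookups")
print(f"set:  {set_time:.6f}s for {repeats} lookups")
```

Building the set has its own one-time cost, so the conversion tends to pay off when the same collection is checked many times.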

Removing Duplicates

# Remove duplicates from list
numbers = [1, 2, 2, 3, 4, 4, 5]
unique = list(set(numbers))  # e.g. [1, 2, 3, 4, 5] - order is not guaranteed

# Keep original order with dict (dicts preserve insertion order in Python 3.7+)
unique_ordered = list(dict.fromkeys(numbers))  # [1, 2, 3, 4, 5]
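
When duplicates should be judged by something other than the raw value (for example, case-insensitive names), a set of already-seen keys is a common pattern. The helper below is a hypothetical sketch, not a standard-library function; the name unique_by and its key parameter are assumptions made for illustration.

```python
def unique_by(items, key=lambda item: item):
    """Return items in first-seen order, dropping later items whose key repeats."""
    seen = set()      # keys encountered so far
    result = []
    for item in items:
        k = key(item)
        if k not in seen:   # fast membership test on the set
            seen.add(k)
            result.append(item)
    return result

names = ["Alice", "bob", "ALICE", "Bob", "carol"]
print(unique_by(names, key=str.lower))  # ['Alice', 'bob', 'carol']
```

The keys must be hashable, because they are stored in a set.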

Set Operations

Union (All Elements)

a = {1, 2, 3}
b = {3, 4, 5}

union = a | b  # {1, 2, 3, 4, 5}
# or
union = a.union(b)

Intersection (Common Elements)

a = {1, 2, 3}
b = {2, 3, 4}

intersection = a & b  # {2, 3}
# or
intersection = a.intersection(b)

Difference

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

diff = a - b  # {1, 2} - elements in a but not b
# or
diff = a.difference(b)

Symmetric Difference (XOR)

a = {1, 2, 3}
b = {3, 4, 5}

sym_diff = a ^ b  # {1, 2, 4, 5} - elements in either but not both
# or
sym_diff = a.symmetric_difference(b)
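
The operator and method forms above are not fully interchangeable: the methods accept any iterable, while the operators require both operands to be sets. Each operator also has an in-place variant. A short sketch of both points:

```python
a = {1, 2, 3}

# Methods accept any iterable, so a list works here.
print(a.union([3, 4, 5]) == {1, 2, 3, 4, 5})  # True

# Operators require both operands to be sets.
try:
    a | [3, 4, 5]
except TypeError as exc:
    print(f"TypeError: {exc}")

# In-place variants modify the left-hand set instead of creating a new one.
a |= {4}         # same effect as a.update({4})
a &= {1, 4, 9}   # same effect as a.intersection_update({1, 4, 9})
print(a == {1, 4})  # True
```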

Practical Examples

Find Common Customers

# Placeholder addresses for illustration
email_campaign = {'alice@example.com', 'bob@example.com', 'carol@example.com'}
purchasers = {'bob@example.com', 'carol@example.com', 'dave@example.com'}

# Who received email AND purchased?
converted = email_campaign & purchasers  # {'bob@example.com', 'carol@example.com'}

Find Missing Items

required_fields = {'name', 'email', 'age', 'phone'}
provided_fields = {'name', 'email', 'age'}

missing = required_fields - provided_fields  # {'phone'}
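
The same idea extends to validating a whole record. The sketch below assumes records arrive as plain dicts; the function name check_fields and the sample record are hypothetical.

```python
REQUIRED_FIELDS = {'name', 'email', 'age', 'phone'}

def check_fields(record):
    """Return (missing, unexpected) field names for a dict record."""
    provided = set(record)                   # iterating a dict yields its keys
    missing = REQUIRED_FIELDS - provided     # required but not supplied
    unexpected = provided - REQUIRED_FIELDS  # supplied but not recognized
    return missing, unexpected

record = {'name': 'Ada', 'email': 'ada@example.com', 'age': 36, 'nickname': 'A'}
missing, unexpected = check_fields(record)
print(sorted(missing))     # ['phone']
print(sorted(unexpected))  # ['nickname']
```

Sorting the results before printing gives stable output, since sets have no defined order.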

Set Methods

s = {1, 2, 3}

s.add(4)          # Add single element
s.update([5, 6])  # Add multiple elements from any iterable
s.remove(2)       # Remove element (raises KeyError if not found)
s.discard(10)     # Remove element (no error if not found)
s.clear()         # Remove all elements
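
The difference between remove() and discard() only shows up when the element is absent. A small sketch of that case, plus pop(), which removes and returns an arbitrary element:

```python
s = {1, 2, 3}

s.discard(10)        # 10 is absent: nothing happens
try:
    s.remove(10)     # 10 is absent: raises KeyError
except KeyError:
    print("remove() raised KeyError")

item = s.pop()       # removes and returns an arbitrary element
print(item in {1, 2, 3}, len(s))  # True 2
```

discard() suits cases where the element may or may not be present; remove() is the better choice when a missing element indicates a bug.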

When to Use Sets

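Pro Tip: Use sets for membership testing when performance matters. Testing whether an item is in a set takes roughly constant time on average, while checking a list takes time proportional to its length, so the gap widens as the data grows. Sets are also a good fit for removing duplicates and for comparing two collections.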
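
One way this tends to appear in practice is filtering records against a list of allowed values: convert the list to a set once, then test each record against it. The function name filter_allowed and the record shape (dicts with an 'id' key) are assumptions for illustration.

```python
def filter_allowed(records, allowed_ids):
    """Keep records whose 'id' appears in allowed_ids (hypothetical record shape)."""
    allowed = set(allowed_ids)  # one-time conversion, then fast lookups
    return [r for r in records if r['id'] in allowed]

records = [{'id': 1}, {'id': 7}, {'id': 3}]
print(filter_allowed(records, [3, 1, 1]))  # [{'id': 1}, {'id': 3}]
```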
