5 Ways to Speed Up Pandas Operations
Pandas can be slow on large datasets. Here's how to make it faster:
1. Use Vectorized Operations (Not Loops)
# Slow ❌ (~12 seconds on 1M rows)
for i in range(len(df)):
    df.loc[i, 'total'] = df.loc[i, 'price'] * df.loc[i, 'quantity']
# Fast ✅ (~0.02 seconds)
df['total'] = df['price'] * df['quantity']
2. Use .values or .to_numpy() for Math
# Slower
df['new_col'] = df['col1'] + df['col2']
# Faster (skips index alignment overhead; .to_numpy() is the modern spelling of .values)
df['new_col'] = df['col1'].to_numpy() + df['col2'].to_numpy()
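One caveat worth knowing: skipping alignment is also a behavior change. Pandas normally matches values by index label before adding; raw NumPy arrays are combined by position. A minimal sketch with two toy Series (made-up data) shows the difference:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=[0, 1, 2])
s2 = pd.Series([10, 20, 30], index=[2, 1, 0])

# Pandas aligns by index label first: label 0 -> 1 + 30, label 1 -> 2 + 20, ...
print((s1 + s2).tolist())                          # [31, 22, 13]

# Raw arrays are combined purely by position
print((s1.to_numpy() + s2.to_numpy()).tolist())    # [11, 22, 33]
```

Only use this trick when you know the indexes already line up (e.g. two columns of the same DataFrame).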
3. Query Instead of Boolean Indexing
# Slower
df[(df['age'] > 25) & (df['salary'] > 50000)]
# Often faster on large DataFrames (uses the numexpr engine when installed);
# can be slower on small ones due to string-parsing overhead
df.query('age > 25 and salary > 50000')
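A handy bonus of `query` is that local Python variables can be referenced inside the expression string with `@`. A quick sketch (toy data, hypothetical column values):

```python
import pandas as pd

df = pd.DataFrame({'age': [22, 30, 45], 'salary': [40000, 60000, 80000]})

min_age = 25  # local variables are referenced with @ inside the query string
result = df.query('age > @min_age and salary > 50000')
print(result)  # the rows with age 30 and 45
```

This keeps thresholds in ordinary variables instead of string-formatting them into the expression.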
4. Use Categorical Data Types
# object dtype stores a separate Python string for every row
df['country'] = df['country'].astype('category')
# category stores each unique value once plus a small integer code per row,
# which can cut memory by up to ~90% for low-cardinality columns
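You can verify the savings yourself with `memory_usage(deep=True)`. A minimal sketch, using a made-up column with heavily repeated values:

```python
import pandas as pd

# Hypothetical low-cardinality column: 3 unique values repeated 100,000 times
df = pd.DataFrame({'country': ['US', 'DE', 'FR'] * 100_000})

before = df['country'].memory_usage(deep=True)
df['country'] = df['country'].astype('category')
after = df['country'].memory_usage(deep=True)

print(f"object:   {before:,} bytes")
print(f"category: {after:,} bytes")
```

The savings shrink as cardinality grows; a column where nearly every value is unique gains little or nothing from `category`.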
5. Read Only Needed Columns
# Slow: reads entire file
df = pd.read_csv('data.csv')
# Fast: reads only what you need
df = pd.read_csv('data.csv', usecols=['name', 'email', 'date'])
Biggest Win: Replace any row-by-row loop with a vectorized operation. On large DataFrames, this alone can give a 100x or greater speedup.
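This applies to conditional logic too, not just arithmetic. Instead of looping with an `if` per row, `np.where` evaluates the condition over the whole column at once. A sketch with made-up prices and a hypothetical discount rule:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [10.0, 200.0, 50.0], 'quantity': [1, 3, 2]})

# Hypothetical rule: 10% discount on items priced over 100.
# np.where picks 0.10 or 0.0 for every row in one vectorized pass.
df['discount'] = np.where(df['price'] > 100, 0.10, 0.0)
df['total'] = df['price'] * df['quantity'] * (1 - df['discount'])
print(df['total'].tolist())  # [10.0, 540.0, 100.0]
```

For more than two branches, `np.select` works the same way with a list of conditions.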