5 Ways to Speed Up Pandas Operations
Pandas can be slow on large datasets. Here's how to make it faster:
1. Use Vectorized Operations (Not Loops)
# Slow ❌ (~12 seconds on 1M rows)
for i in range(len(df)):
    df.loc[i, 'total'] = df.loc[i, 'price'] * df.loc[i, 'quantity']
# Fast ✅ (~0.02 seconds)
df['total'] = df['price'] * df['quantity']
2. Use .values or .to_numpy() for Math
# Slower
df['new_col'] = df['col1'] + df['col2']
# Faster (skips index alignment overhead; .to_numpy() is the modern spelling of .values)
df['new_col'] = df['col1'].to_numpy() + df['col2'].to_numpy()
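One caveat worth knowing: skipping alignment is also a behavior change. Pandas normally matches values by index label before adding; raw NumPy arrays are combined by position. A minimal sketch with two toy Series (made-up data) shows the difference:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=[0, 1, 2])
s2 = pd.Series([10, 20, 30], index=[2, 1, 0])

# Pandas aligns by index label first: label 0 -> 1 + 30, label 1 -> 2 + 20, ...
print((s1 + s2).tolist())                          # [31, 22, 13]

# Raw arrays are combined purely by position
print((s1.to_numpy() + s2.to_numpy()).tolist())    # [11, 22, 33]
```

Only use this trick when you know the indexes already line up (e.g. two columns of the same DataFrame).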
3. Query Instead of Boolean Indexing
# Slower
df[(df['age'] > 25) & (df['salary'] > 50000)]
# Often faster on large DataFrames (uses the numexpr engine when installed);
# can be slower on small ones due to string-parsing overhead
df.query('age > 25 and salary > 50000')
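A handy bonus of `query` is that local Python variables can be referenced inside the expression string with `@`. A quick sketch (toy data, hypothetical column values):

```python
import pandas as pd

df = pd.DataFrame({'age': [22, 30, 45], 'salary': [40000, 60000, 80000]})

min_age = 25  # local variables are referenced with @ inside the query string
result = df.query('age > @min_age and salary > 50000')
print(result)  # the rows with age 30 and 45
```

This keeps thresholds in ordinary variables instead of string-formatting them into the expression.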
4. Use Categorical Data Types
# object dtype stores a separate Python string for every row
df['country'] = df['country'].astype('category')
# category stores each unique value once plus a small integer code per row,
# which can cut memory by up to ~90% for low-cardinality columns
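You can verify the savings yourself with `memory_usage(deep=True)`. A minimal sketch, using a made-up column with heavily repeated values:

```python
import pandas as pd

# Hypothetical low-cardinality column: 3 unique values repeated 100,000 times
df = pd.DataFrame({'country': ['US', 'DE', 'FR'] * 100_000})

before = df['country'].memory_usage(deep=True)
df['country'] = df['country'].astype('category')
after = df['country'].memory_usage(deep=True)

print(f"object:   {before:,} bytes")
print(f"category: {after:,} bytes")
```

The savings shrink as cardinality grows; a column where nearly every value is unique gains little or nothing from `category`.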
5. Read Only Needed Columns
# Slow: reads entire file
df = pd.read_csv('data.csv')
# Fast: reads only what you need
df = pd.read_csv('data.csv', usecols=['name', 'email', 'date'])
Biggest Win: Replace any row-by-row loop with a vectorized operation. On large DataFrames, this alone can give a 100x or greater speedup.
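This applies to conditional logic too, not just arithmetic. Instead of looping with an `if` per row, `np.where` evaluates the condition over the whole column at once. A sketch with made-up prices and a hypothetical discount rule:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [10.0, 200.0, 50.0], 'quantity': [1, 3, 2]})

# Hypothetical rule: 10% discount on items priced over 100.
# np.where picks 0.10 or 0.0 for every row in one vectorized pass.
df['discount'] = np.where(df['price'] > 100, 0.10, 0.0)
df['total'] = df['price'] * df['quantity'] * (1 - df['discount'])
print(df['total'].tolist())  # [10.0, 540.0, 100.0]
```

For more than two branches, `np.select` works the same way with a list of conditions.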