Pandas Merge how Parameter Cheat Sheet
The how parameter in pd.merge() decides which rows survive the join. This page shows the same two DataFrames merged five ways — inner, outer, left, right, and cross — so you can see exactly which rows each option keeps.
The Two Test DataFrames
import pandas as pd
orders = pd.DataFrame({
"order_id": [1, 2, 3, 4],
"customer_id":[10, 20, 30, 40],
"amount": [99, 50, 75, 20],
})
customers = pd.DataFrame({
"customer_id":[10, 20, 50],
"name": ["Alice", "Bob", "Eve"],
})
Customer 30 and 40 have orders but no profile. Customer 50 has a profile but no orders. Watch what happens to those three rows under each join.
how="inner" — Pandas Inner Join (Default)
Keep only rows where the key exists in both DataFrames.
pd.merge(orders, customers, on="customer_id", how="inner")
# order_id customer_id amount name
# 0 1 10 99 Alice
# 1 2 20 50 Bob
Customers 30, 40, and 50 are dropped. Use this when missing keys mean "ignore the row."
how="outer" — Pandas Full / Outer Join
Keep every row from both sides. Missing values become NaN.
pd.merge(orders, customers, on="customer_id", how="outer")
# order_id customer_id amount name
# 0 1.0 10 99.0 Alice
# 1 2.0 20 50.0 Bob
# 2 3.0 30 75.0 NaN
# 3 4.0 40 20.0 NaN
# 4 NaN 50 NaN Eve
This is the answer to "pandas full join." Use it for reconciliation — finding orders without customers and customers without orders in one pass.
how="left" — Pandas Left Join
Keep every row from the left DataFrame. Right-side columns are NaN when there's no match.
pd.merge(orders, customers, on="customer_id", how="left")
# order_id customer_id amount name
# 0 1 10 99 Alice
# 1 2 20 50 Bob
# 2 3 30 75 NaN
# 3 4 40 20 NaN
Most common in practice — "enrich orders with customer details, but don't lose orders that lack a profile."
how="right" — Pandas Right Join
Mirror image of left. Keep every row from the right DataFrame.
pd.merge(orders, customers, on="customer_id", how="right")
# order_id customer_id amount name
# 0 1.0 10 99.0 Alice
# 1 2.0 20 50.0 Bob
# 2 NaN 50 NaN Eve
Equivalent to swapping the inputs and using how="left". Most teams pick one direction and stick with it for readability.
how="cross" — Pandas Cross Join (Cartesian Product)
Every row on the left paired with every row on the right. No key column.
sizes = pd.DataFrame({"size": ["S", "M", "L"]})
colors = pd.DataFrame({"color": ["red", "blue"]})
pd.merge(sizes, colors, how="cross")
# size color
# 0 S red
# 1 S blue
# 2 M red
# 3 M blue
# 4 L red
# 5 L blue
Result size is len(left) × len(right). Useful for generating combinations (date × store, user × experiment), dangerous on large inputs.
Summary Table
how | rows from left | rows from right | unmatched fill
----------|---------------|----------------|----------------
"inner" | matched only | matched only | (dropped)
"outer" | all | all | NaN
"left" | all | matched only | NaN on right cols
"right" | matched only | all | NaN on left cols
"cross" | all × all | all × all | (no key used)
Use the indicator Flag to Debug a Merge
pd.merge(orders, customers, on="customer_id", how="outer", indicator=True)
# Adds a _merge column with values:
# 'left_only' row only in orders
# 'right_only' row only in customers
# 'both' matched
# Quick QA — count what came from where:
result["_merge"].value_counts()
Common Pitfalls
- Row explosion on duplicates: if the join key isn't unique on both sides, you get a many-to-many product. Check
orders["customer_id"].is_uniquebefore merging. - Different column names: use
left_on=andright_on=instead ofon=. - NaN keys never match: rows with
NaNin the join column are dropped from inner joins and never align in outer joins. Fill or drop them first. - Use
validate=to catch bugs:pd.merge(..., validate="one_to_one")raises if your assumption about uniqueness is wrong.
Pro Tip: If you're joining on the index rather than a column, use df.join() — it defaults to how="left" and skips having to write left_index=True, right_index=True every time.