Pandas Merge how Parameter: inner, outer, left, right, cross

Q: Pandas Merge how Parameter Cheat Sheet

The how parameter in pd.merge() decides which rows survive the join. This page shows the same two DataFrames merged five ways — inner, outer, left, right, and cross — so you can see exactly which rows each option keeps.

Q: The Two Test DataFrames

import pandas as pd orders = pd.DataFrame({ "order_id": [1, 2, 3, 4], "customer_id":[10, 20, 30, 40], "amount": [99, 50, 75, 20], }) customers = pd.DataFrame({ "customer_id":[10, 20, 50], "name": ["Alice", "Bob", "Eve"], }) Customer 30 and 40 have orders but no profile. Customer 50 has a profile but no orders. Watch what happens to those three rows under each join.

Q: how="inner" — Pandas Inner Join (Default)

Keep only rows where the key exists in both DataFrames.

Q: how="outer" — Pandas Full / Outer Join

Keep every row from both sides. Missing values become NaN.

The how parameter in pd.merge() decides which rows survive the join. This page shows the same two DataFrames merged five ways — inner, outer, left, right, and cross — so you can see exactly which rows each option keeps.

The Two Test DataFrames

import pandas as pd

orders = pd.DataFrame({
    "order_id":   [1, 2, 3, 4],
    "customer_id":[10, 20, 30, 40],
    "amount":     [99, 50, 75, 20],
})

customers = pd.DataFrame({
    "customer_id":[10, 20, 50],
    "name":       ["Alice", "Bob", "Eve"],
})

Customer 30 and 40 have orders but no profile. Customer 50 has a profile but no orders. Watch what happens to those three rows under each join.

how="inner" — Pandas Inner Join (Default)

Keep only rows where the key exists in both DataFrames.

pd.merge(orders, customers, on="customer_id", how="inner")
#    order_id  customer_id  amount   name
# 0         1           10      99  Alice
# 1         2           20      50    Bob

Customers 30, 40, and 50 are dropped. Use this when missing keys mean "ignore the row."

how="outer" — Pandas Full / Outer Join

Keep every row from both sides. Missing values become NaN.

pd.merge(orders, customers, on="customer_id", how="outer")
#    order_id  customer_id  amount   name
# 0       1.0           10    99.0  Alice
# 1       2.0           20    50.0    Bob
# 2       3.0           30    75.0    NaN
# 3       4.0           40    20.0    NaN
# 4       NaN           50     NaN    Eve

This is the answer to "pandas full join." Use it for reconciliation — finding orders without customers and customers without orders in one pass.

how="left" — Pandas Left Join

Keep every row from the left DataFrame. Right-side columns are NaN when there's no match.

pd.merge(orders, customers, on="customer_id", how="left")
#    order_id  customer_id  amount   name
# 0         1           10      99  Alice
# 1         2           20      50    Bob
# 2         3           30      75    NaN
# 3         4           40      20    NaN

Most common in practice — "enrich orders with customer details, but don't lose orders that lack a profile."

how="right" — Pandas Right Join

Mirror image of left. Keep every row from the right DataFrame.

pd.merge(orders, customers, on="customer_id", how="right")
#    order_id  customer_id  amount   name
# 0       1.0           10    99.0  Alice
# 1       2.0           20    50.0    Bob
# 2       NaN           50     NaN    Eve

Equivalent to swapping the inputs and using how="left". Most teams pick one direction and stick with it for readability.

how="cross" — Pandas Cross Join (Cartesian Product)

Every row on the left paired with every row on the right. No key column.

sizes = pd.DataFrame({"size": ["S", "M", "L"]})
colors = pd.DataFrame({"color": ["red", "blue"]})

pd.merge(sizes, colors, how="cross")
#   size color
# 0    S   red
# 1    S  blue
# 2    M   red
# 3    M  blue
# 4    L   red
# 5    L  blue

Result size is len(left) × len(right). Useful for generating combinations (date × store, user × experiment), dangerous on large inputs.

Summary Table

how       | rows from left | rows from right | unmatched fill
----------|---------------|----------------|----------------
"inner"   | matched only  | matched only   | (dropped)
"outer"   | all           | all            | NaN
"left"    | all           | matched only   | NaN on right cols
"right"   | matched only  | all            | NaN on left cols
"cross"   | all × all     | all × all      | (no key used)

Use the indicator Flag to Debug a Merge

pd.merge(orders, customers, on="customer_id", how="outer", indicator=True)
# Adds a _merge column with values:
#   'left_only'   row only in orders
#   'right_only'  row only in customers
#   'both'        matched

# Quick QA — count what came from where:
result["_merge"].value_counts()

Common Pitfalls

Row explosion on duplicates: if the join key isn't unique on both sides, you get a many-to-many product. Check orders["customer_id"].is_unique before merging.
Different column names: use left_on= and right_on= instead of on=.
NaN keys never match: rows with NaN in the join column are dropped from inner joins and never align in outer joins. Fill or drop them first.
Use validate= to catch bugs: pd.merge(..., validate="one_to_one") raises if your assumption about uniqueness is wrong.

Pro Tip: If you're joining on the index rather than a column, use df.join() — it defaults to how="left" and skips having to write left_index=True, right_index=True every time.

← Back to Python Tips

Related Tips

Written by Aaron Pitt · Useful Data Tips