Pandas Merge how Parameter Cheat Sheet

⏱️ 40 sec read 🐍 Python

The how parameter in pd.merge() decides which rows survive the join. This page shows the same two DataFrames merged five ways — inner, outer, left, right, and cross — so you can see exactly which rows each option keeps.

The Two Test DataFrames

import pandas as pd

orders = pd.DataFrame({
    "order_id":   [1, 2, 3, 4],
    "customer_id":[10, 20, 30, 40],
    "amount":     [99, 50, 75, 20],
})

customers = pd.DataFrame({
    "customer_id":[10, 20, 50],
    "name":       ["Alice", "Bob", "Eve"],
})

Customer 30 and 40 have orders but no profile. Customer 50 has a profile but no orders. Watch what happens to those three rows under each join.

how="inner" — Pandas Inner Join (Default)

Keep only rows where the key exists in both DataFrames.

pd.merge(orders, customers, on="customer_id", how="inner")
#    order_id  customer_id  amount   name
# 0         1           10      99  Alice
# 1         2           20      50    Bob

Customers 30, 40, and 50 are dropped. Use this when missing keys mean "ignore the row."

how="outer" — Pandas Full / Outer Join

Keep every row from both sides. Missing values become NaN.

pd.merge(orders, customers, on="customer_id", how="outer")
#    order_id  customer_id  amount   name
# 0       1.0           10    99.0  Alice
# 1       2.0           20    50.0    Bob
# 2       3.0           30    75.0    NaN
# 3       4.0           40    20.0    NaN
# 4       NaN           50     NaN    Eve

This is the answer to "pandas full join." Use it for reconciliation — finding orders without customers and customers without orders in one pass.

how="left" — Pandas Left Join

Keep every row from the left DataFrame. Right-side columns are NaN when there's no match.

pd.merge(orders, customers, on="customer_id", how="left")
#    order_id  customer_id  amount   name
# 0         1           10      99  Alice
# 1         2           20      50    Bob
# 2         3           30      75    NaN
# 3         4           40      20    NaN

Most common in practice — "enrich orders with customer details, but don't lose orders that lack a profile."

how="right" — Pandas Right Join

Mirror image of left. Keep every row from the right DataFrame.

pd.merge(orders, customers, on="customer_id", how="right")
#    order_id  customer_id  amount   name
# 0       1.0           10    99.0  Alice
# 1       2.0           20    50.0    Bob
# 2       NaN           50     NaN    Eve

Equivalent to swapping the inputs and using how="left". Most teams pick one direction and stick with it for readability.

how="cross" — Pandas Cross Join (Cartesian Product)

Every row on the left paired with every row on the right. No key column.

sizes = pd.DataFrame({"size": ["S", "M", "L"]})
colors = pd.DataFrame({"color": ["red", "blue"]})

pd.merge(sizes, colors, how="cross")
#   size color
# 0    S   red
# 1    S  blue
# 2    M   red
# 3    M  blue
# 4    L   red
# 5    L  blue

Result size is len(left) × len(right). Useful for generating combinations (date × store, user × experiment), dangerous on large inputs.

Summary Table

how       | rows from left | rows from right | unmatched fill
----------|---------------|----------------|----------------
"inner"   | matched only  | matched only   | (dropped)
"outer"   | all           | all            | NaN
"left"    | all           | matched only   | NaN on right cols
"right"   | matched only  | all            | NaN on left cols
"cross"   | all × all     | all × all      | (no key used)

Use the indicator Flag to Debug a Merge

pd.merge(orders, customers, on="customer_id", how="outer", indicator=True)
# Adds a _merge column with values:
#   'left_only'   row only in orders
#   'right_only'  row only in customers
#   'both'        matched

# Quick QA — count what came from where:
result["_merge"].value_counts()

Common Pitfalls

Pro Tip: If you're joining on the index rather than a column, use df.join() — it defaults to how="left" and skips having to write left_index=True, right_index=True every time.

← Back to Python Tips