Useful Data Tips

Correlation vs Causation: How to Tell the Difference


Correlation means two variables move together. Causation means one causes the other. Confusing them leads to terrible decisions.

The Key Difference

Correlation: X and Y move together

Causation: X causes Y to change

Classic Example: Ice Cream and Drowning

Observation: Ice cream sales and drowning deaths are highly correlated (say, r = 0.95)

Wrong conclusion: Ice cream causes drowning

Reality: Both are caused by hot weather (confounding variable)

3 Ways Correlation Happens Without Causation

1. Confounding Variable (Most Common)

A third variable causes both:

Hot weather → More ice cream sales
Hot weather → More swimming → More drownings

Ice cream ↛ Drowning (no causation)
But they're correlated due to shared cause
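To make the shared-cause pattern concrete, here is a minimal simulation sketch. Every number in it is invented for illustration (the coefficients, the noise levels, and the 24 to 26 degree band are assumptions, not real data): temperature drives both series, and neither series affects the other.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical daily data: temperature feeds both variables independently.
temperature = [rng.uniform(10, 35) for _ in range(5000)]
ice_cream = [20 * t + rng.gauss(0, 60) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]

overall_r = pearson_r(ice_cream, drownings)

# Hold the confounder roughly fixed: keep only days between 24 and 26 degrees.
band = [(i, d) for t, i, d in zip(temperature, ice_cream, drownings)
        if 24 <= t <= 26]
within_band_r = pearson_r([i for i, _ in band], [d for _, d in band])

print(f"all days:          r = {overall_r:.2f}")
print(f"24-26 degree days: r = {within_band_r:.2f}")
```

Across all days the two series should look strongly correlated. Within a narrow temperature band the correlation should shrink toward zero, which is the signature of a third variable explaining the link.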

2. Reverse Causation

You have the direction backward:

Correlation: Time spent in hospitals and risk of death
Wrong: Hospitals cause deaths
Right: Serious illness, which raises the risk of death, sends people to hospitals

3. Pure Coincidence

Random chance, especially with small samples or cherry-picked data:

The number of Nicolas Cage films released per year correlates with the number of pool drownings (r = 0.67)
Obviously coincidence!
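How easily does chance alone produce a strong-looking r? The sketch below is a rough illustration, assuming pairs of completely unrelated random series and an arbitrary cutoff of |r| >= 0.6. The helper name count_strong_by_chance is hypothetical.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def count_strong_by_chance(n_points, n_pairs, rng, threshold=0.6):
    """Count unrelated random pairs whose |r| still reaches the threshold."""
    strong = 0
    for _ in range(n_pairs):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]
        if abs(pearson_r(xs, ys)) >= threshold:
            strong += 1
    return strong

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Same experiment twice: short series (10 points) versus longer ones (100).
small_sample_hits = count_strong_by_chance(n_points=10, n_pairs=1000, rng=rng)
large_sample_hits = count_strong_by_chance(n_points=100, n_pairs=1000, rng=rng)

print(f"10 points each:  {small_sample_hits} of 1000 unrelated pairs reach |r| >= 0.6")
print(f"100 points each: {large_sample_hits} of 1000 unrelated pairs reach |r| >= 0.6")
```

With only ten points per series, a noticeable share of unrelated pairs typically clears the cutoff. With a hundred points, essentially none do. Search through enough short series and some "strong" correlation is almost guaranteed to turn up.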

Tests for Causation

1. Time Order

The cause must come before the effect. If Y starts changing before X does, X cannot be what changed it.
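One rough way to check time order is to compare lagged correlations in both directions. The sketch below uses made-up data in which sales respond to ad spend two days later. The series names, the two-day lag, and the lagged_r helper are all assumptions for illustration. Passing this check is necessary for a causal claim but not sufficient, and real time series with trends or seasonality need more care.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def lagged_r(earlier, later, lag):
    """Correlate earlier[t] with later[t + lag]; lag must be >= 0."""
    if lag == 0:
        return pearson_r(earlier, later)
    return pearson_r(earlier[:-lag], later[lag:])

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: sales follow ad spend with a two-day delay.
days = 500
ad_spend = [rng.gauss(100, 20) for _ in range(days)]
sales = [rng.gauss(50, 5) for _ in range(days)]
for t in range(2, days):
    sales[t] += 0.5 * ad_spend[t - 2]

spend_then_sales = lagged_r(ad_spend, sales, lag=2)  # spend today, sales in 2 days
sales_then_spend = lagged_r(sales, ad_spend, lag=2)  # sales today, spend in 2 days

print(f"spend leads sales: r = {spend_then_sales:.2f}")
print(f"sales leads spend: r = {sales_then_spend:.2f}")
```

Here the correlation should be strong only when spend is placed first, which is consistent with spend driving sales and not the other way around.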

2. Controlled Experiment (Gold Standard)

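Run a randomized A/B test: assign units to treatment and control at random, so confounders are balanced across the groups on average, then compare outcomes.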
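A minimal sketch of the idea, using simulated users. The baseline spend, the assumed lift of 3.0, and the choice of a permutation test are illustrative assumptions, not a prescribed analysis.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical users, each with a baseline spend we do not control.
baseline = [rng.gauss(50, 10) for _ in range(2000)]

# Random assignment: a coin flip decides who gets variant B.
in_treatment = [rng.random() < 0.5 for _ in baseline]
TRUE_LIFT = 3.0  # assumed effect baked into the simulation
spend = [b + (TRUE_LIFT if treated else 0.0)
         for b, treated in zip(baseline, in_treatment)]

def mean(values):
    return sum(values) / len(values)

def mean_difference(outcomes, flags):
    """Mean outcome of flagged units minus mean outcome of the rest."""
    treated = [o for o, f in zip(outcomes, flags) if f]
    control = [o for o, f in zip(outcomes, flags) if not f]
    return mean(treated) - mean(control)

observed = mean_difference(spend, in_treatment)

# Permutation test: shuffle the labels to see what chance alone produces.
N_SHUFFLES = 500
shuffled = in_treatment[:]
extreme = 0
for _ in range(N_SHUFFLES):
    rng.shuffle(shuffled)
    if abs(mean_difference(spend, shuffled)) >= abs(observed):
        extreme += 1
p_value = (extreme + 1) / (N_SHUFFLES + 1)

print(f"observed lift: {observed:.2f}, permutation p-value: {p_value:.3f}")
```

Because the coin flip is unrelated to baseline spend, the two groups differ systematically only in the treatment, so the observed difference estimates the causal effect.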

3. Dose-Response Relationship

More cause → More effect:

1 hour study → +5 points
2 hours study → +10 points
3 hours study → +15 points

A consistent dose-response relationship strengthens the causal claim
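The study-hours figures above can be checked with a small sketch: fit a least-squares slope and confirm the effect rises with every increase in dose. The helper name least_squares_slope is hypothetical, and three points is far too few for a real analysis.

```python
hours = [1, 2, 3]
score_gain = [5, 10, 15]

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

slope = least_squares_slope(hours, score_gain)

# Does the effect rise with every step up in dose?
is_monotonic = all(b > a for a, b in zip(score_gain, score_gain[1:]))

print(f"about {slope:.1f} points per extra hour; monotonic: {is_monotonic}")
# prints: about 5.0 points per extra hour; monotonic: True
```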

4. Plausible Mechanism

Can you explain how X causes Y?

5. Eliminate Confounders

Statistical controls:

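# Regression controlling for measured confounders
# (assumes df is a pandas DataFrame containing these columns)
import statsmodels.formula.api as smf

model = smf.ols('sales ~ marketing + temperature + C(day_of_week) + holidays', data=df).fit()

# If the marketing coefficient stays significant after controls → stronger causal claim
# (unmeasured confounders can still bias it)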

Practical Questions to Ask

Question | What It Checks
Does X happen before Y? | Time order
What else changed at the same time? | Confounders
Could Y cause X instead? | Reverse causation
Can we run an experiment? | True causation test
Does the relationship make sense? | Plausibility

Common Business Examples

Email Open Rates and Revenue

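Correlation: Customers who open more emails generate more revenue

Confounder: Engaged customers both open emails and buy more

Test: A/B test the emails (randomly choose who receives them, or which version) to check for a causal effect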

Website Visitors and Sales

Correlation: More traffic goes with more sales

Could be reverse: More sales → More word of mouth → More traffic

Test: Run paid ads to raise traffic deliberately, then check whether sales follow

Remember: Correlation is easy to find (just calculate r). Causation requires experiments, controls, and careful thinking. When in doubt, say "correlation" not "causes."
