Correlation vs Causation: How to Tell the Difference
Correlation means two variables move together. Causation means one causes the other. Confusing them leads to terrible decisions.
The Key Difference
Correlation: X and Y move together
Causation: X causes Y to change
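To make "move together" concrete, here is a minimal sketch of Pearson's r, the usual measure of linear correlation. The function name `pearson_r` and the sample numbers are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: y rises almost in lockstep with x, so r is close to +1
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

Note that `pearson_r(x, y)` equals `pearson_r(y, x)`: the number is symmetric, so it cannot tell you which variable (if either) is the cause.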
Classic Example: Ice Cream and Drowning
Observation: Ice cream sales and drowning deaths are highly correlated (r = 0.95 in this illustrative example)
Wrong conclusion: Ice cream causes drowning
Reality: Both are caused by hot weather (confounding variable)
3 Ways Correlation Happens Without Causation
1. Confounding Variable (Most Common)
A third variable causes both:
Hot weather → More ice cream sales
Hot weather → More swimming → More drownings
Ice cream ↛ Drowning (no causation)
But they're correlated due to shared cause
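A small simulation can show the shared-cause pattern. This is a sketch under made-up assumptions: the temperature range, coefficients, and noise levels are invented, and ice cream has no effect on drownings by construction. The two series still correlate overall, and the correlation largely disappears once temperature is held roughly fixed.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

rng = random.Random(42)  # fixed seed so the run is repeatable
temps, ice_cream, drownings = [], [], []
for _ in range(2000):
    t = rng.uniform(10, 35)                        # daily temperature in °C (invented range)
    temps.append(t)
    ice_cream.append(20 * t + rng.gauss(0, 60))    # sales depend only on temperature
    drownings.append(0.3 * t + rng.gauss(0, 1.5))  # drownings depend only on temperature

# Correlation across all days: driven entirely by the shared cause
overall = pearson_r(ice_cream, drownings)

# Hold the confounder roughly fixed: keep only days between 24 and 26 °C
band = [(i, d) for t, i, d in zip(temps, ice_cream, drownings) if 24 <= t <= 26]
within = pearson_r([i for i, _ in band], [d for _, d in band])

print(f"all days:      r = {overall:.2f}")
print(f"24-26 °C days: r = {within:.2f} (n = {len(band)})")
```

Comparing within a narrow band is a crude form of stratification. It only works for confounders you have measured.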
2. Reverse Causation
You have the direction backward:
Correlation: Hospital stays and deaths
Wrong: Hospitals cause deaths
Right: Serious illness (and the risk of dying) is what sends people to the hospital
3. Pure Coincidence
Random chance, especially with small samples or cherry-picked data:
Nicolas Cage movies correlate with pool drownings (r=0.67)
Obviously coincidence!
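To see how easily chance produces a strong r, here is a sketch that searches 1,000 purely random series for the best match to one random target with only 10 data points. All numbers are simulated, and the 0.67 threshold simply mirrors the example above.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

rng = random.Random(0)
n_points = 10        # e.g. ten yearly observations
n_candidates = 1000  # unrelated series we search through

target = [rng.gauss(0, 1) for _ in range(n_points)]
best = 0.0
strong = 0  # how many unrelated series reach |r| >= 0.67
for _ in range(n_candidates):
    candidate = [rng.gauss(0, 1) for _ in range(n_points)]
    r = abs(pearson_r(target, candidate))
    best = max(best, r)
    if r >= 0.67:
        strong += 1

print(f"strongest |r| among {n_candidates} random series: {best:.2f}")
print(f"series with |r| >= 0.67: {strong}")
```

None of these series has anything to do with the target. Check enough pairs and a few will look impressive, which is exactly what cherry-picking exploits.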
Tests for Causation
1. Time Order
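Cause must come before effect (necessary, but not sufficient on its own):
- ✅ Marketing spend (Monday) → Sales increase (Tuesday): the spend could have caused the increase
- ❌ Sales increase (Monday) → Marketing spend (Tuesday): Tuesday's spend can't have caused Monday's sales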
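One way to check time order in data is to compare lagged correlations in both directions. The sketch below simulates a case where sales respond to the previous day's spend. The data-generating process and every number in it are assumptions for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

rng = random.Random(7)
days = 400
marketing = [rng.gauss(100, 20) for _ in range(days)]

# Assumed process: today's sales respond to yesterday's marketing spend
sales = [500 + rng.gauss(0, 30)]  # day 0 has no previous spend
for day in range(1, days):
    sales.append(500 + 1.5 * marketing[day - 1] + rng.gauss(0, 30))

# Spend on day t vs sales on day t+1 (spend comes first)
spend_leads = pearson_r(marketing[:-1], sales[1:])
# Sales on day t vs spend on day t+1 (sales come first)
sales_leads = pearson_r(sales[:-1], marketing[1:])

print(f"spend -> next-day sales: r = {spend_leads:.2f}")
print(f"sales -> next-day spend: r = {sales_leads:.2f}")
```

A strong correlation in only one lag direction is consistent with that ordering, but a confounder that moves first can produce the same pattern.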
2. Controlled Experiment (Gold Standard)
Randomized A/B test:
- Group A: Gets treatment (new feature)
- Group B: Control (no change)
- If A performs better by more than chance alone would explain → Evidence of causation
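To judge whether A's lead is more than luck, one option is a permutation test: shuffle the group labels many times and see how often a gap this large appears by chance. This is a sketch, not the only valid test. The function name `permutation_p_value` and the conversion counts are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Return (observed mean difference A - B, one-sided p-value).

    The p-value is the share of random relabelings whose mean difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of users to groups
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            at_least_as_large += 1
    # +1 correction so the estimate is never exactly zero
    return observed, (at_least_as_large + 1) / (n_permutations + 1)

# Hypothetical conversions (1 = converted), 500 users per group
group_a = [1] * 75 + [0] * 425  # new feature: 15% converted
group_b = [1] * 50 + [0] * 450  # control: 10% converted
lift, p_value = permutation_p_value(group_a, group_b)
print(f"lift = {lift:.3f}, one-sided p = {p_value:.4f}")
```

The test only rules out chance. It is the random assignment of users to A and B that rules out confounders.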
3. Dose-Response Relationship
More cause → More effect:
1 hour study → +5 points
2 hours study → +10 points
3 hours study → +15 points
Consistent relationship strengthens causal claim
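The study-time numbers above can be checked with a least-squares line. This is a minimal sketch using only those three data points. The helper name `least_squares_line` is made up.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

hours = [1, 2, 3]   # hours of study (from the example above)
gain = [5, 10, 15]  # score improvement in points

slope, intercept = least_squares_line(hours, gain)
# Dose-response also expects the effect to keep rising as the dose rises
is_monotonic = all(later > earlier for earlier, later in zip(gain, gain[1:]))

print(f"each extra hour is associated with about {slope:.1f} more points")
# prints: each extra hour is associated with about 5.0 more points
```

With real data the points will not fall exactly on a line. What matters is that the effect rises steadily with the dose rather than jumping around.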
4. Plausible Mechanism
Can you explain how X causes Y?
- ✅ Exercise → Heart health (clear biological mechanism)
- ❌ Shoe size → Reading ability (no mechanism, likely age confound)
5. Eliminate Confounders
Statistical controls:
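# Regression controlling for measured confounders (df is an assumed DataFrame with these columns)
import statsmodels.formula.api as smf
model = smf.ols('sales ~ marketing + temperature + C(day_of_week) + holidays', data=df).fit()
print(model.summary())
# If the marketing coefficient stays significant after controls → stronger (not conclusive) causal claim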
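To see what "controlling for" does, here is a sketch with simulated data and a single confounder, using only the standard library. It removes temperature's linear influence from both marketing and sales, then regresses the leftovers on each other. For one control, this gives the same marketing coefficient a multiple regression would. The true effect of 2.0 and all other numbers are assumptions of the simulation.

```python
import random

def slope(xs, ys):
    """OLS slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def residuals(xs, ys):
    """What is left of ys after removing its linear dependence on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = slope(xs, ys)
    return [(y - my) - b * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(1)
TRUE_EFFECT = 2.0  # assumed true effect of marketing on sales
temperature, marketing, sales = [], [], []
for _ in range(5000):
    t = rng.gauss(20, 5)
    m = 0.5 * t + rng.gauss(0, 2)                    # more marketing on warm days
    s = TRUE_EFFECT * m + 3.0 * t + rng.gauss(0, 5)  # warm days also lift sales directly
    temperature.append(t)
    marketing.append(m)
    sales.append(s)

# Ignoring temperature: marketing gets credit for the weather's effect
naive = slope(marketing, sales)
# Controlling for temperature: compare what is left of each variable
adjusted = slope(residuals(temperature, marketing), residuals(temperature, sales))

print(f"naive marketing coefficient:    {naive:.2f}")
print(f"adjusted marketing coefficient: {adjusted:.2f} (true effect: {TRUE_EFFECT})")
```

The adjustment only works here because the confounder was measured and included. A confounder you did not record stays in the estimate.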
Practical Questions to Ask
| Question | What It Checks |
|---|---|
| Does X happen before Y? | Time order |
| What else changed at the same time? | Confounders |
| Could Y cause X instead? | Reverse causation |
| Can we run an experiment? | True causation test |
| Does the relationship make sense? | Plausibility |
Common Business Examples
Email Open Rates and Revenue
Correlation: High email opens → High revenue
Confound: Engaged customers do both
Test: Randomly hold back emails from a control group (or A/B test the content) and compare revenue
Website Visitors and Sales
Correlation: More traffic → More sales
Could be reverse: More sales → More word of mouth → More traffic
Test: Run paid ads (controlled traffic increase)
Remember: Correlation is easy to find (just calculate r). Causation requires experiments, controls, and careful thinking. When in doubt, say "correlation" not "causes."