P-Value Explained
What is a P-Value?
A p-value is the probability of getting your observed results (or more extreme) if the null hypothesis is true. In plain English: if nothing were really going on, how likely would it be to see data like this by random chance alone?
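The definition becomes concrete with a tiny simulation (hypothetical numbers): suppose you flip a coin 100 times and get 60 heads. Simulating many experiments under the null hypothesis ("the coin is fair") and counting how often a result at least that extreme appears approximates the p-value directly.

```python
import numpy as np

# Hypothetical example: 60 heads in 100 flips. How often does a FAIR coin
# (the null hypothesis) produce a result at least this far from 50?
rng = np.random.default_rng(42)
observed_heads = 60
flips = rng.binomial(n=100, p=0.5, size=100_000)  # 100k experiments under the null
# Two-sided: at least as far from 50 as the observed result
p_value = np.mean(np.abs(flips - 50) >= abs(observed_heads - 50))
print(f"Simulated p-value: {p_value:.3f}")  # close to the exact value of ~0.057
```

The simulated value lands near the exact two-sided binomial p-value (about 0.057): a fair coin produces a result this lopsided roughly 6% of the time.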
Simple Explanation
You test a new website design and see 10% more sales. Question: Is that real, or just luck?
- P-value = 0.03 (3%): If the new design truly changed nothing, a result this large would show up only 3% of the time → probably a real effect
- P-value = 0.40 (40%): Random chance alone would produce a result this large 40% of the time → could easily be luck
The 0.05 Threshold
p < 0.05 is the common threshold for "statistically significant"
- p < 0.05: Under the null hypothesis, data this extreme would occur less than 5% of the time → reject null hypothesis
- p ≥ 0.05: Under the null hypothesis, data this extreme would occur 5% of the time or more → cannot reject null hypothesis
Note: 0.05 is arbitrary. Some fields use 0.01 or 0.10.
Interpretation Guide
| P-Value | Meaning | Decision |
|---|---|---|
| < 0.001 | Very strong evidence | Reject null hypothesis |
| 0.001 - 0.01 | Strong evidence | Reject null hypothesis |
| 0.01 - 0.05 | Moderate evidence | Reject null hypothesis |
| 0.05 - 0.10 | Weak evidence | Marginally significant |
| > 0.10 | Little to no evidence | Cannot reject null |
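The table's cutoffs can be written as a small helper function. Note that these labels are this guide's conventions, not a universal standard:

```python
def evidence_label(p: float) -> str:
    """Map a p-value to the evidence categories in the table above."""
    if p < 0.001:
        return "Very strong evidence"
    elif p < 0.01:
        return "Strong evidence"
    elif p < 0.05:
        return "Moderate evidence"
    elif p < 0.10:
        return "Weak evidence"
    else:
        return "Little to no evidence"

print(evidence_label(0.003))  # Strong evidence
print(evidence_label(0.2))    # Little to no evidence
```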
Real Example: A/B Test
Scenario: Testing two website buttons
Control (A): 1000 visitors, 50 clicks (5.0% click rate)
Treatment (B): 1000 visitors, 65 clicks (6.5% click rate)
Question: Is B really better, or just luck?
```python
# Python: Chi-square test
from scipy.stats import chi2_contingency

observed = [[50, 950], [65, 935]]  # [clicks, no-clicks] per group
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"P-value: {p_value:.4f}")
# P-value: 0.1787 (Yates-corrected; scipy's default for 2x2 tables)
# Interpretation: p ≈ 0.18 > 0.05
# Cannot reject null hypothesis
# Could easily be random chance
```
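The chi-square result can be sanity-checked with a permutation test, which acts out the null hypothesis directly: pool all 115 clicks, shuffle them between the two groups, and count how often chance alone produces a click-rate gap at least as large as the observed 1.5 percentage points. A minimal sketch:

```python
import numpy as np

# Permutation test for the same A/B data: under the null, group labels
# are arbitrary, so shuffle them and see how extreme the real gap is.
rng = np.random.default_rng(0)
clicks = np.array([1] * 115 + [0] * 1885)  # 50 + 65 clicks among 2000 visitors
observed_diff = 65 / 1000 - 50 / 1000      # 0.015

diffs = np.empty(10_000)
for i in range(10_000):
    rng.shuffle(clicks)
    diffs[i] = clicks[1000:].mean() - clicks[:1000].mean()

p_value = np.mean(np.abs(diffs) >= observed_diff)
print(f"Permutation p-value: {p_value:.3f}")
```

The permutation p-value also lands well above 0.05, agreeing with the chi-square conclusion: the observed gap is unremarkable under pure chance.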
Python Examples
```python
from scipy import stats
import numpy as np

# Example 1: T-test (comparing two groups)
group_a = [23, 25, 28, 22, 24]  # Control
group_b = [30, 32, 35, 29, 33]  # Treatment
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"P-value: {p_value:.4f}")  # ≈ 0.001
if p_value < 0.05:
    print("Statistically significant difference")
else:
    print("No significant difference")

# Example 2: Correlation test
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
correlation, p_value = stats.pearsonr(x, y)
print(f"Correlation: {correlation:.3f}")
print(f"P-value: {p_value:.4f}")

# Example 3: One-sample t-test
# Is the mean different from a hypothesized value?
data = [10.2, 9.8, 10.5, 10.1, 9.9]
t_stat, p_value = stats.ttest_1samp(data, 10)  # Test if mean = 10
print(f"P-value: {p_value:.4f}")
```
Common Misconceptions
- ❌ Wrong: "P-value is the probability the null hypothesis is true"
- ✅ Right: "P-value is probability of seeing this data IF null hypothesis is true"
- ❌ Wrong: "P < 0.05 proves the effect is real"
- ✅ Right: "P < 0.05 suggests effect is unlikely due to chance alone"
- ❌ Wrong: "Lower p-value = bigger effect"
- ✅ Right: "Lower p-value = more confident effect exists, not necessarily bigger"
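The last misconception is easy to demonstrate: the exact same mean difference can be significant or not depending only on sample size. A quick sketch with made-up numbers:

```python
from scipy import stats

# Identical mean difference (1.0) in both comparisons; only sample size differs.
small_a, small_b = [1, 2, 3, 4, 5], [2, 3, 4, 5, 6]
large_a, large_b = small_a * 10, small_b * 10  # same values, 10x the sample

_, p_small = stats.ttest_ind(small_a, small_b)
_, p_large = stats.ttest_ind(large_a, large_b)
print(f"n=5 per group:  p = {p_small:.4f}")   # not significant
print(f"n=50 per group: p = {p_large:.4f}")   # significant
```

The effect is the same size in both cases; only the confidence that it exists changes.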
P-Value vs Effect Size
P-value tells you if there's an effect. Effect size tells you how big the effect is.
```python
# A small effect with a large sample can yield a low p-value (significant)
# A large effect with a small sample can yield a high p-value (not significant)
# Always report both!
# Cohen's d for the two t-test groups from the examples above
mean_diff = np.mean(group_b) - np.mean(group_a)
pooled_std = np.sqrt((np.var(group_a, ddof=1) + np.var(group_b, ddof=1)) / 2)
effect_size = mean_diff / pooled_std
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"P-value: {p_value:.4f}")
print(f"Effect size (Cohen's d): {effect_size:.2f}")
```
SQL Implementation
```sql
-- While SQL doesn't calculate p-values directly,
-- you can prepare data for statistical tests

-- Example: Prepare data for a t-test
SELECT
    experiment_group,
    AVG(conversion_rate)    AS mean_conversion,
    STDDEV(conversion_rate) AS std_dev,
    COUNT(*)                AS sample_size
FROM ab_test_results
GROUP BY experiment_group;
```
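Those aggregates (mean, standard deviation, sample size per group) are exactly what `scipy.stats.ttest_ind_from_stats` needs, so the SQL output can be handed straight to Python for the actual test. A sketch with hypothetical query results:

```python
from scipy import stats

# Hypothetical aggregates as the SQL query above might return them:
# (mean_conversion, std_dev, sample_size) per experiment_group
control   = (0.050, 0.218, 1000)
treatment = (0.065, 0.247, 1000)

# Run a two-sample t-test from summary statistics alone
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=control[0], std1=control[1], nobs1=control[2],
    mean2=treatment[0], std2=treatment[1], nobs2=treatment[2],
)
print(f"P-value: {p_value:.4f}")
```

This avoids pulling row-level data out of the database when only the summary statistics are needed.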
Important Limitations
- Sample size matters: Tiny samples rarely give p < 0.05, even with real effects
- Multiple testing: Testing many hypotheses inflates false positives
- Not everything: P-value doesn't tell you about practical importance
- Publication bias: Studies with p < 0.05 more likely to be published
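The multiple-testing point can be made concrete with a little arithmetic: run enough tests at α = 0.05 and a "significant" result by pure chance becomes likely. One common fix, sketched below, is the Bonferroni correction (divide the threshold by the number of tests):

```python
# With 20 independent tests at alpha = 0.05, the chance of at least one
# false positive is far above 5%.
alpha, n_tests = 0.05, 20
p_any_false_positive = 1 - (1 - alpha) ** n_tests
print(f"P(at least one false positive): {p_any_false_positive:.2f}")  # 0.64

# Bonferroni correction: divide the threshold by the number of tests
bonferroni_alpha = alpha / n_tests
print(f"Corrected per-test threshold: {bonferroni_alpha:.4f}")  # 0.0025
```

Bonferroni is conservative; it controls false positives at the cost of statistical power, which is part of why pre-registering the number of tests matters.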
Best Practices
- Pre-register: Decide significance level before testing
- Report effect size: Don't just report p-value
- Report confidence intervals: More informative than p-value alone
- Consider context: 0.05 threshold isn't sacred
- Replicate: One low p-value doesn't prove anything
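As an example of the confidence-interval advice, here is a 95% CI for the mean of the one-sample t-test data used earlier. The interval includes the hypothesized value of 10, which agrees with that test's non-significant p-value:

```python
import numpy as np
from scipy import stats

# 95% confidence interval for a mean — shows the plausible range of the
# true value, not just whether a threshold was crossed.
data = [10.2, 9.8, 10.5, 10.1, 9.9]
ci_low, ci_high = stats.t.interval(
    0.95, df=len(data) - 1, loc=np.mean(data), scale=stats.sem(data)
)
print(f"Mean: {np.mean(data):.2f}, 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

A CI of roughly (9.76, 10.44) says the data are consistent with a true mean of 10 — more informative than "p > 0.05" alone.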
Key Takeaways:
- P-value = probability of seeing this data if null hypothesis is true
- p < 0.05: Generally considered statistically significant
- Low p-value ≠ large or important effect
- Always report effect size alongside p-value
- P-values are just one piece of evidence, not the whole story