P-Value Explained
What is a P-Value?
A p-value is the probability of getting your observed results (or more extreme) if the null hypothesis is true. In plain English: if nothing were really going on, how likely would it be to see data like this by random chance alone?
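The definition becomes concrete with a tiny simulation (hypothetical numbers): suppose you flip a coin 100 times and get 60 heads. Simulating many experiments under the null hypothesis ("the coin is fair") and counting how often a result at least that extreme appears approximates the p-value directly.

```python
import numpy as np

# Hypothetical example: 60 heads in 100 flips. How often does a FAIR coin
# (the null hypothesis) produce a result at least this far from 50?
rng = np.random.default_rng(42)
observed_heads = 60
flips = rng.binomial(n=100, p=0.5, size=100_000)  # 100k experiments under the null
# Two-sided: at least as far from 50 as the observed result
p_value = np.mean(np.abs(flips - 50) >= abs(observed_heads - 50))
print(f"Simulated p-value: {p_value:.3f}")  # close to the exact value of ~0.057
```

The simulated value lands near the exact two-sided binomial p-value (about 0.057): a fair coin produces a result this lopsided roughly 6% of the time.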
Simple Explanation
You test a new website design and see 10% more sales. Question: Is that real, or just luck?
- P-value = 0.03 (3%): If the new design truly changed nothing, a result this large would show up only 3% of the time → probably a real effect
- P-value = 0.40 (40%): Random chance alone would produce a result this large 40% of the time → could easily be luck
The 0.05 Threshold
p < 0.05 is the common threshold for "statistically significant"
- p < 0.05: Under the null hypothesis, data this extreme would occur less than 5% of the time → reject null hypothesis
- p ≥ 0.05: Under the null hypothesis, data this extreme would occur 5% of the time or more → cannot reject null hypothesis
Note: 0.05 is arbitrary. Some fields use 0.01 or 0.10.
Interpretation Guide
| P-Value | Meaning | Decision |
|---|---|---|
| < 0.001 | Very strong evidence | Reject null hypothesis |
| 0.001 - 0.01 | Strong evidence | Reject null hypothesis |
| 0.01 - 0.05 | Moderate evidence | Reject null hypothesis |
| 0.05 - 0.10 | Weak evidence | Marginally significant |
| > 0.10 | Little to no evidence | Cannot reject null |
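The table's cutoffs can be written as a small helper function. Note that these labels are this guide's conventions, not a universal standard:

```python
def evidence_label(p: float) -> str:
    """Map a p-value to the evidence categories in the table above."""
    if p < 0.001:
        return "Very strong evidence"
    elif p < 0.01:
        return "Strong evidence"
    elif p < 0.05:
        return "Moderate evidence"
    elif p < 0.10:
        return "Weak evidence"
    else:
        return "Little to no evidence"

print(evidence_label(0.003))  # Strong evidence
print(evidence_label(0.2))    # Little to no evidence
```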
Real Example: A/B Test
Scenario: Testing two website buttons
Control (A): 1000 visitors, 50 clicks (5.0% click rate)
Treatment (B): 1000 visitors, 65 clicks (6.5% click rate)
Question: Is B really better, or just luck?
```python
# Python: Chi-square test
from scipy.stats import chi2_contingency

observed = [[50, 950], [65, 935]]  # [clicks, no-clicks] per group
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"P-value: {p_value:.4f}")
# P-value: 0.1787 (Yates-corrected; scipy's default for 2x2 tables)
# Interpretation: p ≈ 0.18 > 0.05
# Cannot reject null hypothesis
# Could easily be random chance
```
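The chi-square result can be sanity-checked with a permutation test, which acts out the null hypothesis directly: pool all 115 clicks, shuffle them between the two groups, and count how often chance alone produces a click-rate gap at least as large as the observed 1.5 percentage points. A minimal sketch:

```python
import numpy as np

# Permutation test for the same A/B data: under the null, group labels
# are arbitrary, so shuffle them and see how extreme the real gap is.
rng = np.random.default_rng(0)
clicks = np.array([1] * 115 + [0] * 1885)  # 50 + 65 clicks among 2000 visitors
observed_diff = 65 / 1000 - 50 / 1000      # 0.015

diffs = np.empty(10_000)
for i in range(10_000):
    rng.shuffle(clicks)
    diffs[i] = clicks[1000:].mean() - clicks[:1000].mean()

p_value = np.mean(np.abs(diffs) >= observed_diff)
print(f"Permutation p-value: {p_value:.3f}")
```

The permutation p-value also lands well above 0.05, agreeing with the chi-square conclusion: the observed gap is unremarkable under pure chance.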
Python Examples
```python
from scipy import stats
import numpy as np

# Example 1: T-test (comparing two groups)
group_a = [23, 25, 28, 22, 24]  # Control
group_b = [30, 32, 35, 29, 33]  # Treatment
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"P-value: {p_value:.4f}")  # ≈ 0.001
if p_value < 0.05:
    print("Statistically significant difference")
else:
    print("No significant difference")

# Example 2: Correlation test
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
correlation, p_value = stats.pearsonr(x, y)
print(f"Correlation: {correlation:.3f}")
print(f"P-value: {p_value:.4f}")

# Example 3: One-sample t-test
# Is the mean different from a hypothesized value?
data = [10.2, 9.8, 10.5, 10.1, 9.9]
t_stat, p_value = stats.ttest_1samp(data, 10)  # Test if mean = 10
print(f"P-value: {p_value:.4f}")
```
Common Misconceptions
- ❌ Wrong: "P-value is the probability the null hypothesis is true"
- ✅ Right: "P-value is probability of seeing this data IF null hypothesis is true"
- ❌ Wrong: "P < 0.05 proves the effect is real"
- ✅ Right: "P < 0.05 suggests effect is unlikely due to chance alone"
- ❌ Wrong: "Lower p-value = bigger effect"
- ✅ Right: "Lower p-value = more confident effect exists, not necessarily bigger"
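The last misconception is easy to demonstrate: the exact same mean difference can be significant or not depending only on sample size. A quick sketch with made-up numbers:

```python
from scipy import stats

# Identical mean difference (1.0) in both comparisons; only sample size differs.
small_a, small_b = [1, 2, 3, 4, 5], [2, 3, 4, 5, 6]
large_a, large_b = small_a * 10, small_b * 10  # same values, 10x the sample

_, p_small = stats.ttest_ind(small_a, small_b)
_, p_large = stats.ttest_ind(large_a, large_b)
print(f"n=5 per group:  p = {p_small:.4f}")   # not significant
print(f"n=50 per group: p = {p_large:.4f}")   # significant
```

The effect is the same size in both cases; only the confidence that it exists changes.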
P-Value vs Effect Size
P-value tells you if there's an effect. Effect size tells you how big the effect is.
```python
# A small effect with a large sample can yield a low p-value (significant)
# A large effect with a small sample can yield a high p-value (not significant)
# Always report both!
# Cohen's d for the two t-test groups from the examples above
mean_diff = np.mean(group_b) - np.mean(group_a)
pooled_std = np.sqrt((np.var(group_a, ddof=1) + np.var(group_b, ddof=1)) / 2)
effect_size = mean_diff / pooled_std
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"P-value: {p_value:.4f}")
print(f"Effect size (Cohen's d): {effect_size:.2f}")
```
SQL Implementation
```sql
-- While SQL doesn't calculate p-values directly,
-- you can prepare data for statistical tests

-- Example: Prepare data for a t-test
SELECT
    experiment_group,
    AVG(conversion_rate)    AS mean_conversion,
    STDDEV(conversion_rate) AS std_dev,
    COUNT(*)                AS sample_size
FROM ab_test_results
GROUP BY experiment_group;
```
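Those aggregates (mean, standard deviation, sample size per group) are exactly what `scipy.stats.ttest_ind_from_stats` needs, so the SQL output can be handed straight to Python for the actual test. A sketch with hypothetical query results:

```python
from scipy import stats

# Hypothetical aggregates as the SQL query above might return them:
# (mean_conversion, std_dev, sample_size) per experiment_group
control   = (0.050, 0.218, 1000)
treatment = (0.065, 0.247, 1000)

# Run a two-sample t-test from summary statistics alone
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=control[0], std1=control[1], nobs1=control[2],
    mean2=treatment[0], std2=treatment[1], nobs2=treatment[2],
)
print(f"P-value: {p_value:.4f}")
```

This avoids pulling row-level data out of the database when only the summary statistics are needed.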
Important Limitations
- Sample size matters: Tiny samples rarely give p < 0.05, even with real effects
- Multiple testing: Testing many hypotheses inflates false positives
- Not everything: P-value doesn't tell you about practical importance
- Publication bias: Studies with p < 0.05 more likely to be published
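The multiple-testing point can be made concrete with a little arithmetic: run enough tests at α = 0.05 and a "significant" result by pure chance becomes likely. One common fix, sketched below, is the Bonferroni correction (divide the threshold by the number of tests):

```python
# With 20 independent tests at alpha = 0.05, the chance of at least one
# false positive is far above 5%.
alpha, n_tests = 0.05, 20
p_any_false_positive = 1 - (1 - alpha) ** n_tests
print(f"P(at least one false positive): {p_any_false_positive:.2f}")  # 0.64

# Bonferroni correction: divide the threshold by the number of tests
bonferroni_alpha = alpha / n_tests
print(f"Corrected per-test threshold: {bonferroni_alpha:.4f}")  # 0.0025
```

Bonferroni is conservative; it controls false positives at the cost of statistical power, which is part of why pre-registering the number of tests matters.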
Best Practices
- Pre-register: Decide significance level before testing
- Report effect size: Don't just report p-value
- Report confidence intervals: More informative than p-value alone
- Consider context: 0.05 threshold isn't sacred
- Replicate: One low p-value doesn't prove anything
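As an example of the confidence-interval advice, here is a 95% CI for the mean of the one-sample t-test data used earlier. The interval includes the hypothesized value of 10, which agrees with that test's non-significant p-value:

```python
import numpy as np
from scipy import stats

# 95% confidence interval for a mean — shows the plausible range of the
# true value, not just whether a threshold was crossed.
data = [10.2, 9.8, 10.5, 10.1, 9.9]
ci_low, ci_high = stats.t.interval(
    0.95, df=len(data) - 1, loc=np.mean(data), scale=stats.sem(data)
)
print(f"Mean: {np.mean(data):.2f}, 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

A CI of roughly (9.76, 10.44) says the data are consistent with a true mean of 10 — more informative than "p > 0.05" alone.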
Key Takeaways:
- P-value = probability of seeing this data if null hypothesis is true
- p < 0.05: Generally considered statistically significant
- Low p-value ≠ large or important effect
- Always report effect size alongside p-value
- P-values are just one piece of evidence, not the whole story