Hypothesis Testing: A Step-by-Step Guide

Hypothesis testing determines whether an observed difference in data reflects a real effect or could plausibly be explained by random chance. It's fundamental to A/B testing, experiments, and data-driven decision making.

The 5 Steps of Hypothesis Testing

Step 1: State the Hypotheses

# Null Hypothesis (H0): No effect, no difference
# Alternative Hypothesis (H1): There is an effect/difference

Example: Testing new website design
H0: New design has same conversion rate as old design
H1: New design has different conversion rate

# Two-tailed: Different (higher OR lower)
# One-tailed: Specifically higher (or lower)
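
In SciPy's t-test functions (introduced in Step 3 below), this choice maps to the alternative argument, available in SciPy 1.6+. A quick sketch with illustrative numbers:

from scipy import stats

old = [23, 25, 27, 24, 26]  # illustrative conversion data
new = [28, 30, 32, 29, 31]

# Two-tailed: is the new design different (higher OR lower)?
_, p_two = stats.ttest_ind(new, old, alternative='two-sided')

# One-tailed: is the new design specifically higher?
_, p_one = stats.ttest_ind(new, old, alternative='greater')

print(f"Two-tailed p: {p_two:.3f}, one-tailed p: {p_one:.3f}")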

Step 2: Choose Significance Level (α)

# Alpha = probability of rejecting H0 when it's true (Type I error)
# Common values: 0.05 (5%), 0.01 (1%)

alpha = 0.05  # 5% significance level

# Interpretation: accept a 5% chance of a false positive

Step 3: Select and Calculate Test Statistic

from scipy import stats
import numpy as np

# T-test for comparing two means
group_a = [23, 25, 27, 24, 26]  # Old design conversions
group_b = [28, 30, 32, 29, 31]  # New design conversions

# Independent samples t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.3f}")

Step 4: Determine P-Value

# P-value = probability of seeing results this extreme if H0 is true
# Low p-value = unlikely to occur by chance

if p_value < alpha:
    print(f"p-value ({p_value:.3f}) < alpha ({alpha})")
    print("Result is statistically significant")
else:
    print(f"p-value ({p_value:.3f}) >= alpha ({alpha})")
    print("Result is NOT statistically significant")

Step 5: Draw Conclusion

if p_value < alpha:
    print("Reject H0: New design has significantly different conversion")
else:
    print("Fail to reject H0: No significant difference detected")

# Important: "Fail to reject" ≠ "Accept H0"
# We never prove H0 true, just fail to find evidence against it

Common Statistical Tests

T-Test (Compare Two Groups)

# Example data for illustration
data = [98, 102, 101, 97, 100]
group1, group2 = [23, 25, 27, 24, 26], [28, 30, 32, 29, 31]
before, after = [10, 12, 11, 13, 12], [14, 15, 13, 16, 15]

# One-sample t-test (compare a sample mean to a known value)
stats.ttest_1samp(data, popmean=100)

# Independent t-test (two separate groups)
stats.ttest_ind(group1, group2)

# Paired t-test (before/after on the same subjects)
stats.ttest_rel(before, after)

Z-Test (Large Samples)

from statsmodels.stats.weightstats import ztest

# For large samples (n > 30)
z_stat, p_value = ztest(group1, group2)

Chi-Square Test (Categorical Data)

# Test relationship between categorical variables
from scipy.stats import chi2_contingency

# Contingency table: rows = groups, columns = outcome counts
observed = [[10, 20], [30, 40]]
chi2, p_value, dof, expected = chi2_contingency(observed)

print(f"Chi-square: {chi2:.3f}, p-value: {p_value:.3f}")

Complete Example: A/B Test

import numpy as np
from scipy.stats import ttest_ind

# Scenario: Testing email subject lines
# Control: 1000 emails, 120 opens
# Variant: 1000 emails, 145 opens

control_rate = 120 / 1000  # 12%
variant_rate = 145 / 1000  # 14.5%

# Simulate individual open (1) / no-open (0) outcomes at these rates
np.random.seed(42)
control = np.random.binomial(1, control_rate, 1000)
variant = np.random.binomial(1, variant_rate, 1000)

# Run t-test
t_stat, p_value = ttest_ind(control, variant)

# Interpret
alpha = 0.05
print(f"Control open rate: {control_rate:.1%}")
print(f"Variant open rate: {variant_rate:.1%}")
print(f"Difference: {(variant_rate - control_rate):.1%}")
print(f"P-value: {p_value:.3f}")

if p_value < alpha:
    print("✓ Statistically significant - Use new subject line!")
else:
    print("✗ Not significant - Keep testing")

Understanding Errors

Type I Error (False Positive)

# Rejecting H0 when it's actually true
# Controlled by alpha (significance level)
# Example: Saying new design works when it doesn't
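
One way to see alpha at work is an A/A simulation (illustrative): both groups are drawn from the same distribution, so H0 is true by construction, yet roughly 5% of tests still come back "significant".

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials = 10_000
false_positives = 0

for _ in range(n_trials):
    # Same distribution for both groups, so any "effect" is a false positive
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

print(f"False positive rate: {false_positives / n_trials:.1%}")  # ≈ 5%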

Type II Error (False Negative)

# Failing to reject H0 when it's actually false
# Probability = Beta (β)
# Power = 1 - β (ability to detect true effect)
# Example: Missing that new design actually works better

Power Analysis

from statsmodels.stats.power import TTestIndPower

# Solve for required sample size per group
effect_size = 0.3  # Expected difference in std deviations (Cohen's d)
alpha = 0.05
power = 0.8        # 80% chance to detect the effect if it exists

sample_size = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=alpha, power=power)
print(f"Need {sample_size:.0f} samples per group")

Common Pitfalls and Best Practices

Pro Tip: A significant p-value doesn't mean the effect is large or important! Always report effect size and confidence intervals alongside p-values for complete context.
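
As a minimal sketch of that advice (reusing the two design groups from Step 3), Cohen's d and a 95% confidence interval for the difference in means can be computed by hand:

import numpy as np
from scipy import stats

group_a = np.array([23, 25, 27, 24, 26])
group_b = np.array([28, 30, 32, 29, 31])

# Effect size (Cohen's d): mean difference / pooled standard deviation
n1, n2 = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n1 - 1) * group_a.var(ddof=1) +
                     (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

# 95% CI for the difference in means (equal-variance t interval)
diff = group_b.mean() - group_a.mean()
se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)

print(f"Cohen's d: {cohens_d:.2f}")
print(f"95% CI for difference: ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")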
