Normal Distribution Explained
What is a Normal Distribution?
A normal distribution (also called a Gaussian distribution or bell curve) is a continuous probability distribution in which values cluster symmetrically around the mean. It is one of the most widely used distributions in statistics, and many natural measurements follow it at least approximately.
Key Characteristics
- Bell-shaped curve: Symmetrical around the center
- Mean = Median = Mode: All at the peak
- Defined by two parameters: Mean (μ) and standard deviation (σ)
- Predictable spread: Follows the 68-95-99.7 rule
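The two parameters fully determine the curve through the density f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). The following is a minimal sketch of that formula. The function name normal_pdf is illustrative rather than taken from any library, and statistics.NormalDist from the Python standard library is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density at x of a normal distribution with mean mu and std dev sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Height of the curve at its peak (x = mean) for mu=100, sigma=15
peak = normal_pdf(100, mu=100, sigma=15)
# Standard-library value for the same point, as a sanity check
reference = NormalDist(mu=100, sigma=15).pdf(100)
print(f"hand-written: {peak:.6f}, NormalDist: {reference:.6f}")
```

Because the density depends on x only through (x - μ)², the curve is symmetric about the mean, which is why the mean, median, and mode coincide.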
The 68-95-99.7 Rule (Empirical Rule)
- About 68% of data falls within 1 standard deviation of the mean (μ ± σ)
- About 95% of data falls within 2 standard deviations (μ ± 2σ)
- About 99.7% of data falls within 3 standard deviations (μ ± 3σ)
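These percentages are rounded values of the normal cumulative distribution function. As a rough sketch, they can be recomputed with statistics.NormalDist from the Python standard library. The helper name prob_within is an assumption made for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """Probability that a normal value lies within k std devs of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# The result is the same for any mean and standard deviation,
# because the rule is stated in units of sigma.
coverage = {k: prob_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sigma: {p:.2%}")
```

The unrounded figures are roughly 68.27%, 95.45%, and 99.73%, so the rule is a convenient approximation rather than an exact statement.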
Real-World Examples
- Heights: Adult human height is approximately normally distributed (mean ~170 cm)
- Test scores: Scores from a large class tend to be approximately normally distributed
- Measurement errors: Random errors in scientific measurements
- Blood pressure: Population blood pressure readings
Python Example
import numpy as np
import matplotlib.pyplot as plt
# Generate normal distribution data
mean = 100
std_dev = 15
data = np.random.normal(mean, std_dev, 1000)
# Plot histogram
plt.hist(data, bins=30, density=True, alpha=0.7)
plt.axvline(mean, color='red', linestyle='--', label='Mean')
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.title('Normal Distribution (μ=100, σ=15)')
plt.legend()
plt.show()
# Check the 68-95-99.7 rule: fraction of samples within k standard deviations
within_1std = np.mean(np.abs(data - mean) <= std_dev)
within_2std = np.mean(np.abs(data - mean) <= 2 * std_dev)
within_3std = np.mean(np.abs(data - mean) <= 3 * std_dev)
print(f"Within 1 std dev: {within_1std:.1%}")  # ~68%
print(f"Within 2 std dev: {within_2std:.1%}")  # ~95%
print(f"Within 3 std dev: {within_3std:.1%}")  # ~99.7%
Why It Matters
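- Statistical inference: Many common statistical tests, such as the t-test, assume approximately normal data
- Central Limit Theorem: The distribution of sample means approaches a normal distribution as the sample size grows, even when the underlying data is not normal
- Outlier detection: Values more than 3σ from the mean are often considered outliers, since only about 0.3% of normal data falls that far out
- Confidence intervals: Used to calculate margin of error (for example, about ±1.96 standard errors for 95% confidence)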
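The Central Limit Theorem point can be illustrated with a small simulation. The sketch below draws samples from an exponential distribution, which is strongly skewed, and looks at how their means behave. The sample size, number of samples, and seed are arbitrary choices for this illustration, and only the Python standard library is used.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50     # observations per sample (illustrative choice)
NUM_SAMPLES = 5000   # number of sample means collected (illustrative choice)

# Exponential data with rate 1: right-skewed, true mean 1, true std dev 1.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)
standard_error = 1.0 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n)

# Share of sample means within one standard error of the true mean;
# for a normal distribution this would be about 68%.
within_1se = sum(abs(m - 1.0) <= standard_error for m in sample_means) / NUM_SAMPLES

print(f"mean of sample means:   {mean_of_means:.3f}")
print(f"spread of sample means: {spread_of_means:.3f} (sigma/sqrt(n) = {standard_error:.3f})")
print(f"within 1 standard error: {within_1se:.1%}")
```

Even though individual exponential values are far from bell-shaped, the sample means should center near the true mean with a spread close to σ/√n, which is the behavior that confidence intervals and many tests rely on.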
Testing for Normality
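# Python: Test if data is normally distributed
# (reuses `data` from the Python Example above)
import matplotlib.pyplot as plt
from scipy import stats
# Shapiro-Wilk test: the null hypothesis is that the data is normal
statistic, p_value = stats.shapiro(data)
if p_value > 0.05:
    print("No evidence against normality (data appears normally distributed)")
else:
    print("Data does not appear normally distributed")
# Q-Q plot visual check: points close to the straight line suggest normality
stats.probplot(data, dist="norm", plot=plt)
plt.show()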
Best Practices
- Always visualize: Use histograms and Q-Q plots to check normality
- Don't assume: Not all data is normally distributed
- Transform if needed: A log transform can sometimes bring right-skewed, positive data closer to a normal shape
- Use appropriate tests: If data isn't normal, consider non-parametric methods
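The log-transform advice can be tried on synthetic data. The sketch below generates right-skewed log-normal values and compares skewness before and after taking logs. The skewness helper is a simple hand-written estimate (an assumption of this example, not a library function), and the data is simulated rather than real.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness as the mean cubed z-score (no bias correction)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / std) ** 3 for v in values])

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic right-skewed data: log-normal values are always positive.
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
# The log is only defined for positive values, which holds here by construction.
logged = [math.log(v) for v in skewed]

skew_before = skewness(skewed)
skew_after = skewness(logged)
print(f"skewness before log transform: {skew_before:.2f}")
print(f"skewness after log transform:  {skew_after:.2f}")
```

A skewness near zero after the transform is consistent with a symmetric shape, but it is worth confirming with a histogram or Q-Q plot. Real data with zeros or negative values would need a different transform or a non-parametric method.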