Normal Distribution Explained

What is a Normal Distribution?

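A normal distribution (also called a Gaussian distribution or bell curve) is a continuous probability distribution in which values cluster symmetrically around the mean and become less likely the farther they are from it. It is one of the most widely used distributions in statistics, and many natural measurements are approximately normal.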
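For reference, the standard probability density function of a normal distribution with mean μ and standard deviation σ is:

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The squared term (x − μ)² is what makes the curve symmetric about μ, and σ controls how quickly the density falls off on either side.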

Key Characteristics

Bell-shaped curve: Symmetric about the mean
Mean = Median = Mode: All three coincide at the peak of the curve
Defined by two parameters: The mean (μ) sets the center and the standard deviation (σ) sets the spread
Predictable spread: Follows the 68-95-99.7 rule
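A small sketch that checks the first two characteristics numerically, using only NumPy and a density function written out by hand from the standard formula (`normal_pdf` is an illustrative name, not a library API; μ=100 and σ=15 match the example used later):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Normal probability density, written out from the standard formula."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 100.0, 15.0
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 801)  # evenly spaced grid centered on mu
y = normal_pdf(x, mu, sigma)

# Symmetry: f(mu + d) equals f(mu - d), so the density values read the same reversed
print(np.allclose(y, y[::-1]))

# Mode: the highest density on the grid sits at the mean
print(x[np.argmax(y)])
```

Because the curve is symmetric and has a single peak, the mean and median also fall at that same point.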

The 68-95-99.7 Rule (Empirical Rule)

About 68% of values fall within 1 standard deviation of the mean (μ ± σ)
About 95% of values fall within 2 standard deviations (μ ± 2σ)
About 99.7% of values fall within 3 standard deviations (μ ± 3σ)
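The percentages in the rule are rounded. For any normal distribution, the probability of landing within k standard deviations of the mean equals erf(k/√2), which Python's standard library can evaluate directly. A sketch (`within_k_sigma` is an illustrative helper name):

```python
import math

def within_k_sigma(k):
    """P(|X - mu| <= k * sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {within_k_sigma(k):.4%}")
# within 1 sigma: 68.2689%
# within 2 sigma: 95.4500%
# within 3 sigma: 99.7300%
```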

Real-World Examples

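Heights: Adult height, measured within a single sex and population, is approximately normal (e.g., a mean around 170 cm)
Test scores: Scores on tests taken by large groups are often approximately normal
Measurement errors: Random errors in repeated scientific measurements are commonly modeled as normal
Blood pressure: Blood pressure readings across a population are roughly normal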

Python Example

import numpy as np
import matplotlib.pyplot as plt

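# Generate 1,000 samples from a normal distribution (μ=100, σ=15)
mean = 100
std_dev = 15
rng = np.random.default_rng(42)  # seeded generator for reproducible results
data = rng.normal(mean, std_dev, 1000)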

# Plot histogram
plt.hist(data, bins=30, density=True, alpha=0.7)
plt.axvline(mean, color='red', linestyle='--', label='Mean')
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.title('Normal Distribution (μ=100, σ=15)')
plt.legend()
plt.show()

# Check the 68-95-99.7 rule: fraction of samples within k standard deviations of the mean
within_1std = np.mean(np.abs(data - mean) <= std_dev)
within_2std = np.mean(np.abs(data - mean) <= 2 * std_dev)
within_3std = np.mean(np.abs(data - mean) <= 3 * std_dev)

print(f"Within 1 std dev: {within_1std:.1%}")  # ~68%
print(f"Within 2 std dev: {within_2std:.1%}")  # ~95%
print(f"Within 3 std dev: {within_3std:.1%}")  # ~99.7%

Why It Matters

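Statistical inference: Many parametric tests (such as t-tests) assume normally distributed data or errors
Central Limit Theorem: The distribution of sample means approaches a normal distribution as sample size grows, even when the population itself is not normal (provided its variance is finite)
Outlier detection: Values more than 3σ from the mean are often flagged as potential outliers
Confidence intervals: The normal distribution supplies the critical values used for the margin of error (e.g., ±1.96 standard errors for a 95% interval)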
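A quick simulation of the Central Limit Theorem, as a sketch under our own assumptions: draw from a strongly right-skewed exponential population (an arbitrary choice, with mean and standard deviation both 2.0), average samples of size n = 50, and compare the result with theory. The sample means should center on the population mean, spread by about σ/√n, and have skewness near 2/√n (about 0.28) instead of the population's skewness of 2.

```python
import numpy as np

rng = np.random.default_rng(0)
scale = 2.0          # exponential population: mean = 2.0, std dev = 2.0, skewness = 2
n, trials = 50, 20_000

# Each row is one sample of size n; average across each row to get one sample mean
sample_means = rng.exponential(scale, size=(trials, n)).mean(axis=1)

print(f"mean of sample means: {sample_means.mean():.3f} (population mean {scale})")
print(f"std of sample means:  {sample_means.std():.3f} (sigma/sqrt(n) = {scale / np.sqrt(n):.3f})")

# Standardize and compute the sample skewness of the means
z = (sample_means - sample_means.mean()) / sample_means.std()
print(f"skewness of sample means: {np.mean(z**3):.3f} (theory 2/sqrt(n) = {2 / np.sqrt(n):.3f})")
```

Increasing `n` pushes the skewness further toward zero, which is the convergence the theorem describes.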
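The outlier and confidence-interval items both come down to measuring distance in standard deviations. A minimal sketch, assuming the data are roughly normal; `flag_outliers` and `mean_ci_95` are hypothetical helper names, and 1.96 is the two-sided 95% critical value of the standard normal distribution:

```python
import numpy as np

def flag_outliers(values, threshold=3.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std(ddof=1)
    return values[np.abs(z) > threshold]

def mean_ci_95(values):
    """Normal-approximation 95% confidence interval for the mean."""
    values = np.asarray(values, dtype=float)
    standard_error = values.std(ddof=1) / np.sqrt(len(values))
    margin = 1.96 * standard_error
    return values.mean() - margin, values.mean() + margin

rng = np.random.default_rng(1)
# 1,000 normal values (μ=100, σ=15) plus one planted extreme value
data = np.append(rng.normal(100, 15, 1000), 200.0)

print(flag_outliers(data))
low, high = mean_ci_95(data)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

About 0.3% of genuine normal draws also land beyond 3σ, so a flag marks a candidate for inspection rather than a confirmed error. For small samples, a t critical value is more appropriate than 1.96.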

Testing for Normality

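# Python: Test whether data is consistent with a normal distribution
# (uses `data` from the example above)
import matplotlib.pyplot as plt
from scipy import stats

# Shapiro-Wilk test: the null hypothesis is that the data come from a normal distribution
statistic, p_value = stats.shapiro(data)
if p_value > 0.05:
    print("No evidence against normality (fail to reject H0 at α = 0.05)")
else:
    print("Data do not appear normally distributed (reject H0 at α = 0.05)")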

# Q-Q plot visual check: points close to the reference line suggest normality
stats.probplot(data, dist="norm", plot=plt)
plt.show()
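To see how the Shapiro-Wilk test behaves, here is a sketch that runs `scipy.stats.shapiro` on a normal sample and on a right-skewed exponential sample of the same size; the seed and sample sizes are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
samples = {
    "normal": rng.normal(100, 15, 500),
    "exponential": rng.exponential(15, 500),  # strongly right-skewed
}

for name, values in samples.items():
    statistic, p_value = stats.shapiro(values)
    verdict = "reject normality" if p_value <= 0.05 else "no evidence against normality"
    print(f"{name:12s} W={statistic:.3f} p={p_value:.3g} -> {verdict}")
```

The skewed sample is rejected decisively. With other seeds, the normal sample will occasionally fall below 0.05 as well (about 5% of the time by construction), which is one more reason to pair the test with a Q-Q plot.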

Best Practices

Always visualize: Use histograms and Q-Q plots to check normality
Don't assume: Not all data is normally distributed
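Transform if needed: Log transforms can often make right-skewed, strictly positive data closer to normal
Use appropriate tests: If data aren't normal and can't be transformed, consider non-parametric methods (for example, the Mann-Whitney U test instead of a two-sample t-test)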
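A sketch of the log-transform tip, using lognormal data as an assumed example of right-skewed, strictly positive values. Taking logs of a lognormal sample gives an exactly normal one, which shows up as sample skewness dropping to about zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=2000)  # right-skewed, all values > 0

logged = np.log(skewed)  # the log is only defined for strictly positive values

print(f"skewness before: {stats.skew(skewed):.2f}")
print(f"skewness after:  {stats.skew(logged):.2f}")
```

Real data rarely become perfectly normal after a transform, so re-check the histogram and Q-Q plot afterwards. Data containing zeros need a shifted transform (such as `np.log1p`) or a different approach.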