Useful Data Tips

Normal Distribution Explained


What is a Normal Distribution?

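A normal distribution (also called a Gaussian distribution or bell curve) is a continuous probability distribution in which values cluster symmetrically around the mean. Its shape is fully determined by two parameters, the mean and the standard deviation. It is one of the most widely used distributions in statistics and a good approximation for many naturally occurring measurements.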
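To make the definition concrete, here is a minimal sketch that evaluates the normal density from its standard formula, f(x) = exp(-(x-μ)²/(2σ²)) / (σ√(2π)), and checks the symmetry around the mean. The helper name normal_pdf and the values μ=100, σ=15 are illustrative assumptions, not something this article prescribes.

```python
import numpy as np
from scipy import stats

def normal_pdf(x, mean, std_dev):
    """Density of a normal distribution, written out from the standard formula."""
    z = (x - mean) / std_dev
    return np.exp(-0.5 * z**2) / (std_dev * np.sqrt(2 * np.pi))

mean, std_dev = 100, 15  # illustrative values
offsets = np.array([5.0, 15.0, 30.0])

left = normal_pdf(mean - offsets, mean, std_dev)
right = normal_pdf(mean + offsets, mean, std_dev)

# Symmetry: points equally far from the mean have the same density
print(np.allclose(left, right))

# Cross-check the hand-written formula against SciPy's implementation
print(np.allclose(right, stats.norm.pdf(mean + offsets, loc=mean, scale=std_dev)))

# The density is highest at the mean
print(normal_pdf(mean, mean, std_dev) > right.max())
```

All three checks should print True for any choice of mean and positive standard deviation.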

Key Characteristics

The 68-95-99.7 Rule (Empirical Rule)
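The rule's name refers to the approximate share of a normal distribution that lies within 1, 2, and 3 standard deviations of the mean: about 68%, 95%, and 99.7%. As a small sketch, these figures can be recovered from the standard normal CDF using scipy.stats.norm; the variable name coverage is illustrative.

```python
from scipy import stats

# Probability mass within k standard deviations of the mean.
# The result is the same for every normal distribution, so the
# standard normal (mean 0, std dev 1) is enough.
coverage = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"Within {k} std dev: {p:.2%}")
```

The printed values are roughly 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7.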

Real-World Examples

Python Example

import numpy as np
import matplotlib.pyplot as plt

# Generate normal distribution data (seeded so results are reproducible)
mean = 100
std_dev = 15
rng = np.random.default_rng(42)
data = rng.normal(mean, std_dev, 1000)

# Plot histogram
plt.hist(data, bins=30, density=True, alpha=0.7)
plt.axvline(mean, color='red', linestyle='--', label='Mean')
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.title(f'Normal Distribution (μ={mean}, σ={std_dev})')
plt.legend()
plt.show()

# Check the 68-95-99.7 rule: fraction of samples within k standard deviations
within_1std = np.mean(np.abs(data - mean) <= std_dev)
within_2std = np.mean(np.abs(data - mean) <= 2 * std_dev)
within_3std = np.mean(np.abs(data - mean) <= 3 * std_dev)

print(f"Within 1 std dev: {within_1std:.1%}")  # ~68%
print(f"Within 2 std dev: {within_2std:.1%}")  # ~95%
print(f"Within 3 std dev: {within_3std:.1%}")  # ~99.7%

Why It Matters

Testing for Normality

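# Python: Test if data is normally distributed
import matplotlib.pyplot as plt
from scipy import stats

# Shapiro-Wilk test (null hypothesis: the data come from a normal distribution)
alpha = 0.05
statistic, p_value = stats.shapiro(data)
if p_value > alpha:
    # A large p-value does not prove normality; it only fails to rule it out
    print("No evidence against normality (fail to reject H0)")
else:
    print("Data does not appear normally distributed (reject H0)")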

# Q-Q plot visual check
stats.probplot(data, dist="norm", plot=plt)
plt.show()
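As a rough illustration of how the Shapiro-Wilk test behaves, the sketch below runs it on one sample drawn from a normal distribution and one drawn from a clearly skewed (exponential) distribution. The seed, sample sizes, and variable names are arbitrary choices made for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable (arbitrary choice)

normal_sample = rng.normal(loc=100, scale=15, size=500)
skewed_sample = rng.exponential(scale=15, size=500)  # clearly non-normal

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    statistic, p_value = stats.shapiro(sample)
    print(f"{name:12s} W={statistic:.3f}  p={p_value:.3g}")
```

The exponential sample should produce a p-value far below 0.05, so normality is rejected. The p-value for the normal sample varies from draw to draw, which is why a single test is best read alongside the Q-Q plot.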

Best Practices