Mean, Median, and Mode Explained
The Three Measures of Central Tendency
Mean, median, and mode are the three most common ways to measure the "center" of a dataset. Each tells you something different about your data.
Mean (Average)
Definition: Sum of all values divided by the count.
# Example dataset: [10, 20, 30, 40, 100]
values = [10, 20, 30, 40, 100]
# Mean calculation: 200 / 5 = 40
mean = sum(values) / len(values) # 40.0
# Python
import numpy as np
mean = np.mean([10, 20, 30, 40, 100]) # 40.0
- Best for: Normally distributed data without outliers
- Problem: Sensitive to extreme values (here the 100 pulls the mean up to 40, above four of the five values)
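To make the outlier effect concrete, here is a minimal sketch that recomputes the mean of the example dataset with and without the 100. It uses the standard-library `statistics` module rather than NumPy, an assumption made only to keep the example dependency-free.

```python
from statistics import mean

values = [10, 20, 30, 40, 100]

# Drop the single extreme value to see how far it moved the mean.
without_outlier = [v for v in values if v != 100]

mean_all = mean(values)               # 200 / 5
mean_without = mean(without_outlier)  # 100 / 4

print(f"With 100:    {mean_all}")
print(f"Without 100: {mean_without}")
```

One value out of five is enough to move the mean from 25 to 40.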
Median (Middle Value)
Definition: The middle value when data is sorted.
# Example dataset: [10, 20, 30, 40, 100]
# Sorted: [10, 20, 30, 40, 100]
# Middle value (position 3 of 5) = 30
# Python
median = np.median([10, 20, 30, 40, 100]) # 30.0
# For even number of values, take average of middle two
median = np.median([10, 20, 30, 40]) # 25.0 (average of 20 and 30)
- Best for: Skewed data or data with outliers
- Advantage: Barely affected by extreme values (replacing 100 with 1,000 still gives a median of 30)
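The sort-then-pick-the-middle rule can also be sketched without any library. `median_by_hand` is a hypothetical helper name used for illustration, not a NumPy function.

```python
def median_by_hand(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)  # the input does not need to be pre-sorted
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle element.
        return ordered[mid]
    # Even count: average of the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_hand([10, 20, 30, 40, 100]))  # odd count
print(median_by_hand([40, 10, 30, 20]))       # even count, unsorted input
```

Sorting first matters: taking the middle element of unsorted data gives a meaningless result.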
Mode (Most Frequent)
Definition: The value that appears most often.
# Example dataset: [1, 2, 2, 3, 4, 2, 5]
# Mode = 2 (appears 3 times)
# Python
from scipy import stats
mode = stats.mode([1, 2, 2, 3, 4, 2, 5])
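print(mode.mode) # 2 (recent SciPy versions; older ones return an array, [2])
# Data can have multiple modes (bimodal, multimodal)
# Example: [1, 1, 2, 2, 3] has modes 1 and 2
# Note: stats.mode returns only one of the tied modes (the smallest, 1 here)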
- Best for: Categorical data or finding most common value
- Use case: "What's the most popular product?" or "Most common rating?"
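When data has several modes, or when the values are categories rather than numbers, one option is the standard-library `statistics.multimode` (Python 3.8+), which returns every most-frequent value. The sketch below is an illustration only; the `sizes` list is made-up data.

```python
from collections import Counter
from statistics import multimode

numbers = [1, 1, 2, 2, 3]
sizes = ["S", "M", "M", "L", "M", "XL"]  # made-up categorical data

all_modes = multimode(numbers)  # every value tied for most frequent
top_size = multimode(sizes)     # works on strings too

# Counter exposes the frequencies behind the answer.
size_counts = Counter(sizes).most_common()

print(all_modes)
print(top_size)
print(size_counts)
```

Returning a list keeps ties visible instead of silently picking one winner.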
When to Use Each One
| Scenario | Best Measure | Why |
|---|---|---|
| Income data | Median | Billionaires skew the mean |
| Test scores (normal) | Mean | Symmetrical distribution |
| House prices | Median | Mansions skew the mean |
| Survey ratings | Mode | Find most common response |
| Product sizes sold | Mode | What size is most popular? |
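A rough way to tell which row of the table your data resembles is to compare the mean with the median: a mean far above the median suggests a long right tail. The sketch below is a heuristic, not a formal skewness test. `mean_median_gap` is a hypothetical helper and both datasets are invented for illustration.

```python
from statistics import mean, median


def mean_median_gap(values):
    """Return (mean - median) / median as a rough indicator of skew.

    Assumes a non-empty dataset with a non-zero median.
    """
    m = median(values)
    return (mean(values) - m) / m


scores = [70, 75, 80, 85, 90]                         # symmetric, invented
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]  # one huge value, invented

print(f"Scores gap:  {mean_median_gap(scores):.2f}")
print(f"Incomes gap: {mean_median_gap(incomes):.2f}")
```

A gap near zero points toward the mean being a fair summary; a large positive gap points toward the median.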
Real-World Example: Salaries
salaries = [35000, 40000, 42000, 45000, 48000, 50000, 52000, 55000, 250000]
# Mean (Average)
mean_salary = np.mean(salaries) # $68,556
# Misleading! CEO salary pulls average way up
# Median (Middle value)
median_salary = np.median(salaries) # $48,000
# Better representation of "typical" salary
# Mode (Most common)
# Not useful here - all values unique
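The salary comparison can be run end to end with only the standard library. This sketch swaps in `statistics` for NumPy (an assumption, to avoid dependencies) and uses `multimode` to show why the mode says nothing about this dataset.

```python
from statistics import mean, median, multimode

salaries = [35000, 40000, 42000, 45000, 48000, 50000, 52000, 55000, 250000]

mean_salary = mean(salaries)
median_salary = median(salaries)
modes = multimode(salaries)

print(f"Mean:   ${mean_salary:,.0f}")
print(f"Median: ${median_salary:,.0f}")
# When every value is unique, every value ties for "most frequent".
print(f"Modes:  {len(modes)} of {len(salaries)} values")
```

Because all nine salaries tie at one occurrence each, the list of modes is simply the whole dataset.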
All Three Together in Python
import numpy as np
from scipy import stats
data = [10, 20, 20, 30, 40, 50, 100]
mean = np.mean(data) # 38.57
median = np.median(data) # 30.0
mode = stats.mode(data).mode # 20
print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Mode: {mode}")
SQL Examples
-- Calculate mean, median, and mode (PostgreSQL syntax; PERCENTILE_CONT(0.5) returns the exact, interpolated median)
SELECT
AVG(salary) as mean_salary,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median_salary,
MODE() WITHIN GROUP (ORDER BY department) as mode_department
FROM employees;
Excel Formulas
=AVERAGE(A1:A100) -- Mean
=MEDIAN(A1:A100) -- Median
=MODE.SNGL(A1:A100) -- Mode (single mode; =MODE is the older name for the same function)
=MODE.MULT(A1:A100) -- Multiple modes
Key Takeaways:
- Mean: Best for symmetric data, but sensitive to outliers
- Median: Best for skewed data, resistant to outliers
- Mode: Best for categorical data or finding most common value
- Report all three: They tell different stories about your data