Mean, Median, and Mode Explained

The Three Measures of Central Tendency

Mean, median, and mode are the three most common ways to measure the "center" of a dataset. Each tells you something different about your data.

Mean (Average)

Definition: Sum of all values divided by the count.

# Example dataset: [10, 20, 30, 40, 100]

# Mean calculation
values = [10, 20, 30, 40, 100]
mean = sum(values) / len(values)  # 200 / 5 = 40.0

# Python
import numpy as np
mean = np.mean([10, 20, 30, 40, 100])  # 40.0

Best for: Normally distributed data without outliers
Problem: Sensitive to extreme values (the single value 100 pulls the mean up to 40, above four of the five values)
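
To make that sensitivity concrete, here is a minimal sketch that compares the mean with and without the extreme value. It uses `fmean` from Python's standard `statistics` module rather than NumPy, purely to keep the example dependency-free; the variable names are illustrative.

```python
from statistics import fmean

values = [10, 20, 30, 40, 100]
without_outlier = [v for v in values if v != 100]  # drop the extreme value

mean_all = fmean(values)               # (10 + 20 + 30 + 40 + 100) / 5
mean_trimmed = fmean(without_outlier)  # (10 + 20 + 30 + 40) / 4

print(mean_all)      # 40.0
print(mean_trimmed)  # 25.0
```

One value out of five moves the mean from 25 to 40.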

Median (Middle Value)

Definition: The middle value when data is sorted.

# Example dataset: [10, 20, 30, 40, 100]
# Sorted: [10, 20, 30, 40, 100]
# Middle value (position 3 of 5) = 30

# Python
median = np.median([10, 20, 30, 40, 100])  # 30.0

# For even number of values, take average of middle two
median = np.median([10, 20, 30, 40])  # 25.0 (average of 20 and 30)

Best for: Skewed data or data with outliers
Advantage: Resistant to extreme values (replacing 100 with 1,000 leaves the median at 30)
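
The definition above can be sketched by hand in a few lines. This is an illustrative version for small lists, not a replacement for `np.median`; the function name `median` is chosen here for the example.

```python
def median(values):
    """Return the middle value of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle element exists
        return ordered[mid]
    # Even count: average the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([10, 20, 30, 40, 100]))   # 30
print(median([10, 20, 30, 40]))        # 25.0
print(median([10, 20, 30, 40, 1000]))  # 30
```

The last call shows why the median suits skewed data: making the largest value ten times bigger does not move it.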

Mode (Most Frequent)

Definition: The value that appears most often.

# Example dataset: [1, 2, 2, 3, 4, 2, 5]
# Mode = 2 (appears 3 times)

# Python
from scipy import stats
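mode = stats.mode([1, 2, 2, 3, 4, 2, 5])
print(mode.mode)  # 2 (older SciPy releases return a one-element array instead)

# Can have multiple modes (bimodal, multimodal)
# Example: [1, 1, 2, 2, 3] has modes 1 and 2
# Note: stats.mode reports only one value when there is a tie (the smallest, 1 here)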

Best for: Categorical data or finding most common value
Use case: "What's the most popular product?" or "Most common rating?"
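
For tied modes and for categorical data, the standard library may be the simpler route. The sketch below uses `statistics.multimode` (Python 3.8+) and `collections.Counter`; the `sizes` list is made-up sample data.

```python
from collections import Counter
from statistics import multimode

# Ties: multimode returns every value that shares the highest count
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]

# Categorical data: hypothetical product sizes sold
sizes = ["M", "L", "M", "S", "M", "L"]
counts = Counter(sizes)
top_size, top_count = counts.most_common(1)[0]
print(top_size, top_count)  # M 3

# When every value is unique, every value is a "mode"
print(multimode([35000, 40000, 42000]))  # [35000, 40000, 42000]
```

The last line shows why the mode says little about data with no repeated values.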

When to Use Each One

Scenario	Best Measure	Why
Income data	Median	Billionaires skew the mean
Test scores (normal)	Mean	Symmetrical distribution
House prices	Median	Mansions skew the mean
Survey ratings	Mode	Find most common response
Product sizes sold	Mode	What size is most popular?
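
For numeric data, one rough way to pick between mean and median is to check how far apart they are. The helper below is a hypothetical sketch: the name `suggest_measure` and the `threshold` of 0.2 are assumptions made for illustration, not an established rule.

```python
from statistics import fmean, median, pstdev

def suggest_measure(values, threshold=0.2):
    """Suggest 'mean' or 'median' using a rough skew heuristic.

    The gap between mean and median, scaled by the standard deviation,
    is compared to `threshold` (an arbitrary, illustrative cutoff).
    """
    spread = pstdev(values)
    if spread == 0:
        return "mean"  # all values identical, so every measure agrees
    gap = abs(fmean(values) - median(values)) / spread
    return "median" if gap > threshold else "mean"

print(suggest_measure([10, 20, 30, 40, 100]))  # median
print(suggest_measure([10, 20, 30, 40, 50]))   # mean
```

A large gap between mean and median hints at skew or outliers, which is the situation where the table recommends the median.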

Real-World Example: Salaries

salaries = [35000, 40000, 42000, 45000, 48000, 50000, 52000, 55000, 250000]

# Mean (Average)
mean_salary = np.mean(salaries)  # $68,556
# Misleading! The $250,000 CEO salary pulls the mean above 8 of the 9 salaries

# Median (Middle value)
median_salary = np.median(salaries)  # $48,000
# Better representation of "typical" salary

# Mode (Most common)
# Not useful here - all values unique

All Three Together in Python

import numpy as np
from scipy import stats

data = [10, 20, 20, 30, 40, 50, 100]

mean = np.mean(data)        # 38.57
median = np.median(data)    # 30.0
mode = stats.mode(data).mode  # 20

print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Mode: {mode}")

SQL Examples

-- Calculate mean, median, and mode (PostgreSQL syntax; support for these ordered-set aggregates varies by database)
SELECT
    AVG(salary) as mean_salary,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median_salary,
    MODE() WITHIN GROUP (ORDER BY department) as mode_department
FROM employees;

Excel Formulas

=AVERAGE(A1:A100)   -- Mean
=MEDIAN(A1:A100)    -- Median
=MODE(A1:A100)      -- Mode (single mode; MODE.SNGL in Excel 2010 and later)
=MODE.MULT(A1:A100) -- Multiple modes

Key Takeaways:

Mean: Best for symmetric data, but sensitive to outliers
Median: Best for skewed data, resistant to outliers
Mode: Best for categorical data or finding most common value
Report all three: They tell different stories about your data