Useful Data Tips

Mean, Median, and Mode Explained

The Three Measures of Central Tendency

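Mean, median, and mode are three common ways to measure the "center" of a dataset. Each answers a slightly different question: the mean is the arithmetic average, the median is the middle value, and the mode is the most frequent value.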

Mean (Average)

Definition: Sum of all values divided by the count.

# Example dataset
values = [10, 20, 30, 40, 100]

# Mean calculation: sum of the values divided by how many there are
mean = sum(values) / len(values)  # 200 / 5 = 40.0

# Python
import numpy as np
mean = np.mean([10, 20, 30, 40, 100])  # 40.0

Median (Middle Value)

Definition: The middle value when data is sorted.

# Example dataset: [10, 20, 30, 40, 100]
# Sorted: [10, 20, 30, 40, 100]
# Middle value (position 3 of 5) = 30

# Python
median = np.median([10, 20, 30, 40, 100])  # 30.0

# For even number of values, take average of middle two
median = np.median([10, 20, 30, 40])  # 25.0 (average of 20 and 30)
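
To make the odd/even rule concrete, here is a minimal sketch of the median computed by hand. The helper name `median_by_hand` is hypothetical and only for illustration; in practice `np.median` does the same job.

```python
def median_by_hand(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # the median is defined on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_hand([10, 20, 30, 40, 100]))  # 30
print(median_by_hand([10, 20, 30, 40]))       # 25.0
```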

Mode (Most Frequent)

Definition: The value that appears most often.

# Example dataset: [1, 2, 2, 3, 4, 2, 5]
# Mode = 2 (appears 3 times)

# Python
from scipy import stats
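mode = stats.mode([1, 2, 2, 3, 4, 2, 5])
print(mode.mode)  # 2 (recent SciPy; older versions print an array, [2])

# A dataset can have multiple modes (bimodal, multimodal)
# Example: [1, 1, 2, 2, 3] has modes 1 and 2
# Note: stats.mode reports only one value when there is a tie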
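
If you need every tied value rather than a single one, one option is the standard library's `statistics.multimode` (Python 3.8+). This is a sketch using a different tool from the SciPy call above, not a replacement for it.

```python
from statistics import multimode

# multimode returns a list of all values tied for the highest count,
# in the order they first appear in the data
modes = multimode([1, 1, 2, 2, 3])
print(modes)   # [1, 2]

# With a single clear winner, the list has one element
single = multimode([1, 2, 2, 3, 4, 2, 5])
print(single)  # [2]
```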

When to Use Each One

Scenario             | Best Measure | Why
---------------------|--------------|---------------------------
Income data          | Median       | Billionaires skew the mean
Test scores (normal) | Mean         | Symmetrical distribution
House prices         | Median       | Mansions skew the mean
Survey ratings       | Mode         | Find most common response
Product sizes sold   | Mode         | What size is most popular?
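
The mode is the only one of the three that works on non-numeric categories such as sizes. A minimal sketch with `collections.Counter`, using a made-up sales log (the data and variable names are assumptions for illustration):

```python
from collections import Counter

# Hypothetical sales log: one entry per item sold (made-up data)
sizes_sold = ["M", "L", "M", "S", "M", "XL", "L", "M"]

# Count each size, then take the single most frequent one
counts = Counter(sizes_sold)
most_common_size, times_sold = counts.most_common(1)[0]

print(most_common_size, times_sold)  # M 4
```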

Real-World Example: Salaries

salaries = [35000, 40000, 42000, 45000, 48000, 50000, 52000, 55000, 250000]

# Mean (average)
mean_salary = np.mean(salaries)  # 68555.56, about $68,556
# Misleading: the single 250,000 salary (the CEO's) pulls the mean above what 8 of the 9 people earn

# Median (middle value)
median_salary = np.median(salaries)  # 48000.0
# Better representation of a "typical" salary

# Mode (most common)
# Not useful here: every value appears exactly once
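
One way to see the outlier effect is to drop the largest salary and compare how far each measure moves. This sketch uses the standard library's `statistics` module rather than NumPy so it runs without extra installs; the variable names are illustrative.

```python
from statistics import mean, median

salaries = [35000, 40000, 42000, 45000, 48000, 50000, 52000, 55000, 250000]
without_top = salaries[:-1]  # same list minus the 250,000 salary

# How much does each measure change when the outlier is included?
mean_shift = mean(salaries) - mean(without_top)
median_shift = median(salaries) - median(without_top)

print(f"Mean moves by   {mean_shift:,.0f}")    # Mean moves by   22,681
print(f"Median moves by {median_shift:,.0f}")  # Median moves by 1,500
```

A single value shifts the mean by more than twenty thousand, while the median barely changes.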

All Three Together in Python

import numpy as np
from scipy import stats

data = [10, 20, 20, 30, 40, 50, 100]

mean = np.mean(data)        # 38.57
median = np.median(data)    # 30.0
mode = stats.mode(data).mode  # 20

print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Mode: {mode}")

SQL Examples

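-- Calculate mean, median (exact, interpolated between the two middle rows if needed), and mode
-- Syntax shown works in PostgreSQL; other databases may differ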
SELECT
    AVG(salary) as mean_salary,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median_salary,
    MODE() WITHIN GROUP (ORDER BY department) as mode_department
FROM employees;

Excel Formulas

=AVERAGE(A1:A100)   -- Mean
=MEDIAN(A1:A100)    -- Median
=MODE.SNGL(A1:A100) -- Mode (single mode; MODE is the older name)
=MODE.MULT(A1:A100) -- Multiple modes

Key Takeaways:

  • Mean: Best for symmetric data, but sensitive to outliers
  • Median: Best for skewed data, resistant to outliers
  • Mode: Best for categorical data or finding most common value
  • Report all three: They tell different stories about your data, and a large gap between mean and median is a sign of skew or outliers