How to Choose the Right Chart
The Key Question
Ask first: "What story am I trying to tell?"
- Comparison → Bar chart, column chart
- Trend over time → Line chart
- Relationship → Scatter plot
- Distribution → Histogram, box plot
- Composition → Pie chart, stacked bar
- Location → Map
Chart Selection Decision Tree
1. What are you showing?
├─ Comparison between categories → Bar/Column Chart
├─ Change over time → Line Chart
├─ Relationship between variables → Scatter Plot
├─ Distribution → Histogram/Box Plot
├─ Part of whole → Pie Chart/Treemap
└─ Many variables → Heatmap
2. How many variables?
├─ One → Histogram, bar chart
├─ Two → Scatter, line, bar
└─ Three+ → Bubble chart, faceted plots
3. How much data?
├─ Few (<10 data points) → Bar, pie
├─ Medium (10-100 data points) → Line, scatter
└─ Many (>100 data points) → Heatmap, density plot
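As a rough illustration, steps 1 and 3 of the tree can be written as a small lookup. The names `CHARTS_BY_PURPOSE`, `charts_by_size`, and `suggest_charts` are hypothetical, and the thresholds simply copy the tree; treat this as a sketch of the reasoning, not a rule engine.

```python
# Hypothetical helper that mirrors steps 1 and 3 of the decision tree.
# The names and structure are illustrative, not part of any library.
CHARTS_BY_PURPOSE = {
    "comparison": ["bar chart", "column chart"],
    "change over time": ["line chart"],
    "relationship": ["scatter plot"],
    "distribution": ["histogram", "box plot"],
    "part of whole": ["pie chart", "treemap"],
    "many variables": ["heatmap"],
}


def charts_by_size(n_points):
    """Step 3: charts that suit a given number of data points."""
    if n_points < 10:
        return ["bar chart", "pie chart"]
    if n_points <= 100:
        return ["line chart", "scatter plot"]
    return ["heatmap", "density plot"]


def suggest_charts(purpose, n_points=None):
    """Step 1, optionally refined by step 3.

    Charts that also suit the data size are moved to the front;
    nothing is removed, since the tree is a guide rather than a rule.
    """
    candidates = list(CHARTS_BY_PURPOSE[purpose])
    if n_points is None:
        return candidates
    sized = charts_by_size(n_points)
    # False sorts before True, so size-appropriate charts come first
    return sorted(candidates, key=lambda chart: chart not in sized)


print(suggest_charts("part of whole", n_points=5))
print(suggest_charts("distribution", n_points=500))
```

When no candidate matches the size bucket, the original order is kept, so the purpose of the chart always takes priority over the amount of data.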
Bar Chart / Column Chart
When to use: Compare values across categories
import matplotlib.pyplot as plt
# Horizontal bar chart (suits many categories or long labels)
categories = ['Product A', 'Product B', 'Product C']
values = [450, 380, 290]
plt.barh(categories, values)
plt.xlabel('Sales ($K)')
plt.title('Sales by Product')
# Vertical column chart (few categories, shows height well)
months = ['Jan', 'Feb', 'Mar']
revenue = [100, 150, 130]
plt.figure()  # new figure so the two charts are not drawn on top of each other
plt.bar(months, revenue)
plt.ylabel('Revenue ($K)')
plt.show()
# When to use:
# ✅ Comparing discrete categories
# ✅ Clear rankings (which is biggest?)
# ✅ Precise value comparisons
# ❌ Avoid for time series with many points (use line chart)
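For the "clear rankings" case, sorting before plotting usually makes the largest bar easy to spot. The sketch below uses made-up sales figures and assumes a headless `Agg` backend so it runs without a display; swap in `plt.show()` for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

sales = {"Product A": 380, "Product B": 450, "Product C": 290}  # illustrative values

# barh draws the first item at the bottom, so sort ascending
# to put the largest bar at the top of the chart
ranked = sorted(sales.items(), key=lambda item: item[1])
names = [name for name, _ in ranked]
amounts = [amount for _, amount in ranked]

fig, ax = plt.subplots()
bars = ax.barh(names, amounts)
ax.set_xlabel("Sales ($K)")
ax.set_title("Sales by Product (ranked)")
```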
Line Chart
When to use: Show trends over time
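import pandas as pd
# Time series
dates = pd.date_range('2024-01-01', periods=12, freq='MS')  # first day of each month
revenue = [100, 105, 110, 108, 115, 120, 125, 130, 128, 135, 140, 145]
plt.plot(dates, revenue, marker='o')
plt.xlabel('Date')
plt.ylabel('Revenue ($K)')
plt.title('Revenue Trend')
plt.xticks(rotation=45)
# Multiple lines for comparison
# (revenue_2023 and revenue_2024 are assumed to be lists of 12 monthly values)
months = dates.strftime('%b')  # 'Jan', 'Feb', ... so both years share one axis
plt.figure()
plt.plot(months, revenue_2023, label='2023')
plt.plot(months, revenue_2024, label='2024')
plt.legend()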
# When to use:
# ✅ Time series data
# ✅ Showing trends, patterns
# ✅ Multiple series comparison
# ✅ Continuous data
# ❌ Don't use for categorical comparisons
Scatter Plot
When to use: Show the relationship between two continuous variables
# Correlation between variables
# (df is assumed to be a pandas DataFrame with these columns)
plt.scatter(df['age'], df['income'], alpha=0.5)
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.title('Age vs Income')
# With color encoding a third variable
plt.scatter(df['age'], df['income'], c=df['education_years'],
            cmap='viridis', alpha=0.6)
plt.colorbar(label='Years of Education')
# When to use:
# ✅ Finding correlations
# ✅ Identifying clusters
# ✅ Spotting outliers
# ✅ Two continuous variables
# ❌ Don't use with categorical data
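A scatter plot shows a correlation visually; a single number can back up what the eye sees. The sketch below computes the Pearson coefficient by hand on made-up data. `pearson_r` is a hypothetical helper written for illustration; in practice a library routine would normally be used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross_products / (spread_x * spread_y)


ages = [25, 32, 41, 47, 53]     # made-up sample
incomes = [38, 46, 61, 66, 79]  # in $K, made up
print(round(pearson_r(ages, incomes), 3))
```

A value near +1 or -1 suggests a strong linear relationship, while a value near 0 does not rule out a nonlinear one, which is one reason to look at the scatter plot as well.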
Histogram
When to use: Show the distribution of a single continuous variable
# Distribution
plt.hist(df['age'], bins=20, edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
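# With density curve (normal fit from the sample mean and standard deviation)
import numpy as np
from scipy.stats import norm
mean, std = df['age'].mean(), df['age'].std()
x = np.linspace(df['age'].min(), df['age'].max(), 200)
plt.hist(df['age'], bins=20, density=True, alpha=0.7)
plt.plot(x, norm.pdf(x, mean, std), 'r-', linewidth=2)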
# When to use:
# ✅ Understanding data distribution
# ✅ Finding skewness, outliers
# ✅ Comparing distributions
# ❌ Don't use for categorical data (use bar chart)
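To make the role of `bins` concrete, here is a minimal pure-Python sketch of equal-width binning, which is roughly what a histogram counts before drawing. `histogram_counts` is a hypothetical helper and the ages are made up; plotting libraries handle edge cases more carefully.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are identical; bin width would be zero")
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        # The maximum value would land one past the end, so clamp it into the last bin
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return counts


ages = [22, 25, 27, 31, 34, 35, 38, 41, 45, 52]  # made-up sample
print(histogram_counts(ages, bins=3))
print(histogram_counts(ages, bins=5))
```

The same data can look quite different with 3 bins and with 5, so it is worth trying a few bin counts before settling on one.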
Box Plot
When to use: Compare distributions across categories
# Compare salary distributions by department
df.boxplot(column='salary', by='department')
plt.title('Salary Distribution by Department')
plt.suptitle('') # Remove auto title
# Shows:
# - Median (line in box)
# - Quartiles (box edges)
# - Outliers (points)
# - Range (whiskers)
# When to use:
# ✅ Comparing distributions
# ✅ Identifying outliers
# ✅ Seeing median, quartiles
# ✅ Multiple groups
# ❌ Don't use with small data (<20 points)
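The numbers behind a box plot can be reproduced with the standard library. This sketch uses made-up salaries and the common 1.5 × IQR rule for outliers (matplotlib's default whisker setting); quartile interpolation can differ slightly between libraries, so treat the exact values as illustrative.

```python
import statistics

salaries = [42, 45, 47, 50, 52, 55, 58, 60, 95]  # made-up values in $K

# Quartiles: the box edges (q1, q3) and the line inside the box (median)
q1, median, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1

# Points more than 1.5 * IQR beyond the box are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]

# Whiskers reach the most extreme points that are still inside the fences
inside = [s for s in salaries if lower_fence <= s <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"box: {q1} to {q3}, median {median}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```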
Pie Chart
When to use: Show parts of a whole (use sparingly!)
# Market share
labels = ['Company A', 'Company B', 'Company C', 'Others']
sizes = [35, 25, 20, 20]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
plt.title('Market Share')
# When to use:
# ✅ Simple proportions (2-5 slices max)
# ✅ One variable, parts sum to 100%
# ✅ Emphasizing one large slice
# ❌ Don't use for:
# - Many categories (>5)
# - Precise comparisons (use bar chart)
# - Multiple pies (very hard to compare)
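One way to stay within the 2-5 slice guideline is to merge the smallest categories into a single slice before plotting. The helper `collapse_small_slices` and the market-share numbers below are hypothetical, and the sketch assumes no category is already named "Others".

```python
def collapse_small_slices(shares, max_slices=5, other_label="Others"):
    """Keep the largest slices and merge the remainder into one slice."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= max_slices:
        return dict(ranked)
    # Reserve one slot for the merged slice
    kept = ranked[: max_slices - 1]
    remainder = sum(value for _, value in ranked[max_slices - 1:])
    return dict(kept + [(other_label, remainder)])


market_share = {  # made-up percentages that sum to 100
    "Company A": 35, "Company B": 25, "Company C": 15, "Company D": 10,
    "Company E": 8, "Company F": 4, "Company G": 3,
}
slices = collapse_small_slices(market_share)
print(slices)
```

The result can be passed to `plt.pie(list(slices.values()), labels=list(slices.keys()))`.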
Heatmap
When to use: Show patterns in matrix data
import seaborn as sns
# Correlation matrix
corr = df.corr(numeric_only=True)  # numeric columns only; pandas 2.0+ raises on text columns otherwise
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Matrix')
# Time-based patterns
# (pivot_table averages by default; pass aggfunc='sum' for totals)
pivot = df.pivot_table(values='sales', index='day', columns='hour')
sns.heatmap(pivot, cmap='YlOrRd')
plt.title('Sales by Day and Hour')
# When to use:
# ✅ Matrix data (rows x columns)
# ✅ Finding patterns, clusters
# ✅ Correlation matrices
# ✅ Time-based patterns
# ❌ Don't use for simple comparisons
Area Chart
When to use: Show cumulative totals over time
# Stacked area chart
# (revenue_A, revenue_B, revenue_C are assumed to be series aligned with dates)
plt.stackplot(dates, revenue_A, revenue_B, revenue_C,
              labels=['Product A', 'Product B', 'Product C'])
plt.legend(loc='upper left')
plt.title('Revenue by Product Over Time')
# When to use:
# ✅ Show total and components
# ✅ Multiple categories over time
# ✅ Emphasize magnitude
# ❌ Don't use when:
# - Categories don't stack logically
# - Individual trends matter more than totals (use a line chart)
Bad Chart Choices
❌ 3D Charts
# Almost never use 3D charts
# - Hard to read accurately
# - Distorts values
# - Looks dated
# Exception: True 3D scatter plots for scientific data
❌ Dual-Axis Charts (Usually)
# Be very careful with dual Y-axes
# Can be misleading if scales are manipulated
# Better: Use small multiples or normalize
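The "normalize" option can be as simple as rebasing each series to 100 at its first point, so two measures with different units can share one axis. The helper `index_to_base` and the data are hypothetical.

```python
def index_to_base(series, base=100.0):
    """Rescale a series so its first value equals `base`."""
    first = series[0]
    if first == 0:
        raise ValueError("cannot rebase a series that starts at zero")
    return [base * value / first for value in series]


revenue = [200, 220, 250]    # $K, made up
users = [1000, 1200, 1500]   # count, made up

revenue_index = index_to_base(revenue)
users_index = index_to_base(users)
print(revenue_index)
print(users_index)
```

Both indexed series can then be drawn with ordinary `plt.plot` calls on a single y-axis labelled something like "Index (first period = 100)".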
❌ Too Many Pie Charts
# Don't compare multiple pies
# Very hard to compare slices across pies
# Better: Use grouped bar chart
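A grouped bar chart puts the values that need comparing side by side. The sketch below uses made-up market shares for two years and a headless `Agg` backend; the offset arithmetic is one common way to place the bars, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

companies = ["Company A", "Company B", "Company C", "Others"]
share_by_year = {  # made-up percentages
    "2023": [35, 25, 20, 20],
    "2024": [30, 30, 22, 18],
}

bar_width = 0.8 / len(share_by_year)
fig, ax = plt.subplots()
for i, (year, shares) in enumerate(share_by_year.items()):
    # Shift each year's bars so they sit side by side within a group
    positions = [x + i * bar_width for x in range(len(companies))]
    ax.bar(positions, shares, width=bar_width, label=year)

# Centre the tick labels under each group of bars
group_centres = [x + bar_width * (len(share_by_year) - 1) / 2
                 for x in range(len(companies))]
ax.set_xticks(group_centres)
ax.set_xticklabels(companies)
ax.set_ylabel("Market share (%)")
ax.set_ylim(0, 40)  # bars start at zero
ax.legend(title="Year")
```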
Chart Selection by Goal
| Your Goal | Best Chart Type |
|---|---|
| Compare categories | Bar chart |
| Show trend over time | Line chart |
| Find correlation | Scatter plot |
| Show distribution | Histogram, box plot |
| Show composition | Stacked bar, pie (if simple) |
| Show rankings | Ordered bar chart |
| Show deviation | Bar chart with reference line |
| Show relationship + magnitude | Bubble chart |
| Show geographic data | Choropleth map |
| Show hierarchical data | Treemap, sunburst |
Data Type Guide
| Data Type | Chart Options |
|---|---|
| 1 categorical | Bar chart, pie chart |
| 1 continuous | Histogram, box plot, density plot |
| 1 categorical + 1 continuous | Bar chart, box plot |
| 2 continuous | Scatter plot, line chart |
| Time + continuous | Line chart, area chart |
| 3 continuous | Bubble chart, 3D scatter |
| Many variables | Heatmap, parallel coordinates |
Common Mistakes
Mistake 1: Wrong Chart for Data Type
# WRONG: Line chart for categorical data
categories = ['Red', 'Blue', 'Green']
values = [10, 15, 8]
plt.plot(categories, values) # Implies order/trend that doesn't exist
# RIGHT: Bar chart
plt.bar(categories, values)
Mistake 2: Too Much Information
# WRONG: 20 lines on one chart
# Can't distinguish colors, too busy
# RIGHT: Use small multiples or facets
import seaborn as sns
sns.relplot(data=df, x='year', y='value', col='category', col_wrap=3, kind='line')
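The same small-multiples idea can be sketched with plain matplotlib. The data here is made up, with four categories standing in for the twenty lines, and the `Agg` backend is assumed so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

years = [2020, 2021, 2022, 2023]
series = {  # made-up values, one entry per category
    "North": [10, 12, 15, 14],
    "South": [8, 9, 11, 13],
    "East": [12, 11, 13, 16],
    "West": [7, 10, 9, 12],
}

# One small panel per category; shared axes keep the panels comparable
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
for ax, (name, values) in zip(axes.flat, series.items()):
    ax.plot(years, values, marker="o")
    ax.set_xticks(years)
    ax.set_title(name)
fig.suptitle("Value by year, one panel per category")
```

Sharing the y-axis is a design choice: it makes magnitudes comparable across panels, at the cost of flattening series with a small range.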
Mistake 3: Not Starting Y-Axis at Zero
# For bar charts, always start at zero
# Otherwise, differences look exaggerated
plt.ylim(0, max(values) * 1.1) # Start at 0, with 10% headroom above the tallest bar
# Exception: Line charts can have non-zero baseline if trends matter more
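A quick calculation shows how much a truncated axis can exaggerate a difference between bars. `apparent_ratio` is a hypothetical helper and the values are made up.

```python
def apparent_ratio(smaller, larger, baseline=0.0):
    """How many times taller the larger bar looks when the axis starts at `baseline`."""
    if baseline >= smaller:
        raise ValueError("baseline must be below both values")
    return (larger - baseline) / (smaller - baseline)


true_ratio = apparent_ratio(95, 100)               # axis starts at 0
truncated = apparent_ratio(95, 100, baseline=90)   # axis starts at 90
print(f"from zero: {true_ratio:.2f}x, from 90: {truncated:.2f}x")
```

With the axis starting at 90, a 5% difference is drawn as a bar twice as tall.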
Best Practices
- Simplicity first: Choose the simplest chart that tells the story
- Label everything: Title, axes, legends, units
- Use color purposefully: Not just decoration
- Consider your audience: Technical vs. general
- Test readability: Can someone understand it in 5 seconds?
- Avoid chart junk: 3D effects, unnecessary gridlines
Quick Decision Matrix
Question 1: Time series? → Line chart
Question 2: Comparison? → Bar chart
Question 3: Relationship? → Scatter plot
Question 4: Distribution? → Histogram
Question 5: Part-to-whole? → Stacked bar (preferred over pie)
Question 6: Many variables? → Heatmap
Still unsure? → Start with bar chart (most versatile)
Key Takeaways:
- Start with your story: What do you want to show?
- Match chart to data type: Categorical vs continuous
- Bar charts for comparisons, line charts for trends
- Scatter plots for relationships, histograms for distributions
- Use pie charts sparingly (2-5 slices max)
- When in doubt, start with a bar chart
- Simplicity > complexity