Understanding Confusion Matrices
A confusion matrix visualizes classification model performance by comparing predicted vs actual labels. It's essential for understanding precision, recall, and model errors.
Basic Confusion Matrix Structure
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
TP = True Positives (correctly predicted positive)
FN = False Negatives (missed positives)
FP = False Positives (false alarms)
TN = True Negatives (correctly predicted negative)
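To make the four cells concrete, here is a minimal sketch that counts them by hand for a small pair of label lists (the labels are purely illustrative):

# Count the four confusion-matrix cells by hand
y_true = [1, 0, 1, 1, 0]   # actual labels (illustrative)
y_pred = [1, 0, 0, 1, 1]   # predicted labels (illustrative)

TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(TP, FN, FP, TN)  # 2 1 1 1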
Creating a Confusion Matrix
from sklearn.metrics import confusion_matrix
import numpy as np
# Example: Email spam classifier
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] # Actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1] # Predicted labels
cm = confusion_matrix(y_true, y_pred)
print(cm)
# Output:
# [[3 1]    row 0 = actual Not Spam: TN=3, FP=1
#  [1 5]]   row 1 = actual Spam:     FN=1, TP=5
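Note that scikit-learn sorts the class labels, so with 0/1 classes the first row and column correspond to the negative class, which is flipped relative to the TP-first layout shown earlier. If you prefer the positive class first, the labels argument controls the ordering; a quick sketch:

# Put the positive class (spam = 1) in the first row/column
cm_pos_first = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm_pos_first)
# [[5 1]    TP=5, FN=1
#  [1 3]]   FP=1, TN=3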
Visualizing a Confusion Matrix
import seaborn as sns
import matplotlib.pyplot as plt
# Create heatmap
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Not Spam', 'Spam'],
            yticklabels=['Not Spam', 'Spam'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Spam Classifier Confusion Matrix')
plt.show()
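If you would rather not depend on seaborn, scikit-learn ships its own plotting helper; a minimal sketch with ConfusionMatrixDisplay (available in recent scikit-learn versions):

from sklearn.metrics import ConfusionMatrixDisplay

# Plot the same matrix with scikit-learn's built-in helper
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=['Not Spam', 'Spam'])
disp.plot(cmap='Blues')
plt.title('Spam Classifier Confusion Matrix')
plt.show()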
Key Metrics from a Confusion Matrix
Accuracy
# Overall fraction of correct predictions
# Cell counts from the spam classifier above: TP=5, TN=3, FP=1, FN=1
TP, TN, FP, FN = 5, 3, 1, 1
accuracy = (TP + TN) / (TP + TN + FP + FN)  # = (5 + 3) / (5 + 3 + 1 + 1)
print(f"Accuracy: {accuracy:.2%}") # 80.00%
Precision (Positive Predictive Value)
# Of predictions labeled positive, how many were correct?
precision = TP / (TP + FP)
precision = 5 / (5 + 1)
print(f"Precision: {precision:.2%}") # 83.33%
# High precision = few false alarms
Recall (Sensitivity, True Positive Rate)
# Of actual positives, how many did we catch?
recall = TP / (TP + FN)
recall = 5 / (5 + 1)
print(f"Recall: {recall:.2%}") # 83.33%
# High recall = catch most positives
F1 Score (Harmonic Mean)
# Balance between precision and recall
f1 = 2 * (precision * recall) / (precision + recall)
f1 = 2 * (0.8333 * 0.8333) / (0.8333 + 0.8333)
print(f"F1 Score: {f1:.2%}") # 83.33%
Using Sklearn for All Metrics
from sklearn.metrics import classification_report
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print(classification_report(y_true, y_pred,
                            target_names=['Not Spam', 'Spam']))
# Output:
#               precision    recall  f1-score   support
#
#     Not Spam       0.75      0.75      0.75         4
#         Spam       0.83      0.83      0.83         6
#
#     accuracy                           0.80        10
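If you need these numbers programmatically rather than as printed text, classification_report can also return a dictionary; a short sketch using its output_dict argument:

# Get the report as a nested dict instead of a string
report = classification_report(y_true, y_pred,
                               target_names=['Not Spam', 'Spam'],
                               output_dict=True)
print(report['Spam']['precision'])  # 0.833...
print(report['accuracy'])           # 0.8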
Multi-Class Confusion Matrix
# Example: Image classifier (3 classes)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
# Visualize 3x3 matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=iris.target_names,
            yticklabels=iris.target_names)
plt.title('Iris Classification')
plt.show()
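For multi-class problems, precision and recall are computed per class and then averaged. A quick sketch showing the per-class report for the iris model plus a macro average (the unweighted mean over the three classes):

from sklearn.metrics import classification_report, precision_score

# Per-class precision, recall and F1 for the three iris classes
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# One summary number: the unweighted mean of per-class precision
print(f"Macro precision: {precision_score(y_test, y_pred, average='macro'):.2%}")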
Precision-Recall Tradeoff
# Adjusting the decision threshold (precision_recall_curve works on binary problems)
from sklearn.metrics import precision_recall_curve
# The iris model above has 3 classes, so reduce it to a binary question:
# "is this sample class 2 or not?"
y_test_binary = (y_test == 2).astype(int)
# Predicted probability of class 2
y_scores = model.predict_proba(X_test)[:, 2]
# Calculate precision and recall at every candidate threshold
precision, recall, thresholds = precision_recall_curve(y_test_binary, y_scores)
# Plot
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.show()
# Lower threshold → higher recall, lower precision
# Higher threshold → higher precision, lower recall
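To see the tradeoff in action, you can pick a threshold yourself instead of the default 0.5 and rebuild the confusion matrix; a sketch continuing from the binary setup above (the 0.3 value is just illustrative):

# Apply a custom decision threshold instead of the default 0.5
threshold = 0.3  # illustrative value
y_pred_custom = (y_scores >= threshold).astype(int)

# Recompute the confusion matrix at this threshold
print(confusion_matrix(y_test_binary, y_pred_custom))
# A lower threshold flags more samples as positive:
# recall goes up, precision tends to go down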
When to Optimize for What
| Scenario | Optimize For |
|---|---|
| Medical diagnosis | Recall (don't miss sick patients) |
| Spam filter | Precision (don't block important emails) |
| Fraud detection | Both (F1 score) |
| Recommend products | Precision (only relevant suggestions) |
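When one error type matters more than the other, as in the scenarios above, the F-beta score generalizes F1 by weighting recall against precision (beta > 1 favors recall, beta < 1 favors precision). A quick sketch with scikit-learn's fbeta_score on illustrative labels:

from sklearn.metrics import fbeta_score

# Illustrative labels where recall (0.50) is lower than precision (0.67)
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 1, 0, 0, 1, 0, 0, 0]

# beta > 1 weights recall more; beta < 1 weights precision more
print(f"F2:   {fbeta_score(y_true_demo, y_pred_demo, beta=2):.2%}")    # 52.63%
print(f"F0.5: {fbeta_score(y_true_demo, y_pred_demo, beta=0.5):.2%}")  # 62.50%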
Interpreting Confusion Matrix Errors
False Positives (Type I Error)
# Predicted positive, actually negative
# Example: Flagging legitimate email as spam
# Cost: User misses important email
False Negatives (Type II Error)
# Predicted negative, actually positive
# Example: Missing spam, letting it through
# Cost: User sees spam in inbox
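A practical way to interpret these errors is to pull out the indices of the misclassified examples and inspect them directly. A minimal sketch with NumPy, using the spam labels from earlier (the fp_idx and fn_idx names are just illustrative):

import numpy as np

y_true_arr = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred_arr = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 1])

# Indices of false positives (predicted spam, actually not spam)
fp_idx = np.where((y_pred_arr == 1) & (y_true_arr == 0))[0]
# Indices of false negatives (predicted not spam, actually spam)
fn_idx = np.where((y_pred_arr == 0) & (y_true_arr == 1))[0]

print("False positive indices:", fp_idx)  # [6]
print("False negative indices:", fn_idx)  # [3]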
Complete Example: Customer Churn
from sklearn.metrics import confusion_matrix
# Predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
# Extract values
tn, fp, fn, tp = cm.ravel()
print(f"True Negatives: {tn}")
print(f"False Positives: {fp}")
print(f"False Negatives: {fn}")
print(f"True Positives: {tp}")
# Calculate metrics
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"\nAccuracy: {accuracy:.1%}")
print(f"Precision: {precision:.1%}")
print(f"Recall: {recall:.1%}")
print(f"F1 Score: {f1:.1%}")
Pro Tip: Don't rely on accuracy alone, especially with imbalanced datasets. Use precision, recall, and F1 score to get the full picture. Visualize your confusion matrix to quickly spot where your model is making mistakes!