Understanding Confusion Matrices
A confusion matrix visualizes classification model performance by comparing predicted vs actual labels. It's essential for understanding precision, recall, and model errors.
Basic Confusion Matrix Structure
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
TP = True Positives (correctly predicted positive)
FN = False Negatives (missed positives)
FP = False Positives (false alarms)
TN = True Negatives (correctly predicted negative)
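To make the four cells concrete, here is a minimal sketch that counts them by hand for a small pair of label lists (the labels are purely illustrative):

# Count the four confusion-matrix cells by hand
y_true = [1, 0, 1, 1, 0]   # actual labels (illustrative)
y_pred = [1, 0, 0, 1, 1]   # predicted labels (illustrative)

TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(TP, FN, FP, TN)  # 2 1 1 1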
Creating a Confusion Matrix
from sklearn.metrics import confusion_matrix
import numpy as np
# Example: Email spam classifier
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] # Actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1] # Predicted labels
cm = confusion_matrix(y_true, y_pred)
print(cm)
# Output:
# [[3 1]    row 0 = actual Not Spam: TN=3, FP=1
#  [1 5]]   row 1 = actual Spam:     FN=1, TP=5
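Note that scikit-learn sorts the class labels, so with 0/1 classes the first row and column correspond to the negative class, which is flipped relative to the TP-first layout shown earlier. If you prefer the positive class first, the labels argument controls the ordering; a quick sketch:

# Put the positive class (spam = 1) in the first row/column
cm_pos_first = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm_pos_first)
# [[5 1]    TP=5, FN=1
#  [1 3]]   FP=1, TN=3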
Visualizing a Confusion Matrix
import seaborn as sns
import matplotlib.pyplot as plt
# Create heatmap
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Not Spam', 'Spam'],
            yticklabels=['Not Spam', 'Spam'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Spam Classifier Confusion Matrix')
plt.show()
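If you would rather not depend on seaborn, scikit-learn ships its own plotting helper; a minimal sketch with ConfusionMatrixDisplay (available in recent scikit-learn versions):

from sklearn.metrics import ConfusionMatrixDisplay

# Plot the same matrix with scikit-learn's built-in helper
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=['Not Spam', 'Spam'])
disp.plot(cmap='Blues')
plt.title('Spam Classifier Confusion Matrix')
plt.show()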
Key Metrics from a Confusion Matrix
Accuracy
# Overall fraction of correct predictions
# Cell counts from the spam classifier above: TP=5, TN=3, FP=1, FN=1
TP, TN, FP, FN = 5, 3, 1, 1
accuracy = (TP + TN) / (TP + TN + FP + FN)  # = (5 + 3) / (5 + 3 + 1 + 1)
print(f"Accuracy: {accuracy:.2%}") # 80.00%
Precision (Positive Predictive Value)
# Of predictions labeled positive, how many were correct?
precision = TP / (TP + FP)
precision = 5 / (5 + 1)
print(f"Precision: {precision:.2%}") # 83.33%
# High precision = few false alarms
Recall (Sensitivity, True Positive Rate)
# Of actual positives, how many did we catch?
recall = TP / (TP + FN)
recall = 5 / (5 + 1)
print(f"Recall: {recall:.2%}") # 83.33%
# High recall = catch most positives
F1 Score (Harmonic Mean)
# Balance between precision and recall
f1 = 2 * (precision * recall) / (precision + recall)
f1 = 2 * (0.8333 * 0.8333) / (0.8333 + 0.8333)
print(f"F1 Score: {f1:.2%}") # 83.33%
Using Sklearn for All Metrics
from sklearn.metrics import classification_report
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print(classification_report(y_true, y_pred,
                            target_names=['Not Spam', 'Spam']))
# Output:
#               precision    recall  f1-score   support
#
#     Not Spam       0.75      0.75      0.75         4
#         Spam       0.83      0.83      0.83         6
#
#     accuracy                           0.80        10
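If you need these numbers programmatically rather than as printed text, classification_report can also return a dictionary; a short sketch using its output_dict argument:

# Get the report as a nested dict instead of a string
report = classification_report(y_true, y_pred,
                               target_names=['Not Spam', 'Spam'],
                               output_dict=True)
print(report['Spam']['precision'])  # 0.833...
print(report['accuracy'])           # 0.8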
Multi-Class Confusion Matrix
# Example: Image classifier (3 classes)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
# Visualize 3x3 matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=iris.target_names,
            yticklabels=iris.target_names)
plt.title('Iris Classification')
plt.show()
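For multi-class problems, precision and recall are computed per class and then averaged. A quick sketch showing the per-class report for the iris model plus a macro average (the unweighted mean over the three classes):

from sklearn.metrics import classification_report, precision_score

# Per-class precision, recall and F1 for the three iris classes
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# One summary number: the unweighted mean of per-class precision
print(f"Macro precision: {precision_score(y_test, y_pred, average='macro'):.2%}")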
Precision-Recall Tradeoff
# Adjusting the decision threshold (precision_recall_curve works on binary problems)
from sklearn.metrics import precision_recall_curve
# The iris model above has 3 classes, so reduce it to a binary question:
# "is this sample class 2 or not?"
y_test_binary = (y_test == 2).astype(int)
# Predicted probability of class 2
y_scores = model.predict_proba(X_test)[:, 2]
# Calculate precision and recall at every candidate threshold
precision, recall, thresholds = precision_recall_curve(y_test_binary, y_scores)
# Plot
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.show()
# Lower threshold → higher recall, lower precision
# Higher threshold → higher precision, lower recall
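To see the tradeoff in action, you can pick a threshold yourself instead of the default 0.5 and rebuild the confusion matrix; a sketch continuing from the binary setup above (the 0.3 value is just illustrative):

# Apply a custom decision threshold instead of the default 0.5
threshold = 0.3  # illustrative value
y_pred_custom = (y_scores >= threshold).astype(int)

# Recompute the confusion matrix at this threshold
print(confusion_matrix(y_test_binary, y_pred_custom))
# A lower threshold flags more samples as positive:
# recall goes up, precision tends to go down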
When to Optimize for What
| Scenario | Optimize For |
|---|---|
| Medical diagnosis | Recall (don't miss sick patients) |
| Spam filter | Precision (don't block important emails) |
| Fraud detection | Both (F1 score) |
| Recommend products | Precision (only relevant suggestions) |
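When one error type matters more than the other, as in the scenarios above, the F-beta score generalizes F1 by weighting recall against precision (beta > 1 favors recall, beta < 1 favors precision). A quick sketch with scikit-learn's fbeta_score on illustrative labels:

from sklearn.metrics import fbeta_score

# Illustrative labels where recall (0.50) is lower than precision (0.67)
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 1, 0, 0, 1, 0, 0, 0]

# beta > 1 weights recall more; beta < 1 weights precision more
print(f"F2:   {fbeta_score(y_true_demo, y_pred_demo, beta=2):.2%}")    # 52.63%
print(f"F0.5: {fbeta_score(y_true_demo, y_pred_demo, beta=0.5):.2%}")  # 62.50%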
Interpreting Confusion Matrix Errors
False Positives (Type I Error)
# Predicted positive, actually negative
# Example: Flagging legitimate email as spam
# Cost: User misses important email
False Negatives (Type II Error)
# Predicted negative, actually positive
# Example: Missing spam, letting it through
# Cost: User sees spam in inbox
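A practical way to interpret these errors is to pull out the indices of the misclassified examples and inspect them directly. A minimal sketch with NumPy, using the spam labels from earlier (the fp_idx and fn_idx names are just illustrative):

import numpy as np

y_true_arr = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred_arr = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 1])

# Indices of false positives (predicted spam, actually not spam)
fp_idx = np.where((y_pred_arr == 1) & (y_true_arr == 0))[0]
# Indices of false negatives (predicted not spam, actually spam)
fn_idx = np.where((y_pred_arr == 0) & (y_true_arr == 1))[0]

print("False positive indices:", fp_idx)  # [6]
print("False negative indices:", fn_idx)  # [3]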
Complete Example: Customer Churn
from sklearn.metrics import confusion_matrix
# Predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
# Extract values
tn, fp, fn, tp = cm.ravel()
print(f"True Negatives: {tn}")
print(f"False Positives: {fp}")
print(f"False Negatives: {fn}")
print(f"True Positives: {tp}")
# Calculate metrics
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"\nAccuracy: {accuracy:.1%}")
print(f"Precision: {precision:.1%}")
print(f"Recall: {recall:.1%}")
print(f"F1 Score: {f1:.1%}")
Pro Tip: Don't rely on accuracy alone, especially with imbalanced datasets. Use precision, recall, and F1 score to get the full picture. Visualize your confusion matrix to quickly spot where your model is making mistakes!