Gradient Descent Explained Simply
Gradient descent is the core optimization algorithm in machine learning. It tunes model parameters by iteratively moving downhill on the loss surface toward a minimum of the error.
The Concept: Walking Downhill
# Imagine you're on a mountain in fog
# Goal: Reach the lowest point (valley)
# Strategy: Feel the slope, take step downhill, repeat
1. Start at random position
2. Calculate slope (gradient)
3. Move opposite to slope direction
4. Repeat until you reach the bottom
Same principle applies to minimizing model error!
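To make the analogy concrete, here is a minimal one-variable sketch; the function f(x) = (x - 3)**2, the starting point, and the step count are illustrative choices rather than part of any real model.
# Minimize f(x) = (x - 3)**2, whose lowest point is at x = 3
def slope(x):
    return 2 * (x - 3)  # derivative (slope) of f

x = 10.0                # 1. start at some position
learning_rate = 0.1
for step in range(50):
    x = x - learning_rate * slope(x)  # 2.-3. step opposite to the slope
print(x)                # 4. ends up very close to 3.0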
The Math (Simplified)
# Update rule:
θ = θ - α * ∇J(θ)
Where:
θ = model parameters (weights)
α = learning rate (step size)
∇J(θ) = gradient (slope) of loss function
"Move parameters opposite to gradient direction"
Simple Implementation
import numpy as np
# Simple linear regression with gradient descent
def gradient_descent(X, y, learning_rate=0.01, iterations=1000):
    m = len(y)
    theta = np.zeros(X.shape[1])  # Initialize parameters
    for i in range(iterations):
        # Make predictions
        predictions = X.dot(theta)
        # Calculate error
        errors = predictions - y
        # Calculate gradient
        gradient = (1/m) * X.T.dot(errors)
        # Update parameters
        theta = theta - learning_rate * gradient
        # Calculate loss
        if i % 100 == 0:
            loss = np.mean(errors**2)
            print(f"Iteration {i}: Loss = {loss:.4f}")
    return theta

# Example usage
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4]])  # Features with bias term
y = np.array([2, 4, 6, 8])  # Target values
theta = gradient_descent(X, y)
print(f"Final parameters: {theta}")
Learning Rate: The Step Size
Too Large (α too big)
# Takes huge steps
# May overshoot minimum
# Loss bounces around, doesn't converge
# Can even diverge (get worse)
learning_rate = 1.0 # Often too large
Too Small (α too small)
# Takes tiny steps
# Very slow convergence
# May need millions of iterations
# Can stall on plateaus or in shallow local minima
learning_rate = 0.00001 # Often too small
Just Right
# Steady progress toward minimum
# Converges in reasonable time
# Typical values: 0.001 to 0.1
learning_rate = 0.01 # Good starting point
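One practical way to choose: try a few values spaced by factors of ten and compare the loss they reach. This sketch reuses the gradient_descent function and the X, y data from the implementation above; the specific values tried are illustrative.
# Compare learning rates spaced by factors of 10
for lr in [1.0, 0.1, 0.01, 0.001]:
    theta = gradient_descent(X, y, learning_rate=lr, iterations=500)
    final_loss = np.mean((X.dot(theta) - y) ** 2)
    print(f"lr={lr}: final loss = {final_loss:.6f}")  # lr=1.0 blows up on this data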
Types of Gradient Descent
Batch Gradient Descent
# Uses ALL data for each update
# Accurate but slow for large datasets
for iteration in range(iterations):
    gradient = calculate_gradient(all_data)  # All samples
    theta = theta - learning_rate * gradient
Stochastic Gradient Descent (SGD)
# Uses ONE sample for each update
# Fast but noisy updates
for iteration in range(iterations):
    for sample in shuffle(data):
        gradient = calculate_gradient(sample)  # Single sample
        theta = theta - learning_rate * gradient
Mini-Batch Gradient Descent
# Uses small batch (e.g., 32 samples)
# Balance between speed and stability
# Most commonly used in practice
batch_size = 32
for iteration in range(iterations):
    for batch in get_batches(data, batch_size):
        gradient = calculate_gradient(batch)  # Small batch
        theta = theta - learning_rate * gradient
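Here is a minimal runnable mini-batch version of the earlier linear-regression example; get_batches and the batch size of 2 are illustrative choices for this tiny dataset.
def get_batches(X, y, batch_size):
    # Shuffle the rows, then yield consecutive mini-batches
    indices = np.random.permutation(len(y))
    for start in range(0, len(y), batch_size):
        idx = indices[start:start + batch_size]
        yield X[idx], y[idx]

def minibatch_gradient_descent(X, y, learning_rate=0.01, epochs=500, batch_size=2):
    theta = np.zeros(X.shape[1])
    for epoch in range(epochs):
        for X_batch, y_batch in get_batches(X, y, batch_size):
            errors = X_batch.dot(theta) - y_batch
            gradient = X_batch.T.dot(errors) / len(y_batch)
            theta = theta - learning_rate * gradient
    return theta

print(f"Mini-batch parameters: {minibatch_gradient_descent(X, y)}")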
Monitoring Convergence
import matplotlib.pyplot as plt
# Track loss over iterations (reuses X and y from the example above)
learning_rate = 0.01
iterations = 1000
m = len(y)
theta = np.zeros(X.shape[1])
losses = []
for i in range(iterations):
    predictions = X.dot(theta)
    loss = np.mean((predictions - y)**2)
    losses.append(loss)
    # Update parameters
    gradient = (1/m) * X.T.dot(predictions - y)
    theta = theta - learning_rate * gradient
# Plot learning curve
plt.plot(losses)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('Gradient Descent Convergence')
plt.show()
# Loss should decrease and flatten out
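You can also stop the loop automatically once the loss stops improving; the tolerance below is an arbitrary illustrative threshold.
# Inside the training loop, right after appending the new loss:
tolerance = 1e-8
if len(losses) > 1 and abs(losses[-2] - losses[-1]) < tolerance:
    print(f"Converged after {len(losses)} iterations")
    # break  # stop early once progress stalls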
Common Issues
- Slow convergence: Increase the learning rate (carefully) or add momentum
- Divergence: Lower learning rate significantly
- Local minima: Try multiple random initializations
- Plateaus: Use adaptive learning rates (Adam, RMSprop)
Advanced Variants
# Momentum (faster convergence): velocity accumulates past gradients across updates
velocity = 0.9 * velocity + learning_rate * gradient
theta = theta - velocity
# Adam (adaptive learning rates)
from torch.optim import Adam
optimizer = Adam(model.parameters(), lr=0.001)
# Most popular in deep learning
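For completeness, here is a small sketch of momentum applied to the same linear-regression example; the coefficient 0.9 matches the snippet above, and the other values are illustrative.
def momentum_gradient_descent(X, y, learning_rate=0.01, momentum=0.9, iterations=1000):
    m = len(y)
    theta = np.zeros(X.shape[1])
    velocity = np.zeros(X.shape[1])  # running combination of past gradients
    for i in range(iterations):
        gradient = (1/m) * X.T.dot(X.dot(theta) - y)
        velocity = momentum * velocity + learning_rate * gradient
        theta = theta - velocity     # step using the accumulated velocity
    return theta

print(f"Momentum parameters: {momentum_gradient_descent(X, y)}")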
Pro Tip: Start with mini-batch gradient descent and Adam optimizer—they work well in most cases. Monitor your loss curve: it should steadily decrease. If it bounces around, lower the learning rate!