Regression Analysis Basics
What Regression Does
Goal: Predict a continuous outcome from one or more input variables
Example: Predict house price from size, bedrooms, and location
Reading the Output
Coefficient (slope): Expected change in Y when X increases by 1 unit, holding the other predictors constant
- Size coefficient = 150 → Each extra sqft adds $150 to the predicted price
- Negative coefficient → Y decreases as X increases
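R-squared (R²): Share of the variance in Y that the model explains
- R² = 0.85 → Model explains 85% of variation in Y
- Higher is better, but context matters: R² never decreases when you add predictors, so a high value alone does not prove the model is good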
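To make the slope and R² readings concrete, here is a minimal sketch that fits a one-predictor line by ordinary least squares using only the Python standard library. The `sizes` and `prices` values are made-up illustration data, and the helper names `fit_line` and `r_squared` are hypothetical, not part of any library.

```python
# Hypothetical data: house size in sqft and sale price in dollars.
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200_000, 270_000, 360_000, 420_000, 500_000]

def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

intercept, slope = fit_line(sizes, prices)
r2 = r_squared(sizes, prices, intercept, slope)
print(f"slope = {slope:.1f} dollars per sqft")
print(f"intercept = {intercept:.1f} dollars")
print(f"R^2 = {r2:.3f}")
```

With this toy data the slope works out to 150, matching the size coefficient in the example above. Real analyses would normally use a statistics package rather than hand-rolled formulas.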
Key Assumptions (Check These!)
- Linear relationship - Plot X vs Y to verify
- Independent observations - No repeated measurements of the same subject
- Homoscedasticity - Error variance is constant
- Normally distributed residuals - Matters most for p-values and confidence intervals in small samples
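As a rough numeric companion to the constant-variance check, the sketch below compares residual spread in the lower and upper halves of X. This is an informal heuristic, not a formal test, and the function name `spread_ratio` and the sample residuals are assumptions made for illustration.

```python
import statistics

def spread_ratio(x, residuals):
    """Residual variance in the upper half of x divided by the lower half.

    A ratio far from 1 hints that the error variance changes with x.
    """
    pairs = sorted(zip(x, residuals))      # order residuals by x
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]     # residuals at small x
    high = [r for _, r in pairs[-half:]]   # residuals at large x
    return statistics.pvariance(high) / statistics.pvariance(low)

# Hypothetical residuals whose spread grows with x (a funnel shape).
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [1, -1, 1, -1, 4, -4, 4, -4]

ratio = spread_ratio(x_values, residuals)
print(f"variance ratio (high x / low x) = {ratio:.1f}")
```

A plot of residuals is still the primary check; a single ratio like this can miss patterns that are obvious to the eye.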
Common Mistakes
❌ Extrapolating beyond your data range - Model doesn't know what happens outside training range
❌ Ignoring multicollinearity - Correlated predictors make coefficients unstable
❌ Using R² alone to judge model - Plot residuals to check assumptions
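One quick screen for multicollinearity is the correlation between predictors. The sketch below computes it for two made-up predictors and derives the variance inflation factor (VIF), which in the two-predictor case reduces to 1 / (1 - r²). The data, the helper name `pearson_r`, and the "VIF above about 5 to 10" flag are illustrative assumptions; that threshold is a commonly quoted rule of thumb, not a hard limit.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical predictors: bigger houses tend to have more bedrooms.
size_sqft = [1000, 1500, 2000, 2500, 3000]
bedrooms = [2, 3, 3, 4, 5]

r = pearson_r(size_sqft, bedrooms)
vif = 1 / (1 - r ** 2)   # valid only for exactly two predictors

print(f"correlation = {r:.3f}")
print(f"VIF = {vif:.1f}")
```

With more than two predictors, each VIF comes from regressing that predictor on all the others, so the pairwise shortcut here no longer applies.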
Quick Decision Tree
- Predicting continuous outcome → Linear regression
- Predicting binary outcome (yes/no) → Logistic regression
- Non-linear patterns → Try polynomial terms or other models
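Best practice: Always plot your residuals against the fitted values. A good model shows random scatter around zero with no pattern. A pattern signals a violated assumption: a curve suggests a non-linear relationship, and a funnel shape suggests non-constant error variance.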
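To show what a residual pattern looks like, this sketch fits a straight line to deliberately curved, made-up data (Y = X²) and prints the residuals. The data and variable names are illustrative assumptions; in practice you would draw a scatter plot of residuals instead of printing them.

```python
# Hypothetical curved data: y grows with the square of x.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

# Fit a straight line by ordinary least squares.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Residual = observed value minus fitted value.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    sign = "+" if res > 0 else "-" if res < 0 else "0"
    print(f"x = {x}: residual = {res:6.1f}  {sign}")
```

The residuals are positive at both ends and negative in the middle, a U shape rather than random scatter. That is the kind of pattern that points toward adding a polynomial term or trying another model.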