Multiple Regression
Predicting with multiple variables
📌 Before You Start
What you need: Module 6 completed — especially simple linear regression with lm() and interpretation of R².
What you’ll learn: How to extend regression to multiple predictors. How to interpret each coefficient "holding all other variables constant." The difference between R² and adjusted R². How to check residual plots to validate assumptions.
📖 The Concept: Multiple Regression
Multiple regression extends simple regression to include several predictor variables simultaneously. This lets us:
- Control for confounders — hold other variables constant while studying one variable’s effect
- Improve predictions — multiple predictors usually explain more variance than one alone
- Understand relative importance — which predictors matter most?
Interpreting coefficients: Each β represents the change in y for a one-unit increase in that predictor, holding all other predictors constant. This "holding constant" part is key.
Adjusted R² penalizes for adding predictors that don’t help. It only increases if a new predictor genuinely explains more variance than expected by chance. Always use adjusted R² to compare models with different numbers of predictors.
Residual assumptions: Residuals (actual − predicted) should (1) be randomly scattered around zero, (2) have constant variance (no fanning out as fitted values grow), and (3) be approximately normal — this last condition is what makes inference (p-values, confidence intervals) valid.
🔢 The Formula
The model: y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + ε
Each βᵢ = effect of xᵢ holding all other predictors constant | ε = residual error
Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − k − 1)
n = sample size | k = number of predictors | Penalizes unnecessary variables
💻 In R — Worked Example (read-only)
Predicting exam scores from three variables: study hours, sleep hours, and prior GPA. Each coefficient is interpreted holding the other two constant.
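Since the course data set isn't shown here, the example below uses simulated data; the variable names (study_hours, sleep_hours, prior_gpa, exam_score) follow the description above, and the numeric values are hypothetical.

```r
# Simulated exam data (hypothetical values for illustration)
set.seed(42)
n <- 100
exams <- data.frame(
  study_hours = runif(n, 0, 20),
  sleep_hours = rnorm(n, mean = 7, sd = 1),
  prior_gpa   = runif(n, 2, 4)
)
exams$exam_score <- 40 + 2 * exams$study_hours + 3 * exams$sleep_hours +
  5 * exams$prior_gpa + rnorm(n, sd = 5)

# Fit the multiple regression model: one call, three predictors
model <- lm(exam_score ~ study_hours + sleep_hours + prior_gpa, data = exams)
summary(model)  # coefficients, R-squared, adjusted R-squared
```

In the summary, the study_hours coefficient is the expected change in exam_score for one additional hour of study, holding sleep_hours and prior_gpa constant — and likewise for the other two predictors.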
🖐️ Your Turn
Exercise 1 — Salary Prediction with Multiple Predictors
Build a model predicting salary from years_experience, education_level (1–4), and dept_size. Which predictor has the strongest impact? Interpret each coefficient.
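A starter sketch, using a hypothetical stand-in data frame so it runs on its own — replace the simulated `salaries` with your course data set:

```r
# Hypothetical stand-in data (replace with the exercise data set)
set.seed(7)
n <- 200
salaries <- data.frame(
  years_experience = runif(n, 0, 25),
  education_level  = sample(1:4, n, replace = TRUE),
  dept_size        = sample(5:200, n, replace = TRUE)
)
salaries$salary <- 30000 + 1800 * salaries$years_experience +
  4000 * salaries$education_level + 10 * salaries$dept_size + rnorm(n, sd = 6000)

model_salary <- lm(salary ~ years_experience + education_level + dept_size,
                   data = salaries)
summary(model_salary)
```

Note that raw coefficients are on different scales (years vs. a 1–4 level vs. headcount), so "strongest impact" is not simply the largest coefficient; one common option is to standardize the predictors with scale() before fitting and compare the standardized coefficients.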
Exercise 2 — Model Comparison: Does Adding Variables Help?
Fit three models for exam score data: one predictor, two predictors, all three. Compare R² and adjusted R². Does adding variables always improve the model?
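A sketch of the comparison, again with hypothetical simulated exam data so the block is self-contained:

```r
# Hypothetical exam data (stand-in for the course data set)
set.seed(3)
n <- 120
exams <- data.frame(
  study_hours = runif(n, 0, 20),
  sleep_hours = rnorm(n, mean = 7, sd = 1),
  prior_gpa   = runif(n, 2, 4)
)
exams$exam_score <- 40 + 2 * exams$study_hours + 3 * exams$sleep_hours +
  5 * exams$prior_gpa + rnorm(n, sd = 5)

# Three nested models: one, two, and three predictors
m1 <- lm(exam_score ~ study_hours, data = exams)
m2 <- lm(exam_score ~ study_hours + sleep_hours, data = exams)
m3 <- lm(exam_score ~ study_hours + sleep_hours + prior_gpa, data = exams)

# R-squared never decreases when a predictor is added;
# adjusted R-squared can, which is why it's the fairer comparison.
sapply(list(m1 = m1, m2 = m2, m3 = m3), function(m) {
  s <- summary(m)
  c(r2 = s$r.squared, adj_r2 = s$adj.r.squared)
})
```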
Exercise 3 — Residual Analysis
For the salary model from Exercise 1, plot the residuals vs. fitted values. Random scatter around zero = good. Any pattern = problem (the model is missing something).
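A plotting sketch — it uses a stand-in model fit on R's built-in `cars` data so it runs on its own; in the exercise, substitute your salary model from Exercise 1:

```r
# Stand-in model so this sketch is self-contained;
# replace with the salary model you fit in Exercise 1.
m <- lm(dist ~ speed, data = cars)

plot(fitted(m), resid(m),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. Fitted")
abline(h = 0, lty = 2)  # reference line at zero

# R's built-in diagnostic gives the same view (plot 1 = residuals vs. fitted):
plot(m, which = 1)
```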
🧠 Brain Break
Multiple regression is how real-world statistical analysis works. We almost never study one variable in isolation.
Think about it: If you run a simple regression of salary on education and find a big coefficient, it might look like education “causes” high salaries. But if experienced workers also happen to be more educated, multiple regression lets you separate those effects — holding years_experience constant.
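The confounding story above can be made concrete with a small hypothetical simulation, where education and experience are correlated and both truly affect salary (all numbers invented for illustration):

```r
# Hypothetical simulation: experienced workers are also more educated
set.seed(1)
n <- 500
years_experience <- runif(n, 0, 30)
education <- 12 + 0.15 * years_experience + rnorm(n, sd = 2)
salary <- 20000 + 1500 * years_experience + 2000 * education + rnorm(n, sd = 5000)

coef(lm(salary ~ education))                     # noticeably larger than 2000:
                                                 # it absorbs experience's effect
coef(lm(salary ~ education + years_experience))  # much closer to the true 2000
```

The simple regression credits education with salary gains that really belong to experience; adding years_experience to the model separates the two effects.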
✅ Key Takeaway
Multiple regression controls for confounders by holding other variables constant. Adjusted R² is more honest than R² for comparing models. Always check residual plots — random scatter around zero means your model’s assumptions are met.
🏆 Module 7 Complete!
You now know how to build and interpret multiple regression models — one of the most widely used statistical tools in research and industry. One module left: bring it all together in a full analysis.