
Module 11 Quick Reference: Simple Linear Regression

CORE FORMULAS
Correlation (r)
r = Σ[(x-x̄)(y-ȳ)] / √[Σ(x-x̄)²Σ(y-ȳ)²]
Range: -1 ≤ r ≤ 1; measures LINEAR strength
Regression Equation
ŷ = b₀ + b₁x
ŷ = predicted y; b₀ = intercept; b₁ = slope
Slope
b₁ = r(sᵧ/sₓ) = Σ[(x-x̄)(y-ȳ)] / Σ(x-x̄)²
Intercept
b₀ = ȳ - b₁x̄
Residual
e = y - ŷ
Positive: above line; Negative: below line
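As a rough illustration of the formulas above, the following Python sketch computes r, b₁, b₀, and the residuals directly from the sums of deviations. The function name `fit_line` and the five-point dataset are made up for this example and are not part of the module.

```python
import math

def fit_line(x, y):
    """Return (r, b0, b1) using the sum-of-deviations formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)   # correlation
    b1 = sxy / sxx                   # slope
    b0 = y_bar - b1 * x_bar          # intercept
    return r, b0, b1

# Hypothetical data, chosen only to keep the arithmetic easy to follow
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, b0, b1 = fit_line(x, y)

# Residual e = y - y_hat: positive means the point lies above the line
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"r = {r:.4f}, b0 = {b0:.2f}, b1 = {b1:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```

For this data the slope is 0.6, the intercept is 2.2, and r is about 0.775; the residuals sum to zero up to floating-point rounding.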
Coefficient of Determination
r² = (correlation)²
% of variation in y explained by x
Standard Error
sₑ = √[Σ(y-ŷ)² / (n-2)]
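A minimal sketch of how r² and sₑ could be computed from the residuals, using the same made-up five-point dataset. It relies on the standard identity r² = 1 - SSE/SST, which this sheet does not state explicitly; the variable names are assumptions for illustration.

```python
import math

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# SSE: variation left unexplained by the line; SST: total variation in y
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - sse / sst          # share of variation explained
s_e = math.sqrt(sse / (n - 2))     # standard error, n - 2 degrees of freedom

print(f"r^2 = {r_squared:.2f}, s_e = {s_e:.4f}")
```

Here SSE = 2.4 and SST = 6, so r² = 0.6 and sₑ = √0.8 ≈ 0.894.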
Test for ρ
t = r√[(n-2)/(1-r²)]; df = n-2
Test for β₁
t = b₁/SE(b₁); df = n-2
SE(b₁) = sₑ/√Σ(x-x̄)²
CI for Slope
b₁ ± t*SE(b₁)
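The slope test and the slope interval can be sketched as follows. The dataset is hypothetical, and the critical value t* = 3.182 (95% two-sided, df = 3) is assumed to have been looked up in a t-table rather than computed.

```python
import math

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

se_b1 = s_e / math.sqrt(sxx)   # SE(b1)
t_stat = b1 / se_b1            # test statistic for H0: beta1 = 0, df = n - 2

t_star = 3.182                 # assumed: 95% two-sided critical value for df = 3
ci_low = b1 - t_star * se_b1
ci_high = b1 + t_star * se_b1

print(f"t = {t_stat:.3f}, 95% CI for slope = ({ci_low:.2f}, {ci_high:.2f})")
```

With this data t ≈ 2.121, which is below t* = 3.182, and the interval (about -0.30 to 1.50) contains 0; the two results agree that the slope is not significant at the 5% level.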
CI for Mean Response
ŷ ± t*sₑ√[1/n + (x*-x̄)²/Σ(x-x̄)²]
Prediction Interval
ŷ ± t*sₑ√[1+1/n+(x*-x̄)²/Σ(x-x̄)²]
Note the "1+" for individual variation
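To see the effect of the "1+" term, the sketch below computes both margins at a single x*. The helper name `intervals_at`, the dataset, and the t-table value t* = 3.182 (95%, df = 3) are assumptions made for this example.

```python
import math

def intervals_at(x_star, x, y, t_star):
    """Return (y_hat, ci_margin, pi_margin) at x_star for a fitted line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_star
    # Shared term under the square root: 1/n + (x* - x_bar)^2 / Sxx
    shared = 1 / n + (x_star - x_bar) ** 2 / sxx
    ci_margin = t_star * s_e * math.sqrt(shared)      # mean response
    pi_margin = t_star * s_e * math.sqrt(1 + shared)  # single new observation
    return y_hat, ci_margin, pi_margin

# Hypothetical data; t_star assumed from a t-table (95%, df = 3)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, ci_margin, pi_margin = intervals_at(4, x, y, t_star=3.182)

print(f"y_hat = {y_hat:.2f}")
print(f"CI: +/- {ci_margin:.3f}   PI: +/- {pi_margin:.3f}")
```

At x* = 4 the prediction is 4.6, with a CI margin of about 1.559 and a PI margin of about 3.245, so the prediction interval is roughly twice as wide for this small sample.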
KEY INTERPRETATIONS
Slope (b₁):
"For each 1-unit increase in x, the predicted y increases (b₁ > 0) or decreases (b₁ < 0) by |b₁| units on average."
r² = 0.64:
"64% of the variation in y is explained by the linear relationship with x."
r = -0.85:
"Strong negative linear relationship" (|r| close to 1 = strong)
LINE CONDITIONS (must check before inference)
Condition       Meaning                    How to Check
Linearity       Relationship is linear     Scatter plot linear; residual plot shows no curve
Independence    Observations independent   Study design (random sample, no time series)
Normality       Residuals ≈ normal         Histogram of residuals bell-shaped (less critical if n≥30)
Equal Variance  Constant spread            Residual plot shows constant vertical spread (no fan)
HYPOTHESIS TESTS
For Correlation:
H₀: ρ = 0 (no linear relationship)
Hₐ: ρ ≠ 0 (linear relationship exists)
t = r√[(n-2)/(1-r²)], df = n-2
For Slope:
H₀: β₁ = 0 (no linear relationship)
Hₐ: β₁ ≠ 0 (linear relationship exists)
t = b₁/SE(b₁), df = n-2
Testing ρ=0 ≡ Testing β₁=0
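The equivalence of the two tests can be checked numerically: both formulas give the same t statistic on the same data. This is a small sketch on a made-up dataset, with variable names chosen for illustration.

```python
import math

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Route 1: t from the correlation
r = sxy / math.sqrt(sxx * syy)
t_from_r = r * math.sqrt((n - 2) / (1 - r ** 2))

# Route 2: t from the slope and its standard error
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))
t_from_slope = b1 / (s_e / math.sqrt(sxx))

print(f"t from r = {t_from_r:.4f}, t from slope = {t_from_slope:.4f}")
print("same statistic:", math.isclose(t_from_r, t_from_slope))
```

Both routes give t ≈ 2.1213 with df = n - 2 = 3, so they lead to the same p-value and the same conclusion.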
CONFIDENCE vs PREDICTION INTERVALS
Aspect   Confidence Interval                     Prediction Interval
For      MEAN y for all at x*                    SINGLE y for one at x*
Formula  ŷ ± t*sₑ√[1/n + (x*-x̄)²/Σ(x-x̄)²]       ŷ ± t*sₑ√[1 + 1/n + (x*-x̄)²/Σ(x-x̄)²]
Width    Narrower                                Wider (accounts for individual variation)
Use      Estimating population average           Predicting specific individual
KEY FACTS TO REMEMBER
  • Correlation ≠ Causation! Always remember this
  • r measures LINEAR relationships only
  • Strength = |r|; sign = direction
  • r is unitless; r(x,y) = r(y,x)
  • r² always between 0 and 1 (never negative)
  • Regression line passes through (x̄, ȳ)
  • Sum of residuals always = 0
  • Interpolation (within range) safe; extrapolation risky
  • df for regression: always n - 2
  • PI always wider than CI (same x*, same confidence level)
  • Testing ρ=0 same as testing β₁=0
  • Check LINE before inference!
CORRELATION STRENGTH GUIDE
|r| Value          Strength
|r| = 1.0          Perfect
0.7 ≤ |r| < 1.0    Strong
0.3 ≤ |r| < 0.7    Moderate
0 < |r| < 0.3      Weak
|r| = 0            No linear relationship
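The cutoffs in this guide can be written as a small lookup function. The name `classify_strength` is hypothetical, and the thresholds are simply this sheet's rule of thumb, not a universal standard.

```python
def classify_strength(r):
    """Label a correlation using the cutoffs from the strength guide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    size = abs(r)  # strength depends on |r|; the sign only gives direction
    if size == 1.0:
        return "Perfect"
    if size >= 0.7:
        return "Strong"
    if size >= 0.3:
        return "Moderate"
    if size > 0:
        return "Weak"
    return "No linear relationship"

for value in (-0.85, 0.45, 0.1, 0.0):
    print(value, "->", classify_strength(value))
```

Note that r = -0.85 is labeled "Strong": a negative sign indicates direction, not weakness.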
COMMON MISTAKES TO AVOID
  • Confusing r and r² (r² = r × r)
  • Thinking negative r = weak (|r| = strength!)
  • Claiming causation from correlation
  • Extrapolating beyond data range
  • Forgetting "1+" in prediction interval
  • Using wrong df (should be n-2)
  • Confusing CI (for mean) with PI (for individual)
  • Not checking LINE conditions