Module 11 Quick Reference - Simple Linear Regression

CORE FORMULAS

Correlation (r)

r = Σ[(x-x̄)(y-ȳ)] / √[Σ(x-x̄)²Σ(y-ȳ)²]

Range: -1 ≤ r ≤ 1; measures LINEAR strength
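As a minimal Python sketch, r can be computed directly from the definition above. The data set x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5] is made up for illustration and is not from the module, and the function name `correlation` is hypothetical.

```python
import math


def correlation(x, y):
    """Pearson correlation r, computed term by term from the definition."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # Σ(x-x̄)(y-ȳ)
    sxx = sum((xi - x_bar) ** 2 for xi in x)                         # Σ(x-x̄)²
    syy = sum((yi - y_bar) ** 2 for yi in y)                         # Σ(y-ȳ)²
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(f"r = {r:.4f}")
```

Here Σ(x-x̄)(y-ȳ) = 6, Σ(x-x̄)² = 10 and Σ(y-ȳ)² = 6, so r = 6/√60 ≈ 0.775.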

Regression Equation

ŷ = b₀ + b₁x

ŷ = predicted y; b₀ = intercept; b₁ = slope

Slope

b₁ = r(sᵧ/sₓ) = Σ[(x-x̄)(y-ȳ)] / Σ(x-x̄)²

Intercept

b₀ = ȳ - b₁x̄
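A short sketch of the slope and intercept formulas, using the same made-up data as an assumption. It also cross-checks that the two slope formulas, Σ(x-x̄)(y-ȳ)/Σ(x-x̄)² and r(sᵧ/sₓ), agree. The helper names `fit_line` and `slope_from_r` are hypothetical.

```python
import math
import statistics


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 from the sums-of-squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # b1 = Σ(x-x̄)(y-ȳ) / Σ(x-x̄)²
    b0 = y_bar - b1 * x_bar    # b0 = ȳ - b1·x̄
    return b0, b1


def slope_from_r(x, y):
    """Alternative slope formula b1 = r * (s_y / s_x), using sample standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * statistics.stdev(y) / statistics.stdev(x)


# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(f"y-hat = {b0:.1f} + {b1:.1f}x")  # y-hat = 2.2 + 0.6x
```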

Residual

e = y - ŷ

Positive residual: point lies above the line; negative residual: point lies below the line
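A small sketch of residuals and their signs, assuming the made-up data x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5], whose least-squares line is ŷ = 2.2 + 0.6x:

```python
# Made-up illustrative data with its least-squares fit y-hat = 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

# e = y - y-hat for each observation
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
for xi, e in zip(x, residuals):
    if e > 0:
        where = "above"
    elif e < 0:
        where = "below"
    else:
        where = "on"
    print(f"x = {xi}: e = {e:+.1f} (point is {where} the line)")
```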

Coefficient of Determination

r² = (correlation)²

Proportion (as a %) of the variation in y explained by the linear relationship with x
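A sketch checking the "% of variation explained" reading on the made-up example data. It relies on the standard least-squares identity r² = 1 - Σ(y-ŷ)²/Σ(y-ȳ)², which this reference does not state explicitly. The sketch compares that share with the square of r.

```python
import math

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2
explained_share = 1 - sse / syy  # share of the variation in y explained by the line
print(f"r^2 = {r_squared:.2f} ({explained_share:.0%} of the variation in y explained)")
```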

Standard Error

sₑ = √[Σ(y-ŷ)² / (n-2)]
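A minimal sketch of sₑ, assuming the made-up data and its fit ŷ = 2.2 + 0.6x. The function name is hypothetical. Note the divisor n - 2, not n.

```python
import math


def residual_standard_error(x, y, b0, b1):
    """s_e = sqrt(Σ(y - y-hat)² / (n - 2))."""
    n = len(x)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))  # two parameters (b0, b1) were estimated


s_e = residual_standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], b0=2.2, b1=0.6)
print(f"s_e = {s_e:.4f}")
```

For this data Σ(y-ŷ)² = 2.4, so sₑ = √(2.4/3) = √0.8 ≈ 0.894.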

Test for ρ

t = r√[(n-2)/(1-r²)]; df = n-2
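A sketch of the test statistic for ρ, assuming the made-up example where r = 6/√60 ≈ 0.775 and n = 5. The function name is hypothetical. Comparing t to a critical value or computing a p-value still requires a t table or a statistics library.

```python
import math


def t_for_correlation(r, n):
    """t = r * sqrt((n - 2) / (1 - r²)), with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2)), df


r = 6 / math.sqrt(60)  # correlation of the made-up example data
t, df = t_for_correlation(r, n=5)
print(f"t = {t:.3f}, df = {df}")
```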

Test for β₁

t = b₁/SE(b₁); df = n-2

SE(b₁) = sₑ/√Σ(x-x̄)²
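A sketch of the slope test on the same made-up data. The function name `slope_t_test` is hypothetical. For this data, t = b₁/SE(b₁) comes out equal to r√[(n-2)/(1-r²)], which illustrates why testing ρ = 0 and testing β₁ = 0 are equivalent.

```python
import math


def slope_t_test(x, y):
    """Return b1, SE(b1), t = b1 / SE(b1) and df = n - 2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    se_b1 = s_e / math.sqrt(sxx)  # SE(b1) = s_e / sqrt(Σ(x-x̄)²)
    return b1, se_b1, b1 / se_b1, n - 2


b1, se_b1, t, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b1 = {b1:.2f}, SE(b1) = {se_b1:.4f}, t = {t:.3f}, df = {df}")
```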

CI for Slope

b₁ ± t*·SE(b₁), where t* is the critical value from the t distribution with df = n-2
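A sketch of the slope interval, assuming the made-up example values b₁ = 0.6 and SE(b₁) = √0.08 ≈ 0.283 with n = 5. The critical value t* = 3.182 (two-sided 95%, df = 3) is read from a standard t table and passed in, because Python's standard library has no t quantile function. The function name is hypothetical.

```python
import math


def slope_ci(b1, se_b1, t_star):
    """Confidence interval b1 ± t*·SE(b1)."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin


# Made-up example: b1 = 0.6, SE(b1) = sqrt(0.08), df = 3
# t* = 3.182 is the two-sided 95% critical value for df = 3 (from a t table)
low, high = slope_ci(0.6, math.sqrt(0.08), t_star=3.182)
print(f"95% CI for the slope: ({low:.2f}, {high:.2f})")
```

This interval contains 0, which is consistent with |t| ≈ 2.12 falling below 3.182 for the same data.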

CI for Mean Response

ŷ ± t*·sₑ√[1/n + (x*-x̄)²/Σ(x-x̄)²]
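A sketch of the confidence interval for the mean response at x* = 4. It uses the made-up example data and an assumed table value t* = 3.182 (95%, df = 3). The function name is hypothetical.

```python
import math


def mean_response_ci(x, y, x_star, t_star):
    """CI for the mean of y at x = x_star: y-hat ± t*·s_e·sqrt(1/n + (x*-x̄)²/Σ(x-x̄)²)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    s_e = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = b0 + b1 * x_star
    margin = t_star * s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin


# Made-up data; t* = 3.182 is the 95% critical value for df = 3
low, high = mean_response_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x_star=4, t_star=3.182)
print(f"95% CI for mean y at x* = 4: ({low:.2f}, {high:.2f})")
```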

Prediction Interval

ŷ ± t*·sₑ√[1 + 1/n + (x*-x̄)²/Σ(x-x̄)²]

Note the extra "1 +" under the root: it accounts for the variation of an individual y around the mean response
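A sketch comparing both intervals at x* = 4 on the made-up data, with the assumed table value t* = 3.182 (95%, df = 3). The only difference is the extra 1 under the square root. The function name and the `individual` flag are hypothetical.

```python
import math


def interval_at(x, y, x_star, t_star, individual):
    """individual=False: CI for the mean response; individual=True: prediction interval."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    s_e = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = b0 + b1 * x_star
    extra = 1.0 if individual else 0.0  # the "1 +" term for a single new observation
    margin = t_star * s_e * math.sqrt(extra + 1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin


x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
ci = interval_at(x, y, x_star=4, t_star=3.182, individual=False)
pi = interval_at(x, y, x_star=4, t_star=3.182, individual=True)
print(f"CI: ({ci[0]:.2f}, {ci[1]:.2f})  PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

Both intervals share the same center ŷ = 4.6. The prediction interval is wider because it must also cover the scatter of one individual y around that mean.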

KEY INTERPRETATIONS

Slope (b₁):
"For each 1-unit increase in x, y increases/decreases by |b₁| units on average."

r² = 0.64:
"64% of the variation in y is explained by the linear relationship with x."

r = -0.85:
"Strong negative linear relationship" (|r| close to 1 = strong)

LINE CONDITIONS (must check before inference)

Condition	Meaning	How to Check
Linearity	Relationship is linear	Scatter plot linear; residual plot shows no curve
Independence	Observations independent	Study design (random sample, no time series)
Normality	Residuals ≈ normal	Histogram of residuals bell-shaped (less critical if n≥30)
Equal Variance	Constant spread	Residual plot shows constant vertical spread (no fan)

HYPOTHESIS TESTS

For Correlation:
H₀: ρ = 0 (no linear relationship)
Hₐ: ρ ≠ 0 (a linear relationship exists)
t = r√[(n-2)/(1-r²)], df = n-2

For Slope:
H₀: β₁ = 0 (no linear relationship)
Hₐ: β₁ ≠ 0 (a linear relationship exists)
t = b₁/SE(b₁), df = n-2
Testing ρ=0 ≡ Testing β₁=0

CONFIDENCE vs PREDICTION INTERVALS

Aspect	Confidence Interval	Prediction Interval
Estimates	MEAN y of all individuals with x = x*	SINGLE y for one individual with x = x*
Formula	ŷ ± t*sₑ√[1/n + (x*-x̄)²/Σ(x-x̄)²]	ŷ ± t*sₑ√[1 + 1/n + (x*-x̄)²/Σ(x-x̄)²]
Width	Narrower	Wider (accounts for individual variation)
Use	Estimating population average	Predicting specific individual

KEY FACTS TO REMEMBER

Correlation ≠ causation!
r measures LINEAR relationships only
Strength = |r|; sign = direction
r is unitless; r(x,y) = r(y,x)
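r² always between 0 and 1 (never negative)
The least-squares regression line always passes through (x̄, ȳ)
Sum of residuals from the least-squares line always = 0
Interpolation (within the range of the x data) is relatively safe; extrapolation is risky
df for simple linear regression t procedures: always n - 2
PI always wider than CI (at the same x* and confidence level)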
Testing ρ=0 same as testing β₁=0
Check LINE before inference!
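Several of the facts above can be checked numerically. This sketch uses the made-up data x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5] and hypothetical helper names. It confirms that the fitted line passes through (x̄, ȳ), that the residuals sum to 0, that r(x,y) = r(y,x), and that rescaling x (a change of units) leaves r unchanged.

```python
import math


def least_squares(x, y):
    """Least-squares intercept b0 and slope b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def correlation(x, y):
    """Pearson correlation r from the definition."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
residual_sum = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))
r = correlation(x, y)
x_rescaled = [100 * xi for xi in x]  # e.g. metres -> centimetres

print(math.isclose(b0 + b1 * x_bar, y_bar))    # True: line passes through (x̄, ȳ)
print(abs(residual_sum) < 1e-9)                # True: residuals sum to 0
print(math.isclose(r, correlation(y, x)))      # True: r(x,y) = r(y,x)
print(math.isclose(r, correlation(x_rescaled, y)))  # True: r is unitless
```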

CORRELATION STRENGTH GUIDE

|r| Value	Strength
|r| = 1.0	Perfect
0.7 ≤ |r| < 1.0	Strong
0.3 ≤ |r| < 0.7	Moderate
0 < |r| < 0.3	Weak
|r| = 0	No linear relationship
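A tiny sketch that encodes this guide's cutoffs. The function name is hypothetical, and the cutoffs are only this module's rule of thumb. Only |r| sets the label; the sign of r gives the direction separately.

```python
def correlation_strength(r):
    """Label the strength of a correlation using this guide's |r| cutoffs."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength depends on |r| only, never on the sign
    if size == 1:
        return "Perfect"
    if size >= 0.7:
        return "Strong"
    if size >= 0.3:
        return "Moderate"
    if size > 0:
        return "Weak"
    return "No linear relationship"


r = -0.85
direction = "negative" if r < 0 else "positive"
print(f"r = {r}: {correlation_strength(r)} {direction} linear relationship")
```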

COMMON MISTAKES TO AVOID

Confusing r and r² (r² = r × r)
Thinking negative r = weak (|r| = strength!)
Claiming causation from correlation
Extrapolating beyond data range
Forgetting "1+" in prediction interval
Using wrong df (should be n-2)
Confusing CI (for mean) with PI (for individual)
Not checking LINE conditions

Learn Without Walls
