Lesson 4: Confidence and Prediction Intervals
Estimating parameters and making predictions with intervals
Learning Objectives
By the end of this lesson, you will be able to:
- Construct and interpret confidence intervals for the slope (β₁)
- Construct and interpret confidence intervals for mean response
- Construct and interpret prediction intervals for individual responses
- Understand the difference between confidence and prediction intervals
- Explain why prediction intervals are wider than confidence intervals
- Use intervals to make informed decisions
1. Confidence Interval for Slope (β₁)
In Lesson 3, we tested whether the slope equals zero (H₀: β₁ = 0). But what if we want to estimate the actual value of the population slope? We use a confidence interval!
Confidence Interval for Slope
b₁ ± t* × SE(b₁)
Where:
- b₁ = sample slope
- t* = critical value from t-distribution with df = n - 2
- SE(b₁) = sₑ / √[Σ(x - x̄)²] (standard error of slope)
Interpretation Template:
"We are [C]% confident that the true slope (β₁) is between [lower bound] and [upper bound]."
More specifically: "We are [C]% confident that for each 1-[unit of x] increase in [x variable], [y variable] changes by between [lower] and [upper] [units of y] on average."
Complete Example: CI for Slope
Scenario: Predicting house price (in $1000s) from square footage
Data:
- n = 25 houses
- Regression equation: ŷ = 50 + 0.12x
- b₁ = 0.12 (thousand dollars per square foot)
- SE(b₁) = 0.035
- df = 25 - 2 = 23
Construct a 95% confidence interval for β₁:
Step 1: Find Critical Value
For 95% CI with df = 23: t* = 2.069 (from t-table)
Step 2: Calculate Margin of Error
ME = t* × SE(b₁) = 2.069 × 0.035 = 0.072
Step 3: Construct Interval
Lower bound: 0.12 - 0.072 = 0.048
Upper bound: 0.12 + 0.072 = 0.192
95% CI: (0.048, 0.192)
Interpretation
Statistical: We are 95% confident that the true slope (β₁) is between 0.048 and 0.192 thousand dollars per square foot.
In Context: We are 95% confident that for each additional square foot of house size, the price increases by between $48 and $192 on average.
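The three steps above can be reproduced in a few lines of Python. This is a minimal sketch using only the numbers given in the example; the function name `slope_ci` is a hypothetical helper, and t* is typed in from the t-table rather than computed.

```python
# Values from the worked example; t* is read from a t-table (df = 23, 95%).
b1 = 0.12        # sample slope, in $1000s per square foot
se_b1 = 0.035    # standard error of the slope
t_star = 2.069   # critical value for 95% confidence, df = 23

def slope_ci(b1, se_b1, t_star):
    """Return (lower, upper) for b1 ± t* × SE(b1)."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin

lower, upper = slope_ci(b1, se_b1, t_star)
print(f"95% CI for slope: ({lower:.3f}, {upper:.3f}) thousand dollars per sq ft")
print(f"In dollars: ${lower * 1000:.0f} to ${upper * 1000:.0f} per sq ft")
```

Multiplying the bounds by 1000 converts them from thousands of dollars to dollars, which is how the in-context interpretation arrives at $48 and $192.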
Connection to Hypothesis Testing
Key insight: If a C% confidence interval for β₁ does NOT contain 0, then we would reject H₀: β₁ = 0 in a two-sided test at significance level α = 1 - C. For example, a 95% interval corresponds to α = 0.05.
In our example, the interval (0.048, 0.192) does NOT contain 0, so we conclude there IS a significant relationship at α = 0.05.
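The interval-based decision rule can be written as a one-line check. This is an illustrative sketch: `rejects_zero_slope` is a hypothetical helper, and the second interval is made up to show the opposite outcome.

```python
def rejects_zero_slope(lower, upper):
    """True when 0 lies outside the confidence interval (lower, upper)."""
    return not (lower <= 0 <= upper)

print(rejects_zero_slope(0.048, 0.192))   # interval from the example -> True
print(rejects_zero_slope(-0.010, 0.250))  # hypothetical interval containing 0 -> False
```

The check mirrors a two-sided test only when the interval's confidence level and the test's α add up to 100%.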
2. Confidence Interval for Mean Response
Suppose we want to estimate the average y value for a specific x value. For example, what's the average price of ALL 2000-square-foot houses?
We use a confidence interval for mean response.
Confidence Interval for Mean Response
Purpose: Estimate the average y value for all individuals with a specific x value (x*).
ŷ ± t* × sₑ√[1/n + (x* - x̄)²/Σ(x - x̄)²]
Where:
- ŷ = predicted value at x = x*
- t* = critical value with df = n - 2
- sₑ = standard error of estimate
- x* = specific x value of interest
- x̄ = mean of x values in data
Interpretation: "We are [C]% confident that the average [y variable] for [x variable] = [x*] is between [lower] and [upper]."
Example: CI for Mean Response
Question: What is the average price of all 2000-square-foot houses?
Given:
- Regression equation: ŷ = 50 + 0.12x
- sₑ = 15 (thousand dollars)
- n = 25
- x̄ = 1800 square feet
- Σ(x - x̄)² = 500,000
- x* = 2000 square feet
- t* = 2.069 (95% CI, df = 23)
Step 1: Calculate Predicted Value
ŷ = 50 + 0.12(2000) = 50 + 240 = 290 thousand dollars
Step 2: Calculate Standard Error
SE(ŷmean) = 15√[1/25 + (2000-1800)²/500,000]
= 15√[0.04 + 40,000/500,000]
= 15√[0.04 + 0.08]
= 15√0.12
= 15 × 0.346 = 5.19
Step 3: Construct Interval
ME = 2.069 × 5.19 = 10.74
Lower: 290 - 10.74 = 279.26
Upper: 290 + 10.74 = 300.74
95% CI: (279.26, 300.74) thousand dollars
Interpretation
We are 95% confident that the average price of all 2000-square-foot houses is between $279,260 and $300,740.
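The same calculation can be sketched in Python. The function name `mean_response_ci` and the parameter name `sxx` (standing for Σ(x - x̄)²) are assumptions made for this illustration; all input values come from the example.

```python
import math

def mean_response_ci(b0, b1, s_e, n, x_bar, sxx, x_star, t_star):
    """CI for the mean response at x_star.

    sxx stands for the sum of squared deviations, Σ(x - x̄)².
    Returns (y_hat, standard error, lower bound, upper bound).
    """
    y_hat = b0 + b1 * x_star
    se_mean = s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    margin = t_star * se_mean
    return y_hat, se_mean, y_hat - margin, y_hat + margin

y_hat, se_mean, lower, upper = mean_response_ci(
    b0=50, b1=0.12, s_e=15, n=25, x_bar=1800, sxx=500_000,
    x_star=2000, t_star=2.069,
)
print(f"y_hat = {y_hat:.2f}, SE = {se_mean:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) thousand dollars")
```

Because the code carries full precision through every step, it gives roughly (279.25, 300.75). The hand calculation rounds √0.12 to 0.346 before multiplying, which accounts for the one-cent difference in each bound.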
Width of CI for Mean Response
Notice in the formula that the standard error includes the term (x* - x̄)². This means:
- Narrowest interval: When x* = x̄ (at the mean of x)
- Wider intervals: As x* moves away from x̄ in either direction
- We have the most precision when predicting at the center of our data
Figure: Confidence band for mean response. The band widens as x* moves away from x̄.
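To see the widening numerically, the sketch below evaluates the standard error for the mean response at several x* values, using sₑ, n, x̄, and Σ(x - x̄)² from the example. The x* values are chosen only for illustration (the lesson does not state the range of the data), and `se_mean_response` is a hypothetical helper name.

```python
import math

def se_mean_response(x_star, s_e=15, n=25, x_bar=1800, sxx=500_000):
    """Standard error of the estimated mean response at x_star."""
    return s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)

# Illustrative x* values placed symmetrically around x_bar = 1800
for x_star in (1400, 1600, 1800, 2000, 2200):
    print(f"x* = {x_star}: SE = {se_mean_response(x_star):.2f}")
```

The standard error is smallest at x* = 1800 and is the same at equal distances on either side of x̄, because the distance enters the formula only through its square.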
3. Prediction Interval for Individual Response
What if we want to predict the price of a single, specific 2000-square-foot house (not the average of all such houses)?
We need a prediction interval, which accounts for individual variation.
Prediction Interval for Individual Response
Purpose: Predict the y value for a single individual with a specific x value (x*).
ŷ ± t* × sₑ√[1 + 1/n + (x* - x̄)²/Σ(x - x̄)²]
Where:
- Note the "1 +" at the beginning! This accounts for individual variation.
- All other terms are the same as for mean response
Interpretation: "We are [C]% confident that a single [individual/case] with [x variable] = [x*] will have a [y variable] between [lower] and [upper]."
Example: Prediction Interval
Question: What is the predicted price of a single, specific 2000-square-foot house?
Using the same data as before:
Step 1: Predicted Value (Same)
ŷ = 290 thousand dollars (from before)
Step 2: Calculate Standard Error
SE(ŷind) = 15√[1 + 1/25 + (2000-1800)²/500,000]
= 15√[1 + 0.04 + 0.08]
= 15√1.12
= 15 × 1.058 = 15.87
Notice: This is much larger than SE(ŷmean) = 5.19!
Step 3: Construct Interval
ME = 2.069 × 15.87 = 32.83
Lower: 290 - 32.83 = 257.17
Upper: 290 + 32.83 = 322.83
95% Prediction Interval: (257.17, 322.83) thousand dollars
Interpretation
We are 95% confident that a single 2000-square-foot house will be priced between $257,170 and $322,830.
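A matching Python sketch for the prediction interval follows. The only change from the mean-response formula is the extra 1 under the square root. The function name `prediction_interval` and the parameter name `sxx` (for Σ(x - x̄)²) are assumptions for this illustration.

```python
import math

def prediction_interval(b0, b1, s_e, n, x_bar, sxx, x_star, t_star):
    """Prediction interval for one individual response at x_star.

    sxx stands for the sum of squared deviations, Σ(x - x̄)².
    Returns (y_hat, standard error, lower bound, upper bound).
    """
    y_hat = b0 + b1 * x_star
    # The leading 1 adds the variation of individuals around the mean.
    se_ind = s_e * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    margin = t_star * se_ind
    return y_hat, se_ind, y_hat - margin, y_hat + margin

y_hat, se_ind, lower, upper = prediction_interval(
    b0=50, b1=0.12, s_e=15, n=25, x_bar=1800, sxx=500_000,
    x_star=2000, t_star=2.069,
)
print(f"y_hat = {y_hat:.2f}, SE = {se_ind:.2f}")
print(f"95% PI: ({lower:.2f}, {upper:.2f}) thousand dollars")
```

At full precision the bounds come out near (257.16, 322.84). The hand calculation differs by about a cent because it rounds √1.12 to 1.058 along the way.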
4. Confidence vs Prediction Intervals: Key Differences
Comparing the Two Types of Intervals
| Aspect | Confidence Interval for Mean Response | Prediction Interval for Individual |
|---|---|---|
| What we're estimating | Average y for ALL individuals at x* | y value for ONE individual at x* |
| Standard error formula | sₑ√[1/n + (x* - x̄)²/Σ(x - x̄)²] | sₑ√[1 + 1/n + (x* - x̄)²/Σ(x - x̄)²] |
| Width | Narrower (less uncertainty) | Wider (more uncertainty) |
| Why? | Averages are more predictable | Individuals vary more than averages |
| Example interpretation | "Average price of all 2000-sq-ft houses" | "Price of this specific 2000-sq-ft house" |
| Use when... | Planning, policy decisions, general trends | Predicting for a specific case |
Figure: Confidence band vs. prediction band. The prediction band (red) is always wider than the confidence band (blue).
Why is the Prediction Interval Wider?
The prediction interval must account for two sources of uncertainty:
- Uncertainty in estimating the mean (same as confidence interval)
- Individual variation around the mean (the "1 +" in the formula)
Analogy: Predicting the average height of all adult males is easier (narrower interval) than predicting the height of one specific adult male (wider interval). Individuals vary!
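Mathematical note: As n → ∞, the standard error for the mean response → 0, so the confidence interval shrinks toward zero width (perfect knowledge of the mean). The standard error for an individual response approaches sₑ, so the prediction interval's margin of error approaches t* × sₑ rather than 0 (there is always some individual variation).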
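The limiting behavior of the two standard errors can be checked numerically. The sketch below evaluates both at x* = x̄, where the (x* - x̄)² term vanishes, and holds sₑ fixed at 15 as a simplifying assumption (in practice sₑ would itself change from sample to sample). The helper name `standard_errors_at_mean` is hypothetical.

```python
import math

S_E = 15  # standard error of estimate from the example; held fixed as an assumption

def standard_errors_at_mean(n, s_e=S_E):
    """Return (SE for mean response, SE for individual response) at x* = x̄."""
    se_mean = s_e * math.sqrt(1 / n)
    se_ind = s_e * math.sqrt(1 + 1 / n)
    return se_mean, se_ind

for n in (25, 100, 1_000, 10_000):
    se_mean, se_ind = standard_errors_at_mean(n)
    print(f"n = {n:>6}: SE(mean) = {se_mean:.3f}, SE(individual) = {se_ind:.3f}")
```

As n grows, the first column heads toward 0 while the second levels off just above 15, the value of sₑ.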
Side-by-Side Comparison: Same Data
For a 2000-square-foot house (x* = 2000):
| Interval Type | Point Estimate | Standard Error ($1000s) | Margin of Error ($1000s) | 95% Interval |
|---|---|---|---|---|
| Confidence (Mean) | $290,000 | 5.19 | 10.74 | ($279,260, $300,740) |
| Prediction (Individual) | $290,000 | 15.87 | 32.83 | ($257,170, $322,830) |
Notice:
- Same point estimate ($290,000)
- Prediction SE is about 3 times larger
- Prediction interval is about three times as wide ($65,660 vs. $21,480 from lower to upper bound)
When to use which?
- Confidence interval: "What's the average market price for 2000-sq-ft houses?" (for appraisers, policy makers)
- Prediction interval: "What will this particular house sell for?" (for a specific buyer or seller)
Check Your Understanding
Question 1: A 95% CI for the slope β₁ is (2.5, 7.8). What conclusion can you make about testing H₀: β₁ = 0 at α = 0.05?
Answer: We would reject H₀: β₁ = 0 at α = 0.05.
Reason: The confidence interval (2.5, 7.8) does NOT contain 0. This means 0 is not a plausible value for β₁, so we have evidence that the slope is significantly different from 0.
Conclusion: There is a significant linear relationship between x and y.
Question 2: What's the difference between predicting the average test score for all students who study 5 hours vs predicting the test score for one specific student who studies 5 hours?
Difference:
- Average for all students: Use a confidence interval for mean response. This estimates the population mean test score for the subgroup of all students who study 5 hours. The interval is relatively narrow because averages are stable.
- One specific student: Use a prediction interval. This predicts what this particular student will score. The interval is wider because individuals vary around the average - some will score higher, some lower.
Example:
- Confidence interval might be (78, 82) - we're confident the average is around 80
- Prediction interval might be (65, 95) - this specific student could score anywhere in a wider range
Question 3: Why is the prediction interval always wider than the confidence interval for the same x* value?
Answer: The prediction interval is wider because it accounts for individual variation in addition to uncertainty in estimating the mean.
Two sources of uncertainty in prediction interval:
- Uncertainty in the regression line (same as confidence interval)
- Individual variation around the line (the extra "1 +" in the SE formula)
Mathematical evidence:
- CI standard error: sₑ√[1/n + (x* - x̄)²/Σ(x - x̄)²]
- PI standard error: sₑ√[1 + 1/n + (x* - x̄)²/Σ(x - x̄)²]
The "1 +" makes the prediction SE larger, thus the interval is wider.
Intuition: It's easier to predict an average than to predict an individual. Individuals are unpredictable!
Question 4: At which x value will the confidence interval for mean response be narrowest?
Answer: At x* = x̄ (the mean of the x values).
Reason: The standard error formula includes the term (x* - x̄)². When x* = x̄, this term equals 0, minimizing the standard error.
SE(ŷmean) = sₑ√[1/n + (x* - x̄)²/Σ(x - x̄)²]
As x* moves farther from x̄ in either direction, (x* - x̄)² increases, so SE increases, and the interval widens.
Practical implication: We have the most precision when predicting near the center of our data. Predictions far from x̄ are less reliable (and extrapolation is even worse!).
Lesson 4 Summary
Three Types of Intervals in Regression
| Interval Type | What It Estimates | Formula | When to Use |
|---|---|---|---|
| CI for Slope | True slope β₁ | b₁ ± t* × SE(b₁) | To estimate the rate of change |
| CI for Mean Response | Average y at x* | ŷ ± t* × sₑ√[1/n + (x*-x̄)²/Σ(x-x̄)²] | To estimate population mean at x* |
| Prediction Interval | Individual y at x* | ŷ ± t* × sₑ√[1 + 1/n + (x*-x̄)²/Σ(x-x̄)²] | To predict a specific individual |
Key Takeaway: At the same x* and confidence level, a prediction interval is always wider than the confidence interval for the mean response, because individual values vary more than averages.
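To tie the three formulas together, here is a sketch that starts from raw (x, y) pairs instead of pre-computed summaries. It relies on the standard least-squares results b₁ = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)², b₀ = ȳ - b₁x̄, and sₑ = √[SSE/(n - 2)]. The data set is made up purely for illustration, `fit_and_intervals` is a hypothetical helper, and t* must still be supplied from a t-table (2.776 for 95% confidence with df = 4).

```python
import math

def fit_and_intervals(xs, ys, x_star, t_star):
    """Fit y = b0 + b1*x by least squares and return the three intervals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # Σ(x - x̄)²
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))   # sum of squared residuals
    s_e = math.sqrt(sse / (n - 2))                                # standard error of estimate
    y_hat = b0 + b1 * x_star
    dist_term = 1 / n + (x_star - x_bar) ** 2 / sxx
    me_slope = t_star * s_e / math.sqrt(sxx)
    me_mean = t_star * s_e * math.sqrt(dist_term)
    me_ind = t_star * s_e * math.sqrt(1 + dist_term)
    return {
        "slope_ci": (b1 - me_slope, b1 + me_slope),
        "mean_ci": (y_hat - me_mean, y_hat + me_mean),
        "prediction_interval": (y_hat - me_ind, y_hat + me_ind),
    }

# Hypothetical data: square footage and price in $1000s (made up for illustration)
sqft = [1200, 1500, 1700, 1900, 2100, 2400]
price = [190, 235, 250, 285, 300, 340]

result = fit_and_intervals(sqft, price, x_star=2000, t_star=2.776)  # t* for df = 4, 95%
for name, (low, high) in result.items():
    print(f"{name}: ({low:.3f}, {high:.3f})")
```

Both response intervals are centered on the same ŷ, and the prediction interval contains the confidence interval for the mean response.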