Learn Without Walls

Lesson 4: Confidence and Prediction Intervals

Estimating parameters and making predictions with intervals

Learning Objectives

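By the end of this lesson, you will be able to:

  • Construct and interpret a confidence interval for the slope (β₁)
  • Construct and interpret a confidence interval for the mean response at a given x value
  • Construct and interpret a prediction interval for an individual response
  • Explain the key differences between confidence and prediction intervals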

1. Confidence Interval for Slope (β₁)

In Lesson 3, we tested whether the slope equals zero (H₀: β₁ = 0). But what if we want to estimate the actual value of the population slope? We use a confidence interval!

Confidence Interval for Slope

b₁ ± t* × SE(b₁)

Where:

  • b₁ = sample slope
  • t* = critical value from t-distribution with df = n - 2
  • SE(b₁) = sₑ / √[Σ(x - x̄)²] (standard error of slope)

Interpretation Template:

"We are [C]% confident that the true slope (β₁) is between [lower bound] and [upper bound]."

More specifically: "We are [C]% confident that for each 1-[unit of x] increase in [x variable], [y variable] changes by between [lower] and [upper] [units of y] on average."

Complete Example: CI for Slope

Scenario: Predicting house price (in $1000s) from square footage

Data:

  • n = 25 houses
  • Regression equation: ŷ = 50 + 0.12x
  • b₁ = 0.12 (thousand dollars per square foot)
  • SE(b₁) = 0.035
  • df = 25 - 2 = 23

Construct a 95% confidence interval for β₁:

Step 1: Find Critical Value

For 95% CI with df = 23: t* = 2.069 (from t-table)

Step 2: Calculate Margin of Error

ME = t* × SE(b₁) = 2.069 × 0.035 = 0.072

Step 3: Construct Interval

Lower bound: 0.12 - 0.072 = 0.048

Upper bound: 0.12 + 0.072 = 0.192

95% CI: (0.048, 0.192)

Interpretation

Statistical: We are 95% confident that the true slope (β₁) is between 0.048 and 0.192 thousand dollars per square foot.

In Context: We are 95% confident that for each additional square foot of house size, the price increases by between $48 and $192 on average.
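
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `slope_ci` is hypothetical, and the critical value t* = 2.069 is typed in by hand from the t-table because only the standard library is used.

```python
def slope_ci(b1, se_b1, t_star):
    """Return (lower, upper) for the interval b1 ± t* × SE(b1)."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin

# Values from the house-price example (slope in $1000s per square foot)
lower, upper = slope_ci(b1=0.12, se_b1=0.035, t_star=2.069)
print(f"95% CI for slope: ({lower:.3f}, {upper:.3f})")

# Convert to dollars per square foot for the in-context statement
print(f"In dollars: ${lower * 1000:.0f} to ${upper * 1000:.0f} per square foot")
```

With the example's values this reproduces the interval (0.048, 0.192), or about $48 to $192 per square foot.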

Connection to Hypothesis Testing

Key insight: If a C% confidence interval for β₁ does NOT contain 0, then we would reject H₀: β₁ = 0 in a two-sided test at significance level α = 1 - C (for example, a 95% interval corresponds to α = 0.05).

In our example, the interval (0.048, 0.192) does NOT contain 0, so we conclude there IS a significant relationship at α = 0.05.
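
The link between the interval and the test can be checked numerically. The sketch below assumes the two-sided test uses the usual statistic t = b₁ / SE(b₁); the variable names are illustrative. It shows that "the 95% interval excludes 0" and "|t| exceeds t*" are the same condition.

```python
b1, se_b1, t_star = 0.12, 0.035, 2.069  # values from the example above

# Condition 1: the 95% CI excludes 0
lower, upper = b1 - t_star * se_b1, b1 + t_star * se_b1
ci_excludes_zero = not (lower <= 0 <= upper)

# Condition 2: the test statistic lies beyond the critical value
t_stat = b1 / se_b1  # assumed form of the slope test statistic
rejects_h0 = abs(t_stat) > t_star

print(f"t = {t_stat:.2f}, CI excludes 0: {ci_excludes_zero}, reject H0: {rejects_h0}")
```

Here t ≈ 3.43, which is larger than t* = 2.069, so both conditions agree.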

2. Confidence Interval for Mean Response

Suppose we want to estimate the average y value for a specific x value. For example, what's the average price of ALL 2000-square-foot houses?

We use a confidence interval for mean response.

Confidence Interval for Mean Response

Purpose: Estimate the average y value for all individuals with a specific x value (x*).

ŷ ± t* × SE(ŷmean)

Where:

SE(ŷmean) = sₑ√[1/n + (x* - x̄)²/Σ(x - x̄)²]
  • ŷ = predicted value at x = x*
  • t* = critical value with df = n - 2
  • sₑ = standard error of estimate
  • x* = specific x value of interest
  • x̄ = mean of x values in data

Interpretation: "We are [C]% confident that the average [y variable] for [x variable] = [x*] is between [lower] and [upper]."

Example: CI for Mean Response

Question: What is the average price of all 2000-square-foot houses?

Given:

  • Regression equation: ŷ = 50 + 0.12x
  • sₑ = 15 (thousand dollars)
  • n = 25
  • x̄ = 1800 square feet
  • Σ(x - x̄)² = 500,000
  • x* = 2000 square feet
  • t* = 2.069 (95% CI, df = 23)

Step 1: Calculate Predicted Value

ŷ = 50 + 0.12(2000) = 50 + 240 = 290 thousand dollars

Step 2: Calculate Standard Error

SE(ŷmean) = 15√[1/25 + (2000-1800)²/500,000]

= 15√[0.04 + 40,000/500,000]

= 15√[0.04 + 0.08]

= 15√0.12

= 15 × 0.346 = 5.19

Step 3: Construct Interval

ME = 2.069 × 5.19 = 10.74

Lower: 290 - 10.74 = 279.26

Upper: 290 + 10.74 = 300.74

95% CI: (279.26, 300.74) thousand dollars

Interpretation

We are 95% confident that the average price of all 2000-square-foot houses is between $279,260 and $300,740.
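
The calculation above can be reproduced with a short Python sketch. The function name `mean_response_ci` and the argument name `sxx` (standing for Σ(x - x̄)²) are hypothetical, and t* is again entered from the t-table.

```python
import math

def mean_response_ci(y_hat, s_e, n, x_star, x_bar, sxx, t_star):
    """CI for the mean of y at x = x_star; sxx is the sum of (x - x_bar)^2."""
    se_mean = s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    margin = t_star * se_mean
    return se_mean, (y_hat - margin, y_hat + margin)

y_hat = 50 + 0.12 * 2000  # predicted price in $1000s
se_mean, (lower, upper) = mean_response_ci(
    y_hat, s_e=15, n=25, x_star=2000, x_bar=1800, sxx=500_000, t_star=2.069
)
print(f"SE = {se_mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Carrying full precision gives SE ≈ 5.20 and an interval of about (279.25, 300.75). The hand calculation differs in the last digit only because √0.12 was rounded to 0.346 before multiplying.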

Width of CI for Mean Response

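Notice in the formula that the standard error includes the term (x* - x̄)². This means the interval is narrowest at x* = x̄, where this term equals 0, and widens as x* moves away from x̄ in either direction.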

[Figure: Confidence Band for Mean Response. The confidence band widens as you move away from x̄.]
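
To see the effect of the (x* - x̄)² term, the sketch below evaluates SE(ŷmean) at several x* values using the numbers from the example. The x* values are chosen only for illustration; the lesson does not say whether they all fall inside the observed range of the data.

```python
import math

s_e, n, x_bar, sxx = 15, 25, 1800, 500_000  # values from the example above

def se_mean(x_star):
    """Standard error of the mean response at x_star."""
    return s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)

for x_star in (1400, 1600, 1800, 2000, 2200):
    print(f"x* = {x_star}: SE = {se_mean(x_star):.2f}")
```

The standard error is smallest (3.00) at x* = x̄ = 1800 and grows symmetrically on either side, reaching about 5.20 at a distance of 200 square feet and 9.00 at a distance of 400.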

3. Prediction Interval for Individual Response

What if we want to predict the price of a single, specific 2000-square-foot house (not the average of all such houses)?

We need a prediction interval, which accounts for individual variation.

Prediction Interval for Individual Response

Purpose: Predict the y value for a single individual with a specific x value (x*).

ŷ ± t* × SE(ŷind)

Where:

SE(ŷind) = sₑ√[1 + 1/n + (x* - x̄)²/Σ(x - x̄)²]
  • Note the "1 +" at the beginning! This accounts for individual variation.
  • All other terms are the same as for mean response

Interpretation: "We are [C]% confident that a single [individual/case] with [x variable] = [x*] will have a [y variable] between [lower] and [upper]."

Example: Prediction Interval

Question: What is the predicted price of a single, specific 2000-square-foot house?

Using the same data as before (sₑ = 15, n = 25, x̄ = 1800, Σ(x - x̄)² = 500,000, x* = 2000, t* = 2.069):

Step 1: Predicted Value (Same)

ŷ = 290 thousand dollars (from before)

Step 2: Calculate Standard Error

SE(ŷind) = 15√[1 + 1/25 + (2000-1800)²/500,000]

= 15√[1 + 0.04 + 0.08]

= 15√1.12

= 15 × 1.058 = 15.87

Notice: This is about three times larger than SE(ŷmean) = 5.19!

Step 3: Construct Interval

ME = 2.069 × 15.87 = 32.83

Lower: 290 - 32.83 = 257.17

Upper: 290 + 32.83 = 322.83

95% Prediction Interval: (257.17, 322.83) thousand dollars

Interpretation

We are 95% confident that a single 2000-square-foot house will be priced between $257,170 and $322,830.
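
The same steps can be sketched in Python. The function name `prediction_interval` and the argument name `sxx` (standing for Σ(x - x̄)²) are hypothetical; the only change from the mean-response formula is the extra "1 +" under the square root.

```python
import math

def prediction_interval(y_hat, s_e, n, x_star, x_bar, sxx, t_star):
    """Prediction interval for one new y at x = x_star."""
    # The leading 1 accounts for individual variation around the line
    se_ind = s_e * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    margin = t_star * se_ind
    return se_ind, (y_hat - margin, y_hat + margin)

se_ind, (lower, upper) = prediction_interval(
    y_hat=290, s_e=15, n=25, x_star=2000, x_bar=1800, sxx=500_000, t_star=2.069
)
print(f"SE = {se_ind:.2f}, 95% PI = ({lower:.2f}, {upper:.2f})")
```

At full precision the interval is about (257.16, 322.84). It differs from the hand calculation in the last digit because intermediate values were rounded there.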

4. Confidence vs Prediction Intervals: Key Differences

Comparing the Two Types of Intervals

| Aspect | Confidence Interval for Mean Response | Prediction Interval for Individual |
|---|---|---|
| What we're estimating | Average y for ALL individuals at x* | y value for ONE individual at x* |
| Standard error formula | sₑ√[1/n + (x* - x̄)²/Σ(x - x̄)²] | sₑ√[1 + 1/n + (x* - x̄)²/Σ(x - x̄)²] |
| Width | Narrower (less uncertainty) | Wider (more uncertainty) |
| Why? | Averages are more predictable | Individuals vary more than averages |
| Example interpretation | "Average price of all 2000-sq-ft houses" | "Price of this specific 2000-sq-ft house" |
| Use when... | Planning, policy decisions, general trends | Predicting for a specific case |

[Figure: Confidence Band vs Prediction Band. The prediction band (red) is always wider than the confidence band (blue).]

Why is the Prediction Interval Wider?

The prediction interval must account for two sources of uncertainty:

  1. Uncertainty in estimating the mean (same as confidence interval)
  2. Individual variation around the mean (the "1 +" in the formula)

Analogy: Predicting the average height of all adult males is easier (narrower interval) than predicting the height of one specific adult male (wider interval). Individuals vary!

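Mathematical note: As n → ∞ (with Σ(x - x̄)² growing along with n), SE(ŷmean) → 0, so the confidence interval shrinks to zero width (perfect knowledge of the mean). SE(ŷind), however, approaches sₑ, so the prediction interval never becomes narrower than about ŷ ± z* × sₑ, where z* is the standard normal critical value (1.96 for 95%). There is always some individual variation.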
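
The limiting behavior can be checked numerically. The sketch below reuses the house-price values and makes one loud assumption that is not part of the lesson: Σ(x - x̄)² grows in proportion to n, at the same per-house rate as in the example (500,000 / 25).

```python
import math

s_e = 15
x_star, x_bar = 2000, 1800
spread_per_obs = 500_000 / 25  # ASSUMPTION: sum of (x - x_bar)^2 scales with n

for n in (25, 100, 1_000, 100_000):
    sxx = spread_per_obs * n
    term = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s_e * math.sqrt(term)       # shrinks toward 0
    se_ind = s_e * math.sqrt(1 + term)    # levels off at s_e
    print(f"n = {n:>6}: SE(mean) = {se_mean:.3f}, SE(individual) = {se_ind:.3f}")
```

Under this assumption SE(ŷmean) keeps shrinking as n grows, while SE(ŷind) settles at sₑ = 15. Because t* also approaches 1.96 for large df, the 95% prediction margin of error settles near 1.96 × 15 ≈ 29.4 thousand dollars rather than at 0.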

Side-by-Side Comparison: Same Data

For a 2000-square-foot house (x* = 2000):

| Interval Type | Point Estimate | Standard Error ($1000s) | Margin of Error ($1000s) | 95% Interval |
|---|---|---|---|---|
| Confidence (Mean) | $290,000 | 5.19 | 10.74 | ($279,260, $300,740) |
| Prediction (Individual) | $290,000 | 15.87 | 32.83 | ($257,170, $322,830) |

Notice:

  • Same point estimate ($290,000)
  • Prediction SE is about 3 times larger
  • Prediction interval is much wider (total width $65,660 vs $21,480)

When to use which?

  • Confidence interval: "What's the average market price for 2000-sq-ft houses?" (for appraisers, policy makers)
  • Prediction interval: "What will this particular house sell for?" (for a specific buyer or seller)

Check Your Understanding

Question 1: A 95% CI for the slope β₁ is (2.5, 7.8). What conclusion can you make about testing H₀: β₁ = 0 at α = 0.05?

Answer: We would reject H₀: β₁ = 0 at α = 0.05.

Reason: The confidence interval (2.5, 7.8) does NOT contain 0. This means 0 is not a plausible value for β₁, so we have evidence that the slope is significantly different from 0.

Conclusion: There is a significant linear relationship between x and y.

Question 2: What's the difference between predicting the average test score for all students who study 5 hours vs predicting the test score for one specific student who studies 5 hours?

Difference:

  • Average for all students: Use a confidence interval for mean response. This estimates the population mean test score for the subgroup of all students who study 5 hours. The interval is relatively narrow because averages are stable.
  • One specific student: Use a prediction interval. This predicts what this particular student will score. The interval is wider because individuals vary around the average - some will score higher, some lower.

Example:

  • Confidence interval might be (78, 82) - we're confident the average is around 80
  • Prediction interval might be (65, 95) - this specific student could score anywhere in a wider range

Question 3: Why is the prediction interval always wider than the confidence interval for the same x* value?

Answer: The prediction interval is wider because it accounts for individual variation in addition to uncertainty in estimating the mean.

Two sources of uncertainty in prediction interval:

  1. Uncertainty in the regression line (same as confidence interval)
  2. Individual variation around the line (the extra "1 +" in the SE formula)

Mathematical evidence:

  • CI standard error: sₑ√[1/n + (x* - x̄)²/Σ(x - x̄)²]
  • PI standard error: sₑ√[1 + 1/n + (x* - x̄)²/Σ(x - x̄)²]

The "1 +" makes the prediction SE larger, thus the interval is wider.
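
The two sources of uncertainty add on the variance scale: squaring the PI standard error gives sₑ² plus the squared CI standard error. The sketch below checks this with the house-price values from earlier in the lesson; the variable names are illustrative.

```python
import math

s_e, n, x_star, x_bar, sxx = 15, 25, 2000, 1800, 500_000

term = 1 / n + (x_star - x_bar) ** 2 / sxx
se_mean = s_e * math.sqrt(term)      # uncertainty in the regression line
se_ind = s_e * math.sqrt(1 + term)   # line uncertainty plus individual variation

# Squared prediction SE = individual variance + squared mean-response SE
print(f"{se_ind**2:.1f} = {s_e**2} + {se_mean**2:.1f}")
```

For these values the identity reads 252.0 = 225 + 27.0: most of the prediction uncertainty comes from individual variation (sₑ² = 225), not from uncertainty about the line.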

Intuition: It's easier to predict an average than to predict an individual. Individuals are unpredictable!

Question 4: At which x value will the confidence interval for mean response be narrowest?

Answer: At x* = x̄ (the mean of the x values).

Reason: The standard error formula includes the term (x* - x̄)². When x* = x̄, this term equals 0, minimizing the standard error.

SE(ŷmean) = sₑ√[1/n + (x* - x̄)²/Σ(x - x̄)²]

As x* moves farther from x̄ in either direction, (x* - x̄)² increases, so SE increases, and the interval widens.

Practical implication: We have the most precision when predicting near the center of our data. Predictions far from x̄ are less reliable (and extrapolation is even worse!).

Lesson 4 Summary

Three Types of Intervals in Regression

| Interval Type | What It Estimates | Formula | When to Use |
|---|---|---|---|
| CI for Slope | True slope β₁ | b₁ ± t* × SE(b₁) | To estimate the rate of change |
| CI for Mean Response | Average y at x* | ŷ ± t* × sₑ√[1/n + (x* - x̄)²/Σ(x - x̄)²] | To estimate population mean at x* |
| Prediction Interval | Individual y at x* | ŷ ± t* × sₑ√[1 + 1/n + (x* - x̄)²/Σ(x - x̄)²] | To predict a specific individual |

Key Takeaway: At the same x* and confidence level, a prediction interval is always wider than the confidence interval for the mean response, because individual values vary more than averages.
