Learn Without Walls

Lesson 3: Hypothesis Testing in Regression

Testing for significant linear relationships and checking regression conditions

Learning Objectives

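By the end of this lesson, you will be able to:

  • Test whether the population correlation ρ differs from zero
  • Test whether the population slope β₁ differs from zero
  • Check the LINE conditions for regression inference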

1. Introduction to Regression Inference

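In Lessons 1-2, we calculated sample statistics (r, b₀, b₁, r²) to describe the relationship in our specific dataset. But we often want to make inferences about the population: does the linear relationship we see in the sample also exist in the population, or could it have arisen by chance?

To answer questions like this, we use hypothesis testing!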

Sample vs Population Parameters

Quantity       Population parameter (unknown)    Sample statistic (known)
Correlation    ρ (rho)                           r
Slope          β₁ (beta-one)                     b₁
Intercept      β₀ (beta-zero)                    b₀

We use the sample statistics (r, b₁) to make inferences about the population parameters (ρ, β₁).

2. Hypothesis Test for Correlation (ρ)

To test whether there's a significant linear relationship in the population, we test whether the population correlation ρ equals zero.

Hypothesis Test for Correlation

Hypotheses:

  • H₀: ρ = 0 (No linear relationship in the population)
  • Hₐ: ρ ≠ 0 (There is a linear relationship in the population)

Note: We can also use one-sided tests (Hₐ: ρ > 0 or Hₐ: ρ < 0) if we have prior reason to expect a specific direction.

Test Statistic for Correlation

t = r√[(n-2)/(1-r²)]

Degrees of freedom: df = n - 2

Where:

  • r = sample correlation coefficient
  • n = sample size

Decision:

  • Compare t to the critical value from the t-distribution with df = n - 2 (for a two-sided test, reject H₀ if |t| exceeds the critical value), OR
  • Use p-value: If p-value < α, reject H₀

Complete Example: Testing Correlation

Research Question: Is there a significant linear relationship between study hours and test scores?

Data: n = 20 students, r = 0.65

Significance level: α = 0.05

Step 1: State Hypotheses

  • H₀: ρ = 0 (No linear relationship between study hours and test scores)
  • Hₐ: ρ ≠ 0 (There is a linear relationship between study hours and test scores)

Step 2: Check Conditions (see Section 4 below)

Assume conditions are met for this example.

Step 3: Calculate Test Statistic

t = r√[(n-2)/(1-r²)]

t = 0.65√[(20-2)/(1-0.65²)]

t = 0.65√[18/(1-0.4225)]

t = 0.65√[18/0.5775]

t = 0.65√31.17

t = 0.65 × 5.58

t ≈ 3.63

Degrees of freedom: df = 20 - 2 = 18

Step 4: Find P-value

Using a t-distribution table or technology with df = 18 for a two-tailed test:

p-value ≈ 0.002

Step 5: Make Decision

Since p-value (0.002) < α (0.05), we reject H₀.

Step 6: State Conclusion

There is sufficient evidence to conclude that there is a significant linear relationship between study hours and test scores (p = 0.002).

In other words, a sample correlation as strong as r = 0.65 would be very unlikely if the population correlation were zero, so the observed correlation is statistically significant.
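
The six steps above can be checked with a short Python sketch. It uses only the standard library; the helper names (t_pdf, two_tailed_p, correlation_t) are our own, and the p-value is approximated by integrating the t density numerically with Simpson's rule rather than by calling a statistics package. Treat it as an illustration under those assumptions, not as the only way to run the test. It should reproduce t ≈ 3.63 and a p-value close to 0.002.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t_stat, df, steps=10_000):
    """Two-tailed p-value: 1 minus the area under the t density on [-|t|, |t|].

    The area is approximated with Simpson's rule (steps must be even).
    """
    a = abs(t_stat)
    h = 2 * a / steps
    total = t_pdf(-a, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-a + i * h, df)
    return 1 - total * h / 3

def correlation_t(r, n):
    """Test statistic t = r * sqrt((n - 2) / (1 - r^2)), with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Values from the study-hours example
r, n, alpha = 0.65, 20, 0.05

t_stat = correlation_t(r, n)
df = n - 2
p_value = two_tailed_p(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

In practice, a statistics package would supply the t-distribution directly; the numerical integration here just keeps the sketch self-contained.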

3. Hypothesis Test for Slope (β₁)

An equivalent way to test for a linear relationship is to test whether the population slope β₁ equals zero.

Hypothesis Test for Slope

Hypotheses:

  • H₀: β₁ = 0 (Slope is zero; no linear relationship)
  • Hₐ: β₁ ≠ 0 (Slope is not zero; there is a linear relationship)

Test Statistic for Slope

t = (b₁ - 0) / SE(b₁)

Where:

  • b₁ = sample slope
  • SE(b₁) = sₑ / √[Σ(x - x̄)²] = standard error of the slope
  • sₑ = standard error of estimate, √[Σ(y - ŷ)² / (n - 2)]
  • Σ(x - x̄)² = sum of squared deviations of x

Degrees of freedom: df = n - 2

Key Insight: Equivalent Tests

In simple linear regression (one predictor), testing ρ = 0 is EXACTLY THE SAME as testing β₁ = 0!

  • Same t-value (may look different due to rounding)
  • Same p-value
  • Same conclusion

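Why? The population slope equals the population correlation times the ratio of the standard deviations, β₁ = ρ · (σ_y / σ_x). Since the standard deviations are positive, ρ = 0 exactly when β₁ = 0. These are two ways of asking the same question: "Is there a linear relationship?"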
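
To see the equivalence numerically, here is a small Python sketch that computes the t-statistic both ways on the same data. The eight (x, y) pairs are made up purely for illustration and do not come from this lesson; sₑ is computed as √[Σ(y - ŷ)² / (n - 2)].

```python
import math

# Made-up data for illustration only (not from the lesson)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.4, 7.6]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)                        # sum of (x - x̄)²
syy = sum((yi - y_bar) ** 2 for yi in y)                        # sum of (y - ȳ)²
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of cross-products

# Route 1: slope test, t = b1 / SE(b1) with SE(b1) = s_e / sqrt(Sxx)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # sum of squared residuals
s_e = math.sqrt(sse / (n - 2))
t_slope = b1 / (s_e / math.sqrt(sxx))

# Route 2: correlation test, t = r * sqrt((n - 2) / (1 - r²))
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"t from slope test:       {t_slope:.4f}")
print(f"t from correlation test: {t_corr:.4f}")
```

The two printed values should agree up to floating-point rounding, which is why both tests share the same df = n - 2 and the same p-value.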

Example: Testing Slope

Scenario: Predicting house price (in $1000s) from square footage

Data:

  • n = 25 houses
  • Regression equation: ŷ = 50 + 0.12x (where x = square feet)
  • b₁ = 0.12 (thousand $/sq ft)
  • SE(b₁) = 0.035

Test at α = 0.05:

Hypotheses

  • H₀: β₁ = 0 (No linear relationship between square footage and price)
  • Hₐ: β₁ ≠ 0 (There is a linear relationship between square footage and price)

Test Statistic

t = (b₁ - 0) / SE(b₁) = (0.12 - 0) / 0.035 = 3.43

df = 25 - 2 = 23

P-value and Decision

Using a t-table or technology with df = 23 (two-tailed): p-value ≈ 0.002

Since p-value (0.002) < α (0.05), we reject H₀.

Conclusion

There is sufficient evidence to conclude that square footage is a significant predictor of house price (t = 3.43, p = 0.002).

The positive slope (b₁ = 0.12) indicates that larger houses tend to have higher prices: each additional square foot is associated with an increase of about $120 in predicted price.
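
The same test can be run with the critical-value approach. In the Python sketch below, b₁, SE(b₁), and n come from the example; the critical value 2.069 is an added assumption, read from a standard t-table for a two-tailed test at α = 0.05 with df = 23.

```python
# Summary statistics from the house-price example
b1 = 0.12        # sample slope, in thousand $ per square foot
se_b1 = 0.035    # standard error of the slope
n = 25           # number of houses

# Assumption: two-tailed critical value for alpha = 0.05, df = 23,
# taken from a standard t-table (not given in the example itself)
t_crit = 2.069

t_stat = (b1 - 0) / se_b1       # hypothesized slope under H0 is 0
df = n - 2
reject = abs(t_stat) > t_crit   # two-sided decision rule

print(f"t = {t_stat:.2f}, df = {df}")
print("Reject H0" if reject else "Fail to reject H0")
```

Because |t| is well above the critical value, this approach leads to the same decision as the p-value approach.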

4. Conditions for Regression Inference (LINE)

Before we can trust the results of regression inference (hypothesis tests or confidence intervals), we must check that certain conditions are met. Remember the acronym LINE:

LINE Conditions for Regression Inference

  • L - Linearity: The relationship between x and y is linear

    Check: Scatter plot shows a roughly linear pattern (not curved)

  • I - Independence: Observations are independent of each other

    Check: Based on data collection method (random sampling, no time series)

  • N - Normality: The residuals (errors) are approximately normally distributed

    Check: Histogram or normal probability plot of residuals looks roughly bell-shaped

    Note: This condition is less critical for large samples (roughly n ≥ 30) because of the Central Limit Theorem

  • E - Equal variance (Homoscedasticity): The spread of residuals is constant across all x values

    Check: Residual plot shows random scatter with constant vertical spread (no "fan shape")

Checking Conditions with Residual Plots

The most important diagnostic tool is the residual plot: a scatter plot of residuals vs x (or vs ŷ).

Good vs Bad Residual Plots

Pattern in Residual Plot                           Indication                            Condition Violated
Random scatter around zero with constant spread    Good! Conditions met                  None
Curved pattern (U-shape, etc.)                     Non-linear relationship               Linearity
Fan shape (spread increases)                       Non-constant variance                 Equal Variance
Outliers or extreme points                         May affect regression; investigate    Potential issue
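
To make the "curved pattern" row concrete, the Python sketch below fits a straight line to deliberately curved, made-up data (y = x²) and prints the residuals. The helper names fit_line and residuals are our own. With data like this, the residuals should be positive at both ends of the x range and negative in the middle, which is the U-shape that signals a linearity violation.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for the line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y):
    """Residuals: observed y minus predicted y-hat, for each point."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up curved data for illustration: y = x squared
x = list(range(1, 10))
y = [xi ** 2 for xi in x]

res = residuals(x, y)
for xi, e in zip(x, res):
    print(f"x = {xi}: residual = {e:+.2f}")
```

Plotting these residuals against x would give the residual plot described above; printing them is enough to see the sign pattern.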

What If Conditions Are Violated?

If LINE conditions are not met, the hypothesis test results may be unreliable. Options include:

  • Transform variables: Try log(x), √x, x², etc. to linearize the relationship
  • Remove outliers: If there are clear data errors or influential outliers (but be careful - don't remove data just because it doesn't fit!)
  • Use non-linear regression: If the relationship is genuinely curved
  • Collect more data: Larger samples make the tests less sensitive to non-normal residuals (but they do not fix non-linearity, dependence, or unequal variance)
  • Be cautious: Interpret results with appropriate skepticism
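
As a rough illustration of the first option, the Python sketch below compares the correlation before and after a log transformation of x. The data are made up so that y is exactly linear in log(x); real data will be noisier, and the helper name pearson_r is our own.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration: y is exactly linear in log(x), so curved in x
x = [1, 2, 4, 8, 16, 32, 64]
y = [2 + 3 * math.log(xi) for xi in x]

r_raw = pearson_r(x, y)                           # straight line fit to (x, y)
r_log = pearson_r([math.log(xi) for xi in x], y)  # straight line fit to (log x, y)

print(f"r using x:      {r_raw:.3f}")
print(f"r using log(x): {r_log:.3f}")
```

After any transformation, the LINE conditions should be rechecked on the transformed data, for example with a new residual plot.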

Check Your Understanding

Question 1: A researcher finds r = 0.25 with n = 50. Calculate the test statistic for testing H₀: ρ = 0.

Solution:

t = r√[(n-2)/(1-r²)]

t = 0.25√[(50-2)/(1-0.25²)]

t = 0.25√[48/0.9375]

t = 0.25√51.2

t = 0.25 × 7.16

t ≈ 1.79

df = 50 - 2 = 48

Critical value (α = 0.05, two-tailed, df = 48): approximately ±2.01

Since |1.79| < 2.01, we fail to reject H₀. There is insufficient evidence of a significant linear relationship at α = 0.05.

Question 2: Why do we test β₁ = 0 instead of, say, β₁ = 1?

Answer: We test β₁ = 0 because:

  • If β₁ = 0, there is NO linear relationship between x and y. The regression line would be horizontal (flat), meaning the predicted value of y does not change as x changes.
  • If β₁ ≠ 0, there IS a linear relationship - positive if β₁ > 0, negative if β₁ < 0.
  • Testing β₁ = 0 answers the fundamental question: "Is x a useful predictor of y?"

We could test β₁ = some other value if we had a specific theory (e.g., H₀: β₁ = 1 if we think the relationship is one-to-one), but usually we start by testing whether there's any relationship at all.

Question 3: A residual plot shows a clear U-shaped curve. What does this tell you?

Answer: A U-shaped curve in the residual plot indicates that the linearity condition is violated.

Interpretation:

  • The relationship between x and y is not linear - it's curved (for example, quadratic)
  • A straight line is not the best model for this data
  • Predictions from the linear regression will be systematically too high or too low depending on the x value

Solution: Consider using a curved model (quadratic regression, polynomial regression) or transforming variables to linearize the relationship.

Question 4: If you reject H₀: ρ = 0, what will happen when you test H₀: β₁ = 0 with the same data?

Answer: You will also reject H₀: β₁ = 0.

Reason: Testing ρ = 0 and testing β₁ = 0 are equivalent tests. They:

  • Yield the same t-statistic (any small difference is due to rounding)
  • Yield the same p-value
  • Lead to the same conclusion

Both tests answer the same question: "Is there a significant linear relationship?" If one test rejects the null, the other will too.
