Lesson 3: Post-Hoc Tests
Identifying Which Groups Differ After a Significant ANOVA
Why Do We Need Post-Hoc Tests?
The Limitation of ANOVA
When ANOVA gives a significant result, we know:
"At least one population mean is different from the others."
But ANOVA does NOT tell us:
- Which specific groups differ?
- How many groups differ?
- Which group has the highest/lowest mean?
Purpose of Post-Hoc Tests
Post-hoc tests (also called multiple comparison tests) perform pairwise comparisons to identify which specific groups differ significantly.
Important: We only conduct post-hoc tests AFTER finding a significant F-statistic in ANOVA!
Example Scenario
Suppose ANOVA comparing 4 teaching methods gives F = 8.2, p < 0.01 (significant).
We know: At least one method differs.
We DON'T know:
- Is Method 1 different from Method 2?
- Is Method 3 different from Method 4?
- Which method is best?
Solution: Use post-hoc tests to answer these questions!
Tukey's HSD (Honestly Significant Difference)
Tukey's HSD is the most commonly used post-hoc test. It controls the family-wise error rate while comparing all possible pairs of groups.
The Tukey HSD Formula
HSD = q × √(MSW / n)
Where:
- q = critical value from the Studentized Range Distribution (q-table)
- MSW = Mean Square Within from ANOVA table
- n = sample size per group (assumes equal sample sizes)
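The formula can be sketched in a few lines of Python. This is a minimal illustration, not a full post-hoc routine: the function name `tukey_hsd` is our own, the q-value must still be looked up separately, and the inputs in the usage line are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
import math


def tukey_hsd(q: float, msw: float, n: int) -> float:
    """Return HSD = q * sqrt(MSW / n), assuming equal group sizes."""
    if n <= 0:
        raise ValueError("n must be a positive sample size per group")
    if msw < 0:
        raise ValueError("MSW cannot be negative")
    return q * math.sqrt(msw / n)


# Hypothetical inputs (not from a real study): 3.5 * sqrt(8 / 2) = 3.5 * 2
hsd = tukey_hsd(q=3.5, msw=8.0, n=2)
print(round(hsd, 2))
```

Because the function takes q as an argument, it works with whatever critical value you read from a table or obtain from software.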
Decision Rule
For any two groups i and j: if |x̄ᵢ - x̄ⱼ| > HSD, the two group means differ significantly; otherwise, they do not.
Finding the q-value
The q-value depends on:
- k = number of groups
- df_within = N - k (from ANOVA), where N is the total number of observations
- α = significance level (usually 0.05)
You look up q in a Studentized Range table or use statistical software.
Example: Tukey HSD for k = 3, df = 12, α = 0.05
From Studentized Range table: q = 3.77
(This is an approximate value; exact tables are available in textbooks or online.)
Complete Example: Tukey's HSD
Continuing from Lesson 2 Example
Recall our three study methods with:
- Method 1 (Traditional): x̄₁ = 80
- Method 2 (Flashcards): x̄₂ = 86
- Method 3 (Practice Tests): x̄₃ = 92
ANOVA results: F = 31.75, p < 0.001 (significant)
From ANOVA: MSW = 5.67, n = 5 per group, k = 3, df_within = 12
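Step 1: Find the Critical q-Value
From the Studentized Range table with k = 3, df = 12, α = 0.05:
q = 3.77
Step 2: Calculate HSD
HSD = q × √(MSW / n) = 3.77 × √(5.67 / 5) = 3.77 × 1.0649 ≈ 4.01
Step 3: Compare All Pairs
Comparison 1: Method 1 vs Method 2
|x̄₁ - x̄₂| = |80 - 86| = 6 > 4.01, so the difference is significant.
Comparison 2: Method 1 vs Method 3
|x̄₁ - x̄₃| = |80 - 92| = 12 > 4.01, so the difference is significant.
Comparison 3: Method 2 vs Method 3
|x̄₂ - x̄₃| = |86 - 92| = 6 > 4.01, so the difference is significant.
Step 4: Summarize Results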
| Comparison | Absolute Difference in Means | Compared with HSD = 4.01 | Conclusion |
|---|---|---|---|
| Method 1 vs 2 | 6 | 6 > 4.01 | Significant |
| Method 1 vs 3 | 12 | 12 > 4.01 | Significant |
| Method 2 vs 3 | 6 | 6 > 4.01 | Significant |
Conclusion: All three study methods produce significantly different mean exam scores. Method 3 (Practice Tests) has the highest mean score, followed by Method 2 (Flashcards), then Method 1 (Traditional).
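The pairwise comparisons in this example can be reproduced with a short Python sketch. It uses the three means and the HSD value from the example; the function name `pairwise_tukey` and the dictionary layout are our own choices for illustration, and the code assumes the HSD has already been calculated.

```python
from itertools import combinations

# Group means and HSD taken from the worked example
means = {"Method 1": 80, "Method 2": 86, "Method 3": 92}
hsd = 4.01


def pairwise_tukey(means, hsd):
    """Compare every pair of group means against a single HSD threshold."""
    results = []
    for (a, mean_a), (b, mean_b) in combinations(means.items(), 2):
        diff = abs(mean_a - mean_b)
        # A pair differs significantly when |difference| exceeds HSD
        results.append((a, b, diff, diff > hsd))
    return results


results = pairwise_tukey(means, hsd)
for a, b, diff, significant in results:
    verdict = "Significant" if significant else "Not significant"
    print(f"{a} vs {b}: |difference| = {diff}, {verdict}")
```

`combinations` generates each pair exactly once, so k groups always yield k(k-1)/2 comparisons.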
Bonferroni Correction
The Bonferroni correction is another method for controlling Type I error in multiple comparisons. It's more conservative (stricter) than Tukey's HSD.
The Bonferroni Method
Instead of calculating a single HSD value, Bonferroni adjusts the significance level (α) for each comparison:
α_adjusted = α / c
Where:
- α = original significance level (e.g., 0.05)
- c = number of pairwise comparisons = k(k-1)/2
Then: Perform each pairwise comparison using a two-sample t-test with the adjusted α.
Example: Bonferroni with k = 4 Groups
Number of comparisons: c = 4(3)/2 = 6
Original α = 0.05
Adjusted α = 0.05 / 6 ≈ 0.0083
For each of the 6 pairwise comparisons, we would use α = 0.0083 instead of 0.05.
This controls the overall family-wise error rate at 0.05.
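As a rough sketch, the adjustment can be computed in Python as follows. The function name `bonferroni_alpha` is hypothetical, and the code only produces the adjusted threshold; the pairwise t-tests themselves would still need to be run separately.

```python
def bonferroni_alpha(k: int, alpha: float = 0.05):
    """Return (c, adjusted alpha) for all pairwise comparisons among k groups."""
    if k < 2:
        raise ValueError("need at least 2 groups to make a comparison")
    c = k * (k - 1) // 2  # number of pairwise comparisons
    return c, alpha / c


# The k = 4 example: 6 comparisons, each tested at about 0.0083
c, adjusted = bonferroni_alpha(4)
print(c, round(adjusted, 4))
```

If you plan to test only a few specific comparisons rather than all pairs, divide α by that smaller number instead of k(k-1)/2.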
Tukey vs Bonferroni: When to Use Which?
| Method | Strengths | Best Used When |
|---|---|---|
| Tukey's HSD | • Most powerful when comparing all pairs • Easy to calculate and interpret • Controls family-wise error rate | • Equal sample sizes • Want to compare ALL pairs • Standard choice for ANOVA |
| Bonferroni | • Very simple to understand • Works with unequal sample sizes • Can use for specific comparisons | • Unequal sample sizes • Only interested in a few specific comparisons • Want to be extra conservative |
General Rule
Use Tukey's HSD as your default post-hoc test for ANOVA. It's the most commonly used and strikes a good balance between power and error control.
Use Bonferroni when you have unequal sample sizes or only want to test specific comparisons (not all pairs).
Other Post-Hoc Tests (Brief Overview)
Scheffé's Test
- Most conservative (hardest to find significance)
- Can compare not just pairs but any combination of groups
- Use when you want maximum protection against Type I error
Dunnett's Test
- Compares each treatment group to a control group
- Does NOT compare treatment groups to each other
- More powerful than Tukey when you only care about comparisons to control
- Common in medical research (comparing drugs to placebo)
When to Use Dunnett's
Testing 3 new drugs (A, B, C) against a placebo:
Dunnett's tests:
- Drug A vs Placebo
- Drug B vs Placebo
- Drug C vs Placebo
Does NOT test: Drug A vs Drug B, etc.
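To see why Dunnett's test involves fewer comparisons, the sketch below lists the pairs it examines next to the full set of pairs that an all-pairs method such as Tukey's HSD would examine. It only enumerates comparisons and does not perform Dunnett's test itself; the variable names are our own.

```python
from itertools import combinations

groups = ["Placebo", "Drug A", "Drug B", "Drug C"]
control = "Placebo"

# Dunnett-style comparisons: each treatment against the control only
dunnett_pairs = [(g, control) for g in groups if g != control]

# All-pairs comparisons: every group against every other group
all_pairs = list(combinations(groups, 2))

print(len(dunnett_pairs), "comparisons to control:", dunnett_pairs)
print(len(all_pairs), "comparisons among all pairs")
```

With four groups, the control-only design needs 3 comparisons instead of 6, which is where the gain in power comes from.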
Important Guidelines for Post-Hoc Testing
When NOT to Do Post-Hoc Tests
- If ANOVA is not significant: If the F-test fails to reject H₀, STOP. Don't do post-hoc tests.
- With only 2 groups: If k = 2, ANOVA is equivalent to a t-test. No post-hoc needed.
- Before running ANOVA: Always do the overall F-test first!
The Proper Sequence
1. Run ANOVA to test if any differences exist
2. If ANOVA is significant: Proceed to post-hoc tests
3. Choose appropriate post-hoc test (usually Tukey's HSD)
4. Identify which specific pairs differ
5. Report findings with context and interpretation
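The guidelines above amount to a simple gate before any post-hoc work. The sketch below encodes that gate under this lesson's convention; the function name `should_run_post_hoc` and the p-values in the usage lines are hypothetical.

```python
def should_run_post_hoc(p_value: float, k: int, alpha: float = 0.05) -> bool:
    """Return True only when the lesson's guidelines call for post-hoc tests."""
    if k < 2:
        raise ValueError("ANOVA needs at least 2 groups")
    if k == 2:
        # With two groups, ANOVA is equivalent to a t-test: nothing left to compare
        return False
    # Proceed only if the overall F-test is significant
    return p_value < alpha


# Hypothetical ANOVA results for illustration
print(should_run_post_hoc(p_value=0.002, k=3))  # significant F-test, 3 groups
print(should_run_post_hoc(p_value=0.20, k=4))   # non-significant F-test
```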
Check Your Understanding
Question 1
An ANOVA comparing 4 fertilizers gives F = 2.1, p = 0.15. Should you conduct post-hoc tests? Why or why not?
Answer: NO, do not conduct post-hoc tests.
Reason: The ANOVA is NOT significant (p = 0.15 > 0.05). We failed to reject H₀, meaning we don't have evidence that any of the fertilizers differ. Post-hoc tests are only conducted AFTER finding a significant F-statistic.
Question 2
Given: k = 4 groups, MSW = 12, n = 8 per group, q = 3.96. Calculate Tukey's HSD.
Step 1: Use the formula HSD = q × √(MSW / n)
HSD = 3.96 × √(12 / 8) = 3.96 × 1.2247 ≈ 4.85
Answer: HSD = 4.85
Any two group means that differ by more than 4.85 are significantly different.
Question 3
Using the HSD from Question 2, determine if groups with x̄₁ = 50 and x̄₂ = 54 differ significantly.
Step 1: Calculate the difference
|x̄₁ - x̄₂| = |50 - 54| = 4
Step 2: Compare to HSD = 4.85
4 < 4.85
Answer: No significant difference.
The difference (4) is less than HSD (4.85), so we conclude that groups 1 and 2 do not differ significantly.
Question 4
If you're comparing 5 groups, how many pairwise comparisons are there? What would the Bonferroni-adjusted α be if the original α = 0.05?
Step 1: Calculate number of comparisons
c = k(k-1)/2 = 5(4)/2 = 10
Step 2: Calculate adjusted α
α_adjusted = α / c = 0.05 / 10 = 0.005
Answers:
- 10 pairwise comparisons
- Bonferroni α = 0.005 (or 0.5%) for each test
Lesson Summary
- Post-hoc tests identify which specific groups differ after significant ANOVA
- Only conduct post-hoc tests if ANOVA F-test is significant!
- Tukey's HSD: Most common post-hoc test
  - Formula: HSD = q × √(MSW / n)
  - If |x̄ᵢ - x̄ⱼ| > HSD, groups differ significantly
- Bonferroni correction: Adjusts α for multiple comparisons
  - α_adjusted = α / c
  - More conservative than Tukey
- Other tests: Scheffé (most conservative), Dunnett (control comparison)
- Post-hoc tests control family-wise error rate across multiple comparisons