Practice Problems
Module 10: Analysis of Variance (ANOVA) - 20 Problems
Part 1: Conceptual Understanding (5 problems)
Test your understanding of ANOVA concepts and principles.
Multiple Comparisons and α Inflation
A researcher wants to compare mean test scores across 6 different teaching methods using α = 0.05.
a) How many pairwise comparisons would be needed if using multiple t-tests?
b) What would be the approximate overall Type I error rate if conducting these t-tests?
c) Explain why ANOVA is preferable in this situation.
Solution:
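a) Number of comparisons:
C(6, 2) = (6 × 5) / 2 = 15 pairwise comparisons
b) Overall Type I error rate:
Treating the 15 tests as independent, P(at least one Type I error) = 1 - (1 - α)¹⁵ = 1 - (0.95)¹⁵ ≈ 1 - 0.463 = 0.537
With 15 tests, there's more than a 50% chance of making at least one Type I error!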
c) Why ANOVA is better:
- ANOVA performs a single test, maintaining α = 0.05
- Controls the family-wise error rate
- Avoids the multiple comparisons problem
- More powerful and statistically sound approach
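The arithmetic in parts (a) and (b) can be checked with a short sketch. The function names are illustrative only, and the error-rate formula assumes the tests are independent, which is only an approximation for pairwise t-tests that share groups.

```python
from math import comb


def pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k groups: C(k, 2)."""
    return comb(k, 2)


def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one Type I error) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


m = pairwise_comparisons(6)
fwer = familywise_error_rate(0.05, m)
print(m, round(fwer, 3))  # 15 0.537
```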
Interpreting the F-Statistic
Explain what each of the following F-statistics tells you about the data:
a) F = 0.8
b) F = 1.2
c) F = 15.6
Solution:
a) F = 0.8:
- Between-group variance is LESS than within-group variance
- Group means are very similar relative to natural variation
- Very little evidence of group differences
- Will fail to reject H₀ at any conventional α, because F critical values are always greater than 1
b) F = 1.2:
- Between-group variance is slightly larger than within-group variance
- Weak evidence of group differences
- Likely not statistically significant
- Probably fail to reject H₀
c) F = 15.6:
- Between-group variance is MUCH larger than within-group variance
- Group means differ substantially
- Strong evidence that at least one group differs
- Very likely to reject H₀ (highly significant)
ANOVA vs Other Tests
For each scenario, identify the appropriate statistical test and explain why:
a) Comparing mean salaries of employees in two different departments
b) Comparing mean GPAs across four different majors
c) Determining if gender and political party preference are related
d) Comparing median incomes across 5 cities when data is highly skewed
Solution:
a) Two-sample t-test
- Only 2 groups (two departments)
- Comparing means
- Quantitative variable (salary)
b) One-way ANOVA
- More than 2 groups (four majors)
- Comparing means
- One categorical factor (major)
- Quantitative variable (GPA)
c) Chi-square test of independence
- Both variables are categorical
- Testing for association/relationship
- Not comparing means
d) Kruskal-Wallis test
- More than 2 groups (5 cities)
- Highly skewed data (normality violated)
- Non-parametric alternative to ANOVA
- Compares medians instead of means
Understanding ANOVA Results
An ANOVA comparing customer satisfaction scores across 4 store locations yields F(3, 76) = 5.8, p = 0.002.
a) What are the degrees of freedom and what do they represent?
b) What conclusion can you draw at α = 0.05?
c) What does this result NOT tell you?
Solution:
a) Degrees of freedom:
- df₁ = 3 = df_between = k - 1, so k = 4 groups
- df₂ = 76 = df_within = N - k, so N = 80 total observations
- This means 4 locations with 80 total customers (average 20 per location)
b) Conclusion:
- p = 0.002 < 0.05, so reject H₀
- At the 0.05 significance level, there is sufficient evidence to conclude that at least one store location has a different mean customer satisfaction score than the others
c) What this does NOT tell us:
- Which specific locations differ from each other
- Which location has the highest/lowest satisfaction
- How many locations differ significantly
- Need post-hoc tests to answer these questions!
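As a small illustration of part (a), the number of groups and the total sample size can be recovered from the two degrees of freedom in a reported F(df₁, df₂). The helper name below is hypothetical.

```python
def groups_and_sample_size(df_between: int, df_within: int) -> tuple[int, int]:
    """Invert df_between = k - 1 and df_within = N - k to recover (k, N)."""
    k = df_between + 1
    n_total = df_within + k
    return k, n_total


k, n_total = groups_and_sample_size(3, 76)
print(k, n_total)  # 4 80
```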
Between vs Within Variation
Explain in your own words what "between-group variation" and "within-group variation" mean in the context of ANOVA. Why do we compare these two types of variation?
Solution:
Between-group variation:
- Measures how much the group means differ from each other
- Reflects the effect of the factor/treatment
- If groups truly differ, this should be large
- Example: Differences in average test scores across different teaching methods
Within-group variation:
- Measures how much individuals vary within each group
- Represents natural variability/"noise" in the data
- Exists regardless of group membership
- Example: Students in the same class have different scores due to individual differences
Why we compare them:
- If between-group variation is much larger than within-group variation, it suggests real group differences (not just random chance)
- The F-statistic is this ratio: F = MSB/MSW
- Large F → signal (group differences) >> noise (random variation) → significant result
- Small F → signal ≈ noise → no significant difference
Part 2: ANOVA Calculations (5 problems)
Practice calculating sum of squares, degrees of freedom, mean squares, and F-statistics.
Degrees of Freedom
Calculate the degrees of freedom for each ANOVA scenario:
a) 3 groups with 12 observations each
b) 5 groups with sample sizes: 8, 10, 12, 10, 10
c) 4 groups with total N = 60
Solution:
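a) k = 3, N = 3 × 12 = 36
- df_between = k - 1 = 2
- df_within = N - k = 33
- df_total = N - 1 = 35
b) k = 5, N = 8 + 10 + 12 + 10 + 10 = 50
- df_between = k - 1 = 4
- df_within = N - k = 45
- df_total = N - 1 = 49
c) k = 4, N = 60
- df_between = k - 1 = 3
- df_within = N - k = 56
- df_total = N - 1 = 59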
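The three degrees of freedom follow directly from the group sizes, as this minimal sketch shows. The function name is illustrative. For part (c) only the total N = 60 is given, so the equal split of 15 per group is an assumption that does not affect the result.

```python
def anova_df(group_sizes: list[int]) -> dict[str, int]:
    """Degrees of freedom for a one-way ANOVA from the per-group sample sizes."""
    k = len(group_sizes)
    n_total = sum(group_sizes)
    return {"between": k - 1, "within": n_total - k, "total": n_total - 1}


df_a = anova_df([12, 12, 12])
df_b = anova_df([8, 10, 12, 10, 10])
df_c = anova_df([15, 15, 15, 15])  # assumed equal split; only N = 60 matters

print(df_a)  # {'between': 2, 'within': 33, 'total': 35}
print(df_b)  # {'between': 4, 'within': 45, 'total': 49}
print(df_c)  # {'between': 3, 'within': 56, 'total': 59}
```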
Completing an ANOVA Table
Complete the missing values in this ANOVA table:
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between Groups | 240 | ? | ? | ? |
| Within Groups | 360 | 27 | ? | |
| Total | ? | ? | | |
Given: k = 4 groups
Solution:
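Step 1: Find df_between
df_between = k - 1 = 4 - 1 = 3
Step 2: Find N and df_total
df_within = N - k, so 27 = N - 4 and N = 31
df_total = N - 1 = 30 (check: 3 + 27 = 30)
Step 3: Calculate SST
SST = SSB + SSW = 240 + 360 = 600
Step 4: Calculate mean squares
MSB = SSB / df_between = 240 / 3 = 80
MSW = SSW / df_within = 360 / 27 ≈ 13.33
Step 5: Calculate F
F = MSB / MSW = 80 / 13.33 ≈ 6.00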
Complete table:
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 240 | 3 | 80 | 6.00 |
| Within | 360 | 27 | 13.33 | |
| Total | 600 | 30 | | |
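The five steps can be sketched as a single function that fills in the table from the given quantities. The function and key names are illustrative, not part of the problem set.

```python
def complete_anova_table(ss_between: float, ss_within: float,
                         k: int, df_within: int) -> dict[str, float]:
    """Fill in the missing one-way ANOVA table entries from SSB, SSW, k, and df_within."""
    df_between = k - 1
    n_total = df_within + k          # from df_within = N - k
    df_total = n_total - 1
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "df_total": df_total,
        "ss_total": ss_between + ss_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


table = complete_anova_table(240, 360, k=4, df_within=27)
for name, value in table.items():
    print(f"{name}: {value:.2f}")
```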
Simple ANOVA Calculation
Three groups have the following data:
| Group A | Group B | Group C |
|---|---|---|
| 6 | 8 | 12 |
| 8 | 10 | 14 |
| 10 | 12 | 16 |
a) Calculate the group means and grand mean
b) Calculate SSB (between-group sum of squares)
c) Calculate SSW (within-group sum of squares)
Solution:
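a) Calculate means:
x̄_A = (6 + 8 + 10) / 3 = 8
x̄_B = (8 + 10 + 12) / 3 = 10
x̄_C = (12 + 14 + 16) / 3 = 14
Grand mean = (24 + 30 + 42) / 9 = 96 / 9 ≈ 10.67
b) Calculate SSB:
SSB = Σ n(x̄_group - grand mean)² = 3(8 - 10.67)² + 3(10 - 10.67)² + 3(14 - 10.67)²
≈ 21.33 + 1.33 + 33.33 = 56 (using unrounded values)
c) Calculate SSW:
Group A: (6 - 8)² + (8 - 8)² + (10 - 8)² = 4 + 0 + 4 = 8
Group B: (8 - 10)² + (10 - 10)² + (12 - 10)² = 4 + 0 + 4 = 8
Group C: (12 - 14)² + (14 - 14)² + (16 - 14)² = 4 + 0 + 4 = 8
SSW = 8 + 8 + 8 = 24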
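A minimal sketch of the same calculation from the raw data, using only the standard library. The function name and the dictionary layout are assumptions made for illustration.

```python
from statistics import fmean


def sums_of_squares(groups: dict[str, list[float]]):
    """Return (group means, grand mean, SSB, SSW) for a one-way layout."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = fmean(all_values)
    group_means = {name: fmean(values) for name, values in groups.items()}
    # Between: each group's size times the squared distance of its mean from the grand mean
    ssb = sum(len(values) * (group_means[name] - grand_mean) ** 2
              for name, values in groups.items())
    # Within: squared distance of each observation from its own group mean
    ssw = sum((x - group_means[name]) ** 2
              for name, values in groups.items() for x in values)
    return group_means, grand_mean, ssb, ssw


data = {"A": [6, 8, 10], "B": [8, 10, 12], "C": [12, 14, 16]}
group_means, grand_mean, ssb, ssw = sums_of_squares(data)
print(group_means, round(grand_mean, 2))
print(round(ssb, 2), round(ssw, 2))  # 56.0 24.0
```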
Mean Squares and F-Statistic
Using the results from Problem 8 (Simple ANOVA Calculation):
SSB = 56, SSW = 24, k = 3, N = 9
a) Calculate MSB and MSW
b) Calculate the F-statistic
c) If the critical value at α = 0.05 is F_crit = 5.14, what is your decision?
Solution:
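a) Calculate mean squares:
First find degrees of freedom:
df_between = k - 1 = 3 - 1 = 2
df_within = N - k = 9 - 3 = 6
Then calculate MS:
MSB = SSB / df_between = 56 / 2 = 28
MSW = SSW / df_within = 24 / 6 = 4
b) Calculate F:
F = MSB / MSW = 28 / 4 = 7.00
c) Decision:
F = 7.00 > F_crit = 5.14, so reject H₀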
Conclusion: At the 0.05 significance level, there is sufficient evidence to conclude that at least one group mean is different from the others.
Relationship Between SS, MS, and df
An ANOVA has SSB = 180, MSW = 5, df_between = 3, and df_total = 39.
a) Find MSB
b) Find df_within and N
c) Find SSW
d) Find SST and verify SST = SSB + SSW
Solution:
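a) Find MSB:
MSB = SSB / df_between = 180 / 3 = 60
b) Find df_within and N:
df_within = df_total - df_between = 39 - 3 = 36
N = df_total + 1 = 40
c) Find SSW:
SSW = MSW × df_within = 5 × 36 = 180
d) Find SST and verify:
SST = SSB + SSW = 180 + 180 = 360 ✓
(Alternatively, SST = MST × df_total, but MST isn't typically calculated.)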
Part 3: Post-Hoc Tests (5 problems)
Practice with Tukey's HSD and Bonferroni correction.
Tukey's HSD Calculation
Given: k = 3 groups, n = 6 per group, MSW = 8, q = 3.67 (from table)
Group means: x̄₁ = 45, x̄₂ = 50, x̄₃ = 53
a) Calculate Tukey's HSD
b) Determine which pairs of groups differ significantly
Solution:
a) Calculate HSD:
HSD = q × √(MSW / n) = 3.67 × √(8 / 6) ≈ 3.67 × 1.155 ≈ 4.24
b) Compare all pairs:
| Comparison | \|Difference\| | vs HSD = 4.24 | Significant? |
|---|---|---|---|
| Group 1 vs 2 | \|45 - 50\| = 5 | 5 > 4.24 | Yes |
| Group 1 vs 3 | \|45 - 53\| = 8 | 8 > 4.24 | Yes |
| Group 2 vs 3 | \|50 - 53\| = 3 | 3 < 4.24 | No |
Conclusion: Group 1 differs significantly from both Groups 2 and 3, but Groups 2 and 3 do not differ significantly from each other.
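The HSD calculation and the pairwise comparisons can be sketched as follows. This assumes equal group sizes and takes q from a table, as the problem does. The function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd(q: float, ms_within: float, n_per_group: int) -> float:
    """Tukey's HSD for equal group sizes: q * sqrt(MSW / n)."""
    return q * sqrt(ms_within / n_per_group)


def compare_pairs(means: dict[str, float], hsd: float) -> dict[tuple[str, str], bool]:
    """Flag each pair whose absolute mean difference exceeds the HSD."""
    return {(a, b): abs(means[a] - means[b]) > hsd
            for a, b in combinations(means, 2)}


hsd = tukey_hsd(q=3.67, ms_within=8, n_per_group=6)
means = {"Group 1": 45, "Group 2": 50, "Group 3": 53}
results = compare_pairs(means, hsd)

print(round(hsd, 2))  # 4.24
for pair, significant in results.items():
    print(pair, "significant" if significant else "not significant")
```

The same `compare_pairs` helper works for any set of means once an HSD value is known, for example the battery brands in a later problem with HSD = 3.5.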
Bonferroni Correction
A researcher is comparing mean reaction times across 5 different conditions using α = 0.05.
a) How many pairwise comparisons are there?
b) What is the Bonferroni-adjusted α for each comparison?
c) If a comparison has p = 0.02, is it significant using the Bonferroni correction?
Solution:
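a) Number of comparisons:
C(5, 2) = (5 × 4) / 2 = 10
b) Bonferroni-adjusted α:
α_adjusted = α / number of comparisons = 0.05 / 10 = 0.005
c) Is p = 0.02 significant?
No. p = 0.02 > 0.005, so the comparison is not significant after the Bonferroni correction.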
Explanation: While 0.02 < 0.05 (would be significant without correction), it's not less than the Bonferroni-adjusted 0.005, so we fail to reject H₀ for this comparison. This is why Bonferroni is considered conservative—it makes it harder to find significance.
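A short sketch of the Bonferroni adjustment for all pairwise comparisons among k groups. The function names are hypothetical.

```python
from math import comb


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-comparison alpha when all C(k, 2) pairwise comparisons are tested."""
    return alpha / comb(k, 2)


def is_significant(p_value: float, alpha: float, k: int) -> bool:
    """Compare a single p-value against the Bonferroni-adjusted alpha."""
    return p_value < bonferroni_alpha(alpha, k)


adjusted = bonferroni_alpha(0.05, 5)
print(round(adjusted, 4))            # 0.005
print(is_significant(0.02, 0.05, 5))  # False
```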
When to Conduct Post-Hoc Tests
For each ANOVA result, indicate whether post-hoc tests should be conducted and explain why:
a) F(2, 45) = 1.8, p = 0.18
b) F(3, 96) = 6.2, p = 0.001
c) F(1, 38) = 12.5, p < 0.001
Solution:
a) NO, do not conduct post-hoc tests
- p = 0.18 > 0.05, so ANOVA is not significant
- Failed to reject H₀
- No evidence of group differences
- Post-hoc tests only make sense after significant ANOVA
b) YES, conduct post-hoc tests
- p = 0.001 < 0.05, so ANOVA is significant
- df₁ = 3 means k = 4 groups (more than 2)
- Know at least one group differs, but not which ones
- Post-hoc tests will identify specific group differences
c) NO, do not conduct post-hoc tests
- Even though p < 0.001 (significant)
- df₁ = 1 means k = 2 groups only
- With only 2 groups, ANOVA is equivalent to a t-test
- Already know which groups differ (the only two!)
- Post-hoc tests are only needed with 3+ groups
Interpreting Post-Hoc Results
After a significant ANOVA comparing 4 brands of batteries, Tukey's HSD = 3.5 hours. The mean lifespans are:
- Brand A: 48 hours
- Brand B: 52 hours
- Brand C: 45 hours
- Brand D: 51 hours
Create a summary table showing which brands differ significantly and interpret the results.
Solution:
Pairwise comparisons:
| Comparison | \|Difference\| | vs HSD = 3.5 | Significant? |
|---|---|---|---|
| A vs B | \|48 - 52\| = 4 | 4 > 3.5 | Yes |
| A vs C | \|48 - 45\| = 3 | 3 < 3.5 | No |
| A vs D | \|48 - 51\| = 3 | 3 < 3.5 | No |
| B vs C | \|52 - 45\| = 7 | 7 > 3.5 | Yes |
| B vs D | \|52 - 51\| = 1 | 1 < 3.5 | No |
| C vs D | \|45 - 51\| = 6 | 6 > 3.5 | Yes |
Interpretation:
- Brand C has significantly shorter lifespan than Brands B and D
- Brand B has significantly longer lifespan than Brands A and C
- Brands B and D perform similarly (no significant difference)
- Brands A and D perform similarly
- Brands A and C perform similarly
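Ranking by sample mean (from longest to shortest): B > D > A > C. Only B vs A, B vs C, and D vs C differ significantly, so neighboring brands in this ranking are statistically indistinguishable (B ≈ D, D ≈ A, A ≈ C).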
Recommendation: Brands B or D are best choices; avoid Brand C.
Tukey vs Bonferroni
You're comparing 4 fertilizers with unequal sample sizes: n₁ = 8, n₂ = 12, n₃ = 10, n₄ = 15.
a) Would you use Tukey's HSD or Bonferroni? Explain.
b) If using Bonferroni with α = 0.05, what α would you use for each comparison?
Solution:
a) Use Bonferroni
- Sample sizes are unequal (8, 12, 10, 15)
- The basic Tukey's HSD formula assumes equal sample sizes
- Bonferroni works well with unequal sample sizes
- Alternatively, could use the Tukey-Kramer modification, which replaces n with the harmonic mean of each pair's sample sizes, but Bonferroni is simpler
b) Bonferroni-adjusted α:
Number of comparisons:
C(4, 2) = (4 × 3) / 2 = 6
Adjusted α:
α_adjusted = 0.05 / 6 ≈ 0.0083
Use α = 0.0083 (or approximately 0.008) for each of the 6 pairwise t-tests.
Part 4: Assumptions and Applications (5 problems)
Practice checking ANOVA assumptions and choosing appropriate tests.
Checking Equal Variance Assumption
Five groups have the following sample variances:
- Group 1: s₁² = 14.2
- Group 2: s₂² = 18.5
- Group 3: s₃² = 12.8
- Group 4: s₄² = 16.1
- Group 5: s₅² = 20.3
Is the equal variance assumption reasonable? Show your work.
Solution:
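Step 1: Identify max and min variances
Largest: s₅² = 20.3 (Group 5); smallest: s₃² = 12.8 (Group 3)
Step 2: Calculate ratio
Ratio = largest variance / smallest variance = 20.3 / 12.8 ≈ 1.59
Step 3: Apply rule of thumb
If the ratio is less than 2, the equal variance assumption is reasonable. Here 1.59 < 2.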
Conclusion: Yes, the equal variance assumption is reasonable. The ratio of 1.59 is less than 2, indicating that the variances are similar enough for ANOVA.
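The rule of thumb can be sketched in a few lines. The function names are illustrative, and the default threshold of 2 is the one used in this solution, not a universal standard.

```python
def variance_ratio(variances: list[float]) -> float:
    """Ratio of the largest to the smallest sample variance."""
    return max(variances) / min(variances)


def equal_variance_ok(variances: list[float], threshold: float = 2.0) -> bool:
    """Rule of thumb: equal variances are plausible if the ratio is below the threshold."""
    return variance_ratio(variances) < threshold


variances = [14.2, 18.5, 12.8, 16.1, 20.3]
ratio = variance_ratio(variances)
print(round(ratio, 2), equal_variance_ok(variances))  # 1.59 True
```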
Assumption Violations
For each scenario, identify which ANOVA assumption(s) might be violated and suggest an alternative approach:
a) Comparing stress levels of 30 students at three time points: beginning, middle, and end of semester
b) Comparing median home prices across 4 neighborhoods with highly right-skewed data (n = 15 per neighborhood)
c) Comparing test scores of students within the same classroom (groups assigned by classroom)
Solution:
a) Violation: Independence
- Same 30 students measured three times (repeated measures)
- Observations are NOT independent
- Alternative: Use Repeated Measures ANOVA or Friedman test (non-parametric)
b) Violation: Normality
- Highly right-skewed data violates normality assumption
- Small sample sizes (n = 15) mean the CLT may not apply
- Alternatives:
  - Use Kruskal-Wallis test (compares medians, no normality assumption)
  - Log-transform the data to reduce skewness, then use ANOVA
c) Violation: Independence
- Students within same classroom may not be independent (clustered data)
- Classroom effects could influence results
- Alternatives:
  - Use multilevel/hierarchical models to account for classroom nesting
  - Randomly sample students from each classroom (not all students)
  - Consider classroom as a random effect
Choosing the Right Test
Determine the most appropriate statistical test for each research question:
a) Do men and women differ in average commute time?
b) Is there a difference in mean satisfaction scores across 5 different smartphone brands?
c) Is political party preference independent of education level?
d) Do median salaries differ across 3 job sectors when data is extremely skewed?
Solution:
a) Two-sample t-test (or independent samples t-test)
- 2 groups (men vs women)
- Comparing means
- Quantitative variable (commute time)
b) One-way ANOVA
- More than 2 groups (5 brands)
- Comparing means
- One categorical factor (brand)
- Quantitative variable (satisfaction score)
c) Chi-square test of independence
- Two categorical variables (party preference and education level)
- Testing for association/relationship
- Not comparing means
d) Kruskal-Wallis test
- More than 2 groups (3 sectors)
- Comparing medians (not means due to extreme skew)
- Normality assumption violated
- Non-parametric alternative to ANOVA
Sample Size and Robustness
A researcher has three groups with sample sizes n₁ = 45, n₂ = 50, n₃ = 48. The data shows moderate skewness and slightly unequal variances (ratio = 2.3).
a) Can the researcher reasonably use ANOVA? Explain why or why not.
b) What features of this design make ANOVA robust to assumption violations?
Solution:
a) Yes, ANOVA is reasonable
- Large sample sizes: All groups have n ≥ 30, so Central Limit Theorem applies
- Moderate skewness: With large n, ANOVA is robust to moderate departures from normality
- Nearly equal sample sizes: Group sizes are similar (45, 50, 48), making ANOVA robust to unequal variances
- Variance ratio: While 2.3 is slightly above the rule-of-thumb threshold of 2, the violation is not severe, and the nearly equal sample sizes compensate
b) Features promoting robustness:
- Large samples (n ≥ 30): Sampling distribution of means is approximately normal regardless of population shape
- Balanced design: Similar sample sizes across groups reduce impact of variance inequality
- Total sample size (N = 143): Large overall N increases power and stability
Note: If concerned, could also:
- Check with Levene's test for formal variance equality test
- Use Welch's ANOVA as a sensitivity check
- Examine residual plots to assess normality
Complete ANOVA Analysis
A nutritionist studies the effect of three different diets on weight loss. She randomly assigns 36 participants to three diet groups (12 per group) and measures weight loss (pounds) after 8 weeks:
| Diet A | Diet B | Diet C |
|---|---|---|
| x̄₁ = 8.2 | x̄₂ = 12.5 | x̄₃ = 10.8 |
| s₁² = 4.1 | s₂² = 5.2 | s₃² = 4.8 |
ANOVA results: F(2, 33) = 8.45, p = 0.001
Tukey's HSD = 2.8 pounds
a) Check the equal variance assumption
b) State the conclusion from ANOVA (α = 0.05)
c) Use Tukey's HSD to determine which diets differ
d) Write a complete interpretation of the results
Solution:
a) Check equal variance:
Ratio = largest variance / smallest variance = 5.2 / 4.1 ≈ 1.27 < 2
Equal variance assumption is reasonable.
b) ANOVA conclusion:
- p = 0.001 < 0.05 → Reject H₀
- At the 0.05 significance level, there is sufficient evidence to conclude that at least one diet produces a different mean weight loss than the others
c) Tukey's HSD comparisons:
| Comparison | \|Difference\| | vs HSD = 2.8 | Significant? |
|---|---|---|---|
| Diet A vs B | \|8.2 - 12.5\| = 4.3 | 4.3 > 2.8 | Yes |
| Diet A vs C | \|8.2 - 10.8\| = 2.6 | 2.6 < 2.8 | No |
| Diet B vs C | \|12.5 - 10.8\| = 1.7 | 1.7 < 2.8 | No |
d) Complete interpretation:
A one-way ANOVA was conducted to compare the effectiveness of three diets on weight loss over an 8-week period. Thirty-six participants were randomly assigned to one of three diet groups (n = 12 per group). The equal variance assumption was met (variance ratio = 1.27).
The ANOVA revealed a statistically significant difference in mean weight loss among the three diets, F(2, 33) = 8.45, p = 0.001. Post-hoc comparisons using Tukey's HSD indicated that Diet B (M = 12.5 lbs) produced significantly more weight loss than Diet A (M = 8.2 lbs), with a difference of 4.3 pounds. However, Diet C (M = 10.8 lbs) was not significantly different from either Diet A or Diet B.
Practical conclusion: Diet B appears to be the most effective for weight loss, producing an average of 4.3 more pounds of weight loss than Diet A over 8 weeks. Diet C shows intermediate results that are not statistically distinguishable from the other two diets.