Lesson 2: One-Way ANOVA Procedure
Step-by-Step Calculations and the ANOVA Table
What is One-Way ANOVA?
One-way ANOVA is used when we have:
- One categorical independent variable (called a factor) with 3+ levels (groups)
- One quantitative dependent variable (the measurement we're comparing)
Example
Factor: Teaching method (Traditional, Flipped, Project-based)
Dependent variable: Final exam score
This is "one-way" because we're looking at the effect of ONE factor (teaching method).
Note: There's also two-way ANOVA (two factors) and other variations, but we'll focus on one-way ANOVA in this course.
Understanding Sum of Squares
ANOVA breaks down the total variation in the data into two components:
1. Total Sum of Squares (SST)
Measures the total variation of all observations from the grand mean:
SST = ΣΣ(xij - x̄)²
Where:
- xij = individual observation (i-th observation in j-th group)
- x̄ = grand mean (mean of ALL observations)
2. Between-Group Sum of Squares (SSB)
Measures the variation between group means:
SSB = Σnj(x̄j - x̄)²
Where:
- nj = sample size of group j
- x̄j = mean of group j
- x̄ = grand mean
3. Within-Group Sum of Squares (SSW)
Measures the variation within each group (error/residual):
SSW = ΣΣ(xij - x̄j)²
Where:
- xij = individual observation in group j
- x̄j = mean of group j
Fundamental Relationship
SST = SSB + SSW
Total variation = Variation between groups + Variation within groups
This is the key partitioning that makes ANOVA work!
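The partition can be checked numerically. A minimal Python sketch with illustrative data (three small groups; the numbers themselves are arbitrary):

```python
# Sketch: numerically verify the partition SST = SSB + SSW on illustrative data.
groups = [[78, 82, 76], [85, 88, 84], [92, 95, 90]]

all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / len(all_obs)

# SST: every observation's squared deviation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_obs)
# SSB: group size times each group mean's squared deviation from the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# SSW: each observation's squared deviation from its own group mean
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

print(abs(sst - (ssb + ssw)) < 1e-9)  # True: the partition holds
```

Because the partition is an algebraic identity, this check succeeds for any data you substitute in.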
Degrees of Freedom
Each sum of squares has associated degrees of freedom (df):
- dfbetween = k - 1, where k = number of groups
- dfwithin = N - k, where N = total sample size (all observations)
- dftotal = N - 1
Relationship
(N - 1) = (k - 1) + (N - k)
Example
If we have 4 groups with 10 observations each:
- k = 4 groups
- N = 40 total observations
- dfbetween = 4 - 1 = 3
- dfwithin = 40 - 4 = 36
- dftotal = 40 - 1 = 39
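The bookkeeping above is easy to wrap in a small helper. A sketch (the function name and group sizes are hypothetical):

```python
# Sketch: degrees of freedom for one-way ANOVA (group sizes here are hypothetical).
def anova_df(group_sizes):
    k = len(group_sizes)           # number of groups
    n = sum(group_sizes)           # total sample size
    return k - 1, n - k, n - 1     # (between, within, total)

df_between, df_within, df_total = anova_df([10, 10, 10, 10])
print(df_between, df_within, df_total)  # 3 36 39
assert df_between + df_within == df_total  # the df partition always holds
```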
Mean Squares
Mean squares are the average squared deviations. We calculate them by dividing each sum of squares by its degrees of freedom.
Mean Square Between (MSB)
MSB = SSB / dfbetween = SSB / (k - 1)
This estimates the variance between groups.
Mean Square Within (MSW)
MSW = SSW / dfwithin = SSW / (N - k)
This estimates the variance within groups (pooled error variance).
Why Mean Squares?
We divide by degrees of freedom to get an average measure of variation that's comparable between different sample sizes. Mean squares are estimates of variance.
The F-Statistic
Finally, we calculate the F-statistic by comparing the two mean squares:
F = MSB / MSW
Interpretation
- If MSB >> MSW (F is large): Between-group differences are large compared to within-group variation → groups likely differ
- If MSB ≈ MSW (F ≈ 1): Between-group differences are similar to within-group variation → groups likely don't differ
The F-statistic follows an F-distribution with:
- df₁ = dfbetween (numerator degrees of freedom)
- df₂ = dfwithin (denominator degrees of freedom)
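Tail probabilities of the F-distribution normally come from a table or a stats library (e.g. `scipy.stats.f.sf`). For the special case df₁ = 2, though, the upper-tail probability has the closed form (1 + 2x/df₂)^(−df₂/2), which a short pure-Python sketch can exploit:

```python
# Sketch: upper-tail F probability for the special case df1 = 2, where the
# survival function has the closed form P(F > x) = (1 + 2x/df2) ** (-df2/2).
# For general df1, use a stats library routine (e.g. scipy.stats.f.sf) instead.
def f_upper_tail_df1_2(x, df2):
    return (1.0 + 2.0 * x / df2) ** (-df2 / 2.0)

# The tabled critical value F(2, 12) ≈ 3.89 should leave about 5% in the tail:
print(round(f_upper_tail_df1_2(3.89, 12), 3))  # 0.05
```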
The ANOVA Table
We organize all ANOVA calculations in a standard table format:
| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F-statistic |
|---|---|---|---|---|
| Between Groups | SSB | k - 1 | MSB = SSB/(k-1) | F = MSB/MSW |
| Within Groups | SSW | N - k | MSW = SSW/(N-k) | — |
| Total | SST | N - 1 | — | — |
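The whole table can be generated from raw data. A sketch in pure Python (the function name and dictionary layout are illustrative, with keys mirroring the table columns above):

```python
# Sketch: build a one-way ANOVA table from raw group data (pure Python,
# no external libraries; keys mirror the standard table columns).
def anova_table(groups):
    k = len(groups)                                # number of groups
    n = sum(len(g) for g in groups)                # total sample size
    grand = sum(x for g in groups for x in g) / n  # grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    msb, msw = ssb / (k - 1), ssw / (n - k)
    return {
        "Between": {"SS": ssb, "df": k - 1, "MS": msb, "F": msb / msw},
        "Within":  {"SS": ssw, "df": n - k, "MS": msw},
        "Total":   {"SS": ssb + ssw, "df": n - 1},
    }
```

Passing a list of lists (one inner list per group) returns the three table rows as dictionaries.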
Complete Step-by-Step Example
Scenario
A professor wants to compare the effectiveness of three study methods on exam performance. She randomly assigns 15 students to three groups (5 students per group) and records their exam scores:
| Method 1 (Traditional) | Method 2 (Flashcards) | Method 3 (Practice Tests) |
|---|---|---|
| 78 | 85 | 92 |
| 82 | 88 | 95 |
| 76 | 84 | 90 |
| 80 | 86 | 93 |
| 84 | 87 | 90 |
Test at α = 0.05: Do the study methods produce different mean exam scores?
Step 1: State the Hypotheses
- H₀: μ₁ = μ₂ = μ₃ (all three methods have equal mean scores)
- Hₐ: At least one method has a different mean score
Step 2: Calculate Group Means and Grand Mean
- Method 1: x̄₁ = (78 + 82 + 76 + 80 + 84) / 5 = 400 / 5 = 80
- Method 2: x̄₂ = (85 + 88 + 84 + 86 + 87) / 5 = 430 / 5 = 86
- Method 3: x̄₃ = (92 + 95 + 90 + 93 + 90) / 5 = 460 / 5 = 92
- Grand mean: x̄ = (400 + 430 + 460) / 15 = 1290 / 15 = 86
Step 3: Calculate Sum of Squares Between (SSB)
SSB = Σnj(x̄j - x̄)²
SSB = 5(80 - 86)² + 5(86 - 86)² + 5(92 - 86)²
SSB = 5(36) + 5(0) + 5(36) = 180 + 0 + 180 = 360
Step 4: Calculate Sum of Squares Within (SSW)
SSW = Σ(xij - x̄j)² for all groups
Method 1 (x̄₁ = 80): (78-80)² + (82-80)² + (76-80)² + (80-80)² + (84-80)² = 4 + 4 + 16 + 0 + 16 = 40
Method 2 (x̄₂ = 86): (85-86)² + (88-86)² + (84-86)² + (86-86)² + (87-86)² = 1 + 4 + 4 + 0 + 1 = 10
Method 3 (x̄₃ = 92): (92-92)² + (95-92)² + (90-92)² + (93-92)² + (90-92)² = 0 + 9 + 4 + 1 + 4 = 18
SSW = 40 + 10 + 18 = 68
Step 5: Calculate Total Sum of Squares (SST)
Computing Σ(xij - x̄)² directly over all 15 observations gives SST = 428.
We can verify the partition: SST = SSB + SSW = 360 + 68 = 428
Step 6: Calculate Degrees of Freedom
- dfbetween = k - 1 = 3 - 1 = 2
- dfwithin = N - k = 15 - 3 = 12
- dftotal = N - 1 = 15 - 1 = 14
Step 7: Calculate Mean Squares
- MSB = SSB / dfbetween = 360 / 2 = 180
- MSW = SSW / dfwithin = 68 / 12 ≈ 5.67
Step 8: Calculate F-Statistic
F = MSB / MSW = 180 / 5.67 ≈ 31.75
Step 9: Complete the ANOVA Table
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between Groups | 360 | 2 | 180 | 31.75 |
| Within Groups | 68 | 12 | 5.67 | — |
| Total | 428 | 14 | — | — |
Step 10: Make a Decision
Using an F-table with df₁ = 2, df₂ = 12, and α = 0.05:
Critical value: Fcritical = 3.89
Since F = 31.75 > 3.89, we reject H₀.
Alternatively, using technology: p-value < 0.001. Since p-value < 0.05, we reject H₀.
Step 11: State the Conclusion
Conclusion: At the 0.05 significance level, there is sufficient evidence to conclude that at least one study method produces a different mean exam score.
Note: We don't yet know WHICH methods differ. We'll learn about post-hoc tests in Lesson 3!
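The whole worked example can be reproduced in a few lines of Python. A sketch using unrounded arithmetic throughout:

```python
# Sketch: reproduce the worked example's ANOVA quantities in pure Python.
groups = {
    "Traditional":    [78, 82, 76, 80, 84],
    "Flashcards":     [85, 88, 84, 86, 87],
    "Practice Tests": [92, 95, 90, 93, 90],
}

k = len(groups)
n = sum(len(g) for g in groups.values())
grand = sum(x for g in groups.values() for x in g) / n   # 86.0

ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)

msb, msw = ssb / (k - 1), ssw / (n - k)
print(ssb, ssw, ssb + ssw)            # 360.0 68.0 428.0
print(round(msb, 2), round(msw, 2))   # 180.0 5.67
# Unrounded F is 31.76; the table's 31.75 comes from dividing by the rounded MSW.
print(round(msb / msw, 2))            # 31.76
```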
Check Your Understanding
Question 1
If you have 4 groups with sample sizes n₁ = 8, n₂ = 10, n₃ = 12, n₄ = 10, what are the degrees of freedom?
- k = 4 groups
- N = 8 + 10 + 12 + 10 = 40 total observations
- dfbetween = k - 1 = 4 - 1 = 3
- dfwithin = N - k = 40 - 4 = 36
- dftotal = N - 1 = 40 - 1 = 39
Question 2
Given SSB = 240, SSW = 160, dfbetween = 3, dfwithin = 36, calculate the F-statistic.
Step 1: Calculate MSB
MSB = SSB / dfbetween = 240 / 3 = 80
Step 2: Calculate MSW
MSW = SSW / dfwithin = 160 / 36 = 4.44
Step 3: Calculate F
F = MSB / MSW = 80 / 4.44 = 18.02
This is a large F-value, suggesting strong evidence of differences among groups!
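The same arithmetic as a short Python sketch (values taken from the question; note that dividing by the unrounded MSW gives exactly 18.0, while the 18.02 above comes from first rounding MSW to 4.44):

```python
# Sketch: F-statistic for Question 2 from the given sums of squares and df.
ssb, ssw = 240, 160
df_between, df_within = 3, 36

msb = ssb / df_between   # 80.0
msw = ssw / df_within    # ≈ 4.44 after rounding
f_stat = msb / msw

# Exact F is 18.0; the 18.02 in the worked answer divides by the rounded MSW.
print(round(msb, 2), round(msw, 2), round(f_stat, 2))  # 80.0 4.44 18.0
```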
Question 3
Why do we divide sum of squares by degrees of freedom to get mean squares?
Answer: We divide by degrees of freedom to get an average measure of variation that accounts for sample size.
This makes mean squares (which are variance estimates) comparable even when groups have different sample sizes. Without this adjustment, larger samples would always have larger sums of squares simply due to having more observations, even if the actual variability is the same.
Lesson Summary
- One-way ANOVA: One categorical factor, one quantitative dependent variable
- Sum of Squares:
- SST = total variation
- SSB = between-group variation
- SSW = within-group variation (error)
- SST = SSB + SSW
- Degrees of Freedom: dfbetween = k-1, dfwithin = N-k
- Mean Squares: MS = SS / df (average variation)
- F-statistic: F = MSB / MSW
- ANOVA table organizes all calculations systematically
- Large F-values (relative to critical value) lead to rejecting H₀