Lesson 4: Assumptions and Conditions for ANOVA

Checking Requirements and Handling Violations

Why Assumptions Matter

Like all statistical tests, ANOVA has assumptions that must be reasonably satisfied for the results to be valid and trustworthy.

Important

If assumptions are seriously violated, the F-statistic may not follow an F-distribution, leading to:

Incorrect p-values
Wrong conclusions
Increased Type I or Type II error rates

The Three Key Assumptions

Independence of observations
Normality of populations
Equal Variances (Homogeneity of Variance)

Let's examine each in detail!

Assumption 1: Independence

Requirement

Observations must be independent within and between groups.

This means:

The value of one observation doesn't influence another
Each observation comes from a different, randomly selected subject
No repeated measures (same subject measured multiple times)

How to Check Independence

Study design: Were subjects randomly assigned to groups?
Random sampling: Were subjects randomly selected from populations?
No clustering: Are observations collected independently (not in batches or groups)?
One observation per subject: Is each subject measured only once?

Independence is Met When:

Experimental design with random assignment to groups
Random sample from each population
Each subject appears in only one group
Measurements collected independently

Independence is Violated When:

Same subjects measured multiple times (use repeated measures ANOVA instead)
Matched pairs or dependent samples
Clustered data (students within classrooms, patients within hospitals)
Time series data with autocorrelation

If violated: Results are invalid. Use a method designed for dependent data instead (e.g., repeated measures ANOVA, mixed models).

Assumption 2: Normality

Requirement

Each population is normally distributed.

More precisely: The residuals (deviations from group means) should be normally distributed.

How to Check Normality

Visual Methods:
- Histograms: Check if data appears bell-shaped
- Normal probability plots (Q-Q plots): Points should fall roughly on a straight line
- Boxplots: Check for symmetry and outliers
Formal Tests:
- Shapiro-Wilk test: Tests whether the data come from a normal distribution
- H₀: Data is normally distributed
- If p > 0.05: there is no significant evidence against normality, so the assumption is reasonable
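
The Q-Q plot check can be sketched with the Python standard library alone. This is an illustrative sketch, not a prescribed procedure: the function names (`residuals`, `qq_points`, `qq_correlation`) and the plotting positions (i - 0.5)/n are assumptions made for this example. It computes residuals from the group means, pairs the sorted residuals with theoretical normal quantiles, and reports their correlation. Values close to 1 mean the points lie close to a straight line.

```python
from statistics import NormalDist, mean


def residuals(groups):
    """Return each observation minus its own group mean, pooled into one list."""
    pooled = []
    for values in groups:
        group_mean = mean(values)
        pooled.extend(v - group_mean for v in values)
    return pooled


def qq_points(values):
    """Return (theoretical normal quantile, observed value) pairs for a Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]


def qq_correlation(values):
    """Correlation between observed values and normal quantiles.

    Assumes the values are not all identical (otherwise the denominator is 0).
    """
    points = qq_points(values)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data for three groups (made up for illustration).
groups = [
    [4.1, 5.0, 5.6, 6.2, 7.0],
    [6.8, 7.5, 8.1, 8.9, 9.6],
    [5.2, 5.9, 6.4, 7.1, 7.7],
]
print(round(qq_correlation(residuals(groups)), 3))
```

For the formal test, a library routine such as `scipy.stats.shapiro` would normally be applied to the same residuals.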

Good News: ANOVA is Robust to Moderate Violations!

ANOVA is fairly robust to violations of normality, especially when:

Large sample sizes: n ≥ 30 per group (Central Limit Theorem applies)
Equal sample sizes: Groups have similar n values
Mild departures: Data is slightly skewed but not severely non-normal

Rule of thumb: With n ≥ 30 per group, moderate departures from normality are acceptable.

What to Do if Normality is Violated

Check for outliers: Investigate extreme values, and remove them only if they are errors (e.g., data-entry or measurement mistakes)
Transform the data: Log, square root, or other transformations may help
Increase sample size: Larger samples make ANOVA more robust
Use non-parametric alternative: Kruskal-Wallis test (doesn't assume normality)

Assumption 3: Equal Variances (Homogeneity of Variance)

Requirement

All populations have equal variances:

σ₁² = σ₂² = σ₃² = ... = σₖ²

This is also called homoscedasticity or homogeneity of variance.

How to Check Equal Variances

1. Rule of Thumb (Quick Check)

Calculate the sample variances s₁², s₂², ..., sₖ² for each group.

Rule: If the ratio of the largest to the smallest sample variance is less than 2, the equal variance assumption is reasonable.

max(s²) / min(s²) < 2 → Acceptable

2. Visual Method: Side-by-Side Boxplots

Create boxplots for each group
Check if boxes (IQR) are similar in height
Similar spread suggests equal variances

3. Formal Test: Levene's Test

Tests H₀: σ₁² = σ₂² = ... = σₖ²
If p-value > 0.05: there is no significant evidence that the variances differ, so the equal variance assumption is reasonable
Available in most statistical software
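
To show what such software computes, here is a minimal standard-library sketch of the Levene test statistic W. The function name `levene_statistic` and its `center` parameter are assumptions of this example. W is built from the absolute deviations of each observation from its group center, and it is referred to an F-distribution with (k - 1, N - k) degrees of freedom. The sketch stops at the statistic because the standard library has no F-distribution; a routine such as `scipy.stats.levene` also returns the p-value.

```python
from statistics import mean, median


def levene_statistic(groups, center=median):
    """Return (W, df_between, df_within) for Levene's test.

    center=median gives the Brown-Forsythe variant; center=mean gives the
    original Levene form. Assumes at least two groups with some spread.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of each observation from its own group's center.
    deviations = []
    for g in groups:
        c = center(g)
        deviations.append([abs(y - c) for y in g])

    group_means = [mean(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total

    # One-way ANOVA decomposition applied to the absolute deviations.
    between = sum(len(z) * (zm - grand_mean) ** 2
                  for z, zm in zip(deviations, group_means))
    within = sum((v - zm) ** 2
                 for z, zm in zip(deviations, group_means) for v in z)

    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical data: the second group is twice as spread out as the first.
w, df1, df2 = levene_statistic([[1, 2, 3], [2, 4, 6]])
print(round(w, 3), df1, df2)  # 0.8 1 4
```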

ANOVA is Fairly Robust When:

Equal sample sizes: Groups have same or very similar n values
Mild violations: Variance ratio < 3

With equal sample sizes across groups, ANOVA can tolerate moderate violations of equal variance.

What to Do if Equal Variance is Violated

Check for outliers: May be inflating variance in some groups
Transform the data: Log transformation often stabilizes variance
Use Welch's ANOVA: Modified version that doesn't assume equal variances
Use non-parametric test: Kruskal-Wallis test
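
For readers curious how Welch's ANOVA differs from the standard F-test, the following is a minimal sketch using only the Python standard library. The function name `welch_anova` is an assumption of this example. Each group is weighted by n / s², so groups with larger variance contribute less, and the denominator degrees of freedom are adjusted downward. The p-value would come from an F-distribution with (df1, df2) degrees of freedom, which needs a statistics library.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA.

    Assumes at least two groups, each with two or more observations and
    nonzero sample variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]

    # Weight each group by n_i / s_i^2, so noisier groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight

    between = sum(w * (m - weighted_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_weight) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))

    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# Hypothetical data: the middle group has four times the variance of the others.
f_stat, df1, df2 = welch_anova([[1, 2, 3], [2, 4, 6], [5, 6, 7]])
print(round(f_stat, 3), df1, round(df2, 3))  # 10.205 2 3.789
```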

Example: Checking Equal Variance

Group	Sample Variance (s²)
Group 1	12.5
Group 2	15.2
Group 3	10.8
Group 4	13.1

Check: Ratio = 15.2 / 10.8 = 1.41 < 2

Conclusion: Equal variance assumption is reasonable.
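
The same check can be written as a short Python sketch. The function names `variance_ratio` and `equal_variance_ok` are made up for this example, and the threshold of 2 is the rule of thumb stated earlier in this lesson.

```python
from statistics import variance


def variance_ratio(sample_variances):
    """Ratio of the largest to the smallest sample variance."""
    return max(sample_variances) / min(sample_variances)


def equal_variance_ok(sample_variances, threshold=2.0):
    """Apply the rule of thumb: a ratio below the threshold is acceptable."""
    return variance_ratio(sample_variances) < threshold


# Sample variances from the table above.
table_variances = [12.5, 15.2, 10.8, 13.1]
print(round(variance_ratio(table_variances), 2))  # 1.41
print(equal_variance_ok(table_variances))         # True

# Starting from raw data instead (hypothetical values, for illustration only).
raw_groups = [[10, 12, 15, 11], [9, 14, 13, 16], [12, 12, 15, 13]]
print(equal_variance_ok([variance(g) for g in raw_groups]))
```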

When to Use ANOVA (Decision Guide)

Use ANOVA When:

Comparing 3 or more independent groups
Dependent variable is quantitative (continuous)
Independent variable is categorical
Independence assumption is met
Normality is reasonable (or large samples)
Equal variances are reasonable (or sample sizes are equal)

Do NOT Use ANOVA When:

Situation	Use Instead
Only 2 groups	Two-sample t-test
Dependent variable is categorical	Chi-square test
Repeated measures (same subjects)	Repeated measures ANOVA
Severe non-normality + small samples	Kruskal-Wallis test
Severely unequal variances	Welch's ANOVA
Two or more factors	Two-way ANOVA or factorial ANOVA
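
The decision guide can be expressed as a small lookup function. This is only a sketch: the function name `choose_test`, its parameters, and especially the order in which the situations are checked are assumptions of this example, since the table lists the situations without ranking them. The paired t-test branch for two measurements on the same subjects follows the pre-test/post-test practice question later in this lesson.

```python
def choose_test(n_groups, outcome_is_quantitative=True, same_subjects=False,
                n_factors=1, severe_non_normality=False, small_samples=False,
                severely_unequal_variances=False):
    """Map a study description to a test named in the decision guide.

    The order of the checks below is an assumption of this sketch.
    """
    if not outcome_is_quantitative:
        return "Chi-square test"
    if same_subjects:
        # Two conditions on the same subjects is the paired case.
        return "Paired t-test" if n_groups == 2 else "Repeated measures ANOVA"
    if n_factors >= 2:
        return "Two-way ANOVA or factorial ANOVA"
    if n_groups == 2:
        return "Two-sample t-test"
    if severe_non_normality and small_samples:
        return "Kruskal-Wallis test"
    if severely_unequal_variances:
        return "Welch's ANOVA"
    return "One-way ANOVA"


print(choose_test(4))                                   # One-way ANOVA
print(choose_test(3, severely_unequal_variances=True))  # Welch's ANOVA
```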

Non-Parametric Alternative: Kruskal-Wallis Test

When to Use Kruskal-Wallis

The Kruskal-Wallis test is the non-parametric alternative to one-way ANOVA. Use it when:

Normality assumption is severely violated
Small sample sizes with non-normal data
Ordinal data (ranked data)
Extreme outliers that can't be removed

How Kruskal-Wallis Works

Rank all observations from smallest to largest (ignoring groups)
Compare the average ranks across groups
If groups have similar distributions, average ranks should be similar
The test statistic H is compared to a chi-square distribution with k − 1 degrees of freedom instead of an F-distribution
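
The steps above can be sketched in standard-library Python. The function names `average_ranks` and `kruskal_wallis_h` are assumptions of this example. Tied values share their average rank, and the usual tie correction is applied to H. The p-value would come from a chi-square distribution with k - 1 degrees of freedom; a routine such as `scipy.stats.kruskal` returns both the statistic and the p-value.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom), with the usual correction for ties.

    Assumes the pooled observations are not all identical.
    """
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)  # ranking ignores group membership

    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tied sets.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1


# Hypothetical data with completely separated groups.
h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2), df)  # 7.2 2
```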

Advantages:

No normality assumption required
Robust to outliers
Works with ordinal data

Disadvantages:

Less powerful than ANOVA when normality holds
Compares rank distributions rather than means (it is a test of medians only when the groups' distributions have the same shape)
More difficult to interpret

Check Your Understanding

Question 1

A researcher collects data from 4 groups with the following sample variances: s₁² = 8, s₂² = 12, s₃² = 15, s₄² = 18. Is the equal variance assumption reasonable?

Step 1: Find max and min variances

Max = 18
Min = 8

Step 2: Calculate ratio

Ratio = 18 / 8 = 2.25

Step 3: Apply rule of thumb

2.25 > 2, so the equal variance assumption is questionable.

Recommendation: Check with Levene's test or consider using Welch's ANOVA instead. If sample sizes are equal across groups, regular ANOVA may still be acceptable.

Question 2

You want to compare customer satisfaction ratings (1-5 scale) across three store locations. You have 15 customers from each location. What assumptions do you need to check?

Three assumptions to check:

Independence:
- Were customers randomly selected?
- Is each customer surveyed only once?
- Are responses independent?
Normality:
- Check histograms or Q-Q plots for each location
- With n = 15 per group (below the n ≥ 30 rule of thumb), only mild deviations are acceptable
- Note: Ordinal data (1-5 scale) may violate normality; consider Kruskal-Wallis
Equal Variances:
- Calculate sample variance for each location
- Check if max/min ratio < 2
- Equal sample sizes (n = 15 each) make ANOVA more robust to unequal variances

Question 3

Which test should you use in each scenario?

a) Comparing mean test scores across 5 different study strategies (random assignment, n = 40 per group, data appears normal)

b) Comparing median income across 3 neighborhoods (n = 12 per group, highly skewed data)

c) Comparing pre-test and post-test scores for the same 50 students

a) One-way ANOVA

3+ groups (5 strategies)
Independent groups (random assignment)
Large samples (n = 40)
Normal data
All assumptions met

b) Kruskal-Wallis test

3 groups
Highly skewed data (normality violated)
Small samples (n = 12)
Non-parametric test is appropriate

c) Paired t-test

Same subjects measured twice (not independent)
Only 2 conditions (pre and post)
Dependent/paired samples
One-way ANOVA is NOT appropriate here

Lesson Summary

Three Key Assumptions:

Independence: Observations are independent (check study design)
Normality: Populations are normally distributed (check with plots/tests, robust with large n)
Equal Variances: σ₁² = σ₂² = ... = σₖ² (check ratio < 2, robust with equal n)

Key Takeaways:

ANOVA is robust to moderate violations when sample sizes are large and equal
Independence is most critical—if violated, results are invalid
Use Kruskal-Wallis test when normality is severely violated
Use Welch's ANOVA when variances are unequal
Always check assumptions BEFORE interpreting results
Choose appropriate test based on data characteristics and assumption violations
