Safaa Dabagh

Lesson 4: Assumptions and Conditions for ANOVA

Checking Requirements and Handling Violations

Why Assumptions Matter

Like all statistical tests, ANOVA has assumptions that must be reasonably satisfied for the results to be valid and trustworthy.

Important

If assumptions are seriously violated, the F-statistic may not follow an F-distribution under H₀, leading to:

  • Incorrect p-values
  • Wrong conclusions
  • Increased Type I or Type II error rates

The Three Key Assumptions

  1. Independence of observations
  2. Normality of populations
  3. Equal Variances (Homogeneity of Variance)

Let's examine each in detail!

Assumption 1: Independence

Requirement

Observations must be independent within and between groups.

This means:

  • The value of one observation doesn't influence another
  • Each observation comes from a different, randomly selected subject
  • No repeated measures (same subject measured multiple times)

How to Check Independence

  • Study design: Were subjects randomly assigned to groups?
  • Random sampling: Were subjects randomly selected from populations?
  • No clustering: Are observations collected independently (not in batches or groups)?
  • One observation per subject: Is each subject measured only once?

Independence is Met When:

  • Experimental design with random assignment to groups
  • Random sample from each population
  • Each subject appears in only one group
  • Measurements collected independently

Independence is Violated When:

  • Same subjects measured multiple times (use repeated measures ANOVA instead)
  • Matched pairs or dependent samples
  • Clustered data (students within classrooms, patients within hospitals)
  • Time series data with autocorrelation

If violated: ANOVA's p-values and conclusions are not valid. Use a method designed for dependent data instead (e.g., repeated measures ANOVA, mixed models).

Assumption 2: Normality

Requirement

Each population is normally distributed.

More precisely: The residuals (deviations from group means) should be normally distributed.
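To make "residuals" concrete, here is a minimal Python sketch (standard library only) that subtracts each group's mean from that group's observations and pools the results. The function name residuals and the three groups of scores are hypothetical, chosen only for illustration. These pooled residuals are what the normality checks below would be applied to.

```python
from statistics import mean

def residuals(groups):
    """Subtract each group's mean from its own observations and pool the results."""
    pooled = []
    for values in groups.values():
        m = mean(values)
        pooled.extend(x - m for x in values)
    return pooled

# Hypothetical test scores for three groups
groups = {
    "A": [78, 82, 85, 80],
    "B": [88, 90, 84, 86],
    "C": [70, 75, 72, 79],
}
res = residuals(groups)
print(res)
print(sum(res))  # residuals always sum to 0 (up to rounding)
```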

How to Check Normality

  1. Visual Methods:
    • Histograms: Check whether the data appear bell-shaped
    • Normal probability plots (Q-Q plots): Points should fall roughly on a straight line
    • Boxplots: Check for symmetry and outliers
  2. Formal Tests:
    • Shapiro-Wilk test: Tests whether the data come from a normal distribution
      • H₀: The data are normally distributed
      • If p > 0.05: There is no significant evidence against normality, so the assumption is reasonable

Good News: ANOVA is Robust to Moderate Violations!

ANOVA is fairly robust to violations of normality, especially when:

  • Large sample sizes: n ≥ 30 per group (by the Central Limit Theorem, the group means are then approximately normal)
  • Equal sample sizes: Groups have similar n values
  • Mild departures: Data is slightly skewed but not severely non-normal

Rule of thumb: With n ≥ 30 per group, moderate departures from normality are acceptable.
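Larger samples help because ANOVA compares group means, and the Central Limit Theorem says those means become approximately normal as n grows even when the population is skewed. The simulation below is a sketch of that idea, assuming an exponential population (skewness 2) as a stand-in for skewed data; the function names, seed, and number of repetitions are arbitrary choices. The skewness of the simulated sample means should shrink toward 0 as n increases (theory gives 2/√n for this population).

```python
import random
from statistics import mean, pstdev

def skewness(xs):
    """Moment-based skewness: the average cubed z-score (0 for symmetric data)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def skew_of_sample_means(n, reps=5000, seed=1):
    """Skewness of the means of `reps` samples, each of n draws from an exponential population."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    return skewness(means)

for n in (2, 10, 30):
    print(f"n = {n:2d}: skewness of sample means = {skew_of_sample_means(n):.2f}")
```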

What to Do if Normality is Violated

  1. Check for outliers: Investigate extreme values and remove them only with good reason (e.g., a data-entry error)
  2. Transform the data: Log, square root, or other transformations may help
  3. Increase sample size: Larger samples make ANOVA more robust
  4. Use non-parametric alternative: Kruskal-Wallis test (doesn't assume normality)
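The effect of a log transformation (option 2) can be checked numerically. This sketch assumes hypothetical right-skewed, strictly positive data (lognormal draws with an arbitrary seed, loosely like incomes) and compares the moment-based skewness before and after taking logs. Note that the log transformation only applies to positive values.

```python
import math
import random
from statistics import mean, pstdev

def skewness(xs):
    """Moment-based skewness: the average cubed z-score (0 for symmetric data)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(42)
# Hypothetical right-skewed, strictly positive data
raw = [rng.lognormvariate(10, 0.8) for _ in range(200)]
logged = [math.log(x) for x in raw]  # log requires x > 0

print(f"skewness before log: {skewness(raw):.2f}")
print(f"skewness after log:  {skewness(logged):.2f}")
```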

Assumption 3: Equal Variances (Homogeneity of Variance)

Requirement

All populations have equal variances:

σ₁² = σ₂² = σ₃² = ... = σₖ²

This is also called homoscedasticity or homogeneity of variance.

How to Check Equal Variances

1. Rule of Thumb (Quick Check)

Calculate the sample variances s₁², s₂², ..., sₖ² for each group.

Rule: If the ratio of the largest to the smallest sample variance is less than 2, the equal variance assumption is reasonable.

max(s²) / min(s²) < 2 → Acceptable
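The rule of thumb is easy to automate. In this sketch, statistics.variance computes the sample variance s² (with the n − 1 denominator) from raw data. The function names, the threshold parameter, and the raw groups are hypothetical; the two variance lists are the ones from this lesson's worked example and Question 1.

```python
from statistics import variance

def variance_ratio(variances):
    """Largest sample variance divided by the smallest."""
    return max(variances) / min(variances)

def equal_variance_ok(variances, threshold=2.0):
    """Rule of thumb: a max/min ratio below the threshold means equal variance is reasonable."""
    return variance_ratio(variances) < threshold

# From raw data (hypothetical groups): s² uses the n - 1 denominator
groups = [[12, 15, 11, 14], [10, 18, 13, 16], [14, 13, 12, 15]]
s2 = [variance(g) for g in groups]
print([round(v, 2) for v in s2], round(variance_ratio(s2), 2))

# Variances given directly (worked example and Question 1)
example = [12.5, 15.2, 10.8, 13.1]
question1 = [8, 12, 15, 18]
print(round(variance_ratio(example), 2), equal_variance_ok(example))      # 1.41 True
print(round(variance_ratio(question1), 2), equal_variance_ok(question1))  # 2.25 False
```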

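2. Visual Method: Side-by-Side Boxplots

Plot one boxplot per group on a common scale and compare the spreads (box lengths and whiskers). Boxes of clearly different lengths suggest unequal variances.

3. Formal Test: Levene's Test

H₀: All population variances are equal. If p > 0.05, the equal variance assumption is reasonable.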
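Under the hood, Levene's test is a one-way ANOVA run on the absolute deviations |y − ȳᵢ| of each observation from its group mean, so groups with larger spread produce larger deviations. The sketch below computes only the test statistic W (the name levene_statistic and the data are hypothetical); the p-value would come from an F-distribution with k − 1 and N − k degrees of freedom. In SciPy, scipy.stats.levene(*groups, center='mean') corresponds to this mean-centred version, while its default center='median' is the Brown-Forsythe variant.

```python
from statistics import mean

def levene_statistic(groups):
    """Levene's W (mean-centred): a one-way ANOVA F statistic on |y - group mean|."""
    # Step 1: absolute deviation of each observation from its own group mean
    z = []
    for g in groups:
        m = mean(g)
        z.append([abs(y - m) for y in g])
    k = len(z)
    n_total = sum(len(d) for d in z)
    grand = sum(sum(d) for d in z) / n_total
    # Step 2: between- and within-group sums of squares of those deviations
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in z)
    ss_within = 0.0
    for d in z:
        md = mean(d)
        ss_within += sum((x - md) ** 2 for x in d)
    # Step 3: ratio of mean squares (df1 = k - 1, df2 = N - k)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical groups that are increasingly spread out
groups = [[1, 2, 3], [2, 4, 6], [0, 4, 8]]
print(round(levene_statistic(groups), 4))  # 1.3333
```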

ANOVA is Fairly Robust When:

  • Equal sample sizes: Groups have same or very similar n values
  • Mild violations: Variance ratio < 3

With equal sample sizes across groups, ANOVA can tolerate moderate violations of equal variance.

What to Do if Equal Variance is Violated

  1. Check for outliers: They may be inflating the variance in some groups
  2. Transform the data: A log transformation often stabilizes variance
  3. Use Welch's ANOVA: A modified version of ANOVA that doesn't assume equal variances
  4. Use a non-parametric test: Kruskal-Wallis test
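Welch's ANOVA replaces the pooled variance with weights wᵢ = nᵢ / sᵢ², so groups whose means are less precise count less. Below is a minimal sketch of the standard Welch F statistic and its approximate degrees of freedom; the function name and data are hypothetical, and a p-value would come from the F-distribution with (df1, df2), e.g. scipy.stats.f.sf(f, df1, df2) if SciPy is available. With exactly two groups it reduces to the square of Welch's t statistic, which is a handy sanity check.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's F statistic with its approximate degrees of freedom (df1, df2)."""
    k = len(groups)
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]  # weights n_i / s_i²
    w_sum = sum(w)
    # Weighted grand mean: more precise group means pull harder
    grand = sum(wi * mi for wi, mi in zip(w, means)) / w_sum
    a = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return a / b, k - 1, (k ** 2 - 1) / (3 * lam)

# Hypothetical groups with clearly unequal spreads
groups = [[10, 12, 9, 11, 13], [20, 25, 15, 30, 10, 22], [14, 18, 16, 15]]
f, df1, df2 = welch_anova(groups)
print(f"F = {f:.2f}, df1 = {df1}, df2 = {df2:.2f}")
```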

Example: Checking Equal Variance

Group      Sample Variance (s²)
Group 1    12.5
Group 2    15.2
Group 3    10.8
Group 4    13.1

Check: Ratio = max / min = 15.2 / 10.8 ≈ 1.41 < 2

Conclusion: Equal variance assumption is reasonable.

When to Use ANOVA (Decision Guide)

Use ANOVA When:

  • Comparing 3 or more independent groups
  • Dependent variable is quantitative (continuous)
  • Independent variable is categorical
  • Independence assumption is met
  • Normality is reasonable (or samples are large)
  • Equal variance is reasonable (or sample sizes are equal)

Do NOT Use ANOVA When:

Situation                               Use Instead
Only 2 groups                           Two-sample t-test
Dependent variable is categorical       Chi-square test
Repeated measures (same subjects)       Repeated measures ANOVA
Severe non-normality + small samples    Kruskal-Wallis test
Severely unequal variances              Welch's ANOVA
Two or more factors                     Two-way ANOVA or factorial ANOVA
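The decision table can be encoded as a small function, for example to reason through exercises like Question 3. This is a hypothetical, simplified sketch: the parameter names are made up, the rows are checked in a fixed order chosen for this sketch, and the two-condition paired case returns the paired t-test, as in Question 3(c).

```python
def suggest_test(n_groups, quantitative_outcome, same_subjects=False, n_factors=1,
                 severe_nonnormal_small_n=False, severely_unequal_var=False):
    """Return the test suggested by the lesson's decision table (simplified)."""
    if not quantitative_outcome:
        return "Chi-square test"
    if n_factors >= 2:
        return "Two-way ANOVA or factorial ANOVA"
    if same_subjects:
        # Dependent data: two conditions -> paired t-test, more -> repeated measures ANOVA
        return "Paired t-test" if n_groups == 2 else "Repeated measures ANOVA"
    if n_groups == 2:
        return "Two-sample t-test"
    if severe_nonnormal_small_n:
        return "Kruskal-Wallis test"
    if severely_unequal_var:
        return "Welch's ANOVA"
    return "One-way ANOVA"

# The three scenarios from Question 3
print(suggest_test(5, True))                                 # One-way ANOVA
print(suggest_test(3, True, severe_nonnormal_small_n=True))  # Kruskal-Wallis test
print(suggest_test(2, True, same_subjects=True))             # Paired t-test
```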

Non-Parametric Alternative: Kruskal-Wallis Test

When to Use Kruskal-Wallis

The Kruskal-Wallis test is the non-parametric alternative to one-way ANOVA. Use it when:

  • Normality assumption is severely violated
  • Small sample sizes with non-normal data
  • Ordinal data (ranked data)
  • Extreme outliers that can't be removed

How Kruskal-Wallis Works

  1. Rank all observations from smallest to largest (ignoring groups)
  2. Compare the average ranks across groups
  3. If groups have similar distributions, average ranks should be similar
  4. Compare the test statistic H to a chi-square distribution with k − 1 degrees of freedom (k = number of groups) instead of an F-distribution
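The four steps above can be sketched in plain Python. In this sketch, tied values receive the average of the ranks they span, H uses the usual formula with a correction for ties, and the p-value is the upper tail of a chi-square distribution with k − 1 degrees of freedom, computed with a closed-form recurrence that is valid for integer df. The function names and data are hypothetical; in practice, scipy.stats.kruskal performs the same test.

```python
import math

def average_ranks(values):
    """Ranks 1..N, with tied values sharing the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """P(X > x) for chi-square with integer df: Q(df + 2) = Q(df) + (x/2)^(df/2) e^(-x/2) / Γ(df/2 + 1)."""
    x = max(x, 0.0)  # guard against tiny negative values from rounding
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1
    while nu < df:
        q += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return q

def kruskal_wallis(groups):
    """Return (H, p-value) with tie correction; needs at least two distinct values."""
    pooled = [x for g in groups for x in g]  # Step 1: pool and rank, ignoring groups
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:  # Step 2: rank sum of each group
        r = sum(ranks[start:start + len(g)])
        total += r ** 2 / len(g)
        start += len(g)
    # Step 3: H grows as the groups' average ranks drift apart
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n_total ** 3 - n_total)
    return h, chi2_sf(h, len(groups) - 1)  # Step 4: chi-square with k - 1 df

h, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.2f}, p = {p:.4f}")  # H = 7.20, p = 0.0273
```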

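Advantages:

  • Does not assume normality
  • Works with ordinal (ranked) data
  • Less sensitive to outliers, because it uses ranks rather than raw values

Disadvantages:

  • Less powerful than ANOVA when ANOVA's assumptions are met
  • Compares ranks rather than means, so results are harder to interpret in terms of mean differences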

Check Your Understanding

Question 1

A researcher collects data from 4 groups with the following sample variances: s₁² = 8, s₂² = 12, s₃² = 15, s₄² = 18. Is the equal variance assumption reasonable?

Step 1: Find max and min variances

  • Max = 18
  • Min = 8

Step 2: Calculate ratio

Ratio = 18 / 8 = 2.25

Step 3: Apply rule of thumb

2.25 > 2, so the equal variance assumption is questionable.

Recommendation: Check with Levene's test or consider using Welch's ANOVA instead. If sample sizes are equal across groups, regular ANOVA may still be acceptable.

Question 2

You want to compare customer satisfaction ratings (1-5 scale) across three store locations. You have 15 customers from each location. What assumptions do you need to check?

Three assumptions to check:

  1. Independence:
    • Were customers randomly selected?
    • Is each customer surveyed only once?
    • Are responses independent?
  2. Normality:
    • Check histograms or Q-Q plots for each location
    • With n = 15 per group (below the n ≥ 30 rule of thumb), only mild departures are acceptable
    • Note: Ordinal data (1-5 scale) may violate normality; consider Kruskal-Wallis
  3. Equal Variances:
    • Calculate sample variance for each location
    • Check if max/min ratio < 2
    • Equal sample sizes (n = 15 each) make ANOVA more robust to unequal variances

Question 3

Which test should you use in each scenario?

a) Comparing mean test scores across 5 different study strategies (random assignment, n = 40 per group, data appears normal)

b) Comparing median income across 3 neighborhoods (n = 12 per group, highly skewed data)

c) Comparing pre-test and post-test scores for the same 50 students

a) One-way ANOVA

  • 3+ groups (5 strategies)
  • Independent groups (random assignment)
  • Large samples (n = 40)
  • Normal data
  • All assumptions met

b) Kruskal-Wallis test

  • 3 groups
  • Highly skewed data (normality violated)
  • Small samples (n = 12)
  • Non-parametric test is appropriate

c) Paired t-test

  • Same subjects measured twice (not independent)
  • Only 2 conditions (pre and post)
  • Dependent/paired samples
  • One-way ANOVA is NOT appropriate

Lesson Summary

Three Key Assumptions:

  • Independence: Observations are independent (check study design)
  • Normality: Populations are normally distributed (check with plots/tests, robust with large n)
  • Equal Variances: σ₁² = σ₂² = ... = σₖ² (check ratio < 2, robust with equal n)

Key Takeaways: