Safaa Dabagh

Lesson 3: Chi-Square Test of Homogeneity

Comparing Distributions Across Multiple Populations

Learning Objectives

By the end of this lesson, you will be able to:

What is Homogeneity?

Homogeneity means "sameness": when distributions are homogeneous, a categorical variable follows the same pattern in every group or population.

When to Use Test of Homogeneity

Research question pattern: "Do [population 1], [population 2], and [population 3] have the same distribution of [variable]?"

Independence vs. Homogeneity: The KEY Difference

IMPORTANT: Same Calculations, Different Study Design!

The chi-square test of homogeneity uses the EXACT SAME formulas as the test of independence, but the research question and study design are completely different.

Test of Independence

Study Design:

  • ONE random sample
  • TWO variables measured on each subject
  • Only the total sample size is fixed beforehand; row totals, column totals, and cell counts all vary

Question: Are the two variables associated?

Example: Survey 500 people, classify each by gender AND political party

Test of Homogeneity

Study Design:

  • MULTIPLE samples/populations
  • ONE variable measured in each group
  • Sample sizes for each group determined beforehand

Question: Do the populations have the same distribution?

Example: Sample 200 from City A, 200 from City B, 200 from City C; measure political party in each

The Conceptual Difference

Independence: "Are these two things related?" (Relationship between variables)

Homogeneity: "Are these groups similar?" (Comparison of distributions)

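In practice: The table setup often reveals which test you're doing. If one set of margins (for example, the row totals) was fixed in advance by sampling each group separately, it is a test of homogeneity; if only the grand total was fixed, it is a test of independence.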

Step-by-Step Procedure

Step 1: State the Hypotheses

H₀ (Null Hypothesis): The distribution is the same for all populations

Hₐ (Alternative Hypothesis): The distribution is NOT the same for all populations (at least one differs)
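The hypotheses can also be written in symbols. The notation below is introduced here for illustration and is not part of the lesson: p_{ij} is the proportion of population i that falls in category j, with r populations and c categories (the same r and c used in the degrees-of-freedom formula).

```latex
H_0:\; p_{1j} = p_{2j} = \cdots = p_{rj} \quad \text{for every category } j = 1, \dots, c
\\[4pt]
H_a:\; p_{ij} \neq p_{i'j} \quad \text{for at least one category } j \text{ and one pair of populations } i \neq i'
```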

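Step 2: Check Conditions

  • Each sample is a random sample from its population (or subjects are randomly assigned to groups)
  • Observations are independent, within and across samples
  • Every expected frequency is at least 5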

Step 3: Calculate Expected Frequencies

Same Formula as Independence:

E = (Row Total × Column Total) / Grand Total
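As a minimal sketch of this formula, the snippet below computes the expected count for every cell of a table. The function name `expected_counts` is a hypothetical helper chosen for illustration, and the counts are taken from Example 1 later in this lesson.

```python
def expected_counts(observed):
    """Return expected counts E = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Observed counts from Example 1
# Rows: City A, City B, City C; columns: Liberal, Moderate, Conservative
observed = [
    [70, 50, 30],
    [55, 60, 35],
    [45, 55, 50],
]

expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])  # each row: [56.67, 55.0, 38.33]
```

Because every city contributes the same number of residents, all three rows of expected counts are identical.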

Step 4: Calculate the Test Statistic

χ² = Σ[(O - E)² / E]
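The sum can be sketched in a few lines of Python. The helper name `chi_square_statistic` is hypothetical; the observed and expected counts come from Example 2 later in this lesson. Since the code does not round intermediate values, its result can differ from a hand calculation in the third decimal place.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Observed counts from Example 2
# Rows: Methods 1-3; columns: Low, Medium, High
observed = [
    [25, 50, 25],
    [20, 45, 35],
    [15, 40, 45],
]
# Expected counts from Step 3 of Example 2 (identical rows: 100 students per method)
expected = [[20, 45, 35]] * 3

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))  # 9.33
```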

Step 5: Find Degrees of Freedom

df = (r - 1)(c - 1)

Steps 6-7: Find p-value and Make Conclusion

Use a chi-square table or technology to find the p-value (or critical value), then state the conclusion in the context of the problem.
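When technology is used, the p-value is the upper-tail area of the chi-square distribution beyond the test statistic. The sketch below relies on a closed form that holds only when df is even (as in the 3 × 3 tables of this lesson); that restriction is an assumption of this example, and odd df would need a table or a statistics library. The function name `chi_square_p_value` is hypothetical.

```python
import math

def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square variable.

    Closed form valid only for even df:
    exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = statistic / 2
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)

# Test statistic and df from Example 1 in this lesson
p_value = chi_square_p_value(12.147, 4)
print(round(p_value, 3))  # 0.016
```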

Complete Example: Comparing Three Cities

Example 1: Political Views Across Cities

Problem: A political analyst wants to know if political views differ across three cities. They randomly sample 150 residents from each city and ask about their political leaning:

City | Liberal | Moderate | Conservative | Row Total
City A | 70 | 50 | 30 | 150
City B | 55 | 60 | 35 | 150
City C | 45 | 55 | 50 | 150
Column Total | 170 | 165 | 115 | 450

Question: At α = 0.05, is there evidence that the distribution of political views differs across the three cities?
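Before running the test, it can help to look at each city's distribution as proportions, since homogeneity is a statement about those proportions being equal. This is an informal sketch using the counts in the table above; the helper `row_proportions` is a hypothetical name.

```python
# Observed counts from the table above, one list per city
observed = {
    "City A": [70, 50, 30],
    "City B": [55, 60, 35],
    "City C": [45, 55, 50],
}
categories = ["Liberal", "Moderate", "Conservative"]

def row_proportions(counts):
    """Convert one group's counts to proportions of that group's sample size."""
    total = sum(counts)
    return [count / total for count in counts]

proportions = {city: row_proportions(counts) for city, counts in observed.items()}
for city, props in proportions.items():
    summary = ", ".join(f"{cat} {p:.1%}" for cat, p in zip(categories, props))
    print(f"{city}: {summary}")
# City A: Liberal 46.7%, Moderate 33.3%, Conservative 20.0%
# City B: Liberal 36.7%, Moderate 40.0%, Conservative 23.3%
# City C: Liberal 30.0%, Moderate 36.7%, Conservative 33.3%
```

The sample proportions differ from city to city; the test asks whether differences of this size are larger than random sampling alone would plausibly produce.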

Solution:

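Step 1: State the hypotheses

  • H₀: The distribution of political views is the same in all three cities
  • Hₐ: The distribution of political views is NOT the same in all three cities (at least one city differs)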

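Step 2: Check conditions

  • Random samples: 150 residents were randomly sampled from each city
  • Independence: the three samples were drawn separately
  • Expected frequencies: all must be at least 5 (verified in Step 3)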

Step 3: Calculate expected frequencies

For each cell: E = (Row Total × Column Total) / Grand Total

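Sample calculations (every city has the same row total of 150, so the expected counts are identical across cities):

  • Liberal: E = (150 × 170) / 450 = 56.67
  • Moderate: E = (150 × 165) / 450 = 55.00
  • Conservative: E = (150 × 115) / 450 = 38.33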

City | Liberal (E) | Moderate (E) | Conservative (E)
City A | 56.67 | 55.00 | 38.33
City B | 56.67 | 55.00 | 38.33
City C | 56.67 | 55.00 | 38.33

All expected frequencies are ≥ 5!

Step 4: Calculate the chi-square test statistic

Cell | O | E | (O - E)²/E
City A & Liberal | 70 | 56.67 | 3.136
City A & Moderate | 50 | 55.00 | 0.455
City A & Conservative | 30 | 38.33 | 1.813
City B & Liberal | 55 | 56.67 | 0.049
City B & Moderate | 60 | 55.00 | 0.455
City B & Conservative | 35 | 38.33 | 0.289
City C & Liberal | 45 | 56.67 | 2.402
City C & Moderate | 55 | 55.00 | 0.000
City C & Conservative | 50 | 38.33 | 3.548
Total χ² | | | 12.147

χ² = 12.147
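The cell-by-cell table can be reproduced with a short script, sketched here using the observed counts from this example. Because the code keeps full precision rather than rounding each expected count to two decimals, individual contributions and the total may differ from the hand-computed table in the third decimal place.

```python
cities = ["City A", "City B", "City C"]
views = ["Liberal", "Moderate", "Conservative"]
observed = [
    [70, 50, 30],
    [55, 60, 35],
    [45, 55, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
contributions = {}
for i, city in enumerate(cities):
    for j, view in enumerate(views):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        term = (observed[i][j] - e) ** 2 / e             # this cell's contribution
        contributions[(city, view)] = term
        chi_square += term
        print(f"{city} & {view}: O={observed[i][j]}, E={e:.2f}, (O-E)^2/E={term:.3f}")

print(f"chi-square = {chi_square:.2f}")  # chi-square = 12.15
```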

Step 5: Find degrees of freedom

df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 2 × 2 = 4

Step 6: Find p-value or critical value

Using a chi-square table with df = 4 and α = 0.05:

Critical value = 9.488

Since our χ² = 12.147 > 9.488, we REJECT H₀

(Alternatively, p-value ≈ 0.016 < 0.05)
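Both the critical value and the p-value can be checked without a table. The sketch below is specific to df = 4, where the upper-tail probability has a simple closed form, and it finds the critical value by bisection. The function names are hypothetical and the search bounds are arbitrary choices for this illustration.

```python
import math

def p_value_df4(x):
    """Upper-tail chi-square probability P(X >= x) for df = 4 only."""
    return math.exp(-x / 2) * (1 + x / 2)

def critical_value_df4(alpha, lo=0.0, hi=100.0, tol=1e-9):
    """Find x with P(X >= x) = alpha by bisection.

    Works because the tail probability decreases steadily as x grows.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p_value_df4(mid) > alpha:
            lo = mid   # tail area still too large: move right
        else:
            hi = mid   # tail area at or below alpha: move left
    return (lo + hi) / 2

print(round(critical_value_df4(0.05), 3))  # 9.488
print(round(p_value_df4(12.147), 3))       # 0.016
```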

Step 7: Conclusion

At the 0.05 significance level, there is sufficient evidence to conclude that the distribution of political views differs across the three cities. The cities are not homogeneous in their political views.

Follow-up observation: Looking at the data, City A appears more liberal (70 vs 56.67 expected), while City C appears more conservative (50 vs 38.33 expected).
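One way to make this follow-up more systematic is to compute Pearson residuals, (O - E) / √E, for each cell. This is a standard descriptive tool rather than a step in the lesson's procedure, so treat it as an optional sketch. The sign shows whether a cell has more or fewer observations than expected, and the squared residuals add up to the χ² statistic.

```python
import math

cities = ["City A", "City B", "City C"]
views = ["Liberal", "Moderate", "Conservative"]
observed = [
    [70, 50, 30],
    [55, 60, 35],
    [45, 55, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

residuals = {}
for i, city in enumerate(cities):
    for j, view in enumerate(views):
        e = row_totals[i] * col_totals[j] / grand_total
        residuals[(city, view)] = (observed[i][j] - e) / math.sqrt(e)

# List cells from most over-represented to most under-represented
for (city, view), r in sorted(residuals.items(), key=lambda item: item[1], reverse=True):
    print(f"{city} & {view}: {r:+.2f}")
```

The largest positive residuals belong to City C & Conservative and City A & Liberal, which matches the observation above.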

Check Your Understanding

Question: In the example above, why is this a test of homogeneity rather than independence?

Answer: B) Because we have multiple separate samples (one from each city)

Explanation: We took three separate samples (150 from City A, 150 from City B, 150 from City C). We're comparing the distribution of ONE variable (political views) across three populations. This is the key feature of homogeneity tests: multiple populations, one variable.

Another Example: Treatment Groups

Example 2: Comparing Educational Approaches

Problem: An educator tests three different teaching methods by randomly assigning students to one of three groups (100 students per method). After the course, students are classified as achieving Low, Medium, or High mastery:

Method | Low | Medium | High | Row Total
Method 1 (Traditional) | 25 | 50 | 25 | 100
Method 2 (Flipped) | 20 | 45 | 35 | 100
Method 3 (Hybrid) | 15 | 40 | 45 | 100
Column Total | 60 | 135 | 105 | 300

Question: At α = 0.01, is there evidence that the distribution of mastery levels differs across the three teaching methods?

Complete Solution:

Step 1: Hypotheses

  • H₀: The distribution of mastery levels is the same for all three teaching methods
  • Hₐ: The distribution of mastery levels is NOT the same across all three methods

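Step 2: Conditions

  • Random assignment: students were randomly assigned to one of the three methods
  • Independence: each student appears in exactly one group
  • Expected frequencies: all must be at least 5 (verified in Step 3)

Step 3: Expected frequencies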

Each method has 100 students. If distributions were the same:

Method | Low (E) | Medium (E) | High (E)
Method 1 | (100×60)/300 = 20 | (100×135)/300 = 45 | (100×105)/300 = 35
Method 2 | 20 | 45 | 35
Method 3 | 20 | 45 | 35

All expected frequencies ≥ 5

Step 4: Calculate χ²

Cell | O | E | (O-E)²/E
Method 1 & Low | 25 | 20 | 1.250
Method 1 & Medium | 50 | 45 | 0.556
Method 1 & High | 25 | 35 | 2.857
Method 2 & Low | 20 | 20 | 0.000
Method 2 & Medium | 45 | 45 | 0.000
Method 2 & High | 35 | 35 | 0.000
Method 3 & Low | 15 | 20 | 1.250
Method 3 & Medium | 40 | 45 | 0.556
Method 3 & High | 45 | 35 | 2.857
Total | | | 9.326

χ² = 9.326

Step 5: Degrees of freedom

df = (3 - 1)(3 - 1) = 2 × 2 = 4

Step 6: Critical value

With df = 4 and α = 0.01, critical value = 13.277

Since 9.326 < 13.277, we fail to reject H₀

(p-value ≈ 0.053 > 0.01)
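The whole calculation for this example can be sketched as a single function. The name `chi_square_homogeneity` is hypothetical, and the p-value step uses a closed form that is valid only for even df, which is an assumption of this sketch (it holds here because df = 4).

```python
import math

def chi_square_homogeneity(observed):
    """Return (statistic, df, p_value) for a table whose rows are the groups."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Steps 3-4: expected count and contribution for every cell
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    # Step 5: degrees of freedom
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")

    # Step 6: upper-tail probability (closed form for even df)
    half = statistic / 2
    p_value = math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )
    return statistic, df, p_value

# Observed counts from Example 2 (rows: Methods 1-3; columns: Low, Medium, High)
observed = [
    [25, 50, 25],
    [20, 45, 35],
    [15, 40, 45],
]
statistic, df, p_value = chi_square_homogeneity(observed)
alpha = 0.01
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.3f}: {decision}")
```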

Step 7: Conclusion

At the 0.01 significance level, there is insufficient evidence to conclude that the distribution of mastery levels differs across the three teaching methods. The data are consistent with homogeneous distributions of student mastery, although failing to reject H₀ does not prove the distributions are identical.

Note: The p-value (0.053) is very close to 0.05. Even at α = 0.05 we would narrowly fail to reject H₀ (9.326 < 9.488), while at α = 0.10 we would reject it (9.326 > 7.779). This highlights the importance of choosing α before conducting the test!
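A quick check of how the decision depends on the significance level, using the test statistic from this example and the closed-form p-value for df = 4 (an assumption specific to this sketch):

```python
import math

statistic = 9.326  # chi-square value from Example 2
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)  # valid for df = 4 only

decisions = {}
for alpha in (0.10, 0.05, 0.01):
    decisions[alpha] = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: p-value = {p_value:.3f}, {decisions[alpha]}")
# alpha = 0.10: p-value = 0.053, reject H0
# alpha = 0.05: p-value = 0.053, fail to reject H0
# alpha = 0.01: p-value = 0.053, fail to reject H0
```

The same data lead to different decisions at different α levels, so α should be fixed before the data are examined.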

Check Your Understanding

Question: Which of the following scenarios would use a test of homogeneity rather than independence?

Answer: B) Sample 150 from School A and 150 from School B, asking each about preferred study method

Explanation: This scenario has multiple separate samples (School A and School B) and measures one variable (study method preference) in each group. We're comparing whether the two schools have the same distribution of study preferences - this is homogeneity. Options A and C are independence (one sample, two variables), and D is goodness of fit (one sample, one variable).

Summary Table: All Three Chi-Square Tests

Test Type | Variables | Samples | Research Question | Example
Goodness of Fit | 1 categorical | 1 sample | Does data fit expected distribution? | Is a die fair?
Independence | 2 categorical | 1 sample | Are variables associated? | Are gender and party related?
Homogeneity | 1 categorical | 2+ samples | Same distribution across groups? | Do three cities have same political views?

Key Takeaways

Remember These Points
