
Lesson 3: Chi-Square Test of Homogeneity

Comparing Distributions Across Multiple Populations

Learning Objectives

By the end of this lesson, you will be able to:

Understand when to use the chi-square test of homogeneity
Distinguish between test of independence and test of homogeneity
Perform a complete chi-square test of homogeneity
Interpret results about whether distributions are the same across populations
Recognize the conceptual difference despite identical calculations

What is Homogeneity?

Homogeneity means "sameness": distributions are homogeneous when they follow the same pattern across different groups or populations.

When to Use Test of Homogeneity

One categorical variable
Multiple samples/populations (2 or more groups)
Purpose: Test if the distribution of the variable is the same across all populations

Research question pattern: "Do [population 1], [population 2], and [population 3] have the same distribution of [variable]?"

Independence vs. Homogeneity: The KEY Difference

IMPORTANT: Same Calculations, Different Study Design!

The chi-square test of homogeneity uses the EXACT SAME formulas as the test of independence, but the research question and study design are completely different.

Test of Independence

Study Design:

ONE random sample
TWO variables measured on each subject
Only the total sample size is fixed beforehand; the row totals, column totals, and cell counts are all free to vary

Question: Are the two variables associated?

Example: Survey 500 people, classify each by gender AND political party

Test of Homogeneity

Study Design:

MULTIPLE samples/populations
ONE variable measured in each group
Sample sizes for each group determined beforehand

Question: Do the populations have the same distribution?

Example: Sample 200 from City A, 200 from City B, 200 from City C; measure political party in each

The Conceptual Difference

Independence: "Are these two things related?" (Relationship between variables)

Homogeneity: "Are these groups similar?" (Comparison of distributions)

In practice: The table setup often reveals which test you're doing:

Rows = different populations/groups → Usually homogeneity
Rows and columns both represent variable categories → Usually independence
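The design difference can be made concrete with a small simulation. The sketch below is illustrative only: the function names and the equal category probabilities are assumptions, not part of the lesson. In the homogeneity design every row total is fixed by the sampling plan, while in the independence design only the grand total is fixed.

```python
import random

PARTIES = ["Liberal", "Moderate", "Conservative"]
CITIES = ["City A", "City B", "City C"]


def empty_table():
    """Return a city-by-party table of zero counts."""
    return {city: {party: 0 for party in PARTIES} for city in CITIES}


def sample_independence_design(n_total, rng):
    """One random sample: both city and party are observed per subject."""
    counts = empty_table()
    for _ in range(n_total):
        city = rng.choice(CITIES)      # group membership is random
        party = rng.choice(PARTIES)    # assumed equal probabilities
        counts[city][party] += 1
    return counts


def sample_homogeneity_design(n_per_group, rng):
    """Separate samples: each city's sample size is fixed in advance."""
    counts = empty_table()
    for city in CITIES:
        for _ in range(n_per_group):
            party = rng.choice(PARTIES)
            counts[city][party] += 1
    return counts


def row_totals(counts):
    """Total number of subjects in each city."""
    return {city: sum(row.values()) for city, row in counts.items()}


rng = random.Random(0)
indep = sample_independence_design(600, rng)
homog = sample_homogeneity_design(200, rng)
print("Independence design row totals:", row_totals(indep))
print("Homogeneity design row totals: ", row_totals(homog))
```

The homogeneity row totals are always 200 each; the independence row totals only have to add up to 600.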

Step-by-Step Procedure

Step 1: State the Hypotheses

H₀ (Null Hypothesis): The distribution is the same for all populations

Hₐ (Alternative Hypothesis): The distribution is NOT the same for all populations (at least one differs)

Step 2: Check Conditions

Random sampling (or random assignment)
Independent observations
All expected frequencies ≥ 5

Step 3: Calculate Expected Frequencies

Same Formula as Independence:

E = (Row Total × Column Total) / Grand Total
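As a minimal sketch of this formula, the Python function below computes the expected count for every cell of a table stored as a list of rows. The function name and the 2 × 2 table are hypothetical, chosen only to illustrate the arithmetic.

```python
def expected_counts(observed):
    """Return E = (row total * column total) / grand total for each cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical table: two groups (rows), two categories (columns).
observed = [[30, 20],
            [10, 40]]
expected = expected_counts(observed)
for row in expected:
    print(row)
```

Both groups have 50 observations, so both rows receive the same expected counts (20 and 30), which is what "same distribution" looks like in the table.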

Step 4: Calculate the Test Statistic

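χ² = Σ[(O - E)² / E]

where O is the observed count and E is the expected count in a cell, and the sum runs over every cell in the table (totals excluded)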
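The sum can be sketched in a few lines of Python. The function name is an assumption for illustration, and the expected counts are typed in directly (they come from the row total × column total / grand total formula applied to this hypothetical table).

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table and its expected counts.
observed = [[30, 20], [10, 40]]
expected = [[20.0, 30.0], [20.0, 30.0]]
stat = chi_square_statistic(observed, expected)
print(round(stat, 3))
```

Each cell contributes 100/20 or 100/30 here, so the statistic is 5 + 3.333 + 5 + 3.333.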

Step 5: Find Degrees of Freedom

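df = (r - 1)(c - 1)

where r = number of rows (populations) and c = number of columns (categories of the variable), not counting the total row or total column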
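A short sketch of the degrees-of-freedom count, assuming the table is stored as a list of rows without the total row or total column (the function name is hypothetical):

```python
def degrees_of_freedom(observed):
    """Return (rows - 1) * (columns - 1) for a table without totals."""
    n_rows = len(observed)
    n_cols = len(observed[0])
    return (n_rows - 1) * (n_cols - 1)


# Three populations compared on three categories.
three_by_three = [[70, 50, 30], [55, 60, 35], [45, 55, 50]]
print(degrees_of_freedom(three_by_three))
```

A common slip is to count the "Row Total" column or "Column Total" row; leaving them out of the stored table avoids that.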

Steps 6-7: Find p-value and Make Conclusion

Find the p-value (or critical value) from a chi-square table or technology, compare it with α, then state the conclusion in context
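When technology is used, the p-value is the upper-tail area of the chi-square distribution beyond the test statistic. The sketch below uses a closed form that holds only when df is even (which covers the df = 4 examples in this lesson); the function names are assumptions. For other df, a statistics library is the practical choice, for example `scipy.stats.chi2.sf(stat, df)` if SciPy is installed.

```python
import math


def chi_square_p_value_even_df(stat, df):
    """Upper-tail probability P(X >= stat) for chi-square with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = stat / 2.0
    term = 1.0
    total = 1.0
    # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def reject_null(p_value, alpha):
    """Decision rule: reject H0 when the p-value is below alpha."""
    return p_value < alpha


# The df = 4 critical value for alpha = 0.05 should give p close to 0.05.
p = chi_square_p_value_even_df(9.488, 4)
print(round(p, 3), reject_null(p, 0.05))
```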

Complete Example: Comparing Three Cities

Example 1: Political Views Across Cities

Problem: A political analyst wants to know if political views differ across three cities. They randomly sample 150 residents from each city and ask about their political leaning:

City	Liberal	Moderate	Conservative	Row Total
City A	70	50	30	150
City B	55	60	35	150
City C	45	55	50	150
Column Total	170	165	115	450

Question: At α = 0.05, is there evidence that the distribution of political views differs across the three cities?

Solution:

Step 1: State the hypotheses

H₀: The distribution of political views is the same in all three cities
Hₐ: The distribution of political views is NOT the same across all three cities

Step 2: Check conditions

Random samples from each city (150 from each)
Observations are independent
We'll check if all expected frequencies ≥ 5

Step 3: Calculate expected frequencies

For each cell: E = (Row Total × Column Total) / Grand Total

Sample calculations:

City A & Liberal: E = (150 × 170) / 450 = 56.67
City A & Moderate: E = (150 × 165) / 450 = 55.00
City A & Conservative: E = (150 × 115) / 450 = 38.33

City	Liberal (E)	Moderate (E)	Conservative (E)
City A	56.67	55.00	38.33
City B	56.67	55.00	38.33
City C	56.67	55.00	38.33

All expected frequencies are ≥ 5!
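The expected table and the "all expected frequencies ≥ 5" condition can be checked with a short script. This is a sketch using the counts from the problem; the variable names are illustrative.

```python
observed = {
    "City A": [70, 50, 30],
    "City B": [55, 60, 35],
    "City C": [45, 55, 50],
}
views = ["Liberal", "Moderate", "Conservative"]

row_totals = {city: sum(counts) for city, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# E = (row total * column total) / grand total for every cell.
expected = {
    city: [row_totals[city] * c / grand_total for c in col_totals]
    for city in observed
}

for city, row in expected.items():
    print(city, [round(e, 2) for e in row])

smallest = min(e for row in expected.values() for e in row)
condition_met = smallest >= 5
print("Smallest expected count:", round(smallest, 2), "->", condition_met)
```

Because every city has the same sample size (150), every row of the expected table is identical.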

Step 4: Calculate the chi-square test statistic

Cell	O	E	(O - E)²/E
City A & Liberal	70	56.67	3.137
City A & Moderate	50	55.00	0.455
City A & Conservative	30	38.33	1.812
City B & Liberal	55	56.67	0.049
City B & Moderate	60	55.00	0.455
City B & Conservative	35	38.33	0.290
City C & Liberal	45	56.67	2.402
City C & Moderate	55	55.00	0.000
City C & Conservative	50	38.33	3.551
Total χ²			12.150

χ² ≈ 12.150 (contributions are computed from unrounded expected frequencies, so the displayed rows may differ from the total in the last decimal place)

Step 5: Find degrees of freedom

df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 2 × 2 = 4

Step 6: Find p-value or critical value

Using a chi-square table with df = 4 and α = 0.05:

Critical value = 9.488

Since our χ² = 12.150 > 9.488, we REJECT H₀

(Alternatively, p-value ≈ 0.016 < 0.05)
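Steps 4 through 6 can be cross-checked numerically. The sketch below recomputes the statistic from unrounded expected counts, so it agrees with the hand calculation to within rounding (about 12.15). The p-value line uses a closed form that is valid for df = 4 only.

```python
import math

observed = [[70, 50, 30],   # City A
            [55, 60, 35],   # City B
            [45, 55, 50]]   # City C

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square statistic: sum of (O - E)^2 / E over all nine cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail area for df = 4 only: exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

critical_value = 9.488  # chi-square table, df = 4, alpha = 0.05
print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
print("Reject H0:", chi_square > critical_value)
```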

Step 7: Conclusion

At the 0.05 significance level, there is sufficient evidence to conclude that the distribution of political views differs across the three cities. The cities are not homogeneous in their political views.

Follow-up observation: Looking at the data, City A appears more liberal (70 vs 56.67 expected), while City C appears more conservative (50 vs 38.33 expected).
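One common way to make this kind of follow-up observation systematic is to look at Pearson residuals, (O - E)/√E, whose squares are the individual cell contributions to χ². Residuals are not part of the lesson's procedure; the sketch below is a descriptive aid under that assumption, not a formal follow-up test.

```python
import math

cities = ["City A", "City B", "City C"]
views = ["Liberal", "Moderate", "Conservative"]
observed = [[70, 50, 30], [55, 60, 35], [45, 55, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson residual for each cell: positive means more than expected.
residuals = {}
for i, city in enumerate(cities):
    for j, view in enumerate(views):
        e = row_totals[i] * col_totals[j] / grand_total
        residuals[(city, view)] = (observed[i][j] - e) / math.sqrt(e)

# List cells from largest to smallest absolute residual.
ranked = sorted(residuals.items(), key=lambda kv: abs(kv[1]), reverse=True)
for (city, view), r in ranked:
    print(f"{city} & {view}: {r:+.2f}")
```

The largest positive residuals belong to City C & Conservative and City A & Liberal, matching the observation above.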

Check Your Understanding

Question: In the example above, why is this a test of homogeneity rather than independence?

Answer: Because we have multiple separate samples (one from each city)

Explanation: We took three separate samples (150 from City A, 150 from City B, 150 from City C). We're comparing the distribution of ONE variable (political views) across three populations. This is the key feature of homogeneity tests - multiple populations, one variable.

Another Example: Treatment Groups

Example 2: Comparing Educational Approaches

Problem: An educator tests three different teaching methods by randomly assigning students to one of three groups (100 students per method). After the course, students are classified as achieving Low, Medium, or High mastery:

Method	Low	Medium	High	Row Total
Method 1 (Traditional)	25	50	25	100
Method 2 (Flipped)	20	45	35	100
Method 3 (Hybrid)	15	40	45	100
Column Total	60	135	105	300

Question: At α = 0.01, is there evidence that the distribution of mastery levels differs across the three teaching methods?

Complete Solution:

Step 1: Hypotheses

H₀: The distribution of mastery levels is the same for all three teaching methods
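Hₐ: The distribution of mastery levels is NOT the same across all three methods

Step 2: Check conditions

Students were randomly assigned to the three methods (100 per method)
Observations are independent
Expected frequencies are checked in Step 3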

Step 3: Expected frequencies

Each method has 100 students. If distributions were the same:

Method	Low (E)	Medium (E)	High (E)
Method 1	(100×60)/300 = 20	(100×135)/300 = 45	(100×105)/300 = 35
Method 2	20	45	35
Method 3	20	45	35

All expected frequencies ≥ 5
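Another way to read the expected table: under H₀ every method shares the pooled distribution (60/300, 135/300, 105/300), and each expected count is the group size times the pooled proportion. The sketch below, with illustrative variable names, shows that this gives the same numbers as the row total × column total / grand total formula.

```python
observed = [[25, 50, 25],   # Method 1 (Traditional)
            [20, 45, 35],   # Method 2 (Flipped)
            [15, 40, 45]]   # Method 3 (Hybrid)

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pooled proportion of Low, Medium, High across all 300 students.
pooled = [c / grand_total for c in col_totals]

# Expected count = group size * pooled proportion.
expected_from_pooled = [[n * p for p in pooled] for n in row_totals]

# Same quantity written as (row total * column total) / grand total.
expected_from_formula = [
    [n * c / grand_total for c in col_totals] for n in row_totals
]

print([round(p, 2) for p in pooled])
print([round(e, 2) for e in expected_from_pooled[0]])
```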

Step 4: Calculate χ²

Cell	O	E	(O-E)²/E
Method 1 & Low	25	20	1.250
Method 1 & Medium	50	45	0.556
Method 1 & High	25	35	2.857
Method 2 & Low	20	20	0.000
Method 2 & Medium	45	45	0.000
Method 2 & High	35	35	0.000
Method 3 & Low	15	20	1.250
Method 3 & Medium	40	45	0.556
Method 3 & High	45	35	2.857
Total			9.326

χ² = 9.326

Step 5: Degrees of freedom

df = (3 - 1)(3 - 1) = 2 × 2 = 4

Step 6: Critical value

With df = 4 and α = 0.01, critical value = 13.277

Since 9.326 < 13.277, we fail to reject H₀

(p-value ≈ 0.053 > 0.01)
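As a numerical cross-check of Steps 4 through 6, the sketch below recomputes the statistic and p-value for the teaching-method table. The p-value expression is a closed form valid for df = 4 only; the variable names are illustrative.

```python
import math

observed = [[25, 50, 25],
            [20, 45, 35],
            [15, 40, 45]]
expected_row = [20.0, 45.0, 35.0]  # identical for all three methods

# Sum (O - E)^2 / E over all nine cells.
chi_square = sum(
    (o - e) ** 2 / e
    for row in observed
    for o, e in zip(row, expected_row)
)

# Upper-tail area for df = 4 only: exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

alpha = 0.01
print(f"chi-square = {chi_square:.3f}, p = {p_value:.3f}")
print("Reject H0 at alpha = 0.01:", p_value < alpha)
```

Method 2 matches its expected counts exactly, so all of the statistic comes from Methods 1 and 3.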

Step 7: Conclusion

At the 0.01 significance level, there is insufficient evidence to conclude that the distribution of mastery levels differs across the three teaching methods. The data are consistent with homogeneous distributions of student mastery, although failing to reject H₀ does not prove that the distributions are identical.

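Note: The p-value (0.053) is close to 0.05 but still above it, so H₀ would not be rejected at α = 0.05 either (9.326 < 9.488, the critical value for df = 4). At α = 0.10, where the critical value is 7.779, H₀ would be rejected. Because the decision can depend on the significance level, it is important to choose α before conducting the test!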
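To see how the decision depends on α, the sketch below compares the Example 2 statistic with the df = 4 critical values for three common significance levels. The critical values are standard chi-square table entries; the dictionary name is illustrative.

```python
# Upper-tail critical values for the chi-square distribution with df = 4.
CRITICAL_VALUES_DF4 = {0.10: 7.779, 0.05: 9.488, 0.01: 13.277}

chi_square = 9.326  # test statistic from Example 2

# Reject H0 when the statistic exceeds the critical value for that alpha.
decisions = {
    alpha: chi_square > critical
    for alpha, critical in CRITICAL_VALUES_DF4.items()
}

for alpha in sorted(decisions, reverse=True):
    verdict = "reject H0" if decisions[alpha] else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {verdict}")
```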

Check Your Understanding

Question: Which of the following scenarios would use a test of homogeneity rather than independence?

Answer: B) Sample 150 from School A and 150 from School B, asking each about preferred study method

Explanation: This scenario has multiple separate samples (School A and School B) and measures one variable (study method preference) in each group. We're comparing whether the two schools have the same distribution of study preferences - this is homogeneity. Options A and C are independence (one sample, two variables), and D is goodness of fit (one sample, one variable).

Summary Table: All Three Chi-Square Tests

Test Type	Variables	Samples	Research Question	Example
Goodness of Fit	1 categorical	1 sample	Does data fit expected distribution?	Is a die fair?
Independence	2 categorical	1 sample	Are variables associated?	Are gender and party related?
Homogeneity	1 categorical	2+ samples	Same distribution across groups?	Do three cities have same political views?
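The table can be read as a simple decision rule. The sketch below encodes only the three rows shown above; the function name and its two arguments are hypothetical, and real study designs may need more judgment than a lookup.

```python
def choose_chi_square_test(n_variables, n_samples):
    """Pick a chi-square test from the number of variables and samples."""
    if n_variables == 1 and n_samples == 1:
        return "goodness of fit"
    if n_variables == 2 and n_samples == 1:
        return "independence"
    if n_variables == 1 and n_samples >= 2:
        return "homogeneity"
    raise ValueError("design not covered by the summary table")


# Three cities sampled separately, one variable (political views).
print(choose_chi_square_test(n_variables=1, n_samples=3))
```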

Key Takeaways

Remember These Points

Purpose: Compare distributions of one variable across multiple populations
Study design: Multiple samples, one variable in each
Calculations: IDENTICAL to test of independence
Conceptual difference: Comparing groups vs. testing relationship
Expected frequency: E = (Row Total × Column Total) / Grand Total
Test statistic: χ² = Σ[(O - E)² / E]
Degrees of freedom: df = (r - 1)(c - 1)
Interpretation: Reject H₀ = evidence that the distributions differ; Fail to reject H₀ = insufficient evidence of a difference (not proof that the distributions are the same)
