Study Guide - Module 12: Chi-Square Tests

1. Overview of Chi-Square Tests

Purpose: Chi-square tests analyze categorical data to determine if observed frequencies differ significantly from expected frequencies.

Three Types of Chi-Square Tests

Goodness of Fit: Tests if sample data fits an expected distribution
Independence: Tests if two categorical variables are related
Homogeneity: Tests if multiple populations have the same distribution

When to Use Chi-Square

You have categorical data (counts/frequencies)
You want to compare observed vs. expected frequencies
You need to test relationships between categorical variables
Do NOT use for quantitative data (use t-tests or ANOVA instead)

2. Chi-Square Distribution

Properties

Never negative: χ² ≥ 0 (it equals 0 only when every observed count matches its expected count)
Right-skewed: Tail extends to the right
Shape depends on df: More degrees of freedom → more symmetric
Mean = df: The expected value equals the degrees of freedom

Key Characteristics

Unlike the normal distribution, chi-square is NOT symmetric (it becomes approximately symmetric only for very large df)
We always use the RIGHT tail for the rejection region
Different df values create different distribution curves
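
These properties can be checked numerically. The sketch below relies on a standard fact that this guide does not state: a chi-square variable with df degrees of freedom is the sum of df squared standard normal values. The function names, seed, and number of draws are illustrative choices only.

```python
import random
import statistics

def simulate_chi_square(df, n_draws=10000, seed=0):
    """Draw chi-square values as sums of df squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]

def sample_skewness(values):
    """Skewness as the mean cubed z-score (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

# For each df, print the sample mean (should be near df) and the skewness
# (should shrink toward 0 as df grows).
for df in (2, 10, 50):
    draws = simulate_chi_square(df)
    print(df, round(statistics.fmean(draws), 2), round(sample_skewness(draws), 2))
```

Every simulated value is non-negative, the sample mean lands close to df, and the skewness falls as df increases, which matches the properties listed above.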

3. Chi-Square Goodness of Fit Test

When to Use

ONE categorical variable
ONE sample
Want to test if data fits an expected distribution

Step-by-Step Procedure

State hypotheses
- H₀: The data follows the specified distribution
- Hₐ: The data does NOT follow the specified distribution
Check conditions
- Random sampling
- Independent observations
- All expected frequencies ≥ 5
Calculate expected frequencies
- E = n × p (for each category)
Calculate test statistic
- χ² = Σ[(O - E)² / E]
Find degrees of freedom
- df = k - 1
Find p-value or critical value
Make decision and conclude

Formulas for Goodness of Fit

Expected Frequency:

E = n × p

where n = sample size, p = expected proportion

Test Statistic:

χ² = Σ[(O - E)² / E]

Sum over all categories

Degrees of Freedom:

df = k - 1

where k = number of categories

Example

Problem: Test if a die is fair with 120 rolls

Expected: Each face should appear 120/6 = 20 times

df: 6 - 1 = 5
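
The die example can be carried through to a test statistic once observed counts are available. The guide gives none, so the counts below are hypothetical, and `goodness_of_fit` is an illustrative helper name rather than a library function.

```python
def goodness_of_fit(observed, proportions):
    """Return (expected counts, chi-square statistic, df) for one categorical variable."""
    n = sum(observed)
    expected = [n * p for p in proportions]                      # E = n × p
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                                       # df = k - 1
    return expected, chi_sq, df

# Hypothetical counts for faces 1-6 over 120 rolls (not data from this guide).
observed = [25, 17, 15, 23, 24, 16]
expected, chi_sq, df = goodness_of_fit(observed, [1 / 6] * 6)
print([round(e, 1) for e in expected], round(chi_sq, 2), df)
```

With these made-up counts the squared deviations sum to 100, so χ² = 100/20 = 5.0 on 5 degrees of freedom. That is below the usual table critical value of 11.070 for α = 0.05, so this sample would not give evidence that the die is unfair.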

4. Chi-Square Test of Independence

When to Use

TWO categorical variables
ONE sample (classified by both variables)
Want to test if variables are related

Step-by-Step Procedure

State hypotheses
- H₀: The two variables are independent (no association)
- Hₐ: The two variables are dependent (associated)
Check conditions (same as goodness of fit)
Calculate expected frequencies
- E = (Row Total × Column Total) / Grand Total
- Calculate for EACH cell
Calculate test statistic
- χ² = Σ[(O - E)² / E] over ALL cells
Find degrees of freedom
- df = (r - 1)(c - 1)
Find p-value or critical value
Make decision and conclude

Formulas for Test of Independence

Expected Frequency (for each cell):

E = (Row Total × Column Total) / Grand Total

Test Statistic:

χ² = Σ[(O - E)² / E]

Sum over ALL cells in contingency table

Degrees of Freedom:

df = (r - 1)(c - 1)

where r = rows, c = columns

Example

Problem: Test if gender and political party are independent (one sample of 500)

Setup: 2×3 contingency table (2 genders × 3 parties)

df: (2-1)(3-1) = 2
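
A minimal sketch of the full calculation for a 2×3 table follows. The cell counts are hypothetical (chosen to total 500), and `chi_square_table` is an illustrative helper name.

```python
def chi_square_table(table):
    """Return (expected counts, chi-square statistic, df) for an r × c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (Row Total × Column Total) / Grand Total, computed for each cell
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi_sq, df

# Hypothetical counts: rows are the 2 genders, columns are the 3 parties.
table = [[120, 90, 40],
         [80, 110, 60]]
expected, chi_sq, df = chi_square_table(table)
print(expected, round(chi_sq, 2), df)
```

Here each row has expected counts 100, 100, and 50, and the cell contributions are 4 + 1 + 2 per row, so χ² = 14.0 on 2 degrees of freedom. That exceeds the usual table critical value of 5.991 for α = 0.05, so this made-up sample would lead to rejecting H₀.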

5. Chi-Square Test of Homogeneity

When to Use

ONE categorical variable
MULTIPLE samples/populations
Want to test if distributions are the same across groups

Step-by-Step Procedure

State hypotheses
- H₀: The distribution is the same for all populations
- Hₐ: At least one population has a different distribution
Check conditions (same as the other two tests)
Calculate expected frequencies
- SAME formula as independence: E = (RT × CT) / GT
Calculate test statistic
- SAME formula: χ² = Σ[(O - E)² / E]
Find degrees of freedom
- SAME formula: df = (r - 1)(c - 1)
Find p-value or critical value
Make decision and conclude

Key Difference from Independence

Calculations are IDENTICAL, but study design differs:

Independence: One sample, two variables
Homogeneity: Multiple samples, one variable

Example

Problem: Compare satisfaction across 3 cities (100 sampled from each city)

Setup: 3×3 table (3 cities × 3 satisfaction levels)

df: (3-1)(3-1) = 4
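
One way to see why the homogeneity calculation matches the independence one is to build the expected counts from the pooled distribution. The sketch below does that with hypothetical satisfaction counts; the city names, counts, and the helper `homogeneity_test` are all assumptions made for illustration.

```python
def homogeneity_test(samples):
    """samples maps each population name to its list of category counts."""
    rows = list(samples.values())
    sizes = [sum(row) for row in rows]                       # fixed by the study design
    grand_total = sum(sizes)
    pooled = [sum(col) / grand_total for col in zip(*rows)]  # shared distribution under H0
    expected = [[n * p for p in pooled] for n in sizes]      # same as (RT × CT) / GT
    chi_sq = sum(
        (o - e) ** 2 / e
        for row, exp_row in zip(rows, expected)
        for o, e in zip(row, exp_row)
    )
    df = (len(rows) - 1) * (len(pooled) - 1)
    return pooled, expected, chi_sq, df

# Hypothetical counts per city: [satisfied, neutral, dissatisfied], 100 people each.
samples = {
    "City A": [50, 30, 20],
    "City B": [40, 30, 30],
    "City C": [30, 30, 40],
}
pooled, expected, chi_sq, df = homogeneity_test(samples)
print(pooled, round(chi_sq, 2), df)
```

Under H₀ every city shares the pooled proportions 0.4, 0.3, 0.3, so each city's expected counts are 40, 30, 30. The statistic works out to 35/3 ≈ 11.67 on 4 degrees of freedom, which exceeds the usual table critical value of 9.488 for α = 0.05.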

6. Comparison of All Three Tests

Feature	Goodness of Fit	Independence	Homogeneity
Variables	1 categorical	2 categorical	1 categorical
Samples	1 sample	1 sample	2+ samples
Research Question	Does data fit expected distribution?	Are variables related?	Same distribution across groups?
Example	Is a die fair?	Gender vs party preference?	Three cities' political views same?
Expected Frequency	E = n × p	E = (RT × CT) / GT	E = (RT × CT) / GT
Test Statistic	χ² = Σ[(O-E)²/E]	χ² = Σ[(O-E)²/E]	χ² = Σ[(O-E)²/E]
Degrees of Freedom	df = k - 1	df = (r-1)(c-1)	df = (r-1)(c-1)

7. Choosing the Right Test

Decision Process

Step 1: Is the data categorical?
- No → Use t-test, ANOVA, or regression (NOT chi-square)
- Yes → Continue
Step 2: How many categorical variables?
- One variable → Goodness of Fit OR Homogeneity (check Step 3)
- Two variables → Independence OR Homogeneity (check Step 3)
Step 3: How many samples?
- One sample → Goodness of Fit (if 1 variable) or Independence (if 2 variables)
- Multiple samples → Homogeneity
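
The decision process above can be written as a small function. This is only a sketch of the guide's rules; `choose_test` and its parameter names are hypothetical.

```python
def choose_test(is_categorical, n_variables, n_samples):
    """Pick a chi-square test from the data type, variable count, and sample count."""
    if not is_categorical:
        return "not chi-square (consider a t-test, ANOVA, or regression)"
    if n_variables not in (1, 2) or n_samples < 1:
        raise ValueError("expected 1 or 2 categorical variables and at least 1 sample")
    if n_samples > 1:
        return "homogeneity"          # multiple samples compared on one variable
    if n_variables == 1:
        return "goodness of fit"      # one sample, one variable
    return "independence"             # one sample classified by two variables

print(choose_test(True, 1, 1))
print(choose_test(True, 2, 1))
print(choose_test(True, 1, 3))
```

The sample count is checked before the variable count because, in this guide's scheme, multiple samples always point to homogeneity.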

Quick Decision Rules

Testing if data matches a theory/model? → Goodness of Fit
One sample classified by TWO variables? → Independence
Comparing MULTIPLE groups on ONE variable? → Homogeneity
Random assignment to treatment groups? → Usually Homogeneity

8. Conditions and Assumptions

Required Conditions (ALL Tests)

Random Sampling or Random Assignment
- Ensures representative data
- If violated: Cannot generalize to population
Independence of Observations
- Each observation must be independent
- If violated: p-values are unreliable
All Expected Frequencies ≥ 5
- MOST CRITICAL CONDITION
- Check EVERY expected cell
- If violated: Combine categories, collect more data, or use Fisher's exact test (for 2×2)
Categorical Data (Counts)
- Must use frequencies, NOT proportions or percentages
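
The expected-frequency condition is easy to check mechanically. The sketch below flags every cell whose expected count is under 5; the table is hypothetical and both helper names are illustrative.

```python
def expected_counts(table):
    """Expected counts for an r × c table: (row total × column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def small_expected_cells(expected, minimum=5):
    """Return (row index, column index, expected count) for each cell below the minimum."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]

# Hypothetical 2×2 table of observed counts.
table = [[10, 5],
         [22, 3]]
flagged = small_expected_cells(expected_counts(table))
print(flagged)
```

Note that the condition concerns expected counts, not observed ones. In this table the cell with an observed count of 3 has an expected count of exactly 5 and passes, while the cell with an observed count of 5 has an expected count of 3 and is flagged.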

9. Interpreting Results

Understanding the Test Statistic

Small χ²: Observed frequencies close to expected → likely fail to reject H₀
Large χ²: Observed frequencies far from expected → likely reject H₀
A perfect fit gives χ² = 0 (rare with real data, because sampling variation almost always produces some difference)

Making Decisions

Using Critical Value:

If χ² > critical value → REJECT H₀
If χ² ≤ critical value → FAIL TO REJECT H₀

Using p-value:

If p-value < α → REJECT H₀
If p-value ≥ α → FAIL TO REJECT H₀
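
The p-value rule can be sketched without a statistics library. The function below uses the standard closed-form series for the right-tail area of a chi-square distribution with whole-number df, which this guide does not derive; in practice a table, calculator, or statistics package would normally supply the p-value. The names `chi_square_p_value` and `decide` are illustrative.

```python
import math

def chi_square_p_value(chi_sq, df):
    """Right-tail area P(X >= chi_sq) for a chi-square variable with integer df >= 1."""
    if df < 1 or chi_sq < 0:
        raise ValueError("need df >= 1 and chi_sq >= 0")
    half = chi_sq / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) times the sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal-tail term plus a finite series in odd powers of sqrt(x)
    root = math.sqrt(chi_sq)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= chi_sq / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-half) / math.sqrt(2 * math.pi)
    return math.erfc(root / math.sqrt(2)) + 2 * normal_pdf * total

def decide(chi_sq, df, alpha=0.05):
    """Apply the p-value rule: reject H0 only when p-value < alpha."""
    p_value = chi_square_p_value(chi_sq, df)
    return p_value, ("reject H0" if p_value < alpha else "fail to reject H0")

print(decide(14.0, 2))   # hypothetical statistic from a 2×3 table
print(decide(5.0, 5))    # hypothetical statistic from a six-category test
```

As a sanity check, feeding in the familiar table critical values (3.841 for df = 1, 5.991 for df = 2, 11.070 for df = 5) returns p-values of about 0.05, so the critical-value rule and the p-value rule agree.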

Writing Conclusions

Template for Goodness of Fit:

"At the [α] significance level, there is [sufficient/insufficient] evidence to conclude that the data does not follow the [specified distribution]."

Template for Independence:

"At the [α] significance level, there is [sufficient/insufficient] evidence to conclude that [variable 1] and [variable 2] are associated."

Template for Homogeneity:

"At the [α] significance level, there is [sufficient/insufficient] evidence to conclude that the distribution of [variable] differs across [populations]."

Important Reminder

Association ≠ Causation!

Even if a chi-square test shows two variables are associated, this does NOT prove that one causes the other. There may be confounding variables or the relationship may be indirect.

10. Common Mistakes to Avoid

Top 10 Mistakes

Using proportions instead of counts
- WRONG: Entering 0.35 (35%)
- RIGHT: Entering 35 (the count)
Not checking expected frequency condition
- Always calculate expected values BEFORE computing χ²
- Check that ALL are ≥ 5
Confusing independence and homogeneity
- Look at study design, not just the table
Using wrong degrees of freedom
- Goodness of fit: k - 1
- Independence/Homogeneity: (r-1)(c-1)
Trying to use a left-tailed or two-tailed test
- The chi-square rejection region is ALWAYS in the right tail
Using chi-square for quantitative data
- Chi-square is for categories only!
Claiming causation from association
- Association does not imply causation
Forgetting to sum over ALL cells
- χ² must include contribution from every cell
Not stating conclusion in context
- Always relate back to the original problem
Rounding too early
- Keep extra decimals in calculations
- Round only final answer
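
The first mistake in the list is worth seeing numerically. The sketch below computes the statistic once from counts and once from proportions, using hypothetical die-roll counts; `chi_square_stat` is an illustrative helper name.

```python
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for a six-sided die rolled 120 times.
counts = [25, 17, 15, 23, 24, 16]
n = sum(counts)

right = chi_square_stat(counts, [n / 6] * 6)                   # counts vs expected counts
wrong = chi_square_stat([c / n for c in counts], [1 / 6] * 6)  # proportions vs proportions

print(round(right, 4), round(wrong, 4), round(wrong * n, 4))
```

Entering proportions divides the statistic by the sample size n, so the result no longer reflects how much data was collected and the test will almost never reject H₀.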

Study Tips

Make flashcards for the three test types and when to use each
Practice calculating expected frequencies - this is key!
Memorize the df formulas for each test
Always check the expected frequency condition
Work through complete examples step-by-step
Practice identifying which test to use from word problems
Keep the quick reference sheet handy during practice

Safaa Dabagh
