Safaa Dabagh

Comprehensive Study Guide

Module 12: Chi-Square Tests

Table of Contents

  1. Overview of Chi-Square Tests
  2. Chi-Square Distribution
  3. Chi-Square Goodness of Fit Test
  4. Chi-Square Test of Independence
  5. Chi-Square Test of Homogeneity
  6. Comparison of All Three Tests
  7. Choosing the Right Test
  8. Conditions and Assumptions
  9. Interpreting Results
  10. Common Mistakes to Avoid

1. Overview of Chi-Square Tests

Purpose: Chi-square tests analyze categorical data to determine if observed frequencies differ significantly from expected frequencies.

Three Types of Chi-Square Tests

  • Goodness of Fit: Tests if sample data fits an expected distribution
  • Independence: Tests if two categorical variables are related
  • Homogeneity: Tests if multiple populations have the same distribution

When to Use Chi-Square

  • You have categorical data (counts/frequencies)
  • You want to compare observed vs. expected frequencies
  • You need to test relationships between categorical variables
  • Do NOT use for quantitative data (use t-tests or ANOVA instead)

2. Chi-Square Distribution

Properties

Key Characteristics

3. Chi-Square Goodness of Fit Test

When to Use

Step-by-Step Procedure

  1. State hypotheses
    • H₀: The data follows the specified distribution
    • Hₐ: The data does NOT follow the specified distribution
  2. Check conditions
    • Random sampling
    • Independent observations
    • All expected frequencies ≥ 5
  3. Calculate expected frequencies
    • E = n × p (for each category)
  4. Calculate test statistic
    • χ² = Σ[(O - E)² / E]
  5. Find degrees of freedom
    • df = k - 1
  6. Find p-value or critical value
  7. Make decision and conclude

Formulas for Goodness of Fit

Expected Frequency:

E = n × p

where n = sample size, p = expected proportion

Test Statistic:

χ² = Σ[(O - E)² / E]

Sum over all categories

Degrees of Freedom:

df = k - 1

where k = number of categories

Example

Problem: Test if a die is fair with 120 rolls

Expected: Each face should appear 120/6 = 20 times

df: 6 - 1 = 5
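
Worked sketch: the die example can be carried through numerically with the formulas above. The guide gives no observed counts, so the counts below are made up for illustration, and the function name goodness_of_fit is hypothetical.

```python
def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    n = sum(observed)                               # sample size
    expected = [n * p for p in proportions]         # E = n × p for each category
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                          # df = k - 1
    return chi2, df, expected

# Hypothetical counts for faces 1-6 over 120 rolls (NOT from the guide)
observed = [25, 17, 15, 23, 24, 16]
chi2, df, expected = goodness_of_fit(observed, [1 / 6] * 6)

print(expected)    # each expected count is 120/6 = 20, up to floating-point rounding
print(chi2, df)
```

With these made-up counts, χ² comes out to about 5.0 with df = 5. That is below the α = 0.05 critical value of 11.070 from a standard chi-square table, so H₀ (the die is fair) would not be rejected.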

4. Chi-Square Test of Independence

When to Use

Step-by-Step Procedure

  1. State hypotheses
    • H₀: The two variables are independent (no association)
    • Hₐ: The two variables are dependent (associated)
  2. Check conditions (same as goodness of fit)
  3. Calculate expected frequencies
    • E = (Row Total × Column Total) / Grand Total
    • Calculate for EACH cell
  4. Calculate test statistic
    • χ² = Σ[(O - E)² / E] over ALL cells
  5. Find degrees of freedom
    • df = (r - 1)(c - 1)
  6. Find p-value or critical value
  7. Make decision and conclude

Formulas for Test of Independence

Expected Frequency (for each cell):

E = (Row Total × Column Total) / Grand Total

Test Statistic:

χ² = Σ[(O - E)² / E]

Sum over ALL cells in contingency table

Degrees of Freedom:

df = (r - 1)(c - 1)

where r = rows, c = columns

Example

Problem: Test if gender and political party are independent (one sample of 500)

Setup: 2×3 contingency table (2 genders × 3 parties)

df: (2-1)(3-1) = 2
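
Worked sketch: the calculation for a 2×3 table can be illustrated as follows. The cell counts are made up (the guide specifies only the sample size of 500), and the function name chi_square_table is hypothetical.

```python
def chi_square_table(table):
    """Return (chi2, df, expected) for an r × c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (Row Total × Column Total) / Grand Total, computed for EACH cell
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    # Sum (O - E)² / E over ALL cells
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)   # df = (r - 1)(c - 1)
    return chi2, df, expected

# Hypothetical counts: rows = 2 genders, columns = 3 parties, total = 500
observed = [
    [120, 100, 30],
    [80, 130, 40],
]
chi2, df, expected = chi_square_table(observed)

print(expected[0])          # [100.0, 115.0, 35.0]
print(round(chi2, 2), df)
```

With these made-up counts, χ² is about 13.34 with df = 2. That exceeds the α = 0.05 critical value of 5.991, so H₀ (independence) would be rejected.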

5. Chi-Square Test of Homogeneity

When to Use

Step-by-Step Procedure

  1. State hypotheses
    • H₀: The distribution is the same for all populations
    • Hₐ: At least one population has a different distribution
  2. Check conditions (same as others)
  3. Calculate expected frequencies
    • SAME formula as independence: E = (RT × CT) / GT
  4. Calculate test statistic
    • SAME formula: χ² = Σ[(O - E)² / E]
  5. Find degrees of freedom
    • SAME formula: df = (r - 1)(c - 1)
  6. Find p-value or critical value
  7. Make decision and conclude

Key Difference from Independence

Calculations are IDENTICAL, but study design differs:

  • Independence: One sample, two variables
  • Homogeneity: Multiple samples, one variable

Example

Problem: Compare satisfaction across 3 cities (100 sampled from each city)

Setup: 3×3 table (3 cities × 3 satisfaction levels)

df: (3-1)(3-1) = 4
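
Worked sketch: the city example uses the same arithmetic as the test of independence. The only difference is that the row totals are fixed by the sampling design (100 per city). The counts and city labels below are made up for illustration.

```python
# Hypothetical survey counts (NOT from the guide): 100 people sampled per city
samples = {
    "City A": [50, 30, 20],   # satisfied, neutral, dissatisfied
    "City B": [40, 35, 25],
    "City C": [30, 40, 30],
}

rows = list(samples.values())
row_totals = [sum(row) for row in rows]          # fixed at 100 each by the design
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# Same formulas as the test of independence
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(rows) - 1) * (len(col_totals) - 1)

print(row_totals)            # [100, 100, 100]
print(round(chi2, 2), df)
```

With these made-up counts, χ² is about 8.43 with df = 4. That is below the α = 0.05 critical value of 9.488, so H₀ (same distribution in all three cities) would not be rejected.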

6. Comparison of All Three Tests

Feature            | Goodness of Fit                      | Independence                | Homogeneity
Variables          | 1 categorical                        | 2 categorical               | 1 categorical
Samples            | 1 sample                             | 1 sample                    | 2+ samples
Research Question  | Does data fit expected distribution? | Are variables related?      | Same distribution across groups?
Example            | Is a die fair?                       | Gender vs party preference? | Three cities' political views same?
Expected Frequency | E = n × p                            | E = (RT × CT) / GT          | E = (RT × CT) / GT
Test Statistic     | χ² = Σ[(O-E)²/E]                     | χ² = Σ[(O-E)²/E]            | χ² = Σ[(O-E)²/E]
Degrees of Freedom | df = k - 1                           | df = (r-1)(c-1)             | df = (r-1)(c-1)

7. Choosing the Right Test

Decision Process

  1. Is the data categorical?
    • No → Use t-test, ANOVA, or regression (NOT chi-square)
    • Yes → Continue
  2. How many categorical variables?
    • One variable → Goodness of Fit OR Homogeneity (check #3)
    • Two variables → Independence OR Homogeneity (check #3)
  3. How many samples?
    • One sample → Goodness of Fit (if 1 variable) or Independence (if 2 variables)
    • Multiple samples → Homogeneity

Quick Decision Rules

  • Testing if data matches a theory/model? → Goodness of Fit
  • One sample classified by TWO variables? → Independence
  • Comparing MULTIPLE groups on ONE variable? → Homogeneity
  • Random assignment to treatment groups? → Usually Homogeneity
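
The decision process above can be sketched as a small function. This is only an illustration of the three questions. The function name choose_test and its parameters are hypothetical, and real word problems still require reading the study design carefully.

```python
def choose_test(is_categorical, n_variables, n_samples):
    """Follow the three questions of the decision process and name a test."""
    # Question 1: is the data categorical?
    if not is_categorical:
        return "Not chi-square: use a t-test, ANOVA, or regression"
    # Question 3: multiple samples always point to homogeneity
    if n_samples > 1:
        return "Homogeneity"
    # Question 2 (one sample): one variable vs. two variables
    if n_variables == 1:
        return "Goodness of Fit"
    if n_variables == 2:
        return "Independence"
    raise ValueError("this sketch only covers one or two categorical variables")

print(choose_test(True, 1, 1))    # Goodness of Fit  (is a die fair?)
print(choose_test(True, 2, 1))    # Independence     (gender vs party, one sample)
print(choose_test(True, 1, 3))    # Homogeneity      (satisfaction in 3 cities)
```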

8. Conditions and Assumptions

Required Conditions (ALL Tests)

  1. Random Sampling or Random Assignment
    • Ensures representative data
    • If violated: Cannot generalize to population
  2. Independence of Observations
    • Each observation must be independent
    • If violated: p-values are unreliable
  3. All Expected Frequencies ≥ 5
    • MOST CRITICAL CONDITION
    • Check EVERY expected cell
    • If violated: Combine categories, collect more data, or use Fisher's exact test (for 2×2)
  4. Categorical Data (Counts)
    • Must use frequencies, NOT proportions or percentages
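
Condition 3 can be checked mechanically before computing χ². The sketch below lists every cell of a contingency table whose expected count falls below 5. The function name low_expected_cells and both small tables are made up for illustration.

```python
def low_expected_cells(table, minimum=5):
    """Return (row, column, expected) for each cell with expected count < minimum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    low = []
    for i, rt in enumerate(row_totals):
        for j, ct in enumerate(col_totals):
            e = rt * ct / grand_total      # E = (RT × CT) / GT
            if e < minimum:
                low.append((i, j, e))
    return low

# Hypothetical 2×2 tables of counts
too_small = [[8, 2], [3, 7]]      # n = 20
large_enough = [[16, 4], [6, 14]] # n = 40

print(low_expected_cells(too_small))      # [(0, 1, 4.5), (1, 1, 4.5)]
print(low_expected_cells(large_enough))   # []
```

An empty list means the condition is met. Note that the check is on the EXPECTED counts: in the first table an observed count of 7 sits in a cell whose expected count is only 4.5.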

9. Interpreting Results

Understanding the Test Statistic

Making Decisions

Using Critical Value:

  • If χ² > critical value → REJECT H₀
  • If χ² ≤ critical value → FAIL TO REJECT H₀

Using p-value:

  • If p-value < α → REJECT H₀
  • If p-value ≥ α → FAIL TO REJECT H₀
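
The p-value rule can be sketched in code. The p-value is the right-tail area of the chi-square distribution beyond the test statistic. The sketch below computes that area with a series for the incomplete gamma function so that it needs only the standard library. The function names chi2_right_tail and decide are hypothetical, and in practice a statistics package or calculator would normally be used. The series is adequate for the moderate χ² values seen in coursework.

```python
import math

def chi2_right_tail(x, df):
    """Approximate P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z)
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def decide(chi2, df, alpha=0.05):
    """Apply the p-value rule: reject H0 only when p-value < alpha."""
    p_value = chi2_right_tail(chi2, df)
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return p_value, decision

# Hypothetical test statistics
print(decide(7.5, 2))   # small p-value: reject H0
print(decide(5.0, 5))   # large p-value: fail to reject H0
```

As a sanity check, chi2_right_tail(5.991, 2) returns approximately 0.05, matching the table critical value for df = 2 at α = 0.05. This is why the critical-value rule and the p-value rule always lead to the same decision.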

Writing Conclusions

Template for Goodness of Fit:

"At the [α] significance level, there is [sufficient/insufficient] evidence to conclude that the data does not follow the [specified distribution]."

Template for Independence:

"At the [α] significance level, there is [sufficient/insufficient] evidence to conclude that [variable 1] and [variable 2] are associated."

Template for Homogeneity:

"At the [α] significance level, there is [sufficient/insufficient] evidence to conclude that the distribution of [variable] differs across [populations]."

Important Reminder

Association ≠ Causation!

Even if a chi-square test shows two variables are associated, this does NOT prove that one causes the other. There may be confounding variables or the relationship may be indirect.

10. Common Mistakes to Avoid

Top 10 Mistakes

  1. Using proportions instead of counts
    • WRONG: Entering 0.35 (35%)
    • RIGHT: Entering 35 (the count)
  2. Not checking expected frequency condition
    • Always calculate expected values BEFORE computing χ²
    • Check that ALL are ≥ 5
  3. Confusing independence and homogeneity
    • Look at study design, not just the table
  4. Using wrong degrees of freedom
    • Goodness of fit: k - 1
    • Independence/Homogeneity: (r-1)(c-1)
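  5. Using the wrong tail
    • Chi-square tests are ALWAYS right-tailed: only large χ² values count as evidence against H₀, so never use a left-tailed or two-tailed rejection region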
  6. Using chi-square for quantitative data
    • Chi-square is for categories only!
  7. Claiming causation from association
    • Association does not imply causation
  8. Forgetting to sum over ALL cells
    • χ² must include contribution from every cell
  9. Not stating conclusion in context
    • Always relate back to the original problem
  10. Rounding too early
    • Keep extra decimals in calculations
    • Round only final answer
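
Mistake 1 can be illustrated numerically. The sketch below assumes a made-up sample of 100 with three categories and compares χ² computed from counts with χ² computed (incorrectly) from proportions. The function name chi_square and all numbers are hypothetical.

```python
def chi_square(observed, expected):
    """χ² = Σ[(O - E)² / E], summed over every category."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n = 100  # hypothetical sample size

# RIGHT: counts (35 means 35 observations)
chi2_counts = chi_square([35, 40, 25], [30, 40, 30])

# WRONG: the same data entered as proportions (0.35 means 35%)
chi2_props = chi_square([0.35, 0.40, 0.25], [0.30, 0.40, 0.30])

print(chi2_counts)       # about 1.667
print(chi2_props)        # about 0.0167
print(chi2_props * n)    # multiplying by n recovers the correct statistic
```

Entering proportions shrinks the statistic by a factor of n, so the test would almost never reject H₀ regardless of the data.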

Study Tips

  • Make flashcards for the three test types and when to use each
  • Practice calculating expected frequencies - this is key!
  • Memorize the df formulas for each test
  • Always check the expected frequency condition
  • Work through complete examples step-by-step
  • Practice identifying which test to use from word problems
  • Keep the quick reference sheet handy during practice