Lesson 4: Choosing the Right Chi-Square Test
Decision Making, Conditions, and Common Mistakes
Learning Objectives
By the end of this lesson, you will be able to:
- Choose the appropriate chi-square test based on the research question
- Use a decision flowchart to identify which test to apply
- Check all necessary conditions for valid chi-square testing
- Recognize common mistakes in chi-square testing
- Distinguish chi-square tests from other statistical tests
Decision Flowchart: Which Chi-Square Test?
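START: What type of data do you have?
- Quantitative (means or measurements) → Use a t-test or ANOVA, NOT chi-square!
- Categorical, ONE variable, ONE sample → Goodness of Fit (testing if data fits an expected distribution)
- Categorical, TWO variables, ONE sample → Independence (testing if variables are associated)
- Categorical, ONE variable, MULTIPLE samples → Homogeneity (comparing distributions across groups)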
Quick Comparison Table
| Feature | Goodness of Fit | Independence | Homogeneity |
|---|---|---|---|
| Variables | 1 categorical | 2 categorical | 1 categorical |
| Samples | 1 sample | 1 sample | 2+ samples |
| Research Question | Does the data fit an expected distribution? | Are the variables related? | Do the groups have the same distribution? |
| Expected Frequency | E = n × p | E = (row total × column total) / grand total | E = (row total × column total) / grand total |
| Degrees of Freedom | df = k - 1 (k categories) | df = (r-1)(c-1) (r rows, c columns) | df = (r-1)(c-1) (r rows, c columns) |
| Example | Is a die fair? | Is gender related to party preference? | Do three cities have the same political views? |
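The two expected-frequency formulas and the degrees-of-freedom rules in the table can be sketched in a few lines of Python. This is a minimal illustration rather than part of the lesson's materials: the function names and the 2×3 table of counts are made up for the example.

```python
def gof_expected(n, proportions):
    """Goodness of fit: E = n * p for each category."""
    return [n * p for p in proportions]


def two_way_expected(table):
    """Independence or homogeneity: E = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def gof_df(k):
    """df = k - 1, where k is the number of categories."""
    return k - 1


def two_way_df(table):
    """df = (r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


# A 3:1 ratio with 200 observations (two categories)
ratio_expected = gof_expected(200, [0.75, 0.25])

# HYPOTHETICAL 2x3 table of observed counts (made-up numbers)
observed = [[20, 30, 50],
            [30, 30, 40]]
table_expected = two_way_expected(observed)

print(ratio_expected, gof_df(2))
print(table_expected, two_way_df(observed))
```

Independence and homogeneity share `two_way_expected` and `two_way_df` here because, as the table shows, the two tests differ in study design and not in arithmetic.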
Practice Identifying the Test
Scenario 1
A genetics researcher observes the flower colors in 200 pea plants. According to Mendelian genetics, the colors should appear in a 3:1 ratio (red to white). The researcher wants to test if the observed data matches this theoretical ratio.
Test: Goodness of Fit
Reasoning:
- ONE variable: flower color (categorical)
- ONE sample: 200 pea plants
- Question: Does the observed distribution match the expected 3:1 ratio?
Setup: df = k - 1 = 2 - 1 = 1 (two categories: red and white)
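To see the full calculation for this setup, here is a small Python sketch. The scenario does not report the observed counts, so the split of 160 red and 40 white is invented purely for illustration. For df = 1 the right-tail p-value can be computed with the standard library, because a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math

n = 200
expected = [n * 0.75, n * 0.25]   # 3:1 ratio -> 150 red, 50 white
observed = [160, 40]              # HYPOTHETICAL counts, not from the scenario

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Right-tail p-value, valid for df = 1 only:
# P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.4f}")
```

With these invented counts the statistic is about 2.67 and the p-value is roughly 0.10, so the 3:1 ratio would not be rejected at the 0.05 level.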
Scenario 2
A marketing team surveys 500 random consumers and records both their age group (18-34, 35-54, 55+) and their preferred shopping method (online, in-store, both). They want to know if age and shopping preference are related.
Test: Independence
Reasoning:
- TWO variables: age group AND shopping preference (both categorical)
- ONE sample: 500 consumers classified by both variables
- Question: Are age and shopping preference associated?
Setup: df = (r-1)(c-1) = (3-1)(3-1) = 4
Scenario 3
A school district wants to compare student satisfaction across three different schools. They randomly sample 100 students from School A, 100 from School B, and 100 from School C. Each student rates satisfaction as Low, Medium, or High. The district wants to know if satisfaction levels are the same across all three schools.
Test: Homogeneity
Reasoning:
- ONE variable: satisfaction level (categorical)
- MULTIPLE samples: three separate samples from three schools
- Question: Do the three schools have the same distribution of satisfaction?
Setup: df = (r-1)(c-1) = (3-1)(3-1) = 4 (3 schools × 3 satisfaction levels)
Scenario 4
A university randomly assigns 150 students to three different study techniques (50 per technique). After the exam, students are classified as passing or failing. Researchers want to know if the three techniques have different pass/fail distributions.
Test: Homogeneity
Reasoning:
- ONE variable: pass/fail (categorical)
- MULTIPLE samples: students assigned to three different techniques (predetermined group sizes)
- Question: Do the three techniques produce the same distribution of outcomes?
Setup: df = (r-1)(c-1) = (3-1)(2-1) = 2
Note: Random assignment to groups is a key indicator of homogeneity!
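The sketch below works through the expected counts and the test statistic for this design in Python. The scenario gives the group sizes (50 per technique) but no results, so the pass/fail counts are invented for illustration. For df = 2 the right-tail probability has the simple closed form exp(-x/2), which avoids any dependency beyond the standard library.

```python
import math

# Rows: techniques A, B, C (50 students each, as in the scenario)
# Columns: pass, fail. HYPOTHETICAL results, invented for illustration.
observed = [[40, 10],
            [35, 15],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total * column total) / grand total
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail p-value, valid for df = 2 only: P(chi-square > x) = exp(-x / 2)
p_value = math.exp(-chi_sq / 2)

print(f"expected = {expected}")
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.4f}")
```

Every technique has the same expected counts (35 pass, 15 fail) because the group sizes are equal. With these invented results the statistic is about 4.76 and the p-value is roughly 0.09.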
Scenario 5
A casino manager rolls a die 600 times to verify it's fair. All six outcomes should be equally likely if the die is fair.
Test: Goodness of Fit
Reasoning:
- ONE variable: die outcome (categorical: 1, 2, 3, 4, 5, or 6)
- ONE sample: 600 rolls
- Question: Do the observed frequencies fit the expected equal distribution?
Setup: df = k - 1 = 6 - 1 = 5, Expected for each outcome = 600/6 = 100
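As a rough sketch of how the test would finish, the Python below compares the statistic with the standard critical value for df = 5 at α = 0.05 (11.070). The scenario does not list the outcome of the 600 rolls, so the observed counts are hypothetical, and the 0.05 significance level is an assumption.

```python
# Critical values of the chi-square distribution at alpha = 0.05 (right tail)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

rolls = 600
observed = [95, 105, 110, 90, 98, 102]   # HYPOTHETICAL counts for faces 1 to 6
expected = [rolls / 6] * 6               # 100 per face if the die is fair

assert sum(observed) == rolls            # sanity check on the made-up data

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Right-tailed decision: reject fairness only if the statistic is large
reject_fairness = chi_sq > CRITICAL_05[df]

print(f"chi-square = {chi_sq:.2f}, df = {df}, reject fairness: {reject_fairness}")
```

These invented counts give a statistic of about 2.58, well below 11.070, so there would be no evidence that the die is unfair.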
Conditions for ALL Chi-Square Tests
Required Conditions Checklist
Before conducting ANY chi-square test, verify these conditions:
Data must come from a random sample or random assignment to groups. This ensures observations represent the population and aren't biased.
What if violated? Results cannot be generalized to the population.
Each observation must be independent - one observation shouldn't influence another.
What if violated? Test results become unreliable; actual p-values may be different from calculated p-values.
Common violations: Repeated measures on same subjects, clustered sampling without adjustment, paired data.
ALL expected cell counts must be at least 5. This is the most commonly checked condition.
Why? The chi-square distribution is only a good approximation when expected counts are sufficiently large.
What if violated? Options:
- Combine categories (if logically reasonable)
- Collect more data
- For 2×2 tables, use Fisher's exact test instead
General guideline: Total sample size should be at least 5 times the number of cells. Treat this as a quick screen only; you still need to check the expected count in every cell.
Example: For a 3×4 table (12 cells), you should have n ≥ 60.
Use actual counts, NOT proportions or percentages.
Common mistake: Entering percentages instead of raw counts.
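The expected-count condition is easy to automate. The following Python sketch flags every cell whose expected count falls below 5 and also applies the sample-size guideline; the helper names and the small 2×2 table are hypothetical.

```python
def expected_counts(table):
    """E = (row total * column total) / grand total for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def small_expected_cells(expected, minimum=5):
    """Return (row, column, E) for every cell with an expected count below minimum."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]


def meets_size_guideline(table):
    """Guideline from the lesson: n should be at least 5 times the number of cells."""
    n = sum(sum(row) for row in table)
    cells = len(table) * len(table[0])
    return n >= 5 * cells


# HYPOTHETICAL 2x2 table of raw counts (n = 25)
observed = [[9, 1],
            [11, 4]]

expected = expected_counts(observed)
problems = small_expected_cells(expected)

print("expected counts:", expected)
print("cells with E < 5:", problems)
print("n >= 5 x cells:", meets_size_guideline(observed))
```

In this made-up table the sample-size guideline is satisfied (25 ≥ 20), yet two cells have expected counts of 2 and 3, so the per-cell check is the one that decides. Because the table is 2×2, Fisher's exact test would be the natural alternative.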
Common Mistakes to Avoid
Mistake #1: Using Proportions Instead of Counts
Wrong: Entering 0.45 (45%) into the chi-square formula
Right: Entering 45 (the actual count of observations)
Why it matters: Chi-square tests require raw frequencies. Convert percentages back to counts first!
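A quick Python sketch shows how much this mistake distorts the result. It tests a 45% / 55% split against a 50/50 expectation three ways: with proportions (wrong), and with counts for two assumed sample sizes, n = 100 and n = 1000. Both sample sizes are hypothetical.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# WRONG: proportions entered as if they were counts
from_proportions = chi_square([0.45, 0.55], [0.50, 0.50])

# RIGHT: raw counts (the sample sizes here are assumptions for illustration)
from_counts_n100 = chi_square([45, 55], [50, 50])
from_counts_n1000 = chi_square([450, 550], [500, 500])

print(f"proportions:     {from_proportions:.2f}")
print(f"counts, n=100:   {from_counts_n100:.2f}")
print(f"counts, n=1000:  {from_counts_n1000:.2f}")
```

The statistic computed from proportions is the true statistic divided by n, so it throws away the sample size. The same percentages give 1.0 with 100 observations and 10.0 with 1000, which is why the raw counts are needed.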
Mistake #2: Not Checking Expected Frequency Condition
Problem: Calculating χ² without verifying all E ≥ 5
Fix: ALWAYS calculate expected frequencies first and check the condition before proceeding
Example: If you have a cell with E = 3.2, you need to either combine categories or collect more data
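Combining categories can be sketched as follows in Python. The five-category counts are hypothetical, and `merge_categories` is an illustrative helper, not a library function. Note that the same categories must be merged in both the observed and the expected counts, and that degrees of freedom drop with the number of categories.

```python
def merge_categories(counts, indices):
    """Sum the counts at `indices` into one combined category placed at the end."""
    kept = [c for i, c in enumerate(counts) if i not in indices]
    merged = sum(counts[i] for i in indices)
    return kept + [merged]


# HYPOTHETICAL goodness-of-fit data with five categories (n = 60)
observed = [22, 20, 11, 4, 3]
expected = [24, 18, 12, 3, 3]

# Find the categories that violate E >= 5 (here: the last two)
small = [i for i, e in enumerate(expected) if e < 5]

observed_merged = merge_categories(observed, small)
expected_merged = merge_categories(expected, small)

df_before = len(expected) - 1
df_after = len(expected_merged) - 1

print("merged observed:", observed_merged)
print("merged expected:", expected_merged)
print("df:", df_before, "->", df_after)
```

After merging, every expected count is at least 5 and the test uses df = 3 instead of 4. This only makes sense when the combined categories form a meaningful group, such as an "other" category.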
Mistake #3: Confusing Independence and Homogeneity
Problem: These tests use identical calculations but answer different questions
Fix: Look at the study design:
- One sample, two variables → Independence
- Multiple samples, one variable → Homogeneity
Mistake #4: Using the Wrong Tail
Problem: Trying to use a left-tailed or two-tailed test with chi-square
Fix: Chi-square tests are ALWAYS right-tailed. Deviations in either direction (observed above or below expected) make χ² larger, so only large values of χ² count as evidence against the null hypothesis.
Why? We square the differences (O - E)², so negative deviations become positive.
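The effect of squaring can be checked numerically. In this Python sketch (with made-up counts), the same size of deviation in opposite directions produces the same statistic, and the p-value is taken from the right tail only. The `erfc` formula is valid for df = 1.

```python
import math


def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [50, 50]

# HYPOTHETICAL counts: same deviation, opposite directions
excess_in_first = chi_square([60, 40], expected)
excess_in_second = chi_square([40, 60], expected)

# Right-tail p-value for df = 1: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(excess_in_first / 2))

print(excess_in_first, excess_in_second)
print(f"right-tail p-value = {p_value:.4f}")
```

Both tables give a statistic of 4.0, so the statistic itself carries no sign. Any departure from the expected counts pushes it upward, which is why the evidence sits in the right tail.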
Mistake #5: Using Chi-Square for Quantitative Data
Problem: Trying to use chi-square on means or measurements
Fix: Chi-square is for categorical data only!
- If you have means/averages → Use t-test or ANOVA
- If you have counts/frequencies → Use chi-square
Mistake #6: Claiming Causation from Association
Problem: Concluding that one variable causes another just because they're associated
Fix: Association ≠ Causation! Chi-square tests only detect relationships, not cause-and-effect
Example: Finding that ice cream sales and drowning deaths are associated doesn't mean ice cream causes drowning (both are related to warm weather - a confounding variable)
Chi-Square vs. Other Tests
| Data Type | Question Type | Appropriate Test |
|---|---|---|
| Categorical (counts) | Does distribution match expected? | Chi-square goodness of fit |
| Categorical (counts) | Are two variables related? | Chi-square independence |
| Categorical (counts) | Same distribution across groups? | Chi-square homogeneity |
| Quantitative (means) | Compare two group means | Two-sample t-test |
| Quantitative (means) | Compare 3+ group means | ANOVA |
| Quantitative (two variables) | Is there a linear relationship? | Linear regression / Correlation |
| Categorical (proportions) | Compare two proportions | Two-proportion z-test |
Practice Decision-Making
Quick Decision Strategy
Ask yourself these questions in order:
1. Is the data categorical (counts/frequencies)?
   - No → Not chi-square (use t-test, ANOVA, or regression)
   - Yes → Continue to #2
2. How many categorical variables?
   - One → Continue to #3
   - Two → It's likely Independence (if one sample) or Homogeneity (if multiple samples)
3. How many samples?
   - One sample, one variable → Goodness of Fit
   - One sample, two variables → Independence
   - Multiple samples, one variable → Homogeneity
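The question sequence above can be written as a small function. This is a sketch of the lesson's rule of thumb in Python, with a hypothetical function name and arguments; it is a study aid and cannot replace reading the study design carefully.

```python
def choose_chi_square_test(is_categorical, n_variables, n_samples):
    """Apply the lesson's three questions in order and name the test."""
    # Question 1: is the data categorical?
    if not is_categorical:
        return "Not chi-square (use t-test, ANOVA, or regression)"
    # Questions 2 and 3: multiple samples point to homogeneity
    if n_samples > 1:
        return "Homogeneity"
    if n_variables == 1:
        return "Goodness of Fit"
    if n_variables == 2:
        return "Independence"
    raise ValueError("This sketch only covers one or two categorical variables")


# The five practice scenarios from this lesson, encoded by hand
scenarios = {
    "1: pea plant colors": (True, 1, 1),
    "2: age vs shopping method": (True, 2, 1),
    "3: satisfaction in three schools": (True, 1, 3),
    "4: three study techniques": (True, 1, 3),
    "5: die rolled 600 times": (True, 1, 1),
}

for name, args in scenarios.items():
    print(f"Scenario {name} -> {choose_chi_square_test(*args)}")
```

The function checks the number of samples before the number of variables because both branches of question 2 lead to Homogeneity whenever there are multiple samples.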
Key Takeaways
Remember These Points
- Decision-making: Number of variables + number of samples determines which test
- Always check conditions: Especially expected frequencies ≥ 5
- Use counts, not proportions: Enter raw frequencies into formulas
- Right-tailed only: Chi-square tests are never left-tailed or two-tailed
- Association ≠ Causation: Chi-square shows relationships, not cause-effect
- When in doubt: Look at the research question - what is being compared or tested?