Practice Problems: Chi-Square Tests
20 Comprehensive Problems Covering All Three Chi-Square Tests
How to Use These Practice Problems
- Work through problems systematically - don't skip the setup steps!
- Write out your work - this helps solidify your understanding
- Check conditions before calculating the test statistic
- Use the solution as a guide if you get stuck, but try first!
- Pay attention to conclusion wording - practice stating results in context
Part 1: Goodness of Fit Test (5 Problems)
Problem 1: Traffic Patterns
A city planner believes traffic accidents occur equally throughout the week. Data from 140 accidents shows:
| Day | Mon | Tue | Wed | Thu | Fri | Sat | Sun |
|---|---|---|---|---|---|---|---|
| Observed | 18 | 16 | 22 | 21 | 28 | 19 | 16 |
Test at α = 0.05: Is there evidence that accidents are NOT equally distributed across days?
Solution:
Test: Goodness of Fit (one variable, testing equal distribution)
H₀: Accidents are equally distributed across all 7 days
Hₐ: Accidents are NOT equally distributed
Expected: E = 140/7 = 20 for each day
df: k - 1 = 7 - 1 = 6
Calculate χ²:
χ² = (18-20)²/20 + (16-20)²/20 + (22-20)²/20 + (21-20)²/20 + (28-20)²/20 + (19-20)²/20 + (16-20)²/20
χ² = 0.2 + 0.8 + 0.2 + 0.05 + 3.2 + 0.05 + 0.8 = 5.30
Critical value (df=6, α=0.05): 12.592
Decision: 5.30 < 12.592, fail to reject H₀
Conclusion: At the 0.05 significance level, there is insufficient evidence to conclude that accidents are not equally distributed across days of the week.
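The hand calculation above can be double-checked with a few lines of code. This is a minimal sketch in plain Python; the function name `gof_statistic` is an illustrative choice, not part of any library.

```python
def gof_statistic(observed, expected):
    """Return the chi-square goodness-of-fit statistic, sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Problem 1 data: accidents by day, Monday through Sunday
observed = [18, 16, 22, 21, 28, 19, 16]
n = sum(observed)                 # 140 accidents in total
k = len(observed)                 # 7 categories
expected = [n / k] * k            # equal distribution: 20 per day

chi_sq = gof_statistic(observed, expected)
df = k - 1
print(f"chi-square = {chi_sq:.2f}, df = {df}")  # chi-square = 5.30, df = 6
```

The statistic is then compared with the tabulated critical value (12.592 here) exactly as in the written solution.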
Problem 2: College Majors
A university's historical data shows the following distribution of majors: 30% STEM, 25% Business, 20% Humanities, 15% Social Sciences, 10% Arts. A random sample of 200 incoming freshmen shows:
| Major | STEM | Business | Humanities | Social Sci | Arts |
|---|---|---|---|---|---|
| Observed | 72 | 48 | 32 | 28 | 20 |
Test at α = 0.01: Does this incoming class follow the historical distribution?
Solution:
Test: Goodness of Fit
H₀: The distribution matches historical percentages
Hₐ: The distribution does NOT match
Expected frequencies:
- STEM: 200 × 0.30 = 60
- Business: 200 × 0.25 = 50
- Humanities: 200 × 0.20 = 40
- Social Sciences: 200 × 0.15 = 30
- Arts: 200 × 0.10 = 20
All E ≥ 5
df: 5 - 1 = 4
χ²: (72-60)²/60 + (48-50)²/50 + (32-40)²/40 + (28-30)²/30 + (20-20)²/20
= 2.4 + 0.08 + 1.6 + 0.133 + 0 = 4.213
Critical value (df=4, α=0.01): 13.277
Decision: 4.213 < 13.277, fail to reject H₀
Conclusion: At the 0.01 significance level, there is insufficient evidence that the incoming class distribution differs from the historical distribution.
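When the null hypothesis specifies unequal proportions, the expected counts come from scaling each proportion by the sample size. The sketch below, with the hypothetical helper `expected_from_proportions`, reproduces the expected frequencies and the condition check from this solution.

```python
def expected_from_proportions(n, proportions):
    """Scale hypothesized proportions to expected counts for a sample of size n."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [n * p for p in proportions]


# Problem 2 data, in the order STEM, Business, Humanities, Social Sciences, Arts
observed = [72, 48, 32, 28, 20]
proportions = [0.30, 0.25, 0.20, 0.15, 0.10]

expected = expected_from_proportions(sum(observed), proportions)
condition_met = all(e >= 5 for e in expected)   # every expected count is at least 5
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 2) for e in expected])   # [60.0, 50.0, 40.0, 30.0, 20.0]
print(condition_met)                     # True
print(round(chi_sq, 3))                  # 4.213
```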
Problem 3: Jury Selection
A county's population is 60% White, 25% Hispanic, 10% Black, and 5% Asian. A random jury pool of 120 people contains:
| Ethnicity | White | Hispanic | Black | Asian |
|---|---|---|---|---|
| Observed | 82 | 22 | 10 | 6 |
Test at α = 0.05: Does the jury pool match the county's demographic distribution?
Solution:
Expected: White: 72, Hispanic: 30, Black: 12, Asian: 6
df: 4 - 1 = 3
χ²: (82-72)²/72 + (22-30)²/30 + (10-12)²/12 + (6-6)²/6
= 1.389 + 2.133 + 0.333 + 0 = 3.855
Critical value (df=3, α=0.05): 7.815
Conclusion: Since 3.855 < 7.815, fail to reject H₀. There is insufficient evidence that the jury pool differs from the county's demographic distribution.
Problem 4: Birth Months
A researcher wants to test if births are equally likely in each quarter. Out of 400 births:
| Quarter | Q1 (Jan-Mar) | Q2 (Apr-Jun) | Q3 (Jul-Sep) | Q4 (Oct-Dec) |
|---|---|---|---|---|
| Observed | 115 | 95 | 92 | 98 |
Test at α = 0.10: Are births equally distributed across quarters?
Solution:
Expected: 400/4 = 100 for each quarter
df: 4 - 1 = 3
χ²: (115-100)²/100 + (95-100)²/100 + (92-100)²/100 + (98-100)²/100
= 2.25 + 0.25 + 0.64 + 0.04 = 3.18
Critical value (df=3, α=0.10): 6.251
Conclusion: Fail to reject H₀. No evidence that births differ by quarter.
Problem 5: Lottery Numbers
A lottery uses digits 0-9. In 500 draws, the last digit frequencies are:
| Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Obs | 48 | 52 | 46 | 51 | 49 | 54 | 47 | 50 | 53 | 50 |
Test at α = 0.05: Is the lottery fair (all digits equally likely)?
Solution:
Expected: 500/10 = 50 for each digit
df: 10 - 1 = 9
χ²: Σ(O-E)²/E = (4 + 4 + 16 + 1 + 1 + 16 + 9 + 0 + 9 + 0)/50 = 60/50 = 1.20
Critical value (df=9, α=0.05): 16.919
Conclusion: Fail to reject H₀. There is insufficient evidence that the digits are not equally likely; the data are consistent with a fair lottery.
Part 2: Test of Independence (6 Problems)
Problem 6: Exercise and Health
A health researcher surveys 300 adults about exercise habits and self-reported health:
| | Excellent Health | Good Health | Poor Health | Total |
|---|---|---|---|---|
| Exercise Regularly | 70 | 50 | 10 | 130 |
| Don't Exercise | 40 | 80 | 50 | 170 |
| Total | 110 | 130 | 60 | 300 |
Test at α = 0.01: Are exercise habits and health status independent?
Solution:
Test: Independence (one sample, two variables)
H₀: Exercise and health are independent
Hₐ: Exercise and health are associated
Expected frequencies:
- Exercise & Excellent: (130×110)/300 = 47.67
- Exercise & Good: (130×130)/300 = 56.33
- Exercise & Poor: (130×60)/300 = 26
- No Exercise & Excellent: (170×110)/300 = 62.33
- No Exercise & Good: (170×130)/300 = 73.67
- No Exercise & Poor: (170×60)/300 = 34
df: (2-1)(3-1) = 2
χ²: Σ(O-E)²/E = 10.464 + 0.712 + 9.846 + 8.002 + 0.544 + 7.529 = 37.10
Critical value (df=2, α=0.01): 9.210
Conclusion: REJECT H₀. There is sufficient evidence that exercise habits and health status are associated.
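For a two-way table, every expected count uses the same rule, (row total × column total)/grand total, so the whole calculation is easy to automate. The following is a sketch in plain Python; `expected_counts` and `chi_square_table` are illustrative names, and the table is entered without its totals row and column.

```python
def expected_counts(table):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_table(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df


# Problem 6 data: rows are Exercise Regularly / Don't Exercise,
# columns are Excellent / Good / Poor health
table = [
    [70, 50, 10],
    [40, 80, 50],
]

for row in expected_counts(table):
    print([round(e, 2) for e in row])
chi_sq, df = chi_square_table(table)
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

The same two functions work unchanged for the other tables in Parts 2 and 3, since independence and homogeneity tests share the same arithmetic.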
Problem 7: Smartphone Preference
Survey of 400 consumers about age and smartphone brand:
| | iPhone | Android | Other | Total |
|---|---|---|---|---|
| 18-34 | 90 | 55 | 15 | 160 |
| 35-54 | 60 | 70 | 10 | 140 |
| 55+ | 30 | 55 | 15 | 100 |
| Total | 180 | 180 | 40 | 400 |
Test at α = 0.05: Are age and smartphone preference independent?
Solution:
df: (3-1)(3-1) = 4
Expected example: E(18-34 & iPhone) = (160×180)/400 = 72
Expected (iPhone, Android, Other): 18-34: 72, 72, 16; 35-54: 63, 63, 14; 55+: 45, 45, 10
χ² = Σ(O-E)²/E = 20.36
Critical value (df=4, α=0.05): 9.488
Conclusion: REJECT H₀. Age and smartphone preference are associated.
Problem 8: Education and Income (2×2 Table)
Random sample of 200 adults:
| | High Income | Low Income | Total |
|---|---|---|---|
| College Degree | 70 | 50 | 120 |
| No Degree | 30 | 50 | 80 |
| Total | 100 | 100 | 200 |
Test at α = 0.05: Are education and income independent?
Solution:
df: (2-1)(2-1) = 1
Expected: All cells = (row total × col total)/200
College & High: 60, College & Low: 60, No Degree & High: 40, No Degree & Low: 40
χ²: (70-60)²/60 + (50-60)²/60 + (30-40)²/40 + (50-40)²/40 = 1.667 + 1.667 + 2.5 + 2.5 = 8.33
Critical value (df=1, α=0.05): 3.841
Conclusion: REJECT H₀. Education and income level are associated.
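A 2×2 table offers a handy cross-check. The standard shortcut χ² = n(ad − bc)² / (row₁ · row₂ · col₁ · col₂), which is not used in the solution above and is brought in here only as a verification aid, gives the same value as the cell-by-cell sum (no continuity correction). The function names below are illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut for the 2x2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi_square_cells(a, b, c, d):
    """Same statistic computed cell by cell from the expected counts."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total


# Problem 8 data: College Degree (70, 50), No Degree (30, 50)
shortcut = chi_square_2x2(70, 50, 30, 50)
cell_by_cell = chi_square_cells(70, 50, 30, 50)
print(round(shortcut, 3), round(cell_by_cell, 3))
```

If the two printed numbers ever disagree, one of the expected counts was probably computed from the wrong row or column total.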
Problem 9: Voting and Party Affiliation
500 voters surveyed about party and whether they voted in the last election:
| | Voted | Didn't Vote | Total |
|---|---|---|---|
| Democrat | 140 | 60 | 200 |
| Republican | 130 | 70 | 200 |
| Independent | 50 | 50 | 100 |
| Total | 320 | 180 | 500 |
Test at α = 0.05: Are party affiliation and voting behavior independent?
Solution:
df: (3-1)(2-1) = 2
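Expected (Voted, Didn't Vote): Democrat: 128, 72; Republican: 128, 72; Independent: 64, 36
χ² = 1.125 + 2.000 + 0.031 + 0.056 + 3.063 + 5.444 = 11.72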
Critical value (df=2, α=0.05): 5.991
Conclusion: REJECT H₀. Party affiliation and voting behavior are associated.
Problem 10: Coffee and Productivity
250 employees surveyed:
| | High Productivity | Medium | Low | Total |
|---|---|---|---|---|
| Drinks Coffee | 55 | 70 | 25 | 150 |
| No Coffee | 35 | 45 | 20 | 100 |
| Total | 90 | 115 | 45 | 250 |
Test at α = 0.10: Are coffee consumption and productivity independent?
Solution:
df: (2-1)(3-1) = 2
Expected (High, Medium, Low): Drinks Coffee: 54, 69, 27; No Coffee: 36, 46, 18
χ² = 0.45 (very small)
Critical value (df=2, α=0.10): 4.605
Conclusion: Fail to reject H₀. No evidence that coffee and productivity are associated.
Problem 11: Social Media and Age
600 people surveyed:
| TikTok | Total | |||
|---|---|---|---|---|
| Under 30 | 60 | 100 | 90 | 250 |
| 30-50 | 110 | 70 | 20 | 200 |
| Over 50 | 130 | 10 | 10 | 150 |
| Total | 300 | 180 | 120 | 600 |
Test at α = 0.01: Are age and social media platform independent?
Solution:
df: (3-1)(3-1) = 4
χ² ≈ 167.69 (very large!)
Critical value (df=4, α=0.01): 13.277
Conclusion: STRONGLY REJECT H₀. Age and social media platform are clearly associated.
Part 3: Test of Homogeneity (5 Problems)
Problem 12: Customer Satisfaction Across Stores
A company samples 100 customers from each of three stores:
| Store | Satisfied | Neutral | Unsatisfied | Total |
|---|---|---|---|---|
| Store A | 70 | 20 | 10 | 100 |
| Store B | 60 | 30 | 10 | 100 |
| Store C | 55 | 25 | 20 | 100 |
| Total | 185 | 75 | 40 | 300 |
Test at α = 0.05: Do the three stores have the same satisfaction distribution?
Solution:
Test: Homogeneity (multiple samples, one variable)
H₀: Satisfaction distribution is the same for all three stores
Hₐ: At least one store has a different distribution
df: (3-1)(3-1) = 4
Expected for each store: Satisfied = 61.67, Neutral = 25, Unsatisfied = 13.33
χ² = 8.89
Critical value (df=4, α=0.05): 9.488
Conclusion: Fail to reject H₀. No evidence that satisfaction differs across stores.
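Under the homogeneity null hypothesis, every group shares one common distribution, which is estimated by pooling all the samples. The sketch below makes that explicit: each expected count is the group's sample size times the pooled proportion. The helper names are hypothetical.

```python
def pooled_proportions(table):
    """Proportion of all observations falling in each column category."""
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(col_totals)
    return [c / grand_total for c in col_totals]


def homogeneity_statistic(table):
    """Chi-square statistic with expected = group size * pooled proportion."""
    pooled = pooled_proportions(table)
    chi_sq = 0.0
    for row in table:
        group_size = sum(row)
        for observed, p in zip(row, pooled):
            expected = group_size * p
            chi_sq += (observed - expected) ** 2 / expected
    return chi_sq


# Problem 12 data: rows are Stores A, B, C;
# columns are Satisfied / Neutral / Unsatisfied
table = [
    [70, 20, 10],
    [60, 30, 10],
    [55, 25, 20],
]

print([round(p, 3) for p in pooled_proportions(table)])
print(round(homogeneity_statistic(table), 2))
```

Because group size × pooled proportion equals (row total × column total)/grand total, this is the same arithmetic as the test of independence; only the sampling design and the wording of the hypotheses differ.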
Problem 13: Teaching Methods
Students randomly assigned to three methods (60 per group), then tested:
| Method | Pass | Fail | Total |
|---|---|---|---|
| Method A | 48 | 12 | 60 |
| Method B | 52 | 8 | 60 |
| Method C | 40 | 20 | 60 |
| Total | 140 | 40 | 180 |
Test at α = 0.01: Do the methods produce different pass/fail rates?
Solution:
df: (3-1)(2-1) = 2
Expected for each method: Pass = 46.67, Fail = 13.33
χ² = 7.20
Critical value (df=2, α=0.01): 9.210
Conclusion: Fail to reject H₀. No evidence methods produce different outcomes.
Problem 14: Regional Preferences
Sample 150 from each of four regions about product preference:
| Region | Product A | Product B | Product C | Total |
|---|---|---|---|---|
| North | 60 | 50 | 40 | 150 |
| South | 50 | 60 | 40 | 150 |
| East | 55 | 55 | 40 | 150 |
| West | 45 | 50 | 55 | 150 |
| Total | 210 | 215 | 175 | 600 |
Test at α = 0.05: Do regions have the same product preferences?
Solution:
df: (4-1)(3-1) = 6
Expected for each region: Product A = 52.5, Product B = 53.75, Product C = 43.75
χ² = 7.52
Critical value (df=6, α=0.05): 12.592
Conclusion: Fail to reject H₀. Regions appear to have homogeneous preferences.
Problem 15: Drug Trial Outcomes
200 patients per treatment group:
| Treatment | Improved | No Change | Worsened | Total |
|---|---|---|---|---|
| Drug A | 130 | 50 | 20 | 200 |
| Drug B | 110 | 70 | 20 | 200 |
| Placebo | 80 | 90 | 30 | 200 |
| Total | 320 | 210 | 70 | 600 |
Test at α = 0.01: Do the three treatments produce different outcome distributions?
Solution:
df: (3-1)(3-1) = 4
Expected for each treatment: Improved = 106.67, No Change = 70, Worsened = 23.33
χ² = 26.16
Critical value (df=4, α=0.01): 13.277
Conclusion: REJECT H₀. The treatments produce different outcome distributions.
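The decision step is always the same right-tailed comparison against a critical value. As a sketch, the lookup table below contains only the critical values quoted in this problem set; the names `CRITICAL_VALUES` and `decide` are illustrative.

```python
# Critical values quoted in this problem set, keyed by (df, alpha)
CRITICAL_VALUES = {
    (1, 0.05): 3.841,
    (2, 0.10): 4.605,
    (2, 0.05): 5.991,
    (2, 0.01): 9.210,
    (3, 0.10): 6.251,
    (3, 0.05): 7.815,
    (4, 0.05): 9.488,
    (4, 0.01): 13.277,
    (6, 0.05): 12.592,
    (9, 0.05): 16.919,
}


def decide(chi_sq, df, alpha):
    """Right-tailed decision rule: reject H0 when the statistic exceeds the critical value."""
    critical = CRITICAL_VALUES[(df, alpha)]   # raises KeyError if the pair is not tabulated
    return "reject H0" if chi_sq > critical else "fail to reject H0"


# Problem 15 data: rows are Drug A, Drug B, Placebo;
# columns are Improved / No Change / Worsened
table = [
    [130, 50, 20],
    [110, 70, 20],
    [80, 90, 30],
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

chi_sq = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
    / (row_totals[i] * col_totals[j] / grand_total)
    for i in range(len(table))
    for j in range(len(table[0]))
)
df = (len(table) - 1) * (len(table[0]) - 1)
print(round(chi_sq, 2), df, decide(chi_sq, df, 0.01))
```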
Problem 16: School Discipline Policies
Sample 120 students from each of two schools about discipline fairness:
| School | Fair | Somewhat Fair | Unfair | Total |
|---|---|---|---|---|
| School 1 | 50 | 40 | 30 | 120 |
| School 2 | 45 | 45 | 30 | 120 |
| Total | 95 | 85 | 60 | 240 |
Test at α = 0.05: Do the schools have the same fairness perception distribution?
Solution:
df: (2-1)(3-1) = 2
Expected for each school: Fair = 47.5, Somewhat Fair = 42.5, Unfair = 30
χ² = 0.56
Critical value (df=2, α=0.05): 5.991
Conclusion: Fail to reject H₀. There is insufficient evidence that the fairness perception distributions differ between the two schools.
Part 4: Choosing the Appropriate Test (4 Problems)
Problem 17: Identifying the Test
Scenario A: A researcher surveys 500 college students and records both their major (STEM, Humanities, Business) and their preferred study location (Library, Dorm, Coffee Shop).
Scenario B: A quality control manager samples 100 widgets from Factory 1, 100 from Factory 2, and 100 from Factory 3. Each widget is classified as Pass or Fail.
Scenario C: A casino rolls a die 300 times to test if all six faces are equally likely.
Question: Identify which chi-square test (Goodness of Fit, Independence, or Homogeneity) is appropriate for each scenario and explain why.
Solution:
Scenario A: Independence
- One sample (500 students)
- Two variables (major AND study location)
- Question: Are major and study location associated?
Scenario B: Homogeneity
- Three samples (100 from each factory)
- One variable (pass/fail)
- Question: Do factories have same pass/fail distribution?
Scenario C: Goodness of Fit
- One sample (300 rolls)
- One variable (die outcome)
- Question: Does distribution match equal likelihood?
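The reasoning in this solution boils down to two counts: how many samples were drawn and how many categorical variables were recorded on each subject. A small sketch of that decision rule follows; the function name `choose_chi_square_test` is hypothetical.

```python
def choose_chi_square_test(num_samples, num_variables):
    """Pick the chi-square test from the number of samples and categorical variables."""
    if num_samples == 1 and num_variables == 1:
        return "Goodness of Fit"
    if num_samples == 1 and num_variables == 2:
        return "Independence"
    if num_samples >= 2 and num_variables == 1:
        return "Homogeneity"
    raise ValueError("design does not match any of the three chi-square tests")


print(choose_chi_square_test(1, 2))   # Scenario A: Independence
print(choose_chi_square_test(3, 1))   # Scenario B: Homogeneity
print(choose_chi_square_test(1, 1))   # Scenario C: Goodness of Fit
```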
Problem 18: Checking Conditions
A researcher plans to test independence between gender and voting preference. They survey 80 people, resulting in this table:
| | Democrat | Republican | Other |
|---|---|---|---|
| Male | O=25 | O=12 | O=3 |
| Female | O=28 | O=10 | O=2 |
Question: Calculate expected frequencies. Can the chi-square test be validly conducted? Why or why not?
Solution:
Row totals: Male: 40, Female: 40
Column totals: Democrat: 53, Republican: 22, Other: 5
Expected frequencies:
- Male & Democrat: (40×53)/80 = 26.5
- Male & Republican: (40×22)/80 = 11
- Male & Other: (40×5)/80 = 2.5
- Female & Democrat: 26.5
- Female & Republican: 11
- Female & Other: 2.5
Conclusion: NO, the test should NOT be conducted as is. Two cells have expected counts less than 5. Options: (1) Combine "Other" with another category, (2) Collect more data, or (3) Use Fisher's exact test.
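The expected-count condition can be checked mechanically before any test statistic is computed. The sketch below flags cells whose expected count falls under 5 and then tries option (1); merging "Other" into Republican is an arbitrary choice made purely for illustration, and the helper names are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def small_expected_cells(table, threshold=5):
    """List (row index, column index, expected count) for cells below the threshold."""
    return [
        (i, j, e)
        for i, row in enumerate(expected_counts(table))
        for j, e in enumerate(row)
        if e < threshold
    ]


# Problem 18 data: rows are Male / Female, columns are Democrat / Republican / Other
table = [[25, 12, 3], [28, 10, 2]]
print(small_expected_cells(table))    # [(0, 2, 2.5), (1, 2, 2.5)]

# Option (1): combine "Other" with another category (Republican, chosen arbitrarily)
merged = [[row[0], row[1] + row[2]] for row in table]
print(small_expected_cells(merged))   # []
```

Note that merging columns changes the table from 2×3 to 2×2, so the degrees of freedom drop from 2 to 1.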
Problem 19: Chi-Square vs. Other Tests
For each scenario, identify whether to use chi-square OR a different test:
A. Compare average test scores of students using three different study methods
B. Test if proportion of smokers differs between two cities
C. Determine if eye color and hair color are independent
D. Test if mean height differs between men and women
Solution:
A. ANOVA (comparing means of 3+ groups, quantitative data)
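B. Two-proportion z-test (comparing proportions from two populations); for a two-sided alternative, a chi-square test of homogeneity on the 2×2 table gives an equivalent result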
C. Chi-square test of independence (two categorical variables, one sample)
D. Two-sample t-test (comparing means of two groups, quantitative data)
Problem 20: Comprehensive Problem
A university administrator wants to know if student satisfaction with campus facilities is the same across three different campuses. They randomly survey 100 students from each campus:
| Campus | Very Satisfied | Satisfied | Dissatisfied | Total |
|---|---|---|---|---|
| Main Campus | 40 | 45 | 15 | 100 |
| North Campus | 35 | 50 | 15 | 100 |
| South Campus | 25 | 55 | 20 | 100 |
| Total | 100 | 150 | 50 | 300 |
Complete the following:
- Identify which chi-square test to use and explain why
- State the hypotheses
- Check all conditions
- Calculate the test statistic
- Find the critical value at α = 0.05
- Make a decision and state your conclusion
Complete Solution:
1. Test: Homogeneity
Reason: Three separate samples (one from each campus), one variable (satisfaction level)
2. Hypotheses:
- H₀: The distribution of satisfaction is the same across all three campuses
- Hₐ: At least one campus has a different satisfaction distribution
3. Conditions:
- Random samples from each campus
- Independent observations
- Expected frequencies: each equals (100 × column total)/300; the smallest is 16.67, so all are ≥ 5
4. Test Statistic:
Expected for each campus: Very Satisfied = 33.33, Satisfied = 50, Dissatisfied = 16.67
χ² = Σ(O-E)²/E = 3.50 + 1.00 + 1.00 = 5.50 (column contributions: Very Satisfied, Satisfied, Dissatisfied)
5. Critical Value:
df = (3-1)(3-1) = 4
Critical value (α=0.05, df=4) = 9.488
6. Decision and Conclusion:
Since 5.50 < 9.488, we fail to reject H₀.
Conclusion: At the 0.05 significance level, there is insufficient evidence to conclude that satisfaction levels differ across the three campuses. The campuses appear to have homogeneous satisfaction distributions.
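Steps 3 through 6 of this solution can be collected into one routine. The following is a sketch in plain Python under the assumption that the critical value is read from a table and passed in by hand (9.488 for df = 4, α = 0.05, as quoted above); `homogeneity_test` is an illustrative name.

```python
def homogeneity_test(table, critical_value, min_expected=5):
    """Run the condition check, statistic, df, and decision for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return {
        "expected": expected,
        "conditions_ok": min(min(row) for row in expected) >= min_expected,
        "df": (len(table) - 1) * (len(table[0]) - 1),
        "chi_sq": chi_sq,
        "reject": chi_sq > critical_value,
    }


# Problem 20 data: rows are Main, North, South campus;
# columns are Very Satisfied / Satisfied / Dissatisfied
table = [
    [40, 45, 15],
    [35, 50, 15],
    [25, 55, 20],
]
result = homogeneity_test(table, critical_value=9.488)
print(result["conditions_ok"], result["df"])
print(round(result["chi_sq"], 2), "reject H0" if result["reject"] else "fail to reject H0")
```

The code handles only the arithmetic; identifying the test, stating hypotheses, and writing the conclusion in context still have to be done by hand.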