Safaa Dabagh

Practice Problems: Chi-Square Tests

20 Comprehensive Problems Covering All Three Chi-Square Tests

How to Use These Practice Problems

Part 1: Goodness of Fit Test (5 Problems)

Problem 1: Traffic Patterns

A city planner believes traffic accidents occur equally throughout the week. Data from 140 accidents shows:

Day Mon Tue Wed Thu Fri Sat Sun
Observed 18 16 22 21 28 19 16

Test at α = 0.05: Is there evidence that accidents are NOT equally distributed across days?

Solution:

Test: Goodness of Fit (one variable, testing equal distribution)

H₀: Accidents are equally distributed across all 7 days

Hₐ: Accidents are NOT equally distributed

Expected: E = 140/7 = 20 for each day

df: k - 1 = 7 - 1 = 6

Calculate χ²:

χ² = (18-20)²/20 + (16-20)²/20 + (22-20)²/20 + (21-20)²/20 + (28-20)²/20 + (19-20)²/20 + (16-20)²/20

χ² = 0.2 + 0.8 + 0.2 + 0.05 + 3.2 + 0.05 + 0.8 = 5.30

Critical value (df=6, α=0.05): 12.592

Decision: 5.30 < 12.592, fail to reject H₀

Conclusion: At the 0.05 significance level, there is insufficient evidence to conclude that accidents are not equally distributed across days of the week.
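The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes, as H₀ does here, that every category has the same expected count E = n/k. `gof_equal_expected` is a hypothetical helper name, and the critical value is copied from the chi-square table rather than computed.

```python
def gof_equal_expected(observed):
    """Goodness-of-fit chi-square when H0 says all k categories are equally likely."""
    n = sum(observed)
    k = len(observed)
    expected = n / k  # the same expected count E = n/k for every category
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, k - 1  # statistic and degrees of freedom

# Problem 1: accidents Mon..Sun
accidents = [18, 16, 22, 21, 28, 19, 16]
chi2, df = gof_equal_expected(accidents)
critical = 12.592  # table value for df = 6, alpha = 0.05
print(f"chi2 = {chi2:.2f}, df = {df}")
print("reject H0" if chi2 > critical else "fail to reject H0")
```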

Problem 2: College Majors

A university's historical data shows the following distribution of majors: 30% STEM, 25% Business, 20% Humanities, 15% Social Sciences, 10% Arts. A random sample of 200 incoming freshmen shows:

Major STEM Business Humanities Social Sci Arts
Observed 72 48 32 28 20

Test at α = 0.01: Does this incoming class follow the historical distribution?

Solution:

Test: Goodness of Fit

H₀: The distribution matches historical percentages

Hₐ: The distribution does NOT match

Expected frequencies:

  • STEM: 200 × 0.30 = 60
  • Business: 200 × 0.25 = 50
  • Humanities: 200 × 0.20 = 40
  • Social Sciences: 200 × 0.15 = 30
  • Arts: 200 × 0.10 = 20

All E ≥ 5

df: 5 - 1 = 4

χ²: (72-60)²/60 + (48-50)²/50 + (32-40)²/40 + (28-30)²/30 + (20-20)²/20

= 2.4 + 0.08 + 1.6 + 0.133 + 0 = 4.213

Critical value (df=4, α=0.01): 13.277

Decision: 4.213 < 13.277, fail to reject H₀

Conclusion: At the 0.01 significance level, there is insufficient evidence that the incoming class distribution differs from the historical distribution.
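When H₀ specifies proportions instead of equal shares, the only change is how the expected counts are built. The sketch below (with `gof_with_proportions` as a hypothetical helper) also enforces the "all E ≥ 5" condition by raising an error, which is one possible design choice. Another would be to return a warning flag instead.

```python
def gof_with_proportions(observed, proportions, min_expected=5):
    """Goodness-of-fit chi-square against hypothesized proportions.

    Raises ValueError if any expected count is below min_expected,
    mirroring the 'all E >= 5' condition checked in the solution.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]  # E = n x hypothesized proportion
    if min(expected) < min_expected:
        raise ValueError(f"expected count below {min_expected}: {expected}")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected, len(observed) - 1

# Problem 2: STEM, Business, Humanities, Social Sci, Arts
observed = [72, 48, 32, 28, 20]
historical = [0.30, 0.25, 0.20, 0.15, 0.10]
chi2, expected, df = gof_with_proportions(observed, historical)
print([round(e, 2) for e in expected], df)
print(f"chi2 = {chi2:.3f}")
```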

Problem 3: Jury Selection

A county's population is 60% White, 25% Hispanic, 10% Black, and 5% Asian. A random jury pool of 120 people contains:

Ethnicity White Hispanic Black Asian
Observed 82 22 10 6

Test at α = 0.05: Does the jury pool match the county's demographic distribution?

Solution:

Expected: White: 120 × 0.60 = 72, Hispanic: 120 × 0.25 = 30, Black: 120 × 0.10 = 12, Asian: 120 × 0.05 = 6

df: 4 - 1 = 3

χ²: (82-72)²/72 + (22-30)²/30 + (10-12)²/12 + (6-6)²/6

= 1.389 + 2.133 + 0.333 + 0 ≈ 3.856

Critical value (df=3, α=0.05): 7.815

Conclusion: Fail to reject H₀. The jury pool distribution is consistent with the county demographics.

Problem 4: Birth Months

A researcher wants to test if births are equally likely in each quarter. Out of 400 births:

Quarter Q1 (Jan-Mar) Q2 (Apr-Jun) Q3 (Jul-Sep) Q4 (Oct-Dec)
Observed 115 95 92 98

Test at α = 0.10: Are births equally distributed across quarters?

Solution:

Expected: 400/4 = 100 for each quarter

df: 4 - 1 = 3

χ²: (115-100)²/100 + (95-100)²/100 + (92-100)²/100 + (98-100)²/100

= 2.25 + 0.25 + 0.64 + 0.04 = 3.18

Critical value (df=3, α=0.10): 6.251

Conclusion: Fail to reject H₀. There is insufficient evidence that births differ by quarter.
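Comparing χ² with a critical value is equivalent to comparing its p-value with α. Python's standard library has no chi-square distribution, so this sketch computes the upper tail from the power series of the regularized lower incomplete gamma function, with P(X ≥ x) = 1 − P(df/2, x/2). `chi2_sf` is a hypothetical name. If SciPy is available, `scipy.stats.chi2.sf(x, df)` gives the same quantity and handles extreme tails more robustly.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Sums the power series for the regularized lower incomplete gamma
    P(df/2, x/2) and returns 1 - P. Adequate for moderate x such as the
    statistics in these problems; not tuned for very large x.
    """
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15 and n < 10_000:
        n += 1
        term *= y / (a + n)  # term_n = y^n / (a (a+1) ... (a+n))
        total += term
    lower = total * math.exp(a * math.log(y) - y - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# Problem 4: births by quarter, E = 100 in each quarter
births = [115, 95, 92, 98]
chi2 = sum((o - 100) ** 2 / 100 for o in births)
p_value = chi2_sf(chi2, df=3)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
# Sanity check: the tail area beyond the table's critical value should be about 0.10
print(f"P(X >= 6.251) = {chi2_sf(6.251, df=3):.3f}")
```

Since the p-value exceeds α = 0.10, the p-value approach reaches the same decision as the critical-value approach.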

Problem 5: Lottery Numbers

A lottery uses digits 0-9. In 500 draws, the last digit frequencies are:

Digit 0 1 2 3 4 5 6 7 8 9
Observed 48 52 46 51 49 54 47 50 53 50

Test at α = 0.05: Is the lottery fair (all digits equally likely)?

Solution:

Expected: 500/10 = 50 for each digit

df: 10 - 1 = 9

χ²: The deviations O − E are −2, 2, −4, 1, −1, 4, −3, 0, 3, 0; their squares sum to 60, so χ² = 60/50 = 1.20

Critical value (df=9, α=0.05): 16.919

Conclusion: Fail to reject H₀. There is insufficient evidence that the digits are not equally likely, so the data are consistent with a fair lottery.
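For goodness-of-fit tests, the definition Σ(O − E)²/E can be rewritten as ΣO²/E − n, because expanding the square gives ΣO²/E − 2ΣO + ΣE and both ΣO and ΣE equal n. The sketch below uses exact fractions on the Problem 5 counts to show that both forms agree. It is an arithmetic check, not a library routine.

```python
from fractions import Fraction

# Problem 5: last-digit counts for digits 0-9 in 500 draws
observed = [48, 52, 46, 51, 49, 54, 47, 50, 53, 50]
n = sum(observed)
expected = Fraction(n, len(observed))  # E = 50 for every digit

# Definition: sum of (O - E)^2 / E
chi2_definition = sum((o - expected) ** 2 / expected for o in observed)
# Equivalent shortcut: sum of O^2 / E, minus n
chi2_shortcut = sum(Fraction(o * o) / expected for o in observed) - n

print(chi2_definition, chi2_shortcut, float(chi2_shortcut))  # 6/5 6/5 1.2
```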

Part 2: Test of Independence (6 Problems)

Problem 6: Exercise and Health

A health researcher surveys 300 adults about exercise habits and self-reported health:

Excellent Health Good Health Poor Health Total
Exercise Regularly 70 50 10 130
Don't Exercise 40 80 50 170
Total 110 130 60 300

Test at α = 0.01: Are exercise habits and health status independent?

Solution:

Test: Independence (one sample, two variables)

H₀: Exercise and health are independent

Hₐ: Exercise and health are associated

Expected frequencies:

  • Exercise & Excellent: (130×110)/300 = 47.67
  • Exercise & Good: (130×130)/300 = 56.33
  • Exercise & Poor: (130×60)/300 = 26
  • No Exercise & Excellent: (170×110)/300 = 62.33
  • No Exercise & Good: (170×130)/300 = 73.67
  • No Exercise & Poor: (170×60)/300 = 34

df: (2-1)(3-1) = 2

χ²: 10.464 + 0.712 + 9.846 + 8.002 + 0.544 + 7.529 ≈ 37.10

Critical value (df=2, α=0.01): 9.210

Conclusion: REJECT H₀. There is sufficient evidence that exercise habits and health status are associated.
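The expected-count rule used above, (row total × column total)/grand total, generalizes to any r × c table. This is a minimal sketch with a hypothetical helper `chi2_two_way`. The same computation applies to the homogeneity problems in Part 3, where the expected counts are built the same way.

```python
def chi2_two_way(table):
    """Pearson chi-square statistic for an r x c table of observed counts.

    Expected count for each cell = (row total x column total) / grand total.
    Returns (statistic, degrees of freedom, expected table).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Problem 6: rows = exercise regularly / don't exercise;
# columns = excellent / good / poor health (totals excluded)
observed = [[70, 50, 10],
            [40, 80, 50]]
chi2, df, expected = chi2_two_way(observed)
print([[round(e, 2) for e in row] for row in expected])
print(f"chi2 = {chi2:.2f}, df = {df}, reject at 0.01: {chi2 > 9.210}")
```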

Problem 7: Smartphone Preference

Survey of 400 consumers about age and smartphone brand:

iPhone Android Other Total
18-34 90 55 15 160
35-54 60 70 10 140
55+ 30 55 15 100
Total 180 180 40 400

Test at α = 0.05: Are age and smartphone preference independent?

Solution:

df: (3-1)(3-1) = 4

Expected example: E(18-34 & iPhone) = (160×180)/400 = 72

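All expected counts (iPhone, Android, Other): 18-34: 72, 72, 16; 35-54: 63, 63, 14; 55+: 45, 45, 10

χ² = 4.5 + 4.014 + 0.063 + 0.143 + 0.778 + 1.143 + 5 + 2.222 + 2.5 ≈ 20.36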

Critical value (df=4, α=0.05): 9.488

Conclusion: REJECT H₀. Age and smartphone preference are associated.

Problem 8: Education and Income (2×2 Table)

Random sample of 200 adults:

High Income Low Income Total
College Degree 70 50 120
No Degree 30 50 80
Total 100 100 200

Test at α = 0.05: Are education and income independent?

Solution:

df: (2-1)(2-1) = 1

Expected: All cells = (row total × col total)/200

College & High: 60, College & Low: 60, No Degree & High: 40, No Degree & Low: 40

χ²: (70-60)²/60 + (50-60)²/60 + (30-40)²/40 + (50-40)²/40 = 1.667 + 1.667 + 2.5 + 2.5 ≈ 8.33

Critical value (df=1, α=0.05): 3.841

Conclusion: REJECT H₀. Education and income level are associated.
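For a 2×2 table there is an algebraically equivalent shortcut that skips the expected counts: χ² = n(ad − bc)² / ((a+b)(c+d)(a+c)(b+d)). It gives the plain Pearson statistic without Yates' continuity correction, which matches the hand calculation in this solution. `chi2_2x2` is a hypothetical helper name.

```python
def chi2_2x2(a, b, c, d):
    """Shortcut Pearson chi-square for a 2x2 table laid out as
        [[a, b],
         [c, d]]
    chi2 = n (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), no continuity correction.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Problem 8: rows = college degree / no degree, columns = high / low income
chi2 = chi2_2x2(70, 50, 30, 50)
print(f"chi2 = {chi2:.2f}")
```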

Problem 9: Voting and Party Affiliation

500 voters surveyed about party and whether they voted in the last election:

Voted Didn't Vote Total
Democrat 140 60 200
Republican 130 70 200
Independent 50 50 100
Total 320 180 500

Test at α = 0.05: Are party affiliation and voting behavior independent?

Solution:

df: (3-1)(2-1) = 2

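Expected (Voted, Didn't Vote): Democrat 128, 72; Republican 128, 72; Independent 64, 36

χ² = 1.125 + 2 + 0.031 + 0.056 + 3.063 + 5.444 ≈ 11.72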

Critical value (df=2, α=0.05): 5.991

Conclusion: REJECT H₀. Party affiliation and voting behavior are associated.

Problem 10: Coffee and Productivity

250 employees surveyed:

High Productivity Medium Low Total
Drinks Coffee 55 70 25 150
No Coffee 35 45 20 100
Total 90 115 45 250

Test at α = 0.10: Are coffee consumption and productivity independent?

Solution:

df: (2-1)(3-1) = 2

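Expected (High, Medium, Low): Drinks Coffee 54, 69, 27; No Coffee 36, 46, 18

χ² = 0.019 + 0.014 + 0.148 + 0.028 + 0.022 + 0.222 ≈ 0.45 (very small)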

Critical value (df=2, α=0.10): 4.605

Conclusion: Fail to reject H₀. There is insufficient evidence that coffee consumption and productivity are associated.

Problem 11: Social Media and Age

600 people surveyed:

Facebook Instagram TikTok Total
Under 30 60 100 90 250
30-50 110 70 20 200
Over 50 130 10 10 150
Total 300 180 120 600

Test at α = 0.01: Are age and social media platform independent?

Solution:

df: (3-1)(3-1) = 4

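Expected (Facebook, Instagram, TikTok): Under 30: 125, 75, 50; 30-50: 100, 60, 40; Over 50: 75, 45, 30

χ² = 33.8 + 8.333 + 32 + 1 + 1.667 + 10 + 40.333 + 27.222 + 13.333 ≈ 167.69 (very large!)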

Critical value (df=4, α=0.01): 13.277

Conclusion: REJECT H₀. The test statistic far exceeds the critical value, so there is very strong evidence that age and social media platform are associated.
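When χ² is this large, it helps to see which cells drive it. The sketch below lists each cell's contribution (O − E)²/E together with whether the cell has more or fewer observations than expected, then prints the three largest. The variable names are illustrative only.

```python
# Problem 11: rows = age group, columns = platform
rows = ["Under 30", "30-50", "Over 50"]
cols = ["Facebook", "Instagram", "TikTok"]
observed = [[60, 100, 90],
            [110, 70, 20],
            [130, 10, 10]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Each cell's contribution (O - E)^2 / E, plus the direction of O - E
contributions = []
for i, r_name in enumerate(rows):
    for j, c_name in enumerate(cols):
        e = row_totals[i] * col_totals[j] / n
        o = observed[i][j]
        direction = "more" if o > e else ("fewer" if o < e else "as many")
        contributions.append(((o - e) ** 2 / e, r_name, c_name, direction))

chi2 = sum(c[0] for c in contributions)
print(f"chi2 = {chi2:.2f}")
for value, r_name, c_name, direction in sorted(contributions, reverse=True)[:3]:
    print(f"{r_name} & {c_name}: {value:.2f} ({direction} than expected)")
```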

Part 3: Test of Homogeneity (5 Problems)

Problem 12: Customer Satisfaction Across Stores

A company samples 100 customers from each of three stores:

Store Satisfied Neutral Unsatisfied Total
Store A 70 20 10 100
Store B 60 30 10 100
Store C 55 25 20 100
Total 185 75 40 300

Test at α = 0.05: Do the three stores have the same satisfaction distribution?

Solution:

Test: Homogeneity (multiple samples, one variable)

H₀: Satisfaction distribution is the same for all three stores

Hₐ: At least one store has a different distribution

df: (3-1)(3-1) = 4

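Expected (each store): Satisfied 61.67, Neutral 25, Unsatisfied 13.33

χ² = 2.959 (Store A) + 1.878 (Store B) + 4.054 (Store C) ≈ 8.89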

Critical value (df=4, α=0.05): 9.488

Conclusion: Fail to reject H₀. There is insufficient evidence that the satisfaction distribution differs across stores.

Problem 13: Teaching Methods

Students randomly assigned to three methods (60 per group), then tested:

Method Pass Fail Total
Method A 48 12 60
Method B 52 8 60
Method C 40 20 60
Total 140 40 180

Test at α = 0.01: Do the methods produce different pass/fail rates?

Solution:

df: (3-1)(2-1) = 2

Expected (each method): Pass 46.67, Fail 13.33

χ² = 0.038 + 0.133 + 0.610 + 2.133 + 0.952 + 3.333 ≈ 7.20

Critical value (df=2, α=0.01): 9.210

Conclusion: Fail to reject H₀. There is insufficient evidence that the methods produce different pass/fail rates.
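When the response has only two categories (here Pass/Fail), the homogeneity statistic can be written in terms of group proportions: χ² = Σ nᵢ(p̂ᵢ − p̄)² / (p̄(1 − p̄)), where p̄ is the pooled pass rate. This follows from adding the Pass and Fail cell contributions for each group. The sketch below applies this form to the three pass rates with exact fractions. It is an algebraic check, not a separate test.

```python
from fractions import Fraction

# Problem 13: (passes, group size) for Methods A, B, C
groups = [(48, 60), (52, 60), (40, 60)]

total_pass = sum(x for x, _ in groups)
total_n = sum(n for _, n in groups)
p_bar = Fraction(total_pass, total_n)  # pooled pass rate, 140/180 = 7/9

# For a k x 2 table, Pearson's chi-square equals
#   sum of n_i * (p_i - p_bar)^2, divided by p_bar * (1 - p_bar)
chi2 = sum(n * (Fraction(x, n) - p_bar) ** 2 for x, n in groups) / (p_bar * (1 - p_bar))
print(chi2, float(chi2))  # 36/5 7.2
```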

Problem 14: Regional Preferences

Sample 150 from each of four regions about product preference:

Region Product A Product B Product C Total
North 60 50 40 150
South 50 60 40 150
East 55 55 40 150
West 45 50 55 150
Total 210 215 175 600

Test at α = 0.05: Do regions have the same product preferences?

Solution:

df: (4-1)(3-1) = 6

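Expected (each region): Product A 52.5, Product B 53.75, Product C 43.75

χ² = 1.654 (North) + 1.167 (South) + 0.470 (East) + 4.226 (West) ≈ 7.52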

Critical value (df=6, α=0.05): 12.592

Conclusion: Fail to reject H₀. There is insufficient evidence that product preferences differ by region.

Problem 15: Drug Trial Outcomes

200 patients per treatment group:

Treatment Improved No Change Worsened Total
Drug A 130 50 20 200
Drug B 110 70 20 200
Placebo 80 90 30 200
Total 320 210 70 600

Test at α = 0.01: Do the three treatments produce different outcome distributions?

Solution:

df: (3-1)(3-1) = 4

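Expected (each treatment): Improved 106.67, No Change 70, Worsened 23.33

χ² = 11.295 (Drug A) + 0.580 (Drug B) + 14.286 (Placebo) ≈ 26.16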

Critical value (df=4, α=0.01): 13.277

Conclusion: REJECT H₀. The treatments produce different outcome distributions.

Problem 16: School Discipline Policies

Sample 120 students from each of two schools about discipline fairness:

School Fair Somewhat Fair Unfair Total
School 1 50 40 30 120
School 2 45 45 30 120
Total 95 85 60 240

Test at α = 0.05: Do the schools have the same fairness perception distribution?

Solution:

df: (2-1)(3-1) = 2

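Expected (each school): Fair 47.5, Somewhat Fair 42.5, Unfair 30

χ² = 0.132 + 0.147 + 0 + 0.132 + 0.147 + 0 ≈ 0.56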

Critical value (df=2, α=0.05): 5.991

Conclusion: Fail to reject H₀. There is insufficient evidence that the two schools' fairness perception distributions differ.

Part 4: Choosing the Appropriate Test (4 Problems)

Problem 17: Identifying the Test

Scenario A: A researcher surveys 500 college students and records both their major (STEM, Humanities, Business) and their preferred study location (Library, Dorm, Coffee Shop).

Scenario B: A quality control manager samples 100 widgets from Factory 1, 100 from Factory 2, and 100 from Factory 3. Each widget is classified as Pass or Fail.

Scenario C: A casino rolls a die 300 times to test if all six faces are equally likely.

Question: Identify which chi-square test (Goodness of Fit, Independence, or Homogeneity) is appropriate for each scenario and explain why.

Solution:

Scenario A: Independence

  • One sample (500 students)
  • Two variables (major AND study location)
  • Question: Are major and study location associated?

Scenario B: Homogeneity

  • Three samples (100 from each factory)
  • One variable (pass/fail)
  • Question: Do factories have same pass/fail distribution?

Scenario C: Goodness of Fit

  • One sample (300 rolls)
  • One variable (die outcome)
  • Question: Does distribution match equal likelihood?

Problem 18: Checking Conditions

A researcher plans to test independence between gender and voting preference. They survey 80 people, resulting in this table:

Democrat Republican Other
Male O=25 O=12 O=3
Female O=28 O=10 O=2

Question: Calculate expected frequencies. Can the chi-square test be validly conducted? Why or why not?

Solution:

Row totals: Male: 40, Female: 40

Column totals: Democrat: 53, Republican: 22, Other: 5

Expected frequencies:

  • Male & Democrat: (40×53)/80 = 26.5
  • Male & Republican: (40×22)/80 = 11
  • Male & Other: (40×5)/80 = 2.5
  • Female & Democrat: 26.5
  • Female & Republican: 11
  • Female & Other: 2.5

Conclusion: NO, the test should NOT be conducted as is. Two cells have expected counts less than 5. Options: (1) Combine "Other" with another category, (2) Collect more data, or (3) Use Fisher's exact test.
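The expected-count condition can be checked mechanically before running the test. The sketch below flags every cell with E < 5. It then re-checks the table after merging "Other" into "Republican", which is only one illustrative way to carry out option (1). The helper names `expected_counts` and `low_expected_cells` are hypothetical.

```python
def expected_counts(table):
    """Expected counts (row total x column total) / grand total for an r x c table."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def low_expected_cells(table, threshold=5):
    """Return (row index, column index, expected) for every cell with expected < threshold."""
    return [
        (i, j, e)
        for i, row in enumerate(expected_counts(table))
        for j, e in enumerate(row)
        if e < threshold
    ]

# Problem 18: rows = male / female, columns = Democrat / Republican / Other
observed = [[25, 12, 3],
            [28, 10, 2]]
print(low_expected_cells(observed))

# Illustrative merge for option (1): fold "Other" into "Republican", then re-check
merged = [[25, 12 + 3],
          [28, 10 + 2]]
print(low_expected_cells(merged))
```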

Problem 19: Chi-Square vs. Other Tests

For each scenario, identify whether to use chi-square OR a different test:

A. Compare average test scores of students using three different study methods

B. Test if proportion of smokers differs between two cities

C. Determine if eye color and hair color are independent

D. Test if mean height differs between men and women

Solution:

A. ANOVA (comparing means of 3+ groups, quantitative data)

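B. Two-proportion z-test (comparing proportions from two populations). A chi-square test on the corresponding 2×2 table is an equivalent alternative for the two-sided test, since χ² = z².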

C. Chi-square test of independence (two categorical variables, one sample)

D. Two-sample t-test (comparing means of two groups, quantitative data)
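For comparing two proportions, the pooled two-proportion z statistic and the Pearson chi-square on the same 2×2 table (without continuity correction) are linked by χ² = z², so the two-sided tests agree. Scenario B gives no smoking data, so this sketch reuses Problem 8's counts: "high income" is the outcome, and degree holders versus non-holders are the two groups. `two_proportion_z` is a hypothetical helper name.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Problem 8's counts: high income among 120 degree holders (70) vs 80 non-holders (30)
z = two_proportion_z(70, 120, 30, 80)

# Pearson chi-square on the same 2x2 table via expected counts
table = [[70, 50],
         [30, 50]]
rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]
n = sum(rows)
chi2 = sum(
    (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
    for i in range(2)
    for j in range(2)
)
print(f"z = {z:.4f}, z^2 = {z * z:.4f}, chi2 = {chi2:.4f}")
```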

Problem 20: Comprehensive Problem

A university administrator wants to know if student satisfaction with campus facilities is the same across three different campuses. They randomly survey 100 students from each campus:

Campus Very Satisfied Satisfied Dissatisfied Total
Main Campus 40 45 15 100
North Campus 35 50 15 100
South Campus 25 55 20 100
Total 100 150 50 300

Complete the following:

  1. Identify which chi-square test to use and explain why
  2. State the hypotheses
  3. Check all conditions
  4. Calculate the test statistic
  5. Find the critical value at α = 0.05
  6. Make a decision and state your conclusion

Complete Solution:

1. Test: Homogeneity

Reason: Three separate samples (one from each campus), one variable (satisfaction level)

2. Hypotheses:

  • H₀: The distribution of satisfaction is the same across all three campuses
  • Hₐ: At least one campus has a different satisfaction distribution

3. Conditions:

  • Random samples from each campus
  • Independent observations
  • Expected frequencies: E = (100 × column total)/300 = 33.33, 50, and 16.67 for every campus; all ≥ 5

4. Test Statistic:

Expected for each campus: Very Satisfied = 33.33, Satisfied = 50, Dissatisfied = 16.67

χ² = Σ(O-E)²/E = 2.00 (Main) + 0.25 (North) + 3.25 (South) = 5.50

5. Critical Value:

df = (3-1)(3-1) = 4

Critical value (α=0.05, df=4) = 9.488

6. Decision and Conclusion:

Since 5.50 < 9.488, we fail to reject H₀.

Conclusion: At the 0.05 significance level, there is insufficient evidence to conclude that satisfaction levels differ across the three campuses. The data are consistent with the campuses sharing the same satisfaction distribution.
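The six steps of this solution can be sketched as one function: build expected counts, check the E ≥ 5 condition, compute χ² and df, and compare against a critical value taken from the table. `homogeneity_test` and its return fields are hypothetical names, and the critical value is passed in rather than computed.

```python
def homogeneity_test(table, critical, min_expected=5):
    """Chi-square test of homogeneity: rows are separate samples, columns are categories."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    conditions_ok = all(e >= min_expected for row in expected for e in row)
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(table, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return {
        "expected": expected,
        "conditions_ok": conditions_ok,
        "chi2": chi2,
        "df": df,
        "reject": chi2 > critical,
    }

# Problem 20: rows = Main, North, South; columns = very satisfied, satisfied, dissatisfied
campuses = [[40, 45, 15],
            [35, 50, 15],
            [25, 55, 20]]
result = homogeneity_test(campuses, critical=9.488)  # df = 4, alpha = 0.05
print(f"conditions met: {result['conditions_ok']}")
print(f"chi2 = {result['chi2']:.2f}, df = {result['df']}")
print("reject H0" if result["reject"] else "fail to reject H0")
```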
