Practice Problems - Module 12: Chi-Square Tests

Problem 1: Traffic Patterns

A city planner believes traffic accidents occur equally throughout the week. Data from 140 accidents shows:

Day	Mon	Tue	Wed	Thu	Fri	Sat	Sun
Observed	18	16	22	21	28	19	16

Test at α = 0.05: Is there evidence that accidents are NOT equally distributed across days?

Solution:

Test: Goodness of Fit (one variable, testing equal distribution)

H₀: Accidents are equally distributed across all 7 days

Hₐ: Accidents are NOT equally distributed

Expected: E = 140/7 = 20 for each day

df: k - 1 = 7 - 1 = 6

Calculate χ²:

χ² = (18-20)²/20 + (16-20)²/20 + (22-20)²/20 + (21-20)²/20 + (28-20)²/20 + (19-20)²/20 + (16-20)²/20

χ² = 0.2 + 0.8 + 0.2 + 0.05 + 3.2 + 0.05 + 0.8 = 5.30

Critical value (df=6, α=0.05): 12.592

Decision: 5.30 < 12.592, fail to reject H₀

Conclusion: At the 0.05 significance level, there is insufficient evidence to conclude that accidents are not equally distributed across days of the week.
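
The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the function name gof_statistic is made up for illustration, and the critical value 12.592 is copied from the table lookup above rather than computed.

```python
def gof_statistic(observed, expected):
    """Return the chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed accidents, Monday through Sunday (from the problem statement)
observed = [18, 16, 22, 21, 28, 19, 16]

# Under H0 every day has the same expected count: n / k
n = sum(observed)
k = len(observed)
expected = [n / k] * k

chi_sq = gof_statistic(observed, expected)
df = k - 1
critical_value = 12.592  # table value for df = 6, alpha = 0.05 (not computed here)

print(f"chi-square = {chi_sq:.2f}, df = {df}")
print("reject H0" if chi_sq > critical_value else "fail to reject H0")
```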

Problem 2: College Majors

A university's historical data shows the following distribution of majors: 30% STEM, 25% Business, 20% Humanities, 15% Social Sciences, 10% Arts. A random sample of 200 incoming freshmen shows:

Major	STEM	Business	Humanities	Social Sci	Arts
Observed	72	48	32	28	20

Test at α = 0.01: Does this incoming class follow the historical distribution?

Solution:

Test: Goodness of Fit

H₀: The distribution matches historical percentages

Hₐ: The distribution does NOT match

Expected frequencies:

STEM: 200 × 0.30 = 60
Business: 200 × 0.25 = 50
Humanities: 200 × 0.20 = 40
Social Sciences: 200 × 0.15 = 30
Arts: 200 × 0.10 = 20

All E ≥ 5

df: 5 - 1 = 4

χ²: (72-60)²/60 + (48-50)²/50 + (32-40)²/40 + (28-30)²/30 + (20-20)²/20

= 2.4 + 0.08 + 1.6 + 0.133 + 0 = 4.213

Critical value (df=4, α=0.01): 13.277

Decision: 4.213 < 13.277, fail to reject H₀

Conclusion: At the 0.01 significance level, there is insufficient evidence that the incoming class distribution differs from the historical distribution.
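
When the hypothesized proportions are unequal, each expected count is n × p, and it helps to see how much each category contributes to the statistic. A minimal sketch in plain Python, assuming the categories are listed in the same order as in the problem; the variable names are illustrative.

```python
# Historical proportions and observed counts, in the same category order
categories = ["STEM", "Business", "Humanities", "Social Sciences", "Arts"]
proportions = [0.30, 0.25, 0.20, 0.15, 0.10]
observed = [72, 48, 32, 28, 20]

n = sum(observed)
expected = [n * p for p in proportions]

# Condition check: every expected count should be at least 5
condition_met = all(e >= 5 for e in expected)

# Per-category contribution (O - E)^2 / E shows where the statistic comes from
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

for name, e, c in zip(categories, expected, contributions):
    print(f"{name:<16} E = {e:5.1f}  contribution = {c:.3f}")
print(f"chi-square = {chi_sq:.3f}, all E >= 5: {condition_met}")
```

STEM and Humanities account for almost all of the statistic, which matches the hand calculation.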

Problem 3: Jury Selection

A county's population is 60% White, 25% Hispanic, 10% Black, and 5% Asian. A random jury pool of 120 people contains:

Ethnicity	White	Hispanic	Black	Asian
Observed	82	22	10	6

Test at α = 0.05: Does the jury pool match the county's demographic distribution?

Solution:

Expected: White: 72, Hispanic: 30, Black: 12, Asian: 6

df: 4 - 1 = 3

χ²: (82-72)²/72 + (22-30)²/30 + (10-12)²/12 + (6-6)²/6

= 1.389 + 2.133 + 0.333 + 0 = 3.855

Critical value (df=3, α=0.05): 7.815

Conclusion: Fail to reject H₀. The jury pool distribution is consistent with the county demographics.

Problem 4: Birth Months

A researcher wants to test if births are equally likely in each quarter. Out of 400 births:

Quarter	Q1 (Jan-Mar)	Q2 (Apr-Jun)	Q3 (Jul-Sep)	Q4 (Oct-Dec)
Observed	115	95	92	98

Test at α = 0.10: Are births equally distributed across quarters?

Solution:

Expected: 400/4 = 100 for each quarter

df: 4 - 1 = 3

χ²: (115-100)²/100 + (95-100)²/100 + (92-100)²/100 + (98-100)²/100

= 2.25 + 0.25 + 0.64 + 0.04 = 3.18

Critical value (df=3, α=0.10): 6.251

Conclusion: Fail to reject H₀. No evidence that births differ by quarter.

Problem 5: Lottery Numbers

A lottery uses digits 0-9. In 500 draws, the last digit frequencies are:

Digit	0	1	2	3	4	5	6	7	8	9
Obs	48	52	46	51	49	54	47	50	53	50

Test at α = 0.05: Is the lottery fair (all digits equally likely)?

Solution:

Expected: 500/10 = 50 for each digit

df: 10 - 1 = 9

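χ²: Sum of (O-E)²/E for all digits = (4 + 4 + 16 + 1 + 1 + 16 + 9 + 0 + 9 + 0)/50 = 60/50 = 1.20

Critical value (df=9, α=0.05): 16.919

Conclusion: Fail to reject H₀. There is insufficient evidence that the digits are not equally likely.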

Problem 6: Exercise and Health

A health researcher surveys 300 adults about exercise habits and self-reported health:

	Excellent Health	Good Health	Poor Health	Total
Exercise Regularly	70	50	10	130
Don't Exercise	40	80	50	170
Total	110	130	60	300

Test at α = 0.01: Are exercise habits and health status independent?

Solution:

Test: Independence (one sample, two variables)

H₀: Exercise and health are independent

Hₐ: Exercise and health are associated

Expected frequencies:

Exercise & Excellent: (130×110)/300 = 47.67
Exercise & Good: (130×130)/300 = 56.33
Exercise & Poor: (130×60)/300 = 26
No Exercise & Excellent: (170×110)/300 = 62.33
No Exercise & Good: (170×130)/300 = 73.67
No Exercise & Poor: (170×60)/300 = 34

df: (2-1)(3-1) = 2

χ²: Sum of all (O-E)²/E = 10.464 + 0.712 + 9.846 + 8.002 + 0.544 + 7.529 ≈ 37.10

Critical value (df=2, α=0.01): 9.210

Conclusion: REJECT H₀. There is sufficient evidence that exercise habits and health status are associated.
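
The expected counts and the statistic for a two-way table can be reproduced in plain Python. A sketch only: expected_counts and chi_square are illustrative helper names, not functions from any library.

```python
def expected_counts(table):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum (O - E)^2 / E over every cell of a two-way table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Rows: exercises regularly, doesn't exercise
# Columns: excellent, good, poor health
table = [[70, 50, 10],
         [40, 80, 50]]

expected = expected_counts(table)
chi_sq = chi_square(table)
df = (len(table) - 1) * (len(table[0]) - 1)

for row in expected:
    print([round(e, 2) for e in row])
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```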

Problem 7: Smartphone Preference

Survey of 400 consumers about age and smartphone brand:

	iPhone	Android	Other	Total
18-34	90	55	15	160
35-54	60	70	10	140
55+	30	55	15	100
Total	180	180	40	400

Test at α = 0.05: Are age and smartphone preference independent?

Solution:

df: (3-1)(3-1) = 4

Expected example: E(18-34 & iPhone) = (160×180)/400 = 72

Calculate all expected values (by row: 72, 72, 16; 63, 63, 14; 45, 45, 10), then χ² = 20.36

Critical value (df=4, α=0.05): 9.488

Conclusion: REJECT H₀. Age and smartphone preference are associated.

Problem 8: Education and Income (2×2 Table)

Random sample of 200 adults:

	High Income	Low Income	Total
College Degree	70	50	120
No Degree	30	50	80
Total	100	100	200

Test at α = 0.05: Are education and income independent?

Solution:

df: (2-1)(2-1) = 1

Expected: All cells = (row total × col total)/200

College & High: 60, College & Low: 60, No Degree & High: 40, No Degree & Low: 40

χ²: (70-60)²/60 + (50-60)²/60 + (30-40)²/40 + (50-40)²/40 = 1.667 + 1.667 + 2.5 + 2.5 = 8.33

Critical value (df=1, α=0.05): 3.841

Conclusion: REJECT H₀. Education and income level are associated.
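
For a 2×2 table with cells a, b (first row) and c, d (second row), the same statistic can be obtained from the shortcut n(ad - bc)² divided by the product of the two row totals and the two column totals. The sketch below compares the shortcut with the cell-by-cell sum; the function names are illustrative, and no continuity correction is applied.

```python
def chi_square_2x2_shortcut(a, b, c, d):
    """Shortcut for a 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_cells(a, b, c, d):
    """Same statistic computed cell by cell as the sum of (O - E)^2 / E."""
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    observed = [[a, b], [c, d]]
    return sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )

# Education and income counts from the table above
shortcut = chi_square_2x2_shortcut(70, 50, 30, 50)
cell_by_cell = chi_square_cells(70, 50, 30, 50)
print(f"shortcut = {shortcut:.3f}, cell by cell = {cell_by_cell:.3f}")
```

Both routes give the same value, so the shortcut is a quick way to check a hand calculation.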

Problem 9: Voting and Party Affiliation

500 voters surveyed about party and whether they voted in the last election:

	Voted	Didn't Vote	Total
Democrat	140	60	200
Republican	130	70	200
Independent	50	50	100
Total	320	180	500

Test at α = 0.05: Are party affiliation and voting behavior independent?

Solution:

df: (3-1)(2-1) = 2

Expected counts (Voted, Didn't Vote): Democrat 128, 72; Republican 128, 72; Independent 64, 36. These yield χ² = 11.72

Critical value (df=2, α=0.05): 5.991

Conclusion: REJECT H₀. Party affiliation and voting behavior are associated.

Problem 10: Coffee and Productivity

250 employees surveyed:

	High Productivity	Medium	Low	Total
Drinks Coffee	55	70	25	150
No Coffee	35	45	20	100
Total	90	115	45	250

Test at α = 0.10: Are coffee consumption and productivity independent?

Solution:

df: (2-1)(3-1) = 2

χ² = 0.45 (very small)

Critical value (df=2, α=0.10): 4.605

Conclusion: Fail to reject H₀. No evidence that coffee and productivity are associated.

Problem 11: Social Media and Age

600 people surveyed:

	Facebook	Instagram	TikTok	Total
Under 30	60	100	90	250
30-50	110	70	20	200
Over 50	130	10	10	150
Total	300	180	120	600

Test at α = 0.01: Are age and social media platform independent?

Solution:

df: (3-1)(3-1) = 4

χ² ≈ 167.69 (very large!)

Critical value (df=4, α=0.01): 13.277

Conclusion: STRONGLY REJECT H₀. Age and social media platform are clearly associated.

Problem 12: Customer Satisfaction Across Stores

A company samples 100 customers from each of three stores:

Store	Satisfied	Neutral	Unsatisfied	Total
Store A	70	20	10	100
Store B	60	30	10	100
Store C	55	25	20	100
Total	185	75	40	300

Test at α = 0.05: Do the three stores have the same satisfaction distribution?

Solution:

Test: Homogeneity (multiple samples, one variable)

H₀: Satisfaction distribution is the same for all three stores

Hₐ: At least one store has a different distribution

df: (3-1)(3-1) = 4

χ² = 8.89

Critical value (df=4, α=0.05): 9.488

Conclusion: Fail to reject H₀. No evidence that satisfaction differs across stores.

Problem 13: Teaching Methods

Students randomly assigned to three methods (60 per group), then tested:

Method	Pass	Fail	Total
Method A	48	12	60
Method B	52	8	60
Method C	40	20	60
Total	140	40	180

Test at α = 0.01: Do the methods produce different pass/fail rates?

Solution:

df: (3-1)(2-1) = 2

χ² = 7.20

Critical value (df=2, α=0.01): 9.210

Conclusion: Fail to reject H₀. No evidence methods produce different outcomes.
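
In a homogeneity test, H₀ says every group shares one common pass rate, so the expected counts come from the pooled rate applied to each group's size. A minimal sketch of that view in plain Python; the dictionary layout and variable names are assumptions made for illustration.

```python
# Pass/fail counts for each teaching method (60 students per group)
groups = {"Method A": (48, 12), "Method B": (52, 8), "Method C": (40, 20)}

total_pass = sum(p for p, _ in groups.values())
total = sum(p + f for p, f in groups.values())
pooled_pass_rate = total_pass / total  # common pass rate under H0

chi_sq = 0.0
for name, (passed, failed) in groups.items():
    size = passed + failed
    exp_pass = size * pooled_pass_rate
    exp_fail = size * (1 - pooled_pass_rate)
    chi_sq += (passed - exp_pass) ** 2 / exp_pass
    chi_sq += (failed - exp_fail) ** 2 / exp_fail
    print(f"{name}: observed pass rate = {passed / size:.3f}, expected passes = {exp_pass:.2f}")

df = (len(groups) - 1) * (2 - 1)
print(f"pooled pass rate = {pooled_pass_rate:.3f}, chi-square = {chi_sq:.2f}, df = {df}")
```

The expected counts are identical to (row total × column total)/grand total, which is why homogeneity and independence use the same formula.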

Problem 14: Regional Preferences

Sample 150 from each of four regions about product preference:

Region	Product A	Product B	Product C	Total
North	60	50	40	150
South	50	60	40	150
East	55	55	40	150
West	45	50	55	150
Total	210	215	175	600

Test at α = 0.05: Do regions have the same product preferences?

Solution:

df: (4-1)(3-1) = 6

χ² = 7.52

Critical value (df=6, α=0.05): 12.592

Conclusion: Fail to reject H₀. Regions appear to have homogeneous preferences.

Problem 15: Drug Trial Outcomes

200 patients per treatment group:

Treatment	Improved	No Change	Worsened	Total
Drug A	130	50	20	200
Drug B	110	70	20	200
Placebo	80	90	30	200
Total	320	210	70	600

Test at α = 0.01: Do the three treatments produce different outcome distributions?

Solution:

df: (3-1)(3-1) = 4

χ² = 26.16

Critical value (df=4, α=0.01): 13.277

Conclusion: REJECT H₀. The treatments produce different outcome distributions.

Problem 16: School Discipline Policies

Sample 120 students from each of two schools about discipline fairness:

School	Fair	Somewhat Fair	Unfair	Total
School 1	50	40	30	120
School 2	45	45	30	120
Total	95	85	60	240

Test at α = 0.05: Do the schools have the same fairness perception distribution?

Solution:

df: (2-1)(3-1) = 2

χ² = 0.56

Critical value (df=2, α=0.05): 5.991

Conclusion: Fail to reject H₀. No evidence that the schools differ in their fairness perception distributions.

Problem 17: Identifying the Test

Scenario A: A researcher surveys 500 college students and records both their major (STEM, Humanities, Business) and their preferred study location (Library, Dorm, Coffee Shop).

Scenario B: A quality control manager samples 100 widgets from Factory 1, 100 from Factory 2, and 100 from Factory 3. Each widget is classified as Pass or Fail.

Scenario C: A casino rolls a die 300 times to test if all six faces are equally likely.

Question: Identify which chi-square test (Goodness of Fit, Independence, or Homogeneity) is appropriate for each scenario and explain why.

Solution:

Scenario A: Independence

One sample (500 students)
Two variables (major AND study location)
Question: Are major and study location associated?

Scenario B: Homogeneity

Three samples (100 from each factory)
One variable (pass/fail)
Question: Do factories have same pass/fail distribution?

Scenario C: Goodness of Fit

One sample (300 rolls)
One variable (die outcome)
Question: Does distribution match equal likelihood?
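
The decision rule used in these three answers (count the samples, count the variables) can be written as a small helper. This is a sketch of that rule of thumb only; choose_chi_square_test is a hypothetical name, and real study designs may need more judgment than two counts.

```python
def choose_chi_square_test(num_samples, num_variables):
    """Pick a chi-square test from the number of samples and categorical variables."""
    if num_samples == 1 and num_variables == 1:
        return "Goodness of Fit"
    if num_samples == 1 and num_variables == 2:
        return "Independence"
    if num_samples >= 2 and num_variables == 1:
        return "Homogeneity"
    raise ValueError("design not covered by these three tests")

print("Scenario A:", choose_chi_square_test(1, 2))  # one sample, two variables
print("Scenario B:", choose_chi_square_test(3, 1))  # three samples, one variable
print("Scenario C:", choose_chi_square_test(1, 1))  # one sample, one variable
```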

Problem 18: Checking Conditions

A researcher plans to test independence between gender and voting preference. They survey 80 people, resulting in this table:

	Democrat	Republican	Other
Male	O=25	O=12	O=3
Female	O=28	O=10	O=2

Question: Calculate expected frequencies. Can the chi-square test be validly conducted? Why or why not?

Solution:

Row totals: Male: 40, Female: 40

Column totals: Democrat: 53, Republican: 22, Other: 5

Expected frequencies:

Male & Democrat: (40×53)/80 = 26.5
Male & Republican: (40×22)/80 = 11
Male & Other: (40×5)/80 = 2.5
Female & Democrat: 26.5
Female & Republican: 11
Female & Other: 2.5

Conclusion: NO, the test should NOT be conducted as is. Two cells (Male & Other, Female & Other) have expected counts of 2.5, which is less than 5. Options: (1) Combine "Other" with another category, (2) Collect more data, or (3) Use Fisher's exact test.
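
The condition check can be automated: compute the expected counts, list the cells below 5, then recheck after merging categories. A sketch in plain Python; the helper names are illustrative, and folding "Other" into "Republican" is an arbitrary choice made only to demonstrate option (1).

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def small_cells(table, minimum=5):
    """Return (row, column) index pairs whose expected count is below `minimum`."""
    expected = expected_counts(table)
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]

# Rows: Male, Female; columns: Democrat, Republican, Other
table = [[25, 12, 3],
         [28, 10, 2]]
print("cells with E < 5:", small_cells(table))

# Option (1): fold "Other" into "Republican" (illustrative choice only)
merged = [[row[0], row[1] + row[2]] for row in table]
print("expected after merging:", expected_counts(merged))
print("cells with E < 5 after merging:", small_cells(merged))
```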

Problem 19: Chi-Square vs. Other Tests

For each scenario, identify whether to use chi-square OR a different test:

A. Compare average test scores of students using three different study methods

B. Test if proportion of smokers differs between two cities

C. Determine if eye color and hair color are independent

D. Test if mean height differs between men and women

Solution:

A. ANOVA (comparing means of 3+ groups, quantitative data)

B. Two-proportion z-test (comparing proportions from two populations); a chi-square test on the 2×2 table gives an equivalent two-sided result

C. Chi-square test of independence (two categorical variables, one sample)

D. Two-sample t-test (comparing means of two groups, quantitative data)

Problem 20: Comprehensive Problem

A university administrator wants to know if student satisfaction with campus facilities is the same across three different campuses. They randomly survey 100 students from each campus:

Campus	Very Satisfied	Satisfied	Dissatisfied	Total
Main Campus	40	45	15	100
North Campus	35	50	15	100
South Campus	25	55	20	100
Total	100	150	50	300

Complete the following:

1. Identify which chi-square test to use and explain why
2. State the hypotheses
3. Check all conditions
4. Calculate the test statistic
5. Find the critical value at α = 0.05
6. Make a decision and state your conclusion

Complete Solution:

1. Test: Homogeneity

Reason: Three separate samples (one from each campus), one variable (satisfaction level)

2. Hypotheses:

H₀: The distribution of satisfaction is the same across all three campuses
Hₐ: At least one campus has a different satisfaction distribution

3. Conditions:

Random samples from each campus
Independent observations
Expected frequencies: each is (100 × column total)/300, and all are ≥ 5 (the smallest is 16.67)

4. Test Statistic:

Expected for each campus: Very Satisfied = 33.33, Satisfied = 50, Dissatisfied = 16.67

χ² = Σ(O-E)²/E = 3.50 (Very Satisfied) + 1.00 (Satisfied) + 1.00 (Dissatisfied) = 5.50

5. Critical Value:

df = (3-1)(3-1) = 4

Critical value (α=0.05, df=4) = 9.488

6. Decision and Conclusion:

Since 5.50 < 9.488, we fail to reject H₀.

Conclusion: At the 0.05 significance level, there is insufficient evidence to conclude that satisfaction levels differ across the three campuses. The data are consistent with the campuses sharing the same satisfaction distribution.
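
The six steps can be collected into one function that returns the expected counts, the condition check, the statistic, the degrees of freedom, and the decision. This is a sketch under stated assumptions: chi_square_test is a hypothetical helper, and the critical value must still be looked up in a chi-square table and passed in.

```python
def chi_square_test(table, critical_value, min_expected=5):
    """Run the steps of a chi-square test on a two-way table of counts.

    `critical_value` is supplied by the caller (looked up in a chi-square table).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return {
        "expected": expected,
        "conditions_ok": all(e >= min_expected for row in expected for e in row),
        "statistic": statistic,
        "df": (len(row_totals) - 1) * (len(col_totals) - 1),
        "reject_h0": statistic > critical_value,
    }

# Rows: Main, North, South campus; columns: very satisfied, satisfied, dissatisfied
survey = [[40, 45, 15],
          [35, 50, 15],
          [25, 55, 20]]

result = chi_square_test(survey, critical_value=9.488)  # df = 4, alpha = 0.05
print(f"chi-square = {result['statistic']:.2f}, df = {result['df']}")
print("conditions met:", result["conditions_ok"])
print("decision:", "reject H0" if result["reject_h0"] else "fail to reject H0")
```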
