Practice Problems
20 comprehensive problems covering all two-sample hypothesis tests
About These Practice Problems
These 20 problems cover all concepts from Module 9. Problems are organized by topic with detailed solutions. Work through them systematically to build mastery!
- Part 1: Two-sample t-tests (independent) - Problems 1-5
- Part 2: Paired t-tests - Problems 6-10
- Part 3: Two-proportion tests - Problems 11-15
- Part 4: Choosing the right test - Problems 16-20
Problem 1: Sleep and Test Performance
A researcher wants to test if students who get 8+ hours of sleep perform better on exams than students who get less than 8 hours. She randomly selects students and records:
- 8+ hours group: n₁ = 35, x̄₁ = 82, s₁ = 9
- < 8 hours group: n₂ = 40, x̄₂ = 76, s₂ = 11
Test at α = 0.05. Does sleep improve test performance?
Solution:
Step 1: Hypotheses
H₀: μ₁ = μ₂ (no difference)
Hₐ: μ₁ > μ₂ (8+ hours group scores higher) — Right-tailed test
Step 2: Check conditions
Independent random samples
Both n₁ ≥ 30 and n₂ ≥ 30 (CLT applies)
Step 3: Test statistic (unpooled)
t = (82 - 76) / √(9²/35 + 11²/40) = 6 / √(2.314 + 3.025) = 6 / 2.310 ≈ 2.60
Step 4: Degrees of freedom
Using Welch's approximation: df ≈ 72 (use technology)
Step 5: p-value
For t = 2.60, df = 72, right-tailed: p-value ≈ 0.006
Step 6: Decision
Since p-value (0.006) < α (0.05), reject H₀.
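Conclusion: There is sufficient evidence to conclude that, on average, students who get 8+ hours of sleep score higher on exams than students who get less. Because the students were sampled rather than randomly assigned to sleep groups, this shows an association, not that extra sleep causes higher scores.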
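The arithmetic in Steps 3 to 5 can be reproduced with a short Python sketch that uses only the standard library. The standard library has no Student-t distribution, so the helper `t_upper_tail` (a name chosen here, not a library function) approximates the tail area by integrating the t density with Simpson's rule. Treat its p-values as close approximations rather than exact values.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """Approximate P(T > t) by Simpson's rule on the density over [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # P(T > |t|): half the area minus the area from 0 to |t|
    return upper if t >= 0 else 1.0 - upper


def welch_t(m1, s1, n1, m2, s2, n2):
    """Unpooled t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t, df = welch_t(82, 9, 35, 76, 11, 40)
p_right = t_upper_tail(t, df)  # right-tailed alternative: mu1 > mu2
print(f"t = {t:.2f}, df = {df:.1f}, p = {p_right:.4f}")
```

The Welch df comes out to about 72.7, which the solution rounds down to 72. The right-tailed p-value should agree with the ≈ 0.006 reported in Step 5.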
Problem 2: Urban vs Rural Income
Is there a difference in average household income between urban and rural areas?
- Urban: n₁ = 50, x̄₁ = $68,000, s₁ = $15,000
- Rural: n₂ = 45, x̄₂ = $62,000, s₂ = $12,000
Test at α = 0.01 (two-tailed).
Solution:
Hypotheses: H₀: μ₁ = μ₂, Hₐ: μ₁ ≠ μ₂ (two-tailed)
Test statistic:
t = (68000 - 62000) / √(15000²/50 + 12000²/45) = 6000 / √(4500000 + 3200000) = 6000 / 2774.89 ≈ 2.16
Critical value: For α = 0.01 (two-tailed), Welch df ≈ 91: t* ≈ ±2.63
Decision: Since |2.16| < 2.63, fail to reject H₀.
Conclusion: At the 0.01 significance level, there is insufficient evidence to conclude that average household incomes differ between urban and rural areas. The $6,000 difference could be due to sampling variability.
Problem 3: Teaching Methods Comparison
Two teaching methods are compared. Method A is used with 25 students, Method B with 28 students. Assume equal variances.
- Method A: n₁ = 25, x̄₁ = 88, s₁ = 7
- Method B: n₂ = 28, x̄₂ = 84, s₂ = 8
Use the pooled-variance approach. Test at α = 0.05 whether Method A is better.
Solution:
Hypotheses: H₀: μ₁ = μ₂, Hₐ: μ₁ > μ₂ (right-tailed)
Pooled variance:
sp² = [(25-1)(7²) + (28-1)(8²)] / (25+28-2) = [1176 + 1728] / 51 = 56.94
sp = 7.55
Test statistic:
t = (88 - 84) / (7.55√(1/25 + 1/28)) = 4 / (7.55 × 0.275) = 4 / 2.076 ≈ 1.93
Degrees of freedom: df = 25 + 28 - 2 = 51
Critical value: For α = 0.05 (right-tailed), df = 51: t* ≈ 1.675
Decision: Since 1.93 > 1.675, reject H₀.
Conclusion: There is sufficient evidence that Method A produces significantly higher scores than Method B.
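The pooled calculation can be sketched the same way. In this version the critical value is not computed. It is typed in from the t table, as in the solution, and the helper name `pooled_t` is hypothetical.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic using the pooled (equal-variance) standard deviation."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se, df, sp2


t, df, sp2 = pooled_t(88, 7, 25, 84, 8, 28)
t_crit = 1.675  # right-tailed table value for alpha = 0.05, df = 51, as in the text
print(f"sp^2 = {sp2:.2f}, t = {t:.2f}, df = {df}, reject H0: {t > t_crit}")
```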
Problem 4: Drug Side Effects
Two drugs are compared for a side effect (headache duration in hours).
- Drug X: n₁ = 30, x̄₁ = 4.2 hours, s₁ = 1.5
- Drug Y: n₂ = 35, x̄₂ = 3.8 hours, s₂ = 1.8
Is there a significant difference at α = 0.10?
Solution:
Hypotheses: H₀: μ₁ = μ₂, Hₐ: μ₁ ≠ μ₂ (two-tailed)
Test statistic:
t = (4.2 - 3.8) / √(1.5²/30 + 1.8²/35) = 0.4 / √(0.075 + 0.0926) = 0.4 / 0.410 ≈ 0.98
p-value: For t = 0.98, df ≈ 62, two-tailed: p-value ≈ 0.33
Decision: Since p-value (0.33) > α (0.10), fail to reject H₀.
Conclusion: At α = 0.10, there is insufficient evidence of a difference in mean headache duration between the two drugs.
Problem 5: Confidence Interval for Difference
Using the data from Problem 1 (8+ hours: n₁=35, x̄₁=82, s₁=9; <8 hours: n₂=40, x̄₂=76, s₂=11), construct a 95% confidence interval for μ₁ - μ₂.
Solution:
Formula: (x̄₁ - x̄₂) ± t* × √(s₁²/n₁ + s₂²/n₂)
Standard error:
SE = √(81/35 + 121/40) = √(2.314 + 3.025) = 2.310
Critical value: For 95% CI, df ≈ 72: t* ≈ 1.993
Confidence interval:
(82 - 76) ± 1.993 × 2.310 = 6 ± 4.604 = (1.396, 10.604)
Interpretation: We are 95% confident that the mean exam score of students who get 8+ hours of sleep is between 1.4 and 10.6 points higher than that of students who get less sleep. Because the interval does not contain 0, the difference is significant at the 5% level (two-sided), which is consistent with Problem 1.
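A minimal sketch of the interval formula follows. The critical value t* = 1.993 is passed in from the table, as in the solution, and the helper name is hypothetical. The code does not round the standard error along the way, so its endpoints can differ from the hand calculation in the third decimal place.

```python
import math


def diff_means_ci(m1, s1, n1, m2, s2, n2, t_star):
    """Unpooled confidence interval for mu1 - mu2, given a critical value t_star."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = m1 - m2
    margin = t_star * se
    return diff - margin, diff + margin


# t_star = 1.993 is the 95% table value for df ≈ 72 used in the text
low, high = diff_means_ci(82, 9, 35, 76, 11, 40, t_star=1.993)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print("0 inside interval:", low <= 0 <= high)
```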
Problem 6: Weight Loss Program
Six people participate in a weight loss program. Their weights (in pounds) are recorded before and after:
| Person | Before | After | d = Before - After |
|---|---|---|---|
| 1 | 180 | 175 | 5 |
| 2 | 195 | 188 | 7 |
| 3 | 210 | 205 | 5 |
| 4 | 165 | 162 | 3 |
| 5 | 188 | 180 | 8 |
| 6 | 202 | 196 | 6 |
Test at α = 0.05 if the program results in significant weight loss.
Solution:
Step 1: Calculate d̄ and sd
d̄ = (5+7+5+3+8+6) / 6 = 34/6 = 5.667 pounds
Squared deviations: (5-5.667)² = 0.444, (7-5.667)² = 1.778, (5-5.667)² = 0.444, (3-5.667)² = 7.111, (8-5.667)² = 5.444, (6-5.667)² = 0.111; Σ(d-d̄)² = 15.333
sd = √[Σ(d-d̄)²/(n-1)] = √[15.333/5] = 1.751
Step 2: Hypotheses
H₀: μd = 0, Hₐ: μd > 0 (right-tailed, expect weight loss)
Step 3: Test statistic
t = (5.667 - 0) / (1.751/√6) = 5.667 / 0.715 ≈ 7.93
Step 4: Critical value
df = 6 - 1 = 5, α = 0.05 (right-tailed): t* ≈ 2.015
Step 5: Decision
Since 7.93 > 2.015, reject H₀. p-value < 0.001
Conclusion: The weight loss program results in significant weight loss (average 5.67 pounds).
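The paired calculation can be checked directly from the table with Python's `statistics` module. `statistics.stdev` uses the n − 1 denominator, which matches the sd formula in Step 1. The critical value is the table value from Step 4.

```python
import math
import statistics

before = [180, 195, 210, 165, 188, 202]
after = [175, 188, 205, 162, 180, 196]

# Paired differences d = Before - After (positive means weight was lost)
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)  # sample standard deviation, divides by n - 1
t = d_bar / (s_d / math.sqrt(n))

t_crit = 2.015  # right-tailed, alpha = 0.05, df = 5 (table value from Step 4)
print(f"d = {d}, d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, reject H0: {t > t_crit}")
```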
Problem 7: Blood Pressure Medication
A blood pressure medication is tested on 10 patients. Systolic BP is measured before and after treatment:
- Mean difference (Before - After): d̄ = 8.5 mmHg
- Standard deviation of differences: sd = 6.2 mmHg
- n = 10 patients
Test at α = 0.01 if the medication reduces blood pressure.
Solution:
Hypotheses: H₀: μd = 0, Hₐ: μd > 0 (right-tailed)
Test statistic:
t = (8.5 - 0) / (6.2/√10) = 8.5 / 1.960 ≈ 4.34
Degrees of freedom: df = 10 - 1 = 9
Critical value: For α = 0.01 (right-tailed), df = 9: t* ≈ 2.821
Decision: Since 4.34 > 2.821, reject H₀.
Conclusion: At the 0.01 significance level, there is strong evidence that the medication reduces systolic blood pressure. The mean reduction in the sample was 8.5 mmHg.
Problem 8: Tutoring Effectiveness
Twelve students take a pre-test before tutoring and a post-test after:
- Mean difference (Post - Pre): d̄ = 12.5 points
- Standard deviation: sd = 8.4 points
Construct a 95% confidence interval for the mean improvement.
Solution:
Formula: d̄ ± t* × (sd/√n)
Critical value: df = 12 - 1 = 11, 95% CI: t* ≈ 2.201
Margin of error:
ME = 2.201 × (8.4/√12) = 2.201 × 2.425 = 5.337
Confidence interval:
12.5 ± 5.337 = (7.16, 17.84) points
Interpretation: We are 95% confident that tutoring improves test scores by an average of 7.16 to 17.84 points. Since the interval doesn't contain 0, tutoring significantly improves scores.
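Problems 7, 8, and 18 give only summary statistics for the differences. The minimal sketch below computes both the paired test statistic and the confidence interval from d̄, sd, n, and a table value t*. The helper name `paired_summary` is hypothetical.

```python
import math


def paired_summary(d_bar, s_d, n, t_star):
    """Paired t statistic (H0: mu_d = 0) and confidence interval from summary statistics."""
    se = s_d / math.sqrt(n)
    t = d_bar / se
    margin = t_star * se
    return t, (d_bar - margin, d_bar + margin)


# Problem 8: n = 12, so df = 11; t* = 2.201 is the 95% table value used in the text
t, (low, high) = paired_summary(d_bar=12.5, s_d=8.4, n=12, t_star=2.201)
print(f"t = {t:.2f}, 95% CI: ({low:.2f}, {high:.2f})")
```

For Problem 18, `paired_summary(1.2, 0.8, 15, 1.761)` returns t ≈ 5.81 and the 90% interval (0.836, 1.564).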
Problem 9: Reaction Time Study
Eight subjects' reaction times (in milliseconds) are measured on their dominant and non-dominant hands:
| Subject | Dominant | Non-Dominant |
|---|---|---|
| 1 | 285 | 310 |
| 2 | 290 | 305 |
| 3 | 275 | 295 |
| 4 | 300 | 320 |
| 5 | 280 | 300 |
| 6 | 295 | 315 |
| 7 | 270 | 285 |
| 8 | 288 | 308 |
Test if there's a significant difference at α = 0.05.
Solution:
Calculate differences (Dominant - Non-Dominant):
d: -25, -15, -20, -20, -20, -20, -15, -20
Statistics:
d̄ = -155/8 = -19.375 ms
sd ≈ 3.204 ms
Hypotheses: H₀: μd = 0, Hₐ: μd ≠ 0 (two-tailed)
Test statistic:
t = -19.375 / (3.204/√8) = -19.375 / 1.133 ≈ -17.10
Decision: df = 7, |t| = 17.10 >> critical value. Reject H₀ (p < 0.001).
Conclusion: There is overwhelming evidence that reaction times are shorter (faster) on the dominant hand, by about 19 ms on average.
Problem 10: Identify the Design
For each scenario, state whether you should use a paired test or independent test:
- (a) Comparing anxiety levels before and after therapy for 20 patients
- (b) Comparing average salaries of teachers vs. nurses (different people)
- (c) Testing if identical twins differ in IQ (one twin raised in each environment)
- (d) Comparing recovery times for patients receiving Drug A vs. Drug B (random assignment)
Solution:
(a) Paired test - Same 20 patients measured twice (before/after)
(b) Independent test - Two separate groups (teachers vs. nurses)
(c) Paired test - Matched pairs design (twins matched)
(d) Independent test - Two separate groups of patients
Problem 11: Drug Cure Rates
Two drugs are compared:
- Drug A: 85 out of 150 patients cured
- Drug B: 92 out of 180 patients cured
Test at α = 0.05 if the cure rates differ.
Solution:
Sample proportions:
p̂₁ = 85/150 = 0.567, p̂₂ = 92/180 = 0.511
Check conditions:
n₁p̂₁ = 85 ≥ 10, n₁(1-p̂₁) = 65 ≥ 10
n₂p̂₂ = 92 ≥ 10, n₂(1-p̂₂) = 88 ≥ 10
Hypotheses: H₀: p₁ = p₂, Hₐ: p₁ ≠ p₂ (two-tailed)
Pooled proportion:
p̄ = (85+92)/(150+180) = 177/330 = 0.536
Test statistic:
z = (0.567-0.511) / √[0.536×0.464×(1/150+1/180)]
z = 0.0556 / √[0.2487×0.01222] = 0.0556 / 0.0551 ≈ 1.01 (using the unrounded difference 85/150 - 92/180 = 0.0556)
Decision: For α = 0.05 (two-tailed), z* = ±1.96. Since |1.01| < 1.96, fail to reject H₀.
Conclusion: There is insufficient evidence to conclude the cure rates differ between the two drugs.
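A minimal sketch of the pooled two-proportion z-test follows. It uses `statistics.NormalDist` from the Python standard library for the p-value, and the helper name `two_prop_z` is hypothetical.

```python
import math
from statistics import NormalDist


def two_prop_z(x1, n1, x2, n2):
    """Two-proportion z statistic using the pooled proportion under H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z = two_prop_z(85, 150, 92, 180)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
```

The same helper gives z ≈ -2.02 for Problem 12 (`two_prop_z(240, 400, 300, 450)`) and z ≈ 0.49 for Problem 14 when Factory 2 is passed first (`two_prop_z(25, 300, 18, 250)`).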
Problem 12: Gender and Policy Support
A survey asks about support for a new policy:
- Men: 240 out of 400 support (60%)
- Women: 300 out of 450 support (66.7%)
Is there a significant difference at α = 0.01?
Solution:
Hypotheses: H₀: p₁ = p₂, Hₐ: p₁ ≠ p₂
Pooled proportion:
p̄ = (240+300)/(400+450) = 540/850 = 0.635
Test statistic:
z = (0.60-0.667) / √[0.635×0.365×(1/400+1/450)]
z = -0.067 / 0.0331 ≈ -2.02
Critical value: α = 0.01 (two-tailed): z* = ±2.576
Decision: Since |-2.02| < 2.576, fail to reject H₀.
Conclusion: At the 0.01 level, there is insufficient evidence to conclude men and women differ in their support for the policy.
Problem 13: Online vs In-Person Pass Rates
Compare pass rates for online vs. in-person classes:
- Online: 78 out of 120 students pass
- In-person: 95 out of 130 students pass
Construct a 95% confidence interval for the difference in pass rates.
Solution:
Sample proportions:
p̂₁ = 78/120 = 0.65, p̂₂ = 95/130 = 0.731
Formula (NO pooling for CI):
(p̂₁ - p̂₂) ± z* × √[(p̂₁(1-p̂₁)/n₁) + (p̂₂(1-p̂₂)/n₂)]
Standard error:
SE = √[(0.65×0.35/120) + (0.731×0.269/130)]
SE = √[0.001896 + 0.001512] = 0.0584
95% CI: z* = 1.96
(0.65 - 0.731) ± 1.96 × 0.0584
-0.081 ± 0.114 = (-0.195, 0.033)
Interpretation: We're 95% confident that the difference in pass rates (online minus in-person) is between -19.5 and +3.3 percentage points. Because the interval contains 0, the difference is not significant at the 5% level.
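The unpooled interval can be sketched with `statistics.NormalDist().inv_cdf`, which supplies z* ≈ 1.96 for 95% confidence. The helper name is hypothetical. The code does not round intermediate values, so the upper endpoint prints as 0.034 rather than 0.033.

```python
import math
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Unpooled (Wald) confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # no pooling for a CI
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


low, high = two_prop_ci(78, 120, 95, 130)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```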
Problem 14: Quality Control Comparison
Two factories' defect rates are compared:
- Factory 1: 18 defective out of 250 items (7.2%)
- Factory 2: 25 defective out of 300 items (8.3%)
Test at α = 0.10 if Factory 2 has a higher defect rate (one-tailed).
Solution:
Hypotheses: H₀: p₂ = p₁, Hₐ: p₂ > p₁ (right-tailed)
Pooled proportion:
p̄ = (18+25)/(250+300) = 43/550 = 0.0782
Test statistic:
z = (0.0833-0.072) / √[0.0782×0.9218×(1/250+1/300)]
z = 0.0113 / 0.0230 ≈ 0.49
Critical value: α = 0.10 (right-tailed): z* = 1.28
Decision: Since 0.49 < 1.28, fail to reject H₀.
Conclusion: There is insufficient evidence that Factory 2 has a higher defect rate.
Problem 15: Checking Conditions
Can you conduct a two-proportion z-test for these scenarios?
- (a) n₁ = 50 with 8 successes, n₂ = 60 with 45 successes
- (b) n₁ = 150 with 120 successes, n₂ = 200 with 30 successes
- (c) n₁ = 100 with 55 successes, n₂ = 90 with 40 successes
Solution:
(a) NO - n₁p̂₁ = 8 < 10. Fails success-failure condition.
(b) YES - n₁p̂₁ = 120 ≥ 10, n₁(1-p̂₁) = 30 ≥ 10, n₂p̂₂ = 30 ≥ 10, n₂(1-p̂₂) = 170 ≥ 10. All conditions met.
(c) YES - n₁p̂₁ = 55 ≥ 10, n₁(1-p̂₁) = 45 ≥ 10, n₂p̂₂ = 40 ≥ 10, n₂(1-p̂₂) = 50 ≥ 10. All conditions met.
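n·p̂ is just the number of successes and n(1 - p̂) is the number of failures, so the success-failure check comes down to four counts per scenario. A minimal sketch with a hypothetical helper name:

```python
def success_failure_ok(x1, n1, x2, n2, threshold=10):
    """True if every success and failure count in both samples is at least threshold."""
    counts = [x1, n1 - x1, x2, n2 - x2]  # n1*p1_hat, n1*(1 - p1_hat), n2*p2_hat, n2*(1 - p2_hat)
    return all(c >= threshold for c in counts)


scenarios = {
    "a": (8, 50, 45, 60),
    "b": (120, 150, 30, 200),
    "c": (55, 100, 40, 90),
}
results = {label: success_failure_ok(*args) for label, args in scenarios.items()}
print(results)
```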
Problem 16: Test Selection Practice
For each scenario, identify which hypothesis test to use:
- (a) A researcher compares average commute times in City A (n=50) vs. City B (n=60).
- (b) A company claims 90% customer satisfaction. You survey 200 customers to test this.
- (c) Nurses measure patients' pain levels before and after a treatment (same 30 patients).
- (d) Compare proportion of voters supporting Candidate X in Texas vs. California.
Solution:
(a) Independent two-sample t-test - Two separate cities (independent), testing means (commute times)
(b) One-sample z-test for proportion - One sample, testing proportion against claimed 90%
(c) Paired t-test - Same 30 patients measured twice (before/after), testing means (pain levels)
(d) Two-sample z-test for proportions - Two states (independent), testing proportions (voter support)
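The reasoning in these solutions can be written as a small decision function. The function below is a hypothetical helper written for illustration. It encodes only the three questions used in Problems 16 to 19: which parameter is being tested, how many samples there are, and whether the observations are paired.

```python
def choose_test(parameter: str, n_samples: int, paired: bool = False) -> str:
    """Hypothetical helper: pick a test from the parameter, sample count, and pairing."""
    if parameter == "proportion":
        if n_samples == 1:
            return "one-sample z-test for a proportion"
        return "two-sample z-test for proportions"
    if parameter == "mean":
        if n_samples == 1:
            return "one-sample t-test"
        # Two measurements on the same subjects are paired; separate groups are independent
        return "paired t-test" if paired else "independent two-sample t-test"
    raise ValueError(f"unknown parameter: {parameter!r}")


print(choose_test("mean", 2))               # (a) commute times in two cities
print(choose_test("proportion", 1))         # (b) 90% satisfaction claim
print(choose_test("mean", 2, paired=True))  # (c) pain before/after, same patients
print(choose_test("proportion", 2))         # (d) Texas vs. California
```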
Problem 17: Independent or Paired?
Determine if each scenario requires independent or paired test:
- (a) Compare average test scores of 40 students using Method A vs. 35 different students using Method B
- (b) Measure blood sugar levels in 25 diabetic patients before and after a diet change
- (c) Compare average heights of 50 adult men vs. 50 adult women
- (d) Test reading comprehension in 20 children at age 5 and again at age 7
Solution:
(a) Independent - Different students in each group
(b) Paired - Same 25 patients measured twice
(c) Independent - Different people (men vs. women)
(d) Paired - Same 20 children measured twice (at different ages)
Problem 18: Complete Analysis
A fitness instructor wants to test if a new workout program improves mile run times. She records times for 15 participants before and after the 8-week program:
- Mean difference (Before - After): d̄ = 1.2 minutes
- Standard deviation: sd = 0.8 minutes
a) Which test should be used?
b) Test at α = 0.05 if the program improves times.
c) Construct a 90% confidence interval.
Solution:
(a) Paired t-test - Same 15 participants measured twice
(b) Hypothesis test:
H₀: μd = 0, Hₐ: μd > 0 (d = Before - After, so faster times after the program give a positive difference)
t = 1.2 / (0.8/√15) = 1.2 / 0.2066 = 5.81
df = 14, critical value ≈ 1.761. Since 5.81 > 1.761, reject H₀.
Conclusion: The program significantly improves run times (p < 0.001).
(c) 90% CI: t* ≈ 1.761 for df = 14
1.2 ± 1.761 × 0.2066 = 1.2 ± 0.364 = (0.836, 1.564) minutes
We're 90% confident the program improves times by 0.84 to 1.56 minutes on average.
Problem 19: Mixed Practice
Identify the test AND the hypotheses for each:
- (a) Test if more than 70% of college students have part-time jobs (sample: 250 students)
- (b) Compare average GPAs of athletes vs. non-athletes at a university
- (c) Test if a meditation app reduces stress scores (measure same 40 people before/after)
Solution:
(a) One-sample z-test for proportion
H₀: p = 0.70, Hₐ: p > 0.70 (right-tailed)
(b) Independent two-sample t-test
H₀: μ₁ = μ₂, Hₐ: μ₁ ≠ μ₂ (two-tailed, unless direction specified)
(c) Paired t-test
H₀: μd = 0, Hₐ: μd > 0 (right-tailed, if d = Before - After, expecting reduction)
Problem 20: Critical Thinking Challenge
A researcher wants to test if a new teaching method improves test scores. She has two options:
Design A: Randomly assign 50 students to new method, 50 to traditional method. Compare final exam scores.
Design B: Give all 50 students a pre-test, teach using new method, then give post-test. Compare before/after scores.
a) Which test would be used for each design?
b) Which design is more powerful (better at detecting real effects)?
c) What are the tradeoffs?
Solution:
(a) Tests:
- Design A: Independent two-sample t-test (two separate groups)
- Design B: Paired t-test (same students, before/after)
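(b) More powerful: Design B (paired), as long as each student's pre- and post-test scores are positively correlated (as they usually are)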
Paired designs control for individual variability. Each student serves as their own control, eliminating noise from differing baseline abilities.
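The variance of each design's estimate shows why pairing helps. The derivation below rests on two simplifying assumptions that are not in the problem: scores have the same standard deviation σ under both conditions, and in Design B a student's pre- and post-test scores have correlation ρ. It compares only the precision of the estimates and says nothing about the practice-effect bias discussed in (c).

```latex
\begin{align*}
\text{Design A (independent, } n_1 = n_2 = 50\text{):}\quad
  \operatorname{Var}(\bar{x}_1 - \bar{x}_2) &= \frac{\sigma^2}{50} + \frac{\sigma^2}{50} = \frac{2\sigma^2}{50} \\
\text{Design B (paired, } n = 50\text{):}\quad
  \operatorname{Var}(\bar{d}) &= \frac{\operatorname{Var}(x_{\text{post}} - x_{\text{pre}})}{50}
  = \frac{\sigma^2 + \sigma^2 - 2\rho\sigma^2}{50}
  = \frac{2\sigma^2(1 - \rho)}{50}
\end{align*}
```

Under these assumptions, the paired standard error is smaller by a factor of √(1 − ρ). With a hypothetical ρ = 0.75, it is half as large.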
(c) Tradeoffs:
Design A Advantages:
- Can directly compare two methods simultaneously
- No practice effects from taking test twice
Design A Disadvantages:
- Needs more subjects for same power
- Individual differences add noise
Design B Advantages:
- More powerful (controls individual variability)
- Needs fewer subjects
Design B Disadvantages:
- No comparison group (can't isolate method effect from practice effect)
- Students might improve just from test practice
- Can't tell if new method is better than traditional
Best approach: Use Design A if you want to compare two methods. Use Design B only if you also have a control group taking the same pre/post tests with the traditional method!