Learn Without Walls

Practice Problems

20 comprehensive problems covering all two-sample hypothesis tests

About These Practice Problems

These 20 problems cover all concepts from Module 9. Problems are organized by topic with detailed solutions. Work through them systematically to build mastery!

Part 1: Independent Two-Sample t-Tests (Problems 1-5)

1. Sleep and Test Performance

A researcher wants to test whether students who get 8+ hours of sleep perform better on exams than students who get less than 8 hours. She randomly selects students and records:

8+ hours: n₁ = 35, x̄₁ = 82, s₁ = 9
<8 hours: n₂ = 40, x̄₂ = 76, s₂ = 11

Test at α = 0.05. Does sleep improve test performance?

Solution:

Step 1: Hypotheses

H₀: μ₁ = μ₂ (no difference)
Hₐ: μ₁ > μ₂ (8+ hours group scores higher) — Right-tailed test

Step 2: Check conditions

Independent random samples
Both n₁ ≥ 30 and n₂ ≥ 30 (CLT applies)

Step 3: Test statistic (unpooled)

t = (82 - 76) / √(9²/35 + 11²/40) = 6 / √(2.314 + 3.025) = 6 / 2.310 ≈ 2.60

Step 4: Degrees of freedom

Using Welch's approximation: df ≈ 72 (use technology)

Step 5: p-value

For t = 2.60, df = 72, right-tailed: p-value ≈ 0.006

Step 6: Decision

Since p-value (0.006) < α (0.05), reject H₀.

Conclusion: There is sufficient evidence to conclude that students who get 8+ hours of sleep score significantly higher on exams.
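The unpooled calculation in Steps 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original solution: the function name `welch_t` is an assumption chosen here, and the p-value lookup is still left to a t-table or technology.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for an unpooled (Welch) two-sample t-test."""
    v1 = sd1**2 / n1  # estimated variance of the first sample mean
    v2 = sd2**2 / n2  # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Summary statistics from Problem 1 (8+ hours vs. <8 hours)
t, df = welch_t(82, 9, 35, 76, 11, 40)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 2.60, df = 72.7
```

Rounding the degrees of freedom down to 72 is the conservative choice when reading a t-table.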

2. Urban vs Rural Income

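Is there a difference in average household income between urban and rural areas? Independent random samples give:

Sample 1: n₁ = 50, x̄₁ = $68,000, s₁ = $15,000
Sample 2: n₂ = 45, x̄₂ = $62,000, s₂ = $12,000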

Test at α = 0.01 (two-tailed).

Solution:

Hypotheses: H₀: μ₁ = μ₂, Hₐ: μ₁ ≠ μ₂ (two-tailed)

Test statistic:

t = (68000 - 62000) / √(15000²/50 + 12000²/45) = 6000 / √(4500000 + 3200000) = 6000 / 2774.89 ≈ 2.16

Critical value: For α = 0.01 (two-tailed), df ≈ 91 (Welch's approximation): t* ≈ ±2.63

Decision: Since |2.16| < 2.63, fail to reject H₀.

Conclusion: At the 0.01 significance level, there is insufficient evidence to conclude that average household incomes differ between urban and rural areas. The $6,000 difference could be due to sampling variability.

3. Teaching Methods Comparison

Two teaching methods are compared. Method A is used with 25 students, Method B with 28 students. Assume equal variances.

Method A: n₁ = 25, x̄₁ = 88, s₁ = 7
Method B: n₂ = 28, x̄₂ = 84, s₂ = 8

Use pooled variance approach. Test at α = 0.05 if Method A is better.

Solution:

Hypotheses: H₀: μ₁ = μ₂, Hₐ: μ₁ > μ₂ (right-tailed)

Pooled variance:

sp² = [(25-1)(7²) + (28-1)(8²)] / (25+28-2) = [1176 + 1728] / 51 = 56.94

sp = 7.55

Test statistic:

t = (88 - 84) / (7.55√(1/25 + 1/28)) = 4 / (7.55 × 0.275) = 4 / 2.076 ≈ 1.93

Degrees of freedom: df = 25 + 28 - 2 = 51

Critical value: For α = 0.05 (right-tailed), df = 51: t* ≈ 1.675

Decision: Since 1.93 > 1.675, reject H₀.

Conclusion: There is sufficient evidence that Method A produces significantly higher scores than Method B.
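For comparison with the unpooled approach, here is a minimal Python sketch of the pooled-variance calculation. The function name `pooled_t` is an assumption made for illustration; the critical value still comes from a t-table.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, sp2) for a pooled-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df, sp2

# Method A (n=25, mean 88, s=7) vs. Method B (n=28, mean 84, s=8)
t, df, sp2 = pooled_t(88, 7, 25, 84, 8, 28)
print(f"sp^2 = {sp2:.2f}, t = {t:.2f}, df = {df}")  # sp^2 = 56.94, t = 1.93, df = 51
```

The pooled form is only appropriate when the equal-variance assumption is reasonable, as the problem states here.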

4. Drug Side Effects

Two drugs are compared for a side effect (headache duration in hours).

Drug 1: n₁ = 30, x̄₁ = 4.2, s₁ = 1.5
Drug 2: n₂ = 35, x̄₂ = 3.8, s₂ = 1.8

Is there a significant difference at α = 0.10?

Solution:

Hypotheses: H₀: μ₁ = μ₂, Hₐ: μ₁ ≠ μ₂ (two-tailed)

Test statistic:

t = (4.2 - 3.8) / √(1.5²/30 + 1.8²/35) = 0.4 / √(0.075 + 0.0926) = 0.4 / 0.410 ≈ 0.98

p-value: For t = 0.98, df ≈ 62, two-tailed: p-value ≈ 0.33

Decision: Since p-value (0.33) > α (0.10), fail to reject H₀.

Conclusion: There is no significant difference in headache duration between the two drugs.

5. Confidence Interval for Difference

Using the data from Problem 1 (8+ hours: n₁=35, x̄₁=82, s₁=9; <8 hours: n₂=40, x̄₂=76, s₂=11), construct a 95% confidence interval for μ₁ - μ₂.

Solution:

Formula: (x̄₁ - x̄₂) ± t* × √(s₁²/n₁ + s₂²/n₂)

Standard error:

SE = √(81/35 + 121/40) = √(2.314 + 3.025) = 2.310

Critical value: For 95% CI, df ≈ 72: t* ≈ 1.993

Confidence interval:

(82 - 76) ± 1.993 × 2.310 = 6 ± 4.604 = (1.396, 10.604)

Interpretation: We are 95% confident that students who get 8+ hours of sleep score between 1.4 and 10.6 points higher on exams than students who get less sleep. Since the interval doesn't contain 0, there is a significant difference.
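The interval above can be reproduced with a short Python sketch. The helper name `mean_diff_ci` is hypothetical, and the critical value t* = 1.993 is passed in from the t-table rather than computed.

```python
import math

def mean_diff_ci(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """Unpooled confidence interval for mu1 - mu2, given a critical value t_star."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = mean1 - mean2
    margin = t_star * se
    return diff - margin, diff + margin

# Problem 1 data with the 95% critical value for df of about 72
low, high = mean_diff_ci(82, 9, 35, 76, 11, 40, t_star=1.993)
print(f"({low:.1f}, {high:.1f})")  # (1.4, 10.6)
```

Small differences in the third decimal place relative to the hand calculation come from rounding the standard error to 2.310.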

Part 2: Paired t-Tests (Problems 6-10)

6. Weight Loss Program

Six people participate in a weight loss program. Their weights (in pounds) are recorded before and after:

Person  Before  After  d = Before - After
1       180     175    5
2       195     188    7
3       210     205    5
4       165     162    3
5       188     180    8
6       202     196    6

Test at α = 0.05 if the program results in significant weight loss.

Solution:

Step 1: Calculate d̄ and sd

d̄ = (5+7+5+3+8+6) / 6 = 34/6 = 5.667 pounds

Squared deviations: (5-5.667)²=0.444, (7-5.667)²=1.778, (5-5.667)²=0.444, (3-5.667)²=7.111, (8-5.667)²=5.444, (6-5.667)²=0.111; sum = 15.333

sd = √[Σ(d-d̄)²/(n-1)] = √[15.333/5] = 1.751

Step 2: Hypotheses

H₀: μd = 0, Hₐ: μd > 0 (right-tailed, expect weight loss)

Step 3: Test statistic

t = (5.667 - 0) / (1.751/√6) = 5.667 / 0.715 ≈ 7.93

Step 4: Critical value

df = 6 - 1 = 5, α = 0.05 (right-tailed): t* ≈ 2.015

Step 5: Decision

Since 7.93 > 2.015, reject H₀. p-value < 0.001

Conclusion: The weight loss program results in significant weight loss (average 5.67 pounds).
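Because this problem provides raw data, the paired statistics can be recomputed directly from the table. The sketch below uses only Python's standard library, and the variable names are chosen here for illustration.

```python
import math
import statistics

before = [180, 195, 210, 165, 188, 202]
after = [175, 188, 205, 162, 180, 196]

# Paired design: reduce each person to a single difference, Before - After
diffs = [b - a for b, a in zip(before, after)]

d_bar = statistics.mean(diffs)   # mean difference
s_d = statistics.stdev(diffs)    # sample standard deviation (divides by n - 1)
n = len(diffs)
t = d_bar / (s_d / math.sqrt(n)) # test statistic for H0: mu_d = 0
df = n - 1

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {df}")
```

Running a check like this is a quick way to catch arithmetic slips in the sum of squared deviations.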

7. Blood Pressure Medication

A blood pressure medication is tested on 10 patients. Systolic BP is measured before and after treatment. The differences (d = Before - After) have:

n = 10, d̄ = 8.5 mmHg, sd = 6.2 mmHg

Test at α = 0.01 if the medication reduces blood pressure.

Solution:

Hypotheses: H₀: μd = 0, Hₐ: μd > 0 (right-tailed)

Test statistic:

t = (8.5 - 0) / (6.2/√10) = 8.5 / 1.960 ≈ 4.34

Degrees of freedom: df = 10 - 1 = 9

Critical value: For α = 0.01 (right-tailed), df = 9: t* ≈ 2.821

Decision: Since 4.34 > 2.821, reject H₀.

Conclusion: At the 0.01 significance level, there is strong evidence that the medication significantly reduces blood pressure by an average of 8.5 mmHg.

8. Tutoring Effectiveness

Twelve students take a pre-test before tutoring and a post-test after. The improvements (d = Post - Pre) have:

n = 12, d̄ = 12.5 points, sd = 8.4 points

Construct a 95% confidence interval for the mean improvement.

Solution:

Formula: d̄ ± t* × (sd/√n)

Critical value: df = 12 - 1 = 11, 95% CI: t* ≈ 2.201

Margin of error:

ME = 2.201 × (8.4/√12) = 2.201 × 2.425 = 5.337

Confidence interval:

12.5 ± 5.337 = (7.16, 17.84) points

Interpretation: We are 95% confident that tutoring improves test scores by an average of 7.16 to 17.84 points. Since the interval doesn't contain 0, tutoring significantly improves scores.

9. Reaction Time Study

Eight subjects' reaction times (in milliseconds) are measured on their dominant and non-dominant hands:

Subject  Dominant  Non-Dominant
1        285       310
2        290       305
3        275       295
4        300       320
5        280       300
6        295       315
7        270       285
8        288       308

Test if there's a significant difference at α = 0.05.

Solution:

Calculate differences (Dominant - Non-Dominant):

d: -25, -15, -20, -20, -20, -20, -15, -20

Statistics:

d̄ = -155/8 = -19.375 ms

sd ≈ 3.204 ms

Hypotheses: H₀: μd = 0, Hₐ: μd ≠ 0 (two-tailed)

Test statistic:

t = -19.375 / (3.204/√8) = -19.375 / 1.133 ≈ -17.10

Decision: df = 7, |t| = 17.10 >> critical value. Reject H₀ (p < 0.001).

Conclusion: There is overwhelming evidence that reaction time is significantly faster on the dominant hand (by about 19 ms on average).

10. Identify the Design

For each scenario, state whether you should use a paired test or independent test:

  a) Comparing anxiety levels before and after therapy for 20 patients
  b) Comparing average salaries of teachers vs. nurses (different people)
  c) Testing if identical twins differ in IQ (one twin raised in each environment)
  d) Comparing recovery times for patients receiving Drug A vs. Drug B (random assignment)

Solution:

(a) Paired test - Same 20 patients measured twice (before/after)

(b) Independent test - Two separate groups (teachers vs. nurses)

(c) Paired test - Matched pairs design (twins matched)

(d) Independent test - Two separate groups of patients

Part 3: Two-Proportion Tests (Problems 11-15)

11. Drug Cure Rates

Two drugs are compared:

Drug 1: 85 of 150 patients cured
Drug 2: 92 of 180 patients cured

Test at α = 0.05 if the cure rates differ.

Solution:

Sample proportions:

p̂₁ = 85/150 = 0.567, p̂₂ = 92/180 = 0.511

Check conditions:

n₁p̂₁ = 85 ≥ 10, n₁(1-p̂₁) = 65 ≥ 10

n₂p̂₂ = 92 ≥ 10, n₂(1-p̂₂) = 88 ≥ 10

Hypotheses: H₀: p₁ = p₂, Hₐ: p₁ ≠ p₂ (two-tailed)

Pooled proportion:

p̄ = (85+92)/(150+180) = 177/330 = 0.536

Test statistic:

z = (0.5667-0.5111) / √[0.536×0.464×(1/150+1/180)]

z = 0.0556 / √[0.2487×0.01222] = 0.0556 / 0.0551 ≈ 1.01

Decision: For α = 0.05 (two-tailed), z* = ±1.96. Since |1.01| < 1.96, fail to reject H₀.

Conclusion: There is insufficient evidence to conclude the cure rates differ between the two drugs.
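The pooled two-proportion z-test can be sketched with Python's standard library, using `statistics.NormalDist` for the normal tail area. The function name `two_prop_z` is an assumption made for this illustration.

```python
import math
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    """Return (z, two-tailed p-value) for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# 85 cures out of 150 vs. 92 cures out of 180
z, p_value = two_prop_z(85, 150, 92, 180)
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
```

Working from the counts rather than rounded proportions avoids rounding error in the numerator and the standard error.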

12. Gender and Policy Support

A survey asks men and women about support for a new policy:

Sample 1: 240 of 400 support the policy
Sample 2: 300 of 450 support the policy

Is there a significant difference at α = 0.01?

Solution:

Hypotheses: H₀: p₁ = p₂, Hₐ: p₁ ≠ p₂

Pooled proportion:

p̄ = (240+300)/(400+450) = 540/850 = 0.635

Test statistic:

z = (0.60-0.667) / √[0.635×0.365×(1/400+1/450)]

z = -0.067 / 0.0331 ≈ -2.02

Critical value: α = 0.01 (two-tailed): z* = ±2.576

Decision: Since |-2.02| < 2.576, fail to reject H₀.

Conclusion: At the 0.01 level, there is insufficient evidence to conclude men and women differ in their support for the policy.

13. Online vs In-Person Pass Rates

Compare pass rates for online vs. in-person classes:

Sample 1: 78 of 120 students passed
Sample 2: 95 of 130 students passed

Construct a 95% confidence interval for the difference in pass rates.

Solution:

Sample proportions:

p̂₁ = 78/120 = 0.65, p̂₂ = 95/130 = 0.731

Formula (NO pooling for CI):

(p̂₁ - p̂₂) ± z* × √[(p̂₁(1-p̂₁)/n₁) + (p̂₂(1-p̂₂)/n₂)]

Standard error:

SE = √[(0.65×0.35/120) + (0.731×0.269/130)]

SE = √[0.001896 + 0.001512] = 0.0584

95% CI: z* = 1.96

(0.65 - 0.731) ± 1.96 × 0.0584

-0.081 ± 0.114 = (-0.195, 0.033)

Interpretation: We're 95% confident the difference in pass rates (p₁ - p₂) is between -19.5 and +3.3 percentage points. Since the interval contains 0, there's no significant difference.
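The unpooled interval can be sketched as follows. The function name `prop_diff_ci` is hypothetical, and z* is obtained from `statistics.NormalDist().inv_cdf` instead of a table.

```python
import math
from statistics import NormalDist

def prop_diff_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    # No pooling here: each sample contributes its own variance estimate
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# 78 passes out of 120 vs. 95 passes out of 130
low, high = prop_diff_ci(78, 120, 95, 130)
print(f"({low:.3f}, {high:.3f})")
```

The pooled proportion is reserved for hypothesis tests, where H₀ assumes p₁ = p₂; a confidence interval makes no such assumption.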

14. Quality Control Comparison

Two factories' defect rates are compared:

Factory 1: 18 defective items out of 250
Factory 2: 25 defective items out of 300

Test at α = 0.10 if Factory 2 has a higher defect rate (one-tailed).

Solution:

Hypotheses: H₀: p₂ = p₁, Hₐ: p₂ > p₁ (right-tailed)

Pooled proportion:

p̄ = (18+25)/(250+300) = 43/550 = 0.0782

Test statistic:

Sample proportions: p̂₁ = 18/250 = 0.0720, p̂₂ = 25/300 = 0.0833

z = (0.0833-0.0720) / √[0.0782×0.9218×(1/250+1/300)]

z = 0.0113 / 0.0230 ≈ 0.49

Critical value: α = 0.10 (right-tailed): z* = 1.28

Decision: Since 0.49 < 1.28, fail to reject H₀.

Conclusion: There is insufficient evidence that Factory 2 has a higher defect rate.

15. Checking Conditions

Can you conduct a two-proportion z-test for these scenarios?

  a) n₁ = 50 with 8 successes, n₂ = 60 with 45 successes
  b) n₁ = 150 with 120 successes, n₂ = 200 with 30 successes
  c) n₁ = 100 with 55 successes, n₂ = 90 with 40 successes

Solution:

(a) NO - n₁p̂₁ = 8 < 10. Fails success-failure condition.

(b) YES - n₁p̂₁ = 120 ≥ 10, n₁(1-p̂₁) = 30 ≥ 10, n₂p̂₂ = 30 ≥ 10, n₂(1-p̂₂) = 170 ≥ 10. All conditions met.

(c) YES - n₁p̂₁ = 55 ≥ 10, n₁(1-p̂₁) = 45 ≥ 10, n₂p̂₂ = 40 ≥ 10, n₂(1-p̂₂) = 50 ≥ 10. All conditions met.
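The success-failure check is mechanical enough to express as a small Python sketch. The helper name `meets_success_failure` and the threshold parameter `minimum` are assumptions made here; the cutoff of 10 follows the solutions above.

```python
def meets_success_failure(successes, n, minimum=10):
    """True if a sample has at least `minimum` successes and `minimum` failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum

# Each scenario lists (successes, n) for sample 1 and sample 2
scenarios = {
    "a": [(8, 50), (45, 60)],
    "b": [(120, 150), (30, 200)],
    "c": [(55, 100), (40, 90)],
}

# A two-proportion z-test is appropriate only if BOTH samples pass the check
results = {
    label: all(meets_success_failure(x, n) for x, n in samples)
    for label, samples in scenarios.items()
}
print(results)  # {'a': False, 'b': True, 'c': True}
```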

Part 4: Choosing the Right Test (Problems 16-20)

16. Test Selection Practice

For each scenario, identify which hypothesis test to use:

  a) A researcher compares average commute times in City A (n=50) vs. City B (n=60).
  b) A company claims 90% customer satisfaction. You survey 200 customers to test this.
  c) Nurses measure patients' pain levels before and after a treatment (same 30 patients).
  d) Compare proportion of voters supporting Candidate X in Texas vs. California.

Solution:

(a) Independent two-sample t-test - Two separate cities (independent), testing means (commute times)

(b) One-sample z-test for proportion - One sample, testing proportion against claimed 90%

(c) Paired t-test - Same 30 patients measured twice (before/after), testing means (pain levels)

(d) Two-sample z-test for proportions - Two states (independent), testing proportions (voter support)
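The reasoning in these answers follows a short decision tree: means or proportions, one sample or two, and (for two sets of means) paired or independent. The sketch below encodes that tree in a hypothetical helper, `choose_test`. It is a study aid under those assumptions, not an exhaustive rule, and it does not cover paired proportions.

```python
def choose_test(parameter, groups, paired=False):
    """Pick a test from the parameter type ('mean' or 'proportion'),
    the number of groups (1 or 2), and whether measurements are paired."""
    if parameter == "proportion":
        if groups == 1:
            return "one-sample z-test for proportion"
        return "two-sample z-test for proportions"
    if parameter == "mean":
        if groups == 1:
            return "one-sample t-test"
        # Two sets of measurements: same or matched subjects means paired
        return "paired t-test" if paired else "independent two-sample t-test"
    raise ValueError("parameter must be 'mean' or 'proportion'")

print(choose_test("mean", 2))                # (a) commute times in two cities
print(choose_test("proportion", 1))          # (b) claimed 90% satisfaction
print(choose_test("mean", 2, paired=True))   # (c) pain before/after, same patients
print(choose_test("proportion", 2))          # (d) voter support in two states
```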

17. Independent or Paired?

Determine if each scenario requires independent or paired test:

  a) Compare average test scores of 40 students using Method A vs. 35 different students using Method B
  b) Measure blood sugar levels in 25 diabetic patients before and after a diet change
  c) Compare average heights of 50 adult men vs. 50 adult women
  d) Test reading comprehension in 20 children at age 5 and again at age 7

Solution:

(a) Independent - Different students in each group

(b) Paired - Same 25 patients measured twice

(c) Independent - Different people (men vs. women)

(d) Paired - Same 20 children measured twice (at different ages)

18. Complete Analysis

A fitness instructor wants to test if a new workout program improves mile run times. She records times for 15 participants before and after the 8-week program. The differences (d = Before - After, in minutes) have:

n = 15, d̄ = 1.2, sd = 0.8

a) Which test should be used?
b) Test at α = 0.05 if the program improves times.
c) Construct a 90% confidence interval.

Solution:

(a) Paired t-test - Same 15 participants measured twice

(b) Hypothesis test:

H₀: μd = 0, Hₐ: μd > 0 (improvement means positive difference)

t = 1.2 / (0.8/√15) = 1.2 / 0.2066 = 5.81

df = 14, critical value ≈ 1.761. Since 5.81 > 1.761, reject H₀.

Conclusion: The program significantly improves run times (p < 0.001).

(c) 90% CI: t* ≈ 1.761 for df = 14

1.2 ± 1.761 × 0.2066 = 1.2 ± 0.364 = (0.836, 1.564) minutes

We're 90% confident the program improves times by 0.84 to 1.56 minutes on average.
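Parts (b) and (c) share the same standard error and, in this case, the same critical value: the right-tailed α = 0.05 cutoff and the two-sided 90% t* are both about 1.761 for df = 14. A minimal Python sketch, with the hypothetical helper name `paired_summary` and t* supplied from the table:

```python
import math

def paired_summary(d_bar, s_d, n, t_star):
    """Return (t, (low, high)) for paired data given summary statistics."""
    se = s_d / math.sqrt(n)      # standard error of the mean difference
    t = d_bar / se               # test statistic for H0: mu_d = 0
    margin = t_star * se
    return t, (d_bar - margin, d_bar + margin)

# d_bar = 1.2 minutes, s_d = 0.8 minutes, n = 15, t* = 1.761 (df = 14)
t, (low, high) = paired_summary(1.2, 0.8, 15, t_star=1.761)
print(f"t = {t:.2f}, 90% CI = ({low:.3f}, {high:.3f})")  # t = 5.81, 90% CI = (0.836, 1.564)
```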

19. Mixed Practice

Identify the test AND the hypotheses for each:

  a) Test if more than 70% of college students have part-time jobs (sample: 250 students)
  b) Compare average GPAs of athletes vs. non-athletes at a university
  c) Test if a meditation app reduces stress scores (measure same 40 people before/after)

Solution:

(a) One-sample z-test for proportion

H₀: p = 0.70, Hₐ: p > 0.70 (right-tailed)

(b) Independent two-sample t-test

H₀: μ₁ = μ₂, Hₐ: μ₁ ≠ μ₂ (two-tailed, unless direction specified)

(c) Paired t-test

H₀: μd = 0, Hₐ: μd > 0 (right-tailed, if d = Before - After, expecting reduction)

20. Critical Thinking Challenge

A researcher wants to test if a new teaching method improves test scores. She has two options:

Design A: Randomly assign 50 students to new method, 50 to traditional method. Compare final exam scores.

Design B: Give all 50 students a pre-test, teach using new method, then give post-test. Compare before/after scores.

a) Which test would be used for each design?
b) Which design is more powerful (better at detecting real effects)?
c) What are the tradeoffs?

Solution:

(a) Tests:

  • Design A: Independent two-sample t-test (two separate groups)
  • Design B: Paired t-test (same students, before/after)

(b) More Powerful: Design B (paired)

Paired designs control for individual variability. Each student serves as their own control, eliminating noise from differing baseline abilities.

(c) Tradeoffs:

Design A Advantages:

  • Can directly compare two methods simultaneously
  • No practice effects from taking test twice

Design A Disadvantages:

  • Needs more subjects for same power
  • Individual differences add noise

Design B Advantages:

  • More powerful (controls individual variability)
  • Needs fewer subjects

Design B Disadvantages:

  • No comparison group (can't isolate method effect from practice effect)
  • Students might improve just from test practice
  • Can't tell if new method is better than traditional

Best approach: Use Design A if you want to compare two methods. Use Design B only if you also have a control group taking the same pre/post tests with traditional method!
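The power claim in part (b) can be illustrated with a small simulation. Everything numeric below is an assumption chosen for illustration and does not come from the problem: baseline ability with standard deviation 10, test-to-test noise with standard deviation 3, a true effect of 2 points, no practice effect, and approximate one-tailed critical values of 1.677 (df = 49) and 1.661 (df ≈ 98).

```python
import math
import random
import statistics

def simulate_power(trials=1000, n=50, effect=2.0, ability_sd=10.0,
                   noise_sd=3.0, seed=1):
    """Estimate power of the paired and independent designs by simulation."""
    rng = random.Random(seed)
    t_crit_paired = 1.677  # approx. one-tailed alpha = 0.05, df = 49
    t_crit_indep = 1.661   # approx. one-tailed alpha = 0.05, df = 98
    paired_hits = indep_hits = 0
    for _ in range(trials):
        # Design B: the same n students take a pre-test and a post-test
        ability = [rng.gauss(70, ability_sd) for _ in range(n)]
        pre = [a + rng.gauss(0, noise_sd) for a in ability]
        post = [a + effect + rng.gauss(0, noise_sd) for a in ability]
        d = [y - x for x, y in zip(pre, post)]
        t_paired = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
        paired_hits += t_paired > t_crit_paired

        # Design A: two separate groups of n students each
        new = [rng.gauss(70 + effect, ability_sd) + rng.gauss(0, noise_sd)
               for _ in range(n)]
        old = [rng.gauss(70, ability_sd) + rng.gauss(0, noise_sd)
               for _ in range(n)]
        se = math.sqrt(statistics.variance(new) / n + statistics.variance(old) / n)
        t_indep = (statistics.mean(new) - statistics.mean(old)) / se
        indep_hits += t_indep > t_crit_indep
    return paired_hits / trials, indep_hits / trials

power_paired, power_indep = simulate_power()
print(f"paired: {power_paired:.2f}, independent: {power_indep:.2f}")
```

Under these assumptions the paired design rejects H₀ far more often, because subtracting each student's own baseline removes the ability variation. The simulation deliberately ignores practice effects, which is exactly the weakness of Design B noted in part (c).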
