Learn Without Walls

Practice Problems: Hypothesis Testing

Apply what you've learned with 20 comprehensive practice problems

Part 1: Conceptual Understanding (Problems 1-5)

Problem 1 Easy

Identifying Hypotheses

A nutritionist claims that the average American consumes more than 3000 calories per day. Write the null and alternative hypotheses for testing this claim. Is this a one-tailed or two-tailed test?

The null hypothesis always represents the status quo or equality. The alternative is what the researcher is trying to prove. Look for words like "more than" to determine one-tailed vs two-tailed.

Solution:

  • H₀: μ = 3000 (average consumption is 3000 calories)
  • Hₐ: μ > 3000 (average consumption is more than 3000 calories)

This is a one-tailed test (specifically, a right-tailed test) because we're only interested in whether consumption is greater than 3000, not different in either direction.

Problem 2 Easy

Type I and Type II Errors

A pharmaceutical company tests a new drug:
H₀: The drug has no side effects
Hₐ: The drug has side effects

Describe what a Type I error and a Type II error would be in this context. Which error would have more serious consequences?

Type I = rejecting true H₀ (false positive). Type II = failing to reject false H₀ (false negative). Think about patient safety.

Type I Error (α): Conclude the drug has side effects when it actually doesn't.

Consequence: A safe drug is rejected and doesn't reach patients who could benefit.

Type II Error (β): Conclude the drug has no side effects when it actually does.

Consequence: A dangerous drug is approved and patients are harmed.

Which is worse? Type II error is more serious in this case because patient safety is at risk. It's better to be overly cautious (reject some safe drugs) than to approve dangerous drugs.

Problem 3 Medium

Power and Sample Size

A researcher designs a study with power = 0.65. Is this acceptable? If they want to increase power to 0.80, what are three ways they could achieve this?

Standard recommendation is power ≥ 0.80. Think about the four factors that affect power: sample size, α, effect size, and variability.

Is power = 0.65 acceptable? No. This means only a 65% chance of detecting a real effect, and a 35% chance of missing it (Type II error). The standard recommendation is power ≥ 0.80 (80%).

Three ways to increase power to 0.80:

  1. Increase sample size (n) — Most practical and common approach
  2. Increase significance level (α) — Use α = 0.10 instead of 0.05 (though this increases Type I error risk)
  3. Reduce variability — Use more precise measurement instruments or more homogeneous sample

Best choice: Increasing sample size is usually the best option because it raises power without raising the Type I error rate. Its main cost is the time and expense of collecting more data.
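The effect of each lever on power can be sketched numerically. The example below uses the standard power formula for a right-tailed z-test, power = 1 - Φ(z_crit - δ√n/σ). The effect size δ = 2, σ = 10, and the sample sizes are hypothetical values chosen only for illustration; they do not come from the problem.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed_z(delta, sigma, n, alpha=0.05):
    """Power of a right-tailed z-test when the true mean exceeds mu0 by delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection cutoff under H0
    shift = delta * sqrt(n) / sigma          # true effect in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)

# Hypothetical baseline design (assumed values, not from the problem).
baseline = power_right_tailed_z(delta=2, sigma=10, n=50)

# Change one factor at a time and compare with the baseline.
larger_n = power_right_tailed_z(delta=2, sigma=10, n=200)
larger_alpha = power_right_tailed_z(delta=2, sigma=10, n=50, alpha=0.10)
smaller_sigma = power_right_tailed_z(delta=2, sigma=8, n=50)

for label, value in [("baseline", baseline), ("larger n", larger_n),
                     ("larger alpha", larger_alpha), ("smaller sigma", smaller_sigma)]:
    print(f"{label:>14}: power = {value:.3f}")
```

Each of the three changes should print a power above the baseline, which matches the three options listed in the solution.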

Problem 4 Easy

Z-Test vs T-Test

You want to test if the mean height of college students is 68 inches. You have a sample of 50 students with x̄ = 67.2 inches and s = 4 inches. The population standard deviation is unknown. Should you use a z-test or t-test? Why?

The key question is: Do we know σ (population standard deviation) or only s (sample standard deviation)?

Use a t-test.

Reason: The population standard deviation (σ) is unknown. We only have the sample standard deviation (s = 4 inches). When σ is unknown, we must use the t-test with degrees of freedom df = n - 1 = 50 - 1 = 49.

General rule: In real-world scenarios, we almost always use t-tests for means because we rarely know the true population standard deviation.

Problem 5 Medium

P-Value Interpretation

A hypothesis test produces a p-value of 0.038. If α = 0.05, what is your decision? If α = 0.01, what is your decision? Interpret what the p-value means in plain English.

Rule: Reject H₀ if p-value ≤ α. The p-value is the probability of getting results at least this extreme if H₀ is true.

Decision at α = 0.05:

p-value (0.038) < α (0.05) → Reject H₀

There is sufficient evidence to reject the null hypothesis at the 0.05 significance level.

Decision at α = 0.01:

p-value (0.038) > α (0.01) → Fail to reject H₀

There is not sufficient evidence to reject the null hypothesis at the 0.01 significance level.

Plain English interpretation:

If the null hypothesis were true, there would be only a 3.8% chance of obtaining results as extreme as (or more extreme than) what we observed. This is fairly unlikely, which suggests the null hypothesis may not be true.

Part 2: Z-Tests for Means (Problems 6-9)

Problem 6 Medium

Z-Test: Two-Tailed

A battery manufacturer claims their batteries last 500 hours on average. A consumer group tests 40 batteries and finds x̄ = 485 hours. The population standard deviation is known to be σ = 50 hours. Test the claim at α = 0.05.

This is a two-tailed test (testing if mean is "different from" 500, not specifically higher or lower). Use z = (x̄ - μ₀) / (σ/√n). Critical values for α = 0.05, two-tailed: ±1.96.

Step 1: Hypotheses

  • H₀: μ = 500
  • Hₐ: μ ≠ 500 (two-tailed)
  • α = 0.05

Step 2: Test statistic

z = (485 - 500) / (50/√40) = -15 / 7.906 = -1.90

Step 3: Critical value

Two-tailed, α = 0.05: ±1.96

Step 4: Decision

|z| = 1.90 < 1.96 → Fail to reject H₀

Conclusion: There is not sufficient evidence to reject the manufacturer's claim that batteries last 500 hours on average. The observed difference could be due to sampling variability.
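The four steps can be checked with a short sketch that uses only Python's standard library. The function name z_statistic is a hypothetical helper introduced here, and the critical value is computed from the standard normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(xbar, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n)) for a mean with known sigma."""
    return (xbar - mu0) / (sigma / sqrt(n))

alpha = 0.05
z = z_statistic(xbar=485, mu0=500, sigma=50, n=40)

# Two-tailed test: split alpha across both tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_crit

print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Changing alpha or the sample values lets you see how close the statistic sits to the cutoff.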

Problem 7 Medium

Z-Test: Right-Tailed with P-Value

A fitness program claims to increase bench press strength by at least 30 pounds. Test 35 participants: x̄ = 33 pounds, σ = 8 pounds. Use α = 0.05. Calculate the p-value.

"At least 30" means testing if μ > 30 (right-tailed). For p-value in right-tailed test: P(Z > z).

Step 1: Hypotheses

  • H₀: μ = 30
  • Hₐ: μ > 30 (right-tailed)
  • α = 0.05

Step 2: Test statistic

z = (33 - 30) / (8/√35) = 3 / 1.352 = 2.22

Step 3: P-value

P(Z > 2.22) = 1 - 0.9868 = 0.0132

Step 4: Decision

p-value (0.0132) < α (0.05) → Reject H₀

Conclusion: There is strong evidence (p = 0.0132) that the program increases strength by more than 30 pounds.
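A right-tailed p-value can also be computed directly, without a z-table. The sketch below does so with the standard library and also shows the value a table lookup gives when z is first rounded to two decimals, as in the solution. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 33, 30, 8, 35, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))

# Right-tailed p-value: area to the right of z under the standard normal curve.
p_exact = 1 - NormalDist().cdf(z)

# Table-style lookup: round z to two decimals first, as a printed z-table requires.
p_table = 1 - NormalDist().cdf(round(z, 2))

print(f"z = {z:.4f}")
print(f"p-value (unrounded z) = {p_exact:.4f}")
print(f"p-value (z rounded to 2 decimals) = {p_table:.4f}")
print("Reject H0" if p_exact < alpha else "Fail to reject H0")
```

The two p-values differ only slightly, so the decision is the same either way.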

Problem 8 Hard

Z-Test: Left-Tailed

A diet company claims their program results in weight loss of 15 pounds. A skeptic believes the actual weight loss is less. Test 60 participants: x̄ = 13.5 pounds, σ = 5 pounds. Use α = 0.01.

"Less than 15" means left-tailed test (Hₐ: μ < 15). For α = 0.01, left-tailed critical value = -2.326.

Step 1: Hypotheses

  • H₀: μ = 15
  • Hₐ: μ < 15 (left-tailed)
  • α = 0.01

Step 2: Test statistic

z = (13.5 - 15) / (5/√60) = -1.5 / 0.6455 = -2.32

Step 3: Critical value

Left-tailed, α = 0.01: -2.326

Step 4: Decision

z = -2.32 > -2.326 → Fail to reject H₀ (barely!)

Conclusion: There is not sufficient evidence at the 0.01 level that the average weight loss is less than the claimed 15 pounds. The z-value is very close to the critical value, making this a borderline decision.

Problem 9 Hard

Critical Thinking: Borderline Result

In Problem 8, the test statistic was z = -2.32 and the critical value was -2.326, so the null hypothesis was narrowly not rejected. What if the sample mean had been 13.4 instead of 13.5? Recalculate and discuss how small changes in data can affect conclusions.

Recalculate z with x̄ = 13.4. Compare to the same critical value -2.326. This illustrates why borderline results should be interpreted cautiously.

Recalculation with x̄ = 13.4:

z = (13.4 - 15) / (5/√60) = -1.6 / 0.6455 = -2.48

Decision:

z = -2.48 < -2.326 → Reject H₀

Discussion:

A change of just 0.1 pounds in the sample mean (from 13.5 to 13.4) completely reverses our conclusion! This illustrates important points:

  • When p-values are close to α, results are borderline and sensitive to small changes
  • Statistical significance doesn't mean practical significance
  • We should be cautious about making strong claims from borderline results
  • Replication studies are important to confirm findings
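How sensitive the decision is to the sample mean can be explored with a small sketch. It recomputes z for a few nearby sample means and finds the cutoff mean at which the left-tailed decision flips, using the values from Problem 8 (μ₀ = 15, σ = 5, n = 60, α = 0.01). The helper name decide is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 15.0, 5.0, 60, 0.01

se = SIGMA / sqrt(N)                   # standard error of the mean
z_crit = NormalDist().inv_cdf(ALPHA)   # left-tailed critical value, about -2.326
cutoff_mean = MU0 + z_crit * se        # sample means below this lead to rejection

def decide(xbar):
    """Return (z, decision) for a left-tailed z-test at level ALPHA."""
    z = (xbar - MU0) / se
    return z, ("reject H0" if z < z_crit else "fail to reject H0")

for xbar in (13.4, 13.5, 13.6):
    z, decision = decide(xbar)
    print(f"xbar = {xbar}: z = {z:.3f} -> {decision}")

print(f"decision flips at xbar = {cutoff_mean:.3f}")
```

Printing z to three decimals matters here: rounding to two decimals before comparing with -2.326 can hide which side of the cutoff the statistic falls on.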

Part 3: T-Tests for Means (Problems 10-13)

Problem 10 Medium

T-Test: Two-Tailed

A college claims the average student debt at graduation is $25,000. A survey of 25 graduates shows x̄ = $27,400 and s = $6,000. Test at α = 0.05.

Use t-test since σ is unknown. df = n - 1 = 24. For two-tailed, α = 0.05, df = 24: critical values = ±2.064.

Step 1: Hypotheses

  • H₀: μ = 25000
  • Hₐ: μ ≠ 25000 (two-tailed)
  • α = 0.05

Step 2: Test statistic

t = (27400 - 25000) / (6000/√25) = 2400 / 1200 = 2.0

df = 25 - 1 = 24

Step 3: Critical value

Two-tailed, α = 0.05, df = 24: ±2.064

Step 4: Decision

|t| = 2.0 < 2.064 → Fail to reject H₀

Conclusion: There is not sufficient evidence to conclude the average student debt differs from $25,000. Though the sample mean is higher, the difference is not statistically significant.

Problem 11 Medium

T-Test: Right-Tailed

A teacher believes a new teaching method increases test scores above the previous average of 75. Test 18 students: x̄ = 79, s = 8. Use α = 0.05.

Right-tailed test. df = 17. Critical value for α = 0.05, df = 17, right-tailed: 1.740.

Step 1: Hypotheses

  • H₀: μ = 75
  • Hₐ: μ > 75 (right-tailed)
  • α = 0.05

Step 2: Test statistic

t = (79 - 75) / (8/√18) = 4 / 1.886 = 2.12

df = 18 - 1 = 17

Step 3: Critical value

Right-tailed, α = 0.05, df = 17: 1.740

Step 4: Decision

t = 2.12 > 1.740 → Reject H₀

Conclusion: There is sufficient evidence that the new teaching method increases test scores above 75. The improvement is statistically significant.

Problem 12 Hard

T-Test: Left-Tailed with P-Value

A city claims the average commute time is 30 minutes. A transportation study samples 22 commuters: x̄ = 27.5 minutes, s = 6 minutes. Test if commute time is less than claimed (α = 0.01). Find the p-value range.

Left-tailed test, df = 21. Calculate t, then use t-table to find p-value range. Critical value for α = 0.01: -2.518.

Step 1: Hypotheses

  • H₀: μ = 30
  • Hₐ: μ < 30 (left-tailed)
  • α = 0.01

Step 2: Test statistic

t = (27.5 - 30) / (6/√22) = -2.5 / 1.279 = -1.95

df = 22 - 1 = 21

Step 3: Critical value and p-value

Critical value (α = 0.01, df = 21): -2.518

From t-table (df = 21): For t = -1.95, p-value is between 0.025 and 0.05 (one-tailed)

Step 4: Decision

t = -1.95 > -2.518 → Fail to reject H₀

p-value > α (0.01) → Fail to reject H₀

Conclusion: There is not sufficient evidence at the 0.01 level that the average commute time is less than 30 minutes.
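Bracketing a p-value from a t-table is a simple lookup, sketched below. The row holds the usual one-tailed critical values printed in a t-table for df = 21; the function name p_value_range is a hypothetical helper, and the result is a range, not an exact p-value.

```python
from math import sqrt

# One-tailed areas and critical values from a standard t-table row, df = 21.
T_TABLE_DF21 = [(0.10, 1.323), (0.05, 1.721), (0.025, 2.080),
                (0.01, 2.518), (0.005, 2.831)]

def p_value_range(t_abs, row):
    """Bracket the one-tailed p-value using a single t-table row."""
    lower, upper = 0.0, 0.5
    for area, critical in row:
        if t_abs >= critical:
            upper = area   # statistic is beyond this cutoff, so p <= area
        else:
            lower = area   # statistic falls short of this cutoff, so p > area
            break
    return lower, upper

t = (27.5 - 30) / (6 / sqrt(22))
low, high = p_value_range(abs(t), T_TABLE_DF21)
print(f"t = {t:.2f}, p-value is between {low} and {high} (one-tailed)")
```

Because the lower bound of the range is already above α = 0.01, the table alone is enough to reach the decision.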

Problem 13 Hard

Sample Size and Significance

Two studies test the same hypothesis about mean salary (H₀: μ = $50,000):
Study A: n = 20, x̄ = $52,000, s = $8,000
Study B: n = 100, x̄ = $52,000, s = $8,000
Calculate the test statistic for each. Which study is more likely to reject H₀? Why?

Both have the same x̄ and s, but different n. Larger sample size gives smaller standard error, thus larger |t|. This demonstrates the importance of sample size.

Study A (n = 20):

t = (52000 - 50000) / (8000/√20) = 2000 / 1789 = 1.12

df = 19, critical value (α = 0.05, two-tailed) ≈ 2.093

Decision: Fail to reject H₀ (1.12 < 2.093)

Study B (n = 100):

t = (52000 - 50000) / (8000/√100) = 2000 / 800 = 2.5

df = 99, critical value (α = 0.05, two-tailed) ≈ 1.984

Decision: Reject H₀ (2.5 > 1.984)

Analysis: Study B is more likely to reject H₀ because:

  • Larger sample size reduces standard error
  • Same difference (x̄ - μ₀) produces larger test statistic with bigger sample
  • This shows why sample size matters: with enough data, even small differences can be statistically significant
  • This also illustrates the difference between statistical significance (Study B) and practical significance ($2,000 difference may not be practically meaningful)
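The comparison between the two studies can be reproduced with a short sketch. The critical values are the table values quoted in the solution, entered by hand; t_statistic is a hypothetical helper name.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """Return (t, df) for a one-sample t-test."""
    standard_error = s / sqrt(n)
    return (xbar - mu0) / standard_error, n - 1

# Sample sizes and two-tailed critical values (alpha = 0.05) from the solution.
studies = {"A": {"n": 20, "t_crit": 2.093},
           "B": {"n": 100, "t_crit": 1.984}}

results = {}
for name, info in studies.items():
    t, df = t_statistic(xbar=52000, mu0=50000, s=8000, n=info["n"])
    results[name] = {"t": t, "df": df, "reject": abs(t) > info["t_crit"]}
    print(f"Study {name}: t = {t:.2f}, df = {df}, reject H0: {results[name]['reject']}")
```

Only n differs between the two calls, so any change in t comes entirely from the smaller standard error.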

Part 4: Tests for Proportions (Problems 14-20)

Problem 14 Easy

Checking Conditions for Proportion Test

A researcher wants to test if p = 0.15. They have a sample of n = 80. Check if the normal approximation conditions are met.

Check: np₀ ≥ 10 AND n(1-p₀) ≥ 10

Check conditions:

np₀ = 80(0.15) = 12 ≥ 10

n(1-p₀) = 80(0.85) = 68 ≥ 10

Conclusion: Both conditions are satisfied. We can use the normal approximation (z-test) for this proportion test.

Problem 15 Medium

Proportion Test: Two-Tailed

A university claims 80% of graduates are employed within 6 months. Survey 200 graduates: 150 are employed. Test at α = 0.05.

p̂ = 150/200 = 0.75. Test if p ≠ 0.80 (two-tailed). z = (p̂ - p₀) / √(p₀(1-p₀)/n)

Step 1: Hypotheses

  • H₀: p = 0.80
  • Hₐ: p ≠ 0.80 (two-tailed)
  • α = 0.05

Step 2: Check conditions

np₀ = 200(0.80) = 160 ≥ 10

n(1-p₀) = 200(0.20) = 40 ≥ 10

Step 3: Test statistic

p̂ = 150/200 = 0.75

z = (0.75 - 0.80) / √(0.80(0.20)/200) = -0.05 / 0.0283 = -1.77

Step 4: Critical value

Two-tailed, α = 0.05: ±1.96

Step 5: Decision

|z| = 1.77 < 1.96 → Fail to reject H₀

Conclusion: There is not sufficient evidence that the employment rate differs from 80%.

Problem 16 Medium

Proportion Test: Right-Tailed

A politician claims that more than 55% of voters support a bill. Poll 300 voters: 180 support it. Test at α = 0.01.

Right-tailed test (Hₐ: p > 0.55). p̂ = 180/300 = 0.60. Critical value for α = 0.01: 2.326.

Step 1: Hypotheses

  • H₀: p = 0.55
  • Hₐ: p > 0.55 (right-tailed)
  • α = 0.01

Step 2: Check conditions

np₀ = 300(0.55) = 165 ≥ 10

n(1-p₀) = 300(0.45) = 135 ≥ 10

Step 3: Test statistic

p̂ = 180/300 = 0.60

z = (0.60 - 0.55) / √(0.55(0.45)/300) = 0.05 / 0.0287 = 1.74

Step 4: Critical value

Right-tailed, α = 0.01: 2.326

Step 5: Decision

z = 1.74 < 2.326 → Fail to reject H₀

Conclusion: There is not sufficient evidence at the 0.01 level that more than 55% support the bill.

Problem 17 Hard

Proportion Test: Left-Tailed with P-Value

A company claims their defect rate is 5%. Quality control suspects it's lower. Inspect 400 items: 15 defects. Test at α = 0.05 and find the p-value.

Left-tailed test (Hₐ: p < 0.05). p̂ = 15/400 = 0.0375. For left-tailed p-value: P(Z < z).

Step 1: Hypotheses

  • H₀: p = 0.05
  • Hₐ: p < 0.05 (left-tailed)
  • α = 0.05

Step 2: Check conditions

np₀ = 400(0.05) = 20 ≥ 10

n(1-p₀) = 400(0.95) = 380 ≥ 10

Step 3: Test statistic

p̂ = 15/400 = 0.0375

z = (0.0375 - 0.05) / √(0.05(0.95)/400) = -0.0125 / 0.0109 = -1.15

Step 4: P-value

P(Z < -1.15) = 0.1251

Step 5: Decision

p-value (0.1251) > α (0.05) → Fail to reject H₀

Conclusion: There is not sufficient evidence that the defect rate is lower than 5%.
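The proportion tests in this part all follow the same pattern, which can be sketched as one function with a tail option. The name proportion_z_test and its tail argument are hypothetical choices for this example. Because the code keeps z unrounded, its p-value can differ in the third decimal place from a table lookup at z = -1.15.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0, tail):
    """One-proportion z-test; tail is 'left', 'right', or 'two'."""
    p_hat = successes / n
    # The standard error uses p0 (the null value), not p_hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf
    if tail == "left":
        p_value = cdf(z)
    elif tail == "right":
        p_value = 1 - cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value

# Problem 17: 15 defects in 400 items, H0: p = 0.05, left-tailed.
z, p_value = proportion_z_test(successes=15, n=400, p0=0.05, tail="left")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

The same function covers the two-tailed and right-tailed problems in this part by changing the tail argument.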

Problem 18 Hard

When Conditions Fail

A researcher wants to test if p = 0.02 using a sample of n = 60. Can they use the normal approximation? If not, what should they do?

Check np₀ ≥ 10 AND n(1-p₀) ≥ 10. If either fails, normal approximation is not valid.

Check conditions:

np₀ = 60(0.02) = 1.2 < 10 (FAILS!)

n(1-p₀) = 60(0.98) = 58.8 ≥ 10

Conclusion: NO, they cannot use the normal approximation because np₀ < 10.

What to do instead:

  • Option 1: Increase sample size. Need n ≥ 10/0.02 = 500 to meet the condition
  • Option 2: Use exact binomial test (not covered in this course, but the proper method when normal approximation fails)
  • Option 3: Use a different statistical approach designed for rare events

Key lesson: Always check conditions before conducting a test. If conditions aren't met, the test results are not valid!
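The condition check and the minimum sample size can be sketched in a few lines, along with the exact binomial tail mentioned in Option 2. The problem gives no observed count, so k = 3 below is a purely hypothetical value used to show the calculation; the function names are also illustrative.

```python
from math import ceil, comb

def normal_approx_ok(n, p0, threshold=10):
    """Check n*p0 >= threshold and n*(1 - p0) >= threshold."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold

def min_n_for_normal_approx(p0, threshold=10):
    """Smallest n that satisfies both conditions."""
    return ceil(threshold / min(p0, 1 - p0))

def binomial_upper_tail(k, n, p0):
    """Exact P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

n, p0 = 60, 0.02
print("Normal approximation OK:", normal_approx_ok(n, p0))
print("Minimum n needed:", min_n_for_normal_approx(p0))

# Hypothetical observed count (not given in the problem): 3 successes out of 60.
k = 3
exact_p = binomial_upper_tail(k, n, p0)
print(f"Exact right-tailed p-value for k = {k}: {exact_p:.4f}")
```

The exact tail sums binomial probabilities directly, so it needs no large-sample condition.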

Problem 19 Hard

Real-World Application: Quality Control

A factory produces computer chips with a claimed defect rate of 2%. A quality inspector randomly samples 500 chips and finds 15 defective. The manager wants to know if the defect rate has increased above 2%. Test at α = 0.05 and provide a recommendation.

Right-tailed test (Hₐ: p > 0.02). p̂ = 15/500 = 0.03. This is a practical decision with business implications.

Step 1: Hypotheses

  • H₀: p = 0.02 (defect rate is 2%)
  • Hₐ: p > 0.02 (defect rate has increased) — right-tailed
  • α = 0.05

Step 2: Check conditions

np₀ = 500(0.02) = 10 ≥ 10 (just barely)

n(1-p₀) = 500(0.98) = 490 ≥ 10

Step 3: Test statistic

p̂ = 15/500 = 0.03

z = (0.03 - 0.02) / √(0.02(0.98)/500) = 0.01 / 0.00626 = 1.60

Step 4: Critical value and p-value

Critical value (α = 0.05, right-tailed): 1.645

z = 1.60 < 1.645 → Fail to reject H₀

P-value: P(Z > 1.60) = 0.0548

Step 5: Statistical conclusion

There is not quite enough evidence at the 0.05 level to conclude the defect rate has increased above 2%.

Practical recommendation:

While the result is not statistically significant (p = 0.055 is very close to 0.05), the observed 3% defect rate is 50% higher than the claimed 2%. From a quality control perspective:

  • Recommendation: Investigate the production process
  • The result is borderline—very close to significance
  • Cost of investigating < cost of shipping defective chips
  • Better safe than sorry in quality control
  • Monitor next batches closely

This illustrates that statistical decisions should be combined with practical considerations!

Problem 20 Hard

Comprehensive Analysis: Medical Trial

A pharmaceutical company tests a new drug. Historical cure rate is 60%. Trial of 150 patients: 105 cured.
(a) Test if cure rate differs from 60% (α = 0.01)
(b) What would be a Type I error in this context?
(c) What would be a Type II error?
(d) Which error would be worse for patients?
(e) If you were designing this study, would you use α = 0.01 or α = 0.05? Why?

Start with the hypothesis test, then think critically about the consequences of each type of error in a medical context.

(a) Hypothesis Test:

H₀: p = 0.60, Hₐ: p ≠ 0.60 (two-tailed), α = 0.01

Check conditions: np₀ = 90 ≥ 10 , n(1-p₀) = 60 ≥ 10

p̂ = 105/150 = 0.70

z = (0.70 - 0.60) / √(0.60(0.40)/150) = 0.10 / 0.04 = 2.5

Critical values: ±2.576

|z| = 2.5 < 2.576 → Fail to reject H₀

Statistical conclusion: Not sufficient evidence at α = 0.01 that cure rate differs from 60%.

(b) Type I Error:

Conclude the new drug has a different cure rate when it's actually the same as the historical 60%.

Consequence: Company invests in a drug that's no better than existing treatment.

(c) Type II Error:

Conclude the drug's cure rate is not different from 60% when it actually is different (better or worse).

Consequence: If better, an improved treatment is rejected. If worse, a harmful drug continues in development.

(d) Which error is worse?

Type II error is worse if the drug is actually worse (lower cure rate) because patients could be harmed. However, Type II error is also bad if the drug is actually better—patients miss out on an improved treatment.

In medical contexts, both errors have serious consequences, but patient safety (avoiding harmful treatments) is typically prioritized.

(e) Choice of α:

Recommended: α = 0.05

Reasoning:

  • α = 0.01 is very conservative (harder to detect real differences)
  • With α = 0.05, our z = 2.5 > 1.96 would reject H₀, suggesting the drug is better
  • In this case, the drug appears to improve cure rate from 60% to 70%—that's clinically meaningful
  • Using α = 0.05 balances Type I and Type II error risks
  • Further testing (Phase III trials) would confirm findings before approval

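Note: With these data, α = 0.05 would detect the improvement while α = 0.01 would not. The choice of α matters, so it should be fixed when the study is designed, not picked after seeing which level gives a significant result.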
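The two significance levels can be compared side by side with a short sketch. It computes the test statistic and two-tailed p-value once from the trial data in the problem, then checks the decision at each α. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

cured, n, p0 = 105, 150, 0.60

p_hat = cured / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed

decisions = {}
for alpha in (0.01, 0.05):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    decisions[alpha] = abs(z) > z_crit
    outcome = "reject H0" if decisions[alpha] else "fail to reject H0"
    print(f"alpha = {alpha}: critical values = ±{z_crit:.3f} -> {outcome}")

print(f"z = {z:.2f}, two-tailed p-value = {p_value:.4f}")
```

The p-value falls between the two significance levels, which is why the decisions differ.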
