Practice Problems: Hypothesis Testing
Apply what you've learned with 20 comprehensive practice problems
How to Use These Problems
- Try each problem yourself first before looking at hints or solutions
- Use the hint button if you're stuck
- Check the full solution to see detailed step-by-step work
- Problems increase in difficulty: Easy → Medium → Hard
- Cover all topics: hypothesis formulation, errors & power, tests for means, and tests for proportions
Part 1: Conceptual Understanding (Problems 1-5)
Identifying Hypotheses
A nutritionist claims that the average American consumes more than 3000 calories per day. Write the null and alternative hypotheses for testing this claim. Is this a one-tailed or two-tailed test?
Solution:
- H₀: μ = 3000 (average consumption is 3000 calories)
- Hₐ: μ > 3000 (average consumption is more than 3000 calories)
This is a one-tailed test (specifically, a right-tailed test) because we're only interested in whether consumption is greater than 3000, not different in either direction.
Type I and Type II Errors
A pharmaceutical company tests a new drug:
H₀: The drug has no side effects
Hₐ: The drug has side effects
Describe what a Type I error and a Type II error would be in this context. Which error would have more serious consequences?
Type I Error (α): Conclude the drug has side effects when it actually doesn't.
Consequence: A safe drug is rejected and doesn't reach patients who could benefit.
Type II Error (β): Conclude the drug has no side effects when it actually does.
Consequence: A dangerous drug is approved and patients are harmed.
Which is worse? Type II error is more serious in this case because patient safety is at risk. It's better to be overly cautious (reject some safe drugs) than to approve dangerous drugs.
Power and Sample Size
A researcher designs a study with power = 0.65. Is this acceptable? If they want to increase power to 0.80, what are three ways they could achieve this?
Is power = 0.65 acceptable? No. This means only a 65% chance of detecting a real effect, and a 35% chance of missing it (Type II error). The standard recommendation is power ≥ 0.80 (80%).
Three ways to increase power to 0.80:
- Increase sample size (n) — Most practical and common approach
- Increase significance level (α) — Use α = 0.10 instead of 0.05 (though this increases Type I error risk)
- Reduce variability — Use more precise measurement instruments or more homogeneous sample
Best choice: Increasing sample size is usually the best option because it raises power without increasing the Type I error rate, although it does cost more time and money to collect the extra data.
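To see how sample size and α affect power, here is a minimal sketch for a right-tailed z-test. The effect size (a true mean 2 units above the null value), σ = 8, and the sample sizes are illustrative assumptions, not values from the problem, and the function name is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a right-tailed one-sample z-test.

    effect: assumed true mean minus the null mean (hypothetical value)
    sigma:  known population standard deviation
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)      # rejection cutoff under H0
    shift = effect / (sigma / sqrt(n))          # true effect in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)   # P(reject H0 | effect is real)

# Illustrative numbers only: effect = 2, sigma = 8
for n in (30, 60, 100):
    print(n, round(z_test_power(2, 8, n), 3))
```

Under these assumed numbers, power rises steadily as n grows, and passing alpha=0.10 instead of 0.05 also raises it at the cost of more Type I errors.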
Z-Test vs T-Test
You want to test if the mean height of college students is 68 inches. You have a sample of 50 students with x̄ = 67.2 inches and s = 4 inches. The population standard deviation is unknown. Should you use a z-test or t-test? Why?
Use a t-test.
Reason: The population standard deviation (σ) is unknown. We only have the sample standard deviation (s = 4 inches). When σ is unknown, we must use the t-test with degrees of freedom df = n - 1 = 50 - 1 = 49.
General rule: In real-world scenarios, we almost always use t-tests for means because we rarely know the true population standard deviation.
P-Value Interpretation
A hypothesis test produces a p-value of 0.038. If α = 0.05, what is your decision? If α = 0.01, what is your decision? Interpret what the p-value means in plain English.
Decision at α = 0.05:
p-value (0.038) < α (0.05) → Reject H₀
There is sufficient evidence to reject the null hypothesis at the 0.05 significance level.
Decision at α = 0.01:
p-value (0.038) > α (0.01) → Fail to reject H₀
There is not sufficient evidence to reject the null hypothesis at the 0.01 significance level.
Plain English interpretation:
If the null hypothesis were true, there would be only a 3.8% chance of obtaining results as extreme as (or more extreme than) what we observed. This is fairly unlikely, which suggests the null hypothesis may not be true.
Part 2: Z-Tests for Means (Problems 6-9)
Z-Test: Two-Tailed
A battery manufacturer claims their batteries last 500 hours on average. A consumer group tests 40 batteries and finds x̄ = 485 hours. The population standard deviation is known to be σ = 50 hours. Test the claim at α = 0.05.
Step 1: Hypotheses
- H₀: μ = 500
- Hₐ: μ ≠ 500 (two-tailed)
- α = 0.05
Step 2: Test statistic
z = (485 - 500) / (50/√40) = -15 / 7.906 = -1.90
Step 3: Critical value
Two-tailed, α = 0.05: ±1.96
Step 4: Decision
|z| = 1.90 < 1.96 → Fail to reject H₀
Conclusion: There is not sufficient evidence to reject the manufacturer's claim that batteries last 500 hours on average. The observed difference could be due to sampling variability.
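The critical-value approach above can be checked with a short sketch using only Python's standard library. The function name is an illustrative choice; the numbers are the ones given in the problem.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, null_mean, sigma, n):
    """One-sample z statistic when the population sigma is known."""
    return (sample_mean - null_mean) / (sigma / sqrt(n))

alpha = 0.05
z = z_statistic(485, 500, 50, 40)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff, about 1.96
reject = abs(z) > z_crit

print(f"z = {z:.3f}, critical value = +/-{z_crit:.3f}, reject H0: {reject}")
```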
Z-Test: Right-Tailed with P-Value
A fitness program claims to increase bench press strength by more than 30 pounds. A test of 35 participants gives x̄ = 33 pounds, with σ = 8 pounds. Use α = 0.05 and calculate the p-value.
Step 1: Hypotheses
- H₀: μ = 30
- Hₐ: μ > 30 (right-tailed)
- α = 0.05
Step 2: Test statistic
z = (33 - 30) / (8/√35) = 3 / 1.352 = 2.22
Step 3: P-value
P(Z > 2.22) = 1 - 0.9868 = 0.0132
Step 4: Decision
p-value (0.0132) < α (0.05) → Reject H₀
Conclusion: There is strong evidence (p = 0.0132) that the program increases strength by more than 30 pounds.
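As a rough check on the table lookup, the right-tail area can be computed directly. This sketch assumes Python 3.8+ for statistics.NormalDist; because it does not round z to two decimals first, the p-value may differ from the table value in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_p_value(sample_mean, null_mean, sigma, n):
    """Return (z, p) where p = P(Z >= z) under H0."""
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    p = 1 - NormalDist().cdf(z)
    return z, p

z, p = right_tailed_p_value(33, 30, 8, 35)
print(f"z = {z:.4f}, p-value = {p:.4f}")
```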
Z-Test: Left-Tailed
A diet company claims their program results in weight loss of 15 pounds. A skeptic believes the actual weight loss is less. Test 60 participants: x̄ = 13.5 pounds, σ = 5 pounds. Use α = 0.01.
Step 1: Hypotheses
- H₀: μ = 15
- Hₐ: μ < 15 (left-tailed)
- α = 0.01
Step 2: Test statistic
z = (13.5 - 15) / (5/√60) = -1.5 / 0.6455 = -2.32
Step 3: Critical value
Left-tailed, α = 0.01: -2.326
Step 4: Decision
z = -2.32 > -2.326 → Fail to reject H₀ (barely!)
Conclusion: There is not sufficient evidence at the 0.01 level that the average weight loss is less than the claimed 15 pounds. The z-value (-2.324 before rounding) falls just short of the critical value, making this a borderline decision.
Critical Thinking: Borderline Result
In Problem 8, the test statistic was z = -2.32 and the critical value was -2.326, so the null hypothesis was narrowly not rejected. What if the sample mean had been 13.4 or 13.6 instead of 13.5? Recalculate and discuss how small changes in data can affect conclusions.
Recalculation with x̄ = 13.4:
z = (13.4 - 15) / (5/√60) = -1.6 / 0.6455 = -2.48
Decision:
z = -2.48 < -2.326 → Reject H₀
Recalculation with x̄ = 13.6:
z = (13.6 - 15) / (5/√60) = -1.4 / 0.6455 = -2.17
Decision:
z = -2.17 > -2.326 → Fail to reject H₀
Discussion:
A change of just 0.1 pounds in the sample mean (from 13.5 to 13.4) completely reverses our conclusion, while the same change in the other direction (to 13.6) leaves it unchanged. This illustrates important points:
- When p-values are close to α, results are borderline and sensitive to small changes
- Statistical significance doesn't mean practical significance
- We should be cautious about making strong claims from borderline results
- Replication studies are important to confirm findings
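Because the comparison in Problem 8 is so tight, it helps to work with unrounded values. The sketch below uses Problem 8's numbers to compute z for a few nearby sample means, along with the sample mean at which z lands exactly on the critical value. The variable and function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

null_mean, sigma, n, alpha = 15, 5, 60, 0.01
std_error = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(alpha)            # left-tailed cutoff, about -2.326

# Sample mean at which z equals the critical value
cutoff_mean = null_mean + z_crit * std_error

def z_for(sample_mean):
    """z statistic for a given sample mean in the Problem 8 setup."""
    return (sample_mean - null_mean) / std_error

for x_bar in (13.4, 13.5, 13.6):
    z = z_for(x_bar)
    print(f"x_bar = {x_bar}: z = {z:.4f}, in rejection region: {z < z_crit}")
print(f"cutoff sample mean = {cutoff_mean:.4f}")
```

The cutoff sits within a few thousandths of a pound of 13.5, which is why rounding and tiny data changes matter so much here.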
Part 3: T-Tests for Means (Problems 10-13)
T-Test: Two-Tailed
A college claims the average student debt at graduation is $25,000. A survey of 25 graduates shows x̄ = $27,400 and s = $6,000. Test at α = 0.05.
Step 1: Hypotheses
- H₀: μ = 25000
- Hₐ: μ ≠ 25000 (two-tailed)
- α = 0.05
Step 2: Test statistic
t = (27400 - 25000) / (6000/√25) = 2400 / 1200 = 2.0
df = 25 - 1 = 24
Step 3: Critical value
Two-tailed, α = 0.05, df = 24: ±2.064
Step 4: Decision
|t| = 2.0 < 2.064 → Fail to reject H₀
Conclusion: There is not sufficient evidence to conclude the average student debt differs from $25,000. Though the sample mean is higher, the difference is not statistically significant.
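The same steps can be sketched in code. Python's standard library has no t-distribution, so this example takes the critical value from a t-table, exactly as the solution does; the function name is an illustrative choice.

```python
from math import sqrt

def t_statistic(sample_mean, null_mean, s, n):
    """One-sample t statistic and its degrees of freedom."""
    t = (sample_mean - null_mean) / (s / sqrt(n))
    return t, n - 1

# Critical value copied from a t-table (two-tailed, alpha = 0.05, df = 24)
T_CRIT = 2.064

t, df = t_statistic(27400, 25000, 6000, 25)
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > T_CRIT}")
```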
T-Test: Right-Tailed
A teacher believes a new teaching method increases test scores above the previous average of 75. Test 18 students: x̄ = 79, s = 8. Use α = 0.05.
Step 1: Hypotheses
- H₀: μ = 75
- Hₐ: μ > 75 (right-tailed)
- α = 0.05
Step 2: Test statistic
t = (79 - 75) / (8/√18) = 4 / 1.886 = 2.12
df = 18 - 1 = 17
Step 3: Critical value
Right-tailed, α = 0.05, df = 17: 1.740
Step 4: Decision
t = 2.12 > 1.740 → Reject H₀
Conclusion: There is sufficient evidence that the new teaching method increases test scores above 75. The improvement is statistically significant.
T-Test: Left-Tailed with P-Value
A city claims the average commute time is 30 minutes. A transportation study samples 22 commuters: x̄ = 27.5 minutes, s = 6 minutes. Test if commute time is less than claimed (α = 0.01). Find the p-value range.
Step 1: Hypotheses
- H₀: μ = 30
- Hₐ: μ < 30 (left-tailed)
- α = 0.01
Step 2: Test statistic
t = (27.5 - 30) / (6/√22) = -2.5 / 1.279 = -1.95
df = 22 - 1 = 21
Step 3: Critical value and p-value
Critical value (α = 0.01, df = 21): -2.518
From t-table (df = 21): For t = -1.95, p-value is between 0.025 and 0.05 (one-tailed)
Step 4: Decision
t = -1.95 > -2.518 → Fail to reject H₀
p-value > α (0.01) → Fail to reject H₀
Conclusion: There is not sufficient evidence at the 0.01 level that the average commute time is less than 30 minutes.
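Bracketing a p-value from a t-table is a simple lookup, sketched below. The df = 21 row is copied from a standard one-tailed t-table; the function name and return convention are assumptions made for this example.

```python
# One-tailed critical values for df = 21, as listed in a standard t-table
T_TABLE_DF21 = [(0.10, 1.323), (0.05, 1.721), (0.025, 2.080),
                (0.01, 2.518), (0.005, 2.831)]

def p_value_range(t_abs, table):
    """Bracket the one-tailed p-value for |t|.

    Assumes t falls on the side named by the alternative hypothesis and
    that the table rows are ordered from largest to smallest tail area.
    Returns (lower bound, upper bound).
    """
    upper = 0.5                      # a statistic on Ha's side has p below 0.5
    for tail_area, t_crit in table:
        if t_abs < t_crit:
            return tail_area, upper
        upper = tail_area
    return 0.0, upper                # beyond the last column of the table

low, high = p_value_range(1.95, T_TABLE_DF21)
print(f"{low} < p-value < {high}")
```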
Sample Size and Significance
Two studies test the same hypothesis about mean salary (H₀: μ = $50,000):
Study A: n = 20, x̄ = $52,000, s = $8,000
Study B: n = 100, x̄ = $52,000, s = $8,000
Calculate the test statistic for each. Which study is more likely to reject H₀? Why?
Study A (n = 20):
t = (52000 - 50000) / (8000/√20) = 2000 / 1789 = 1.12
df = 19, critical value (α = 0.05, two-tailed) ≈ 2.093
Decision: Fail to reject H₀ (1.12 < 2.093)
Study B (n = 100):
t = (52000 - 50000) / (8000/√100) = 2000 / 800 = 2.5
df = 99, critical value (α = 0.05, two-tailed) ≈ 1.984
Decision: Reject H₀ (2.5 > 1.984)
Analysis: Study B is more likely to reject H₀ because:
- Larger sample size reduces standard error
- Same difference (x̄ - μ₀) produces larger test statistic with bigger sample
- This shows why sample size matters: with enough data, even small differences can be statistically significant
- This also illustrates the difference between statistical significance (Study B) and practical significance ($2,000 difference may not be practically meaningful)
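A short sketch makes the effect of n visible: the difference and s stay fixed while the standard error shrinks. The sample sizes 50 and 400 are extra illustrative values that do not appear in the problem.

```python
from math import sqrt

def t_for_n(n, sample_mean=52000, null_mean=50000, s=8000):
    """t statistic for the salary example at a given sample size."""
    return (sample_mean - null_mean) / (s / sqrt(n))

for n in (20, 50, 100, 400):
    std_error = 8000 / sqrt(n)
    print(f"n = {n:>3}: SE = {std_error:7.1f}, t = {t_for_n(n):.2f}")
```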
Part 4: Tests for Proportions (Problems 14-20)
Checking Conditions for Proportion Test
A researcher wants to test if p = 0.15. They have a sample of n = 80. Check if the normal approximation conditions are met.
Check conditions:
np₀ = 80(0.15) = 12 ≥ 10
n(1-p₀) = 80(0.85) = 68 ≥ 10
Conclusion: Both conditions are satisfied. We can use the normal approximation (z-test) for this proportion test.
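The condition check is easy to wrap in a helper. This is a minimal sketch using the threshold of 10 from the solution above; the function name is hypothetical. The second call uses the n = 60, p₀ = 0.02 case from a later problem, where the check fails.

```python
def normal_approx_ok(n, p0, threshold=10):
    """Check np0 >= threshold and n(1 - p0) >= threshold."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    return expected_successes >= threshold and expected_failures >= threshold

print(normal_approx_ok(80, 0.15))   # 12 and 68 expected, both clear 10
print(normal_approx_ok(60, 0.02))   # only 1.2 expected successes
```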
Proportion Test: Two-Tailed
A university claims 80% of graduates are employed within 6 months. Survey 200 graduates: 150 are employed. Test at α = 0.05.
Step 1: Hypotheses
- H₀: p = 0.80
- Hₐ: p ≠ 0.80 (two-tailed)
- α = 0.05
Step 2: Check conditions
np₀ = 200(0.80) = 160 ≥ 10
n(1-p₀) = 200(0.20) = 40 ≥ 10
Step 3: Test statistic
p̂ = 150/200 = 0.75
z = (0.75 - 0.80) / √(0.80(0.20)/200) = -0.05 / 0.0283 = -1.77
Step 4: Critical value
Two-tailed, α = 0.05: ±1.96
Step 5: Decision
|z| = 1.77 < 1.96 → Fail to reject H₀
Conclusion: There is not sufficient evidence that the employment rate differs from 80%.
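The proportion test can be sketched as one small function that also reports a p-value. The function name and the tail labels ("two", "left", "right") are choices made for this example. Note that the standard error uses p₀, the value assumed under H₀, not p̂.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, tail="two"):
    """Return (z, p_value) for H0: p = p0.

    tail: "two", "left", or "right" (labels chosen for this sketch)
    """
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # SE uses p0, not p_hat
    cdf = NormalDist().cdf(z)
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value

z, p = one_proportion_z_test(150, 200, 0.80)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

The same function covers the one-tailed proportion problems by passing tail="left" or tail="right".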
Proportion Test: Right-Tailed
A politician claims that more than 55% of voters support a bill. Poll 300 voters: 180 support it. Test at α = 0.01.
Step 1: Hypotheses
- H₀: p = 0.55
- Hₐ: p > 0.55 (right-tailed)
- α = 0.01
Step 2: Check conditions
np₀ = 300(0.55) = 165 ≥ 10
n(1-p₀) = 300(0.45) = 135 ≥ 10
Step 3: Test statistic
p̂ = 180/300 = 0.60
z = (0.60 - 0.55) / √(0.55(0.45)/300) = 0.05 / 0.0287 = 1.74
Step 4: Critical value
Right-tailed, α = 0.01: 2.326
Step 5: Decision
z = 1.74 < 2.326 → Fail to reject H₀
Conclusion: There is not sufficient evidence at the 0.01 level that more than 55% support the bill.
Proportion Test: Left-Tailed with P-Value
A company claims their defect rate is 5%. Quality control suspects it's lower. Inspect 400 items: 15 defects. Test at α = 0.05 and find the p-value.
Step 1: Hypotheses
- H₀: p = 0.05
- Hₐ: p < 0.05 (left-tailed)
- α = 0.05
Step 2: Check conditions
np₀ = 400(0.05) = 20 ≥ 10
n(1-p₀) = 400(0.95) = 380 ≥ 10
Step 3: Test statistic
p̂ = 15/400 = 0.0375
z = (0.0375 - 0.05) / √(0.05(0.95)/400) = -0.0125 / 0.0109 = -1.15
Step 4: P-value
P(Z < -1.15) = 0.1251
Step 5: Decision
p-value (0.1251) > α (0.05) → Fail to reject H₀
Conclusion: There is not sufficient evidence that the defect rate is lower than 5%.
When Conditions Fail
A researcher wants to test if p = 0.02 using a sample of n = 60. Can they use the normal approximation? If not, what should they do?
Check conditions:
np₀ = 60(0.02) = 1.2 < 10 (FAILS!)
n(1-p₀) = 60(0.98) = 58.8 ≥ 10
Conclusion: NO, they cannot use the normal approximation because np₀ < 10.
What to do instead:
- Option 1: Increase sample size. Need n ≥ 10/0.02 = 500 to meet the condition
- Option 2: Use exact binomial test (not covered in this course, but the proper method when normal approximation fails)
- Option 3: Use a different statistical approach designed for rare events
Key lesson: Always check conditions before conducting a test. If conditions aren't met, the test results are not valid!
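For the curious, Option 2 can be sketched in a few lines. The problem gives no observed count, so the outcome below (4 successes out of 60) and the right-tailed alternative Hₐ: p > 0.02 are hypothetical, chosen only to show the calculation.

```python
from math import comb

def binomial_upper_tail(x, n, p0):
    """Exact P(X >= x) when X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

# Hypothetical outcome: 4 successes in the n = 60 sample, testing Ha: p > 0.02
p_value = binomial_upper_tail(4, 60, 0.02)
print(f"exact one-sided p-value = {p_value:.4f}")
```

The exact test sums binomial probabilities directly, so it needs no normal approximation and no np₀ ≥ 10 condition.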
Real-World Application: Quality Control
A factory produces computer chips with a claimed defect rate of 2%. A quality inspector randomly samples 500 chips and finds 15 defective. The manager wants to know if the defect rate has increased above 2%. Test at α = 0.05 and provide a recommendation.
Step 1: Hypotheses
- H₀: p = 0.02 (defect rate is 2%)
- Hₐ: p > 0.02 (defect rate has increased) — right-tailed
- α = 0.05
Step 2: Check conditions
np₀ = 500(0.02) = 10 ≥ 10 (just barely)
n(1-p₀) = 500(0.98) = 490 ≥ 10
Step 3: Test statistic
p̂ = 15/500 = 0.03
z = (0.03 - 0.02) / √(0.02(0.98)/500) = 0.01 / 0.00626 = 1.60
Step 4: Critical value and p-value
Critical value (α = 0.05, right-tailed): 1.645
z = 1.60 < 1.645 → Fail to reject H₀
P-value: P(Z > 1.60) = 0.0548
Step 5: Statistical conclusion
There is not quite enough evidence at the 0.05 level to conclude the defect rate has increased above 2%.
Practical recommendation:
While the result is not statistically significant (p = 0.055 is very close to 0.05), the observed 3% defect rate is 50% higher than the claimed 2%. From a quality control perspective:
- Recommendation: Investigate the production process
- The result is borderline—very close to significance
- Cost of investigating < cost of shipping defective chips
- Better safe than sorry in quality control
- Monitor next batches closely
This illustrates that statistical decisions should be combined with practical considerations!
Comprehensive Analysis: Medical Trial
A pharmaceutical company tests a new drug. Historical cure rate is 60%. Trial of 150 patients: 105 cured.
(a) Test if cure rate differs from 60% (α = 0.01)
(b) What would be a Type I error in this context?
(c) What would be a Type II error?
(d) Which error would be worse for patients?
(e) If you were designing this study, would you use α = 0.01 or α = 0.05? Why?
(a) Hypothesis Test:
H₀: p = 0.60, Hₐ: p ≠ 0.60 (two-tailed), α = 0.01
Check conditions: np₀ = 90 ≥ 10, n(1-p₀) = 60 ≥ 10
p̂ = 105/150 = 0.70
z = (0.70 - 0.60) / √(0.60(0.40)/150) = 0.10 / 0.04 = 2.5
Critical values: ±2.576
|z| = 2.5 < 2.576 → Fail to reject H₀
Statistical conclusion: Not sufficient evidence at α = 0.01 that cure rate differs from 60%.
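The same statistic can be expressed as a two-tailed p-value, which makes it easy to see the decision at more than one significance level. This is a minimal sketch that takes z = 2.5 from the calculation above.

```python
from statistics import NormalDist

z = 2.5                                          # test statistic from part (a)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed area

for alpha in (0.01, 0.05):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha}: p-value = {p_value:.4f} -> {decision}")
```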
(b) Type I Error:
Conclude the new drug has a different cure rate when it's actually the same as the historical 60%.
Consequence: Company invests in a drug that's no better than existing treatment.
(c) Type II Error:
Conclude the drug's cure rate is not different from 60% when it actually is different (better or worse).
Consequence: If better, an improved treatment is rejected. If worse, a harmful drug continues in development.
(d) Which error is worse?
Type II error is worse if the drug is actually worse (lower cure rate) because patients could be harmed. However, Type II error is also bad if the drug is actually better—patients miss out on an improved treatment.
In medical contexts, both errors have serious consequences, but patient safety (avoiding harmful treatments) is typically prioritized.
(e) Choice of α:
Recommended: α = 0.05
Reasoning:
- α = 0.01 is very conservative (harder to detect real differences)
- At α = 0.05 the two-tailed critical value is ±1.96, so a result like our z = 2.5 would reject H₀; at α = 0.01 it would not
- In this case, the drug appears to improve cure rate from 60% to 70%, which is clinically meaningful
- Using α = 0.05 balances Type I and Type II error risks
- Further testing (Phase III trials) would confirm findings before approval
Note: α must be fixed at the design stage, before looking at the data; choosing it afterward to obtain a significant result invalidates the test. This example shows that the same data can lead to different decisions at α = 0.05 and α = 0.01, so the choice of α matters!