Two-Sample Tests for Proportions
Learn how to compare proportions from two independent populations
Lesson Objectives
By the end of this lesson, you will be able to:
- Formulate hypotheses for comparing two population proportions
- Calculate the pooled proportion under the null hypothesis
- Check conditions for valid two-proportion z-tests
- Conduct and interpret two-proportion z-tests
- Apply two-proportion tests to real-world scenarios
1. When Do We Compare Two Proportions?
Many research questions involve comparing success rates, percentages, or proportions between two groups:
- Medical Research: Is the cure rate higher for Drug A than Drug B?
- Marketing: Do more customers prefer Product X or Product Y?
- Education: Is the pass rate different for online vs. in-person classes?
- Political Polling: Do men and women vote differently on an issue?
- Quality Control: Is the defect rate the same for two manufacturing plants?
Definition: Two-Sample Proportion Test
A two-proportion z-test is used to test whether two population proportions (p₁ and p₂) are equal based on data from two independent random samples.
2. Hypotheses for Two-Proportion Tests
We compare two population proportions: p₁ and p₂.
| Test Type | Null Hypothesis (H₀) | Alternative Hypothesis (Hₐ) |
|---|---|---|
| Two-tailed | H₀: p₁ = p₂ or H₀: p₁ - p₂ = 0 | Hₐ: p₁ ≠ p₂ or Hₐ: p₁ - p₂ ≠ 0 |
| Right-tailed | H₀: p₁ = p₂ | Hₐ: p₁ > p₂ or Hₐ: p₁ - p₂ > 0 |
| Left-tailed | H₀: p₁ = p₂ | Hₐ: p₁ < p₂ or Hₐ: p₁ - p₂ < 0 |
Example: Setting Up Hypotheses
Scenario: A researcher wants to test if the proportion of college graduates differs between two cities.
- H₀: p₁ = p₂ (the proportions are equal)
- Hₐ: p₁ ≠ p₂ (the proportions are different) — two-tailed test
3. Conditions for Two-Proportion z-Test
Before conducting the test, verify these conditions:
- Independence:
  - The two samples are independent of each other
  - Both samples are random samples from their populations
  - Each sample size is less than 10% of its population (if sampling without replacement)
- Success-Failure Condition: For both samples, we need enough successes and failures:
  - n₁p̂₁ ≥ 10 and n₁(1 - p̂₁) ≥ 10
  - n₂p̂₂ ≥ 10 and n₂(1 - p̂₂) ≥ 10
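The numeric conditions above can be checked mechanically. The following is a minimal sketch; the function names and the sample counts are hypothetical, chosen only for illustration. It relies on the fact that n·p̂ is simply the number of successes and n·(1 - p̂) the number of failures. Whether the samples are random and independent of each other cannot be checked in code and has to be judged from the study design.

```python
def success_failure_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """Return True when a sample has at least `threshold` successes and failures."""
    failures = n - successes
    # n * p_hat equals the success count; n * (1 - p_hat) equals the failure count.
    return successes >= threshold and failures >= threshold


def ten_percent_ok(n: int, population_size: int) -> bool:
    """Return True when the sample is smaller than 10% of its population."""
    return n < 0.10 * population_size


# Hypothetical samples, written as (successes, sample size).
samples = [(120, 200), (90, 180)]
all_ok = all(success_failure_ok(x, n) for x, n in samples)
print("Success-failure condition met for both samples:", all_ok)
```

If either sample fails the check, the normal approximation behind the z-test may be unreliable.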
4. The Pooled Proportion
Under the null hypothesis (H₀: p₁ = p₂), we assume the two populations have the same proportion. We estimate this common proportion by combining (pooling) the data from both samples.
Pooled Proportion
p̄ = (x₁ + x₂) / (n₁ + n₂)
where:
x₁ = number of successes in sample 1
x₂ = number of successes in sample 2
n₁ = size of sample 1
n₂ = size of sample 2
Alternatively, if you're given sample proportions:
p̄ = (n₁p̂₁ + n₂p̂₂) / (n₁ + n₂)
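As a quick illustration, the sketch below computes the pooled proportion both from success counts and from sample proportions and confirms that the two routes agree. The function names and the data (45 of 150 and 40 of 100) are made up for this example.

```python
def pooled_from_counts(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled proportion: total successes divided by total sample size."""
    return (x1 + x2) / (n1 + n2)


def pooled_from_proportions(p1_hat: float, n1: int, p2_hat: float, n2: int) -> float:
    """Pooled proportion as a sample-size-weighted average of p1_hat and p2_hat."""
    return (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)


# Hypothetical data: 45 successes out of 150, and 40 successes out of 100.
p_bar_counts = pooled_from_counts(45, 150, 40, 100)
p_bar_props = pooled_from_proportions(0.30, 150, 0.40, 100)
print(f"from counts: {p_bar_counts:.4f}, from proportions: {p_bar_props:.4f}")
```

Both routes give 0.34, not the simple average 0.35 of the two sample proportions, because the larger sample carries more weight.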
5. Test Statistic for Two Proportions
Test Statistic for Two-Proportion z-Test
z = (p̂₁ - p̂₂) / √[p̄(1 - p̄)(1/n₁ + 1/n₂)]
where:
p̂₁, p̂₂ = sample proportions
p̄ = pooled proportion
n₁, n₂ = sample sizes
This test statistic approximately follows a standard normal distribution (z-distribution) when the null hypothesis is true and the conditions are met.
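One way to turn the test statistic into code is sketched below. The function name `two_proportion_z` and the sample counts are assumptions made for illustration. The function takes success counts and sample sizes, pools the data, and divides the observed difference by the pooled standard error.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    standard_error = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


# Hypothetical data: 60 successes out of 100 versus 45 out of 100.
z = two_proportion_z(60, 100, 45, 100)
print(f"z = {z:.3f}")  # roughly 2.12
```

Swapping the two samples flips the sign of z but not its magnitude, so the labeling of "sample 1" and "sample 2" matters for one-tailed tests.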
Decision Rules
- Two-tailed: Reject H₀ if |z| > z* (e.g., z* = 1.96 for α = 0.05)
- Right-tailed: Reject H₀ if z > z* (e.g., z* = 1.645 for α = 0.05)
- Left-tailed: Reject H₀ if z < -z* (e.g., z* = 1.645 for α = 0.05, so reject if z < -1.645)
Or use the p-value approach: Reject H₀ if p-value < α.
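The critical values and p-values in these rules can be obtained from the standard normal distribution in Python's standard library (`statistics.NormalDist`, available from Python 3.8). The helper names and the tail labels "two-tailed", "right", and "left" are conventions invented for this sketch.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_value(alpha: float, tail: str) -> float:
    """Positive critical value z* for the given significance level and tail."""
    if tail == "two-tailed":
        return STD_NORMAL.inv_cdf(1 - alpha / 2)
    if tail in ("right", "left"):
        return STD_NORMAL.inv_cdf(1 - alpha)
    raise ValueError(f"unknown tail: {tail!r}")


def p_value(z: float, tail: str) -> float:
    """p-value of an observed z statistic for the given tail."""
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "two-tailed":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown tail: {tail!r}")


def reject_null(z: float, alpha: float, tail: str) -> bool:
    """Decision by the p-value approach: reject H0 when p-value < alpha."""
    return p_value(z, tail) < alpha


print(round(critical_value(0.05, "two-tailed"), 3))
print(round(critical_value(0.05, "right"), 3))
```

The critical-value rule and the p-value rule always lead to the same decision, so either can be used.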
6. Complete Example: Drug Efficacy Study
Example 1: Comparing Drug Cure Rates
Research Question: Is Drug A more effective than Drug B at curing a disease?
Data:
- Drug A: 120 out of 200 patients cured (p̂₁ = 0.60)
- Drug B: 90 out of 180 patients cured (p̂₂ = 0.50)
Significance level: α = 0.05
Step 1: State hypotheses
- H₀: p₁ = p₂ (cure rates are equal)
- Hₐ: p₁ > p₂ (Drug A has a higher cure rate) — right-tailed test
Step 2: Check conditions
- Independent random samples from two groups
- n₁p̂₁ = 200(0.60) = 120 ≥ 10, n₁(1-p̂₁) = 200(0.40) = 80 ≥ 10
- n₂p̂₂ = 180(0.50) = 90 ≥ 10, n₂(1-p̂₂) = 180(0.50) = 90 ≥ 10
- All conditions satisfied!
Step 3: Calculate pooled proportion
p̄ = (x₁ + x₂) / (n₁ + n₂)
p̄ = (120 + 90) / (200 + 180)
p̄ = 210 / 380
p̄ ≈ 0.5526
Step 4: Calculate test statistic
z = (p̂₁ - p̂₂) / √[p̄(1 - p̄)(1/n₁ + 1/n₂)]
z = (0.60 - 0.50) / √[0.5526(0.4474)(1/200 + 1/180)]
z = 0.10 / √[0.2472 × 0.01056]
z = 0.10 / √0.00261
z = 0.10 / 0.0511
z ≈ 1.96
Step 5: Find critical value and p-value
For α = 0.05 (right-tailed): z-critical = 1.645
For z = 1.96: p-value ≈ 0.025
Step 6: Make decision
Since z = 1.96 > 1.645 (or p-value = 0.025 < 0.05), we reject H₀.
Step 7: Conclusion
There is sufficient evidence at the 0.05 significance level to conclude that Drug A has a higher cure rate than Drug B. The difference in cure rates (60% vs. 50%) is statistically significant.
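The hand calculation in this example can be cross-checked with a short script. This is only a sketch using the standard library; the variable names are arbitrary, and the data are the counts given above (120 of 200 and 90 of 180).

```python
import math
from statistics import NormalDist

# Data from the drug example.
x1, n1 = 120, 200  # Drug A: cured, total
x2, n2 = 90, 180   # Drug B: cured, total
alpha = 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
p_bar = (x1 + x2) / (n1 + n2)
standard_error = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / standard_error

# Right-tailed test: area to the right of z under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)
z_critical = NormalDist().inv_cdf(1 - alpha)
reject = z > z_critical

print(f"pooled proportion = {p_bar:.4f}")
print(f"standard error    = {standard_error:.4f}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

The script agrees with the steps above to the precision shown: a pooled proportion of about 0.5526, z of about 1.96, and a p-value of about 0.025.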
7. Example: Two-Tailed Test
Example 2: Gender Differences in Support
Research Question: Is there a difference in support for a policy between men and women?
Data:
- Men: 156 out of 300 support the policy (p̂₁ = 0.52)
- Women: 198 out of 350 support the policy (p̂₂ = 0.566)
Significance level: α = 0.05, two-tailed test
Step 1: Hypotheses
- H₀: p₁ = p₂ (no gender difference)
- Hₐ: p₁ ≠ p₂ (there is a gender difference)
Step 2: Check conditions
All success-failure conditions met (verify yourself for practice!)
Step 3: Calculate pooled proportion
p̄ = (156 + 198) / (300 + 350) = 354 / 650 ≈ 0.545
Step 4: Calculate test statistic
z = (0.52 - 0.566) / √[0.545(0.455)(1/300 + 1/350)]
z = -0.046 / √[0.248 × 0.00619]
z = -0.046 / 0.0391
z ≈ -1.18
Step 5: Decision
For α = 0.05 (two-tailed): z-critical = ±1.96
Since |z| = 1.18 < 1.96, we fail to reject H₀.
Step 6: Conclusion
There is insufficient evidence at the 0.05 significance level to conclude that men and women differ in their support for the policy. The observed difference (52% vs. 56.6%) could reasonably be due to sampling variability.
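A sketch of the same test in code, wrapped in a small helper (the name `two_tailed_test` and its return format are assumptions for illustration). Because the script does not round intermediate values, z comes out near -1.17 rather than -1.18. The decision is unchanged, and the two-tailed p-value is roughly 0.24.

```python
import math
from statistics import NormalDist


def two_tailed_test(x1: int, n1: int, x2: int, n2: int, alpha: float = 0.05):
    """Return (z, p_value, reject) for H0: p1 = p2 against Ha: p1 != p2."""
    p_bar = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / standard_error
    # Two-tailed: double the area beyond |z| in one tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Data from the policy-support example: men 156 of 300, women 198 of 350.
z, p_value, reject = two_tailed_test(156, 300, 198, 350)
print(f"z = {z:.2f}, p-value = {p_value:.2f}, reject H0: {reject}")
```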
8. Confidence Interval for Difference in Proportions
In addition to hypothesis testing, you can construct a confidence interval for the difference p₁ - p₂. Note that for confidence intervals, we do NOT use the pooled proportion!
Confidence Interval for p₁ - p₂
(p̂₁ - p̂₂) ± z*√[p̂₁(1 - p̂₁)/n₁ + p̂₂(1 - p̂₂)/n₂]
where z* is the critical value for the desired confidence level
(e.g., z* = 1.96 for 95% confidence)
- Hypothesis test: Uses pooled proportion p̄
- Confidence interval: Uses individual sample proportions p̂₁ and p̂₂
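To see how the two standard errors differ in practice, the sketch below computes both for the drug data from Example 1 (120 of 200 and 90 of 180). The function names are made up for this illustration.

```python
import math


def pooled_se(x1: int, n1: int, x2: int, n2: int) -> float:
    """Standard error under H0: p1 = p2 (used in the hypothesis test)."""
    p_bar = (x1 + x2) / (n1 + n2)
    return math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))


def unpooled_se(x1: int, n1: int, x2: int, n2: int) -> float:
    """Standard error without assuming p1 = p2 (used in the confidence interval)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    return math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)


se_test = pooled_se(120, 200, 90, 180)
se_interval = unpooled_se(120, 200, 90, 180)
print(f"pooled SE = {se_test:.4f}, unpooled SE = {se_interval:.4f}")
```

For these data the two values are close (about 0.0511 and 0.0509), but they answer different questions: the pooled version assumes a common proportion, while the unpooled version makes no such assumption.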
Example: CI for Drug Cure Rate Difference
Using the drug data: p̂₁ = 0.60, p̂₂ = 0.50, n₁ = 200, n₂ = 180
95% CI = (0.60 - 0.50) ± 1.96√[(0.60×0.40/200) + (0.50×0.50/180)]
= 0.10 ± 1.96√[0.0012 + 0.00139]
= 0.10 ± 1.96√0.00259
= 0.10 ± 1.96(0.0509)
= 0.10 ± 0.0998
= (0.0002, 0.1998), or 0.02 to 19.98 percentage points
Interpretation: We are 95% confident that Drug A's cure rate is between 0.02 and 19.98 percentage points higher than Drug B's cure rate. Since the interval does not contain 0, the difference is statistically significant at the two-tailed α = 0.05 level, although the lower bound sits very close to 0.
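The interval can be reproduced with a few lines of standard-library Python. This is a sketch under the same assumptions as the hand calculation; the function name `diff_proportion_ci` is hypothetical. Because the script does not round z* or the standard error along the way, its endpoints may differ from the hand-computed ones in the fourth decimal place.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(x1: int, n1: int, x2: int, n2: int, confidence: float = 0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    standard_error = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * standard_error
    difference = p1_hat - p2_hat
    return difference - margin, difference + margin


# Drug data: 120 of 200 cured with Drug A, 90 of 180 cured with Drug B.
lower, upper = diff_proportion_ci(120, 200, 90, 180)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

Raising the confidence level, for example to 0.99, widens the interval. For these data the wider interval would include 0.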
Check Your Understanding
Question 1: Why do we use a pooled proportion in hypothesis testing but not in confidence intervals?
Question 2: A study has n₁ = 50 with 8 successes and n₂ = 60 with 10 successes. Can you conduct a two-proportion z-test?
Question 3: If a 95% confidence interval for p₁ - p₂ is (-0.05, 0.15), what can you conclude about a two-tailed test at α = 0.05?
Key Takeaways
- Two-proportion z-tests compare proportions from two independent samples
- Pooled proportion: p̄ = (x₁ + x₂) / (n₁ + n₂) — used in hypothesis testing
- Test statistic: z = (p̂₁ - p̂₂) / √[p̄(1 - p̄)(1/n₁ + 1/n₂)]
- Check success-failure condition: At least 10 successes and 10 failures in each sample
- Confidence intervals use individual p̂₁ and p̂₂, NOT the pooled proportion
- Use z-distribution (not t-distribution) for proportion tests
- Applications: Medical trials, quality control, survey comparisons, A/B testing