Sampling Distribution of Proportions
Apply Central Limit Theorem (CLT) concepts to categorical data and sample proportions
Lesson Objectives
By the end of this lesson, you will be able to:
- Understand the sampling distribution of sample proportions (p̂)
- Apply the success-failure condition for normal approximation
- Calculate the mean and standard error for sample proportions
- Find probabilities involving sample proportions
1. Introduction to Sample Proportions
Sample Proportion
The sample proportion p̂ (pronounced "p-hat") is the fraction of observations in a sample that have a particular characteristic.
p̂ = x/n
where x = number of "successes" in the sample and n = sample size
Just as sample means (x̄) estimate population means (μ), sample proportions (p̂) estimate population proportions (p).
Example 1: Calculating Sample Proportion
In a random sample of 200 voters, 108 support Candidate A. What is the sample proportion?
Solution:
- x = 108 (voters who support Candidate A)
- n = 200 (total voters in sample)
- p̂ = x/n = 108/200 = 0.54
Answer: p̂ = 0.54 or 54% of the sample supports Candidate A.
2. Sampling Distribution of p̂
Just as different samples produce different sample means (x̄), different samples produce different sample proportions (p̂). The sampling distribution of p̂ describes how p̂ varies across all possible samples.
- Center: μₚ̂ = p (the true population proportion)
- Spread: σₚ̂ = √(p(1-p)/n) (standard error)
- Shape: Approximately normal when certain conditions are met
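The center and spread listed above can be checked empirically. The following is a minimal simulation sketch, not part of the original lesson: the values p = 0.35 and n = 200, the number of repetitions, the seed, and the helper name `simulate_p_hats` are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_p_hats(p, n, reps, seed=0):
    """Draw `reps` random samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(reps):
        # Each observation is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

p, n = 0.35, 200  # illustrative values, assumed for this sketch
p_hats = simulate_p_hats(p, n, reps=2000)

sim_mean = statistics.mean(p_hats)       # should land near p
sim_sd = statistics.pstdev(p_hats)       # should land near the standard error
theory_se = (p * (1 - p) / n) ** 0.5     # sqrt(p(1-p)/n)

print(f"simulated mean {sim_mean:.4f} vs p = {p}")
print(f"simulated SD   {sim_sd:.4f} vs theoretical SE {theory_se:.4f}")
```

With enough repetitions, the simulated mean and standard deviation should sit close to p and √(p(1-p)/n), though they will not match exactly because the simulation is itself random.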
Standard Error for Sample Proportion: σₚ̂ = √(p(1-p)/n)
Notice: The SE for means (σ/√n) requires a separate population parameter σ, but the SE for proportions is determined entirely by p and n. For a fixed n, the variability is highest when p = 0.5 and lower when p is near 0 or 1.
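To see how the standard error changes with p, the sketch below evaluates √(p(1-p)/n) at several values of p for one fixed sample size. The choice of n = 100 and the helper name `standard_error` are assumptions made for illustration.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

n = 100  # fixed sample size, assumed for this sketch
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  SE = {standard_error(p, n):.4f}")
```

The printed values rise toward p = 0.5 and fall off symmetrically on either side, which is the pattern described above.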
3. Conditions for Normal Approximation
The sampling distribution of p̂ is approximately normal when the success-failure condition is met:
Both of these must be true:
- np ≥ 10 (expected number of successes)
- n(1-p) ≥ 10 (expected number of failures)
If both conditions are met, then p̂ is approximately N(p, √(p(1-p)/n)), where the second value is the standard deviation (the standard error).
Example 2: Checking the Success-Failure Condition
Determine whether normal approximation is appropriate for p̂ in these situations:
(a) p = 0.3, n = 50
- np = 50(0.3) = 15 ≥ 10
- n(1-p) = 50(0.7) = 35 ≥ 10
- Result: Normal approximation is appropriate
(b) p = 0.05, n = 100
- np = 100(0.05) = 5 < 10
- n(1-p) = 100(0.95) = 95 ≥ 10
- Result: Normal approximation is NOT appropriate (first condition fails)
(c) p = 0.2, n = 80
- np = 80(0.2) = 16 ≥ 10
- n(1-p) = 80(0.8) = 64 ≥ 10
- Result: Normal approximation is appropriate
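The three checks in Example 2 can be reproduced with a small helper. This is a sketch only: the function name `success_failure` and the `threshold` parameter are hypothetical, with the threshold defaulting to the lesson's cutoff of 10.

```python
def success_failure(p, n, threshold=10):
    """Return (expected successes, expected failures, condition met?)."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    met = expected_successes >= threshold and expected_failures >= threshold
    return expected_successes, expected_failures, met

# The three cases from Example 2.
for label, p, n in (("a", 0.3, 50), ("b", 0.05, 100), ("c", 0.2, 80)):
    successes, failures, met = success_failure(p, n)
    verdict = "appropriate" if met else "NOT appropriate"
    print(f"({label}) np = {successes:.0f}, n(1-p) = {failures:.0f}: {verdict}")
```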
4. Finding Probabilities with Sample Proportions
When conditions are met, we can use the normal distribution to find probabilities about sample proportions.
z-score for Sample Proportion: z = (p̂ - p) / √(p(1-p)/n)
Example 3: Probability Involving Sample Proportion
In a large city, 35% of residents support a new tax proposal (p = 0.35). A random sample of 200 residents is selected. What is the probability that the sample proportion supporting the tax is between 0.30 and 0.40?
Solution:
Step 1: Check conditions
- np = 200(0.35) = 70 ≥ 10
- n(1-p) = 200(0.65) = 130 ≥ 10
- Normal approximation is appropriate
Step 2: Find sampling distribution
- μₚ̂ = p = 0.35
- σₚ̂ = √(p(1-p)/n) = √(0.35 × 0.65 / 200) = √(0.2275/200) = √0.0011375 ≈ 0.0337
Step 3: Calculate z-scores
- For p̂ = 0.30: z = (0.30 - 0.35) / 0.0337 = -0.05 / 0.0337 ≈ -1.48
- For p̂ = 0.40: z = (0.40 - 0.35) / 0.0337 = 0.05 / 0.0337 ≈ 1.48
Step 4: Find probability
- P(-1.48 < z < 1.48) = P(z < 1.48) - P(z < -1.48)
- = 0.9306 - 0.0694 = 0.8612
Answer: There's about an 86.12% chance that the sample proportion will be between 0.30 and 0.40.
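Example 3 can also be checked in software instead of a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `prob_between` is an assumption for illustration. Because the code does not round z to two decimal places, its result may differ from the table-based answer in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def prob_between(p, n, low, high):
    """Normal-approximation probability that p-hat falls between low and high."""
    se = sqrt(p * (1 - p) / n)           # standard error of p-hat
    dist = NormalDist(mu=p, sigma=se)    # approximate sampling distribution
    return dist.cdf(high) - dist.cdf(low)

prob = prob_between(p=0.35, n=200, low=0.30, high=0.40)
print(f"P(0.30 < p-hat < 0.40) is approximately {prob:.4f}")
```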
Example 4: Finding Unusual Sample Proportions
A company claims that 10% of its products are defective (p = 0.10). A quality inspector takes a random sample of 400 products and finds 52 defective items (p̂ = 0.13). Is this sample result unusually high if the company's claim is true?
Solution:
Step 1: Check conditions
- np = 400(0.10) = 40 ≥ 10
- n(1-p) = 400(0.90) = 360 ≥ 10
- Normal approximation is appropriate
Step 2: Find sampling distribution
- μₚ̂ = 0.10
- σₚ̂ = √(0.10 × 0.90 / 400) = √(0.09/400) = √0.000225 = 0.015
Step 3: Calculate z-score
- z = (0.13 - 0.10) / 0.015 = 0.03 / 0.015 = 2.0
Step 4: Find probability
- P(p̂ ≥ 0.13) = P(z ≥ 2.0) = 1 - 0.9772 = 0.0228
Answer: Only about 2.28% of samples would have p̂ ≥ 0.13 if the true proportion is 0.10. A result two standard errors above the claimed value is unusual, so the inspector has reason to question the company's claim.
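The upper-tail calculation in Example 4 can be sketched the same way, starting from the observed count rather than from p̂. The helper name `upper_tail` is hypothetical, and the code assumes the success-failure condition has already been checked.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail(p, n, observed_count):
    """Return (z, P(p-hat >= observed p-hat)) under the normal approximation."""
    p_hat = observed_count / n
    se = sqrt(p * (1 - p) / n)
    z = (p_hat - p) / se
    tail = 1 - NormalDist().cdf(z)  # area to the right of z on the standard normal
    return z, tail

z, tail = upper_tail(p=0.10, n=400, observed_count=52)
print(f"z = {z:.2f}, upper-tail probability is approximately {tail:.4f}")
```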
Example 5: Finding Required Sample Size
A pollster wants to estimate the proportion of voters supporting a candidate (assume p ≈ 0.5). What sample size is needed so that the standard error is no more than 0.02?
Solution:
We want: σₚ̂ ≤ 0.02
√(p(1-p)/n) ≤ 0.02
Using p = 0.5 (worst case, maximum variability):
√(0.5 × 0.5 / n) ≤ 0.02
√(0.25/n) ≤ 0.02
0.25/n ≤ 0.0004
0.25 ≤ 0.0004n
n ≥ 0.25/0.0004 = 625
Answer: The pollster needs at least n = 625 voters in the sample.
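Solving the inequality for n gives n ≥ p(1-p)/SE², rounded up to a whole number. The sketch below uses exact fractions so that floating-point rounding cannot push the result up by one; the helper name `min_sample_size` and the choice to pass values as decimal strings are assumptions for illustration.

```python
from fractions import Fraction
from math import ceil

def min_sample_size(p, max_se):
    """Smallest n with sqrt(p(1-p)/n) <= max_se, computed with exact fractions."""
    p = Fraction(p)            # e.g. "0.5" becomes exactly 1/2
    max_se = Fraction(max_se)  # e.g. "0.02" becomes exactly 1/50
    return ceil(p * (1 - p) / max_se**2)

n = min_sample_size("0.5", "0.02")
print(f"Minimum sample size: {n}")
```

Halving the target standard error to 0.01 quadruples the required sample size, since n grows with 1/SE².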
Check Your Understanding
Question 1: In a sample of 150 students, 45 are left-handed. What is the sample proportion?
Answer: p̂ = 0.30 or 30%
Calculation: p̂ = x/n = 45/150 = 0.30
Question 2: For p = 0.4 and n = 60, verify that normal approximation is appropriate for the sampling distribution of p̂.
Answer: Yes, normal approximation is appropriate.
Check:
- np = 60(0.4) = 24 ≥ 10
- n(1-p) = 60(0.6) = 36 ≥ 10
Both conditions are satisfied.
Question 3: If p = 0.6 and n = 100, what is the standard error of the sample proportion?
Answer: σₚ̂ ≈ 0.049 (about 0.05)
Calculation:
- σₚ̂ = √(p(1-p)/n)
- = √(0.6 × 0.4 / 100)
- = √(0.24/100)
- = √0.0024 ≈ 0.049
Question 4: True or False: The standard error for proportions is largest when p = 0.5.
Answer: True
Explanation: The expression p(1-p) is maximized when p = 0.5, giving p(1-p) = 0.5 × 0.5 = 0.25. When p is near 0 or 1, p(1-p) is smaller, resulting in less variability in sample proportions.
Question 5: Nationally, 25% of college students work full-time. In a random sample of 200 students, what's the probability that fewer than 20% work full-time?
Answer: P(p̂ < 0.20) ≈ 0.0516 or 5.16%
Solution:
- Check: np = 200(0.25) = 50 ≥ 10 and n(1-p) = 200(0.75) = 150 ≥ 10
- σₚ̂ = √(0.25 × 0.75 / 200) = √(0.1875/200) ≈ 0.0306
- z = (0.20 - 0.25) / 0.0306 = -0.05 / 0.0306 ≈ -1.63
- P(z < -1.63) ≈ 0.0516
Lesson Summary
- Sample proportion: p̂ = x/n estimates population proportion p
- Sampling distribution of p̂: Mean = p, SE = √(p(1-p)/n)
- Success-failure condition: np ≥ 10 AND n(1-p) ≥ 10 for normal approximation
- z-score: z = (p̂ - p) / √(p(1-p)/n)
- SE is largest when p = 0.5 (maximum variability)
- Use normal distribution methods when conditions are met