Lesson 4: Normal Approximation & Applications
Learn when to use the normal distribution and apply it to real-world scenarios
When Can We Use the Normal Distribution?
Not all data follows a normal distribution! Before using normal distribution methods, we should check whether the data is approximately normally distributed.
Characteristics of Normal Data
Data is likely normally distributed if it exhibits:
- Symmetry: Data is roughly symmetric around the center
- Bell shape: Most values cluster near the mean, with fewer at extremes
- Unimodal: One clear peak (not bimodal or multimodal)
- No extreme outliers: Very few values far from the mean
- Continuous data: Can take any value in a range (height, weight, time)
Visual Methods to Check Normality
1. Histogram
A histogram should show a roughly bell-shaped, symmetric pattern if the data is normal.
- Good sign: Single peak near center, symmetric tails
- Warning signs: Skewed left/right, multiple peaks, gaps, extreme outliers
2. Normal Probability Plot (Q-Q Plot)
A normal probability plot (also called a Q-Q plot) plots the data points against what we'd expect if the data were perfectly normal.
- Points form a straight line: Data is approximately normal
- Points curve upward at ends: Data has heavier tails (more extreme values)
- Points curve at one end: Data is skewed
- Points have an S-shape: Tails are heavier or lighter than a normal distribution's
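The straight-line check can also be expressed numerically. The sketch below is one possible way to do it, not a standard test: it builds the Q-Q points from the sorted data and the matching standard normal quantiles, then reports how close they are to a line using a correlation coefficient. The function names `qq_points` and `qq_correlation` and the plotting positions (i - 0.5)/n are assumptions made for illustration.

```python
from statistics import NormalDist, mean

def qq_points(data):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(data)
    observed = sorted(data)
    # Plotting positions (i - 0.5) / n stay strictly between 0 and 1,
    # so the inverse CDF is always defined.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

def qq_correlation(data):
    """Pearson correlation of the Q-Q points; values near 1 suggest a straight line."""
    pairs = qq_points(data)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

A value very close to 1 is consistent with approximate normality, while skewed data tends to give a noticeably lower value. How low is "too low" depends on sample size, so treat this as a companion to the plot, not a replacement for looking at it.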
Be cautious about assuming normality when:
- Data is strongly skewed (income, house prices)
- Data has discrete values only (unless using approximation)
- Data has natural boundaries (percentages can't exceed 100%)
- Data shows multiple peaks (bimodal distributions)
- Sample size is very small (harder to assess normality)
Normal Approximation to the Binomial
One powerful application is using the normal distribution to approximate binomial probabilities when the number of trials is large.
When to Use Normal Approximation to Binomial
If X ~ Binomial(n, p), we can approximate with a normal distribution when:
- np ≥ 10 (expected number of successes)
- n(1-p) ≥ 10 (expected number of failures)
Then X is approximately N(μ, σ), where the second parameter is the standard deviation (not the variance):
- μ = np (mean)
- σ = √[np(1-p)] (standard deviation)
Example 1: Checking If We Can Use Normal Approximation
Scenario A: 200 coin flips, p = 0.5
Check conditions:
- np = 200(0.5) = 100 ≥ 10
- n(1-p) = 200(0.5) = 100 ≥ 10
Conclusion: Normal approximation is appropriate!
Scenario B: 20 trials, p = 0.1
Check conditions:
- np = 20(0.1) = 2 < 10
- n(1-p) = 20(0.9) = 18 ≥ 10
Conclusion: Normal approximation is NOT appropriate (np too small).
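The two conditions and the parameter formulas can be wrapped in a small helper. This is a minimal sketch; the function name `normal_approx_params` and the choice to return `None` when a condition fails are assumptions made for illustration.

```python
from math import sqrt

def normal_approx_params(n, p, threshold=10):
    """Return (mu, sigma) for the normal approximation to Binomial(n, p),
    or None if either condition np >= threshold or n(1-p) >= threshold fails."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    if expected_successes < threshold or expected_failures < threshold:
        return None
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu, sigma

scenario_a = normal_approx_params(200, 0.5)  # both conditions hold
scenario_b = normal_approx_params(20, 0.1)   # np = 2 is too small
```

Scenario A yields a pair (μ, σ), while Scenario B yields `None`, matching the conclusions above.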
Continuity Correction
When using a continuous normal distribution to approximate a discrete binomial distribution, we need a continuity correction to improve accuracy.
What is Continuity Correction?
Since the binomial is discrete (counts whole numbers) but the normal is continuous, we adjust by ±0.5 to account for this difference:
- P(X = k) becomes P(k - 0.5 < X < k + 0.5)
- P(X ≤ k) becomes P(X < k + 0.5)
- P(X ≥ k) becomes P(X > k - 0.5)
- P(X < k) becomes P(X < k - 0.5)
- P(X > k) becomes P(X > k + 0.5)
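The five rules above translate directly into code. The sketch below uses Python's standard-library `NormalDist`; the function name `approx_binomial` and the string labels for the inequality type are hypothetical, chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist

def approx_binomial(n, p, k, kind):
    """Normal approximation to a binomial probability with continuity correction.

    kind selects the event: "eq" for X = k, "le" for X <= k, "ge" for X >= k,
    "lt" for X < k, "gt" for X > k.
    """
    dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
    if kind == "eq":
        return dist.cdf(k + 0.5) - dist.cdf(k - 0.5)
    if kind == "le":
        return dist.cdf(k + 0.5)
    if kind == "ge":
        return 1 - dist.cdf(k - 0.5)
    if kind == "lt":
        return dist.cdf(k - 0.5)
    if kind == "gt":
        return 1 - dist.cdf(k + 0.5)
    raise ValueError(f"unknown kind: {kind!r}")
```

Note that "le" and "gt" use the same boundary k + 0.5, and "lt" and "ge" share k - 0.5, so each complementary pair sums to 1 just as the exact binomial probabilities do. The helper does not check the np ≥ 10 and n(1-p) ≥ 10 conditions; that is left to the caller.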
Example 2: Normal Approximation with Continuity Correction
Scenario: A fair coin is flipped 100 times. What is the probability of getting exactly 60 heads?
Step 1: Check if normal approximation is appropriate
- n = 100, p = 0.5
- np = 100(0.5) = 50 ≥ 10
- n(1-p) = 100(0.5) = 50 ≥ 10
Step 2: Find parameters of normal approximation
- μ = np = 100(0.5) = 50
- σ = √[np(1-p)] = √[100(0.5)(0.5)] = √25 = 5
Step 3: Apply continuity correction
- P(X = 60) ≈ P(59.5 < X < 60.5)
Step 4: Calculate z-scores
- z₁ = (59.5 - 50) / 5 = 9.5 / 5 = 1.9
- z₂ = (60.5 - 50) / 5 = 10.5 / 5 = 2.1
Step 5: Find probability (using z-table)
- P(Z ≤ 1.9) ≈ 0.9713
- P(Z ≤ 2.1) ≈ 0.9821
- P(59.5 < X < 60.5) = 0.9821 - 0.9713 = 0.0108
Answer: About 1.08% chance of getting exactly 60 heads.
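To see how good the approximation is here, the exact binomial probability can be computed alongside it. This is an illustrative check using only the standard library; the variable names are arbitrary.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 100, 0.5, 60

# Exact binomial probability: C(n, k) * p^k * (1 - p)^(n - k)
exact = comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal approximation with continuity correction: P(59.5 < X < 60.5)
dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx = dist.cdf(k + 0.5) - dist.cdf(k - 0.5)

print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
```

Both values come out near 0.0108, so in this case the approximation agrees with the exact answer to about three significant figures.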
Real-World Applications
1. Quality Control in Manufacturing
Example 3: Quality Control
Scenario: A factory produces bottles labeled "500 mL." The actual volumes are normally distributed with μ = 502 mL and σ = 3 mL.
Question: What percentage of bottles contain less than the advertised 500 mL?
Solution:
- z = (500 - 502) / 3 = -2/3 ≈ -0.67
- P(Z ≤ -0.67) ≈ 0.2514
Answer: About 25% of bottles are underfilled. This might be a quality concern!
Follow-up: If the company wants no more than 1% of bottles underfilled, how should they adjust the mean?
- Need P(X < 500) = 0.01, which corresponds to z ≈ -2.33
- -2.33 = (500 - μ) / 3
- -6.99 = 500 - μ
- μ = 506.99 mL
Answer: Set the target fill to about 507 mL so that no more than 1% of bottles are underfilled.
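Both parts of this example can be checked without a z-table. The sketch below assumes the same μ, σ, and label volume as the scenario; the variable names are illustrative.

```python
from statistics import NormalDist

label_volume = 500  # mL printed on the bottle
sigma = 3           # mL, standard deviation of the filling process

# Part 1: share of bottles below 500 mL when the mean fill is 502 mL
fill = NormalDist(mu=502, sigma=sigma)
underfilled = fill.cdf(label_volume)

# Part 2: mean fill needed so that only 1% fall below 500 mL.
# Solve z = (500 - mu) / sigma for mu, with z at the 1st percentile.
z = NormalDist().inv_cdf(0.01)
required_mean = label_volume - z * sigma

print(f"underfilled share: {underfilled:.4f}")
print(f"required mean:     {required_mean:.2f} mL")
```

The computed share is about 0.252, slightly above the table value of 0.2514 because the code does not round z to -0.67. The required mean comes out just under 507 mL, in line with the hand calculation.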
2. Medical and Health Applications
Example 4: Blood Pressure Screening
Scenario: Systolic blood pressure in adult males is approximately normal with μ = 120 mmHg and σ = 15 mmHg, written N(120, 15). High blood pressure is defined as 140 mmHg or higher.
Question A: What percentage of adult males have high blood pressure?
Solution:
- z = (140 - 120) / 15 = 20/15 ≈ 1.33
- P(Z ≤ 1.33) ≈ 0.9082
- P(X > 140) = 1 - 0.9082 = 0.0918
Answer: About 9.2% have high blood pressure.
Question B: A medical study wants to recruit men in the top 5% of blood pressure. What's the minimum blood pressure for inclusion?
Solution:
- Top 5% means 95th percentile
- z ≈ 1.645 (from z-table for P = 0.95)
- x = 120 + (1.645)(15) = 120 + 24.675 = 144.7
Answer: Minimum blood pressure of about 145 mmHg.
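Questions A and B are the two directions of the same calculation: value to probability (the CDF) and probability to value (the inverse CDF). A minimal sketch with the scenario's parameters, using illustrative variable names:

```python
from statistics import NormalDist

bp = NormalDist(mu=120, sigma=15)  # systolic blood pressure, mmHg

# Question A: upper-tail probability P(X >= 140)
share_high = 1 - bp.cdf(140)

# Question B: 95th percentile, the cutoff for the top 5%
cutoff_top5 = bp.inv_cdf(0.95)

print(f"share with high blood pressure: {share_high:.4f}")
print(f"top 5% cutoff: {cutoff_top5:.1f} mmHg")
```

The share is about 0.091 (the table answer of 0.0918 differs slightly because z was rounded to 1.33), and the cutoff is about 144.7 mmHg.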
3. Educational Testing and Placement
Example 5: College Admission Standards
Scenario: A university wants to admit students in the top 15% of test scores. Scores are N(1200, 150).
Question: What minimum score is needed for admission?
Solution:
- Top 15% means 85th percentile (since 100% - 15% = 85%)
- From z-table: P(Z ≤ 1.04) ≈ 0.85
- x = 1200 + (1.04)(150) = 1200 + 156 = 1356
Answer: Minimum score of about 1356 for admission.
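The "top fraction" pattern in this example generalizes to any normal distribution. The helper below is a sketch; the name `top_fraction_cutoff` is hypothetical.

```python
from statistics import NormalDist

def top_fraction_cutoff(mu, sigma, top_fraction):
    """Smallest value that places an observation in the top `top_fraction`
    of a N(mu, sigma) distribution, i.e. the (1 - top_fraction) quantile."""
    return NormalDist(mu, sigma).inv_cdf(1 - top_fraction)

admission_cutoff = top_fraction_cutoff(1200, 150, 0.15)
print(f"minimum score: {admission_cutoff:.1f}")
```

This gives roughly 1355.5, a little below the 1356 obtained by hand, because the z-table value 1.04 is a rounded version of the 85th-percentile z-score (about 1.036).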
Check Your Understanding
Try these questions to test what you've learned in this lesson.
Question 1: Can we use normal approximation for a binomial with n = 50 and p = 0.05?
Answer: No, normal approximation is not appropriate.
Explanation:
- np = 50(0.05) = 2.5 < 10 (fails condition)
- n(1-p) = 50(0.95) = 47.5 ≥ 10
- Since np < 10, we should not use normal approximation.
Question 2: What does it mean if a normal probability plot shows points forming a straight line?
Answer: The data is approximately normally distributed.
Explanation: When data points fall close to a straight line on a normal probability plot, it indicates the data follows a normal distribution reasonably well.
Question 3: For binomial with n = 100, p = 0.4, what are μ and σ for the normal approximation?
Answer: μ = 40, σ ≈ 4.90
Solution:
- μ = np = 100(0.4) = 40
- σ = √[np(1-p)] = √[100(0.4)(0.6)] = √24 ≈ 4.90
Question 4: Using continuity correction, how would you write P(X ≥ 50) for a discrete distribution?
Answer: P(X > 49.5)
Explanation: For "greater than or equal to k", we use "greater than k - 0.5" in the continuous approximation. So P(X ≥ 50) becomes P(X > 49.5).
Question 5: Package weights are N(500, 10) grams. Below what weight are the lightest 2.5% of packages? (z = -1.96 gives P(Z ≤ -1.96) = 0.025)
Answer: About 480.4 grams (or approximately 480 grams)
Solution:
- Need 2.5th percentile, so z ≈ -1.96
- x = μ + z·σ = 500 + (-1.96)(10) = 500 - 19.6 = 480.4
Key Takeaways from Lesson 4
- Check data for symmetry, bell shape, and no extreme outliers before assuming normality
- Use histograms and normal probability plots to visually assess normality
- Normal approximation to binomial: Use when np ≥ 10 AND n(1-p) ≥ 10
- For approximation: μ = np and σ = √[np(1-p)]
- Apply a continuity correction (±0.5) when approximating a discrete distribution with a continuous one
- Normal distribution applies to quality control, medical screening, testing, and more
- Work backwards from a percentile to a cutoff value using x = μ + z·σ