Learn Without Walls

Lesson 4: Normal Approximation & Applications

Learn when to use the normal distribution and apply it to real-world scenarios

When Can We Use the Normal Distribution?

Not all data follows a normal distribution! Before using normal distribution methods, we should check whether the data is approximately normally distributed.

Characteristics of Normal Data

Data is likely normally distributed if it exhibits:

  • Symmetry: Data is roughly symmetric around the center
  • Bell shape: Most values cluster near the mean, with fewer at extremes
  • Unimodal: One clear peak (not bimodal or multimodal)
  • No extreme outliers: Very few values far from the mean
  • Continuous data: Can take any value in a range (height, weight, time)

Visual Methods to Check Normality

1. Histogram

A histogram should show a roughly bell-shaped, symmetric pattern if the data is normal.

2. Normal Probability Plot (Q-Q Plot)

A normal probability plot (also called a Q-Q plot) plots the sorted data values against the values we would expect if the data came from a perfectly normal distribution (the theoretical normal quantiles).

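How to interpret (with data values on the vertical axis and normal quantiles on the horizontal axis):
  • Points fall close to a straight line: Data is approximately normal
  • Points bend away from the line at both ends in an S-shape (lowest values below the line, highest above it): Data has heavier tails than a normal distribution (more extreme values)
  • Points follow a curved, bowed pattern: Data is skewed
  • A few isolated points far from the line at either end: Data may contain outliers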

When NOT to Use the Normal Distribution:
  • Data is strongly skewed (e.g., income, house prices)
  • Data takes only discrete values (unless a normal approximation is justified, as with the binomial below)
  • Data piles up against a natural boundary (e.g., percentages near 0% or 100%)
  • Data shows multiple peaks (bimodal or multimodal distributions)
  • Sample size is very small (normality is hard to assess)

Normal Approximation to the Binomial

One powerful application is using the normal distribution to approximate binomial probabilities when the number of trials is large.

When to Use Normal Approximation to Binomial

If X ~ Binomial(n, p), we can approximate with a normal distribution when:

  • np ≥ 10 (expected number of successes)
  • n(1-p) ≥ 10 (expected number of failures)

Then X is approximately N(μ, σ), where:

  • μ = np (mean)
  • σ = √[np(1-p)] (standard deviation)
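The two parameter formulas translate directly into code. A minimal Python sketch (the function name is hypothetical):

```python
import math

def normal_approx_params(n, p):
    """Mean and standard deviation of the normal curve used to approximate Binomial(n, p)."""
    mu = n * p                           # μ = np
    sigma = math.sqrt(n * p * (1 - p))   # σ = √[np(1-p)]
    return mu, sigma

print(normal_approx_params(200, 0.5))
```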
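
Why does this work? A binomial count is the sum of n independent success/failure trials, so by the Central Limit Theorem its distribution becomes more and more bell-shaped as n grows. When p is far from 0.5, the binomial is noticeably skewed for small n, which is why both np and n(1-p) must be large enough before the approximation is trusted.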
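The claim that the binomial becomes more symmetric as n grows can be checked with the standard skewness formula for a binomial distribution, (1 - 2p) / √[np(1-p)], where 0 means perfectly symmetric. A short Python sketch (the function name is illustrative):

```python
import math

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p); values near 0 indicate a symmetric shape."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

# With p = 0.1 the distribution is lopsided for small n but evens out as n grows
for n in (20, 200, 2000):
    print(n, round(binomial_skewness(n, 0.1), 3))
```

For p = 0.1 the skewness shrinks roughly in proportion to 1/√n, and for p = 0.5 it is exactly 0 for every n.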

Example 1: Checking If We Can Use Normal Approximation

Scenario A: 200 coin flips, p = 0.5

Check conditions:

  • np = 200(0.5) = 100 ≥ 10
  • n(1-p) = 200(0.5) = 100 ≥ 10

Conclusion: Normal approximation is appropriate!


Scenario B: 20 trials, p = 0.1

Check conditions:

  • np = 20(0.1) = 2 < 10
  • n(1-p) = 20(0.9) = 18 ≥ 10

Conclusion: Normal approximation is NOT appropriate (np too small).
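The condition check in Example 1 can be written as a small Python function. `can_use_normal_approx` is a hypothetical name, and the threshold of 10 follows this lesson's rule of thumb (some textbooks use 5).

```python
def can_use_normal_approx(n, p, threshold=10):
    """Rule of thumb from this lesson: require np >= 10 and n(1-p) >= 10."""
    return n * p >= threshold and n * (1 - p) >= threshold

scenarios = {"A": (200, 0.5), "B": (20, 0.1)}
for name, (n, p) in scenarios.items():
    verdict = "appropriate" if can_use_normal_approx(n, p) else "NOT appropriate"
    print(f"Scenario {name}: np = {n * p:g}, n(1-p) = {n * (1 - p):g} -> {verdict}")
```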

Continuity Correction

When using a continuous normal distribution to approximate a discrete binomial distribution, we need a continuity correction to improve accuracy.

What is Continuity Correction?

Since the binomial is discrete (counts whole numbers) but the normal is continuous, we adjust by ±0.5 to account for this difference:

  • P(X = k) becomes P(k - 0.5 < X < k + 0.5)
  • P(X ≤ k) becomes P(X < k + 0.5)
  • P(X ≥ k) becomes P(X > k - 0.5)
  • P(X < k) becomes P(X < k - 0.5)
  • P(X > k) becomes P(X > k + 0.5)
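The five rules above amount to a lookup table. This Python sketch (hypothetical function name; `None` marks a side with no bound) returns the interval to use with the normal curve:

```python
def continuity_correct(relation, k):
    """Translate a discrete event about X into (lower, upper) bounds for the normal curve.

    relation is one of "==", "<=", ">=", "<", ">"; None means unbounded on that side.
    """
    table = {
        "==": (k - 0.5, k + 0.5),  # P(X = k)  -> P(k - 0.5 < X < k + 0.5)
        "<=": (None, k + 0.5),     # P(X <= k) -> P(X < k + 0.5)
        ">=": (k - 0.5, None),     # P(X >= k) -> P(X > k - 0.5)
        "<": (None, k - 0.5),      # P(X < k)  -> P(X < k - 0.5)
        ">": (k + 0.5, None),      # P(X > k)  -> P(X > k + 0.5)
    }
    return table[relation]

print(continuity_correct(">=", 50))
```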

Example 2: Normal Approximation with Continuity Correction

Scenario: A fair coin is flipped 100 times. What is the probability of getting exactly 60 heads?

Step 1: Check if normal approximation is appropriate

  • n = 100, p = 0.5
  • np = 100(0.5) = 50 ≥ 10
  • n(1-p) = 100(0.5) = 50 ≥ 10

Step 2: Find parameters of normal approximation

  • μ = np = 100(0.5) = 50
  • σ = √[np(1-p)] = √[100(0.5)(0.5)] = √25 = 5

Step 3: Apply continuity correction

  • P(X = 60) ≈ P(59.5 < X < 60.5)

Step 4: Calculate z-scores

  • z₁ = (59.5 - 50) / 5 = 9.5 / 5 = 1.9
  • z₂ = (60.5 - 50) / 5 = 10.5 / 5 = 2.1

Step 5: Find probability (using z-table)

  • P(Z ≤ 1.9) ≈ 0.9713
  • P(Z ≤ 2.1) ≈ 0.9821
  • P(59.5 < X < 60.5) = 0.9821 - 0.9713 = 0.0108

Answer: About 1.08% chance of getting exactly 60 heads.
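The same calculation can be done in Python with the standard library's `statistics.NormalDist` (Python 3.8+), which also makes it easy to compare against the exact binomial probability computed with `math.comb`. Because the code uses exact normal CDF values rather than a z-table, its last digits may differ slightly from the table-based answer.

```python
import math
from statistics import NormalDist

n, p, k = 100, 0.5, 60

# Normal approximation with continuity correction: P(59.5 < X < 60.5)
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx_curve = NormalDist(mu, sigma)
approx = approx_curve.cdf(k + 0.5) - approx_curve.cdf(k - 0.5)

# Exact binomial probability for comparison: C(n, k) p^k (1 - p)^(n - k)
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"normal approximation: {approx:.5f}")
print(f"exact binomial:       {exact:.5f}")
```

Both values come out close to 0.0108, which suggests the continuity-corrected approximation works well here.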

Real-World Applications

1. Quality Control in Manufacturing

Example 3: Quality Control

Scenario: A factory produces bottles labeled "500 mL." The actual volumes are normally distributed with μ = 502 mL and σ = 3 mL.

Question: What percentage of bottles contain less than the advertised 500 mL?

Solution:

  • z = (500 - 502) / 3 = -2/3 ≈ -0.67
  • P(Z ≤ -0.67) ≈ 0.2514

Answer: About 25% of bottles are underfilled. This might be a quality concern!
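Python's `statistics.NormalDist` can reproduce Example 3 without a z-table. Because it uses z = -2/3 exactly instead of the rounded -0.67, its result differs slightly from 0.2514.

```python
from statistics import NormalDist

fill = NormalDist(mu=502, sigma=3)  # bottle volumes in mL
underfilled = fill.cdf(500)         # P(X < 500)
print(f"{underfilled:.2%} of bottles are underfilled")
```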

Follow-up: If the company wants no more than 1% of bottles underfilled, how should they adjust the mean?

  • Need P(X < 500) = 0.01, which corresponds to z ≈ -2.33
  • -2.33 = (500 - μ) / 3
  • -6.99 = 500 - μ
  • μ = 506.99 mL

Solution: Set the target mean fill to about 507 mL (keeping σ = 3 mL) so that no more than 1% of bottles are underfilled.
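The follow-up can be solved the same way in code: find z for the 1st percentile with `inv_cdf`, then rearrange z = (500 - μ) / σ for μ. This sketch assumes, as the example does, that σ stays at 3 mL when the target is moved.

```python
from statistics import NormalDist

sigma = 3            # mL, assumed unchanged when the target mean is moved
label = 500          # advertised volume in mL
max_underfill = 0.01

z = NormalDist().inv_cdf(max_underfill)  # z for the 1st percentile, about -2.33
required_mean = label - z * sigma        # from z = (label - mu) / sigma
print(f"z = {z:.3f}, required mean = {required_mean:.2f} mL")

# Sanity check: with this mean, 1% of bottles fall below the label
print(NormalDist(required_mean, sigma).cdf(label))
```

The unrounded z (about -2.326) gives a required mean of about 506.98 mL, consistent with the rounded answer of 507 mL.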

2. Medical and Health Applications

Example 4: Blood Pressure Screening

Scenario: Systolic blood pressure in adult males is approximately N(120, 15) mmHg. High blood pressure is defined as 140 mmHg or higher.

Question A: What percentage of adult males have high blood pressure?

Solution:

  • z = (140 - 120) / 15 = 20/15 ≈ 1.33
  • P(Z ≤ 1.33) ≈ 0.9082
  • P(X > 140) = 1 - 0.9082 = 0.0918

Answer: About 9.2% have high blood pressure.

Question B: A medical study wants to recruit men in the top 5% of blood pressure. What's the minimum blood pressure for inclusion?

Solution:

  • Top 5% means 95th percentile
  • z ≈ 1.645 (from z-table for P = 0.95)
  • x = 120 + (1.645)(15) = 120 + 24.675 = 144.7

Answer: Minimum blood pressure of about 145 mmHg.
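Both blood-pressure questions fit in one short Python sketch, using the lesson's assumed N(120, 15) model. The exact z = 4/3 gives about 9.1% for Question A, slightly lower than the table answer that used z ≈ 1.33.

```python
from statistics import NormalDist

systolic = NormalDist(mu=120, sigma=15)  # mmHg, the lesson's assumed model

# Question A: P(X >= 140); for a continuous model this equals P(X > 140)
high_bp = 1 - systolic.cdf(140)

# Question B: the 95th percentile marks the start of the top 5%
cutoff = systolic.inv_cdf(0.95)

print(f"Share with high blood pressure: {high_bp:.1%}")
print(f"Inclusion cutoff: {cutoff:.1f} mmHg")
```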

3. Educational Testing and Placement

Example 5: College Admission Standards

Scenario: A university wants to admit students in the top 15% of test scores. Scores are N(1200, 150).

Question: What minimum score is needed for admission?

Solution:

  • Top 15% means 85th percentile (since 100% - 15% = 85%)
  • From z-table: P(Z ≤ 1.04) ≈ 0.85
  • x = 1200 + (1.04)(150) = 1200 + 156 = 1356

Answer: Minimum score of about 1356 for admission.
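A small reusable helper for "top X%" cutoffs (the name `cutoff_for_top` is hypothetical). With the unrounded z for the 85th percentile (about 1.036), the cutoff is about 1355.5; the table-based 1356 comes from rounding z to 1.04.

```python
from statistics import NormalDist

def cutoff_for_top(mu, sigma, top_fraction):
    """Smallest value in the top `top_fraction` of a N(mu, sigma) population."""
    return NormalDist(mu, sigma).inv_cdf(1 - top_fraction)

print(f"{cutoff_for_top(1200, 150, 0.15):.1f}")
```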

Check Your Understanding

Try these questions to test what you've learned in this lesson.

Question 1: Can we use normal approximation for a binomial with n = 50 and p = 0.05?

Answer: No, normal approximation is not appropriate.

Explanation:

  • np = 50(0.05) = 2.5 < 10 (fails condition)
  • n(1-p) = 50(0.95) = 47.5 ≥ 10
  • Since np < 10, we should not use normal approximation.

Question 2: What does it mean if a normal probability plot shows points forming a straight line?

Answer: The data is approximately normally distributed.

Explanation: When data points fall close to a straight line on a normal probability plot, it indicates the data follows a normal distribution reasonably well.

Question 3: For binomial with n = 100, p = 0.4, what are μ and σ for the normal approximation?

Answer: μ = 40, σ ≈ 4.90

Solution:

  • μ = np = 100(0.4) = 40
  • σ = √[np(1-p)] = √[100(0.4)(0.6)] = √24 ≈ 4.90

Question 4: Using continuity correction, how would you write P(X ≥ 50) for a discrete distribution?

Answer: P(X > 49.5)

Explanation: For "greater than or equal to k", we use "greater than k - 0.5" in the continuous approximation. So P(X ≥ 50) becomes P(X > 49.5).

Question 5: Package weights are N(500, 10) grams. Below what weight are the lightest 2.5% of packages? (z = -1.96 gives P(Z ≤ -1.96) = 0.025)

Answer: About 480.4 grams (or approximately 480 grams)

Solution:

  • Need 2.5th percentile, so z ≈ -1.96
  • x = μ + z·σ = 500 + (-1.96)(10) = 500 - 19.6 = 480.4

Key Takeaways from Lesson 4
