The Central Limit Theorem
One of the most powerful theorems in all of statistics
Lesson Objectives
By the end of this lesson, you will be able to:
- State the Central Limit Theorem
- Identify when the CLT applies (conditions)
- Calculate the mean and standard error of the sampling distribution
- Use the CLT to find probabilities about sample means
1. Statement of the Central Limit Theorem
The Central Limit Theorem (CLT)
For a population with mean μ and finite standard deviation σ, the sampling distribution of the sample mean x̄, computed from independent random samples of size n, will be approximately normal with:
- Mean: μₓ̄ = μ
- Standard deviation (Standard Error): σₓ̄ = σ/√n
This approximation improves as the sample size n increases. A common rule of thumb treats it as adequate once n ≥ 30.
The CLT says that regardless of the population's shape (skewed, bimodal, uniform, etc.), the distribution of sample means will be approximately normal if the sample size is large enough!
This allows us to use normal distribution tools (z-scores, probabilities) even when the population isn't normal.
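To see the statement in action, here is a minimal simulation sketch. It assumes NumPy is available and uses a made-up skewed population (exponential with mean 10, so σ is also 10); the population, the seed, and the variable names are illustrative choices, not part of the lesson.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 10 (so sigma is also 10).
mu, sigma = 10.0, 10.0
n = 50                # size of each sample
num_samples = 20_000  # number of repeated samples

# Each row is one sample of size n; take the mean of every row.
samples = rng.exponential(scale=mu, size=(num_samples, n))
sample_means = samples.mean(axis=1)

print("mean of sample means:", sample_means.mean())       # close to mu
print("SD of sample means:  ", sample_means.std(ddof=1))  # close to sigma / sqrt(n)
print("predicted SE:        ", sigma / np.sqrt(n))
```

Even though the individual values are strongly right-skewed, the simulated sample means should center near μ with a spread near σ/√n, as the theorem predicts.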
2. Conditions for the Central Limit Theorem
The CLT can be applied when all of the following hold:
- Random sampling: Samples are randomly selected from the population
- Independence: Individual observations are independent
  - For sampling without replacement: population should be at least 10 times the sample size (the 10n rule, also called the 10% condition)
- Sample size: One of these must be true:
  - n ≥ 30 (works for most populations), OR
  - The population is already normally distributed (then any n works)
Rule of thumb: If the population is strongly skewed or has extreme outliers, you may need n > 30 (sometimes n ≥ 50 or more) for the CLT to work well.
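The effect of sample size on shape can be checked numerically. The sketch below assumes NumPy and a hypothetical exponential population with mean 1; the helper names `skewness` and `skew_of_sample_means` are made up for this illustration. It measures how skewed the distribution of x̄ still is for a few values of n (a skewness of 0 means symmetric).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))

def skew_of_sample_means(n, num_samples=20_000):
    # Hypothetical strongly skewed population: exponential with mean 1.
    samples = rng.exponential(scale=1.0, size=(num_samples, n))
    return skewness(samples.mean(axis=1))

for n in (5, 30, 100):
    print(n, round(skew_of_sample_means(n), 2))
```

For this particular population the skewness of x̄ shrinks like 2/√n, so it is still noticeable at n = 30 and smaller but not gone at n = 100. That is why heavily skewed data can call for more than 30 observations.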
3. Shape, Center, and Spread of Sampling Distribution
Center: μₓ̄ = μ
The mean of the sampling distribution equals the population mean. Sample means center on the true population mean, so x̄ is an unbiased estimator of μ.
Spread: σₓ̄ = σ/√n (Standard Error)
Standard Error Formula: σₓ̄ = σ/√n
where σ = population standard deviation, n = sample size
The standard error (SE) measures how much sample means vary from sample to sample. Notice: As n increases, SE decreases. Larger samples give more precise estimates!
- Doubling the sample size doesn't halve the SE; it only divides it by √2 ≈ 1.41
- To cut SE in half, you need to quadruple the sample size (because of √n)
- Example: If n = 100 gives SE = 2, then n = 400 gives SE = 1
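The square-root relationship is easy to tabulate. This is a small sketch using only the standard library; the function name `standard_error` and the value σ = 20 (picked so that n = 100 gives SE = 2) are assumptions for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 20  # hypothetical population SD chosen so that n = 100 gives SE = 2
for n in (100, 200, 400):
    print(n, round(standard_error(sigma, n), 3))
# 100 2.0
# 200 1.414
# 400 1.0
```

Doubling n from 100 to 200 only brings the SE down to about 1.414, while quadrupling to 400 halves it.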
Shape: Approximately Normal
When n is large enough (typically n ≥ 30), the sampling distribution of x̄ is approximately normal, regardless of the population's shape.
Example 1: Finding the Sampling Distribution
A population of exam scores has μ = 75 and σ = 12. We take random samples of size n = 36. Describe the sampling distribution of x̄.
Solution:
- Check conditions:
  - Assume random sampling
  - n = 36 ≥ 30
  - CLT applies!
- Shape: Approximately normal (by CLT)
- Center: μₓ̄ = μ = 75
- Spread: σₓ̄ = σ/√n = 12/√36 = 12/6 = 2
Answer: The sampling distribution of x̄ is approximately normal with mean 75 and standard error 2. We write: x̄ ~ N(75, 2), where the second value is the standard deviation (the standard error), not the variance.
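The same description can be built in code. This sketch assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 75, 12, 36
se = sigma / math.sqrt(n)       # 12 / 6 = 2.0

# NormalDist takes (mean, standard deviation), matching the N(75, 2) notation above.
xbar_dist = NormalDist(mu, se)
print(xbar_dist.mean, xbar_dist.stdev)  # 75.0 2.0
```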
4. Working with Sampling Distributions
Once we know the sampling distribution is approximately normal, we can use z-scores and the normal distribution to find probabilities!
z-score for Sample Mean: z = (x̄ - μ) / (σ/√n)
Example 2: Finding Probability for a Sample Mean
Weights of apples have μ = 150 grams and σ = 20 grams. If we select a random sample of 64 apples, what is the probability that the sample mean weight is less than 145 grams?
Solution:
Step 1: Check CLT conditions
- Random sample
- n = 64 ≥ 30
- CLT applies: x̄ ~ N(μ, σ/√n)
Step 2: Find the sampling distribution
- μₓ̄ = μ = 150
- σₓ̄ = σ/√n = 20/√64 = 20/8 = 2.5
- x̄ ~ N(150, 2.5)
Step 3: Calculate z-score
z = (x̄ - μ) / (σ/√n) = (145 - 150) / 2.5 = -5 / 2.5 = -2.0
Step 4: Find probability
P(x̄ < 145) = P(z < -2.0) = 0.0228 (from z-table)
Answer: There's about a 2.28% chance the sample mean will be less than 145 grams. This is unlikely!
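If you would rather not read a z-table, the four steps can be reproduced with the standard library. This is a sketch assuming `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 150, 20, 64
se = sigma / math.sqrt(n)          # 20 / 8 = 2.5
z = (145 - mu) / se                # -2.0

p_from_z = NormalDist().cdf(z)             # standard normal, P(Z < -2.0)
p_direct = NormalDist(mu, se).cdf(145)     # same probability without standardizing
print(round(p_from_z, 4), round(p_direct, 4))  # 0.0228 0.0228
```

Both routes give the same answer: standardizing to z first, or working directly with the sampling distribution of x̄.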
Example 3: Finding a Range of Sample Means
SAT scores have μ = 1050 and σ = 200. For random samples of 100 students, what range contains the middle 95% of sample means?
Solution:
Step 1: Find sampling distribution
- μₓ̄ = 1050
- σₓ̄ = 200/√100 = 200/10 = 20
Step 2: Find z-scores for middle 95%
Middle 95% means 2.5% in each tail → z = ±1.96
Step 3: Convert z-scores to x̄ values
- Lower bound: x̄ = μ + z·σₓ̄ = 1050 + (-1.96)(20) = 1050 - 39.2 = 1010.8
- Upper bound: x̄ = μ + z·σₓ̄ = 1050 + (1.96)(20) = 1050 + 39.2 = 1089.2
Answer: The middle 95% of sample means fall between 1010.8 and 1089.2.
Interpretation: If we repeatedly take random samples of 100 students, about 95% of the sample means will fall between 1010.8 and 1089.2.
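The critical value ±1.96 and the two bounds can also be computed rather than looked up. This sketch assumes the standard-library `statistics.NormalDist`, whose `inv_cdf` method returns the value below which a given proportion of the distribution falls; the name `z_star` is an illustrative choice.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1050, 200, 100
se = sigma / math.sqrt(n)              # 200 / 10 = 20.0

z_star = NormalDist().inv_cdf(0.975)   # about 1.96 (leaves 2.5% in the upper tail)
lower = mu - z_star * se
upper = mu + z_star * se
print(round(z_star, 2), round(lower, 1), round(upper, 1))  # 1.96 1010.8 1089.2
```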
Example 4: Comparing Individual Values to Sample Means
Heights of adult men are approximately normally distributed with μ = 70 inches and σ = 3 inches. Compare:
- (a) Probability that one randomly selected man is taller than 73 inches
- (b) Probability that the mean height of 25 randomly selected men exceeds 73 inches
Solution:
Part (a): Individual value
- z = (x - μ) / σ = (73 - 70) / 3 = 1.0
- P(x > 73) = P(z > 1.0) = 1 - 0.8413 = 0.1587 ≈ 15.87%
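Part (b): Sample mean
- n = 25 is below 30, so this relies on heights being approximately normally distributed (then any n works)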
- σₓ̄ = σ/√n = 3/√25 = 3/5 = 0.6
- z = (x̄ - μ) / σₓ̄ = (73 - 70) / 0.6 = 3 / 0.6 = 5.0
- P(x̄ > 73) = P(z > 5.0) ≈ 0.0000003 (essentially 0)
Answer:
- (a) About 16% chance for one man to exceed 73 inches
- (b) Essentially 0% chance for the mean of 25 men to exceed 73 inches
Key insight: Sample means vary much less than individual values! The SE (0.6) is much smaller than σ (3).
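The contrast between the two parts comes down to which standard deviation is used. Here is a sketch with the standard-library `statistics.NormalDist`, assuming (as the z-score calculations do) that heights are approximately normal; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma = 70, 3

# (a) One man: use the population SD directly.
p_individual = 1 - NormalDist(mu, sigma).cdf(73)

# (b) Mean of n = 25 men: use the standard error sigma / sqrt(n) = 0.6.
n = 25
se = sigma / math.sqrt(n)
p_mean = 1 - NormalDist(mu, se).cdf(73)

print(round(p_individual, 4))  # 0.1587
print(f"{p_mean:.1e}")         # about 2.9e-07
```

The only change between (a) and (b) is swapping σ for σ/√n, yet the probability drops by several orders of magnitude.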
Example 5: Required Sample Size
A population has μ = 100 and σ = 24. What sample size is needed so that the standard error is no more than 3?
Solution:
We want: σₓ̄ ≤ 3
σ/√n ≤ 3
24/√n ≤ 3
24 ≤ 3√n
8 ≤ √n
64 ≤ n
Answer: We need at least n = 64 observations.
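The algebra above amounts to n ≥ (σ / SE)², rounded up to a whole number. A minimal sketch using only the standard library follows; the function name `required_sample_size` is hypothetical.

```python
import math

def required_sample_size(sigma, max_se):
    """Smallest whole n with sigma / sqrt(n) <= max_se."""
    return math.ceil((sigma / max_se) ** 2)

n = required_sample_size(24, 3)
print(n, 24 / math.sqrt(n))  # 64 3.0
```

Rounding up matters when the ratio is not a whole number: with a target SE of 5, for example, (24/5)² = 23.04, so 24 observations would be needed.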
Check Your Understanding
Question 1: A population has mean 50 and standard deviation 8. For samples of size 64, what are the mean and standard error of the sampling distribution of x̄?
Answer: μₓ̄ = 50, σₓ̄ = 1
Explanation:
- μₓ̄ = μ = 50
- σₓ̄ = σ/√n = 8/√64 = 8/8 = 1
Question 2: True or False: The Central Limit Theorem says that if we take a large enough sample, the population distribution will be approximately normal.
Answer: False
Explanation: The CLT says the sampling distribution of x̄ will be approximately normal, not the population distribution. The population can have any shape; the CLT tells us about the distribution of sample means, not individual values.
Question 3: If you want to reduce the standard error by half, by what factor must you increase the sample size?
Answer: Multiply sample size by 4
Explanation: Since σₓ̄ = σ/√n, to cut SE in half:
- Original: σₓ̄ = σ/√n
- Want: σₓ̄/2 = σ/√(new n)
- σ/(2√n) = σ/√(new n)
- √(new n) = 2√n
- new n = 4n
Question 4: A population has μ = 80 and σ = 15. For samples of size 100, find P(x̄ > 82).
Answer: P(x̄ > 82) ≈ 0.0918 or 9.18%
Solution:
- σₓ̄ = 15/√100 = 15/10 = 1.5
- z = (82 - 80) / 1.5 = 2 / 1.5 ≈ 1.33
- P(z > 1.33) = 1 - 0.9082 = 0.0918
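A z-table forces z to be rounded to two decimals, which shifts the answer slightly. This sketch, assuming the standard-library `statistics.NormalDist`, compares the table-style result with the unrounded one; the names `p_exact` and `p_table` are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 80, 15, 100
se = sigma / math.sqrt(n)                 # 1.5
z = (82 - mu) / se                        # 1.333...

p_exact = 1 - NormalDist().cdf(z)             # unrounded z
p_table = 1 - NormalDist().cdf(round(z, 2))   # z rounded to 1.33, as a z-table requires
print(round(p_exact, 4), round(p_table, 4))   # 0.0912 0.0918
```

Either value is an acceptable answer; the small gap comes only from rounding z before the lookup.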
Question 5: When can we NOT use the Central Limit Theorem?
Answer: When:
- The sample is not random
- Observations are not independent
- Sample size is too small (n < 30) AND population is not normal
- Population has extreme outliers/skewness and n is only slightly above 30
Lesson Summary
- The Central Limit Theorem states that x̄ is approximately N(μ, σ/√n) when n is large (rule of thumb: n ≥ 30)
- CLT works regardless of population shape (if n is large enough)
- Standard Error: σₓ̄ = σ/√n measures variability of sample means
- Sample means vary less than individual values (SE = σ/√n is smaller than σ whenever n > 1)
- Use z = (x̄ - μ) / (σ/√n) to find probabilities about sample means
- Larger samples give smaller SE and more precise estimates