Introduction to Sampling Distributions
Learn about parameters, statistics, and why samples vary
Lesson Objectives
By the end of this lesson, you will be able to:
- Distinguish between parameters and statistics
- Understand proper notation for population and sample measures
- Explain what a sampling distribution is
- Describe sampling variability and why it occurs
1. Parameters vs. Statistics
Key Definitions
Parameter: A numerical summary of a population. Parameters are fixed but usually unknown values.
Statistic: A numerical summary of a sample. Statistics are calculated from data and vary from sample to sample.
When we study populations, we want to know parameters (like the true population mean or proportion). But populations are often too large or expensive to measure completely, so we take samples and calculate statistics to estimate the parameters.
Common Parameters and Statistics
| Measure | Population Parameter | Sample Statistic |
|---|---|---|
| Mean | μ (mu) | x̄ (x-bar) |
| Standard Deviation | σ (sigma) | s |
| Proportion | p | p̂ (p-hat) |
| Variance | σ² | s² |
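The notation in the table corresponds to slightly different formulas, which a short Python sketch can make concrete. The data values below are made up for illustration. The standard library's `statistics` module offers `pstdev`/`pvariance` (population versions, dividing by N) and `stdev`/`variance` (sample versions, dividing by n - 1).

```python
import statistics

# Hypothetical data: treat these six values as an entire (tiny) population.
population = [4, 8, 6, 5, 3, 10]

# Parameters: computed from every member of the population.
mu = statistics.mean(population)              # population mean, μ
sigma = statistics.pstdev(population)         # population standard deviation, σ (divides by N)
sigma_sq = statistics.pvariance(population)   # population variance, σ²

# Pretend we only observed three of the values.
sample = [4, 6, 10]

# Statistics: computed from the sample only.
x_bar = statistics.mean(sample)               # sample mean, x̄
s = statistics.stdev(sample)                  # sample standard deviation, s (divides by n - 1)
s_sq = statistics.variance(sample)            # sample variance, s²

print(f"mu = {mu}, sigma^2 = {sigma_sq:.3f}, sigma = {sigma:.3f}")
print(f"x_bar = {x_bar:.3f}, s^2 = {s_sq:.3f}, s = {s:.3f}")
```

A different sample of three values would give different values of x̄ and s, while μ and σ would stay the same.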
Example 1: Identifying Parameters and Statistics
Scenario: A university wants to know the average GPA of all 25,000 students.
- Population: All 25,000 students at the university
- Parameter of interest: μ = true average GPA of all 25,000 students (unknown)
- Sample: 200 randomly selected students
- Sample statistic: x̄ = 3.24 (average GPA of the 200 students)
We use the sample statistic x̄ = 3.24 to estimate the population parameter μ. The sample mean is our best guess of the true population mean.
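To see an estimate like this at work, the sketch below builds a synthetic stand-in for the 25,000 students and compares the parameter with one sample's statistic. The real GPAs are unknown, so the generated values, the seed, and the center and spread used to generate them are all assumptions for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population: 25,000 GPAs clipped to the 0.0-4.0 scale.
# The center (3.2) and spread (0.4) are made-up values, not data from the lesson.
population = [min(4.0, max(0.0, rng.gauss(3.2, 0.4))) for _ in range(25_000)]

# Parameter: in real life this is unknown; here we can compute it only
# because we generated the whole population ourselves.
mu = statistics.mean(population)

# Statistic: mean GPA of 200 students chosen at random without replacement.
sample = rng.sample(population, 200)
x_bar = statistics.mean(sample)

print(f"parameter mu     = {mu:.3f}")
print(f"statistic x_bar  = {x_bar:.3f}")
print(f"estimation error = {x_bar - mu:+.3f}")
```

Changing the seed draws a different sample and gives a slightly different x̄, while μ for a given population does not move.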
Example 2: Proportion Parameters
Scenario: A political poll wants to know what percentage of voters support a candidate.
- Population: All registered voters (millions)
- Parameter: p = true proportion of all voters who support the candidate (unknown)
- Sample: 1,000 randomly selected voters
- Sample statistic: p̂ = 0.52 (52% of the 1,000 voters support the candidate)
We use p̂ = 0.52 to estimate the unknown population proportion p.
2. What is a Sampling Distribution?
Definition: Sampling Distribution
A sampling distribution is the probability distribution of a sample statistic (like x̄ or p̂) based on all possible samples of size n from the population.
Imagine taking every possible sample of size n from a population, calculating the statistic (like the mean) for each sample, and then plotting the distribution of those statistics. That's a sampling distribution!
Keep these three distributions distinct:
- Population distribution: Distribution of individual values in the population
- Sample distribution: Distribution of individual values in one sample
- Sampling distribution: Distribution of a statistic across all possible samples
Example 3: Visualizing a Sampling Distribution
Setup: A population of 5 students with test scores: {60, 70, 80, 90, 100}
Population mean: μ = (60+70+80+90+100)/5 = 80
Question: What is the sampling distribution of x̄ for samples of size n = 2?
Solution: List all 10 possible samples of size 2 (drawn without replacement, ignoring order):
- {60, 70} → x̄ = 65
- {60, 80} → x̄ = 70
- {60, 90} → x̄ = 75
- {60, 100} → x̄ = 80
- {70, 80} → x̄ = 75
- {70, 90} → x̄ = 80
- {70, 100} → x̄ = 85
- {80, 90} → x̄ = 85
- {80, 100} → x̄ = 90
- {90, 100} → x̄ = 95
Sampling distribution of x̄:
| x̄ | Frequency |
|---|---|
| 65 | 1 |
| 70 | 1 |
| 75 | 2 |
| 80 | 2 |
| 85 | 2 |
| 90 | 1 |
| 95 | 1 |
Notice: The sampling distribution is centered exactly at μ = 80 (the ten sample means add up to 800, and 800/10 = 80), but individual sample means range from 65 to 95!
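The enumeration in Example 3 can also be done programmatically. This is a minimal sketch using `itertools.combinations`, which generates every unordered sample without replacement; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

scores = [60, 70, 80, 90, 100]   # the population from Example 3
mu = mean(scores)                # population mean

# Every unordered sample of size 2, drawn without replacement.
samples = list(combinations(scores, 2))
sample_means = [mean(pair) for pair in samples]

# Frequency table of x̄ across all possible samples.
freq = Counter(sample_means)
for x_bar in sorted(freq):
    print(x_bar, freq[x_bar])

print("number of samples:", len(samples))
print("mean of sample means:", mean(sample_means), "| population mean:", mu)
```

The printed frequencies should match the table above, and the mean of the sample means should equal the population mean.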
3. Sampling Variability
Sampling variability (sometimes called sampling error) is the natural variation in statistics from sample to sample. Different random samples from the same population will usually produce different sample statistics.
Why it occurs:
- Random chance determines which individuals end up in each sample
- Not every sample perfectly represents the population
- Statistics from smaller samples tend to vary more than statistics from larger samples
Example 4: Sampling Variability in Action
Population: A university where 40% of students major in STEM (p = 0.40)
Three random samples of n = 100 students each:
- Sample 1: 43 STEM students → p̂₁ = 0.43
- Sample 2: 38 STEM students → p̂₂ = 0.38
- Sample 3: 41 STEM students → p̂₃ = 0.41
All three samples came from the same population (p = 0.40), but each produced a different sample proportion. This is sampling variability!
Key insight: None of the sample proportions equals p = 0.40 exactly, but they're all reasonably close. This is normal and expected.
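Example 4 can be imitated with a small simulation. This is a minimal sketch, assuming students are drawn independently (a reasonable approximation when the university is much larger than the sample). The seed, the 5,000 repetitions, and the comparison sample size n = 25 are illustrative choices, not part of the example.

```python
import random
import statistics

rng = random.Random(0)  # arbitrary seed for repeatability
p = 0.40                # true proportion of STEM majors (from Example 4)

def sample_proportion(n):
    """Draw n students independently and return the sample proportion p̂."""
    stem_count = sum(rng.random() < p for _ in range(n))
    return stem_count / n

# Three samples of n = 100, as in the example: the p̂ values typically differ.
print([sample_proportion(100) for _ in range(3)])

# Repeat many times for two sample sizes to compare how much p̂ varies.
spread = {}
for n in (25, 100):
    p_hats = [sample_proportion(n) for _ in range(5_000)]
    spread[n] = statistics.pstdev(p_hats)
    print(f"n = {n}: mean of p-hats = {statistics.mean(p_hats):.3f}, "
          f"standard deviation = {spread[n]:.3f}")
```

The sample proportions should cluster around 0.40 for both sample sizes, with a noticeably larger standard deviation for n = 25 than for n = 100.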
4. Distribution of Sample Means
The sampling distribution of the sample mean (x̄) is particularly important. It describes how sample means behave across all possible samples of size n.
- Center: The mean of all sample means equals the population mean (μₓ̄ = μ)
- Spread: Sample means vary less than individual values, and the spread shrinks as the sample size n increases
- Shape: When the population is normal or the sample size is large enough, the distribution is approximately normal
This is powerful! Even if the population isn't normally distributed, the distribution of sample means is approximately normal when the sample size is large enough. This result is the Central Limit Theorem (next lesson).
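The center, spread, and shape properties can be checked informally by simulation. The sketch below assumes a made-up right-skewed population (exponential values with a mean near 10), a sample size of n = 40, and 5,000 repeated samples; none of these numbers come from the lesson.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed for repeatability

# Hypothetical right-skewed population: 50,000 exponential values with mean near 10.
population = [rng.expovariate(1 / 10) for _ in range(50_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 40               # sample size (assumed for illustration)
repetitions = 5_000  # number of random samples to draw

# Approximate the sampling distribution of x̄ by drawing many random samples.
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"population:   mean = {mu:.2f}, standard deviation = {sigma:.2f}")
print(f"sample means: mean = {mean_of_means:.2f}, standard deviation = {sd_of_means:.2f}")

# Rough shape check: in a normal distribution about 68% of values
# fall within one standard deviation of the mean.
within_one_sd = sum(abs(m - mean_of_means) <= sd_of_means for m in sample_means)
print(f"share of sample means within 1 SD: {within_one_sd / repetitions:.2%}")
```

Because only 5,000 of the possible samples are drawn, this approximates the sampling distribution rather than enumerating it, so the mean of the sample means should land close to μ but not exactly on it.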
Check Your Understanding
Question 1: A factory wants to know the average weight of all widgets it produces. They measure 50 widgets and find x̄ = 12.3 ounces. Is 12.3 ounces a parameter or a statistic?
Answer: Statistic
Explanation: The value 12.3 ounces is calculated from a sample of 50 widgets, not the entire population. Therefore, x̄ = 12.3 is a sample statistic that estimates the unknown population parameter μ.
Question 2: True or False: If we take two different random samples from the same population, we should expect to get exactly the same sample mean.
Answer: False
Explanation: Due to sampling variability, different random samples will usually produce different sample means. This is normal! The sample means will likely be close to each other and to the population mean, but rarely exactly equal.
Question 3: A population consists of the values {2, 4, 6}. If we take all possible samples of size 2 (with replacement), what is the mean of the sampling distribution of x̄?
Answer: μₓ̄ = 4
Explanation: The population mean is μ = (2+4+6)/3 = 4. A key property of sampling distributions is that the mean of all sample means equals the population mean. So μₓ̄ = μ = 4.
You could verify by listing all 9 ordered samples: (2,2), (2,4), (2,6), (4,2), (4,4), (4,6), (6,2), (6,4), (6,6), calculating their means (2, 3, 4, 3, 4, 5, 4, 5, 6), and finding the average: 36/9 = 4.
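The same verification can be sketched in a few lines of Python. Here `itertools.product` with `repeat=2` generates the ordered samples drawn with replacement; the variable names are illustrative.

```python
from itertools import product
from statistics import mean

population = [2, 4, 6]
mu = mean(population)

# Sampling with replacement, order kept: 3 x 3 = 9 ordered samples of size 2.
samples = list(product(population, repeat=2))
sample_means = [mean(pair) for pair in samples]

print(samples)
print(sample_means)
print("mean of sample means:", mean(sample_means), "| population mean:", mu)
```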
Question 4: Which notation represents the standard deviation of a sample: σ or s?
Answer: s
Explanation: The Greek letter σ (sigma) represents the population standard deviation (a parameter). The Roman letter s represents the sample standard deviation (a statistic).
Question 5: What is the primary reason for studying sampling distributions?
Answer: To understand how sample statistics behave and to make inferences about population parameters.
Explanation: Sampling distributions tell us how much sample statistics vary from sample to sample. This allows us to determine how reliable our estimates are and make probability statements about populations based on sample data. They're the foundation of statistical inference!
Lesson Summary
- Parameters describe populations (μ, σ, p); statistics describe samples (x̄, s, p̂)
- A sampling distribution shows how a statistic varies across all possible samples
- Sampling variability is the natural variation in statistics from sample to sample
- The sampling distribution of x̄ centers at μ and has important predictable properties
- Understanding sampling distributions is essential for making inferences about populations