Learn Without Walls

Introduction to Sampling Distributions

Learn about parameters, statistics, and why samples vary

Lesson Objectives

By the end of this lesson, you will be able to:

1. Parameters vs. Statistics

Key Definitions

Parameter: A numerical summary of a population. Parameters are fixed but usually unknown values.

Statistic: A numerical summary of a sample. Statistics are calculated from data and vary from sample to sample.

When we study populations, we want to know parameters (like the true population mean or proportion). But populations are often too large or expensive to measure completely, so we take samples and calculate statistics to estimate the parameters.

Common Parameters and Statistics

Measure              Population Parameter   Sample Statistic
Mean                 μ (mu)                 x̄ (x-bar)
Standard Deviation   σ (sigma)              s
Proportion           p                      p̂ (p-hat)
Variance             σ²                     s²
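Remember: Greek letters (μ, σ) usually represent population parameters. Roman letters (x̄, s, p̂) represent sample statistics. The one exception here is the population proportion p, which is a Roman letter; its sample counterpart is marked with a hat (p̂).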

Example 1: Identifying Parameters and Statistics

Scenario: A university wants to know the average GPA of all 25,000 students.

  • Population: All 25,000 students at the university
  • Parameter of interest: μ = true average GPA of all 25,000 students (unknown)
  • Sample: 200 randomly selected students
  • Sample statistic: x̄ = 3.24 (average GPA of the 200 students)

We use the sample statistic x̄ = 3.24 to estimate the population parameter μ. The sample mean is our best guess of the true population mean.
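To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The lesson gives no raw GPA data, so the population below is simulated: the names `population` and `sample`, the normal shape, the center of 3.2, and the seed are all illustrative assumptions, and the output will not reproduce x̄ = 3.24.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population: 25,000 simulated GPAs, clipped to the 0.0-4.0 scale.
population = [min(4.0, max(0.0, rng.gauss(3.2, 0.4))) for _ in range(25_000)]

# Parameters: computed from the whole population (normally unknown in practice).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics: computed from one random sample of 200 students.
sample = rng.sample(population, 200)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)            # sample standard deviation (n - 1 divisor)

print(f"mu    = {mu:.3f}, sigma = {sigma:.3f}")
print(f"x_bar = {x_bar:.3f}, s     = {s:.3f}")
```

In a real study only the last two values would be available; the sketch computes μ and σ only because the whole simulated population is in memory.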

Example 2: Proportion Parameters

Scenario: A political poll wants to know what percentage of voters support a candidate.

  • Population: All registered voters (millions)
  • Parameter: p = true proportion of all voters who support the candidate (unknown)
  • Sample: 1,000 randomly selected voters
  • Sample statistic: p̂ = 0.52 (52% of the 1,000 voters support the candidate)

We use p̂ = 0.52 to estimate the unknown population proportion p.

2. What is a Sampling Distribution?

Definition: Sampling Distribution

A sampling distribution is the probability distribution of a sample statistic (like x̄ or p̂) based on all possible samples of size n from the population.

Imagine taking every possible sample of size n from a population, calculating the statistic (like the mean) for each sample, and then plotting the distribution of those statistics. That's a sampling distribution!

Important Distinction:
  • Population distribution: Distribution of individual values in the population
  • Sample distribution: Distribution of individual values in one sample
  • Sampling distribution: Distribution of a statistic across all possible samples

Example 3: Visualizing a Sampling Distribution

Setup: A population of 5 students with test scores: {60, 70, 80, 90, 100}

Population mean: μ = (60+70+80+90+100)/5 = 80

Question: What is the sampling distribution of x̄ for samples of size n = 2?

Solution: List all 10 possible samples of size 2 (drawn without replacement, ignoring order):

  • {60, 70} → x̄ = 65
  • {60, 80} → x̄ = 70
  • {60, 90} → x̄ = 75
  • {60, 100} → x̄ = 80
  • {70, 80} → x̄ = 75
  • {70, 90} → x̄ = 80
  • {70, 100} → x̄ = 85
  • {80, 90} → x̄ = 85
  • {80, 100} → x̄ = 90
  • {90, 100} → x̄ = 95

Sampling distribution of x̄:

x̄     Frequency
65    1
70    1
75    2
80    2
85    2
90    1
95    1

Notice: The sampling distribution is centered at μ = 80 (the ten sample means average exactly 80), but individual sample means range from 65 to 95!
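The enumeration in Example 3 can be reproduced with a short Python sketch. It assumes, as the list of samples implies, that samples are drawn without replacement and that order does not matter; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean

scores = [60, 70, 80, 90, 100]

# Every unordered sample of size 2, drawn without replacement.
samples = list(combinations(scores, 2))
sample_means = [fmean(pair) for pair in samples]

# Tally how often each sample mean occurs.
frequency = Counter(sample_means)
for value in sorted(frequency):
    print(f"x_bar = {value:g}: {frequency[value]} of {len(samples)} samples")

print("mean of sample means:", fmean(sample_means))
print("population mean:", fmean(scores))
```

The last two printed values agree, which is the "center" property discussed later in the lesson.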

3. Sampling Variability

Sampling variability (or sampling error) is the natural variation in statistics from sample to sample. Different random samples from the same population will produce different sample statistics.

Why does sampling variability occur?
  • Random chance in selecting samples
  • Not every sample perfectly represents the population
  • Statistics from smaller samples tend to vary more than statistics from larger samples

Example 4: Sampling Variability in Action

Population: A university where 40% of students major in STEM (p = 0.40)

Three random samples of n = 100 students each:

  • Sample 1: 43 STEM students → p̂₁ = 0.43
  • Sample 2: 38 STEM students → p̂₂ = 0.38
  • Sample 3: 41 STEM students → p̂₃ = 0.41

All three samples came from the same population (p = 0.40), but each produced a different sample proportion. This is sampling variability!

Key insight: None of the samples gave exactly p = 0.40, but they're all reasonably close. This is normal and expected.
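Sampling variability is easy to see in a simulation. The sketch below is an illustration, not part of the original example: it treats the student body as large enough that each draw is an independent yes/no outcome with probability 0.40, and the seed, the helper name `sample_proportion`, and the comparison sizes 25 and 400 are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed, for repeatability only
P_TRUE = 0.40            # population proportion from Example 4

def sample_proportion(n):
    """Draw n students and return the proportion who are STEM majors."""
    stem_count = sum(rng.random() < P_TRUE for _ in range(n))
    return stem_count / n

# Three samples of n = 100, mirroring the example.
three_samples = [sample_proportion(100) for _ in range(3)]
print("three sample proportions:", three_samples)

# Repeat many times at two sample sizes to compare the spread of p-hat.
spread = {}
for n in (25, 400):
    p_hats = [sample_proportion(n) for _ in range(2000)]
    spread[n] = statistics.stdev(p_hats)
    print(f"n = {n}: standard deviation of p-hat = {spread[n]:.3f}")
```

The spread of p̂ should come out noticeably larger for n = 25 than for n = 400, matching the point that statistics from smaller samples vary more.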

4. Distribution of Sample Means

The sampling distribution of the sample mean (x̄) is particularly important. It describes how sample means behave across all possible samples of size n.

Key Properties (we'll prove these in Lesson 2):
  1. Center: The mean of all sample means equals the population mean (μₓ̄ = μ)
  2. Spread: Sample means vary less than individual values
  3. Shape: Under certain conditions, the distribution is approximately normal

This is powerful! Even if the population isn't normally distributed, the distribution of sample means is often approximately normal when the sample size is large enough. This is the idea behind the Central Limit Theorem (next lesson).
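The center and spread properties can be checked numerically. The following sketch is a simulation under stated assumptions rather than a proof: the right-skewed exponential population, its mean of 10, the sample size n = 40, and the 2,000 repetitions are all hypothetical choices made for illustration.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed, for repeatability only

# Hypothetical right-skewed population (exponential with mean about 10).
population = [rng.expovariate(1 / 10) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Draw many samples of size n and record each sample mean.
n = 40
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(2_000)
]
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"population mean            = {mu:.2f}")
print(f"mean of sample means       = {mean_of_means:.2f}")
print(f"population std deviation   = {sigma:.2f}")
print(f"std deviation of the means = {sd_of_means:.2f}")
```

The first two values should be close (center), and the last value should be much smaller than σ (spread). Plotting a histogram of `sample_means` would be one way to inspect the shape.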

Check Your Understanding

Question 1: A factory wants to know the average weight of all widgets it produces. They measure 50 widgets and find x̄ = 12.3 ounces. Is 12.3 ounces a parameter or a statistic?

Answer: Statistic

Explanation: The value 12.3 ounces is calculated from a sample of 50 widgets, not the entire population. Therefore, x̄ = 12.3 is a sample statistic that estimates the unknown population parameter μ.

Question 2: True or False: If we take two different random samples from the same population, we should expect to get exactly the same sample mean.

Answer: False

Explanation: Due to sampling variability, different random samples will usually produce different sample means. This is normal! The sample means will likely be close to each other and to the population mean, but they are rarely exactly equal.

Question 3: A population consists of the values {2, 4, 6}. If we take all possible samples of size 2 (with replacement), what is the mean of the sampling distribution of x̄?

Answer: μₓ̄ = 4

Explanation: The population mean is μ = (2+4+6)/3 = 4. A key property of sampling distributions is that the mean of all sample means equals the population mean. So μₓ̄ = μ = 4.

You could verify by listing all 9 ordered samples: (2,2), (2,4), (2,6), (4,2), (4,4), (4,6), (6,2), (6,4), (6,6), calculating their means (2, 3, 4, 3, 4, 5, 4, 5, 6), and finding the average: 36/9 = 4.
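The same verification can be sketched in Python. Because sampling is with replacement, the sketch treats samples as ordered pairs, so (2, 4) and (4, 2) count separately; the variable names are illustrative.

```python
from itertools import product
from statistics import fmean

population = [2, 4, 6]

# Ordered samples of size 2 drawn with replacement: 3 * 3 = 9 of them.
samples = list(product(population, repeat=2))
sample_means = [fmean(pair) for pair in samples]

print(samples)
print(sample_means)
print("mean of sample means:", fmean(sample_means))
print("population mean:", fmean(population))
```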

Question 4: Which notation represents the standard deviation of a sample: σ or s?

Answer: s

Explanation: The Greek letter σ (sigma) represents the population standard deviation (a parameter). The Roman letter s represents the sample standard deviation (a statistic).

Question 5: What is the primary reason for studying sampling distributions?

Answer: To understand how sample statistics behave and to make inferences about population parameters.

Explanation: Sampling distributions tell us how much sample statistics vary from sample to sample. This allows us to determine how reliable our estimates are and make probability statements about populations based on sample data. They're the foundation of statistical inference!

Lesson Summary
