Discrete Probability Distributions
Master probability distributions, expected value, and variance for discrete random variables
Lesson Objectives
By the end of this lesson, you will be able to:
- Define a discrete probability distribution and identify its properties
- Verify if a given distribution is valid
- Calculate expected value E(X) = μ using the formula Σ[x · P(X=x)]
- Calculate variance Var(X) = σ² using the formula Σ[(x - μ)² · P(X=x)]
- Calculate standard deviation σ = √Var(X)
- Interpret expected value and standard deviation in context
1. What is a Discrete Probability Distribution?
Definition: Discrete Probability Distribution
A discrete probability distribution lists all possible values of a discrete random variable along with their probabilities. It shows the complete probability model for the random variable.
A discrete probability distribution can be represented as a:
- Table showing x values and P(X = x)
- Graph (probability histogram or bar chart)
- Formula that generates probabilities
Properties of a Valid Discrete Probability Distribution
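For a distribution to be valid, it must satisfy TWO conditions:
- Each probability is between 0 and 1: 0 ≤ P(X = x) ≤ 1 for every value x
- The probabilities sum to 1: Σ P(X = x) = 1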
Example 1: Valid vs. Invalid Distributions
Distribution A:
| x | P(X = x) |
|---|---|
| 0 | 0.2 |
| 1 | 0.5 |
| 2 | 0.3 |
Is this valid?
Check 1: All probabilities are between 0 and 1 (yes)
Check 2: Sum = 0.2 + 0.5 + 0.3 = 1.0 (yes)
VALID: both conditions hold.
Distribution B:
| x | P(X = x) |
|---|---|
| 1 | 0.4 |
| 2 | 0.3 |
| 3 | 0.2 |
Is this valid?
Check 1: All probabilities are between 0 and 1 (yes)
Check 2: Sum = 0.4 + 0.3 + 0.2 = 0.9 (no)
INVALID: the probabilities do not sum to 1.
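The two validity checks can also be automated. The following is a minimal Python sketch, not part of the original lesson: the function name `is_valid_distribution` and the tolerance `tol` are illustrative assumptions. A tolerance is used because decimal probabilities are stored as floating-point numbers, so their sum may differ from 1 by a tiny rounding error.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Return True if probs satisfies both validity conditions.

    probs: dict mapping each value x to P(X = x).
    tol:   hypothetical tolerance for floating-point rounding in the sum.
    """
    # Condition 1: every probability lies between 0 and 1.
    in_range = all(0 <= p <= 1 for p in probs.values())
    # Condition 2: the probabilities sum to 1 (up to rounding error).
    sums_to_one = math.isclose(sum(probs.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

dist_a = {0: 0.2, 1: 0.5, 2: 0.3}  # Distribution A
dist_b = {1: 0.4, 2: 0.3, 3: 0.2}  # Distribution B

print(is_valid_distribution(dist_a))  # True
print(is_valid_distribution(dist_b))  # False
```

Storing the distribution as a dictionary keeps each value paired with its probability, which mirrors the two-column table format used in this lesson.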
2. US Household Size Distribution (2020 Data)
Let's examine a real-world discrete probability distribution using data from the 2020 US Census. Let X = number of people in a randomly selected US household.
US Household Size Distribution (2020)
| Household Size (x) | Probability P(X = x) |
|---|---|
| 1 person | 0.28 |
| 2 people | 0.35 |
| 3 people | 0.15 |
| 4 people | 0.12 |
| 5 people | 0.06 |
| 6 people | 0.03 |
| 7+ people | 0.01 |
Check: Σ P(X = x) = 0.28 + 0.35 + 0.15 + 0.12 + 0.06 + 0.03 + 0.01 = 1.00
This distribution tells us, for example:
- P(X = 1) = 0.28 → 28% of US households have 1 person
- P(X = 2) = 0.35 → 35% have 2 people (most common)
- P(X ≥ 5) = 0.06 + 0.03 + 0.01 = 0.10 → 10% have 5 or more people
If you randomly select a US household, the probability distribution tells you the likelihood of each possible household size. The most common size is 2 people (35%), while larger households are progressively less common.
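Event probabilities such as P(X ≥ 5) are found by adding the probabilities of the values that belong to the event. Here is a small Python sketch of that idea; the helper name `prob_at_least` is hypothetical, and the "7+ people" category is stored under the key 7 purely for illustration.

```python
# Household-size distribution from the table above.
# Assumption: the "7+ people" category is stored under the key 7.
household = {1: 0.28, 2: 0.35, 3: 0.15, 4: 0.12, 5: 0.06, 6: 0.03, 7: 0.01}

def prob_at_least(dist, k):
    """P(X >= k): add the probabilities of all values x with x >= k."""
    return sum(p for x, p in dist.items() if x >= k)

# The most common household size is the x with the largest probability.
most_common = max(household, key=household.get)

print(round(prob_at_least(household, 5), 2))  # 0.1
print(most_common)                            # 2
```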
3. Expected Value E(X)
Definition: Expected Value
The expected value (also called the mean or expectation) of a discrete random variable X is the long-run average value of X over many repetitions. It represents the "center" of the distribution.
Expected Value Formula
E(X) = μ = Σ [x · P(X = x)]
In words: Multiply each value by its probability, then sum all products
Example 2: Calculating E(X) for US Household Size
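Let's calculate the expected household size using the 2020 data. To keep the arithmetic simple, the "7+ people" category is treated as exactly 7 people, which may slightly understate the true average: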
| x | P(X = x) | x · P(X = x) |
|---|---|---|
| 1 | 0.28 | 1 × 0.28 = 0.28 |
| 2 | 0.35 | 2 × 0.35 = 0.70 |
| 3 | 0.15 | 3 × 0.15 = 0.45 |
| 4 | 0.12 | 4 × 0.12 = 0.48 |
| 5 | 0.06 | 5 × 0.06 = 0.30 |
| 6 | 0.03 | 6 × 0.03 = 0.18 |
| 7 | 0.01 | 7 × 0.01 = 0.07 |
| Total | 1.00 | E(X) = Σ [x · P(X = x)] = 2.46 |
Interpretation: The expected (average) US household size is 2.46 people. This doesn't mean any household has exactly 2.46 people (impossible!), but it's the long-run average if you surveyed many households.
- E(X) may not be a possible value of X (like 2.46 people)
- E(X) is the "balance point" or center of the distribution
- E(X) is affected by extreme values (like large households)
- We use μ (mu) to denote E(X), just as we do for a population mean
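The expected value calculation is a weighted sum, so it translates directly into code. The sketch below is one possible Python version, with the function name `expected_value` chosen only for illustration. It treats the "7+ people" category as exactly 7, matching the table in Example 2.

```python
def expected_value(dist):
    """E(X): multiply each value x by P(X = x), then add the products."""
    return sum(x * p for x, p in dist.items())

# Household-size distribution; "7+ people" is coded as 7, as in Example 2.
household = {1: 0.28, 2: 0.35, 3: 0.15, 4: 0.12, 5: 0.06, 6: 0.03, 7: 0.01}

mu = expected_value(household)
print(round(mu, 2))  # 2.46
```

The result is rounded before printing because floating-point arithmetic can leave a tiny error in the last decimal places.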
4. Variance and Standard Deviation
Definition: Variance and Standard Deviation
The variance Var(X) = σ² measures the spread or variability of a probability distribution around its mean. It is the probability-weighted average of the squared distances between each value and E(X).
The standard deviation σ = √Var(X) is the square root of variance, measured in the same units as X.
Variance Formula
Var(X) = σ² = Σ [(x - μ)² · P(X = x)]
In words: For each value, find (x - μ)², multiply by P(X = x), then sum
Standard Deviation Formula
σ = √Var(X) = √(Σ [(x - μ)² · P(X = x)])
Example 3: Calculating Variance and SD for US Household Size
We already found μ = E(X) = 2.46. Now let's calculate variance:
| x | P(X = x) | (x - μ) | (x - μ)² | (x - μ)² · P(X = x) |
|---|---|---|---|---|
| 1 | 0.28 | 1 - 2.46 = -1.46 | 2.1316 | 0.5968 |
| 2 | 0.35 | 2 - 2.46 = -0.46 | 0.2116 | 0.0741 |
| 3 | 0.15 | 3 - 2.46 = 0.54 | 0.2916 | 0.0437 |
| 4 | 0.12 | 4 - 2.46 = 1.54 | 2.3716 | 0.2846 |
| 5 | 0.06 | 5 - 2.46 = 2.54 | 6.4516 | 0.3871 |
| 6 | 0.03 | 6 - 2.46 = 3.54 | 12.5316 | 0.3759 |
| 7 | 0.01 | 7 - 2.46 = 4.54 | 20.6116 | 0.2061 |
| Total | 1.00 | | | Var(X) = σ² = Σ [(x - μ)² · P(X = x)] ≈ 1.97 |
Now calculate standard deviation:
σ = √Var(X) = √1.97 ≈ 1.40 people
Interpretation: A typical US household differs from the mean size of 2.46 people by about 1.40 people. Because σ is more than half the mean, household sizes are fairly spread out: many households are noticeably smaller or larger than average.
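The variance table can be reproduced in a few lines of code. This is a hedged Python sketch rather than a prescribed method; the function names `expected_value`, `variance`, and `std_dev` are illustrative, and the "7+ people" category is again coded as 7.

```python
import math

def expected_value(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def std_dev(dist):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(dist))

# Household-size distribution; "7+ people" is coded as 7, as in the table.
household = {1: 0.28, 2: 0.35, 3: 0.15, 4: 0.12, 5: 0.06, 6: 0.03, 7: 0.01}

print(round(variance(household), 4))  # 1.9684
print(round(std_dev(household), 2))   # 1.4
```

The unrounded variance is 1.9684. Adding the rounded entries in the last column of the table gives 1.9683, a small difference caused by rounding each row to four decimal places. Both round to 1.97.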
5. Step-by-Step Process
- Verify the distribution is valid (each probability is between 0 and 1, and the probabilities sum to 1)
- Calculate E(X) = μ: Multiply each x by P(X=x) and sum
- Calculate Var(X) = σ²:
  - For each x, compute (x - μ)²
  - Multiply by P(X = x)
  - Sum all products
- Calculate σ: Take the square root of Var(X)
- Interpret in context: Explain what μ and σ mean for the problem
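The computational steps above can be sketched as a single Python function. This is one possible arrangement, not the only one: the name `summarize`, the tolerance `tol`, and the choice to raise `ValueError` for an invalid distribution are all assumptions made for illustration. The final step, interpreting μ and σ in context, still has to be done by you.

```python
import math

def summarize(dist, tol=1e-9):
    """Validate dist, then return (mu, var, sigma).

    dist: dict mapping each value x to P(X = x).
    tol:  hypothetical tolerance for floating-point rounding in the sum.
    """
    # Verify the distribution is valid.
    if not all(0 <= p <= 1 for p in dist.values()):
        raise ValueError("each probability must be between 0 and 1")
    if not math.isclose(sum(dist.values()), 1.0, abs_tol=tol):
        raise ValueError("probabilities must sum to 1")
    # Expected value: multiply each x by P(X = x) and sum.
    mu = sum(x * p for x, p in dist.items())
    # Variance: squared deviation times probability, summed.
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    # Standard deviation: square root of the variance.
    sigma = math.sqrt(var)
    return mu, var, sigma

# Distribution A from Example 1.
mu, var, sigma = summarize({0: 0.2, 1: 0.5, 2: 0.3})
print(round(mu, 2), round(var, 2), round(sigma, 2))  # 1.1 0.49 0.7
```

Checking validity first means an invalid table, such as Distribution B from Example 1, is rejected before any misleading mean or variance is computed.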
Example 4: Complete Calculation
A student takes a 3-question quiz. Let X = number of correct answers. The distribution is:
| x | P(X = x) |
|---|---|
| 0 | 0.1 |
| 1 | 0.3 |
| 2 | 0.4 |
| 3 | 0.2 |
Step 1: Expected Value
E(X) = 0(0.1) + 1(0.3) + 2(0.4) + 3(0.2) = 0 + 0.3 + 0.8 + 0.6 = 1.7 questions
Step 2: Variance
Var(X) = (0-1.7)²(0.1) + (1-1.7)²(0.3) + (2-1.7)²(0.4) + (3-1.7)²(0.2)
= 2.89(0.1) + 0.49(0.3) + 0.09(0.4) + 1.69(0.2)
= 0.289 + 0.147 + 0.036 + 0.338 = 0.81
Step 3: Standard Deviation
σ = √0.81 = 0.9 questions
Interpretation: In the long run, a student averages 1.7 correct answers out of 3, and scores typically fall about 0.9 questions from that average.
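The "long-run average" reading of E(X) can be illustrated with a simulation. The Python sketch below draws many quiz scores from the Example 4 distribution and compares the sample mean and standard deviation with the calculated values. The function name `simulate`, the number of draws, and the seed are arbitrary choices for illustration; a different seed gives slightly different results.

```python
import math
import random
import statistics

# Quiz-score distribution from Example 4.
quiz = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

def simulate(dist, n, seed=0):
    """Draw n random values of X according to the probabilities in dist."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(values, weights=weights, k=n)

draws = simulate(quiz, n=100_000)

# Sample mean and standard deviation of the simulated scores.
sample_mean = statistics.fmean(draws)
sample_var = statistics.fmean((d - sample_mean) ** 2 for d in draws)
sample_sd = math.sqrt(sample_var)

# With this many draws, both values should land close to 1.7 and 0.9.
print(sample_mean, sample_sd)
```

No single simulated score equals 1.7, yet the average of many scores settles near it, which is what the expected value describes.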
Check Your Understanding
Question 1: What are the two requirements for a valid discrete probability distribution?
Answer:
1) Each probability must be between 0 and 1: 0 ≤ P(X = x) ≤ 1
2) All probabilities must sum to 1: Σ P(X = x) = 1
Question 2: Why is E(X) = 2.46 for US household size even though no household can have 2.46 people?
Answer: E(X) is the long-run average over many households, not a value any single household must have. It's the weighted average of all possible values, where weights are the probabilities. Think of it as the "balance point" of the distribution.
Question 3: What does a larger standard deviation tell you about a probability distribution?
Answer: A larger σ indicates more variability or spread in the distribution. Values are more spread out from the mean. A smaller σ means values cluster more tightly around E(X).
Question 4: If all probabilities in a distribution are equal, where will E(X) be located?
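Answer: With equal probabilities, E(X) is the ordinary (unweighted) average of the possible values. If the values are evenly spaced or otherwise symmetric, this is the midpoint. For example, if X can be 1, 2, or 3 with equal probabilities (1/3 each), then E(X) = 2, the middle value. If the values are not symmetric, such as 1, 2, and 9, then E(X) = (1 + 2 + 9)/3 = 4, which is neither the middle value nor the midpoint of the range.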
Question 5: Is this a valid probability distribution? x: 1, 2, 3 with P(X=x): 0.5, 0.3, 0.3
Answer: No, it is NOT valid. While all probabilities are between 0 and 1, they sum to 0.5 + 0.3 + 0.3 = 1.1, which exceeds 1. The sum must equal exactly 1.0 for a valid distribution.
Summary
- A discrete probability distribution must have probabilities between 0 and 1 that sum to 1
- Expected value: E(X) = μ = Σ [x · P(X = x)] is the long-run average
- Variance: Var(X) = σ² = Σ [(x - μ)² · P(X = x)] measures spread
- Standard deviation: σ = √Var(X) is in the same units as X
- The US household size example shows μ = 2.46 people with σ = 1.40 people
- E(X) doesn't have to be a possible value—it's the theoretical average