
Discrete Probability Distributions

Master probability distributions, expected value, and variance for discrete random variables

Lesson Objectives

By the end of this lesson, you will be able to:

Define a discrete probability distribution and identify its properties
Verify whether a given distribution is valid
Calculate expected value E(X) = μ using the formula Σ[x · P(X=x)]
Calculate variance Var(X) = σ² using the formula Σ[(x - μ)² · P(X=x)]
Calculate standard deviation σ = √Var(X)
Interpret expected value and standard deviation in context

1. What is a Discrete Probability Distribution?

Definition: Discrete Probability Distribution

A discrete probability distribution lists all possible values of a discrete random variable along with their probabilities. It shows the complete probability model for the random variable.

A discrete probability distribution can be represented as a:

Table showing x values and P(X = x)
Graph (probability histogram or bar chart)
Formula that generates probabilities

Properties of a Valid Discrete Probability Distribution

For a distribution to be valid, it must satisfy TWO conditions:

1. Each probability is between 0 and 1: 0 ≤ P(X = x) ≤ 1

2. All probabilities sum to 1: Σ P(X = x) = 1

Example 1: Valid vs. Invalid Distributions

Distribution A:

x	P(X = x)
0	0.2
1	0.5
2	0.3

Is this valid?
Check 1: All probabilities are between 0 and 1.
Check 2: Sum: 0.2 + 0.5 + 0.3 = 1.0
Both conditions hold, so the distribution is VALID!

Distribution B:

x	P(X = x)
1	0.4
2	0.3
3	0.2

Is this valid?
Check 1: All probabilities are between 0 and 1.
Check 2: Sum: 0.4 + 0.3 + 0.2 = 0.9
The second condition fails, so the distribution is INVALID! (Doesn't sum to 1)
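
The two validity conditions can also be checked mechanically. The following is a minimal Python sketch, not part of the original lesson: the function name is_valid_distribution and its tolerance parameter are illustrative choices. The tolerance is there because decimal probabilities are stored as floating-point numbers, so their sum can miss 1 by a tiny rounding error.

```python
import math

def is_valid_distribution(probabilities, tolerance=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    probs = list(probabilities)
    # Condition 1: each probability is between 0 and 1.
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    # Condition 2: the probabilities sum to 1 (up to rounding error).
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tolerance)
    return in_range and sums_to_one

# Distributions A and B from Example 1, stored as {x: P(X = x)}.
distribution_a = {0: 0.2, 1: 0.5, 2: 0.3}
distribution_b = {1: 0.4, 2: 0.3, 3: 0.2}

print(is_valid_distribution(distribution_a.values()))  # True
print(is_valid_distribution(distribution_b.values()))  # False
```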

2. US Household Size Distribution (2020 Data)

Let's examine a real-world discrete probability distribution using data from the 2020 US Census. Let X = number of people in a randomly selected US household.

US Household Size Distribution (2020)

Household Size (x)	Probability P(X = x)
1 person	0.28
2 people	0.35
3 people	0.15
4 people	0.12
5 people	0.06
6 people	0.03
7+ people	0.01

Check: Σ P(X = x) = 0.28 + 0.35 + 0.15 + 0.12 + 0.06 + 0.03 + 0.01 = 1.00

This distribution tells us, for example:

P(X = 1) = 0.28 → 28% of US households have 1 person
P(X = 2) = 0.35 → 35% have 2 people (most common)
P(X ≥ 5) = 0.06 + 0.03 + 0.01 = 0.10 → 10% have 5 or more people

Interpreting Probabilities:

If you randomly select a US household, the probability distribution tells you the likelihood of each possible household size. The most common size is 2 people (35%), while larger households are progressively less common.
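
Probabilities of events such as P(X ≥ 5) are found by adding the probabilities of the values that belong to the event. A small Python sketch of that idea follows. The helper name event_probability is hypothetical, and the "7+" category is stored under the key 7 purely for convenience.

```python
# Household size distribution from the table above, stored as {x: P(X = x)}.
# Assumption: the "7+ people" category is represented by the key 7.
household_size = {1: 0.28, 2: 0.35, 3: 0.15, 4: 0.12, 5: 0.06, 6: 0.03, 7: 0.01}

def event_probability(distribution, condition):
    """Add up P(X = x) over every value x that satisfies the condition."""
    return sum(p for x, p in distribution.items() if condition(x))

p_five_or_more = event_probability(household_size, lambda x: x >= 5)
p_at_most_two = event_probability(household_size, lambda x: x <= 2)

print(f"P(X >= 5) = {p_five_or_more:.2f}")  # 0.10
print(f"P(X <= 2) = {p_at_most_two:.2f}")   # 0.63
```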

3. Expected Value E(X)

Definition: Expected Value

The expected value (also called the mean or expectation) of a discrete random variable X is the long-run average value of X over many repetitions. It represents the "center" of the distribution.

Expected Value Formula

E(X) = μ = Σ [x · P(X = x)]

In words: Multiply each value by its probability, then sum all products

Example 2: Calculating E(X) for US Household Size

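Let's calculate the expected household size using the 2020 data. Because the table groups the largest households into a single "7+" category, we treat that category as x = 7, which slightly understates the true mean.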

x	P(X = x)	x · P(X = x)
1	0.28	1 × 0.28 = 0.28
2	0.35	2 × 0.35 = 0.70
3	0.15	3 × 0.15 = 0.45
4	0.12	4 × 0.12 = 0.48
5	0.06	5 × 0.06 = 0.30
6	0.03	6 × 0.03 = 0.18
7	0.01	7 × 0.01 = 0.07
E(X) = Σ [x · P(X = x)] =		2.46

Interpretation: The expected (average) US household size is 2.46 people. This doesn't mean any household has exactly 2.46 people (impossible!), but it's the long-run average if you surveyed many households.
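
The table calculation above amounts to one weighted sum. As a rough sketch (the function name expected_value is an illustrative choice, and "7+" is again treated as 7), it could be written in Python like this:

```python
# Household size distribution, stored as {x: P(X = x)}.
# Assumption: the "7+ people" category is treated as x = 7.
household_size = {1: 0.28, 2: 0.35, 3: 0.15, 4: 0.12, 5: 0.06, 6: 0.03, 7: 0.01}

def expected_value(distribution):
    """E(X): multiply each value by its probability, then sum the products."""
    return sum(x * p for x, p in distribution.items())

mu = expected_value(household_size)
print(f"E(X) = {mu:.2f} people")  # E(X) = 2.46 people
```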

Important Notes about E(X):

E(X) may not be a possible value of X (like 2.46 people)
E(X) is the "balance point" or center of the distribution
E(X) is affected by extreme values (like large households)
We use μ (mu) to denote E(X), just as we do for a population mean
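
One way to see the "long-run average" idea is to simulate it. The sketch below is an illustration only, not survey data: it draws random household sizes from the table's probabilities with Python's random.choices and averages them. The function name simulate_mean and the fixed seed are assumptions made for repeatability.

```python
import random

# Household size distribution, stored as {x: P(X = x)}; "7+" is treated as 7.
household_size = {1: 0.28, 2: 0.35, 3: 0.15, 4: 0.12, 5: 0.06, 6: 0.03, 7: 0.01}

def simulate_mean(distribution, n_households, seed=0):
    """Average of n_households random draws from the distribution."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = list(distribution.keys())
    weights = list(distribution.values())
    draws = rng.choices(values, weights=weights, k=n_households)
    return sum(draws) / n_households

# The sample mean should settle near E(X) = 2.46 as the sample grows.
for n in (10, 1_000, 100_000):
    print(n, round(simulate_mean(household_size, n), 3))
```

With only 10 simulated households the average can land well away from 2.46. With 100,000 it should be very close.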

4. Variance and Standard Deviation

Definition: Variance and Standard Deviation

The variance Var(X) = σ² measures the spread or variability of a probability distribution around its mean. It is the probability-weighted average of the squared distances between the values of X and E(X).

The standard deviation σ = √Var(X) is the square root of variance, measured in the same units as X.

Variance Formula

Var(X) = σ² = Σ [(x - μ)² · P(X = x)]

In words: For each value, find (x - μ)², multiply by P(X = x), then sum

Standard Deviation Formula

σ = √Var(X) = √σ²

Example 3: Calculating Variance and SD for US Household Size

We already found μ = E(X) = 2.46. Now let's calculate variance:

x	P(X = x)	(x - μ)	(x - μ)²	(x - μ)² · P(X = x)
1	0.28	1 - 2.46 = -1.46	2.1316	0.5968
2	0.35	2 - 2.46 = -0.46	0.2116	0.0741
3	0.15	3 - 2.46 = 0.54	0.2916	0.0437
4	0.12	4 - 2.46 = 1.54	2.3716	0.2846
5	0.06	5 - 2.46 = 2.54	6.4516	0.3871
6	0.03	6 - 2.46 = 3.54	12.5316	0.3759
7	0.01	7 - 2.46 = 4.54	20.6116	0.2061
Var(X) = σ² = Σ [(x - μ)² · P(X = x)] =				1.97

Now calculate standard deviation:

σ = √Var(X) = √1.97 ≈ 1.40 people

Interpretation: A typical US household size differs from the mean of 2.46 by about 1.40 people. Relative to a mean of about 2.5, that is a fairly wide spread: both one-person households and households of four or more occur often.
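
The variance table can be reproduced in a few lines. This is a minimal Python sketch under the same assumption as before ("7+" treated as 7); the function names variance and standard_deviation are illustrative.

```python
import math

# Household size distribution, stored as {x: P(X = x)}; "7+" is treated as 7.
household_size = {1: 0.28, 2: 0.35, 3: 0.15, 4: 0.12, 5: 0.06, 6: 0.03, 7: 0.01}

def variance(distribution):
    """Var(X): probability-weighted sum of squared deviations from the mean."""
    mu = sum(x * p for x, p in distribution.items())
    return sum((x - mu) ** 2 * p for x, p in distribution.items())

def standard_deviation(distribution):
    """Square root of the variance, in the same units as X."""
    return math.sqrt(variance(distribution))

print(f"Var(X) = {variance(household_size):.2f}")                 # 1.97
print(f"sigma  = {standard_deviation(household_size):.2f} people")  # 1.40 people
```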

5. Step-by-Step Process

How to Calculate E(X), Var(X), and σ:

1. Verify the distribution is valid (each probability is between 0 and 1, and the probabilities sum to 1)
2. Calculate E(X) = μ: Multiply each x by P(X=x) and sum
3. Calculate Var(X) = σ²:
- For each x, compute (x - μ)²
- Multiply by P(X = x)
- Sum all products
4. Calculate σ: Take the square root of Var(X)
5. Interpret in context: Explain what μ and σ mean for the problem

Example 4: Complete Calculation

A student takes a 3-question quiz. Let X = number of correct answers. The distribution is:

x	P(X = x)
0	0.1
1	0.3
2	0.4
3	0.2

Step 1: Expected Value

E(X) = 0(0.1) + 1(0.3) + 2(0.4) + 3(0.2) = 0 + 0.3 + 0.8 + 0.6 = 1.7 questions

Step 2: Variance

Var(X) = (0-1.7)²(0.1) + (1-1.7)²(0.3) + (2-1.7)²(0.4) + (3-1.7)²(0.2)

= 2.89(0.1) + 0.49(0.3) + 0.09(0.4) + 1.69(0.2)

= 0.289 + 0.147 + 0.036 + 0.338 = 0.81

Step 3: Standard Deviation

σ = √0.81 = 0.9 questions

Interpretation: On average, students get 1.7 questions correct, with typical variation of about 0.9 questions.
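
The computational steps of the process (validate, then find E(X), Var(X), and σ) can be bundled into one function. The sketch below applies it to the quiz distribution from this example. The function name summarize and the choice to raise ValueError for an invalid distribution are assumptions made for illustration. The final step, interpreting the numbers in context, still has to be done by a person.

```python
import math

def summarize(distribution, tolerance=1e-9):
    """Return (mu, variance, sigma) for a distribution given as {x: P(X = x)}."""
    probs = list(distribution.values())

    # Step 1: verify the distribution is valid.
    if not all(0.0 <= p <= 1.0 for p in probs):
        raise ValueError("every probability must be between 0 and 1")
    if not math.isclose(sum(probs), 1.0, abs_tol=tolerance):
        raise ValueError("probabilities must sum to 1")

    # Step 2: expected value.
    mu = sum(x * p for x, p in distribution.items())
    # Step 3: variance.
    var = sum((x - mu) ** 2 * p for x, p in distribution.items())
    # Step 4: standard deviation.
    sigma = math.sqrt(var)
    return mu, var, sigma

# Quiz distribution from Example 4.
quiz = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}
mu, var, sigma = summarize(quiz)
print(f"E(X) = {mu:.2f}, Var(X) = {var:.2f}, sigma = {sigma:.2f}")
# E(X) = 1.70, Var(X) = 0.81, sigma = 0.90
```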

Check Your Understanding

Question 1: What are the two requirements for a valid discrete probability distribution?

Answer: 1) Each probability must be between 0 and 1: 0 ≤ P(X = x) ≤ 1
2) All probabilities must sum to 1: Σ P(X = x) = 1

Question 2: Why is E(X) = 2.46 for US household size even though no household can have 2.46 people?

Answer: E(X) is the long-run average over many households, not a value any single household must have. It's the weighted average of all possible values, where weights are the probabilities. Think of it as the "balance point" of the distribution.

Question 3: What does a larger standard deviation tell you about a probability distribution?

Answer: A larger σ indicates more variability or spread in the distribution. Values are more spread out from the mean. A smaller σ means values cluster more tightly around E(X).

Question 4: If all probabilities in a distribution are equal, where will E(X) be located?

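Answer: E(X) will equal the ordinary arithmetic mean of the possible values, because every value gets the same weight. For example, if X can be 1, 2, or 3 with equal probabilities (1/3 each), then E(X) = (1 + 2 + 3)/3 = 2. When the values are evenly spaced or symmetric, as they are here, this is also the midpoint of the range. If they are not (say 1, 2, and 10), E(X) = 13/3 ≈ 4.33, which is neither the middle value nor the midpoint.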

Question 5: Is this a valid probability distribution? x: 1, 2, 3 with P(X=x): 0.5, 0.3, 0.3

Answer: No, it is NOT valid. While all probabilities are between 0 and 1, they sum to 0.5 + 0.3 + 0.3 = 1.1, which exceeds 1. The sum must equal exactly 1.0 for a valid distribution.

Summary

Key Takeaways:

A discrete probability distribution must have probabilities between 0 and 1 that sum to 1
Expected value: E(X) = μ = Σ [x · P(X = x)] is the long-run average
Variance: Var(X) = σ² = Σ [(x - μ)² · P(X = x)] measures spread
Standard deviation: σ = √Var(X) is in the same units as X
The US household size example shows μ = 2.46 people with σ = 1.40 people
E(X) doesn't have to be a possible value—it's the theoretical average
