1. Random Variables

Random Variable: A variable whose value is determined by the outcome of a random event. We use X, Y, Z to denote random variables.

Types of Random Variables

Discrete Random Variables

Definition: Takes on a countable set of distinct values

Examples: Number of heads in coin flips, number of students in a class, number of defective items
Can list all possible values: 0, 1, 2, 3, ...
Gaps between values (no values between 2 and 3)

Continuous Random Variables

Definition: Takes on any value in an interval (uncountably many possibilities)

Examples: Height, weight, temperature, time
Can't list all possible values
Infinitely many values between any two numbers

KEY: Module 4 focuses on DISCRETE random variables!

2. Probability Distributions

Probability Distribution: Lists all possible values of a random variable and their corresponding probabilities.

Requirements for Valid Probability Distribution

Each probability P(X = x) is between 0 and 1 (inclusive)
The sum of all probabilities equals exactly 1.0: Σ P(X = x) = 1

Example: US Household Size Distribution

Scenario: The probability distribution for US household size:

Household Size (X)	Probability P(X)
1 person	0.28
2 people	0.34
3 people	0.15
4 people	0.13
5+ people	0.10
Total	1.00

Check: Every probability is between 0 and 1, and 0.28 + 0.34 + 0.15 + 0.13 + 0.10 = 1.00, so this is a valid probability distribution.
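The two requirements can also be checked mechanically. The following is a minimal Python sketch; the function name is_valid_distribution, the tolerance, and the choice to store "5+" under the key 5 are illustrative assumptions, not part of the guide.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    probs = list(probs)  # allow any iterable, including dict views
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    # Compare with a tolerance because decimal fractions are inexact in floating point.
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# Household size table from above; the key 5 stands in for the "5+" row.
household = {1: 0.28, 2: 0.34, 3: 0.15, 4: 0.13, 5: 0.10}

print(is_valid_distribution(household.values()))
print(is_valid_distribution([0.5, 0.6]))  # sums to 1.1, so not valid
```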

Visualization of Probability Distributions

A probability histogram shows:

X-axis: Values of the random variable
Y-axis: Probabilities (heights of bars)
Bar height = Probability of that value

For household size, the bar at X=2 has height 0.34 (34% of households have 2 people).
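A rough text version of this histogram can be sketched in Python. The function name text_histogram and the scale of one "#" per percentage point are illustrative choices, not something the guide prescribes.

```python
def text_histogram(dist, scale=100):
    """Return one text line per value; bar length is round(probability * scale)."""
    lines = []
    for value, prob in dist.items():
        bar = "#" * round(prob * scale)
        lines.append(f"{value:>2} | {bar} {prob:.2f}")
    return lines

# Household size distribution, with the labels used in the table.
household = {"1": 0.28, "2": 0.34, "3": 0.15, "4": 0.13, "5+": 0.10}

for line in text_histogram(household):
    print(line)
```

The longest bar belongs to X = 2, matching the tallest bar described above.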

3. Expected Value (Mean)

Expected Value, E(X): The long-run average value when a random experiment is repeated many times. It's the "theoretical mean" of a probability distribution.

Formula for Expected Value

E(X) = Σ [x · P(X = x)]

Interpretation: Multiply each value by its probability, then sum all products.

Fully Worked Example: US Household Size

Calculate E(X) using the household size distribution:

Step 1: Multiply each value by its probability (the "5+" category is treated as exactly 5, which is a simplification)

Size (x)	P(X = x)	x · P(X = x)
1	0.28	1 × 0.28 = 0.28
2	0.34	2 × 0.34 = 0.68
3	0.15	3 × 0.15 = 0.45
4	0.13	4 × 0.13 = 0.52
5+	0.10	5 × 0.10 = 0.50

Step 2: Sum all the products

E(X) = 0.28 + 0.68 + 0.45 + 0.52 + 0.50 = 2.43

Interpretation: The average US household has about 2.43 people. This doesn't mean any household has exactly 2.43 people; it's the average across all households.
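The same two steps can be written as a short Python sketch. The name expected_value and the dictionary layout are illustrative assumptions, and the key 5 stands in for the "5+" row.

```python
def expected_value(dist):
    """E(X): sum of x * P(X = x) over every value x in the distribution."""
    return sum(x * p for x, p in dist.items())

# Household size distribution; "5+" is counted as exactly 5.
household = {1: 0.28, 2: 0.34, 3: 0.15, 4: 0.13, 5: 0.10}

print(round(expected_value(household), 2))  # 2.43
```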

4. Variance and Standard Deviation

Variance, Var(X): The probability-weighted average of the squared distance between each value and the expected value. Shows how spread out the distribution is.

Formulas for Variance and Standard Deviation

Var(X) = Σ [(x − E(X))² · P(X = x)]

Alternative form: Var(X) = E(X²) − [E(X)]²

SD(X) = √[Var(X)]

Standard Deviation: The square root of variance (easier to interpret!)

KEY DIFFERENCE: Variance is in squared units; Standard Deviation is in the original units.
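As a check on these formulas, here is a minimal Python sketch that applies both forms of the variance to the household size distribution from Section 2. The function names are illustrative, and "5+" is counted as exactly 5, so the results are approximations under that assumption.

```python
import math

def expected_value(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) from the definition: sum of (x - E(X))^2 * P(X = x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """Var(X) from the alternative form: E(X^2) - [E(X)]^2."""
    mean_of_squares = sum(x ** 2 * p for x, p in dist.items())
    return mean_of_squares - expected_value(dist) ** 2

def std_dev(dist):
    """SD(X) = square root of Var(X)."""
    return math.sqrt(variance(dist))

household = {1: 0.28, 2: 0.34, 3: 0.15, 4: 0.13, 5: 0.10}

print(round(variance(household), 4))           # definition
print(round(variance_shortcut(household), 4))  # alternative form, same value
print(round(std_dev(household), 2))
```

Under these assumptions both forms give a variance of about 1.67 (in squared people) and a standard deviation of about 1.29 people.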

Interpreting Variance and Standard Deviation

What they tell us:

Low variance/SD: Values are close to the expected value (clustered)
High variance/SD: Values are spread far from the expected value (scattered)

Example: For household size, E(X) = 2.43 and SD(X) ≈ 1.29 (Var(X) ≈ 1.67, counting "5+" as 5):
• Household sizes typically vary from the average by about 1.29 people
• The distribution is somewhat spread out (not tightly clustered)

5. The Binomial Distribution

Binomial Distribution: Describes the number of successes in a fixed number of independent trials, where each trial has two outcomes (success or failure).

Four Requirements for a Binomial Experiment

Fixed number of trials (n): The number of trials is predetermined
Two outcomes per trial: Each trial results in either "success" or "failure"
Constant probability (p): The probability of success is the SAME for each trial
Independent trials: The outcome of one trial doesn't affect another

Binomial Parameters

n: Number of trials
p: Probability of success on each trial (between 0 and 1)
X: Number of successes (can be 0, 1, 2, ..., n)

Expected Value and Variance of Binomial

E(X) = n × p

Expected number of successes in n trials

Var(X) = n × p × (1 − p)

Also written as: Var(X) = n × p × q, where q = (1 − p)

Binomial Distribution Examples

Example 1: Coin Flipping

Flip a fair coin 10 times. Let X = number of heads.

n = 10 (10 trials)
p = 0.5 (probability of heads on each flip)
E(X) = 10 × 0.5 = 5 (expect 5 heads on average)
Var(X) = 10 × 0.5 × 0.5 = 2.5
SD(X) = √2.5 ≈ 1.58

Interpretation: In 10 flips, expect about 5 heads, with a standard deviation of about 1.58 heads.

Example 2: Quality Control

A manufacturer produces items with 2% defect rate. Inspect 50 items. Let X = number of defects.

n = 50 (50 items)
p = 0.02 (2% defect rate)
E(X) = 50 × 0.02 = 1 (expect 1 defective item)
Var(X) = 50 × 0.02 × 0.98 = 0.98
SD(X) = √0.98 ≈ 0.99

Interpretation: In 50 items, expect about 1 defect, with a standard deviation of about 0.99 defects.
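Both examples can be reproduced with a few lines of Python. This is a minimal sketch; the helper name binomial_summary is an illustrative assumption.

```python
import math

def binomial_summary(n, p):
    """Return (E(X), Var(X), SD(X)) for a binomial with n trials and success probability p."""
    mean = n * p
    var = n * p * (1 - p)
    return mean, var, math.sqrt(var)

# Example 1: 10 flips of a fair coin.
mean, var, sd = binomial_summary(10, 0.5)
print(f"coin: E(X) = {mean:.2f}, Var(X) = {var:.2f}, SD(X) = {sd:.2f}")

# Example 2: 50 items inspected with a 2% defect rate.
mean, var, sd = binomial_summary(50, 0.02)
print(f"defects: E(X) = {mean:.2f}, Var(X) = {var:.2f}, SD(X) = {sd:.2f}")
```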

Shape of Binomial Distribution

The shape depends on both n and p:

p = 0.5: Symmetric distribution (bell-shaped)
p < 0.5: Right-skewed (tail extends to right)
p > 0.5: Left-skewed (tail extends to left)
Large n (with p fixed): Skew shrinks and the distribution becomes more normal-shaped
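These shapes can be explored numerically. The sketch below relies on the standard binomial probability formula P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), which this guide does not state, so treat it as an outside assumption; the function name binomial_pmf is also illustrative.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] using C(n, k) * p**k * (1 - p)**(n - k)."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

for p in (0.2, 0.5, 0.8):
    pmf = binomial_pmf(10, p)
    # Index of the largest probability, i.e. the most likely number of successes.
    mode = max(range(len(pmf)), key=pmf.__getitem__)
    print(f"p = {p}: most likely count = {mode}")
```

With n = 10, the peak sits left of center for p = 0.2 (long tail to the right), at the center for p = 0.5, and right of center for p = 0.8 (long tail to the left).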

Normal Approximation to Binomial

When is the binomial approximately normal?

The binomial distribution is approximately normal when BOTH conditions are met:

np ≥ 10 AND n(1 − p) ≥ 10

When these conditions are satisfied, we can use normal distribution calculations (z-scores) to approximate binomial probabilities.

Example: With n = 50, p = 0.3:
• np = 50 × 0.3 = 15 ≥ 10
• n(1-p) = 50 × 0.7 = 35 ≥ 10
→ Normal approximation is appropriate!
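The two-part condition is easy to encode. Here is a minimal Python sketch, with normal_approx_ok as an illustrative name and the threshold of 10 taken from the rule above.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True when both n*p and n*(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(50, 0.3))   # np = 15 and n(1 - p) = 35
print(normal_approx_ok(50, 0.02))  # np = 1, too few expected successes
print(normal_approx_ok(10, 0.5))   # np = 5, too few trials
```

Note that the quality control example (n = 50, p = 0.02) and the coin example (n = 10, p = 0.5) both fail the check, so the normal approximation would not be appropriate for either one.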

Quick Reference: All Formulas

Expected Value (General Discrete Distribution)

E(X) = Σ [x · P(X = x)]

Variance and Standard Deviation

Var(X) = Σ [(x − E(X))² · P(X = x)]

SD(X) = √[Var(X)]

Binomial Distribution

E(X) = n × p

Var(X) = n × p × (1 − p)

SD(X) = √[n × p × (1 − p)]

Requirements: Fixed n, two outcomes, constant p, independent trials

Normal Approximation Condition

Use normal approximation when: np ≥ 10 AND n(1 − p) ≥ 10

Key Concepts to Remember

1. A discrete random variable takes countable distinct values
2. A probability distribution must have every probability between 0 and 1, and the probabilities must sum to 1.0
3. Expected value is the long-run average (mean)
4. Variance/SD measure spread; SD is easier to interpret
5. Binomial requires fixed n, two outcomes, constant p, and independent trials
6. E(X) and Var(X) have simple formulas for binomial: np and np(1-p)
7. Check np ≥ 10 and n(1-p) ≥ 10 before using normal approximation

Module 4: Discrete Probability Distributions

Free Statistics Learning Platform • Safaa Dabagh • sdabagh.github.io

Module 4 Study Guide
