Probability Distributions
The shape of randomness
📌 Before You Start
What you need: Module 1 completed, or familiarity with basic descriptive statistics, including the mean and standard deviation.
What you’ll learn: How to work with probability distributions in R using the d/p/q/r function family. You’ll calculate probabilities, find quantiles, and generate random samples from the normal, uniform, binomial, and Poisson distributions.
📖 The Concept: Probability Distributions
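A probability distribution describes all possible outcomes of a random variable and how likely each outcome is. It is the mathematical model of a random process: a discrete distribution assigns a probability to each possible value, while a continuous distribution describes probabilities as areas under a density curve.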
The Normal Distribution is the most important distribution in statistics. It is:
- Symmetric and bell-shaped
- Completely defined by its mean (μ) and standard deviation (σ)
- Governed by the 68-95-99.7 rule: roughly 68% of values fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD
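You can check the 68-95-99.7 rule yourself by computing the area within k standard deviations of the mean with pnorm(). This sketch uses the standard normal (mean 0, SD 1), which is pnorm()'s default, plus one example with the SAT-style parameters used later in this module.

```r
# Area within k SDs of the mean: P(-k < Z < k) = pnorm(k) - pnorm(-k)
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)
# [1] 0.6827 0.9545 0.9973

# The proportions do not depend on mu or sigma:
# one SD either side of a mean of 500 with SD 100
pnorm(600, mean = 500, sd = 100) - pnorm(400, mean = 500, sd = 100)
# [1] 0.6826895
```

The familiar 68/95/99.7 figures are these exact values rounded.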
Other key distributions:
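- Uniform: every value in a range is equally likely (rolling a fair die is the discrete case; runif() draws from the continuous case)
- Binomial: number of successes in n independent yes/no trials, each with the same success probability p
- Poisson: number of events in a fixed interval of time or space, when events occur independently at a constant average rate λ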
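Here is a minimal sketch computing one probability from each of these distributions. The coin-flip and help-desk scenarios are hypothetical examples chosen only for illustration.

```r
# Uniform, discrete case: 10 rolls of a fair die (each face has probability 1/6)
die_rolls <- sample(1:6, size = 10, replace = TRUE)

# Uniform, continuous case on [0, 10]: P(X <= 3) is the fraction of the range below 3
p_unif <- punif(3, min = 0, max = 10)          # 0.3

# Binomial: P(exactly 7 heads in 10 fair coin flips)
p_binom <- dbinom(7, size = 10, prob = 0.5)    # choose(10, 7) / 2^10 = 120/1024

# Poisson: a (hypothetical) help desk averages 3 calls per hour (lambda = 3);
# P(exactly 5 calls in a given hour)
p_pois <- dpois(5, lambda = 3)                 # exp(-3) * 3^5 / 5!, about 0.1008

c(uniform = p_unif, binomial = p_binom, poisson = p_pois)
```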
In R, every built-in distribution has four functions, named with the pattern d/p/q/r + distribution name (for example, dnorm, pnorm, qnorm, rnorm).
🔢 R’s Distribution Function System
| Prefix | What it does | Example |
|---|---|---|
| d*(x) | Density at x (continuous) or probability P(X = x) (discrete) | dnorm(0) → height of the curve at 0 (about 0.399) |
| p*(q) | Cumulative probability P(X ≤ q) | pnorm(1.96) → 0.975 |
| q*(p) | Quantile: the value q with P(X ≤ q) = p (reverses the p function) | qnorm(0.95) → 1.645 |
| r*(n) | Generate n random values | rnorm(100) → 100 random normals |
Replace * with: norm, unif, binom, pois, etc.
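The table's examples can be run directly, as sketched below. With no mean or sd supplied, the normal functions default to the standard normal (mean 0, SD 1). The second half shows the same four prefixes applied to the binomial, with illustrative parameters (10 trials, p = 0.5).

```r
dnorm(0)          # height of the standard normal curve at 0: 1/sqrt(2*pi), about 0.399
pnorm(1.96)       # P(Z <= 1.96), about 0.975
qnorm(0.95)       # value with 95% of the area below it, about 1.645
set.seed(42)      # make the random draws reproducible
z <- rnorm(100)   # 100 random values from Normal(0, 1)
length(z)         # 100

# Same four prefixes, binomial distribution with 10 trials and p = 0.5
dbinom(3, size = 10, prob = 0.5)     # P(X = 3)
pbinom(3, size = 10, prob = 0.5)     # P(X <= 3)
qbinom(0.5, size = 10, prob = 0.5)   # median: smallest x with P(X <= x) >= 0.5, here 5
rbinom(5, size = 10, prob = 0.5)     # 5 random counts of successes
```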
💻 In R: Worked Example
The normal distribution in R: pnorm gives the cumulative probability P(X ≤ x), and qnorm reverses it, returning the value x that has a given cumulative probability.
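A small sketch of that round trip follows. The height parameters (mean 170 cm, SD 10 cm) are hypothetical and only serve to show that qnorm undoes pnorm for any mean and SD.

```r
# Hypothetical example: adult heights ~ Normal(mean = 170 cm, sd = 10 cm)
p <- pnorm(185, mean = 170, sd = 10)      # P(X <= 185); 185 is 1.5 SD above the mean
p                                         # about 0.933

x_back <- qnorm(p, mean = 170, sd = 10)   # qnorm reverses pnorm
x_back                                    # 185

qnorm(0.5, mean = 170, sd = 10)           # median of a normal equals its mean: 170
```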
🖐️ Your Turn
Exercise 1 — SAT Score Probabilities
SAT scores are approximately normally distributed with mean = 500 and SD = 100. Use pnorm() and qnorm() to answer three probability questions.
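The three calculation patterns this exercise relies on are sketched below. The cutoffs (650, the 400 to 600 band, and the 90th percentile) are hypothetical, chosen only to illustrate each pattern; they are not necessarily the exercise's own questions.

```r
mu <- 500     # SAT mean
sigma <- 100  # SAT standard deviation

# P(X <= 650): lower tail, straight from pnorm
below_650 <- pnorm(650, mean = mu, sd = sigma)

# P(X > 650): upper tail, two equivalent ways
above_650     <- 1 - pnorm(650, mean = mu, sd = sigma)
above_650_alt <- pnorm(650, mean = mu, sd = sigma, lower.tail = FALSE)

# P(400 < X < 600): difference of two cumulative probabilities
between <- pnorm(600, mean = mu, sd = sigma) - pnorm(400, mean = mu, sd = sigma)

# 90th percentile: the score with 90% of scores below it
p90 <- qnorm(0.90, mean = mu, sd = sigma)

round(c(below_650, above_650, between, p90), 3)
```

For a continuous distribution, P(X > x) and P(X ≥ x) are the same, so the upper-tail formula covers both.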
pnorm gives P(X ≤ x). For P(X > x), use 1 - pnorm(x, ...) or pnorm(x, ..., lower.tail = FALSE). For P(a < X < b), use pnorm(b, ...) - pnorm(a, ...), where ... stands for the mean and sd arguments.
Exercise 2 — Visualize the Normal Distribution
Generate 500 random normal values. Create a density histogram and overlay the theoretical normal curve. The histogram should roughly match the curve.
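One way to build this plot in base R is sketched below. It reuses the SAT parameters from Exercise 1 as an assumption (any mean and SD work), and the seed and bin count are arbitrary choices.

```r
set.seed(1)                                   # reproducible random draws
mu <- 500                                     # assumed parameters, as in Exercise 1
sigma <- 100
scores <- rnorm(500, mean = mu, sd = sigma)   # 500 random normal values

# freq = FALSE draws the histogram on the density scale (total bar area = 1),
# which puts it on the same scale as dnorm(); breaks = 30 is only a suggestion
h <- hist(scores, breaks = 30, freq = FALSE,
          main = "500 draws from Normal(500, 100)", xlab = "Score")

# Overlay the theoretical curve; inside curve(), x is the plotting grid
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)
```

Without freq = FALSE the bars would show counts, and the density curve would sit far below them.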
Exercise 3 — Normal vs. Uniform: Shape Comparison
Generate 1000 values from a normal distribution and 1000 from a uniform distribution covering roughly the same range. Plot both histograms and compare: the normal values concentrate near the center, while the uniform histogram is roughly flat.
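Here is a sketch of the comparison, assuming a standard normal and a uniform on [-3, 3] (which spans about 3 SD either side of the normal's mean). Both histograms share the same bins so their shapes are directly comparable.

```r
set.seed(2)                                      # reproducible random draws
normal_vals  <- rnorm(1000, mean = 0, sd = 1)
uniform_vals <- runif(1000, min = -3, max = 3)   # spans roughly mean +/- 3 SD

# Shared bin edges covering both samples
brks <- pretty(range(normal_vals, uniform_vals), n = 30)

old_par <- par(mfrow = c(1, 2))                  # two plots side by side
hist(normal_vals,  breaks = brks, main = "Normal(0, 1)",   xlab = "Value")
hist(uniform_vals, breaks = brks, main = "Uniform(-3, 3)", xlab = "Value")
par(old_par)                                     # restore the previous layout

# Share of values within 1 unit of the center:
# about 0.68 for the normal vs 2/6 (about 0.33) for the uniform
mean(abs(normal_vals) < 1)
mean(abs(uniform_vals) < 1)
```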
🧠 Brain Break
You now have a probability calculator in R. pnorm and qnorm replace z-tables entirely.
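Quick check: If X ~ Normal(100, 15) (IQ scores), what is pnorm(130, 100, 15)? Try to estimate it mentally first: 130 is 2 SD above the mean, and the 68-95-99.7 rule puts 95% within 2 SD, leaving half of the remaining 5% above, so the answer is roughly 97.5% (the exact value is about 97.7%). Then run it in R to verify.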
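The quick check can be verified by comparing the exact answer, the same answer via the z-score, and the rule-of-thumb estimate:

```r
exact <- pnorm(130, mean = 100, sd = 15)   # exact probability
exact                                      # about 0.977

z <- (130 - 100) / 15                      # z-score: 130 is 2 SD above the mean
pnorm(z)                                   # same probability on the standard normal scale

rule_estimate <- 1 - (1 - 0.95) / 2        # 95% within 2 SD, half of the other 5% above
rule_estimate                              # 0.975
```

Standardizing with z = (x - μ) / σ is exactly what a z-table lookup does; pnorm() simply skips that step when given the mean and sd.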
✅ Key Takeaway
The normal distribution is the most important distribution in statistics. pnorm() calculates cumulative probabilities (replacing z-tables), and qnorm() gives the value at any percentile. The same d/p/q/r system works for every built-in distribution in R.
🏆 Module 2 Complete!
You can now calculate probabilities and generate random samples from any of R's built-in distributions. Next, we’ll use these skills to demonstrate one of the most important results in all of statistics.
Continue to Module 3: Sampling & The Central Limit Theorem →