Probability Distributions

The shape of randomness

← Module 1: Descriptive Statistics Module 2 of 8 Module 3: CLT →


Before You Start

What you need: Module 1 completed, or familiarity with basic descriptive statistics, including the mean and standard deviation.

What you’ll learn: How to work with probability distributions in R using the d/p/q/r function family. You’ll calculate probabilities, find quantiles, and generate random samples from the normal, uniform, binomial, and Poisson distributions.

The Concept: Probability Distributions

A probability distribution describes all possible outcomes of a random variable and how likely each outcome is. It’s the mathematical model for randomness.

The Normal Distribution is the most important in statistics. It is:

Symmetric and bell-shaped
Completely defined by its mean (μ) and standard deviation (σ)
Governed by the 68-95-99.7 rule: about 68% of values fall within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD
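
You can check the 68-95-99.7 rule yourself with pnorm(), R's cumulative normal function (covered in the next section). This is a minimal sketch on the standard normal; the variable names k and within are illustrative only.

```r
# Probability that a normal value falls within k SDs of its mean.
# On the standard normal (mean 0, sd 1) this is pnorm(k) - pnorm(-k).
k <- 1:3
within <- pnorm(k) - pnorm(-k)

# Print each band as a percentage
for (i in seq_along(k)) {
  cat("Within", k[i], "SD:", round(within[i] * 100, 2), "%\n")
}
```

The rule's figures are rounded: the computed values are about 68.27%, 95.45%, and 99.73%.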

Other key distributions:

Uniform: every value in a range is equally likely (a fair die is the discrete case; R's unif functions cover the continuous case)
Binomial: number of successes in n independent yes/no trials
Poisson: number of events in a fixed time/space interval

In R, every distribution has 4 functions using the pattern d/p/q/r + distribution name.

R’s Distribution Function System

Prefix	What it does	Example
`d*(x)`	Density / probability at x	`dnorm(0)` → height of curve at 0
`p*(q)`	Cumulative probability P(X ≤ q)	`pnorm(1.96)` → 0.975
`q*(p)`	Quantile: value at probability p	`qnorm(0.95)` → 1.645
`r*(n)`	Generate n random values	`rnorm(100)` → 100 random normals

Replace * with: norm, unif, binom, pois, etc.
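
The same four prefixes carry over to the other distributions. The sketch below uses made-up scenarios (10 fair coin flips, an average of 2 events per interval, a Uniform(20, 80) range) purely for illustration; only the function names and arguments come from R itself.

```r
# Binomial: number of heads in 10 fair coin flips (illustrative scenario)
p_exactly_3 <- dbinom(3, size = 10, prob = 0.5)  # P(X = 3)
p_at_most_3 <- pbinom(3, size = 10, prob = 0.5)  # P(X <= 3)

# Poisson: event counts when the average is 2 per interval (illustrative)
p_zero_events <- dpois(0, lambda = 2)            # P(X = 0), equal to exp(-2)

# Uniform on [20, 80]: the 50th percentile is the midpoint of the range
median_unif <- qunif(0.5, min = 20, max = 80)

cat("P(exactly 3 heads):", round(p_exactly_3, 4), "\n")
cat("P(at most 3 heads):", round(p_at_most_3, 4), "\n")
cat("P(zero events):    ", round(p_zero_events, 4), "\n")
cat("Uniform median:    ", median_unif, "\n")

# r*: simulate five rounds of 10 flips each
set.seed(1)
flips <- rbinom(5, size = 10, prob = 0.5)
cat("Simulated head counts:", flips, "\n")
```

For discrete distributions such as the binomial and Poisson, the d* function returns an actual probability P(X = x). For continuous ones such as the normal, it returns the height of the density curve, which is not a probability.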

In R — Worked Example (read-only)

The normal distribution in R. pnorm gives the cumulative probability for a value; qnorm does the reverse, returning the value for a given cumulative probability.

# Working with the normal distribution in R
# dnorm: density, pnorm: cumulative prob, qnorm: quantile, rnorm: random

# P(X < 90) where X ~ Normal(mean=80, sd=10)
prob_below_90 <- pnorm(90, mean=80, sd=10)
cat("P(score < 90):", round(prob_below_90, 4), "\n")

# P(70 < X < 90): probability between two values
prob_between <- pnorm(90, mean=80, sd=10) - pnorm(70, mean=80, sd=10)
cat("P(70 < score < 90):", round(prob_between, 4), "\n")

# What score is at the 95th percentile?
p95 <- qnorm(0.95, mean=80, sd=10)
cat("95th percentile:", round(p95, 2), "\n")

# Generate 1000 random normal values
set.seed(42)
samples <- rnorm(1000, mean=80, sd=10)
cat("Sample mean:", round(mean(samples), 2), "\n")
cat("Sample SD: ", round(sd(samples), 2), "\n")
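
One way to see that pnorm and qnorm are inverses is a round trip: turn a score into a cumulative probability, then feed that probability back into qnorm. This is a small sketch using the same Normal(80, 10) as the worked example; the names score, p, and recovered are illustrative.

```r
# Start from a score, convert it to a cumulative probability, then go back.
score <- 90
p <- pnorm(score, mean = 80, sd = 10)      # P(X <= 90)
recovered <- qnorm(p, mean = 80, sd = 10)  # value whose cumulative probability is p

cat("Cumulative probability:", round(p, 4), "\n")
cat("Recovered score:", round(recovered, 2), "\n")
```

The recovered score matches the starting score (up to floating-point precision), which is what "the reverse" means here.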

Your Turn

Exercise 1 — SAT Score Probabilities

SAT scores are approximately normally distributed with mean = 500 and SD = 100. Use pnorm() and qnorm() to answer three probability questions.

# SAT scores: Normal(mean=500, sd=100)
mu <- 500
sigma <- 100

# 1. P(score > 600) — probability of scoring above 600
# pnorm gives P(X <= x), so we use 1 - pnorm for "above"
p_above_600 <- 1 - pnorm(600, mean=mu, sd=sigma)
cat("P(score > 600):", round(p_above_600, 4), "\n")
cat("That's about", round(p_above_600 * 100, 1), "% of test-takers\n\n")

# 2. P(400 < score < 600) — probability between 400 and 600
p_between <- pnorm(600, mean=mu, sd=sigma) - pnorm(400, mean=mu, sd=sigma)
cat("P(400 < score < 600):", round(p_between, 4), "\n")
cat("That's the 68-95-99.7 rule! 400 and 600 are 1 SD away.\n\n")

# 3. Score at the 90th percentile
p90_score <- qnorm(0.90, mean=mu, sd=sigma)
cat("90th percentile score:", round(p90_score, 1), "\n")
cat("A score of", round(p90_score, 0), "beats 90% of test-takers.\n")


Note: pnorm gives P(X ≤ x). For P(X > x), use 1 - pnorm(x, ...). For P(a < X < b), use pnorm(b) - pnorm(a).
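
As an alternative to subtracting from 1, pnorm() accepts a lower.tail argument. The sketch below reuses the SAT parameters from Exercise 1 and shows that both forms give the same upper-tail probability; p_complement and p_upper are illustrative names.

```r
mu <- 500
sigma <- 100

# Two equivalent ways to get P(score > 600)
p_complement <- 1 - pnorm(600, mean = mu, sd = sigma)
p_upper <- pnorm(600, mean = mu, sd = sigma, lower.tail = FALSE)

cat("1 - pnorm(...):      ", round(p_complement, 4), "\n")
cat("lower.tail = FALSE:  ", round(p_upper, 4), "\n")
```

For values like these the two agree. Far out in the tail, lower.tail = FALSE tends to be the safer choice, because 1 - pnorm(...) can round to exactly 0.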

Exercise 2 — Visualize the Normal Distribution

Generate 500 random normal values. Create a density histogram and overlay the theoretical normal curve. The histogram should roughly match the curve.

set.seed(99)
# Generate 500 random values from Normal(80, 10)
x_vals <- rnorm(500, mean = 80, sd = 10)

# Density histogram (freq=FALSE makes it a density, not count)
hist(x_vals,
     freq = FALSE,
     main = "Sample vs. Theoretical Normal Distribution",
     xlab = "Value",
     col = "#B2DFDB",
     border = "white",
     breaks = 20)

# Overlay the theoretical normal curve
curve(dnorm(x, mean = 80, sd = 10),
      add = TRUE,
      col = "#004D40",
      lwd = 2.5)

# Add mean line
abline(v = 80, col = "#C62828", lwd = 2, lty = 2)
legend("topright",
       legend = c("Sample (n=500)", "Theoretical N(80,10)", "True mean = 80"),
       fill = c("#B2DFDB", NA, NA),
       border = c("white", NA, NA),  # avoid empty boxes beside the line entries
       lty = c(NA, 1, 2),
       col = c(NA, "#004D40", "#C62828"),
       lwd = c(NA, 2.5, 2))


Try it: Change the sample size in rnorm() from 500 to 50 (and update the legend label to match). The histogram will be less smooth. That’s sampling variability: small samples are noisy.

Exercise 3 — Normal vs. Uniform: Shape Comparison

Generate 1000 values from a normal and a uniform distribution (same rough range). Plot both histograms. Observe: the normal concentrates near the center; the uniform is flat.

set.seed(7)
# Normal distribution: mean=50, sd=10
normal_data <- rnorm(1000, mean = 50, sd = 10)

# Uniform distribution: same general range (roughly 20 to 80)
uniform_data <- runif(1000, min = 20, max = 80)

# Stats comparison
cat("=== Normal Distribution ===\n")
cat("Mean:", round(mean(normal_data), 2), " | SD:", round(sd(normal_data), 2), "\n")
cat("Min:", round(min(normal_data), 1), " | Max:", round(max(normal_data), 1), "\n\n")

cat("=== Uniform Distribution ===\n")
cat("Mean:", round(mean(uniform_data), 2), " | SD:", round(sd(uniform_data), 2), "\n")
cat("Min:", round(min(uniform_data), 1), " | Max:", round(max(uniform_data), 1), "\n\n")

cat("Similar means — but very different shapes!\n")

# Side-by-side histograms
par(mfrow = c(1, 2))
hist(normal_data,
     main = "Normal(50, 10)\n— Bell-shaped",
     xlab = "Value", col = "#B2DFDB", border = "white",
     xlim = c(15, 85), breaks = 20)

hist(uniform_data,
     main = "Uniform(20, 80)\n— Flat",
     xlab = "Value", col = "#F8BBD9", border = "white",
     xlim = c(15, 85), breaks = 20)
par(mfrow = c(1, 1))
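
The difference in shape can also be put into numbers without simulating anything. As a rough sketch, compare the theoretical probability of landing in the central band from 40 to 60 under each distribution. That band is an illustrative choice (1 SD either side of the normal's mean), not part of the exercise.

```r
# Theoretical probability of falling between 40 and 60

# Normal(50, 10): 40 and 60 are 1 SD either side of the mean
p_normal <- pnorm(60, mean = 50, sd = 10) - pnorm(40, mean = 50, sd = 10)

# Uniform(20, 80): the band covers 20 of the 60 units in the range
p_uniform <- punif(60, min = 20, max = 80) - punif(40, min = 20, max = 80)

cat("Normal(50, 10):  P(40 < X < 60) =", round(p_normal, 4), "\n")
cat("Uniform(20, 80): P(40 < X < 60) =", round(p_uniform, 4), "\n")
```

The normal puts roughly twice as much probability in the central band as the uniform does, which is the concentration you see in the histograms.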


Brain Break

You now have a probability calculator in R. pnorm and qnorm replace z-tables entirely.

Quick check: If X ~ Normal(100, 15) (IQ scores), what is pnorm(130, 100, 15)? Try to estimate mentally first (130 is 2 SD above mean → ~97.7%), then run it in Exercise 1’s editor to verify.
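
The mental estimate works because any normal value can be converted to a z-score, z = (x - mean) / sd, and looked up on the standard normal. Here is a minimal sketch of that equivalence for the IQ example; the names z, p_direct, and p_standard are illustrative.

```r
# Standardize: how many SDs above the mean is 130?
z <- (130 - 100) / 15

# Same probability two ways
p_direct <- pnorm(130, mean = 100, sd = 15)  # using the original scale
p_standard <- pnorm(z)                       # using the standard normal

cat("z-score:", z, "\n")
cat("pnorm(130, 100, 15):", round(p_direct, 4), "\n")
cat("pnorm(z):           ", round(p_standard, 4), "\n")
```

This is the lookup a z-table performs; pnorm simply does it for any mean and SD without the conversion step.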

Key Takeaway

The normal distribution is the most important in statistics. pnorm() calculates cumulative probabilities (replacing z-tables), and qnorm() gives the value at any percentile. The d/p/q/r system works for every distribution in R.

Module 2 Complete!

You can now calculate probabilities and generate random samples from any distribution in R. Next, we’ll use these skills to demonstrate one of the most important results in all of statistics.

Continue to Module 3: Sampling & The Central Limit Theorem →
