Learn Without Walls
Module 2 of 8 — Statistics with R

Probability Distributions

The shape of randomness


📌 Before You Start

What you need: Module 1 completed, or familiarity with basic descriptive statistics, including the mean and standard deviation.

What you’ll learn: How to work with probability distributions in R using the d/p/q/r function family. You’ll calculate probabilities, find quantiles, and generate random samples from the normal, uniform, binomial, and Poisson distributions.

📖 The Concept: Probability Distributions

A probability distribution describes all possible outcomes of a random variable and how likely each outcome is. It’s the mathematical model for randomness.
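As a small illustration of this definition, the sketch below lists every possible outcome of a simple random variable next to its probability. The coin-flip setup (10 flips of a fair coin) and the variable names are assumptions chosen for illustration, not part of the module.

```r
# Hypothetical example: number of heads in 10 flips of a fair coin
outcomes <- 0:10
probs <- dbinom(outcomes, size = 10, prob = 0.5)

# Each possible outcome paired with its probability
dist_table <- data.frame(heads = outcomes, probability = round(probs, 4))
print(dist_table)

# The probabilities of all possible outcomes add up to 1
total <- sum(probs)
cat("Total probability:", total, "\n")
```

The table is the distribution: it covers every outcome, and the probabilities sum to 1.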

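The normal distribution is the most important in statistics. It is symmetric and bell-shaped, centered at its mean, and fully described by two parameters: the mean and the standard deviation.

Other key distributions covered in this module are the uniform, the binomial, and the Poisson.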

In R, every built-in distribution has four functions, each named by combining a prefix (d, p, q, or r) with the distribution name, as in dnorm or pbinom.

🔢 R’s Distribution Function System

| Prefix | What it does | Example |
|--------|--------------|---------|
| d*(x) | Density / probability at x | dnorm(0) → height of curve at 0 |
| p*(q) | Cumulative probability P(X ≤ q) | pnorm(1.96) → 0.975 |
| q*(p) | Quantile: value at probability p | qnorm(0.95) → 1.645 |
| r*(n) | Generate n random values | rnorm(100) → 100 random normals |

Replace * with: norm, unif, binom, pois, etc.
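To see the same prefixes applied beyond the normal distribution, here is a minimal sketch using the uniform, binomial, and Poisson families. The parameter values (a 0 to 10 range, 10 trials, a mean of 4) are arbitrary choices for illustration.

```r
# Uniform on [0, 10]: P(X <= 2.5)
p_unif <- punif(2.5, min = 0, max = 10)

# Uniform on [0, 10]: the median (value at cumulative probability 0.5)
q_unif <- qunif(0.5, min = 0, max = 10)

# Binomial with 10 trials and success probability 0.5: P(exactly 3 successes)
d_binom <- dbinom(3, size = 10, prob = 0.5)

# Poisson with mean 4: P(X <= 2)
p_pois <- ppois(2, lambda = 4)

# Five random draws from the same Poisson distribution
set.seed(1)
r_pois <- rpois(5, lambda = 4)

cat("punif(2.5, 0, 10):  ", p_unif, "\n")
cat("qunif(0.5, 0, 10):  ", q_unif, "\n")
cat("dbinom(3, 10, 0.5): ", round(d_binom, 4), "\n")
cat("ppois(2, 4):        ", round(p_pois, 4), "\n")
cat("rpois(5, 4):        ", r_pois, "\n")
```

For discrete distributions such as the binomial and Poisson, the d function returns an actual probability rather than a curve height.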

💻 In R — Worked Example (read-only)

The normal distribution in R. pnorm gives the cumulative probability up to a value; qnorm does the reverse, returning the value at a given cumulative probability.

```r
# Working with the normal distribution in R
# dnorm: density, pnorm: cumulative prob, qnorm: quantile, rnorm: random

# P(X < 90) where X ~ Normal(mean=80, sd=10)
prob_below_90 <- pnorm(90, mean=80, sd=10)
cat("P(score < 90):", round(prob_below_90, 4), "\n")

# P(70 < X < 90): probability between two values
prob_between <- pnorm(90, mean=80, sd=10) - pnorm(70, mean=80, sd=10)
cat("P(70 < score < 90):", round(prob_between, 4), "\n")

# What score is at the 95th percentile?
p95 <- qnorm(0.95, mean=80, sd=10)
cat("95th percentile:", round(p95, 2), "\n")

# Generate 1000 random normal values
set.seed(42)
samples <- rnorm(1000, mean=80, sd=10)
cat("Sample mean:", round(mean(samples), 2), "\n")
cat("Sample SD:  ", round(sd(samples), 2), "\n")
```
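One way to check that pnorm and qnorm undo each other is a round trip in both directions. This is a small sketch reusing the Normal(80, 10) setup from the worked example; the variable names are illustrative.

```r
mu <- 80
sigma <- 10

# Value -> probability -> value
x <- 90
p <- pnorm(x, mean = mu, sd = sigma)
x_back <- qnorm(p, mean = mu, sd = sigma)
cat("Start at x =", x, "-> p =", round(p, 4), "-> back to x =", x_back, "\n")

# Probability -> value -> probability
p_start <- 0.95
q <- qnorm(p_start, mean = mu, sd = sigma)
p_back <- pnorm(q, mean = mu, sd = sigma)
cat("Start at p =", p_start, "-> x =", round(q, 2), "-> back to p =", p_back, "\n")
```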

🖐️ Your Turn

Exercise 1 — SAT Score Probabilities

SAT scores are approximately normally distributed with mean = 500 and SD = 100. Use pnorm() and qnorm() to answer three probability questions.

💡 Note: pnorm gives P(X ≤ x). For P(X > x), use 1 - pnorm(x, ...). For P(a < X < b), use pnorm(b, ...) - pnorm(a, ...), passing the same mean and sd to both calls.
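The exercise's three questions are not reproduced here, so the sketch below uses three hypothetical ones (a lower tail, an upper tail, and a percentile) to show the pattern. The cutoffs 600 and 650 and the 90th percentile are assumptions for illustration only.

```r
sat_mean <- 500
sat_sd <- 100

# Hypothetical question 1: P(score <= 600)
p_below_600 <- pnorm(600, mean = sat_mean, sd = sat_sd)

# Hypothetical question 2: P(score > 650), the upper tail
p_above_650 <- 1 - pnorm(650, mean = sat_mean, sd = sat_sd)

# Same upper tail, requested directly with lower.tail = FALSE
p_above_650_alt <- pnorm(650, mean = sat_mean, sd = sat_sd, lower.tail = FALSE)

# Hypothetical question 3: score at the 90th percentile
score_p90 <- qnorm(0.90, mean = sat_mean, sd = sat_sd)

cat("P(score <= 600):", round(p_below_600, 4), "\n")
cat("P(score > 650): ", round(p_above_650, 4), "\n")
cat("90th percentile:", round(score_p90, 1), "\n")
```

Subtracting from 1 and setting lower.tail = FALSE give the same upper-tail probability; the second form is slightly more accurate for very small tails.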

Exercise 2 — Visualize the Normal Distribution

Generate 500 random normal values. Create a density histogram and overlay the theoretical normal curve. The histogram should roughly match the curve.

💡 Try it: Change n from 500 to 50. The histogram will be less smooth. That’s sampling variability — small samples are noisy.
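One possible approach to this exercise, using base R graphics, is sketched below. The standard normal parameters, the seed, the number of bins, and the colors are all assumptions; the exercise's own starter code may differ.

```r
set.seed(123)
n <- 500                       # change to 50 to see a rougher histogram
values <- rnorm(n, mean = 0, sd = 1)

# freq = FALSE puts the y-axis on the density scale, so bar areas sum to 1
h <- hist(values, freq = FALSE, breaks = 30,
          main = "Sample vs. theoretical normal", xlab = "Value")

# Overlay the theoretical density curve on the same axes
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, lwd = 2, col = "red")

# Sanity check: total area of the histogram bars
bar_area <- sum(h$density * diff(h$breaks))
cat("Total histogram area:", bar_area, "\n")
```

Plotting on the density scale is what makes the overlay meaningful: a count histogram and a density curve would be on different vertical scales.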

Exercise 3 — Normal vs. Uniform: Shape Comparison

Generate 1000 values from a normal and a uniform distribution (same rough range). Plot both histograms. Observe: the normal concentrates near the center; the uniform is flat.
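A minimal sketch of this comparison follows, assuming a standard normal and a uniform on (-3, 3) as the "same rough range". Those parameters, the seed, and the variable names are illustrative choices, not taken from the exercise.

```r
set.seed(2024)
n <- 1000
normal_vals <- rnorm(n, mean = 0, sd = 1)
uniform_vals <- runif(n, min = -3, max = 3)

# Draw the two histograms side by side on a shared x-axis range
old_par <- par(mfrow = c(1, 2))
hist(normal_vals, breaks = 30, xlim = c(-4, 4),
     main = "Normal(0, 1)", xlab = "Value")
hist(uniform_vals, breaks = 30, xlim = c(-4, 4),
     main = "Uniform(-3, 3)", xlab = "Value")
par(old_par)

# Share of values within 1 unit of the center in each sample
share_normal <- mean(abs(normal_vals) < 1)
share_uniform <- mean(abs(uniform_vals) < 1)
cat("Within 1 of center, normal: ", share_normal, "\n")
cat("Within 1 of center, uniform:", share_uniform, "\n")
```

The two shares put a number on the visual difference: in theory about 68% of the normal values fall within 1 of the center, against one third of the uniform values.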


🧠 Brain Break

You now have a probability calculator in R. pnorm and qnorm replace z-tables entirely.

Quick check: If X ~ Normal(100, 15) (IQ scores), what is pnorm(130, 100, 15)? Try to estimate mentally first (130 is 2 SD above mean → ~97.7%), then run it in Exercise 1’s editor to verify.
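The mental estimate works because any normal probability depends only on how many standard deviations a value sits from the mean. The sketch below checks that for the IQ example and then computes the share within 1, 2, and 3 SD; the variable names are illustrative.

```r
iq_mean <- 100
iq_sd <- 15

# z-score: how many SDs 130 lies above the mean
z <- (130 - iq_mean) / iq_sd

# The same probability computed two ways
p_direct <- pnorm(130, mean = iq_mean, sd = iq_sd)
p_standard <- pnorm(z)          # standard normal: mean 0, sd 1
cat("z =", z, " direct:", round(p_direct, 4),
    " standardized:", round(p_standard, 4), "\n")

# Share of any normal distribution within 1, 2, and 3 SD of the mean
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
cat("Within 1, 2, 3 SD:", round(within_k, 4), "\n")
```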

✅ Key Takeaway

The normal distribution is the most important in statistics. pnorm() calculates cumulative probabilities (replacing z-tables), and qnorm() gives the value at any percentile. The d/p/q/r system works for every distribution in R.

🏆 Module 2 Complete!

You can now calculate probabilities and generate random samples from any distribution in R. Next, we’ll use these skills to demonstrate one of the most important results in all of statistics.

Continue to Module 3: Sampling & The Central Limit Theorem →
