Learn Without Walls
Module 2, Lesson 4 of 4

25-30 minutes


Lesson 4: Five-Number Summary and Boxplots

Learning Objectives

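By the end of this lesson, you will be able to:

  • Calculate the five-number summary of a dataset
  • Construct and interpret a boxplot
  • Read distribution shape (symmetric or skewed) from a boxplot
  • Identify outliers using the 1.5×IQR rule
  • Compare groups with side-by-side boxplots
  • Decide when to use a boxplot and when to use a histogram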

What is the Five-Number Summary?

Five-Number Summary: A set of five values that concisely summarizes a dataset's center and spread.

The Five Numbers:

1. Minimum (Min): the smallest value
2. First Quartile (Q1): the 25th percentile
3. Median (Q2): the middle value (50th percentile)
4. Third Quartile (Q3): the 75th percentile
5. Maximum (Max): the largest value

These five numbers divide the ordered data into four quarters, each containing roughly 25% of the values.

Calculating the Five-Number Summary

Example 1: Finding the Five-Number Summary

Dataset: Test scores: 62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95

Step-by-Step Calculation:

Step 1: Order the data (already ordered!)

62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95

Step 2: Find the Minimum and Maximum

  • Minimum = 62 (smallest value)
  • Maximum = 95 (largest value)

Step 3: Find the Median (Q2)

There are 11 values, so the median is the 6th value: Median = 80

Step 4: Find Q1 (median of lower half)

Lower half (values below median): 62, 68, 72, 75, 78

Median of lower half: Q1 = 72

Step 5: Find Q3 (median of upper half)

Upper half (values above median): 82, 85, 88, 92, 95

Median of upper half: Q3 = 88

Five-Number Summary:

Min = 62, Q1 = 72, Median = 80, Q3 = 88, Max = 95
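The step-by-step procedure above can be sketched in Python. This is a minimal sketch, assuming the median-of-halves convention used in this lesson (when the count is odd, the median is left out of both halves); `median` and `five_number_summary` are illustrative names, not library functions. Statistical software uses several quartile conventions, so results can differ slightly: for example, NumPy's `np.percentile` with its default method returns 73.5 for Q1 of these scores.

```python
def median(sorted_values):
    """Median of a list that is already sorted in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]  # odd count: the single middle value
    # even count: average the two middle values
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (Min, Q1, Median, Q3, Max) using the median-of-halves method.

    Assumption: for an odd count, the median is excluded from both halves,
    matching Steps 4 and 5 of Example 1.
    """
    xs = sorted(data)  # Step 1: order the data
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]           # values below the median position
    upper = xs[half + n % 2:]   # values above it (skips the median when n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


scores = [62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95]
print(five_number_summary(scores))  # (62, 72, 80, 88, 95)
```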

What the Five-Number Summary Tells Us:

  • Center: Median = 80 (typical score)
  • Spread: Range = 95 - 62 = 33 points (full spread)
  • Middle 50%: IQR = Q3 - Q1 = 88 - 72 = 16 points (scores from 72 to 88)
  • Distribution: roughly 25% scored below 72, 50% below 80, and 75% below 88 (with only 11 scores, these shares are approximate)
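With only 11 scores, the quartile percentages are approximations. The short sketch below takes the five-number summary from Example 1 as given, computes the range and IQR, and counts what share of scores actually falls below each quartile; `share_below` is a hypothetical helper written for this illustration.

```python
scores = [62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95]
mn, q1, med, q3, mx = 62, 72, 80, 88, 95  # five-number summary from Example 1


def share_below(data, cutoff):
    """Fraction of values strictly below the cutoff."""
    return sum(1 for x in data if x < cutoff) / len(data)


print("Range:", mx - mn)  # 33
print("IQR:", q3 - q1)    # 16
for label, cut in [("Q1", q1), ("Median", med), ("Q3", q3)]:
    print(f"Below {label} ({cut}): {share_below(scores, cut):.0%}")
```

This prints shares of 18%, 45%, and 73%: close to, but not exactly, 25%, 50%, and 75%.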

What is a Boxplot?

Boxplot (Box-and-Whisker Plot): A visual representation of the five-number summary. Shows the center, spread, and shape of a distribution in one compact graph.

Anatomy of a Boxplot

Figure: vertical boxplot of the test scores on an axis from 60 to 100, labeled Min (62), Q1 (72), Median (80), Q3 (88), and Max (95). The box spans Q1 to Q3 (the IQR, or middle 50%), and the whiskers extend to Min and Max.

Parts of a Boxplot:

  • Box: Spans from Q1 to Q3 (contains middle 50% of data)
  • Line inside box: Shows the median
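  • Whiskers: Lines extending from the box to the Min and Max (when outliers are present, only to the most extreme values inside the outlier fences described below)
  • Length of the box along the data axis = IQR (shows the spread of the middle 50%)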
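To see how the five numbers map onto the parts of a boxplot, here is a minimal text-based sketch. It assumes a horizontal layout and a fixed number of character columns; `render_boxplot` is a hypothetical helper, not a plotting-library function. Whiskers are drawn with `-`, the box with `=`, the box edges with `[` and `]`, the median with `+`, and Min/Max with `|`.

```python
def render_boxplot(summary, lo, hi, width=41):
    """Draw one horizontal text boxplot for (Min, Q1, Median, Q3, Max).

    lo and hi set the ends of the axis; each value is mapped to the
    nearest of `width` character columns.
    """
    mn, q1, med, q3, mx = summary

    def col(x):
        # Linear map from the data scale [lo, hi] to columns [0, width - 1].
        return round((x - lo) * (width - 1) / (hi - lo))

    row = [" "] * width
    for c in range(col(mn), col(mx) + 1):
        row[c] = "-"  # whiskers span Min to Max
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="  # the box spans Q1 to Q3 (the IQR)
    row[col(mn)] = "|"
    row[col(mx)] = "|"
    row[col(q1)] = "["
    row[col(q3)] = "]"
    row[col(med)] = "+"  # median line inside the box
    return "".join(row)


print(render_boxplot((62, 72, 80, 88, 95), lo=60, hi=100))
```

With an axis from 60 to 100 and 41 columns, each column is one point, so the box edges land at columns 12 and 28 and the median at column 20.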

Reading Distribution Shape from Boxplots

Boxplots reveal distribution shape through the position of the median inside the box and the relative lengths of the two whiskers!

Symmetric Distribution

Figure: boxplot of a symmetric distribution. The median is centered in the box; Q1 to Median ≈ Median to Q3, and Min to Q1 ≈ Q3 to Max.

Right-Skewed Distribution

Figure: boxplot of a right-skewed distribution. The median sits closer to Q1, and the upper whisker is longer.

Left-Skewed Distribution

Figure: boxplot of a left-skewed distribution. The median sits closer to Q3, and the lower whisker is longer.

Quick Shape Identification from Boxplots:

  • Symmetric: Median centered in box, whiskers roughly equal length
  • Right-Skewed: Median closer to Q1, longer upper whisker
  • Left-Skewed: Median closer to Q3, longer lower whisker
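The quick identification rules above can be turned into a rough heuristic. This is a sketch under stated assumptions: the median's position in the box is treated as the main cue, whisker lengths are used only to confirm a skew, and the 10% tolerance (`tol`) for "centered" is an arbitrary illustrative choice, not a standard cutoff. `describe_shape` is a hypothetical name.

```python
def describe_shape(summary, tol=0.1):
    """Rough shape label from (Min, Q1, Median, Q3, Max); a heuristic, not a test."""
    mn, q1, med, q3, mx = summary
    iqr = q3 - q1
    if iqr == 0:
        return "cannot tell (IQR is 0)"
    # Median's offset from the middle of the box, as a fraction of the IQR:
    # negative means the median sits closer to Q1, positive closer to Q3.
    offset = (med - (q1 + q3) / 2) / iqr
    lower_whisker = q1 - mn
    upper_whisker = mx - q3
    if abs(offset) <= tol:
        return "roughly symmetric"
    if offset < 0 and upper_whisker >= lower_whisker:
        return "right-skewed"  # median near Q1, longer upper whisker
    if offset > 0 and lower_whisker >= upper_whisker:
        return "left-skewed"   # median near Q3, longer lower whisker
    return "mixed signals"     # box and whiskers disagree; check a histogram


print(describe_shape((62, 72, 80, 88, 95)))  # roughly symmetric
```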

Identifying Outliers: The 1.5×IQR Rule

Outlier (1.5×IQR rule): Any value that falls more than 1.5 × IQR below Q1 or above Q3. This is a widely used rule of thumb rather than the only definition of an outlier.

The 1.5×IQR Rule for Outliers

Step 1: Calculate IQR

IQR = Q3 - Q1

Step 2: Calculate fences (boundaries)

  • Lower Fence = Q1 - (1.5 × IQR)
  • Upper Fence = Q3 + (1.5 × IQR)

Step 3: Identify outliers

  • Any value below lower fence = outlier
  • Any value above upper fence = outlier
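The three steps of the rule translate directly into code. A minimal Python sketch, assuming Q1 and Q3 have already been found (for instance with the median-of-halves method used in this lesson); `iqr_fences` and `find_outliers` are illustrative names, and `k=1.5` is the multiplier from the rule.

```python
def iqr_fences(q1, q3, k=1.5):
    """Steps 1-2: compute the IQR, then the lower and upper fences."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """Step 3: return every value below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(q1, q3, k)
    return [x for x in data if x < lower or x > upper]


sales = [200, 210, 215, 220, 225, 230, 235, 240, 450]
print(iqr_fences(212.5, 237.5))            # (175.0, 275.0)
print(find_outliers(sales, 212.5, 237.5))  # [450]
```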

Example 2: Finding Outliers

Dataset: Daily sales ($): 200, 210, 215, 220, 225, 230, 235, 240, 450

Step 1: Find five-number summary

  • Min = 200, Q1 = 212.5, Median = 225, Q3 = 237.5, Max = 450

Step 2: Calculate IQR

IQR = Q3 - Q1 = 237.5 - 212.5 = 25

Step 3: Calculate fences

  • Lower Fence = Q1 - (1.5 × IQR) = 212.5 - (1.5 × 25) = 212.5 - 37.5 = 175
  • Upper Fence = Q3 + (1.5 × IQR) = 237.5 + (1.5 × 25) = 237.5 + 37.5 = 275

Step 4: Identify outliers

  • Is any value < 175? No
  • Is any value > 275? Yes! 450 is an outlier

Interpretation: The $450 day is unusual, perhaps because of a special event or a catering order. It lies far above the typical daily sales of $200 to $240.

Boxplot with Outlier

Figure: boxplot of daily sales on an axis from $150 to $450, showing Min (200), Q1 (212.5), Median (225), Q3 (237.5), the upper fence (275), and the outlier (450) drawn as a separate point.

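Note: Outliers are plotted as individual points beyond the whiskers. The whisker on that side stops at the most extreme value still inside the fence (here $240), not at the fence itself.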
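When outliers are present, a common convention (Tukey's) ends each whisker at the most extreme data value that still lies inside the fences. A minimal sketch, assuming the fences have already been computed; `whisker_ends` is a hypothetical helper.

```python
def whisker_ends(data, lower_fence, upper_fence):
    """Return (low end, high end) of the whiskers: the most extreme values inside the fences."""
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return min(inside), max(inside)


sales = [200, 210, 215, 220, 225, 230, 235, 240, 450]
print(whisker_ends(sales, 175, 275))  # (200, 240)
```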

Comparing Groups with Side-by-Side Boxplots

One of the best uses of boxplots is comparing multiple groups on the same scale!

Example 3: Comparing Test Scores (Class A vs. Class B)

Class A Scores: Five-number summary = (62, 72, 80, 88, 95)

Class B Scores: Five-number summary = (55, 68, 75, 82, 98)

Figure: side-by-side boxplots of test scores on a common axis from 50 to 100. Class A: 62, 72, 80, 88, 95. Class B: 55, 68, 75, 82, 98.

Comparison Analysis:

  • Center: Class A (median=80) scored higher than Class B (median=75)
  • Spread: Class A IQR = 16, Class B IQR = 14 (similar variability)
  • Range: Class A = 33 points, Class B = 43 points (B more variable overall)
  • Shape: Both appear roughly symmetric (median centered in box)

Conclusion: Class A had the higher median score, and its scores were less spread out at the extremes (a smaller range).
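The comparison above reduces each group to a few derived numbers. A minimal sketch, assuming the five-number summaries from Example 3 as input; `compare` is an illustrative name.

```python
# Five-number summaries (Min, Q1, Median, Q3, Max) from Example 3
summaries = {
    "Class A": (62, 72, 80, 88, 95),
    "Class B": (55, 68, 75, 82, 98),
}


def compare(summaries):
    """Return {group: (median, IQR, range)} for side-by-side comparison."""
    result = {}
    for name, (mn, q1, med, q3, mx) in summaries.items():
        result[name] = (med, q3 - q1, mx - mn)
    return result


for name, (med, iqr, rng) in compare(summaries).items():
    print(f"{name}: median={med}, IQR={iqr}, range={rng}")
```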

Advantages of Side-by-Side Boxplots:

  • Easy visual comparison of centers (medians)
  • Compare variability at a glance (box widths = IQR)
  • Identify shape differences between groups
  • Spot outliers in each group
  • Compact - can compare many groups in small space

Boxplots vs. Histograms: When to Use Each?

Feature | Boxplot | Histogram
Shows center | Yes (median) | Yes (peak)
Shows spread | Yes (IQR, range) | Yes (width)
Shows shape | Yes (roughly) | Yes (detailed)
Shows individual values | No | No (grouped)
Identifies outliers | Yes (with 1.5×IQR rule) | Sometimes visible
Compares groups | Excellent (side-by-side) | Harder (overlapping)
Space efficient | Very compact | Takes more space
Shows detailed shape | Less detail | More detail (depends on bin width)

When to Use Which?

Use Boxplots when:

  • Comparing multiple groups
  • You want a quick summary (5 numbers)
  • Identifying outliers is important
  • Space is limited

Use Histograms when:

  • You want to see the detailed shape
  • Analyzing a single group
  • You need to see modes (peaks)
  • Distribution shape detail matters

Best practice: Use both! They complement each other.

Check Your Understanding

Test your knowledge of five-number summaries and boxplots!

Question 1:

A dataset has these values: Min=10, Q1=20, Median=30, Q3=40, Max=50. What is the IQR?

  • 10
  • 20 (Q3 - Q1 = 40 - 20)
  • 30
  • 40

Question 2:

Using the dataset from Q1, calculate the upper fence for outlier detection (Q3 + 1.5×IQR).

  • 50
  • 60
  • 70 (40 + 1.5×20 = 40 + 30)
  • 80

Question 3:

In a boxplot, if the median line is much closer to Q1 than to Q3, what does this indicate about the distribution?

  • Symmetric distribution
  • Right-skewed distribution (tail extends right)
  • Left-skewed distribution
  • There's an error in the data

Question 4:

What percentage of data falls between Q1 and Q3 (inside the box)?

  • 25%
  • 50% (the middle half of the data)
  • 75%
  • 100%

Question 5:

When comparing test scores for 5 different classes, which visualization is best?

  • Side-by-side boxplots (compact, easy comparison)
  • 5 separate histograms
  • A single combined histogram
  • A pie chart

Question 6:

A value of 85 is in a dataset with Q1=50, Q3=70, IQR=20. Is 85 an outlier?

  • No (upper fence = 70+30=100, and 85 < 100)
  • Yes, it's above Q3
  • Yes, it's too far from the median
  • Cannot determine

Key Takeaways

  • Five-number summary: Min, Q1, Median, Q3, Max. Together they divide the ordered data into four quarters of roughly 25% each.
  • Boxplot shows: Center (median), spread (IQR, range), shape (symmetric vs. skewed), and outliers.
  • IQR = Q3 - Q1 – the range of the middle 50% of data.
  • 1.5×IQR Rule: Outliers are values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR.
  • Reading shape: Median centered = symmetric, median near Q1 = right-skewed, median near Q3 = left-skewed.
  • Side-by-side boxplots are excellent for comparing multiple groups quickly!
  • Boxplots vs. histograms: Boxplots better for comparisons, histograms better for detailed shape.
← Lesson 3: Distribution Shapes Practice Problems →