Estimated time: 25-30 minutes
Lesson 4: Five-Number Summary and Boxplots
Learning Objectives
By the end of this lesson, you will be able to:
- Calculate the five-number summary (Min, Q1, Median, Q3, Max)
- Create and interpret boxplots (box-and-whisker plots)
- Use the 1.5×IQR rule to identify outliers
- Determine distribution shape from a boxplot
- Compare multiple distributions using side-by-side boxplots
- Choose between histograms and boxplots for different purposes
What is the Five-Number Summary?
Five-Number Summary: A set of five values that gives a compact summary of a dataset's center and spread.
The Five Numbers:
- Minimum (Min): Smallest value
- First Quartile (Q1): 25th percentile
- Median (Q2): Middle value (50th percentile)
- Third Quartile (Q3): 75th percentile
- Maximum (Max): Largest value
These five numbers divide your data into four quarters, each containing about 25% of the data:
- Bottom 25%: Between Min and Q1
- Second 25%: Between Q1 and Median
- Third 25%: Between Median and Q3
- Top 25%: Between Q3 and Max
Calculating the Five-Number Summary
Example 1: Finding the Five-Number Summary
Dataset: Test scores: 62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95
Step-by-Step Calculation:
Step 1: Order the data (already ordered!)
62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95
Step 2: Find the Minimum and Maximum
- Minimum = 62 (smallest value)
- Maximum = 95 (largest value)
Step 3: Find the Median (Q2)
With 11 values, the median is the (11 + 1)/2 = 6th value: Median = 80
Step 4: Find Q1 (median of lower half)
Lower half (values below median): 62, 68, 72, 75, 78
Median of lower half: Q1 = 72
Step 5: Find Q3 (median of upper half)
Upper half (values above median): 82, 85, 88, 92, 95
Median of upper half: Q3 = 88
Five-Number Summary: Min = 62, Q1 = 72, Median = 80, Q3 = 88, Max = 95
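The five steps above can be sketched in Python. This is a minimal sketch, assuming the median-of-halves convention used in this lesson (the median is left out of both halves when the count is odd); the function name `five_number_summary` is hypothetical. Statistical software often uses other quartile conventions (for example, NumPy's `percentile` interpolates linearly by default), so its Q1 and Q3 can differ slightly from a hand calculation.

```python
from statistics import median


def five_number_summary(values):
    """Return (Min, Q1, Median, Q3, Max) using the median-of-halves method.

    For an odd number of values, the median itself is left out of both
    halves, matching the hand calculation in Example 1.
    """
    data = sorted(values)            # Step 1: order the data
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]              # values below the median
    upper = data[half + (n % 2):]    # values above the median (skip the middle value if n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])


scores = [62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95]
print(five_number_summary(scores))  # (62, 72, 80, 88, 95)
```

For the daily sales data in Example 2, the same function returns (200, 212.5, 225, 237.5, 450).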
What the Five-Number Summary Tells Us:
- Center: Median = 80 (typical score)
- Spread: Range = 95 - 62 = 33 points (full spread)
- Middle 50%: IQR = Q3 - Q1 = 88 - 72 = 16 points (scores from 72 to 88)
- Distribution: Roughly 25% scored below 72, about 50% scored below 80, and roughly 75% scored below 88
What is a Boxplot?
Boxplot (Box-and-Whisker Plot): A visual representation of the five-number summary. Shows the center, spread, and shape of a distribution in one compact graph.
Anatomy of a Boxplot
Parts of a Boxplot:
- Box: Spans from Q1 to Q3 (contains middle 50% of data)
- Line inside box: Shows the median
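- Whiskers: Lines extending from the box to Min and Max (when outliers are present, only to the most extreme values that are not outliers)
- Length of the box along the data axis = IQR (shows the spread of the middle 50%)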
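To connect each part of the plot to the five numbers, here is a small text-only boxplot sketch. The function name `text_boxplot`, the characters used (`+` for whisker ends, `[` and `]` for Q1 and Q3, `|` for the median, `=` inside the box), and the 41-character axis are illustrative assumptions; this version draws the whiskers all the way to Min and Max and does not mark outliers.

```python
def text_boxplot(summary, lo, hi, width=41):
    """Draw a horizontal boxplot as a single line of text.

    summary: (Min, Q1, Median, Q3, Max). lo and hi are the axis limits
    mapped to the first and last character; assumes lo <= Min and Max <= hi.
    """
    if not lo < hi:
        raise ValueError("lo must be less than hi")

    def col(x):
        # Map a data value to a character position on the axis
        return round((x - lo) / (hi - lo) * (width - 1))

    mn, q1, med, q3, mx = (col(v) for v in summary)
    line = [" "] * width
    for c in range(mn, mx + 1):
        line[c] = "-"                # whiskers span Min to Max
    for c in range(q1, q3 + 1):
        line[c] = "="                # box interior: the middle 50%
    line[q1], line[q3] = "[", "]"    # box edges at Q1 and Q3
    line[med] = "|"                  # median line inside the box
    line[mn], line[mx] = "+", "+"    # whisker ends at Min and Max
    return "".join(line)


# Example 1 summary on an axis from 60 to 100 (one character per point)
row = text_boxplot((62, 72, 80, 88, 95), lo=60, hi=100)
print(row.strip())  # +---------[=======|=======]------+
```

Because the axis here uses one character per point, the box spans 16 characters (the IQR) and the whole plot spans 33 (the range).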
Reading Distribution Shape from Boxplots
Boxplots reveal distribution shape by showing where the data is concentrated!
Figure: Example boxplots of a symmetric, a right-skewed, and a left-skewed distribution.
Quick Shape Identification from Boxplots:
- Symmetric: Median centered in box, whiskers roughly equal length
- Right-Skewed: Median closer to Q1, longer upper whisker
- Left-Skewed: Median closer to Q3, longer lower whisker
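The identification rules above can be turned into a rough heuristic. This sketch is an assumption, not a standard test: it adds the two signals the rules mention (where the median sits in the box and the difference in whisker lengths), scales the total by the range, and uses a hypothetical tolerance `tol = 0.1` to decide what counts as "roughly equal". Real data often send mixed signals, so look at the plot as well.

```python
def describe_shape(summary, tol=0.1):
    """Rough shape label from a five-number summary (a heuristic, not a test).

    summary: (Min, Q1, Median, Q3, Max). tol is a hypothetical tolerance:
    a combined imbalance smaller than tol × range counts as "roughly equal".
    """
    mn, q1, med, q3, mx = summary
    total = mx - mn
    if total == 0:
        return "no spread"
    box_gap = (q3 - med) - (med - q1)        # > 0 when the median sits closer to Q1
    whisker_gap = (mx - q3) - (q1 - mn)      # > 0 when the upper whisker is longer
    score = (box_gap + whisker_gap) / total  # both signals, scaled by the range
    if score > tol:
        return "right-skewed"
    if score < -tol:
        return "left-skewed"
    return "roughly symmetric"


print(describe_shape((62, 72, 80, 88, 95)))  # Example 1: roughly symmetric
print(describe_shape((10, 20, 25, 40, 80)))  # made-up summary: right-skewed
```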
Identifying Outliers: The 1.5×IQR Rule
Outlier (under the 1.5×IQR rule): Any value that falls more than 1.5 × IQR below Q1 or above Q3.
The 1.5×IQR Rule for Outliers
Step 1: Calculate IQR
IQR = Q3 - Q1
Step 2: Calculate fences (boundaries)
- Lower Fence = Q1 - (1.5 × IQR)
- Upper Fence = Q3 + (1.5 × IQR)
Step 3: Identify outliers
- Any value below lower fence = outlier
- Any value above upper fence = outlier
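The three steps translate directly into code. A minimal sketch with hypothetical function names `iqr_fences` and `find_outliers`; it takes Q1 and Q3 as inputs, so it works with whichever quartile method you used. Applied to Example 1 (Q1 = 72, Q3 = 88, IQR = 16), the fences are 72 - 24 = 48 and 88 + 24 = 112, so none of those test scores is an outlier.

```python
def iqr_fences(q1, q3, k=1.5):
    """Steps 1-2: return (lower_fence, upper_fence) for the k × IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Step 3: values below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Example 1 test scores, with Q1 = 72 and Q3 = 88 from the earlier steps
scores = [62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95]
print(iqr_fences(72, 88))             # (48.0, 112.0)
print(find_outliers(scores, 72, 88))  # []
```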
Example 2: Finding Outliers
Dataset: Daily sales ($): 200, 210, 215, 220, 225, 230, 235, 240, 450
Step 1: Find five-number summary
- Min = 200, Q1 = 212.5, Median = 225, Q3 = 237.5, Max = 450
Step 2: Calculate IQR
IQR = Q3 - Q1 = 237.5 - 212.5 = 25
Step 3: Calculate fences
- Lower Fence = Q1 - (1.5 × IQR) = 212.5 - (1.5 × 25) = 212.5 - 37.5 = 175
- Upper Fence = Q3 + (1.5 × IQR) = 237.5 + (1.5 × 25) = 237.5 + 37.5 = 275
Step 4: Identify outliers
- Is any value < 175? No
- Is any value > 275? Yes! 450 is an outlier
Interpretation: The $450 day is unusual, perhaps because of a special event or a catering order. It lies far above the range of typical daily sales ($200-$240).
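Figure: Boxplot with Outlier (daily sales data from Example 2).
Note: Outliers are plotted as individual points beyond the whiskers. Here the upper whisker ends at $240, the largest value inside the upper fence of $275.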
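Where exactly do the whiskers stop when there is an outlier? A common convention, and the default in Matplotlib's `boxplot` (`whis=1.5`), is to end each whisker at the most extreme data value that is still inside the fences. A sketch with the hypothetical helper `whisker_ends`, using the Example 2 fences of 175 and 275:

```python
def whisker_ends(values, lower_fence, upper_fence):
    """Whisker ends for a boxplot that plots outliers separately.

    Each whisker stops at the most extreme data value still inside the
    fences; anything beyond the fences is returned as an outlier point.
    """
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return min(inside), max(inside), outliers


# Example 2 daily sales, with the fences computed above
sales = [200, 210, 215, 220, 225, 230, 235, 240, 450]
print(whisker_ends(sales, 175, 275))  # (200, 240, [450])
```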
Comparing Groups with Side-by-Side Boxplots
One of the best uses of boxplots is comparing multiple groups on the same scale!
Example 3: Comparing Test Scores (Class A vs. Class B)
Class A Scores: Five-number summary = (62, 72, 80, 88, 95)
Class B Scores: Five-number summary = (55, 68, 75, 82, 98)
Comparison Analysis:
- Center: Class A (median=80) scored higher than Class B (median=75)
- Spread: Class A IQR = 16, Class B IQR = 14 (similar spread in the middle 50%)
- Range: Class A = 33 points, Class B = 43 points (B more variable overall)
- Shape: Both appear roughly symmetric (median centered in box)
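Conclusion: Class A had the higher median (typical) score, and its scores were less spread out at the extremes (range 33 vs. 43 points), while the middle 50% of both classes had similar spread.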
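Every number in this comparison comes straight from the two five-number summaries. A short sketch (the helper name `spread_stats` is hypothetical) that recomputes them:

```python
# Five-number summaries from Example 3: (Min, Q1, Median, Q3, Max)
classes = {
    "Class A": (62, 72, 80, 88, 95),
    "Class B": (55, 68, 75, 82, 98),
}


def spread_stats(summary):
    """Center and spread measures read off a five-number summary."""
    mn, q1, med, q3, mx = summary
    return {"median": med, "IQR": q3 - q1, "range": mx - mn}


stats = {name: spread_stats(s) for name, s in classes.items()}
for name, st in stats.items():
    print(name, st)
# Class A {'median': 80, 'IQR': 16, 'range': 33}
# Class B {'median': 75, 'IQR': 14, 'range': 43}

higher = max(stats, key=lambda name: stats[name]["median"])
print("Higher median:", higher)  # Higher median: Class A
```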
Advantages of Side-by-Side Boxplots:
- Easy visual comparison of centers (medians)
- Compare variability at a glance (box lengths = IQR)
- Identify shape differences between groups
- Spot outliers in each group
- Compact - can compare many groups in small space
Boxplots vs. Histograms: When to Use Each?
| Feature | Boxplot | Histogram |
|---|---|---|
| Shows center | Yes (median) | Yes (peak) |
| Shows spread | Yes (IQR, range) | Yes (width) |
| Shows shape | Yes (roughly) | Yes (detailed) |
| Shows individual values | Only outliers | No (grouped into bins) |
| Identifies outliers | Yes (with 1.5×IQR rule) | Sometimes visible |
| Compares groups | Excellent (side-by-side) | Harder (overlapping) |
| Space efficient | Very compact | Takes more space |
| Shows detailed shape | Less detail | More detail (depends on bin width) |
When to Use Which?
Use Boxplots when:
- Comparing multiple groups
- You want a quick summary (5 numbers)
- Identifying outliers is important
- Space is limited
Use Histograms when:
- You want to see the detailed shape
- Analyzing a single group
- You need to see modes (peaks)
- Distribution shape detail matters
Best practice: Use both! They complement each other.
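To see the difference in what each display is built from, compare the five numbers for Example 1 (62, 72, 80, 88, 95) with the bin counts a histogram of the same scores would use. The 10-point bins below are an arbitrary choice made for this sketch; a different bin width can change the apparent shape.

```python
from collections import Counter

scores = [62, 68, 72, 75, 78, 80, 82, 85, 88, 92, 95]

# A histogram groups values into bins; here, 10-point bins (60-69, 70-79, ...)
bin_counts = Counter(10 * (s // 10) for s in scores)
for start in sorted(bin_counts):
    print(f"{start}-{start + 9}: {'#' * bin_counts[start]}")
# 60-69: ##
# 70-79: ###
# 80-89: ####
# 90-99: ##
```

The histogram shows a single peak in the 80s, a detail the boxplot does not reveal, while the boxplot pins down the quartiles, which the bins blur.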
Check Your Understanding
Test your knowledge of five-number summaries and boxplots!
Question 1:
A dataset has these values: Min=10, Q1=20, Median=30, Q3=40, Max=50. What is the IQR?
Question 2:
Using the dataset from Question 1, calculate the upper fence for outlier detection (Q3 + 1.5×IQR).
Question 3:
In a boxplot, if the median line is much closer to Q1 than to Q3, what does this indicate about the distribution?
Question 4:
What percentage of data falls between Q1 and Q3 (inside the box)?
Question 5:
When comparing test scores for 5 different classes, which visualization is better: boxplots or histograms?
Question 6:
A value of 85 is in a dataset with Q1=50, Q3=70, IQR=20. Is 85 an outlier?
Key Takeaways
- Five-number summary: Min, Q1, Median, Q3, Max – divides data into four quarters of roughly equal size.
- Boxplot shows: Center (median), spread (IQR, range), shape (symmetric vs. skewed), and outliers.
- IQR = Q3 - Q1 – the range of the middle 50% of data.
- 1.5×IQR Rule: Outliers are values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR.
- Reading shape: Median centered = symmetric, median near Q1 = right-skewed, median near Q3 = left-skewed.
- Side-by-side boxplots are excellent for comparing multiple groups quickly!
- Boxplots vs. histograms: Boxplots better for comparisons, histograms better for detailed shape.