1. Measures of Center

Measures of Center: Single values that represent the "typical" or "central" value in a dataset. They help us summarize an entire dataset with one number.

The Three Main Measures

Mean (Average)

Mean = (Sum of all values) ÷ (Number of values)

x̄ = Σx / n

Where: x̄ = mean, Σx = sum of all values, n = number of values

Example: Test scores: 85, 90, 78, 92, 88
Mean = (85 + 90 + 78 + 92 + 88) ÷ 5 = 433 ÷ 5 = 86.6
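
In code: a minimal Python sketch of the same calculation. The variable names are illustrative and not part of the original guide.

```python
# Test scores from the example above
scores = [85, 90, 78, 92, 88]

# Mean = (sum of all values) / (number of values)
mean_score = sum(scores) / len(scores)
print(mean_score)
```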

Median (Middle Value)

Median: The middle value when data is arranged in order

How to Find the Median:

Arrange data in order from smallest to largest
If odd number of values: Median is the middle value
If even number of values: Median is the average of the two middle values

Example (odd): 2, 5, 7, 9, 12 → Median = 7 (middle value)
Example (even): 3, 5, 8, 10 → Median = (5 + 8) ÷ 2 = 6.5
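
In code: one possible way to write the steps above in Python. The function name `find_median` is a hypothetical helper, and the sketch assumes a non-empty list of numbers.

```python
def find_median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # Step 1: arrange in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: take the middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(find_median([2, 5, 7, 9, 12]))  # odd example
print(find_median([3, 5, 8, 10]))     # even example
```

Python's standard library also provides `statistics.median`, which follows the same rule.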

Mode (Most Frequent)

Mode: The value that appears most often in the dataset

No mode: All values appear the same number of times
Unimodal: One mode (one peak)
Bimodal: Two modes (two peaks)
Multimodal: More than two modes

Example: 3, 7, 7, 9, 12, 7, 15 → Mode = 7 (appears 3 times)
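
In code: a short Python sketch that returns every mode, so bimodal and "no mode" cases are visible. The helper name `find_modes` is an assumption for illustration, and it follows this guide's convention that a dataset has no mode when all of its (two or more) distinct values appear equally often.

```python
from collections import Counter

def find_modes(values):
    """Return a sorted list of modes; an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    # All distinct values tie: treat as "no mode"
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(find_modes([3, 7, 7, 9, 12, 7, 15]))  # unimodal example from above
print(find_modes([1, 1, 2, 2, 3]))          # bimodal
print(find_modes([1, 2, 3]))                # no mode
```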

When to Use Each Measure

Measure	Best Used When...	Advantages	Disadvantages
Mean	Data is fairly symmetric with no extreme outliers	Uses all data points; familiar to everyone	Very sensitive to outliers
Median	Data has outliers or is skewed	Not affected by outliers; better for skewed data	Doesn't use all information
Mode	Categorical data or finding most common value	Easy to identify; works with any data type	May not exist or may not be unique

CRITICAL: For salary data, home prices, and other datasets with extreme values, the median is usually more representative than the mean!

2. Measures of Spread (Variability)

Measures of Spread: Values that describe how spread out or dispersed the data is. They tell us how much variability exists in the dataset.

Range

Range = Maximum − Minimum

Example: Data: 5, 12, 8, 20, 15
Range = 20 − 5 = 15

Limitation: Only uses two values (ignores everything in between); very sensitive to outliers
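
In code: a small Python sketch of the range and its sensitivity to a single extreme value. The added value 200 is made up for illustration.

```python
data = [5, 12, 8, 20, 15]

# Range = Maximum - Minimum
data_range = max(data) - min(data)
print(data_range)

# One extreme value changes the range drastically
with_outlier = data + [200]  # hypothetical outlier
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)
```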

Interquartile Range (IQR)

IQR = Q3 − Q1

Where: Q1 = first quartile (25th percentile), Q3 = third quartile (75th percentile)

IQR: The range of the middle 50% of the data. It measures spread while being resistant to outliers.

How to Find IQR:

Arrange data in order
Find the median (Q2)
Q1 = median of the lower half (below Q2)
Q3 = median of the upper half (above Q2)
IQR = Q3 − Q1

Example: 3, 5, 7, 9, 11, 13, 15, 17, 19
Median (Q2) = 11
Lower half: 3, 5, 7, 9 → Q1 = (5 + 7) ÷ 2 = 6
Upper half: 13, 15, 17, 19 → Q3 = (15 + 17) ÷ 2 = 16
IQR = 16 − 6 = 10
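
In code: a Python sketch of the median-of-halves method described above. The helper name `quartiles` is hypothetical. Other quartile conventions exist, so statistical software may report slightly different values for Q1 and Q3 on the same data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the median itself is left out of both halves
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([3, 5, 7, 9, 11, 13, 15, 17, 19])
iqr = q3 - q1
print(q1, q2, q3, iqr)
```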

Variance

s² = Σ(x − x̄)² / (n − 1)

Where: s² = variance, x = each value, x̄ = mean, n = sample size

Variance: Roughly the average of the squared deviations from the mean (the sample formula divides by n − 1 rather than n). It measures how far values tend to be from the mean, in squared units.

Step-by-Step Calculation:

Calculate the mean (x̄)
For each value, find (x − x̄)
Square each deviation: (x − x̄)²
Sum all squared deviations: Σ(x − x̄)²
Divide by (n − 1)

Example: Data: 4, 6, 8, 10
Mean = (4 + 6 + 8 + 10) ÷ 4 = 7

x	x − x̄	(x − x̄)²
4	−3	9
6	−1	1
8	1	1
10	3	9
Sum:		20

s² = 20 ÷ (4 − 1) = 20 ÷ 3 = 6.67
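
In code: the step-by-step calculation above as a Python sketch, checked against the standard library's `statistics.variance`, which also divides by n − 1. Variable names are illustrative.

```python
import statistics

data = [4, 6, 8, 10]

mean = sum(data) / len(data)                      # Step 1: mean
deviations = [x - mean for x in data]             # Step 2: x - mean
squared = [d ** 2 for d in deviations]            # Step 3: square each
sum_squared = sum(squared)                        # Step 4: sum
sample_variance = sum_squared / (len(data) - 1)   # Step 5: divide by n - 1

print(round(sample_variance, 2))
print(round(statistics.variance(data), 2))        # same result
```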

Standard Deviation

s = √[Σ(x − x̄)² / (n − 1)]

Simplified: s = √(variance)

Standard Deviation (SD): The square root of the variance. It measures the typical distance of values from the mean, in the same units as the data.

Example (continued from variance):
s = √6.67 = 2.58
Interpretation: Values typically lie about 2.58 units away from the mean.

KEY DIFFERENCE: Variance is in squared units; standard deviation is in original units (easier to interpret!)
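
In code: a brief Python sketch that takes the square root of the sample variance for the data 4, 6, 8, 10 and compares it with `statistics.stdev`. Variable names are illustrative.

```python
import math
import statistics

data = [4, 6, 8, 10]

sample_variance = statistics.variance(data)   # divides by n - 1
sample_sd = math.sqrt(sample_variance)        # back to original units

print(round(sample_sd, 2))
print(round(statistics.stdev(data), 2))       # same result
```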

The Empirical Rule (68-95-99.7 Rule)

For bell-shaped (normal) distributions:

~68% of data falls within 1 standard deviation of the mean (x̄ ± 1s)
~95% of data falls within 2 standard deviations of the mean (x̄ ± 2s)
~99.7% of data falls within 3 standard deviations of the mean (x̄ ± 3s)

Example: Suppose SAT scores are bell-shaped with mean = 1050 and SD = 100
• About 68% of students score between 950 and 1150 (1050 ± 100)
• About 95% score between 850 and 1250 (1050 ± 200)
• About 99.7% score between 750 and 1350 (1050 ± 300)

IMPORTANT: The Empirical Rule only works for bell-shaped (approximately normal) distributions!
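
In code: a Python sketch that checks the three percentages against an exactly normal model, using `statistics.NormalDist` (Python 3.8+). The mean of 1050 and SD of 100 are the example's values; treating the scores as exactly normal is an assumption made for illustration.

```python
from statistics import NormalDist

# Assumed model: exactly normal with the example's mean and SD
sat = NormalDist(mu=1050, sigma=100)

def share_within(k):
    """Proportion of the distribution within k SDs of the mean."""
    low = sat.mean - k * sat.stdev
    high = sat.mean + k * sat.stdev
    return sat.cdf(high) - sat.cdf(low)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The printed proportions are close to 68%, 95%, and 99.7%, which is where the rule gets its name.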

3. Distribution Shapes and Skewness

Distribution Shape: The overall pattern of how data values are spread out. Understanding shape helps us choose appropriate summary statistics.

Three Main Shapes

Symmetric Distribution

Characteristics:

Left and right sides are mirror images
Mean ≈ Median ≈ Mode (all approximately equal)
Data is balanced around the center

Examples: Heights of adults, IQ scores, measurement errors, test scores (well-designed tests)

Right-Skewed (Positively Skewed)

Characteristics:

Long tail extends to the right
Mode < Median < Mean
Mean is pulled toward the tail by extreme high values

Examples: Income, home prices, age at first marriage, company revenues

Left-Skewed (Negatively Skewed)

Characteristics:

Long tail extends to the left
Mean < Median < Mode
Mean is pulled toward the tail by extreme low values

Examples: Age at death, exam scores (very easy test)

Mean vs. Median Relationship by Shape

Distribution Shape	Relationship	Best Measure of Center
Symmetric	Mean ≈ Median ≈ Mode	Mean (uses all data)
Right-Skewed	Mode < Median < Mean	Median (not affected by high outliers)
Left-Skewed	Mean < Median < Mode	Median (not affected by low outliers)

QUICK TIP: The mean is usually pulled in the direction of the skew (toward the tail)!
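
In code: a Python sketch with a small made-up right-skewed sample, showing the mode, median, and mean falling in the order described for right-skewed data. The data values are hypothetical.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mode = statistics.mode(data)
median = statistics.median(data)
mean = statistics.mean(data)

print(mode, median, mean)
print(mode < median < mean)  # the high values in the tail pull the mean up
```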

Impact of Outliers

Outlier: A data value that is unusually far from the rest of the data

Statistic	Affected by Outliers?	Called...
Mean	YES - very sensitive	Not resistant
Median	NO - stays stable	Resistant
Mode	NO - unaffected	Resistant
Range	YES - very sensitive	Not resistant
IQR	NO - stays stable	Resistant
Standard Deviation	YES - very sensitive	Not resistant
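
In code: a Python sketch comparing several statistics before and after adding one extreme value. The salary figures (in thousands) are invented for illustration, and `summary` is a hypothetical helper.

```python
import statistics

salaries = [30, 32, 35, 38, 40]    # hypothetical salaries, in thousands
with_outlier = salaries + [500]    # add one extreme value

def summary(values):
    """Collect a few statistics for side-by-side comparison."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }

before = summary(salaries)
after = summary(with_outlier)
for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

The mean, range, and standard deviation change sharply, while the median moves only slightly.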

4. Five-Number Summary & Boxplots

Five-Number Summary

Five-Number Summary: A set of five values that summarizes the center, spread, and extremes of a dataset

The Five Numbers:

Minimum (Min): Smallest value
First Quartile (Q1): 25th percentile
Median (Q2): 50th percentile (middle value)
Third Quartile (Q3): 75th percentile
Maximum (Max): Largest value

Example: Data: 2, 4, 6, 8, 10, 12, 14, 16, 18
• Min = 2
• Q1 = 5 (median of 2, 4, 6, 8)
• Median (Q2) = 10
• Q3 = 15 (median of 12, 14, 16, 18)
• Max = 18
Five-Number Summary: {2, 5, 10, 15, 18}
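
In code: a Python sketch that computes the five numbers with the median-of-halves method for the quartiles. The function name `five_number_summary` is hypothetical, and software that uses a different quartile convention may give slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using median-of-halves quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the median is excluded from both halves
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

result = five_number_summary([2, 4, 6, 8, 10, 12, 14, 16, 18])
print(result)
```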

Boxplots (Box-and-Whisker Plots)

Boxplot: A visual display of the five-number summary that shows the distribution shape and identifies potential outliers

Boxplot Components:

Box: Extends from Q1 to Q3 (contains middle 50% of data)
Line inside box: Shows the median (Q2)
Whiskers: Extend to the smallest and largest values that lie within 1.5×IQR of the box (that is, inside the fences)
Individual points: Outliers beyond the whiskers

Outlier Detection Rule (1.5×IQR Rule)

Lower Fence = Q1 − (1.5 × IQR)

Upper Fence = Q3 + (1.5 × IQR)

Any value below the lower fence or above the upper fence is considered an outlier

Example: Q1 = 30, Q3 = 50, IQR = 20
Lower Fence = 30 − (1.5 × 20) = 30 − 30 = 0
Upper Fence = 50 + (1.5 × 20) = 50 + 30 = 80
Outliers: Any value < 0 or > 80
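
In code: a Python sketch of the 1.5×IQR rule. The helper names `iqr_fences` and `find_outliers`, and the sample values passed in, are assumptions for illustration.

```python
def iqr_fences(q1, q3):
    """Return (lower fence, upper fence) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return values strictly below the lower fence or above the upper fence."""
    lower, upper = iqr_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

lower, upper = iqr_fences(30, 50)
print(lower, upper)

# Hypothetical values: 0 and 80 sit on the fences, so they are not flagged
flagged = find_outliers([-5, 0, 45, 80, 95], 30, 50)
print(flagged)
```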

Reading Boxplots

What Boxplots Tell Us:

Center: Location of the median line
Spread: Width of the box and length of whiskers
Shape:
- Symmetric: Median centered in box, equal whiskers
- Right-skewed: Median left of center, longer right whisker
- Left-skewed: Median right of center, longer left whisker
Outliers: Individual points beyond whiskers

Comparing Distributions with Boxplots

Side-by-side boxplots let us compare:

Centers: Which group has higher/lower median?
Spread: Which group is more variable (wider box/longer whiskers)?
Shape: Which group is more symmetric/skewed?
Outliers: Which group has unusual values?

Quick Reference: All Formulas

Measures of Center

Mean: x̄ = Σx / n

Median: Middle value when ordered

Mode: Most frequent value

Measures of Spread

Range = Max − Min

IQR = Q3 − Q1

Variance: s² = Σ(x − x̄)² / (n − 1)

Standard Deviation: s = √s²

Empirical Rule (Bell-Shaped Distributions)

68% within x̄ ± 1s

95% within x̄ ± 2s

99.7% within x̄ ± 3s

Outlier Detection

Lower Fence = Q1 − 1.5×IQR

Upper Fence = Q3 + 1.5×IQR

Distribution Shapes

Symmetric: Mean ≈ Median ≈ Mode

Right-Skewed: Mode < Median < Mean

Left-Skewed: Mean < Median < Mode

Important Reminders

1. Use median for skewed data or data with outliers
2. Use mean for symmetric data without outliers
3. IQR is resistant to outliers; range and SD are not
4. The Empirical Rule only works for bell-shaped distributions
5. The mean is usually pulled in the direction of the skew (toward the tail)
6. A boxplot shows the five-number summary visually
7. Standard deviation has the same units as the data

Module 2: Descriptive Statistics

Free Statistics Learning Platform • Safaa Dabagh • sdabagh.github.io
