Module 2 Study Guide
Descriptive Statistics
Free Statistics Learning Platform • Safaa Dabagh
1. Measures of Center
The Three Main Measures
Mean (Average)
Mean = (Sum of all values) ÷ (Number of values)
x̄ = Σx / n
Where: x̄ = mean, Σx = sum of all values, n = number of values
Example: 85, 90, 78, 92, 88 → Mean = (85 + 90 + 78 + 92 + 88) ÷ 5 = 433 ÷ 5 = 86.6
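A minimal Python sketch of the mean formula, applied to the five example values above. The function name `mean` is our own choice for illustration; Python's standard library also provides `statistics.mean`.

```python
def mean(values):
    """Arithmetic mean: x̄ = Σx / n."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)   # Σx divided by n

scores = [85, 90, 78, 92, 88]          # the example values above
print(round(mean(scores), 2))          # 86.6
```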
Median (Middle Value)
- Arrange data in order from smallest to largest
- If odd number of values: Median is the middle value
- If even number of values: Median is the average of the two middle values
Example (even): 3, 5, 8, 10 → Median = (5 + 8) ÷ 2 = 6.5
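The median steps above (sort, then take the middle value or average the two middle values) can be sketched as follows. The odd-length list `[7, 1, 4]` is a hypothetical example added for illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)            # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd n: one middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middles

print(median([3, 5, 8, 10]))   # 6.5
print(median([7, 1, 4]))       # 4
```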
Mode (Most Frequent)
- No mode: All values appear the same number of times
- Unimodal: One mode (one peak)
- Bimodal: Two modes (two peaks)
- Multimodal: More than two modes
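One way to sketch the mode categories in Python, assuming the guide's rule that "no mode" means every distinct value appears equally often. The function `modes` and its labels are illustrative, not a standard API. A data set with only one distinct value (e.g. `[5, 5, 5]`) is treated here as unimodal, which is a judgment call.

```python
from collections import Counter

def modes(values):
    """Return (list_of_modes, label) following the categories above."""
    if not values:
        raise ValueError("modes requires at least one value")
    counts = Counter(values)              # value -> number of appearances
    top = max(counts.values())            # highest frequency
    # "No mode": two or more distinct values, all appearing equally often
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return [], "no mode"
    most = [v for v, c in counts.items() if c == top]
    if len(most) == 1:
        label = "unimodal"
    elif len(most) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return most, label

print(modes([2, 3, 3, 5]))             # ([3], 'unimodal')
print(modes([1, 1, 2, 4, 4]))          # ([1, 4], 'bimodal')
print(modes([1, 2, 3, 4]))             # ([], 'no mode')
print(modes(["red", "blue", "red"]))   # (['red'], 'unimodal')
```

Because `Counter` only counts, the same function works for categorical data such as colors, matching the table's note that the mode works with any data type.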
When to Use Each Measure
| Measure | Best Used When... | Advantages | Disadvantages |
|---|---|---|---|
| Mean | Data is fairly symmetric with no extreme outliers | Uses all data points; familiar to everyone | Very sensitive to outliers |
| Median | Data has outliers or is skewed | Not affected by outliers; better for skewed data | Doesn't use all information |
| Mode | Categorical data or finding most common value | Easy to identify; works with any data type | May not exist or may not be unique |
2. Measures of Spread (Variability)
Range
Range = Maximum − Minimum
Example: Max = 20, Min = 5 → Range = 20 − 5 = 15
Limitation: Only uses two values (ignores everything in between); very sensitive to outliers
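A sketch of the range formula. Both lists are hypothetical; the second shows how a single extreme value inflates the range even though the other values are unchanged. The function is named `data_range` to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = Maximum − Minimum (only the two extreme values matter)."""
    return max(values) - min(values)

print(data_range([5, 12, 20]))       # 15
print(data_range([5, 12, 20, 95]))   # 90: one extreme value changes the range a lot
```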
Interquartile Range (IQR)
IQR = Q3 − Q1
Where: Q1 = first quartile (25th percentile), Q3 = third quartile (75th percentile)
- Arrange data in order
- Find the median (Q2)
- Q1 = median of the lower half (below Q2)
- Q3 = median of the upper half (above Q2)
- IQR = Q3 − Q1
Example: 3, 5, 7, 9, 11, 13, 15, 17, 19 (n = 9)
Median (Q2) = 11
Lower half: 3, 5, 7, 9 → Q1 = (5 + 7) ÷ 2 = 6
Upper half: 13, 15, 17, 19 → Q3 = (15 + 17) ÷ 2 = 16
IQR = 16 − 6 = 10
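A sketch of the quartile steps above, assuming the nine values 3, 5, 7, 9, 11, 13, 15, 17, 19 (consistent with the median and halves in the example). It follows the guide's "median of each half" convention, leaving the middle value out of both halves when n is odd. Statistical software often uses other quartile conventions (for example, Python's `statistics.quantiles` or NumPy's percentile methods), so results can differ slightly on some data sets.

```python
def median_of_sorted(s):
    """Median of an already-sorted, non-empty list."""
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    s = sorted(values)                  # arrange data in order
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values for quartiles")
    half = n // 2
    lower = s[:half]                    # values below the median
    upper = s[half + n % 2:]            # values above the median (middle skipped if n is odd)
    return median_of_sorted(lower), median_of_sorted(s), median_of_sorted(upper)

data = [3, 5, 7, 9, 11, 13, 15, 17, 19]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, "IQR =", q3 - q1)     # 6.0 11 16.0 IQR = 10.0
```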
Variance
s² = Σ(x − x̄)² / (n − 1)
Where: s² = variance, x = each value, x̄ = mean, n = sample size
- Calculate the mean (x̄)
- For each value, find (x − x̄)
- Square each deviation: (x − x̄)²
- Sum all squared deviations: Σ(x − x̄)²
- Divide by (n − 1)
Example: 4, 6, 8, 10 → Mean = (4 + 6 + 8 + 10) ÷ 4 = 7
| x | x − x̄ | (x − x̄)² |
|---|---|---|
| 4 | −3 | 9 |
| 6 | −1 | 1 |
| 8 | 1 | 1 |
| 10 | 3 | 9 |
| Sum: | 0 | 20 |
s² = 20 ÷ (4 − 1) = 20 ÷ 3 = 6.67
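The five variance steps can be traced in a short Python sketch using the example data 4, 6, 8, 10. It also prints the plain sum of the deviations x − x̄, which is always 0; that cancellation is one reason the deviations are squared before summing.

```python
def sample_variance(values):
    """s² = Σ(x − x̄)² / (n − 1), following the five steps above."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n                    # Step 1: the mean
    deviations = [x - x_bar for x in values]   # Step 2: x − x̄ for each value
    squared = [d ** 2 for d in deviations]     # Step 3: (x − x̄)²
    total = sum(squared)                       # Step 4: Σ(x − x̄)²
    return total / (n - 1)                     # Step 5: divide by n − 1

data = [4, 6, 8, 10]
x_bar = sum(data) / len(data)
print(sum(x - x_bar for x in data))        # 0.0: the deviations cancel out
print(round(sample_variance(data), 2))     # 6.67
```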
Standard Deviation
s = √[Σ(x − x̄)² / (n − 1)]
Simplified: s = √(variance)
s = √6.67 = 2.58
Interpretation: Roughly speaking, a typical value lies about 2.58 units from the mean.
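Python's standard `statistics` module can confirm the standard deviation for the same data. Its `variance` and `stdev` functions use the sample formulas with n − 1, like the ones above, and `stdev` returns the square root of the variance in one call.

```python
import math
import statistics

data = [4, 6, 8, 10]
variance = statistics.variance(data)       # sample variance (divides by n − 1)
s = math.sqrt(variance)                    # s = √(variance)
print(round(variance, 2), round(s, 2))     # 6.67 2.58
print(round(statistics.stdev(data), 2))    # 2.58 (same result in one call)
```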
The Empirical Rule (68-95-99.7 Rule)
For bell-shaped (normal) distributions:
- ~68% of data falls within 1 standard deviation of the mean (x̄ ± 1s)
- ~95% of data falls within 2 standard deviations of the mean (x̄ ± 2s)
- ~99.7% of data falls within 3 standard deviations of the mean (x̄ ± 3s)
Example: Test scores are bell-shaped with x̄ = 1050 and s = 100.
• About 68% of students score between 950 and 1150 (1050 ± 100)
• About 95% score between 850 and 1250 (1050 ± 200)
• About 99.7% score between 750 and 1350 (1050 ± 300)
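A sketch that builds the three Empirical Rule intervals for the example mean 1050 and standard deviation 100. The helper `empirical_rule` is hypothetical; the second loop uses the standard library's `statistics.NormalDist` to show where the rounded 68%, 95%, and 99.7% figures come from when the distribution is exactly normal.

```python
from statistics import NormalDist

def empirical_rule(mean, sd):
    """Return {k: (low, high, approx_share)} for the intervals mean ± k·sd, k = 1, 2, 3."""
    shares = {1: 0.68, 2: 0.95, 3: 0.997}
    return {k: (mean - k * sd, mean + k * sd, p) for k, p in shares.items()}

for k, (low, high, p) in empirical_rule(1050, 100).items():
    print(f"About {p:.1%} between {low} and {high} (x̄ ± {k}s)")

# Exact shares for a normal distribution: about 0.6827, 0.9545, 0.9973
nd = NormalDist(mu=1050, sigma=100)
for k in (1, 2, 3):
    exact = nd.cdf(1050 + k * 100) - nd.cdf(1050 - k * 100)
    print(k, round(exact, 4))
```

Remember that these percentages only hold approximately, and only for roughly bell-shaped data.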
3. Distribution Shapes and Skewness
Three Main Shapes
Symmetric Distribution
Characteristics:
- Left and right sides are mirror images
- Mean ≈ Median ≈ Mode (all approximately equal)
- Data is balanced around the center
Right-Skewed (Positively Skewed)
Characteristics:
- Long tail extends to the right
- Mode < Median < Mean
- Mean is pulled toward the tail by extreme high values
Left-Skewed (Negatively Skewed)
Characteristics:
- Long tail extends to the left
- Mean < Median < Mode
- Mean is pulled toward the tail by extreme low values
Mean vs. Median Relationship by Shape
| Distribution Shape | Relationship | Best Measure of Center |
|---|---|---|
| Symmetric | Mean ≈ Median ≈ Mode | Mean (uses all data) |
| Right-Skewed | Mode < Median < Mean | Median (not affected by high outliers) |
| Left-Skewed | Mean < Median < Mode | Median (not affected by low outliers) |
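As a rough check of the table above, the sign of mean − median can hint at the direction of skew. This is a heuristic sketch, not a formal test: the function `shape_hint` and its tolerance `tol` (how small a gap, relative to the standard deviation, counts as "about equal") are illustrative assumptions, and the three data sets are hypothetical. A graph of the data is still the best way to judge shape.

```python
import statistics

def shape_hint(values, tol=0.1):
    """Heuristic shape label from the mean-median gap (needs at least two values)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    gap = mean - median
    if sd == 0 or abs(gap) <= tol * sd:   # mean ≈ median
        return "roughly symmetric"
    # mean above median suggests a right tail; mean below median suggests a left tail
    return "right-skewed" if gap > 0 else "left-skewed"

print(shape_hint([1, 2, 2, 3, 3, 3, 4, 4, 5]))        # roughly symmetric
print(shape_hint([20, 22, 23, 25, 26, 28, 30, 95]))   # right-skewed
print(shape_hint([5, 60, 62, 65, 66, 68, 70]))        # left-skewed
```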
Impact of Outliers
| Statistic | Affected by Outliers? | Called... |
|---|---|---|
| Mean | YES - very sensitive | Not resistant |
| Median | NO - stays stable | Resistant |
| Mode | NO - unaffected | Resistant |
| Range | YES - very sensitive | Not resistant |
| IQR | NO - stays stable | Resistant |
| Standard Deviation | YES - very sensitive | Not resistant |
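To see the table in action, this sketch computes each statistic for a hypothetical data set and again after replacing the largest value with an extreme one. The helper names `iqr` and `describe` are illustrative; `iqr` uses the guide's median-of-halves quartiles.

```python
import statistics

def iqr(values):
    """IQR by the median-of-halves method (middle value excluded when n is odd)."""
    s = sorted(values)
    half = len(s) // 2
    lower, upper = s[:half], s[half + len(s) % 2:]
    return statistics.median(upper) - statistics.median(lower)

def describe(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "IQR": iqr(values),
        "SD": statistics.stdev(values),
    }

clean = [4, 6, 8, 10, 12, 14, 16, 18, 20]    # hypothetical data
with_outlier = clean[:-1] + [200]            # replace the largest value with an outlier

before, after = describe(clean), describe(with_outlier)
for name in before:
    print(f"{name:>6}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

The median (12) and IQR (10) stay the same, while the mean, range, and standard deviation all increase sharply.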
4. Five-Number Summary & Boxplots
Five-Number Summary
The Five Numbers:
- Minimum (Min): Smallest value
- First Quartile (Q1): 25th percentile
- Median (Q2): 50th percentile (middle value)
- Third Quartile (Q3): 75th percentile
- Maximum (Max): Largest value
Example: 2, 4, 6, 8, 10, 12, 14, 16, 18 (n = 9)
• Min = 2
• Q1 = 5 (median of 2, 4, 6, 8)
• Median (Q2) = 10
• Q3 = 15 (median of 12, 14, 16, 18)
• Max = 18
Five-Number Summary: {2, 5, 10, 15, 18}
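A sketch of the five-number summary using the median-of-halves quartiles, applied to the nine values 2, 4, 6, 8, 10, 12, 14, 16, 18, which match the halves shown in the example above. `five_number_summary` is an illustrative name, not a library function.

```python
def median_of_sorted(s):
    """Median of an already-sorted, non-empty list."""
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max) with quartiles as medians of the lower and upper halves."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median_of_sorted(s[:half])             # lower half
    q3 = median_of_sorted(s[half + n % 2:])     # upper half (middle skipped if n is odd)
    return s[0], q1, median_of_sorted(s), q3, s[-1]

print(five_number_summary([2, 4, 6, 8, 10, 12, 14, 16, 18]))   # (2, 5.0, 10, 15.0, 18)
```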
Boxplots (Box-and-Whisker Plots)
- Box: Extends from Q1 to Q3 (contains middle 50% of data)
- Line inside box: Shows the median (Q2)
- Whiskers: Extend from the box to the smallest and largest values that are not outliers (within 1.5×IQR below Q1 and above Q3)
- Individual points: Outliers beyond the whiskers
Outlier Detection Rule (1.5×IQR Rule)
Lower Fence = Q1 − (1.5 × IQR)
Upper Fence = Q3 + (1.5 × IQR)
Any value below the lower fence or above the upper fence is considered an outlier
Example: Q1 = 30 and Q3 = 50, so IQR = 50 − 30 = 20
Lower Fence = 30 − (1.5 × 20) = 30 − 30 = 0
Upper Fence = 50 + (1.5 × 20) = 50 + 30 = 80
Outliers: Any value < 0 or > 80
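A sketch of the 1.5×IQR rule. `fences` reproduces the example above, and `boxplot_parts` (a hypothetical helper) computes the median-of-halves quartiles, fences, outliers, and whisker ends for a small hypothetical data set, with the whiskers stopping at the most extreme values that are not outliers.

```python
import statistics

def fences(q1, q3, k=1.5):
    """Lower and upper fences: Q1 − k·IQR and Q3 + k·IQR (k = 1.5 is the usual rule)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def boxplot_parts(values, k=1.5):
    """Quartiles, fences, outliers and whisker ends for a boxplot (needs at least two values)."""
    s = sorted(values)
    half = len(s) // 2
    q1 = statistics.median(s[:half])
    q3 = statistics.median(s[half + len(s) % 2:])
    low, high = fences(q1, q3, k)
    outliers = [v for v in s if v < low or v > high]
    inside = [v for v in s if low <= v <= high]
    # Whiskers stop at the most extreme values that are not outliers
    return {"Q1": q1, "Q3": q3, "fences": (low, high),
            "outliers": outliers, "whiskers": (inside[0], inside[-1])}

print(fences(30, 50))                            # (0.0, 80.0), the example above
print(boxplot_parts([1, 3, 4, 5, 6, 7, 8, 30]))
```

For `[1, 3, 4, 5, 6, 7, 8, 30]`, Q1 = 3.5 and Q3 = 7.5, so the fences are −2.5 and 13.5; 30 is flagged as an outlier and the whiskers run from 1 to 8.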
Reading Boxplots
- Center: Location of the median line
- Spread: Width of the box and length of whiskers
- Shape:
  - Symmetric: Median centered in the box, whiskers of about equal length
  - Right-skewed: Median left of center, longer right whisker
  - Left-skewed: Median right of center, longer left whisker
- Outliers: Individual points beyond whiskers
Comparing Distributions with Boxplots
Side-by-side boxplots let us compare:
- Centers: Which group has higher/lower median?
- Spread: Which group is more variable (wider box/longer whiskers)?
- Shape: Which group is more symmetric/skewed?
- Outliers: Which group has unusual values?
Quick Reference: All Formulas
Measures of Center
Mean: x̄ = Σx / n
Median: Middle value when ordered
Mode: Most frequent value
Measures of Spread
Range = Max − Min
IQR = Q3 − Q1
Variance: s² = Σ(x − x̄)² / (n − 1)
Standard Deviation: s = √s²
Empirical Rule (Bell-Shaped Distributions)
68% within x̄ ± 1s
95% within x̄ ± 2s
99.7% within x̄ ± 3s
Outlier Detection
Lower Fence = Q1 − 1.5×IQR
Upper Fence = Q3 + 1.5×IQR
Distribution Shapes
Symmetric: Mean ≈ Median ≈ Mode
Right-Skewed: Mode < Median < Mean
Left-Skewed: Mean < Median < Mode
Important Reminders
1. Use the mean for symmetric data without outliers
2. IQR is resistant to outliers; range and SD are not
3. The Empirical Rule only applies to bell-shaped distributions
4. The mean is usually pulled toward the tail, in the direction of the skew
5. A boxplot shows the five-number summary visually
6. Standard deviation has the same units as the data (variance is in squared units)
Module 2: Descriptive Statistics
Free Statistics Learning Platform • Safaa Dabagh • sdabagh.github.io
© 2025 • Part of UCLA Dissertation Research