25-30 minutes
Lesson 3: Distribution Shapes and Skewness
Learning Objectives
By the end of this lesson, you will be able to:
- Identify and describe symmetric distributions
- Recognize right-skewed and left-skewed distributions
- Understand the relationship between mean, median, and mode for different distribution shapes
- Identify outliers and understand their impact on distribution shape
- Choose appropriate measures of center and spread based on distribution shape
- Recognize real-world examples of each distribution type
What is Distribution Shape?
When we plot data on a graph (like a histogram or dot plot), the overall shape of the data tells us important information about the dataset.
Understanding distribution shape helps us:
- Choose the right summary statistics (mean vs. median)
- Understand the data's story (Are most values clustered? Are there outliers?)
- Make predictions about future observations
- Identify unusual patterns that might need investigation
Distribution Shape: The overall pattern of how data values are spread out. The three most important shapes are:
- Symmetric - Data is evenly distributed on both sides of the center
- Right-Skewed - Data has a long tail extending to the right (high values)
- Left-Skewed - Data has a long tail extending to the left (low values)
Symmetric Distributions
Symmetric Distribution: A distribution where the left and right sides are approximately mirror images of each other.
[Figure: Symmetric distribution, balanced on both sides of the center]
Relationship of Measures in Symmetric Distributions:
Mean ≈ Median ≈ Mode
When data is symmetric, all three measures of center are approximately equal and located at the center of the distribution.
Example 1: Symmetric Data
Heights of adult women (inches):
60, 62, 63, 64, 64, 65, 65, 65, 66, 66, 67, 68, 70
Calculate the measures:
- Mean: 845 / 13 = 65 inches
- Median: 65 inches (middle value)
- Mode: 65 inches (appears 3 times)
All three measures agree! (65 = 65 = 65)
Heights tend to be symmetric because most people cluster around average height, with fewer people at extreme heights (very short or very tall).
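The three measures for Example 1 can be checked with a few lines of Python. This is a minimal sketch using the standard-library `statistics` module; the variable names are illustrative and not part of the lesson.

```python
import statistics

# Heights of adult women (inches), copied from Example 1
heights = [60, 62, 63, 64, 64, 65, 65, 65, 66, 66, 67, 68, 70]

mean_height = statistics.mean(heights)      # arithmetic average
median_height = statistics.median(heights)  # middle value of the sorted data
mode_height = statistics.mode(heights)      # most frequent value

print(f"mean={mean_height:.2f}  median={median_height}  mode={mode_height}")
```

When the three printed values land on (or very near) the same number, that is a quick numerical hint that the data is roughly symmetric.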
When to Expect Symmetric Distributions:
- Human measurements: Heights, weights (in homogeneous groups)
- Test scores: When tests are well-designed
- Manufacturing data: Product dimensions with quality control
- Measurement errors: Random errors in scientific measurements
Which Measure of Center to Use?
For symmetric distributions: Mean or Median – both work well!
The mean is usually preferred because it uses all data values, but the median is just as representative when data is symmetric.
Right-Skewed Distributions (Positive Skew)
Right-Skewed Distribution: A distribution with a long tail extending to the right. Most values are concentrated on the left, with a few unusually high values pulling the tail to the right.
Also called: Positively skewed (because the tail goes toward positive/high numbers)
[Figure: Right-skewed distribution, with the tail extending to the right]
Relationship of Measures in Right-Skewed Distributions:
Mode < Median < Mean (the typical ordering)
The mean gets "pulled" toward the tail by the few high values. The median stays near the bulk of the data. The mode is where most values cluster (left side).
Example 2: Right-Skewed Data (Household Income)
Annual household incomes in a neighborhood (thousands):
$35k, $40k, $42k, $45k, $48k, $50k, $52k, $55k, $60k, $180k
Calculate the measures:
- Mode: No mode (every value occurs once; most values cluster in the $40-55k range)
- Median: (48 + 50) / 2 = $49k
- Mean: 607k / 10 = $60.7k
Mean > Median! The $180k income pulled the mean up, but the median stays at $49k.
Which is more representative? The median ($49k) better represents a "typical" household. Nine out of ten households earn ≤$60k. The mean ($60.7k) is inflated by one high earner.
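The gap between mean and median in Example 2 can be reproduced as follows. This is a small sketch with the standard-library `statistics` module; counting the households at or below each measure is an extra check added here for illustration.

```python
import statistics

# Annual household incomes in thousands of dollars, copied from Example 2
incomes = [35, 40, 42, 45, 48, 50, 52, 55, 60, 180]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Count how many households fall at or below each measure of center
at_or_below_mean = sum(1 for x in incomes if x <= mean_income)
at_or_below_median = sum(1 for x in incomes if x <= median_income)

print(f"mean   = ${mean_income}k, households at or below it: {at_or_below_mean} of {len(incomes)}")
print(f"median = ${median_income}k, households at or below it: {at_or_below_median} of {len(incomes)}")
```

Nine of the ten households sit at or below the mean, while the median splits the group in half, which is why the median reads as the more typical value here.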
Common Right-Skewed Distributions:
- Income and wealth: Most people earn moderate amounts, few earn very high amounts
- Home prices: Most homes in affordable range, few mansions
- Company sizes: Many small businesses, few giant corporations
- Response times: Most quick, occasional very slow responses
- City populations: Many small towns, few very large cities
Which Measure of Center to Use?
For right-skewed distributions: Use the MEDIAN!
The mean is misleading because it's pulled way up by outliers. The median better represents the "typical" value.
Example: Median household income is more meaningful than mean household income because of wealthy outliers.
Left-Skewed Distributions (Negative Skew)
Left-Skewed Distribution: A distribution with a long tail extending to the left. Most values are concentrated on the right, with a few unusually low values pulling the tail to the left.
Also called: Negatively skewed (because the tail goes toward negative/low numbers)
[Figure: Left-skewed distribution, with the tail extending to the left]
Relationship of Measures in Left-Skewed Distributions:
Mean < Median < Mode (the typical ordering)
The mean gets "pulled" toward the tail by the few low values. The median stays near the bulk of the data. The mode is where most values cluster (right side).
Example 3: Left-Skewed Data (Test Scores)
Scores on an easy test (out of 100):
98, 97, 96, 95, 94, 93, 92, 91, 88, 85, 82, 60, 55
Calculate the measures:
- Mode: No mode (every score occurs once; most scores fall in the 90-98 range)
- Median: 92 (middle value of 13 scores)
- Mean: 1126 / 13 ≈ 86.62
Mean < Median! The two very low scores (55, 60) pulled the mean down.
Most students scored in the 90s (the test was easy!), but a couple of students struggled and scored much lower, creating a left tail.
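To see the left tail rather than just compute it, the scores from Example 3 can be drawn as a rough text histogram. This is only a sketch: the bin width of 10 points is an assumption made here, not something the lesson specifies.

```python
from collections import Counter

# Scores on an easy test, copied from Example 3
scores = [98, 97, 96, 95, 94, 93, 92, 91, 88, 85, 82, 60, 55]

BIN_WIDTH = 10  # assumed bin width; the lesson does not specify one

# Map each score to the lower edge of its bin (e.g. 94 -> 90)
counts = Counter((score // BIN_WIDTH) * BIN_WIDTH for score in scores)

# Print one row per bin, including empty bins, from lowest to highest
for low in range(min(counts), max(counts) + BIN_WIDTH, BIN_WIDTH):
    print(f"{low:3d}-{low + BIN_WIDTH - 1:<3d} | {'#' * counts[low]}")
```

The 90-99 row gets eight marks, the 80-89 row three, and the 50-59 and 60-69 rows one each, so the bulk of the data sits on the right with a thin tail trailing off to the left.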
Common Left-Skewed Distributions:
- Easy test scores: Most students score high, few score low
- Age at retirement: Most retire ~65, few retire much earlier
- Product lifespans with warranties: Most products last a long time, few fail early
- Age at death (in developed countries): Most live to old age, few die young
Note: Left-skewed distributions are less common in real life than right-skewed!
Which Measure of Center to Use?
For left-skewed distributions: Use the MEDIAN!
The mean is pulled down by low outliers and doesn't represent the typical value well.
Summary: Comparing Distribution Shapes
| Distribution Shape | Tail Location | Measure Relationship | Best Center Measure | Common Examples |
|---|---|---|---|---|
| Symmetric | No tail (balanced) | Mean ≈ Median ≈ Mode | Mean or Median | Heights, well-designed tests |
| Right-Skewed | Long tail to right | Mode < Median < Mean | Median | Income, home prices |
| Left-Skewed | Long tail to left | Mean < Median < Mode | Median | Easy test scores, retirement age |
Quick Rule for Choosing Center Measure:
Symmetric data? → Use mean (or median)
Skewed data? → Use median (resistant to outliers)
Categorical data? → Use mode (only option!)
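For numeric data, the quick rule can be sketched as a small helper. This is a rough heuristic, not a formal test of skewness: the function names and the `tolerance` cutoff of 0.1 standard deviations are assumptions chosen for this illustration.

```python
import statistics

def describe_shape(data, tolerance=0.1):
    """Label data as symmetric, right-skewed, or left-skewed.

    Compares the gap between mean and median to the sample standard
    deviation. `tolerance` is an arbitrary cutoff chosen for this sketch.
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    spread = statistics.stdev(data)
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "symmetric"
    return "right-skewed" if mean > median else "left-skewed"

def suggest_center(data):
    """Apply the quick rule: mean for symmetric data, median for skewed data."""
    return "mean" if describe_shape(data) == "symmetric" else "median"

# The three example datasets from this lesson
heights = [60, 62, 63, 64, 64, 65, 65, 65, 66, 66, 67, 68, 70]
incomes = [35, 40, 42, 45, 48, 50, 52, 55, 60, 180]
scores = [98, 97, 96, 95, 94, 93, 92, 91, 88, 85, 82, 60, 55]

for name, data in (("heights", heights), ("incomes", incomes), ("scores", scores)):
    print(f"{name}: {describe_shape(data)} -> report the {suggest_center(data)}")
```

A plot is still the better way to judge shape; comparing mean and median only catches skew that is strong enough to separate the two.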
Identifying Outliers
Outlier: An observation that is unusually far from the bulk of the data. Outliers can significantly affect the mean and create skewness.
Outliers often create the "tails" in skewed distributions. They're important to identify because they:
- Can dramatically affect the mean
- Might indicate data entry errors
- Might represent genuinely unusual observations worth investigating
- Influence which summary statistics to use
Example 4: Spotting Outliers
Daily coffee sales at a café:
$850, $920, $880, $900, $910, $870, $3200
Analysis:
- Six days: Sales in $850-920 range (pretty consistent)
- One day: $3,200 (way higher!)
Question: What happened on that day?
- Maybe they catered a big event?
- Maybe it's a data entry error ($320 entered as $3200)?
- Either way, $3,200 is an outlier
Impact:
- Mean with outlier: $1,218.57
- Median with outlier: $900
- The median ($900) better represents "typical daily sales"
What to Do with Outliers?
DON'T automatically delete them! Investigate first:
- Data error? Fix or remove it
- Genuine unusual observation? Keep it, but report it separately
- Different population? Consider analyzing separately
Best practice: Report measures both with and without outliers to show their impact.
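One way to produce such a with-and-without report for the café data in Example 4 is sketched below. The lesson does not name a rule for flagging outliers, so this sketch assumes the common 1.5 × IQR convention; flagged values are listed for investigation, not silently discarded.

```python
import statistics

# Daily coffee sales in dollars, copied from Example 4
sales = [850, 920, 880, 900, 910, 870, 3200]

# Assumption: flag values more than 1.5 * IQR beyond the quartiles
q1, _, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

flagged = [x for x in sales if x < low_fence or x > high_fence]
kept = [x for x in sales if low_fence <= x <= high_fence]

# Report both versions side by side so the outlier's impact is visible
for label, values in (("with outliers", sales), ("without outliers", kept)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{label:17s} mean={mean:8.2f}  median={median:7.1f}")
print("flagged for investigation:", flagged)
```

Dropping the flagged day moves the mean from about $1,218.57 to about $888.33, while the median only shifts from $900 to $890, which shows how much less sensitive the median is.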
Real-World Examples by Shape
Symmetric Examples
- Heights of adults (same gender)
- IQ scores (by design)
- SAT/ACT scores (scaled to be symmetric)
- Blood pressure in healthy adults
- Measurement errors in scientific instruments
Right-Skewed Examples
- Household income
- Home prices
- CEO salaries
- Number of social media followers
- City populations
- Wait times in emergency rooms
Left-Skewed Examples
- Age at death (in developed countries)
- Easy exam scores
- Time to complete a very easy task
- Retirement age
- Years of education (most finish high school)
Check Your Understanding
Test your knowledge of distribution shapes!
Question 1:
A dataset has Mean = 45, Median = 50, Mode = 52. What shape is this distribution?
Question 2:
Which distribution shape would you expect for ages of people in a retirement community?
Question 3:
Home prices in a city have Mean = $520k and Median = $380k. What does this tell you?
Question 4:
For a strongly right-skewed distribution of salaries, which measure of center should you report?
Question 5:
In a symmetric distribution, where is the mean located relative to the median?
Key Takeaways
- Symmetric: Mean ≈ Median ≈ Mode. Use mean or median. Examples: heights, IQ scores.
- Right-Skewed: Mode < Median < Mean. Use median! Examples: income, home prices.
- Left-Skewed: Mean < Median < Mode. Use median! Examples: easy test scores, age at death.
- The mean is pulled toward the tail in skewed distributions.
- The median is resistant to outliers and represents the typical value better when data is skewed.
- Outliers can create skewness – investigate them, don't automatically delete!
- Real-world tip: Income, wealth, and prices are almost always right-skewed. Use median!