Module 2, Lesson 3 of 4

25-30 minutes

Lesson 3: Distribution Shapes and Skewness

Learning Objectives

By the end of this lesson, you will be able to:

Identify and describe symmetric distributions
Recognize right-skewed and left-skewed distributions
Understand the relationship between mean, median, and mode for different distribution shapes
Identify outliers and understand their impact on distribution shape
Choose appropriate measures of center and spread based on distribution shape
Recognize real-world examples of each distribution type

What is Distribution Shape?

When we plot data on a graph (like a histogram or dot plot), the overall shape shows where values cluster, how far they spread, and whether any sit far from the rest.

Understanding distribution shape helps us:

Choose the right summary statistics (mean vs. median)
Understand the data's story (Are most values clustered? Are there outliers?)
Make predictions about future observations
Identify unusual patterns that might need investigation

Distribution Shape: The overall pattern of how data values are spread out. The three most important shapes are:

Symmetric - Data is evenly distributed on both sides of the center
Right-Skewed - Data has a long tail extending to the right (high values)
Left-Skewed - Data has a long tail extending to the left (low values)

Symmetric Distributions

Symmetric Distribution: A distribution where the left and right sides are approximately mirror images of each other.

Symmetric Distribution

[Figure: bell-shaped histogram, tallest in the middle and tapering evenly on both sides. The mean, median, and mode all sit at the center.]

Notice: Perfectly balanced on both sides

Relationship of Measures in Symmetric Distributions:

Mean ≈ Median ≈ Mode

When data is symmetric, all three measures of center are approximately equal and located at the center of the distribution.

Example 1: Symmetric Data

Heights of adult women (inches):

60, 62, 63, 64, 64, 65, 65, 65, 66, 66, 67, 68, 70

Calculate the measures:

Mean: 845 / 13 = 65 inches
Median: 65 inches (middle value)
Mode: 65 inches (appears 3 times)

All three measures are equal (65 = 65 = 65), which is what we expect when the values are balanced around the center.

Heights tend to be symmetric because most people cluster around average height, with fewer people at extreme heights (very short or very tall).
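The three measures in Example 1 can be checked with a few lines of Python. This is a minimal sketch that assumes only the standard-library `statistics` module; the variable names are ours, not part of the lesson.

```python
import statistics

# Heights of adult women (inches), copied from Example 1.
heights = [60, 62, 63, 64, 64, 65, 65, 65, 66, 66, 67, 68, 70]

mean_height = statistics.mean(heights)      # sum of values / number of values
median_height = statistics.median(heights)  # middle value of the sorted list
mode_height = statistics.mode(heights)      # most frequent value

print(f"Mean:   {mean_height:.2f}")
print(f"Median: {median_height}")
print(f"Mode:   {mode_height}")
```

Running it is a quick way to confirm the arithmetic before comparing the three measures.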

When to Expect Symmetric Distributions:

Human measurements: Heights, weights (in homogeneous groups)
Test scores: When tests are well-designed
Manufacturing data: Product dimensions with quality control
Measurement errors: Random errors in scientific measurements

Which Measure of Center to Use?

For symmetric distributions: Mean or Median. Both describe the center well.

The mean is usually preferred because it uses every data value, but the median is just as representative when the data is symmetric.

Right-Skewed Distributions (Positive Skew)

Right-Skewed Distribution: A distribution with a long tail extending to the right. Most values are concentrated on the left, with a few unusually high values pulling the tail to the right.

Also called: Positively skewed (because the tail goes toward positive/high numbers)

Right-Skewed Distribution

[Figure: histogram with its peak on the left and a long tail to the right. Along the axis, from left to right: Mode, Median, Mean.]

Notice: Tail extends to the right

Relationship of Measures in Right-Skewed Distributions:

Mode < Median < Mean

The mean gets "pulled" toward the tail by high outliers. The median stays near the bulk of the data. The mode is where most values cluster (left side).

Example 2: Right-Skewed Data (Household Income)

Annual household incomes in a neighborhood (thousands):

$35k, $40k, $42k, $45k, $48k, $50k, $52k, $55k, $60k, $180k

Calculate the measures:

Mode: No mode (each income appears only once; most values fall between about $40k and $55k)
Median: (48 + 50) / 2 = $49k
Mean: 607k / 10 = $60.7k

Mean > Median! The $180k income pulled the mean up, but the median stays at $49k.

Which is more representative? The median ($49k) better represents a "typical" household. Nine out of ten households earn ≤$60k. The mean ($60.7k) is inflated by one high earner.
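To see why the median is the more typical value here, the sketch below computes both measures for the Example 2 incomes and counts how many households sit at or below each one. It assumes only Python's standard-library `statistics` module; the variable names are illustrative.

```python
import statistics

# Annual household incomes in thousands of dollars, copied from Example 2.
incomes = [35, 40, 42, 45, 48, 50, 52, 55, 60, 180]

mean_income = statistics.mean(incomes)      # 607 / 10
median_income = statistics.median(incomes)  # average of the 5th and 6th sorted values

# Count how many households fall at or below each measure of center.
at_or_below_mean = sum(1 for x in incomes if x <= mean_income)
at_or_below_median = sum(1 for x in incomes if x <= median_income)

n = len(incomes)
print(f"Mean:   ${mean_income:.1f}k ({at_or_below_mean} of {n} households at or below)")
print(f"Median: ${median_income:.1f}k ({at_or_below_median} of {n} households at or below)")
```

The median splits the households evenly (5 of 10 at or below), while 9 of 10 fall at or below the mean, which is the signature of a mean pulled toward a right tail.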

Common Right-Skewed Distributions:

Income and wealth: Most people earn moderate amounts, few earn very high amounts
Home prices: Most homes in affordable range, few mansions
Company sizes: Many small businesses, few giant corporations
Response times: Most quick, occasional very slow responses
City populations: Many small towns, few very large cities

Which Measure of Center to Use?

For right-skewed distributions: Use the MEDIAN!

The mean is misleading because the high outliers pull it upward. The median better represents the "typical" value.

Example: Median household income is more meaningful than mean household income because of wealthy outliers.

Left-Skewed Distributions (Negative Skew)

Left-Skewed Distribution: A distribution with a long tail extending to the left. Most values are concentrated on the right, with a few unusually low values pulling the tail to the left.

Also called: Negatively skewed (because the tail goes toward negative/low numbers)

Left-Skewed Distribution

[Figure: histogram with its peak on the right and a long tail to the left. Along the axis, from left to right: Mean, Median, Mode.]

Notice: Tail extends to the left

Relationship of Measures in Left-Skewed Distributions:

Mean < Median < Mode

The mean gets "pulled" toward the tail by low outliers. The median stays near the bulk of the data. The mode is where most values cluster (right side).

Example 3: Left-Skewed Data (Test Scores)

Scores on an easy test (out of 100):

98, 97, 96, 95, 94, 93, 92, 91, 88, 85, 82, 60, 55

Calculate the measures:

Mode: No mode (each score appears once; most scores fall between 91 and 98)
Median: 92 (middle value of 13 scores)
Mean: 1126 / 13 ≈ 86.62

Mean < Median! The two very low scores (55, 60) pulled the mean down.

Most students scored in the 90s (the test was easy!), but a couple of students struggled and scored much lower, creating a left tail.
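A rough text histogram makes the left tail in Example 3 visible. The sketch below groups the scores into bins and prints one row of asterisks per bin. The bin width of 10 is our assumption, chosen only because it lines up with the "90s", "80s", and so on.

```python
from collections import Counter

# Test scores copied from Example 3.
scores = [98, 97, 96, 95, 94, 93, 92, 91, 88, 85, 82, 60, 55]

BIN_WIDTH = 10  # assumed bin width; other widths would also work

# Map each score to the lower edge of its bin (e.g. 97 -> 90) and count per bin.
counts = Counter((score // BIN_WIDTH) * BIN_WIDTH for score in scores)

# Print one row per bin, including empty bins, so gaps in the tail stay visible.
for low in range(min(counts), max(counts) + BIN_WIDTH, BIN_WIDTH):
    high = low + BIN_WIDTH - 1
    print(f"{low:>3}-{high:<3} {'*' * counts[low]}")
```

The rows for the 90s and 80s hold eight and three scores, the 70s row is empty, and the 60s and 50s hold one score each: a tall cluster at the high end with a thin tail toward low values.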

Common Left-Skewed Distributions:

Easy test scores: Most students score high, few score low
Age at retirement: Most retire ~65, few retire much earlier
Product lifespans with warranties: Most products last a long time, few fail early
Age at death (in developed countries): Most people live to old age, few die young

Note: Left-skewed distributions are less common in real life than right-skewed!

Which Measure of Center to Use?

For left-skewed distributions: Use the MEDIAN!

The mean is pulled down by low outliers and doesn't represent the typical value well.

Summary: Comparing Distribution Shapes

Distribution Shape	Tail Location	Measure Relationship	Best Center Measure	Common Examples
Symmetric	No tail (balanced)	Mean ≈ Median ≈ Mode	Mean or Median	Heights, well-designed tests
Right-Skewed	Long tail to right	Mode < Median < Mean	Median	Income, home prices
Left-Skewed	Long tail to left	Mean < Median < Mode	Median	Easy test scores, retirement age

Quick Rule for Choosing Center Measure:

Symmetric data? → Use mean (or median)
Skewed data? → Use median (resistant to outliers)
Categorical data? → Use mode (only option!)
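The quick rule can be sketched as a small Python function. This is an illustration, not a standard procedure: the function name `suggest_center` and the `tolerance` cutoff are hypothetical, and the idea of calling data "roughly symmetric" when the mean and median differ by at most a tenth of a standard deviation is our assumption. The input is assumed to be a non-empty list.

```python
import statistics

def suggest_center(values, tolerance=0.1):
    """Suggest a measure of center using the quick rule.

    `tolerance` is a hypothetical cutoff, not part of the lesson: data is
    treated as roughly symmetric when the gap between mean and median is
    at most `tolerance` standard deviations.
    """
    # Categorical (non-numeric) data: the mode is the only option.
    if not all(isinstance(v, (int, float)) for v in values):
        return "mode", statistics.mode(values)

    mean = statistics.mean(values)
    median = statistics.median(values)

    # A single value has no spread to compare against.
    if len(values) < 2:
        return "mean", mean

    spread = statistics.stdev(values)  # sample standard deviation

    # Mean close to median suggests symmetry, so the mean is a reasonable choice.
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "mean", mean
    # Otherwise the mean has been pulled toward a tail; prefer the median.
    return "median", median

heights = [60, 62, 63, 64, 64, 65, 65, 65, 66, 66, 67, 68, 70]  # Example 1
incomes = [35, 40, 42, 45, 48, 50, 52, 55, 60, 180]              # Example 2
colors = ["red", "blue", "red", "green"]                         # made-up categorical data

print(suggest_center(heights))
print(suggest_center(incomes))
print(suggest_center(colors))
```

With these inputs the function suggests the mean for the heights, the median for the incomes, and the mode for the colors. A histogram is still the better way to judge shape; a cutoff like this only automates the mean-versus-median comparison.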

Identifying Outliers

Outlier: An observation that is unusually far from the bulk of the data. Outliers can significantly affect the mean and create skewness.

Outliers often create the "tails" in skewed distributions. They're important to identify because they:

Can dramatically affect the mean
Might indicate data entry errors
Might represent genuinely unusual observations worth investigating
Influence which summary statistics to use

Example 4: Spotting Outliers

Daily coffee sales at a café:

$850, $920, $880, $900, $910, $870, $3200

Analysis:

Six days: Sales in $850-920 range (pretty consistent)
One day: $3,200 (way higher!)

Question: What happened on that day?

Maybe they catered a big event?
Maybe it's a data entry error ($320 entered as $3200)?
Either way, $3,200 is an outlier

Impact:

Mean with outlier: $1,218.57
Median with outlier: $900
The median ($900) better represents "typical daily sales"
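This lesson identifies the outlier by eye. One common convention for making "unusually far" precise, which this lesson does not define, is to flag values more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that convention to the café data using `statistics.quantiles` from Python's standard library (Python 3.8 or later). Other tools compute quartiles slightly differently, so the fences can vary.

```python
import statistics

# Daily coffee sales in dollars, copied from Example 4.
sales = [850, 920, 880, 900, 910, 870, 3200]

# Quartiles split the sorted data into four parts; the IQR is Q3 - Q1.
q1, q2, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1

# Assumed convention: flag values more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
flagged = [x for x in sales if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"Fences: {lower_fence} to {upper_fence}")
print(f"Flagged as possible outliers: {flagged}")
```

Only the $3,200 day falls outside the fences. Being flagged is a prompt to investigate the value, not a reason to delete it.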

What to Do with Outliers?

DON'T automatically delete them! Investigate first:

Data error? Fix or remove it
Genuine unusual observation? Keep it, but report it separately
Different population? Consider analyzing separately

Best practice: Report measures both with and without outliers to show their impact.
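Reporting both versions takes only a few lines of Python. The sketch below reuses the café sales from Example 4; the helper name `summarize` and the set `suspected_outliers` are hypothetical names introduced for illustration.

```python
import statistics

def summarize(values):
    """Return the mean and median of a list of numbers, rounded to 2 places."""
    return {
        "mean": round(statistics.mean(values), 2),
        "median": round(statistics.median(values), 2),
    }

# Daily coffee sales from Example 4; 3200 is the value under investigation.
sales = [850, 920, 880, 900, 910, 870, 3200]
suspected_outliers = {3200}

# Keep the original data intact and build a filtered copy for comparison.
sales_without = [x for x in sales if x not in suspected_outliers]

with_outlier = summarize(sales)
without_outlier = summarize(sales_without)

print("With outlier:   ", with_outlier)
print("Without outlier:", without_outlier)
```

The mean drops from about $1,218.57 to about $888.33 when the $3,200 day is set aside, while the median moves only from $900 to $890. Showing both rows lets readers see how much one observation drives the mean.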

Real-World Examples by Shape

Symmetric Examples

Heights of adults (same gender)
IQ scores (by design)
SAT/ACT scores (scaled to be symmetric)
Blood pressure in healthy adults
Measurement errors in scientific instruments

Right-Skewed Examples

Household income
Home prices
CEO salaries
Number of social media followers
City populations
Wait times in emergency rooms

Left-Skewed Examples

Age at death (in developed countries)
Easy exam scores
Product lifespans (most last a long time, few fail early)
Retirement age
Years of education (most finish high school)

Check Your Understanding

Test your knowledge of distribution shapes!

Question 1:

A dataset has Mean = 45, Median = 50, Mode = 52. What shape is this distribution?

Symmetric
Right-skewed
Left-skewed (Mean < Median < Mode)
Cannot determine from this information

Question 2:

Which distribution shape would you expect for ages of people in a retirement community?

Symmetric
Right-skewed
Left-skewed (most are older, few are younger)
Perfectly uniform

Question 3:

Home prices in a city have Mean = $520k and Median = $380k. What does this tell you?

Prices are symmetric
Prices are right-skewed (a few expensive homes pull mean up)
Prices are left-skewed
There's a data error

Question 4:

For a strongly right-skewed distribution of salaries, which measure of center should you report?

Mean (it uses all data)
Median (resistant to high outliers)
Mode
Either mean or median works equally well

Question 5:

In a symmetric distribution, where is the mean located relative to the median?

Approximately equal to the median
Always greater than the median
Always less than the median
No consistent relationship

Key Takeaways

Symmetric: Mean ≈ Median ≈ Mode. Use mean or median. Examples: heights, IQ scores.
Right-Skewed: Mode < Median < Mean. Use median! Examples: income, home prices.
Left-Skewed: Mean < Median < Mode. Use median! Examples: easy test scores, age at death.
The mean is pulled toward the tail in skewed distributions.
The median is resistant to outliers and represents the typical value better when data is skewed.
Outliers can create skewness – investigate them, don't automatically delete!
Real-world tip: Income, wealth, and prices are almost always right-skewed. Use median!
