Learn Without Walls
Module 2, Lesson 3 of 4

25-30 minutes


Lesson 3: Distribution Shapes and Skewness

Learning Objectives

By the end of this lesson, you will be able to:

What is Distribution Shape?

When we plot data on a graph (such as a histogram or dot plot), the overall shape of the plot reveals important information about the dataset.

Understanding distribution shape helps us:

Distribution Shape: The overall pattern of how data values are spread out. The three most important shapes are:

  • Symmetric - Data is evenly distributed on both sides of the center
  • Right-Skewed - Data has a long tail extending to the right (high values)
  • Left-Skewed - Data has a long tail extending to the left (low values)

Symmetric Distributions

Symmetric Distribution: A distribution where the left and right sides are approximately mirror images of each other.

Symmetric Distribution

[Figure: histogram with a single central peak and matching left and right sides. Mean, median, and mode are all marked at the center (all equal).]

Notice: Perfectly balanced on both sides

Relationship of Measures in Symmetric Distributions:

Mean ≈ Median ≈ Mode

When data is symmetric, all three measures of center are approximately equal and located at the center of the distribution.

Example 1: Symmetric Data

Heights of adult women (inches):

60, 62, 63, 64, 64, 65, 65, 65, 66, 66, 67, 68, 70

Calculate the measures:

  • Mean: 845 / 13 = 65 inches
  • Median: 65 inches (middle value)
  • Mode: 65 inches (appears 3 times)

All three measures are equal! (65 = 65 = 65)
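The same calculation can be sketched in a few lines of Python. This is only an illustration using the standard library's `statistics` module; the variable names are our own and not part of the lesson.

```python
from statistics import mean, median, mode

# Heights of adult women (inches), copied from Example 1
heights = [60, 62, 63, 64, 64, 65, 65, 65, 66, 66, 67, 68, 70]

height_mean = mean(heights)      # sum of the values divided by how many there are
height_median = median(heights)  # middle value of the sorted list (7th of 13)
height_mode = mode(heights)      # most frequent value

print("mean:", height_mean)
print("median:", height_median)
print("mode:", height_mode)
```

Printing the three measures side by side makes it easy to see how closely they agree for symmetric data.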

Heights tend to be symmetric because most people cluster around average height, with fewer people at extreme heights (very short or very tall).
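To see that clustering, here is a minimal sketch of a text dot plot for the heights in Example 1. The helper `dot_plot` is a hypothetical function written for this illustration, and it assumes the data are whole numbers.

```python
from collections import Counter

# Heights of adult women (inches), copied from Example 1
heights = [60, 62, 63, 64, 64, 65, 65, 65, 66, 66, 67, 68, 70]

def dot_plot(values):
    """Return one text line per whole number from min to max,
    with one '*' for each time that number appears (assumes integers)."""
    counts = Counter(values)
    lines = []
    for value in range(min(values), max(values) + 1):
        lines.append(f"{value:>3} | " + "*" * counts[value])
    return lines

plot_lines = dot_plot(heights)
print("\n".join(plot_lines))
```

The longest rows sit in the middle (around 65 inches) and the rows get shorter toward 60 and 70, which is the balanced pattern described above.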

When to Expect Symmetric Distributions:

  • Human measurements: Heights, weights (in homogeneous groups)
  • Test scores: When tests are well-designed
  • Manufacturing data: Product dimensions with quality control
  • Measurement errors: Random errors in scientific measurements

Which Measure of Center to Use?

For symmetric distributions: Mean or Median – both work equally well!

The mean is usually preferred because it uses every data value, but the median is just as representative when the data is symmetric.

Right-Skewed Distributions (Positive Skew)

Right-Skewed Distribution: A distribution with a long tail extending to the right. Most values are concentrated on the left, with a few unusually high values pulling the tail to the right.

Also called: Positively skewed (because the tail goes toward positive/high numbers)

Right-Skewed Distribution

[Figure: histogram with its peak on the left and a long tail to the right. Markers along the axis, from left to right: Mode, Median, Mean.]

Notice: Tail extends to the right

Relationship of Measures in Right-Skewed Distributions:

Mode < Median < Mean

The mean gets "pulled" toward the tail by high outliers. The median stays near the bulk of the data. The mode is where most values cluster (left side).

Example 2: Right-Skewed Data (Household Income)

Annual household incomes in a neighborhood (thousands):

$35k, $40k, $42k, $45k, $48k, $50k, $52k, $55k, $60k, $180k

Calculate the measures:

  • Mode: None (no income appears more than once; nine of the ten values fall between $35k and $60k)
  • Median: (48 + 50) / 2 = $49k
  • Mean: 607k / 10 = $60.7k

Mean > Median! The $180k income pulled the mean up, but the median stays at $49k.

Which is more representative? The median ($49k) better represents a "typical" household. Nine out of ten households earn ≤$60k. The mean ($60.7k) is inflated by one high earner.
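A short Python sketch of this comparison, using the incomes from Example 2. The variable names are our own; treat it as an illustration rather than a required method.

```python
from statistics import mean, median

# Annual household incomes in thousands of dollars, copied from Example 2
incomes_k = [35, 40, 42, 45, 48, 50, 52, 55, 60, 180]

income_mean = mean(incomes_k)      # pulled up by the single $180k household
income_median = median(incomes_k)  # average of the two middle values (48 and 50)

# How many households earn less than the mean?
below_mean = sum(1 for income in incomes_k if income < income_mean)

print(f"mean: ${income_mean:.1f}k, median: ${income_median:.1f}k")
print(f"{below_mean} of {len(incomes_k)} households earn less than the mean")
```

When nearly every household falls below the mean, the mean is a poor description of a "typical" household.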

Common Right-Skewed Distributions:

  • Income and wealth: Most people earn moderate amounts, few earn very high amounts
  • Home prices: Most homes in affordable range, few mansions
  • Company sizes: Many small businesses, few giant corporations
  • Response times: Most quick, occasional very slow responses
  • City populations: Most cities are small, a few are very large

Which Measure of Center to Use?

For right-skewed distributions: Use the MEDIAN!

The mean is misleading because it's pulled way up by outliers. The median better represents the "typical" value.

Example: Median household income is more meaningful than mean household income because of wealthy outliers.
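The sketch below shows how far a single high value can drag the mean while the median stays put. It reuses the first nine incomes from Example 2 and varies only the top income. The values 60 and 1000 are hypothetical, chosen for illustration, and `center_with_top_income` is a helper name of our own.

```python
from statistics import mean, median

# The nine lower incomes from Example 2, in thousands of dollars
base_incomes_k = [35, 40, 42, 45, 48, 50, 52, 55, 60]

def center_with_top_income(top_income_k):
    """Return (mean, median) after adding one top earner to the nine base incomes."""
    data = base_incomes_k + [top_income_k]
    return mean(data), median(data)

# 180 is the value from Example 2; 60 and 1000 are made-up comparison values
for top in (60, 180, 1000):
    m, med = center_with_top_income(top)
    print(f"top income ${top}k -> mean ${m:.1f}k, median ${med:.1f}k")
```

In all three cases the median is $49k, while the mean climbs with the top income.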

Left-Skewed Distributions (Negative Skew)

Left-Skewed Distribution: A distribution with a long tail extending to the left. Most values are concentrated on the right, with a few unusually low values pulling the tail to the left.

Also called: Negatively skewed (because the tail points toward the low end of the number line, the negative direction)

Left-Skewed Distribution

[Figure: histogram with its peak on the right and a long tail to the left. Markers along the axis, from left to right: Mean, Median, Mode.]

Notice: Tail extends to the left

Relationship of Measures in Left-Skewed Distributions:

Mean < Median < Mode

The mean gets "pulled" toward the tail by low outliers. The median stays near the bulk of the data. The mode is where most values cluster (right side).

Example 3: Left-Skewed Data (Test Scores)

Scores on an easy test (out of 100):

98, 97, 96, 95, 94, 93, 92, 91, 88, 85, 82, 60, 55

Calculate the measures:

  • Mode: No single mode (every score appears once; most scores fall between 91 and 98)
  • Median: 92 (middle value of 13 scores)
  • Mean: 1126 / 13 ≈ 86.62

Mean < Median! The two very low scores (55, 60) pulled the mean down.

Most students scored in the 90s (the test was easy!), but a couple of students struggled and scored much lower, creating a left tail.
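Here is a minimal Python sketch for the test scores in Example 3. It also recomputes the mean with the two lowest scores left out, to show how much they matter. The variable names are our own.

```python
from statistics import mean, median, multimode

# Scores on an easy test (out of 100), copied from Example 3, highest first
scores = [98, 97, 96, 95, 94, 93, 92, 91, 88, 85, 82, 60, 55]

score_total = sum(scores)
score_mean = mean(scores)
score_median = median(scores)    # 7th of the 13 sorted scores
score_modes = multimode(scores)  # lists every value tied for most frequent

# Mean of the eleven scores that remain when 60 and 55 are left out
mean_without_low = mean(scores[:-2])

print(f"total: {score_total}, mean: {score_mean:.2f}, median: {score_median}")
print(f"values tied for mode: {len(score_modes)} of {len(scores)}")
print(f"mean without the two lowest scores: {mean_without_low:.2f}")
```

Because every score appears exactly once, `multimode` returns all thirteen values, which is another way of saying there is no single mode. Leaving out the two low scores moves the mean up close to the median.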

Common Left-Skewed Distributions:

  • Easy test scores: Most students score high, few score low
  • Age at retirement: Most retire ~65, few retire much earlier
  • Product lifespans with warranties: Most products last a long time, few fail early
  • Age at death (in developed countries): Most live to old age, few die young

Note: Left-skewed distributions are less common in real life than right-skewed ones!

Which Measure of Center to Use?

For left-skewed distributions: Use the MEDIAN!

The mean is pulled down by low outliers and doesn't represent the typical value well.

Summary: Comparing Distribution Shapes

Distribution Shape | Tail Location      | Measure Relationship | Best Center Measure | Common Examples
Symmetric          | No tail (balanced) | Mean ≈ Median ≈ Mode | Mean or Median      | Heights, well-designed tests
Right-Skewed       | Long tail to right | Mode < Median < Mean | Median              | Income, home prices
Left-Skewed        | Long tail to left  | Mean < Median < Mode | Median              | Easy test scores, retirement age
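The mean-versus-median comparison in the table can be turned into a rough rule of thumb in code. The sketch below is one possible version, not a standard test: the function name `describe_shape` and the `tolerance` of 0.1 standard deviations are assumptions made for this illustration, and small samples can easily mislead it.

```python
from statistics import mean, median, pstdev

def describe_shape(values, tolerance=0.1):
    """Rule-of-thumb shape label based on the gap between mean and median.

    The gap is compared with the spread of the data. A gap smaller than
    `tolerance` standard deviations is treated as "roughly symmetric".
    The 0.1 cutoff is an arbitrary choice for this sketch.
    """
    gap = mean(values) - median(values)
    spread = pstdev(values)
    if spread == 0 or abs(gap) <= tolerance * spread:
        return "roughly symmetric"
    # Mean above median: pulled toward a right tail. Below: toward a left tail.
    return "right-skewed" if gap > 0 else "left-skewed"

# The three datasets from Examples 1, 2, and 3
heights = [60, 62, 63, 64, 64, 65, 65, 65, 66, 66, 67, 68, 70]
incomes_k = [35, 40, 42, 45, 48, 50, 52, 55, 60, 180]
scores = [98, 97, 96, 95, 94, 93, 92, 91, 88, 85, 82, 60, 55]

for name, data in [("heights", heights), ("incomes", incomes_k), ("scores", scores)]:
    print(name, "->", describe_shape(data))
```

A label from a check like this is only a hint. Plotting the data is still the most reliable way to judge shape.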

Quick Rule for Choosing Center Measure:

Symmetric data? → Use mean (or median)
Skewed data? → Use median (resistant to outliers)
Categorical data? → Use mode (only option!)
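For categorical data, the mode is simply the most common label. A minimal sketch follows; the survey answers are hypothetical data made up for this illustration and do not come from the lesson.

```python
from statistics import mode

# Hypothetical survey answers (made-up data, for illustration only)
favorite_colors = ["blue", "green", "blue", "red", "blue", "green"]

most_common = mode(favorite_colors)  # the label that appears most often
print("mode:", most_common)
```

Labels like these cannot be added together or averaged, which is why the mode is the only measure of center available here.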

Identifying Outliers

Outlier: An observation that is unusually far from the bulk of the data. Outliers can significantly affect the mean and create skewness.

Outliers often create the "tails" in skewed distributions. They're important to identify because they:

Example 4: Spotting Outliers

Daily coffee sales at a café:

$850, $920, $880, $900, $910, $870, $3200

Analysis:

  • Six days: Sales in $850-920 range (pretty consistent)
  • One day: $3,200 (way higher!)

Question: What happened on that day?

  • Maybe they catered a big event?
  • Maybe it's a data entry error ($320 entered as $3200)?
  • Either way, $3,200 is an outlier

Impact:

  • Mean with outlier: $1,218.57
  • Median with outlier: $900
  • The median ($900) better represents "typical daily sales"
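The impact figures above can be checked with a short Python sketch. The variable names are our own.

```python
from statistics import mean, median

# Daily sales in dollars, copied from Example 4
daily_sales = [850, 920, 880, 900, 910, 870, 3200]

sales_mean = mean(daily_sales)      # 8530 / 7
sales_median = median(daily_sales)  # 4th of the 7 sorted values

# How many days had sales below the mean?
days_below_mean = sum(1 for sale in daily_sales if sale < sales_mean)

print(f"mean: ${sales_mean:,.2f}")
print(f"median: ${sales_median:,}")
print(f"{days_below_mean} of {len(daily_sales)} days fall below the mean")
```

Six of the seven days fall below the mean, so the mean describes none of the ordinary days well.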

What to Do with Outliers?

DON'T automatically delete them! Investigate first:

  1. Data error? Fix or remove it
  2. Genuine unusual observation? Keep it, but report it separately
  3. Different population? Consider analyzing separately

Best practice: Report measures both with and without outliers to show their impact.
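One way to follow this practice is sketched below, using the café data from Example 4. The function `center_summary` and its `suspected_outliers` parameter are hypothetical names for this illustration. Deciding which values to flag is still up to you, after investigating them.

```python
from statistics import mean, median

def center_summary(values, suspected_outliers=()):
    """Return mean and median for all values, and again with flagged values left out.

    Every occurrence of a flagged value is dropped from the second summary.
    """
    kept = [v for v in values if v not in suspected_outliers]
    return {
        "all": {"mean": mean(values), "median": median(values)},
        "without_flagged": {"mean": mean(kept), "median": median(kept)},
    }

# Daily sales in dollars, copied from Example 4
daily_sales = [850, 920, 880, 900, 910, 870, 3200]
summary = center_summary(daily_sales, suspected_outliers=(3200,))

for label, stats in summary.items():
    print(f"{label}: mean ${stats['mean']:,.2f}, median ${stats['median']:,.2f}")
```

The mean changes a great deal between the two summaries while the median barely moves, which is exactly the impact worth reporting.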

Real-World Examples by Shape

Symmetric Examples

  • Heights of adults (same gender)
  • IQ scores (by design)
  • SAT/ACT scores (scaled to be symmetric)
  • Blood pressure in healthy adults
  • Measurement errors in scientific instruments

Right-Skewed Examples

  • Household income
  • Home prices
  • CEO salaries
  • Number of social media followers
  • City populations
  • Wait times in emergency rooms

Left-Skewed Examples

  • Age at death (in developed countries)
  • Easy exam scores
  • Product lifespans (most last a long time, few fail early)
  • Retirement age
  • Years of education (most finish high school)

Check Your Understanding

Test your knowledge of distribution shapes!

Question 1:

A dataset has Mean = 45, Median = 50, Mode = 52. What shape is this distribution?

  • Symmetric
  • Right-skewed
  • Left-skewed (Mean < Median < Mode)
  • Cannot determine from this information

Question 2:

Which distribution shape would you expect for ages of people in a retirement community?

  • Symmetric
  • Right-skewed
  • Left-skewed (most are older, few are younger)
  • Perfectly uniform

Question 3:

Home prices in a city have Mean = $520k and Median = $380k. What does this tell you?

  • Prices are symmetric
  • Prices are right-skewed (a few expensive homes pull mean up)
  • Prices are left-skewed
  • There's a data error

Question 4:

For a strongly right-skewed distribution of salaries, which measure of center should you report?

  • Mean (it uses all data)
  • Median (resistant to high outliers)
  • Mode
  • Either mean or median works equally well

Question 5:

In a symmetric distribution, where is the mean located relative to the median?

  • Approximately equal to the median
  • Always greater than the median
  • Always less than the median
  • No consistent relationship

Key Takeaways

  • Symmetric: Mean ≈ Median ≈ Mode. Use mean or median. Examples: heights, IQ scores.
  • Right-Skewed: Mode < Median < Mean. Use median! Examples: income, home prices.
  • Left-Skewed: Mean < Median < Mode. Use median! Examples: easy test scores, age at death.
  • The mean is pulled toward the tail in skewed distributions.
  • The median is resistant to outliers and represents the typical value better when data is skewed.
  • Outliers can create or exaggerate skewness – investigate them, don't automatically delete!
  • Real-world tip: Income, wealth, and prices are almost always right-skewed. Use median!