Learn Without Walls

Type I and Type II Errors & Power

Understand the two types of errors in hypothesis testing and the concept of statistical power

Lesson Objectives

By the end of this lesson, you will be able to:

  • Identify the four possible outcomes of a hypothesis test
  • Define Type I error (α) and Type II error (β) and give real-world examples of each
  • Explain the tradeoff between α and β
  • Define statistical power (1 - β) and describe the factors that affect it

1. The Four Possible Outcomes

When we conduct a hypothesis test, we make a decision: either reject H₀ or fail to reject H₀. But we don't know the true state of reality—H₀ might actually be true or false. This creates four possible outcomes:

Our decision vs. reality (unknown to us):

Reject H₀:
  • H₀ is actually true → Type I Error (False Positive), probability = α
  • H₀ is actually false → Correct Decision (True Positive), probability = 1 - β (Power)

Fail to Reject H₀:
  • H₀ is actually true → Correct Decision (True Negative), probability = 1 - α
  • H₀ is actually false → Type II Error (False Negative), probability = β

Two of these outcomes are correct decisions, and two are errors.

2. Type I Error (α)

Definition: Type I Error

A Type I error occurs when we reject a true null hypothesis. This is also called a false positive.

The probability of making a Type I error is denoted by α (alpha), which is the significance level we choose.

Example 1: Type I Error in Medicine

Medical test scenario:

  • H₀: Patient does not have the disease
  • Hₐ: Patient has the disease

Type I Error: The test says the patient HAS the disease (reject H₀), but the patient is actually healthy (H₀ was true).

Consequence: Unnecessary treatment, anxiety, additional testing, and medical costs for a healthy person.

Example 2: Type I Error in Criminal Justice

  • H₀: Defendant is innocent
  • Hₐ: Defendant is guilty

Type I Error: Convict an innocent person (reject H₀ when it's true).

Consequence: An innocent person goes to jail—a very serious error!

This is why the criminal justice system uses a very low α ("beyond reasonable doubt").

Controlling Type I Error:

We directly control the Type I error rate by choosing α. Common choices:

  • α = 0.05: Accept 5% chance of Type I error (standard in most research)
  • α = 0.01: Accept only 1% chance (more conservative, fewer false positives)
  • α = 0.10: Accept 10% chance (less conservative, more false positives)
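You can see α at work with a quick simulation. The sketch below repeatedly tests a true null hypothesis with a two-sided z-test; the specific numbers (μ₀ = 100, σ = 15, n = 30) are illustrative choices, not from the lesson. With α = 0.05, roughly 5% of the tests should reject by chance alone:

```python
import math
import random
from statistics import NormalDist

def z_test_p_value(sample, mu0, sigma):
    """Two-sided z-test p-value, assuming the population sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(1)
alpha = 0.05
trials = 10_000
# H0 is TRUE in every trial: data are drawn from the hypothesized mu = 100,
# so every rejection is a Type I error (a false positive).
rejections = sum(
    z_test_p_value([random.gauss(100, 15) for _ in range(30)], 100, 15) < alpha
    for _ in range(trials)
)
print(f"Observed Type I error rate: {rejections / trials:.3f}")
```

The observed rejection rate lands close to 0.05, confirming that α is exactly the false-positive rate we chose to accept.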

3. Type II Error (β)

Definition: Type II Error

A Type II error occurs when we fail to reject a false null hypothesis. This is also called a false negative.

The probability of making a Type II error is denoted by β (beta).

Example 3: Type II Error in Medicine

Medical test scenario:

  • H₀: Patient does not have the disease
  • Hₐ: Patient has the disease

Type II Error: The test says the patient is healthy (fail to reject H₀), but the patient actually HAS the disease (H₀ was false).

Consequence: Disease goes untreated, potentially leading to serious health complications or death.

Example 4: Type II Error in Drug Development

  • H₀: New drug is no better than existing treatment
  • Hₐ: New drug is better than existing treatment

Type II Error: Conclude the new drug doesn't work (fail to reject H₀), when it actually is effective (H₀ is false).

Consequence: An effective treatment is rejected and never becomes available to patients.

Important: Unlike α, we don't directly choose β. The value of β depends on several factors including sample size, effect size, and α. However, we can reduce β by increasing sample size or using better research designs.
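A simulation also makes β concrete. In the sketch below the null hypothesis is false (the true mean is 108, but we test H₀: μ = 100), so every failure to reject is a Type II error; all of the specific numbers are illustrative assumptions, not from the lesson:

```python
import math
import random
from statistics import NormalDist

def z_test_p_value(sample, mu0, sigma):
    """Two-sided z-test p-value, assuming the population sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(2)
alpha, mu0, mu_true, sigma, n = 0.05, 100, 108, 15, 20
trials = 10_000
# H0 is FALSE in every trial: data come from mu = 108, but we test mu = 100,
# so every non-rejection is a Type II error (a false negative).
misses = sum(
    z_test_p_value([random.gauss(mu_true, sigma) for _ in range(n)], mu0, sigma) >= alpha
    for _ in range(trials)
)
beta_hat = misses / trials
print(f"Estimated beta: {beta_hat:.3f}, power: {1 - beta_hat:.3f}")
```

Notice that we never chose β directly; it emerged from the sample size, the effect size (108 vs. 100), the variability (σ = 15), and α.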

4. The Relationship Between α and β

Type I and Type II errors are inversely related:

The Tradeoff:
  • If you decrease α (reduce Type I error), you increase β (increase Type II error)
  • If you increase α (increase Type I error), you decrease β (reduce Type II error)

Think of it like adjusting the sensitivity of a test:

  • Very strict test (low α): Fewer false alarms, but might miss real effects
  • Lenient test (high α): Catches more real effects, but also more false alarms
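The tradeoff can be put in numbers. The sketch below uses the standard approximate power formula for a two-sided z-test with known σ, holding the study fixed and varying only α; the study parameters (n = 50, true effect = 5, σ = 15) are illustrative assumptions, not from the lesson:

```python
import math
from statistics import NormalDist

def power_two_sided_z(n, effect, sigma, alpha):
    """Approximate power of a two-sided z-test with known sigma.

    Uses power ≈ Φ(effect·√n/σ − z_crit); the tiny probability of
    rejecting in the wrong tail is ignored.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(effect * math.sqrt(n) / sigma - z_crit)

# Same study each time (n = 50, effect = 5, sigma = 15); only alpha changes.
for alpha in (0.01, 0.05, 0.10):
    beta = 1 - power_two_sided_z(50, 5, 15, alpha)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

Running this shows β shrinking as α grows: tightening α to 0.01 makes the test stricter but substantially raises the chance of missing the real effect.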

Which Error is Worse?

The answer depends on context and consequences:

  • Criminal trial — worse error: Type I (convict innocent); strategy: use a very low α ("beyond reasonable doubt")
  • Cancer screening — worse error: Type II (miss cancer); strategy: accept a higher α to catch more cases
  • Quality control (safety) — worse error: Type II (miss defect); strategy: strict testing to avoid missing defects
  • Scientific research — worse error: Type I (false discovery); strategy: standard α = 0.05 to control false claims

Example 5: Choosing α Based on Consequences

Scenario A: Airport Security

  • H₀: Passenger is not a threat
  • Type I Error: Flag innocent passenger (inconvenience)
  • Type II Error: Miss actual threat (catastrophic)
  • Decision: Use higher α to minimize Type II error (better safe than sorry)

Scenario B: Spam Filter

  • H₀: Email is legitimate
  • Type I Error: Mark legitimate email as spam (might miss important message)
  • Type II Error: Let spam through (minor annoyance)
  • Decision: Use lower α to minimize Type I error (don't want to lose important emails)

5. Statistical Power (1 - β)

Definition: Power

Statistical power is the probability of correctly rejecting a false null hypothesis. It's calculated as:

Power = 1 - β

Power represents the test's ability to detect a real effect when one exists. Higher power is better!

What does power tell us? If there really is an effect, power is the probability that our test will detect it. A study with power = 0.80 will find a true effect 80% of the time; the other 20% of the time it will miss it and commit a Type II error.

Recommended Power: Most researchers aim for power ≥ 0.80 (80% or higher). This means accepting up to a 20% chance of a Type II error (β = 0.20).

Factors That Affect Power

Four main factors influence statistical power:

  • Sample Size (n): larger sample → power increases
  • Significance Level (α): higher α (e.g., 0.10 vs. 0.05) → power increases (but more Type I errors)
  • Effect Size: larger difference from H₀ → power increases (easier to detect big effects)
  • Variability (σ): lower variability → power increases (less noise in the data)

Most Practical Way to Increase Power: Increase sample size!

We can't always control effect size or population variability, but we can often collect more data. Doubling the sample size substantially increases power.
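The effect of sample size can be checked with the approximate power formula for a two-sided z-test with known σ; the study parameters below (effect = 5, σ = 15, α = 0.05) are illustrative assumptions, not from the lesson:

```python
import math
from statistics import NormalDist

def power_two_sided_z(n, effect, sigma, alpha):
    """Approximate power of a two-sided z-test with known sigma.

    Uses power ≈ Φ(effect·√n/σ − z_crit); the tiny probability of
    rejecting in the wrong tail is ignored.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(effect * math.sqrt(n) / sigma - z_crit)

# Same effect, sigma, and alpha throughout; only n changes.
for n in (50, 100, 200):
    print(f"n = {n:3d} -> power = {power_two_sided_z(n, 5, 15, 0.05):.3f}")
```

With these numbers, doubling n from 50 to 100 lifts power past the 0.80 target, illustrating why collecting more data is usually the first lever researchers reach for.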

Check Your Understanding

Question 1: A pharmaceutical company tests a new drug. What would be a Type I error and a Type II error in this scenario?

H₀: The drug is not effective
Hₐ: The drug is effective

Type I Error (α): Conclude the drug is effective (reject H₀) when it actually doesn't work (H₀ true).

Consequence: Ineffective drug goes to market, patients pay for treatment that doesn't help.

Type II Error (β): Conclude the drug doesn't work (fail to reject H₀) when it actually is effective (H₀ false).

Consequence: Effective treatment never reaches patients who could benefit.

Question 2: A study has α = 0.05 and power = 0.75. What are the probabilities of Type I error, Type II error, and correctly detecting a real effect?

Type I Error (α) = 0.05 (5%)

Type II Error (β) = 1 - Power = 1 - 0.75 = 0.25 (25%)

Correctly detecting real effect (Power) = 0.75 (75%)

This means if the null hypothesis is false (there really is an effect), we have a 75% chance of correctly rejecting it, and a 25% chance of missing it.

Question 3: A researcher wants to increase the power of their study from 0.70 to 0.85. What are three ways they could do this?

Three ways to increase power:

  1. Increase sample size: Collect data from more participants (most common approach)
  2. Increase α: Use α = 0.10 instead of 0.05 (though this increases Type I error risk)
  3. Reduce variability: Use more precise measurement tools or more homogeneous sample

Best approach: Usually increasing sample size, as it doesn't involve the tradeoffs of the other methods.

Summary

  • Type I Error (α): Rejecting a true null hypothesis (false positive)
  • Type II Error (β): Failing to reject a false null hypothesis (false negative)
  • α and β are inversely related—decreasing one increases the other
  • Power = 1 - β: The probability of correctly detecting a real effect
  • We aim for power ≥ 0.80 (at least 80%)
  • The most practical way to increase power is to increase sample size
  • Which error is worse depends on the real-world consequences of each type