The data can be 100% accurate — and the conclusion can still be completely wrong. Today we learn how.
Imagine a bar chart showing a company's sales performance. Both versions below use identical numbers.
Both graphs are accurate. But Graph A is designed to mislead. A Data Detective always checks the Y-axis first.
Cherry-picking means selecting only the data that supports your conclusion — and ignoring everything else.
Cherry-picking is especially dangerous because each individual data point is true. You need to ask what's missing.
Survivorship bias happens when we only study the people or things that "survived" a process — missing all those that didn't.
Detective question: "Who is NOT in this dataset — and why might they be missing?"
A framing effect occurs when the same data is presented differently to create a different emotional response.
"This surgery has a 90% survival rate"
vs.
"This surgery has a 10% mortality rate"
"9 out of 10 dentists recommend this toothpaste"
vs.
"1 in 10 dentists does NOT recommend this toothpaste"
Detective question: "What is the full number? Can I restate this statistic a different way?"
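Restating a statistic the other way is just taking the complement. A minimal sketch (the `reframe` helper is a made-up name for illustration, not from the lesson materials):

```python
# Hypothetical helper: restate a "positive" framing as its complement.
def reframe(rate_percent):
    """Given e.g. a 90% survival rate, return the 10% mortality framing."""
    return 100 - rate_percent

print(reframe(90))  # 90% survival rate -> 10% mortality rate
print(reframe(95))  # "95% safe" -> 5% had side effects
```

Both framings describe exactly the same data; only the emotional response changes.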
A trend appears in separate groups — but completely reverses when the groups are combined. Even trained statisticians find this hard.
"How can Treatment A be better for mild patients AND better for severe patients — but Treatment B appear better overall? Let's see..."
The key is group sizes. When groups are very different in size, combining them distorts the picture. The combined number hides what's really happening inside each group.
Two schools. Which is doing better?
| Student Group | School A pass rate | School B pass rate | Who's better? |
|---|---|---|---|
| Strong students | 90% (90 out of 100) | 85% (17 out of 20) | School A ✓ |
| Struggling students | 30% (6 out of 20) | 20% (20 out of 100) | School A ✓ |
| Overall combined | 80% (96/120) | 31% (37/120) | School A wins... obviously? |
Wait — School A is better in BOTH groups, AND has a higher overall rate. So where's the paradox? Now look at the group sizes: School A has 100 strong + 20 struggling. School B has 20 strong + 100 struggling. School A has MORE easy cases — that inflates its overall rate.
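The "inflated overall rate" is just a weighted average dominated by the bigger group. A quick sketch using the numbers from the table above:

```python
# Overall pass rate = total passed / total students, using the table's figures.
def overall(passed_strong, n_strong, passed_struggling, n_struggling):
    return (passed_strong + passed_struggling) / (n_strong + n_struggling)

school_a = overall(90, 100, 6, 20)    # strong-heavy mix: 96/120
school_b = overall(17, 20, 20, 100)   # struggling-heavy mix: 37/120

print(round(school_a * 100))  # 80
print(round(school_b * 100))  # 31
```

Swap the mixes and the overall numbers would move toward each other, even though neither school's subgroup rates changed at all.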
Two medical treatments. Which should you choose?
| Patient Type | Treatment A success | Treatment B success | Who's better? |
|---|---|---|---|
| Mild cases | 93% (81/87) | 87% (234/270) | Treatment A ✓ |
| Severe cases | 73% (192/263) | 69% (55/80) | Treatment A ✓ |
| Overall combined | 78% (273/350) | 83% (289/350) | Treatment B looks better! |
Treatment A is better for mild cases AND for severe cases. But Treatment B was mostly given to mild (easier) cases, while Treatment A handled most of the severe ones. The combined number favors B — but whichever case you have, A is the better choice. Always look at subgroups.
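The full reversal can be checked in a few lines. This sketch uses the classic kidney-stone study figures (which the table above appears to follow); the dictionary layout is just one convenient way to hold them:

```python
# (successes, total) per subgroup, per treatment.
A = {"mild": (81, 87), "severe": (192, 263)}
B = {"mild": (234, 270), "severe": (55, 80)}

def rate(successes, total):
    return successes / total

# Treatment A wins inside every subgroup...
for group in A:
    assert rate(*A[group]) > rate(*B[group])

# ...yet Treatment B wins once the subgroups are pooled.
overall_a = rate(81 + 192, 87 + 263)   # 273/350, about 78%
overall_b = rate(234 + 55, 270 + 80)   # 289/350, about 83%
assert overall_b > overall_a
```

The reversal comes entirely from which treatment got which mix of patients, not from either treatment getting better or worse.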
Your teacher will describe a scenario. Call out the deception technique: Cherry-Picking, Survivorship Bias, Framing Effect, or Simpson's Paradox!
"A magazine only publishes success stories from people who used their diet plan" · "A graph's Y-axis starts at 94%" · "The drug is 95% safe!" said instead of "5% of users had serious side effects"
Naming the trick is the first step to defeating it!
Each case study contains accurate data — but draws a misleading conclusion. Your job:
⏱ You have 15 minutes in your groups. Then we share out. Remember: the data itself is real — the interpretation is the problem.
Only the 3 best sales months shown. The other 9 were declining. Detective Q: "What data was left out?"
"Successful athletes train 6hrs/day." Missing: the many who trained equally hard and didn't succeed. Detective Q: "Who's NOT in this data?"
Drug A: "20% side effects." Drug B: "80% side-effect free." Same drug. Detective Q: "Can I restate this differently?"
Overall reading scores rose — but both groups dropped. Change in student mix caused it. Detective Q: "What do subgroups show?"
"Which deception technique was hardest for you to spot — and why? What question would a Data Detective ask to catch it?"
✍️ 6 minutes. Use your worksheet — Part 4. Name the specific technique and propose your detective question.
"Accurate data + misleading presentation = misinformation.
Ask what's missing. Look at the subgroups. Check the axis."
Next session: Even honest data has randomness built in. We learn about probability — and why 10 coin flips can be very misleading.