Teacher Cheat Sheet — Session 4: When Data Deceives

Data Science for Young Minds · Grade 5 · Ages 10–11
~60 min · Session 4 of 8 · ND-Friendly
Session Agenda
Time (min) | Block | What's Happening
0–5 | Hook | Show a truncated Y-axis graph. "What does this tell us?" vs. the same graph with Y-axis starting at 0.
5–20 | Lesson 1–2 | Cherry-picking · Survivorship bias · Framing effects, each with a real-world example
20–35 | Lesson 3 | Simpson's Paradox: two worked examples with small numbers; think step-by-step
35–52 | Activity | Data Detectives: 4 case studies. Identify the deception technique, find the flaw, reframe honestly
52–58 | Debrief | Students write: "Which deception technique was hardest to spot? Why?"
58–60 | Close | Preview S5: "Even honest data has randomness in it. Next time we learn about probability."
Tone note: Frame this session as empowerment, not cynicism. "You now have the tools to catch this" rather than "data is always lying to you." These tricks fool even trained adults — validate the difficulty.
Materials Needed
  • Printed case study cards (4 cases, see below)
  • Worksheets (1 per student)
  • Pencils
  • Hook graph, printed or projected (truncated vs. full Y-axis)
Tip: Print case study cards double-sided so groups can reference the "flaw explanation" after they've attempted to identify it themselves.
Key Vocabulary
Cherry-picking — selecting only the data that supports your conclusion, ignoring the rest
Survivorship bias — only studying the "survivors" of a process, missing those who didn't make it
Framing effect — presenting the same data differently to create different impressions
Simpson's paradox — a trend appears in separate groups but reverses when groups are combined
Confirmation bias — the tendency to seek data that confirms what you already believe

Simpson's Paradox — Two Worked Examples for Instructors
Example 1: School Improvement (simpler)
Group | School A pass rate | School B pass rate
Strong students | 90% (18/20) | 85% (85/100)
Struggling students | 30% (30/100) | 20% (4/20)
Overall | 40% (48/120) | 74% (89/120)

School A is better in BOTH groups, yet the headline "School B has a 74% pass rate vs. School A's 40%" is also accurate. The paradox: School A has far more struggling students in its mix (100 of its 120 students, vs. 20 of 120 at School B). When you combine groups of very different sizes, the overall result can flip.

Example 2: Hospital Treatment Success
Patient type | Treatment A success | Treatment B success
Mild cases | 93% (81/87) | 87% (234/270)
Severe cases | 73% (192/263) | 69% (55/80)
Combined | 78% (273/350) | 83% (289/350)

Treatment A is better for BOTH mild AND severe patients — but overall, Treatment B appears better. Why? Treatment B is used more on mild (easier) cases. The group sizes create a misleading combined total. A hospital administrator relying only on the "83% vs. 78%" would choose the worse treatment!
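
For instructors who want to check the arithmetic behind Example 2, here is a minimal Python sketch. It assumes Treatment A's mild-case count is 81 of 87, the figure consistent with the 78% combined rate (273/350). The function and variable names are illustrative and not part of the lesson materials.

```python
from fractions import Fraction

# Success counts as (successes, patients). Treatment A's mild-case count of
# 81/87 is an assumption chosen to match the 78% combined rate (273/350).
counts = {
    "Treatment A": {"mild": (81, 87), "severe": (192, 263)},
    "Treatment B": {"mild": (234, 270), "severe": (55, 80)},
}

def rate(successes, total):
    """Exact success rate as a fraction."""
    return Fraction(successes, total)

def combined_rate(groups):
    """Pool all groups: total successes over total patients."""
    successes = sum(s for s, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return Fraction(successes, total)

def simpson_reversal(a, b):
    """True if `a` beats `b` in every group but loses once groups are pooled."""
    wins_every_group = all(rate(*a[g]) > rate(*b[g]) for g in a)
    loses_combined = combined_rate(a) < combined_rate(b)
    return wins_every_group and loses_combined

for name, groups in counts.items():
    per_group = ", ".join(f"{g} {float(rate(*c)):.0%}" for g, c in groups.items())
    print(f"{name}: {per_group}, combined {float(combined_rate(groups)):.0%}")

# Does Treatment A win each subgroup yet lose on the combined number?
print(simpson_reversal(counts["Treatment A"], counts["Treatment B"]))
```

Swapping in the counts from any other two-group table gives a quick way to confirm whether it really shows a reversal before using it in class.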

Teaching the paradox: Work through Example 1 on the board step by step. Ask: "Who is better in the strong student group? Who is better in the struggling group? Now look at the combined number — what happened?" Let students sit with the confusion before explaining. The confusion IS the lesson.

Discussion Questions + Teacher Notes
  • "Is cherry-picking always intentional?"
    → No — and this is important. Confirmation bias means we often cherry-pick unconsciously. We notice data that agrees with us and overlook data that doesn't. Scientists use peer review and pre-registration to combat this.
  • "What is survivorship bias — can you think of a real example?"
    → Classic example: "Successful entrepreneurs all dropped out of college" — we only see the famous successes, not the thousands who dropped out and failed. This is why "success stories" as advice can be dangerous.
  • "In the Simpson's Paradox hospital example — which treatment would you choose for a family member? Why?"
    → Treatment A — because it's better for BOTH mild AND severe cases. The combined statistic is misleading due to different group sizes. This should feel unsettling — it means you MUST look at subgroups, not just totals.
  • "What's the difference between a framing effect and a lie?"
    → Framing uses true numbers but selects presentation to create a desired impression. "90% fat free!" vs. "10% fat." Both true — very different impressions. Not a lie, but deliberately misleading.
Data Detectives — 4 Case Studies
Groups of 3–4. Each group gets all 4 cases. 15 min to identify the deception; then class share-out. Emphasize: the data is technically accurate — the conclusion drawn is wrong.
  1. Case 1 — Cherry-Picking: A company shows sales figures for only the 3 best months of the year and claims "We're growing!" The other 9 months all showed decline. Flaw: selected favorable subset only.
  2. Case 2 — Survivorship Bias: "All the most successful athletes train 6 hours a day — so you should too!" Missing: the thousands who trained 6 hours/day and still didn't succeed. Flaw: only studying the outcomes that survived/succeeded.
  3. Case 3 — Framing Effect: Two descriptions of the same drug. Version A: "20% of patients experienced side effects." Version B: "80% of patients had NO side effects." Same drug, same data, different frames. Flaw: same statistic, opposite emotional impact.
  4. Case 4 — Simpson's Paradox: School claims "Our overall reading scores improved from 60% to 65%." But scores for both advanced AND struggling readers dropped individually. The improvement came from a change in the mix of students. Flaw: combined result masks what happened to each group.
Debrief question: "Which trick was hardest to spot? Why do you think that is?"
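
Case 4 is the hardest to picture without numbers, so the sketch below shows one way the headline could come about. All counts are hypothetical, chosen only so that the overall score moves from 60% to 65% while both groups drop. They are not taken from the case study cards.

```python
# Hypothetical counts as (students who passed, students tested), chosen only
# to reproduce the 60% -> 65% headline in Case 4.
year1 = {"advanced": (32, 40), "struggling": (16, 40)}
year2 = {"advanced": (49, 70), "struggling": (3, 10)}

def pass_rate(passed, tested):
    """Share of tested students who passed."""
    return passed / tested

def overall(groups):
    """Pooled pass rate across all groups."""
    passed = sum(p for p, _ in groups.values())
    tested = sum(t for _, t in groups.values())
    return passed / tested

# Each group on its own gets worse from year 1 to year 2...
for group in year1:
    before = pass_rate(*year1[group])
    after = pass_rate(*year2[group])
    print(f"{group}: {before:.0%} -> {after:.0%}")

# ...but the pooled number rises, because year 2 has many more advanced readers.
print(f"overall: {overall(year1):.0%} -> {overall(year2):.0%}")
```

In this made-up data the advanced group grows from 40 to 70 students, which is the "change in the mix" that lifts the overall score.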

Opening Hook
Show two bar graphs side by side — identical data, but one has a Y-axis starting at 95%, the other starting at 0%.
"Which graph makes the difference look bigger? Are both graphs accurate?"
→ Both are technically accurate — but the truncated axis makes a tiny difference look dramatic. This is one of the most common tricks in journalism and advertising.
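A short Python sketch can put a number on how much the truncated axis exaggerates the difference. The two bar values (96% and 98%) are hypothetical, and the helper name is illustrative. Only the axis starting points of 95% and 0% come from the hook description.

```python
def bar_height_ratio(small, large, axis_start):
    """How many times taller the larger bar is drawn when the axis starts at `axis_start`."""
    if axis_start >= small:
        raise ValueError("axis must start below the smaller value")
    return (large - axis_start) / (small - axis_start)

# Hypothetical values for the two bars (not taken from the lesson materials).
score_a, score_b = 96.0, 98.0

truncated = bar_height_ratio(score_a, score_b, axis_start=95.0)
full = bar_height_ratio(score_a, score_b, axis_start=0.0)

print(f"Axis from 95: bar B drawn {truncated:.1f}x as tall as bar A")
print(f"Axis from 0:  bar B drawn {full:.2f}x as tall as bar A")
```

With these values the truncated graph draws one bar three times as tall as the other, while the graph starting at 0 shows them as nearly equal.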
Debrief Writing Prompt
Write on board:
"Which deception technique was hardest for you to spot, and why? What question would a Data Detective ask to catch it?"
6 min writing. Students should name a specific technique and propose a specific "detective question" that exposes it.
Strong response: "Simpson's Paradox was hardest because the combined number looked real. A detective question would be: 'Are these groups the same size? What happens when you look at each group separately?'"
ND-Friendly Tips
  • Simple examples first — Use school performance or sports stats before complex medical/political examples. Familiar contexts reduce cognitive load.
  • Validate the confusion — Say explicitly: "Simpson's Paradox confuses trained statisticians. If it's hard for you, that's because it IS hard." This prevents shutdown.
  • Frame as empowerment — "You now know this trick exists. Most adults don't. That makes you a better thinker." Not: "Data is always trying to deceive you."
  • Case study cards — Physical cards give students something to hold, annotate, and refer back to. Better than a projected slide for extended work time.
  • Allow pair work throughout — The detective activity is designed for groups. Don't require solo work during the case analysis phase.