Two things can happen together without one causing the other. Today we learn to tell the difference — and why it matters enormously.
Every summer, two things increase at exactly the same time:
Does eating ice cream cause people to drown? Vote: Yes / No / Complicated
Both of these statements are true. The connection between them is also real. But does one CAUSE the other? And if not — why do they move together?
Two variables are correlated when they tend to change together — both go up, both go down, or one goes up as the other goes down.
Correlation is a description of a pattern. It does NOT tell you why the pattern exists.
Causation means one variable directly produces a change in another. Removing or changing the cause changes the effect.
To prove causation, you need a controlled experiment — change only one variable and keep everything else the same.
A confounding variable is a hidden third factor that causes both correlated variables to change — making them look connected when they aren't.
Ice cream sales ↑ ← HOT WEATHER → Drowning rates ↑
Hot weather causes BOTH — ice cream and drowning are not connected to each other
Detective question: "Is there a third factor that could explain why these two things move together?"
Nicolas Cage movies released per year correlates with swimming pool drownings.
Real connection: none — pure coincidence with small samples.
Per-capita cheese consumption correlates with deaths by bedsheet tangling.
Real connection: none — both happen to have similar patterns in the same data years.
Countries with more TVs per capita have higher life expectancy.
Real connection: national wealth drives both better healthcare AND more TV ownership.
Children with bigger shoe sizes read at a higher level.
Real connection: age — older children have bigger feet AND have had more reading instruction.
For each of the 4 correlations, answer these questions on your worksheet:
⏱ You have 18 minutes. Then we apply the same thinking to more serious real-world examples.
"What was the confounding variable in each case? How did identifying it change your understanding of the data?"
Wrong conclusions from correlations lead to real policy decisions that can harm people. This is why the correlation/causation distinction is one of the most important ideas in data science.
When you argue from data, use Claim → Evidence → Reasoning.
"Ice cream sales do not cause drowning."
"Both ice cream sales and drowning rates peak in July, when temperatures are highest."
"Hot weather causes both increased swimming activity and more ice cream eating. Temperature is the confounding variable — not ice cream."
Thumbs UP for Causation, thumbs DOWN for Correlation only!
"Smoking and lung cancer" · "Shoe size and reading in kids" · "Exercise and lower blood pressure" · "Countries with chocolate consumption and Nobel Prizes" · "Vaccines and reduced disease rates"
For every correlation — ask: "What could be the confounding variable?"
"Choose one of the serious correlations from today. Write a full C-E-R argument explaining why it is a correlation and NOT causation. Name the confounding variable."
✍️ 6 minutes. Use your worksheet — Part 4. Your argument must have all three parts: Claim, Evidence, and Reasoning with a named confounding variable.
"Correlation tells you WHAT. Only a controlled experiment tells you WHY.
Always ask: what's the confounding variable?"
Next session: Data Ethics — privacy, consent, and the genuinely difficult questions about who data belongs to and who benefits from collecting it.