Learn Without Walls
Module 5 of 8 — Machine Learning Basics

Decision Trees

Yes/no questions, branching logic, and the danger of memorizing your training data

← Module 4: KNN Module 5 of 8 Module 6: Linear Regression →

📌 Before You Start

Estimated time: ~55 minutes

What you’ll learn: How decision trees split data, what Gini impurity means, how tree depth controls complexity, and how to recognize overfitting.

💡 The Big Idea

A decision tree asks a series of yes/no questions about your features, splitting the data at each step, until it arrives at a prediction.

It’s like the game 20 Questions, but the computer chooses which questions are most useful. “Is petal length < 2.5 cm?” If yes → almost certainly setosa. If no → ask another question.

Decision trees are hugely popular because they’re interpretable — you can actually read the rules the model learned. They also require no feature scaling (unlike KNN).
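The no-scaling claim is easy to check. Below is a minimal sketch (the depth and random seed are arbitrary choices): a tree trained on raw iris features and a tree trained on standardized features reach the same test accuracy, because standardization is a monotone transform, so it never changes which points fall on each side of a split threshold.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Tree on raw features
raw = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

# Same tree on standardized features
scaler = StandardScaler().fit(X_train)
scaled = DecisionTreeClassifier(max_depth=3, random_state=42).fit(
    scaler.transform(X_train), y_train
)

print("raw   :", raw.score(X_test, y_test))
print("scaled:", scaled.score(scaler.transform(X_test), y_test))
```

Both lines print the same number: scaling shifts the threshold values, but the resulting partitions of the data, and therefore the predictions, are identical.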

But they have one big weakness: a deep tree will memorize the training data instead of learning generalizable patterns. That’s overfitting — and controlling tree depth is how we fight it.

🧠 How It Works

How the Tree Chooses Its Questions

At each node, the algorithm tries every possible split on every feature and picks the one that creates the purest child groups. “Purity” means: one class dominates the group.

The most common measure of impurity is Gini impurity. A node containing only one class has Gini = 0 (perfectly pure). A node with a 50/50 mix of two classes has Gini = 0.5, the maximum for two classes (with three equally mixed classes it rises to about 0.667). The tree always picks the split that reduces Gini the most.
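To make "try every possible split and keep the purest one" concrete, here is a sketch: a small Gini function plus a brute-force scan over petal-length thresholds on iris. For simplicity it tries the observed feature values themselves as candidate cutoffs rather than the midpoints scikit-learn uses, so the winning cutoff prints as 1.9 instead of 2.45 (both separate the same 50 setosa).

```python
import numpy as np
from sklearn.datasets import load_iris

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

iris = load_iris()
petal_length = iris.data[:, 2]   # column 2 is petal length (cm)
y = iris.target

# Three equally mixed classes at the root -> Gini about 0.667
print(f"Root Gini: {gini(y):.3f}")

# Greedy search: try every candidate threshold, keep the one with the
# lowest weighted impurity of the two child groups.
best_t, best_impurity = None, float("inf")
for t in np.unique(petal_length):
    left, right = y[petal_length <= t], y[petal_length > t]
    if len(left) == 0 or len(right) == 0:
        continue
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
    if weighted < best_impurity:
        best_t, best_impurity = t, weighted

print(f"Best split: petal_length <= {best_t} (weighted Gini {best_impurity:.3f})")
```

The best split carves off a perfectly pure group of 50 setosa (Gini = 0), which is exactly the root split in the example below.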

An Example Split

Root: All 150 flowers
│
├─── petal_length ≤ 2.45? YES → 50 setosa (Gini = 0.0) ✅ Pure!
│
└─── petal_length ≤ 2.45? NO → 100 mixed (versicolor + virginica)
     │
     ├─── petal_width ≤ 1.75? YES → ~49 versicolor
     │
     └─── petal_width ≤ 1.75? NO → ~45 virginica

Just two questions correctly classify most of the 150 flowers. That's the power of decision trees.
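Those two questions can be written out as plain if statements. A minimal sketch (thresholds taken from the split above; columns 2 and 3 of the iris data are petal length and petal width):

```python
from sklearn.datasets import load_iris

iris = load_iris()

def tiny_tree(petal_length_cm, petal_width_cm):
    """A hand-written two-question 'tree' for iris."""
    if petal_length_cm <= 2.45:       # question 1
        return "setosa"
    if petal_width_cm <= 1.75:        # question 2
        return "versicolor"
    return "virginica"

# Score the hand-written rules against the true labels
correct = sum(
    tiny_tree(x[2], x[3]) == iris.target_names[label]
    for x, label in zip(iris.data, iris.target)
)
print(f"{correct}/150 flowers classified correctly")
```

Two hard-coded comparisons already get the large majority of the dataset right, which is why shallow trees are so readable: the whole model fits in a few lines.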

Depth and Overfitting

⚠️ Overfitting: When training accuracy = 100% but test accuracy is much lower, your model has memorized the training data rather than learning generalizable patterns. A depth-unlimited tree keeps splitting until every training point is classified perfectly, so on real data it will almost always overfit.

Max Depth   Train Accuracy   Test Accuracy   Verdict
1           ~67%             ~67%            Underfit (too simple)
3           ~98%             ~97%            Good balance
None        100%             ~93%            Overfit (memorized)

▶️ See It In Code

Training decision trees with different depths and reading the actual decision rules.

import micropip
await micropip.install(['scikit-learn'])

from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Compare trees at different depths
print("=== Train vs Test Accuracy by Depth ===\n")
print(f"{'Depth':>10} {'Train Acc':>10} {'Test Acc':>10} {'Gap':>8}")
print("-" * 44)
for depth in [1, 2, 3, 4, None]:
    dt = DecisionTreeClassifier(max_depth=depth, random_state=42)
    dt.fit(X_train, y_train)
    train_acc = dt.score(X_train, y_train)
    test_acc = dt.score(X_test, y_test)
    gap = train_acc - test_acc
    depth_str = str(depth) if depth else "unlimited"
    print(f"{depth_str:>10} {train_acc:>10.1%} {test_acc:>10.1%} {gap:>8.1%}")

# Show the learned rules for depth=2
print("\n=== Decision Rules Learned (depth=2) ===\n")
dt2 = DecisionTreeClassifier(max_depth=2, random_state=42)
dt2.fit(X_train, y_train)
rules = export_text(dt2, feature_names=list(iris.feature_names))
print(rules)

# Feature importance
print("=== Feature Importance ===")
for name, imp in zip(iris.feature_names, dt2.feature_importances_):
    bar = "█" * int(imp * 30)
    print(f"  {name:<20} {imp:.3f} {bar}")

👋 Your Turn

Run the code below. When depth is set to None (unlimited), the training accuracy reaches 100%. Your task: explain in the comment why 100% training accuracy is not a good thing, and find the depth that gives the smallest gap between train and test accuracy.

💡 Hint: 100% training accuracy with a gap on test data means the model memorized the training set rather than learning general rules. For the bonus loop, use for depth in range(1, 11): and track which depth minimizes abs(train_acc - test_acc).
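One possible shape for that bonus loop, using the same 80/20 split and random_state=42 as the demo code (treat it as a sketch to compare your own version against, not the only right answer):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Track the depth that minimizes the train/test accuracy gap
best_depth, best_gap = None, float("inf")
for depth in range(1, 11):
    dt = DecisionTreeClassifier(max_depth=depth, random_state=42)
    dt.fit(X_train, y_train)
    gap = abs(dt.score(X_train, y_train) - dt.score(X_test, y_test))
    if gap < best_gap:
        best_depth, best_gap = depth, gap

print(f"Smallest train/test gap at depth={best_depth} (gap {best_gap:.1%})")
```

Note that the smallest gap alone isn't the whole story: a very shallow tree can have a tiny gap while scoring poorly on both sets, so check the test accuracy too.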

☕ Brain Break — 2 Minutes

Imagine two students studying for an exam:

Student A memorizes the exact answer to every practice problem, word for word.

Student B works through the practice problems to understand the underlying concepts.

The real exam has new problems. Who does better?

Student A is our overfit model. Student B is our well-generalized model.

Overfitting in ML is exactly this: a model that’s so tuned to the training examples that it fails on anything new. Controlling tree depth is one way to force the model to learn real patterns, not just memorize.

✅ Key Takeaways

A decision tree splits data with a series of yes/no questions, each chosen greedily to reduce Gini impurity the most.

Gini = 0 means a perfectly pure node; higher values mean a more mixed group of classes.

Trees are interpretable and need no feature scaling, but an unlimited-depth tree will memorize the training set.

Watch the gap between train and test accuracy: a large gap signals overfitting, and limiting max_depth is the main way to fight it.

🎉 Module 5 Complete!

You now understand classification trees and overfitting. Next, we switch from predicting categories to predicting numbers with linear regression.

Continue to Module 6: Linear Regression →
