Learn Without Walls
Module 4 of 8 — Machine Learning Basics

Your First Classifier — K-Nearest Neighbors

Find the closest examples, take a vote, make a prediction

← Module 3: Preparing Data Module 4 of 8 Module 5: Decision Trees →

📚 This module uses scikit-learn. The first time you run a code block, it will install scikit-learn via micropip (~5–10 seconds). Be patient — subsequent runs are instant!

📌 Before You Start

Estimated time: ~55 minutes

What you’ll learn: How K-Nearest Neighbors works, how to use scikit-learn’s API (fit → predict → score), and how the choice of k affects results.

💡 The Big Idea

KNN asks: “What do the k most similar examples in my training set look like?” Then it takes a majority vote among those neighbors.

No complicated math. No training phase. When you call .predict() on a new flower, KNN just finds the 3 (or 5, or k) closest flowers in the training set and says “it’s probably the same species as most of them.”

It’s like asking your k nearest neighbors what they think — and going with the majority.

Simple, intuitive, and surprisingly effective for many problems. And it’s a perfect starting point for understanding the scikit-learn API that all ML algorithms share.

🧠 How It Works

KNN Step by Step

1. Store all training examples — KNN doesn’t really “train”; it just memorizes. This is why it’s called a lazy learner.
2. Receive a new sample to classify (e.g., a flower with petal_length=1.5, petal_width=0.3).
3. Calculate the distance from the new sample to every training sample — usually Euclidean distance: d = √((x₁−x₂)² + (y₁−y₂)² + …)
4. Find the k nearest neighbors — the k training samples with the smallest distances.
5. Vote — whichever class appears most often among the k neighbors wins. That’s the prediction.
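The five steps above can be sketched in a few lines of plain NumPy. This is a minimal illustration of the idea, not how scikit-learn implements it, and the toy points and labels below are made up:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 3: Euclidean distance from x_new to every training sample
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 4: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 5: majority vote among the k neighbors' labels
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Toy data: (petal_length, petal_width) for two made-up species, 0 and 1
X_train = np.array([[1.4, 0.2], [1.5, 0.3], [1.3, 0.2],
                    [4.5, 1.5], [4.7, 1.4], [4.2, 1.3]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.5, 0.3]), k=3))  # → 0
print(knn_predict(X_train, y_train, np.array([4.4, 1.4]), k=3))  # → 1
```

Steps 1 and 2 are just “keep X_train and y_train around, then receive x_new” — there is genuinely no training computation.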

Why k Matters

There’s no universal “best k.” A very small k is sensitive to noise — a single mislabeled or unusual neighbor can flip the vote — while a very large k averages over distant, irrelevant samples. You find a good value by experimenting, which is exactly what your exercise will do.

The sklearn API (same for every algorithm!)

# 1. Import the algorithm
from sklearn.neighbors import KNeighborsClassifier

# 2. Create the model (set hyperparameters)
model = KNeighborsClassifier(n_neighbors=3)

# 3. Fit (train) on training data
model.fit(X_train, y_train)

# 4. Predict on new data
predictions = model.predict(X_test)

# 5. Score (accuracy on test set)
accuracy = model.score(X_test, y_test)

This 5-step pattern works for every sklearn classifier: decision trees, random forests, SVMs. Learn it once, use it everywhere.
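To see the shared API in action, here is the same pipeline with a decision tree swapped in (a preview of Module 5). Only the import and the constructor change; this sketch assumes scikit-learn is already installed via the micropip step above:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Same data and split as the KNN example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Only step 2 changes — steps 3, 4, and 5 are identical to KNN
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Decision tree test accuracy: {accuracy:.1%}")
```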

▶️ See It In Code

A complete KNN pipeline on the Iris dataset. This will install scikit-learn — first run takes 10–20 seconds.

⏳ First run will install scikit-learn (~10 seconds). Watch the output area for progress.
import micropip
await micropip.install(['scikit-learn'])

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np

print("Loading Iris dataset...")
iris = load_iris()
X, y = iris.data, iris.target
print(f"Dataset shape: {X.shape[0]} samples × {X.shape[1]} features")
print(f"Classes: {list(iris.target_names)}")
print(f"Class counts: {[sum(y == i) for i in range(3)]}")

# Split into train and test (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(f"\nTraining samples: {len(X_train)}, Test samples: {len(X_test)}")

# Train KNN with k=3
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)
print(f"\nKNN (k=3) Test Accuracy: {accuracy:.1%}")

# Make predictions on all test samples
y_pred = knn.predict(X_test)
correct = sum(y_pred == y_test)
print(f"Correct predictions: {correct}/{len(y_test)}")

# Predict a single new flower
sample = np.array([[5.1, 3.5, 1.4, 0.2]])  # typical setosa
pred = knn.predict(sample)[0]
prob = knn.predict_proba(sample)[0]
print(f"\nNew flower {sample[0].tolist()} →")
print(f"  Predicted: {iris.target_names[pred]}")
print(f"  Confidence: {max(prob):.1%}")

👋 Your Turn

Run the code below and find out which value of k gives the best accuracy on the Iris test set. Try k = 1, 5, and 10. Record the results and explain what you observe.

Output will appear here after you click Run… (first run installs scikit-learn, ~10 seconds)
💡 Hint: Add more values to k_values like [1, 3, 5, 7, 10, 15, 20]. Notice how accuracy changes. Which tends to be better — very small k or very large k?
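If the interactive code block doesn’t load, here is one way the k sweep might look — the k_values list comes from the hint above, and the exact accuracies will depend on the train/test split (this sketch again assumes scikit-learn is already installed):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Try a range of k values and compare test accuracy
k_values = [1, 3, 5, 7, 10, 15, 20]
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print(f"k={k:2d} → test accuracy: {knn.score(X_test, y_test):.1%}")
```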

☕ Brain Break — 2 Minutes

You’re new to a city and trying to decide where to eat. You ask your 3 nearest neighbors: two recommend the pizza place, one recommends the sushi bar.

Majority vote: 2 pizza vs 1 sushi → you go for pizza. That’s exactly KNN.

Now imagine asking 100 neighbors. Some live far away and don’t even know the pizza place exists. Their votes might not be helpful. This is why k matters — too many neighbors can drown out the signal.

The right number of “neighbors” to consult is almost always somewhere in the middle.

✅ Key Takeaways

KNN is a lazy learner: it stores the training data and does all its work at prediction time.

To classify a new sample, it finds the k closest training examples and takes a majority vote.

k is a hyperparameter: very small k is noise-sensitive, very large k drowns out the local signal — you tune it by experiment.

The scikit-learn pattern — import, create, fit, predict, score — is the same for every classifier you’ll meet in this course.

🎉 Module 4 Complete!

You’ve trained your first real ML model! Next, we’ll explore a completely different approach — one that makes decisions by asking yes/no questions.

Continue to Module 5: Decision Trees →
