Learn Without Walls
Module 8 of 8 — Machine Learning Basics

🏆 Capstone Project

Build Your Own ML Pipeline — from raw data to interpreted results


🎉 You made it to the Capstone!

You’ve covered all the fundamentals. This final project puts everything together: load data → explore → split → train 3 models → evaluate → pick the best → predict. No hand-holding this time — you’ve got this.

📌 Before You Start

Estimated time: ~60 minutes

What you’ll do: Complete a full ML pipeline independently, compare three models, justify your choice, and make final predictions. A collapsible sample solution is provided at the end.

💡 The Project Goal

Use everything you’ve learned to build a complete, end-to-end machine learning pipeline on the Iris dataset. You will make real decisions: which k to use for KNN, which depth for your Decision Tree, and which model to select as your final answer.

This mirrors what real ML practitioners do on every project — just with more data and more features. The workflow never changes.

Your deliverables: A working pipeline with three trained models, a comparison table of their results, a written justification for your model selection, and 3 new predictions.

🧰 The Pipeline Steps

▶ Step 1

Load & Explore the Data

Load the Iris dataset. Print the shape, class names, class distribution, and feature value ranges (min/max) for all 4 features. Confirm there are no missing values.
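One way Step 1 might look (a minimal sketch; variable names like `X` and `y` are conventions, not requirements):

```python
from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()
X, y = iris.data, iris.target

print(f"Shape: {X.shape[0]} samples x {X.shape[1]} features")
print(f"Classes: {list(iris.target_names)}")

# Class distribution
for i, name in enumerate(iris.target_names):
    print(f"  {name}: {(y == i).sum()} samples")

# Min/max of each feature
for i, name in enumerate(iris.feature_names):
    print(f"  {name}: [{X[:, i].min():.1f}, {X[:, i].max():.1f}]")

# Confirm there are no missing values
print(f"Missing values: {np.isnan(X).sum()}")
```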

▶ Step 2

Split Into Train and Test

Use an 80/20 train/test split with random_state=42. Print the number of samples in each set.
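A sketch of Step 2, assuming `X` and `y` were loaded as in Step 1:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 80/20 split; random_state=42 makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(f"Training: {len(X_train)} samples")
print(f"Test:     {len(X_test)} samples")
```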

▶ Step 3

Train Three Models

  • Model A: KNN with k=5
  • Model B: Decision Tree with max_depth=3
  • Model C: A third model of your choice — for example, KNN with a different k, a Decision Tree with a different depth, or GaussianNB from sklearn.naive_bayes.
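Training might be sketched like this (here GaussianNB is picked as Model C, which is just one of the suggested options):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# fit() returns the model itself, so training can be chained onto the constructor
model_a = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
model_b = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
model_c = GaussianNB().fit(X_train, y_train)  # Model C: one possible choice
print("All 3 models trained!")
```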
▶ Step 4

Evaluate All Three

For each model, print its accuracy and a classification report (precision, recall, and F1 per class). Then print a comparison table of all three.
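The evaluation loop could look something like this (the three models shown are assumptions standing in for whichever you trained in Step 3):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

models = {
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree (depth=3)": DecisionTreeClassifier(max_depth=3, random_state=42),
    "GaussianNB": GaussianNB(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, y_pred)
    print(f"\n{name}")
    print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Comparison table
print(f"{'Model':<25} {'Accuracy':>8}")
for name, acc in results.items():
    print(f"{name:<25} {acc:>8.1%}")
```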

▶ Step 5

Select the Best Model & Explain Why

Look at your results. Which model would you deploy? Add a print statement explaining your reasoning. Consider: accuracy, consistency across classes, and simplicity.

▶ Step 6

Make 3 New Predictions

Using your chosen best model, predict the species for these 3 new flower measurements:

  • Flower 1: [5.0, 3.4, 1.5, 0.2]
  • Flower 2: [6.7, 3.0, 5.2, 2.3]
  • Flower 3: [5.9, 3.0, 4.2, 1.5]

Print which species each flower is predicted to be.
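Step 6 might be sketched as follows; the KNN model here is a placeholder, so substitute whichever model won your comparison in Step 5:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
best_model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

new_flowers = np.array([
    [5.0, 3.4, 1.5, 0.2],
    [6.7, 3.0, 5.2, 2.3],
    [5.9, 3.0, 4.2, 1.5],
])
preds = best_model.predict(new_flowers)
for i, (flower, p) in enumerate(zip(new_flowers, preds), start=1):
    print(f"Flower {i} {flower} -> {iris.target_names[p]}")
```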

👋 Your Turn — Complete the Pipeline

The scaffold below has # YOUR CODE HERE comments where you need to fill in. Work through each step, and run your code at any point to check your progress.

💡 Stuck on a step? Look back at the module that taught it: Step 1 → Module 2, Step 2 → Module 3, Steps 3–4 → Modules 4–5 & 7, Step 6 → Module 4. Read the scaffolding comments carefully — most lines just need to be uncommented and completed.
🔒 View Sample Solution (try it yourself first!)
```python
import micropip
await micropip.install(['scikit-learn'])  # needed for the in-browser (Pyodide) runner

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score
import numpy as np

print("=" * 55)
print("  MACHINE LEARNING CAPSTONE PROJECT")
print("  Iris Species Classification Pipeline")
print("=" * 55)

# STEP 1: Load & Explore
print("\n📊 STEP 1: LOAD & EXPLORE\n")
iris = load_iris()
X, y = iris.data, iris.target
print(f"Shape: {X.shape[0]} samples × {X.shape[1]} features")
print(f"Classes: {list(iris.target_names)}")
for species, count in zip(iris.target_names, [sum(y == i) for i in range(3)]):
    print(f"  {species}: {count} samples")
print("\nFeature ranges:")
for i, name in enumerate(iris.feature_names):
    print(f"  {name}: [{X[:, i].min():.1f}, {X[:, i].max():.1f}]")

# STEP 2: Split
print("\n✂️ STEP 2: TRAIN/TEST SPLIT\n")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(f"Training: {len(X_train)} samples")
print(f"Test: {len(X_test)} samples")

# STEP 3: Train Three Models
print("\n🤖 STEP 3: TRAINING THREE MODELS\n")
model_a = KNeighborsClassifier(n_neighbors=5)
model_a.fit(X_train, y_train)
model_b = DecisionTreeClassifier(max_depth=3, random_state=42)
model_b.fit(X_train, y_train)
model_c = KNeighborsClassifier(n_neighbors=7)
model_c.fit(X_train, y_train)
print("All 3 models trained!")

# STEP 4: Evaluate
print("\n📈 STEP 4: EVALUATION RESULTS\n")
models = [
    ("Model A — KNN (k=5)", model_a),
    ("Model B — Decision Tree (d=3)", model_b),
    ("Model C — KNN (k=7)", model_c),
]
best_acc = 0
best_model = None
best_name = ""
for name, model in models:
    y_pred = model.predict(X_test)
    print(f"{name}")
    print(classification_report(y_test, y_pred, target_names=iris.target_names))
print(f"{'Model':<35} {'Accuracy':>10}")
print("-" * 47)
for name, model in models:
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"{name:<35} {acc:>10.1%}")
    if acc > best_acc:
        best_acc = acc
        best_model = model
        best_name = name

# STEP 5: Best Model
print(f"\n🏆 STEP 5: BEST MODEL SELECTION\n")
print(f"Selected: {best_name} ({best_acc:.1%} accuracy)")
print("Reasoning: highest accuracy on test set.")
print("Decision Trees are also interpretable — useful for explaining results.")

# STEP 6: New Predictions
print("\n🔮 STEP 6: NEW PREDICTIONS\n")
new_flowers = np.array([
    [5.0, 3.4, 1.5, 0.2],
    [6.7, 3.0, 5.2, 2.3],
    [5.9, 3.0, 4.2, 1.5],
])
for i, flower in enumerate(new_flowers):
    pred = best_model.predict([flower])[0]
    print(f"Flower {i+1} {flower} → {iris.target_names[pred]}")

print("\n✅ Pipeline complete!")
```

✅ Key Takeaways from the Full Course

  • Every ML project follows the same workflow: load data → explore → split → train → evaluate → predict. Only the data and features change.
  • Always evaluate on held-out test data, never on the data you trained on.
  • Compare multiple models before committing to one, and justify your choice; accuracy matters, but so do consistency across classes and simplicity.

🚀 What’s Next?

💻

Python Practice Labs

10 hands-on Python labs with live code. Reinforce the Python fundamentals that power every ML project you build.

📊

Data Analyst Course

Apply ML and data science skills in a career-track context. SQL, Python, Tableau, Power BI, and real business analysis.

📈

Introduction to Statistics

The math behind ML. Distributions, hypothesis testing, correlation, and regression — understand why the algorithms work.

🏆 Machine Learning Basics — Complete!

You’ve finished all 8 modules. You understand what ML is, how it works, and you’ve trained real models in your browser. That’s something most people never do.

Share your achievement, explore the next courses above, or revisit any module to deepen your understanding.

← Return to Course Home
