🏆 Capstone Project
Build Your Own ML Pipeline — from raw data to interpreted results
📌 Before You Start
- All 7 previous modules completed
- You are comfortable with KNN, Decision Trees, and classification metrics
Estimated time: ~60 minutes
What you’ll do: Complete a full ML pipeline independently, compare three models, justify your choice, and make final predictions. A collapsible sample solution is provided at the end.
💡 The Project Goal
Use everything you’ve learned to build a complete, end-to-end machine learning pipeline on the Iris dataset. You will make real decisions: which k to use for KNN, which depth for your Decision Tree, and which model to select as your final answer.
This mirrors what real ML practitioners do on every project — just with more data and more features. The workflow never changes.
Your deliverables: A working pipeline with three trained models, a comparison table of their results, a written justification for your model selection, and 3 new predictions.
🧰 The Pipeline Steps
Load & Explore the Data
Load the Iris dataset. Print the shape, class names, class distribution, and feature value ranges (min/max) for all 4 features. Confirm there are no missing values.
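One possible way to approach this step is sketched below. It assumes the copy of Iris bundled with scikit-learn (`load_iris`); the variable names are illustrative and not part of the course scaffold.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the dataset bundled with scikit-learn (assumed data source)
iris = load_iris()
X, y = iris.data, iris.target

print("Shape:", X.shape)
print("Classes:", list(iris.target_names))

# Class distribution: how many samples belong to each species
counts = np.bincount(y)
for name, count in zip(iris.target_names, counts):
    print(f"  {name}: {count} samples")

# Value range (min/max) for each of the 4 features
for i, feature in enumerate(iris.feature_names):
    print(f"  {feature}: min={X[:, i].min():.1f}, max={X[:, i].max():.1f}")

# Count NaN entries to confirm nothing is missing
missing = int(np.isnan(X).sum())
print("Missing values:", missing)
```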
Split Into Train and Test
Use an 80/20 train/test split with random_state=42. Print the number of samples in each set.
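A minimal sketch of the split, assuming scikit-learn's `train_test_split`. The `test_size=0.2` argument gives the 80/20 ratio, and `random_state=42` makes the shuffle reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 80% of the rows go to training, 20% are held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training samples:", len(X_train))
print("Test samples:", len(X_test))
```

Passing `stratify=y` would additionally keep the class proportions equal in both sets. The step above does not ask for it, so it is left out here.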
Train Three Models
- Model A: KNN with k=5
- Model B: Decision Tree with max_depth=3
- Model C: A third model of your choice. Options: KNN with a different k, Decision Tree with a different depth, or try GaussianNB from sklearn.naive_bayes.
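The three models could be set up as in the sketch below. Choosing `GaussianNB` for Model C is just one of the listed options, and the `random_state=42` on the tree is an added assumption so that repeated runs give the same tree.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Keeping the models in a dict makes it easy to loop over them later
models = {
    "A: KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "B: Decision Tree (max_depth=3)": DecisionTreeClassifier(
        max_depth=3, random_state=42  # random_state is an assumption for reproducibility
    ),
    "C: Gaussian Naive Bayes": GaussianNB(),  # one possible choice for Model C
}

# Every scikit-learn classifier is trained with the same fit() call
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"Trained {name}")
```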
Evaluate All Three
For each model, print the accuracy and the classification report (precision, recall, and F1 per class). Then print a comparison table of the three models.
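An evaluation loop along these lines would cover both the per-model reports and the comparison table. It is a sketch that reuses the same three models as an assumption (your Model C may differ) and formats the table with plain f-strings.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

models = {
    "A: KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "B: Decision Tree (max_depth=3)": DecisionTreeClassifier(max_depth=3, random_state=42),
    "C: Gaussian Naive Bayes": GaussianNB(),  # assumed choice for Model C
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)  # evaluate on held-out data only
    results[name] = accuracy_score(y_test, y_pred)
    print(f"\n=== {name} ===")
    print(f"Accuracy: {results[name]:.3f}")
    print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Simple comparison table: one row per model
print(f"{'Model':<34}{'Accuracy':>10}")
for name, acc in results.items():
    print(f"{name:<34}{acc:>10.3f}")
```

With only 30 test samples, a single misclassified flower changes accuracy by about 3 percentage points, so small differences between models should be read with caution.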
Select the Best Model & Explain Why
Look at your results. Which model would you deploy? Add a print statement explaining your reasoning. Consider: accuracy, consistency across classes, and simplicity.
Make 3 New Predictions
Using your chosen best model, predict the species for these 3 new flower measurements (sepal length, sepal width, petal length, petal width, in cm):
- Flower 1: [5.0, 3.4, 1.5, 0.2]
- Flower 2: [6.7, 3.0, 5.2, 2.3]
- Flower 3: [5.9, 3.0, 4.2, 1.5]
Print which species each flower is predicted to be.
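The prediction step might look like the sketch below. It uses KNN with k=5 purely as a stand-in for "your chosen best model"; substitute whichever model you selected. Note that `predict` expects a 2D array with one row per flower.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Stand-in for the selected model (an assumption, not a recommendation)
best_model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# One row per flower, columns in the same order as the training features
new_flowers = np.array([
    [5.0, 3.4, 1.5, 0.2],
    [6.7, 3.0, 5.2, 2.3],
    [5.9, 3.0, 4.2, 1.5],
])

# predict() returns class indices; map them back to species names
predicted = best_model.predict(new_flowers)
predicted_names = [str(iris.target_names[i]) for i in predicted]

for number, (measurements, species) in enumerate(zip(new_flowers, predicted_names), start=1):
    print(f"Flower {number}: {measurements.tolist()} -> {species}")
```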
👋 Your Turn — Complete the Pipeline
The scaffold below has # YOUR CODE HERE comments marking where you need to fill in code. Work through each step in order. You can run the code at any point to check your progress.
🔒 View Sample Solution (try it yourself first!)
✅ Key Takeaways from the Full Course
- ML = learned patterns, not explicit rules. Show the model examples; it finds the patterns.
- Always explore your data before modeling. Shape, class balance, missing values, ranges.
- Preprocess before training: fill missing values, encode categories, scale features, split into train/test.
- The sklearn API is always: import → create → fit → predict → evaluate. It works the same for every algorithm.
- Don’t trust accuracy alone. Use precision, recall, and F1 — especially when classes are imbalanced.
- Overfitting is real. Watch for a large gap between train and test accuracy, and control it by limiting model complexity (a smaller max_depth for trees, a larger k for KNN).
- ML is a cycle, not a finish line. Data, model, evaluate, improve, repeat.
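The train/test gap mentioned in the takeaways can be checked directly. The sketch below assumes a Decision Tree on the Iris split used in this module and compares training and test accuracy at several depths; the list of depths is arbitrary, and `None` means the tree grows without a depth limit.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gaps = {}
print(f"{'max_depth':<12}{'train':>8}{'test':>8}{'gap':>8}")
for depth in [1, 2, 3, 5, None]:  # illustrative depths; None = unlimited
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)  # accuracy on data the tree has seen
    test_acc = tree.score(X_test, y_test)     # accuracy on held-out data
    gaps[depth] = train_acc - test_acc
    print(f"{str(depth):<12}{train_acc:>8.3f}{test_acc:>8.3f}{gaps[depth]:>8.3f}")
```

On a dataset as small and clean as Iris the gap may stay close to zero at every depth. The same check tends to be more revealing on larger, noisier data.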
🚀 What’s Next?
🏆 Machine Learning Basics — Complete!
You’ve finished all 8 modules. You understand what ML is, how it works, and you’ve trained real models in your browser. That’s something most people never do.
Share your achievement, explore the next courses above, or revisit any module to deepen your understanding.