Learn Without Walls
Phase 4 — Capstone & Career
Module 9 of 14

Capstone + Portfolio + Career Prep

No new concepts. Just you, real data, and the tools you have spent weeks learning. Build your proof.

~25 minutes
📌 Before You Start

What you need: Google Sheets, Tableau Public (or Google Colab), and a GitHub account from github.com. If you do not have a GitHub account yet, create one now; it is free and takes about 2 minutes.

What you’ll do: This module is different from every other one. There is no new concept to read and then apply. Instead, you will use everything you have learned to build something real: your first portfolio project, step by step.

There is no wrong way to do this. There is only doing it.

💡 The Concept

A portfolio is proof.

Anyone can list “SQL” on a resume. Not everyone can show a cleaned dataset, a documented analysis, and a live visualization. Your portfolio is what converts “trust me” into “here is evidence.”

GitHub is where data professionals store and share their work. It is free, widely recognized by employers, and searchable. Your GitHub profile is essentially a second resume — one that shows the work, not just the claims.

Your capstone project has four parts:

  1. Pick a free public dataset about something you genuinely care about
  2. Clean it and document what you found and fixed
  3. Answer one specific question using your analysis
  4. Publish the cleaned data, your analysis, and one visualization on GitHub

One question. One visualization. One clear answer. That is all a first project needs. Quality over quantity.

🔗 Why It Matters

Entry-level data analyst roles are competitive, and many applicants have only coursework to show. A portfolio with one real project, even a simple one, gives you something concrete that those candidates lack.

When an interviewer asks “Tell me about a time you worked with data,” you can answer: “I cleaned a dataset on [topic], asked a specific question, built a visualization, and published it on GitHub. Here’s the link.” A concrete answer with a link the interviewer can open is far stronger than a list of skills.

🖐️ Practice

Follow each step. Take your time. This module is worth lingering on.

1
Go to kaggle.com/datasets — a free library of thousands of public datasets. Browse for a topic you are genuinely curious about. Health, sports, food, education, movies, city data, transit, music — anything. Pick one that has fewer than 10,000 rows to keep this manageable.
Example topics to search for: World Happiness, Netflix Shows, Coffee Ratings, Video Games, NBA Stats, Air Quality, Spotify Songs, Bike Sharing.
2
Download the dataset as a CSV file. Open it in Google Sheets. Spend 5–10 minutes exploring it: How many rows? How many columns? What does each column mean? Are there obvious missing values or formatting issues?
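If you prefer code to Sheets, the same first look can be sketched in Python, for example in a Google Colab cell. This is a minimal sketch using only the standard library. The sample rows and the `profile` helper are made up for illustration, so swap in your own file and column names.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV. With a real file,
# pass open("your_file.csv", newline="") instead of io.StringIO(SAMPLE).
SAMPLE = """country,year,score
Finland,2023,7.8
denmark,2023,7.6
Iceland,2023,
Norway,2023,7.3
"""

def profile(csv_file):
    """Return the row count, the column names, and blank-cell counts per column."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    columns = reader.fieldnames or []
    blanks = {
        col: sum(1 for row in rows if not (row[col] or "").strip())
        for col in columns
    }
    return len(rows), columns, blanks

n_rows, columns, blanks = profile(io.StringIO(SAMPLE))
print(f"{n_rows} rows, {len(columns)} columns: {columns}")
print("Blank cells per column:", blanks)
```

The blank-cell counts give you a head start on the data quality issues you will document next.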
3
Document 3 specific data quality issues you notice. Write them down in a separate tab or document. Examples: “Column X has 12 blank values,” “Column Y has inconsistent capitalization,” “Column Z has outliers (values over 10x the average).” Clean those 3 issues using what you learned in Module 3.
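The three example issues above (blanks, inconsistent capitalization, outliers) can also be found and fixed in code. The sketch below is one possible approach, not the method from Module 3. The sample data, column names, and `clean` helper are hypothetical. It also makes two assumptions you may want to change: rows with a blank value are dropped rather than filled, and outliers are compared against the median instead of the average, because in a small sample a single extreme value inflates the average enough to hide itself.

```python
import csv
import io
from statistics import median

# Hypothetical sample data; replace with rows read from your own CSV.
SAMPLE = """city,category,rides
Austin,Member,120
austin,member,
Boston,CASUAL,95
Boston,Member,110
Denver,casual,9000
Denver,Member,105
"""

def clean(rows):
    """Fix three common issues and return (cleaned_rows, notes)."""
    notes = []

    # Issue 1: blank values. Assumption: drop the row and record how many.
    complete = [r for r in rows if r["rides"].strip()]
    notes.append(f"Dropped {len(rows) - len(complete)} row(s) with a blank rides value")

    # Issue 2: inconsistent capitalization. Normalize text columns to Title Case.
    cleaned, changed = [], 0
    for r in complete:
        fixed = dict(r, city=r["city"].strip().title(),
                     category=r["category"].strip().title())
        if fixed != r:
            changed += 1
        cleaned.append(fixed)
    notes.append(f"Normalized capitalization in {changed} row(s)")

    # Issue 3: outliers. Flag (do not delete) values over 10x the median.
    typical = median(float(r["rides"]) for r in cleaned)
    for r in cleaned:
        r["outlier"] = "yes" if float(r["rides"]) > 10 * typical else "no"
    flagged = sum(r["outlier"] == "yes" for r in cleaned)
    notes.append(f"Flagged {flagged} row(s) with rides over 10x the median ({typical:g})")

    return cleaned, notes

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned_rows, notes = clean(rows)
for note in notes:
    print(note)
```

Each printed note is a ready-made line for your list of documented issues, which keeps the record of what you changed next to the code that changed it.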
4
Write a short “Data Notes” paragraph: What is this dataset? Where does it come from? What did you find and fix? What limitations should someone know about? One paragraph is enough. This is what makes your analysis trustworthy.
5
Pick one specific question to answer. Not “what can I find in this data” — be specific: “Which country had the highest happiness score in 2023?” or “Which genre has the highest average Netflix rating?” One question. Answer it using your analysis tools (SQL, pandas, or Sheets).
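A "which group has the highest average" question usually comes down to grouping rows and averaging one column. Here is a minimal sketch of that pattern in plain Python. The titles, genres, and ratings are invented placeholders, not real data, and `average_by_group` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented placeholder rows; replace with your cleaned dataset.
SAMPLE = """title,genre,rating
Show A,Drama,8.1
Show B,Comedy,7.4
Show C,Drama,7.5
Show D,Documentary,7.0
Show E,Comedy,6.8
"""

def average_by_group(rows, group_col, value_col):
    """Return {group: average of value_col}, rounded to 2 decimal places."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    return {group: round(mean(values), 2) for group, values in groups.items()}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
averages = average_by_group(rows, "genre", "rating")
best_genre = max(averages, key=averages.get)

print(averages)
print(f"Highest average rating: {best_genre} ({averages[best_genre]})")
```

Check how many rows sit behind each average before you report the winner. A group with only one or two rows can top the list by chance, and that is worth mentioning as a limitation.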
6
Build one clear visualization that answers your question. Use Tableau Public (to get a live URL) or Google Colab (for a pandas chart). The visualization should be titled with your question so the audience knows immediately what it answers.
7
Log in to github.com. Create a new repository called data-analyst-portfolio. Set it to Public. Add a description: “My first data analyst portfolio project.”
8
Upload your cleaned CSV to the repository. Then open the README.md file for editing (or create one if the repository does not have it yet). Write:
  • What the dataset is and where it came from
  • What question you asked
  • What you found
  • A link to your Tableau Public dashboard (or Colab notebook)
  • What data quality issues you found and fixed
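GitHub renders README.md as Markdown, so the five items above map naturally onto headings. The sketch below assembles such a README from your own answers. The `build_readme` helper and the section names are only one possible layout, and every bracketed value is a placeholder to replace.

```python
def build_readme(title, source, question, finding, viz_link, fixes):
    """Assemble README text in Markdown from the five items listed above."""
    fix_lines = "\n".join(f"- {fix}" for fix in fixes)
    sections = [
        f"# {title}",
        f"## Dataset\n{source}",
        f"## Question\n{question}",
        f"## Finding\n{finding}",
        f"## Visualization\n{viz_link}",
        f"## Data quality issues found and fixed\n{fix_lines}",
    ]
    return "\n\n".join(sections) + "\n"

# All values below are placeholders, not real results or links.
readme_text = build_readme(
    title="[Your project title]",
    source="[What the dataset is and where you downloaded it]",
    question="[The one specific question you asked]",
    finding="[Your one-sentence answer]",
    viz_link="[Link to your Tableau Public dashboard or Colab notebook]",
    fixes=[
        "[Issue 1 and how you fixed it]",
        "[Issue 2 and how you fixed it]",
        "[Issue 3 and how you fixed it]",
    ],
)
print(readme_text)
```

Paste the printed text into the README editor on GitHub, or write the same headings by hand. The structure matters more than how you produce it.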

💼 Your portfolio now contains:

  • A GitHub repository with a public URL
  • A cleaned CSV dataset
  • A documented analysis (README)
  • A published visualization (Tableau Public URL or Colab link)
  • One answered data question
🛑 Phase 4 done. Take a real break — you earned it.
🧠 Brain Break

You just built a portfolio piece. That is real. That is yours. Nobody can take that from you. Stand up. Walk to a different room and back. Let yourself feel the weight of what you actually just did.

Walk to a different room. Drink water. Breathe. Tell yourself: I built this.
✅ You Got This

The ONE thing to remember from this module:

One project. One question answered. One story told. That is all a first portfolio entry needs to be. Quality over quantity.

🏁 Phase 4 Complete

You have a real portfolio now. Phase 5 is SAP FI/CO — the enterprise finance system that most data analysts never learn. It is the rare skill that opens Fortune 500 doors. You are about to have it.
