Learn Without Walls
Lab 10 of 10 — Capstone

Mini Data Analysis Project

Put everything together — this is what analysts actually do


📋 What This Lab Is

This is not a lab with exercises and fill-in-the-blanks. This is a real mini analysis project.

You will use everything you’ve learned: data structures, functions, pandas, and matplotlib. The scenario: you’re a data analyst at a retail company. Your manager wants answers to 5 business questions from this quarter’s sales data.

Your job: write the code, analyze the data, and present your findings. The starter code below gives you the dataset and section headers to build on. A full sample solution sits collapsed at the bottom; try the project yourself first!

🏢 The Scenario

You are a data analyst at RetailCo. It’s the end of Q1 2024. Your manager has sent you this message:

“I need a quick Q1 data summary before the board meeting tomorrow. Can you pull together: total revenue and units, category breakdown, salesperson rankings, any weekly pattern, and a dashboard chart? Thanks.”

The dataset: 90 days of Q1 sales data (January 1 through March 30, 2024), one row per day, with Date, Revenue, Units, Category, Region, and Salesperson columns. It’s already built into the starter code below.
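Before answering any question, it usually helps to check what a dataset actually contains. The sketch below runs that first look on a small hypothetical stand-in (`sample`, six made-up rows that only borrow the lab's column names), not on the real lab data; the same calls should work on the `sales` DataFrame from the starter code.

```python
import pandas as pd

# Hypothetical 6-row stand-in; values are invented for illustration only
sample = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=6, freq='D'),
    'Revenue': [9000.0, 9500.0, 10100.0, 8700.0, 9900.0, 10400.0],
    'Units': [52, 57, 61, 50, 59, 63],
    'Category': ['Electronics', 'Clothing', 'Food'] * 2,
    'Region': ['West', 'East', 'South', 'North', 'West', 'East'],
    'Salesperson': ['Alice', 'Bob', 'Carol', 'David', 'Eve', 'Alice'],
})

# Size of the table as (rows, columns)
n_rows, n_cols = sample.shape
print(f"Rows: {n_rows}, Columns: {n_cols}")

# Column types: Date should be a datetime type, not plain text
print(sample.dtypes)

# First few rows, to see what one record looks like
print(sample.head(3))

# Count of missing values across the whole table
missing_total = int(sample.isna().sum().sum())
print(f"Missing values: {missing_total}")
```

If the Date column showed up as text rather than a datetime type, the `.dt` accessor used for day-of-week questions would not be available until the column is converted with `pd.to_datetime`.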

🏆 5 Business Questions

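Your Tasks

1. Overall totals: total revenue, total units sold, and average daily revenue for the quarter.
2. Category performance: revenue by category, and which category leads.
3. Salesperson rankings: salespeople ranked by total revenue.
4. Weekly pattern: average weekday revenue versus average weekend revenue.
5. Dashboard: one figure with charts that summarize the quarter.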

💻 Your Analysis Editor

The starter code includes the dataset and section headers. Fill in each section to answer all 5 questions. Then hit Run to see your results and charts.
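Several of the questions come down to the same pattern: group rows by a label, total a numeric column, and sort the result. As a hint that does not give away the solution, here is that pattern on a tiny hypothetical table (`toy`, with made-up `Team` and `Points` columns that are not part of the lab dataset).

```python
import pandas as pd

# Hypothetical toy data, unrelated to the lab's sales table
toy = pd.DataFrame({
    'Team': ['Red', 'Blue', 'Red', 'Blue', 'Red'],
    'Points': [10, 4, 6, 9, 2],
})

# Group by the label, total the numeric column, largest first
totals = toy.groupby('Team')['Points'].sum().sort_values(ascending=False)
print(totals)

# idxmax() returns the label of the largest total
top_team = totals.idxmax()
print(f"Top team: {top_team}")
```

Swapping in a different grouping column or a different aggregation such as `.mean()` covers most of the breakdowns the manager asked for.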

🔑 Sample Solution (try it yourself first!)

This is one complete working solution. Your approach may differ — that’s fine! There’s no single right answer.

import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=90, freq='D'),
    'Revenue': [round(8000 + 2000*abs((i%30)-15)/15 + (i%7)*500 + (i%3)*300, 2) for i in range(90)],
    'Units': [int(50 + 20*abs((i%30)-15)/15 + (i%7)*5) for i in range(90)],
    'Category': ['Electronics','Clothing','Food']*30,
    'Region': ['West','East','South','North','West','East']*15,
    'Salesperson': ['Alice','Bob','Carol','David','Eve']*18
})

# ── Q1 ──────────────────────────────────────────────────
print("=" * 45)
print("Q1: OVERALL TOTALS")
print("=" * 45)
total_rev = sales['Revenue'].sum()
total_units = sales['Units'].sum()
avg_daily = sales['Revenue'].mean()
print(f"Total Revenue: ${total_rev:,.2f}")
print(f"Total Units Sold: {total_units:,}")
print(f"Avg Daily Revenue: ${avg_daily:,.2f}")

# ── Q2 ──────────────────────────────────────────────────
print("\n" + "=" * 45)
print("Q2: CATEGORY PERFORMANCE")
print("=" * 45)
cat_rev = sales.groupby('Category')['Revenue'].sum().sort_values(ascending=False)
cat_units = sales.groupby('Category')['Units'].sum()
cat_aov = (sales.groupby('Category')['Revenue'].sum() / sales.groupby('Category')['Units'].sum()).round(2)
print("Revenue by Category:")
for cat, rev in cat_rev.items():
    print(f" {cat:<15} ${rev:>10,.2f} AOV: ${cat_aov[cat]:.2f}/unit")
print(f"\nTop category: {cat_rev.idxmax()}")
print(f"Highest AOV: {cat_aov.idxmax()} (${cat_aov.max():.2f}/unit)")

# ── Q3 ──────────────────────────────────────────────────
print("\n" + "=" * 45)
print("Q3: SALESPERSON RANKINGS")
print("=" * 45)
rep_rev = sales.groupby('Salesperson')['Revenue'].sum().sort_values(ascending=False)
for rank, (rep, rev) in enumerate(rep_rev.items(), 1):
    marker = " ← TOP PERFORMER" if rank == 1 else ""
    print(f" #{rank} {rep:<8} ${rev:>10,.2f}{marker}")

# ── Q4 ──────────────────────────────────────────────────
print("\n" + "=" * 45)
print("Q4: WEEKDAY vs WEEKEND REVENUE")
print("=" * 45)
sales['DayOfWeek'] = sales['Date'].dt.dayofweek
sales['IsWeekend'] = sales['DayOfWeek'] >= 5
weekday_avg = sales[~sales['IsWeekend']]['Revenue'].mean()
weekend_avg = sales[sales['IsWeekend']]['Revenue'].mean()
diff = weekend_avg - weekday_avg
print(f"Weekday avg revenue: ${weekday_avg:,.2f}")
print(f"Weekend avg revenue: ${weekend_avg:,.2f}")
print(f"Difference: ${diff:+,.2f} ({'weekends higher' if diff > 0 else 'weekdays higher'})")

# ── Q5: Dashboard ────────────────────────────────────────
fig, axes = plt.subplots(2, 2, figsize=(13, 9))
fig.suptitle('Q1 2024 Sales Dashboard', fontsize=16, fontweight='bold')

# Daily trend
ax1 = axes[0, 0]
ax1.plot(range(90), sales['Revenue'], color='#1565C0', linewidth=1.5, alpha=0.8)
ax1.fill_between(range(90), sales['Revenue'], alpha=0.15, color='#1565C0')
ax1.set_title('Daily Revenue Trend', fontweight='bold')
ax1.set_xlabel('Day of Q1')
ax1.set_ylabel('Revenue ($)')
ax1.grid(True, alpha=0.3)

# Revenue by category
ax2 = axes[0, 1]
cat_rev_sorted = cat_rev.sort_values()
colors_cat = ['#E65100', '#2E7D32', '#1565C0']
ax2.bar(cat_rev_sorted.index, cat_rev_sorted.values, color=colors_cat)
ax2.set_title('Revenue by Category', fontweight='bold')
ax2.set_ylabel('Total Revenue ($)')
for i, (cat, val) in enumerate(cat_rev_sorted.items()):
    ax2.text(i, val + 500, f'${val/1000:.0f}K', ha='center', fontsize=9)

# Revenue by salesperson
ax3 = axes[1, 0]
rep_rev_sorted = rep_rev.sort_values()
ax3.bar(rep_rev_sorted.index, rep_rev_sorted.values, color='#6A1B9A')
ax3.set_title('Revenue by Salesperson', fontweight='bold')
ax3.set_ylabel('Total Revenue ($)')
for i, (rep, val) in enumerate(rep_rev_sorted.items()):
    ax3.text(i, val + 200, f'${val/1000:.0f}K', ha='center', fontsize=9)

# Revenue by region (horizontal bar)
ax4 = axes[1, 1]
region_rev = sales.groupby('Region')['Revenue'].sum().sort_values()
ax4.barh(region_rev.index, region_rev.values, color='#00695C')
ax4.set_title('Revenue by Region', fontweight='bold')
ax4.set_xlabel('Total Revenue ($)')
for i, (reg, val) in enumerate(region_rev.items()):
    ax4.text(val + 100, i, f'${val/1000:.0f}K', va='center', fontsize=9)

plt.tight_layout()
plt.show()

You can copy this into the editor above and run it to see the full output and dashboard.
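One possible extension, if you want to dig further into the weekly pattern: instead of a single weekday-versus-weekend split, average revenue for each day of the week. This is only a sketch of one approach, not part of the sample solution. It rebuilds just the Date and Revenue columns with the same formulas as above, and the names `day_order`, `by_day`, and `best_day` are introduced here for illustration.

```python
import pandas as pd

# Same Date and Revenue definitions as the sample solution
sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=90, freq='D'),
    'Revenue': [round(8000 + 2000*abs((i % 30) - 15)/15 + (i % 7)*500 + (i % 3)*300, 2)
                for i in range(90)],
})

# Calendar order, so the result is not sorted alphabetically
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

# Average revenue per day name, rearranged into calendar order
by_day = (
    sales.groupby(sales['Date'].dt.day_name())['Revenue']
    .mean()
    .reindex(day_order)
    .round(2)
)
print(by_day)

# Day with the highest average revenue
best_day = by_day.idxmax()
print(f"Strongest day on average: {best_day}")
```

A per-day view like this can show whether the weekend effect is spread across both days or driven by one of them, which a single weekday/weekend average hides.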

🏆 Capstone Complete — Labs Finished!

You’ve completed all 10 Python Practice Labs! You can now write Python functions, analyze data structures, build pandas pipelines, and create matplotlib visualizations — the core toolkit of a working data analyst.

Keep practicing. The best way to get better is to find real data you care about and start asking questions of it.
