Mini Data Analysis Project

Put everything together — this is what analysts actually do

← Lab 9: Visualization Lab 10 of 10 Lab Home →


This lab loads pandas, numpy, and matplotlib — allow extra time on first load.

What This Lab Is

This is not a lab with exercises and fill-in-the-blanks. This is a real mini analysis project.

You will use everything you’ve learned: data structures, functions, pandas, and matplotlib. The scenario: you’re a data analyst at a retail company. Your manager wants answers to 5 business questions from this quarter’s sales data.

Your job: write the code, analyze the data, and present your findings. Starter code is provided below, and a full sample solution appears at the end of this page. Try it yourself first!

The Scenario

You are a data analyst at RetailCo. It’s the end of Q1 2024. Your manager has sent you this message:

“I need a quick Q1 data summary before the board meeting tomorrow. Can you pull together: total revenue and units, category breakdown, salesperson rankings, any weekly pattern, and a dashboard chart? Thanks.”

The dataset: one row per day for 90 days of Q1 (January 1 through March 30, 2024), with Date, Revenue, Units, Category, Region, and Salesperson columns. It is generated by formulas inside the starter code below.
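Before answering the questions, it can help to confirm what the dataset actually contains. The sketch below rebuilds the same synthetic DataFrame and prints its shape, column types, date range, and group sizes using standard pandas calls (`dtypes`, `value_counts`). One detail it surfaces: the region list repeats West and East, so those two regions get twice as many rows (30 each) as South and North (15 each), which is worth keeping in mind when comparing region totals.

```python
import pandas as pd

# Rebuild the lab's synthetic dataset (same formulas as the starter code)
sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=90, freq='D'),
    'Revenue': [round(8000 + 2000*abs((i%30)-15)/15 + (i%7)*500 + (i%3)*300, 2) for i in range(90)],
    'Units':   [int(50 + 20*abs((i%30)-15)/15 + (i%7)*5) for i in range(90)],
    'Category':    ['Electronics','Clothing','Food']*30,
    'Region':      ['West','East','South','North','West','East']*15,
    'Salesperson': ['Alice','Bob','Carol','David','Eve']*18
})

# Shape and column types: one row per day, six columns
print(sales.shape)
print(sales.dtypes)

# Date coverage: 90 consecutive days starting on Jan 1
print(sales['Date'].min().date(), "to", sales['Date'].max().date())

# Rows per group (the lists cycle, so group sizes follow the list lengths)
print(sales['Category'].value_counts())
print(sales['Region'].value_counts())
print(sales['Salesperson'].value_counts())
```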

5 Business Questions

Your Tasks

Q1: What was our total revenue and total units sold in Q1? What was the average daily revenue?
Q2: Which category generated the most revenue? Which had the highest average revenue per unit (total revenue divided by total units)?
Q3: Who was the top salesperson? Show complete rankings for all 5 reps with their total revenue.
Q4: Is there a weekly pattern in revenue? Compare the average daily revenue on weekdays (Mon–Fri) vs. weekends (Sat–Sun).
Q5: Create a visualization dashboard (2×2 subplots): daily revenue trend, revenue by category (bar), revenue by salesperson (bar), revenue by region (horizontal bar).
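For Q2, revenue per unit is best computed as a ratio of sums (each category's total revenue divided by its total units) rather than as the average of each row's revenue/units ratio. The ratio of sums weights each day by the units sold that day, while the mean of daily ratios treats every day equally, so the two generally differ. A minimal sketch comparing both, assuming the starter dataset:

```python
import pandas as pd

# Same synthetic dataset as the starter code
sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=90, freq='D'),
    'Revenue': [round(8000 + 2000*abs((i%30)-15)/15 + (i%7)*500 + (i%3)*300, 2) for i in range(90)],
    'Units':   [int(50 + 20*abs((i%30)-15)/15 + (i%7)*5) for i in range(90)],
    'Category':    ['Electronics','Clothing','Food']*30,
    'Region':      ['West','East','South','North','West','East']*15,
    'Salesperson': ['Alice','Bob','Carol','David','Eve']*18
})

# Totals per category in one groupby
by_cat = sales.groupby('Category')[['Revenue', 'Units']].sum()

# Ratio of sums: total revenue / total units (units-weighted)
rev_per_unit = (by_cat['Revenue'] / by_cat['Units']).round(2)

# Mean of ratios: each day's revenue/units, averaged with equal weight per day
mean_daily_ratio = (sales['Revenue'] / sales['Units']).groupby(sales['Category']).mean().round(2)

print(pd.DataFrame({'ratio_of_sums': rev_per_unit,
                    'mean_of_daily_ratios': mean_daily_ratio}))
print("Highest revenue per unit:", rev_per_unit.idxmax())
```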

Your Analysis Editor

The starter code includes the dataset and section headers. Fill in each section to answer all 5 questions. Then hit Run to see your results and charts.

import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# ── The Dataset ────────────────────────────────────────────────
sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=90, freq='D'),
    'Revenue': [round(8000 + 2000*abs((i%30)-15)/15 + (i%7)*500 + (i%3)*300, 2) for i in range(90)],
    'Units':   [int(50 + 20*abs((i%30)-15)/15 + (i%7)*5) for i in range(90)],
    'Category':    ['Electronics','Clothing','Food']*30,
    'Region':      ['West','East','South','North','West','East']*15,
    'Salesperson': ['Alice','Bob','Carol','David','Eve']*18
})

# ── Q1: Total Revenue & Units ──────────────────────────────────
print("=" * 45)
print("Q1: OVERALL TOTALS")
print("=" * 45)
# Your code here:
# - Total revenue (use .sum())
# - Total units sold
# - Average daily revenue

# ── Q2: Category Analysis ──────────────────────────────────────
print("\n" + "=" * 45)
print("Q2: CATEGORY PERFORMANCE")
print("=" * 45)
# Your code here:
# - Revenue by category (groupby + sum)
# - Revenue per unit = total revenue / total units, per category

# ── Q3: Salesperson Rankings ───────────────────────────────────
print("\n" + "=" * 45)
print("Q3: SALESPERSON RANKINGS")
print("=" * 45)
# Your code here:
# - Total revenue by salesperson (groupby + sum, sorted)
# - Print ranked list with rank numbers

# ── Q4: Weekly Pattern ─────────────────────────────────────────
print("\n" + "=" * 45)
print("Q4: WEEKDAY vs WEEKEND REVENUE")
print("=" * 45)
sales['DayOfWeek'] = sales['Date'].dt.dayofweek  # 0=Mon, 6=Sun
# Hint: sales['IsWeekend'] = sales['DayOfWeek'] >= 5
# Your code here:

# ── Q5: Dashboard ──────────────────────────────────────────────
fig, axes = plt.subplots(2, 2, figsize=(13, 9))
fig.suptitle('Q1 2024 Sales Dashboard', fontsize=16, fontweight='bold')

# Top-left: Daily revenue trend
ax1 = axes[0, 0]
# Your plot here

# Top-right: Revenue by category
ax2 = axes[0, 1]
# Your plot here

# Bottom-left: Revenue by salesperson
ax3 = axes[1, 0]
# Your plot here

# Bottom-right: Revenue by region (horizontal bar)
ax4 = axes[1, 1]
# Your plot here

plt.tight_layout()
plt.show()
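The Q4 comparison reduces the week to two averages. To see the full weekly shape, one option is to average revenue by day name and reindex into calendar order, as in the sketch below (assuming the starter dataset). It also filters with a boolean mask and `~`, an idiomatic alternative to comparing a boolean column with `== True` / `== False`. Because the 90 days start on Monday, January 1, the data holds 65 weekdays and 25 weekend days (13 Saturdays, 12 Sundays), so each weekend average rests on fewer observations.

```python
import pandas as pd

# Same synthetic dataset as the starter code
sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=90, freq='D'),
    'Revenue': [round(8000 + 2000*abs((i%30)-15)/15 + (i%7)*500 + (i%3)*300, 2) for i in range(90)],
    'Units':   [int(50 + 20*abs((i%30)-15)/15 + (i%7)*5) for i in range(90)],
    'Category':    ['Electronics','Clothing','Food']*30,
    'Region':      ['West','East','South','North','West','East']*15,
    'Salesperson': ['Alice','Bob','Carol','David','Eve']*18
})

# Average revenue per day of the week, listed Monday through Sunday
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
by_day = (
    sales.groupby(sales['Date'].dt.day_name())['Revenue']
    .mean()
    .reindex(day_order)
    .round(2)
)
print(by_day)

# Weekday vs weekend using a boolean mask (dayofweek: 0=Mon ... 6=Sun)
is_weekend = sales['Date'].dt.dayofweek >= 5
print("Weekday avg:", round(sales.loc[~is_weekend, 'Revenue'].mean(), 2))
print("Weekend avg:", round(sales.loc[is_weekend, 'Revenue'].mean(), 2))
print("Weekend days in data:", int(is_weekend.sum()))
```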


Sample Solution (try it yourself first!)

This is one complete working solution. Your approach may differ — that’s fine! There’s no single right answer.

import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=90, freq='D'),
    'Revenue': [round(8000 + 2000*abs((i%30)-15)/15 + (i%7)*500 + (i%3)*300, 2) for i in range(90)],
    'Units':   [int(50 + 20*abs((i%30)-15)/15 + (i%7)*5) for i in range(90)],
    'Category':    ['Electronics','Clothing','Food']*30,
    'Region':      ['West','East','South','North','West','East']*15,
    'Salesperson': ['Alice','Bob','Carol','David','Eve']*18
})

# ── Q1 ──────────────────────────────────────────────────
print("=" * 45)
print("Q1: OVERALL TOTALS")
print("=" * 45)
total_rev = sales['Revenue'].sum()
total_units = sales['Units'].sum()
avg_daily = sales['Revenue'].mean()  # one row per day, so the row mean is the daily mean
print(f"Total Revenue: ${total_rev:,.2f}")
print(f"Total Units Sold: {total_units:,}")
print(f"Avg Daily Revenue: ${avg_daily:,.2f}")

# ── Q2 ──────────────────────────────────────────────────
print("\n" + "=" * 45)
print("Q2: CATEGORY PERFORMANCE")
print("=" * 45)
cat_rev = sales.groupby('Category')['Revenue'].sum().sort_values(ascending=False)
cat_units = sales.groupby('Category')['Units'].sum()
# Revenue per unit = total revenue / total units (Series align on the category index)
cat_rev_per_unit = (cat_rev / cat_units).round(2)
print("Revenue by Category:")
for cat, rev in cat_rev.items():
    print(f" {cat:<15} ${rev:>10,.2f} Rev/unit: ${cat_rev_per_unit[cat]:.2f}")
print(f"\nTop category: {cat_rev.idxmax()}")
print(f"Highest revenue per unit: {cat_rev_per_unit.idxmax()} (${cat_rev_per_unit.max():.2f}/unit)")

# ── Q3 ──────────────────────────────────────────────────
print("\n" + "=" * 45)
print("Q3: SALESPERSON RANKINGS")
print("=" * 45)
rep_rev = sales.groupby('Salesperson')['Revenue'].sum().sort_values(ascending=False)
for rank, (rep, rev) in enumerate(rep_rev.items(), 1):
    marker = " ← TOP PERFORMER" if rank == 1 else ""
    print(f" #{rank} {rep:<8} ${rev:>10,.2f}{marker}")

# ── Q4 ──────────────────────────────────────────────────
print("\n" + "=" * 45)
print("Q4: WEEKDAY vs WEEKEND REVENUE")
print("=" * 45)
sales['DayOfWeek'] = sales['Date'].dt.dayofweek  # 0=Mon, 6=Sun
sales['IsWeekend'] = sales['DayOfWeek'] >= 5
weekday_avg = sales.loc[~sales['IsWeekend'], 'Revenue'].mean()
weekend_avg = sales.loc[sales['IsWeekend'], 'Revenue'].mean()
diff = weekend_avg - weekday_avg
print(f"Weekday avg revenue: ${weekday_avg:,.2f}")
print(f"Weekend avg revenue: ${weekend_avg:,.2f}")
print(f"Difference: ${diff:+,.2f} ({'weekends higher' if diff > 0 else 'weekdays higher'})")

# ── Q5: Dashboard ────────────────────────────────────────
fig, axes = plt.subplots(2, 2, figsize=(13, 9))
fig.suptitle('Q1 2024 Sales Dashboard', fontsize=16, fontweight='bold')

# Daily trend
ax1 = axes[0, 0]
ax1.plot(range(90), sales['Revenue'], color='#1565C0', linewidth=1.5, alpha=0.8)
ax1.fill_between(range(90), sales['Revenue'], alpha=0.15, color='#1565C0')
ax1.set_title('Daily Revenue Trend', fontweight='bold')
ax1.set_xlabel('Day of Q1')
ax1.set_ylabel('Revenue ($)')
ax1.grid(True, alpha=0.3)

# Revenue by category
ax2 = axes[0, 1]
cat_rev_sorted = cat_rev.sort_values()
colors_cat = ['#E65100', '#2E7D32', '#1565C0']
ax2.bar(cat_rev_sorted.index, cat_rev_sorted.values, color=colors_cat)
ax2.set_title('Revenue by Category', fontweight='bold')
ax2.set_ylabel('Total Revenue ($)')
for i, (cat, val) in enumerate(cat_rev_sorted.items()):
    ax2.text(i, val + 500, f'${val/1000:.0f}K', ha='center', fontsize=9)

# Revenue by salesperson
ax3 = axes[1, 0]
rep_rev_sorted = rep_rev.sort_values()
ax3.bar(rep_rev_sorted.index, rep_rev_sorted.values, color='#6A1B9A')
ax3.set_title('Revenue by Salesperson', fontweight='bold')
ax3.set_ylabel('Total Revenue ($)')
for i, (rep, val) in enumerate(rep_rev_sorted.items()):
    ax3.text(i, val + 200, f'${val/1000:.0f}K', ha='center', fontsize=9)

# Revenue by region (horizontal bar)
ax4 = axes[1, 1]
region_rev = sales.groupby('Region')['Revenue'].sum().sort_values()
ax4.barh(region_rev.index, region_rev.values, color='#00695C')
ax4.set_title('Revenue by Region', fontweight='bold')
ax4.set_xlabel('Total Revenue ($)')
for i, (reg, val) in enumerate(region_rev.items()):
    ax4.text(val + 100, i, f'${val/1000:.0f}K', va='center', fontsize=9)

plt.tight_layout()
plt.show()

You can copy this into the editor above and run it to see the full output and dashboard.
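Since there is no single right answer, here is one alternative for Q2 and Q3, sketched under the same dataset. `groupby(...).agg()` with named aggregation computes several totals in one call, and `Series.rank(ascending=False, method='min')` assigns ranks so that tied totals would share a rank instead of depending on sort order. The column names `revenue`, `units`, `rev_per_unit`, and `rank` are illustrative choices, not part of the lab.

```python
import pandas as pd

# Same synthetic dataset as the starter code
sales = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=90, freq='D'),
    'Revenue': [round(8000 + 2000*abs((i%30)-15)/15 + (i%7)*500 + (i%3)*300, 2) for i in range(90)],
    'Units':   [int(50 + 20*abs((i%30)-15)/15 + (i%7)*5) for i in range(90)],
    'Category':    ['Electronics','Clothing','Food']*30,
    'Region':      ['West','East','South','North','West','East']*15,
    'Salesperson': ['Alice','Bob','Carol','David','Eve']*18
})

# Q2: named aggregation gives revenue and units totals in one groupby call
cat_summary = sales.groupby('Category').agg(
    revenue=('Revenue', 'sum'),
    units=('Units', 'sum'),
)
cat_summary['rev_per_unit'] = (cat_summary['revenue'] / cat_summary['units']).round(2)
print(cat_summary.sort_values('revenue', ascending=False))

# Q3: explicit ranks; method='min' lets tied totals share the same rank
rep_summary = sales.groupby('Salesperson').agg(revenue=('Revenue', 'sum'))
rep_summary['rank'] = rep_summary['revenue'].rank(ascending=False, method='min').astype(int)
print(rep_summary.sort_values('rank'))
```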

Capstone Complete — Labs Finished!

You’ve completed all 10 Python Practice Labs! You can now write Python functions, analyze data structures, build pandas pipelines, and create matplotlib visualizations — the core toolkit of a working data analyst.

Keep practicing. The best way to get better is to find real data you care about and start asking questions of it.
