Working with Data

Structuring and processing real-world data

← Lab 6: Strings Lab 7 of 10 Lab 8: pandas →


Concept Recap

Real data usually comes as a list of dictionaries — one dict per record:

[{"name": "Alice", "score": 92}, {"name": "Bob", "score": 78}, ...]

Filtering: [r for r in records if r["field"] == value]
Sorting: sorted(records, key=lambda r: r["field"], reverse=True)
Aggregating: sum(r["salary"] for r in records)
Grouping: loop over the records and build a dict of lists, keyed by the field's value
Finding max/min record: max(records, key=lambda r: r["value"])
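Here are the five patterns above run together on one small hypothetical `records` list. The names, teams, and scores are made up for illustration. This is a minimal sketch: the grouping step uses an explicit membership check, which the worked example replaces with `defaultdict`.

```python
# A small hypothetical dataset: one dict per record
records = [
    {"name": "Alice", "team": "Red", "score": 92},
    {"name": "Bob", "team": "Blue", "score": 78},
    {"name": "Cara", "team": "Red", "score": 85},
    {"name": "Dev", "team": "Blue", "score": 88},
]

# Filtering: keep only records that match a condition
red_team = [r for r in records if r["team"] == "Red"]

# Sorting: highest score first
by_score = sorted(records, key=lambda r: r["score"], reverse=True)

# Aggregating: total of one numeric field
total = sum(r["score"] for r in records)

# Grouping: a dict of lists keyed by the "team" value
groups = {}
for r in records:
    if r["team"] not in groups:
        groups[r["team"]] = []
    groups[r["team"]].append(r)

# Max record: max() returns the whole dict, not just the number
best = max(records, key=lambda r: r["score"])

print([r["name"] for r in red_team])                 # ['Alice', 'Cara']
print([r["name"] for r in by_score])                 # ['Alice', 'Dev', 'Cara', 'Bob']
print(total)                                         # 343
print({team: len(rs) for team, rs in groups.items()})  # {'Red': 2, 'Blue': 2}
print(best["name"])                                  # Alice
```

Because `max()` and `sorted()` return whole records, you can read any other field (such as `best["team"]`) without a second lookup.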

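This is the foundation of data analysis. pandas provides these same operations (filtering, sorting, grouping, aggregating) as built-in methods, so knowing them in plain Python makes pandas much easier to learn.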

Worked Example

Analyzing an employee dataset — grouping and summarizing:

from collections import defaultdict

employees = [
    {"name": "Alice Chen",  "dept": "Engineering", "salary": 95000,  "years": 5},
    {"name": "Bob Smith",   "dept": "Marketing",   "salary": 72000,  "years": 3},
    {"name": "Carol Wu",    "dept": "Engineering", "salary": 105000, "years": 8},
    {"name": "David Park",  "dept": "HR",          "salary": 68000,  "years": 2},
    {"name": "Eve Johnson", "dept": "Marketing",   "salary": 78000,  "years": 6},
]

# Group by department
by_dept = defaultdict(list)
for emp in employees:
    by_dept[emp["dept"]].append(emp)

# Average salary by department
for dept, emps in by_dept.items():
    avg = sum(e["salary"] for e in emps) / len(emps)
    print(f"{dept}: ${avg:,.0f} avg salary")

# Top earner
top = max(employees, key=lambda e: e["salary"])
print(f"\nTop earner: {top['name']} (${top['salary']:,})")
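The worked example stores every employee in a group before averaging. When you only need averages, you can instead keep a running sum and count per department in a single pass. The following sketch shows that alternative. The `totals`, `counts`, and `averages` names are illustrative, not part of the lab.

```python
employees = [
    {"name": "Alice Chen",  "dept": "Engineering", "salary": 95000,  "years": 5},
    {"name": "Bob Smith",   "dept": "Marketing",   "salary": 72000,  "years": 3},
    {"name": "Carol Wu",    "dept": "Engineering", "salary": 105000, "years": 8},
    {"name": "David Park",  "dept": "HR",          "salary": 68000,  "years": 2},
    {"name": "Eve Johnson", "dept": "Marketing",   "salary": 78000,  "years": 6},
]

# Running totals per department: no lists of records are kept
totals = {}
counts = {}
for emp in employees:
    dept = emp["dept"]
    totals[dept] = totals.get(dept, 0) + emp["salary"]
    counts[dept] = counts.get(dept, 0) + 1

# Divide each department's total by its head count
averages = {dept: totals[dept] / counts[dept] for dept in totals}
for dept, avg in averages.items():
    print(f"{dept}: ${avg:,.0f} avg salary")
# Engineering: $100,000 avg salary
# Marketing: $75,000 avg salary
# HR: $68,000 avg salary
```

The trade-off is flexibility. Grouped lists let you compute any later statistic (max, median, names). Running totals answer only the questions you planned for, but they do not store the records a second time.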

Guided

Exercise 1 — Data Filter and Sorter

Complete the data operations by filling in the blanks. Each blank is a dict key name (as a string).

students = [
    {"name": "Alex",   "gpa": 3.8, "major": "CS",      "year": 3},
    {"name": "Beth",   "gpa": 3.2, "major": "Math",     "year": 2},
    {"name": "Carlos", "gpa": 3.9, "major": "CS",       "year": 4},
    {"name": "Diana",  "gpa": 2.9, "major": "English",  "year": 1},
    {"name": "Ethan",  "gpa": 3.5, "major": "CS",       "year": 2},
    {"name": "Fiona",  "gpa": 3.7, "major": "Math",     "year": 3},
]

# 1. Filter: CS students only
cs_students = [s for s in students if s[___] == "CS"]
print("CS Students:", [s["name"] for s in cs_students])

# 2. Sort by GPA descending
ranked = sorted(students, key=lambda s: s[___], reverse=True)
print("\nRanked by GPA:")
for i, s in enumerate(ranked, 1):
    print(f"  {i}. {s['name']}: {s['gpa']}")

# 3. Average GPA
avg_gpa = sum(s[___] for s in students) / len(students)
print(f"\nClass average GPA: {avg_gpa:.2f}")


Hint: The blanks are dict key names: "major", "gpa", "gpa". Use the exact spelling from the data.
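Dict keys are case-sensitive and must match exactly. The following check uses one record from the `students` list and shows what happens when a key is misspelled. It also shows how `.get()` returns a default value instead of raising an error.

```python
student = {"name": "Alex", "gpa": 3.8, "major": "CS", "year": 3}

# Exact key: works
print(student["gpa"])  # 3.8

# Wrong case (or a typo): raises KeyError
try:
    student["GPA"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'GPA'

# .get() returns the given default instead of raising
print(student.get("GPA", "missing"))  # missing
```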

Independent

Exercise 2 — Multi-Field Analysis

Using the students list from Exercise 1, write code to find:

The highest GPA in each major
How many students are in each year (1, 2, 3, 4)
The honor roll students (GPA ≥ 3.7) — print their names

students = [
    {"name": "Alex",   "gpa": 3.8, "major": "CS",      "year": 3},
    {"name": "Beth",   "gpa": 3.2, "major": "Math",     "year": 2},
    {"name": "Carlos", "gpa": 3.9, "major": "CS",       "year": 4},
    {"name": "Diana",  "gpa": 2.9, "major": "English",  "year": 1},
    {"name": "Ethan",  "gpa": 3.5, "major": "CS",       "year": 2},
    {"name": "Fiona",  "gpa": 3.7, "major": "Math",     "year": 3},
]

# Your analysis here


Hint: For highest GPA per major, group students by major first, then use max() on each group. For year counts, build a dict like year_counts = {} and use .get(year, 0) + 1.
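The following sketch applies the hint's patterns to a small hypothetical `pets` list instead of the exercise data, so Exercise 2 is still yours to solve. It counts records with `.get(key, 0) + 1`, groups records and then calls `max()` on each group, and filters with a threshold.

```python
pets = [
    {"name": "Rex",   "kind": "dog",  "age": 5},
    {"name": "Tom",   "kind": "cat",  "age": 3},
    {"name": "Fido",  "kind": "dog",  "age": 9},
    {"name": "Kit",   "kind": "cat",  "age": 1},
    {"name": "Polly", "kind": "bird", "age": 2},
]

# Counting: .get() returns 0 the first time a kind is seen
kind_counts = {}
for pet in pets:
    kind_counts[pet["kind"]] = kind_counts.get(pet["kind"], 0) + 1

# Group first ...
by_kind = {}
for pet in pets:
    if pet["kind"] not in by_kind:
        by_kind[pet["kind"]] = []
    by_kind[pet["kind"]].append(pet)

# ... then take max() within each group
oldest = {}
for kind, group in by_kind.items():
    oldest[kind] = max(group, key=lambda p: p["age"])["name"]

# Threshold filter, keeping only the names
seniors = [pet["name"] for pet in pets if pet["age"] >= 5]

print(kind_counts)  # {'dog': 2, 'cat': 2, 'bird': 1}
print(oldest)       # {'dog': 'Fido', 'cat': 'Tom', 'bird': 'Polly'}
print(seniors)      # ['Rex', 'Fido']
```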

Challenge

Exercise 3 — Generic Pivot Function

Write a function pivot_by_field(records, field) that groups any list of dicts by any given field. It should return a dict of lists. Test it on the students list by grouping by both "major" and "year".

def pivot_by_field(records, field):
    # Your function here
    pass

students = [
    {"name": "Alex",   "gpa": 3.8, "major": "CS",      "year": 3},
    {"name": "Beth",   "gpa": 3.2, "major": "Math",     "year": 2},
    {"name": "Carlos", "gpa": 3.9, "major": "CS",       "year": 4},
    {"name": "Diana",  "gpa": 2.9, "major": "English",  "year": 1},
    {"name": "Ethan",  "gpa": 3.5, "major": "CS",       "year": 2},
    {"name": "Fiona",  "gpa": 3.7, "major": "Math",     "year": 3},
]

# Test by major
by_major = pivot_by_field(students, "major")
for major, group in by_major.items():
    names = [s["name"] for s in group]
    print(f"{major}: {names}")

print()

# Test by year
by_year = pivot_by_field(students, "year")
for year, group in sorted(by_year.items()):
    names = [s["name"] for s in group]
    print(f"Year {year}: {names}")


Hint: Start with result = {}. Loop through records. Use result.setdefault(record[field], []).append(record) to build the groups without checking if the key exists yet.
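`dict.setdefault(key, default)` returns the value already stored under `key`. If the key is missing, it first stores `default` and then returns it. Because the returned object is the list that lives inside the dict, calling `.append()` on it updates the dict in place. The following sketch groups a hypothetical list of words by first letter and also shows that an existing value is never overwritten.

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

by_letter = {}
for w in words:
    # First sighting of a letter: stores [] and returns it.
    # Later sightings: returns the same list already in the dict.
    by_letter.setdefault(w[0], []).append(w)

print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# An existing value is kept; the default is ignored
counts = {"a": 5}
print(counts.setdefault("a", 0))  # 5
```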

Mini Project

Mini Project — Sales Report Processor

Analyze the sales data below. Write a complete analysis that answers all 5 questions and prints a formatted report:

Total revenue across all records
Revenue by region
Top 3 performing sales reps (total revenue each)
Month with highest total sales
Reps who exceeded quota (quota = $15,000 total)

sales_data = [
    {"rep": "Alice", "region": "West",  "month": "Jan", "revenue": 18500},
    {"rep": "Bob",   "region": "East",  "month": "Jan", "revenue": 12300},
    {"rep": "Carol", "region": "West",  "month": "Feb", "revenue": 22100},
    {"rep": "Alice", "region": "West",  "month": "Feb", "revenue": 19800},
    {"rep": "David", "region": "South", "month": "Jan", "revenue": 15600},
    {"rep": "Bob",   "region": "East",  "month": "Feb", "revenue": 16900},
    {"rep": "Carol", "region": "West",  "month": "Mar", "revenue": 25400},
    {"rep": "David", "region": "South", "month": "Feb", "revenue": 13200},
]

# Your full analysis and report here


Hint: Build separate dicts for rep_revenue, region_revenue, and month_revenue as you loop. To rank reps, sort the (rep, revenue) pairs from rep_revenue.items() by value and take the top 3 with sorted(..., key=lambda x: x[1], reverse=True)[:3].
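The following sketch shows the dict-ranking moves the report needs, using a hypothetical `city_sales` dict rather than the sales data: top N by value, the single key with the largest value, keys above a threshold, and a formatted line with thousands separators. `.items()` yields `(key, value)` tuples, which is why `x[1]` refers to the value.

```python
# Hypothetical totals, as a dict built by a loop like the one in the hint
city_sales = {"Austin": 4200, "Boston": 6100, "Denver": 3900, "Miami": 5300}

# Top 2 by value: sort (city, total) pairs, highest first, then slice
top_two = sorted(city_sales.items(), key=lambda x: x[1], reverse=True)[:2]
print(top_two)  # [('Boston', 6100), ('Miami', 5300)]

# Key with the largest value: iterate keys, compare by their values
best_city = max(city_sales, key=city_sales.get)
print(best_city)  # Boston

# Keys whose value exceeds a threshold
above_5000 = [city for city, total in city_sales.items() if total > 5000]
print(above_5000)  # ['Boston', 'Miami']

# Report lines with rank and thousands separators
for rank, (city, total) in enumerate(top_two, 1):
    print(f"{rank}. {city}: ${total:,}")
# 1. Boston: $6,100
# 2. Miami: $5,300
```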

Lab 7 Complete!

You can now structure, filter, sort, group, and aggregate real-world data — all without any external libraries. This is exactly what pandas automates. Now you’re ready for it.

Continue to Lab 8: pandas Basics →
