Lab 8 of 10
pandas Basics
The data analyst’s most essential library
📦 This lab loads pandas and numpy — allow extra time on first load. You’ll see the green “Python ready!” message when it’s done.
📖 Concept Recap
pandas is the go-to Python library for data analysis. Key concepts:
- DataFrame — a 2D table (like a spreadsheet). Rows = observations, Columns = variables.
- Series — a single column (1D labeled array).
- Selecting columns: df['Name'] or df[['Name', 'Age']]
- Filtering rows: df[df['Score'] > 90] or combined: df[(cond1) & (cond2)]
- Groupby: df.groupby('Category')['Value'].mean()
- Stats: df['col'].describe(), .mean(), .sum(), .value_counts()
- Sorting: df.sort_values('col', ascending=False)
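The operations in the recap can be tried together on a tiny table. This is a minimal sketch: the DataFrame and its column names ('Name', 'Category', 'Score', 'Value') are invented for illustration and are not part of the lab's data.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cy', 'Dee'],
    'Category': ['A', 'B', 'A', 'B'],
    'Score': [95, 88, 91, 97],
    'Value': [10, 20, 30, 40],
})

# Single brackets return a Series; a list of names returns a DataFrame
one_col = df['Name']
two_cols = df[['Name', 'Score']]

# Boolean filtering; wrap each condition in parentheses when combining with &
high = df[df['Score'] > 90]
high_a = df[(df['Score'] > 90) & (df['Category'] == 'A')]

# Mean of Value within each Category
mean_by_cat = df.groupby('Category')['Value'].mean()

# Sort rows by Score, highest first
ranked = df.sort_values('Score', ascending=False)

print(type(one_col).__name__, type(two_cols).__name__)
print(high_a)
print(mean_by_cat)
print(ranked)
```

The parentheses around each condition matter because & binds more tightly than comparison operators such as > and ==.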
👀 Worked Example
Creating and exploring a DataFrame:
import pandas as pd
import numpy as np
data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Department': ['Eng', 'Marketing', 'Eng', 'HR', 'Marketing'],
    'Salary': [95000, 72000, 105000, 68000, 78000],
    'Years': [5, 3, 8, 2, 6]
}
df = pd.DataFrame(data)
print(df)
print("\nShape:", df.shape)
print("\nBasic stats:")
print(df['Salary'].describe())
print("\nAvg salary by dept:")
print(df.groupby('Department')['Salary'].mean())
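As a possible follow-up, the same employee table can be filtered, counted, and sorted with the other operations from the recap. The sketch below rebuilds the DataFrame from the worked example; the variable names (senior_eng, dept_counts, by_salary) and the 100,000 threshold are illustrative choices, not part of the lab.

```python
import pandas as pd

# Same data as the worked example above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Department': ['Eng', 'Marketing', 'Eng', 'HR', 'Marketing'],
    'Salary': [95000, 72000, 105000, 68000, 78000],
    'Years': [5, 3, 8, 2, 6],
})

# Combined filter: engineers earning more than 100,000 (illustrative threshold)
senior_eng = df[(df['Department'] == 'Eng') & (df['Salary'] > 100000)]

# Number of employees in each department
dept_counts = df['Department'].value_counts()

# All rows ordered by salary, highest first
by_salary = df.sort_values('Salary', ascending=False)

print(senior_eng)
print(dept_counts)
print(by_salary[['Name', 'Salary']])
```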
✏️ Guided
Exercise 1 — Student DataFrame Explorer
Complete the pandas operations by filling in the column name strings in the blanks.
💡 Hint: The blanks are column name strings: 'Major', 'Major', 'GPA', 'GPA'. When you select a column with df[...], a string column name must be written in quotes.
💪 Independent
Exercise 2 — DataFrame Analysis
Using the same students DataFrame, write pandas code to:
- Find the student with the highest GPA (use .idxmax() or .sort_values())
- Count students per year (use .value_counts())
- Find the average GPA of scholarship vs non-scholarship students (use .groupby())
- Sort by GPA descending and show the top 3 students
💡 Hint: Highest GPA: students.loc[students['GPA'].idxmax()]. Year counts: students['Year'].value_counts(). Top 3: students.sort_values('GPA', ascending=False).head(3).
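The four tasks might look like the sketch below. The lab's actual students DataFrame is not reproduced here, so this uses a hypothetical stand-in: the rows are invented, and the 'Scholarship' column name and its 'Yes'/'No' values are assumptions. Only 'GPA' and 'Year' appear in the hint above.

```python
import pandas as pd

# Hypothetical stand-in for the lab's students DataFrame.
# The rows and the 'Scholarship' column name are assumptions.
students = pd.DataFrame({
    'Name': ['Ava', 'Ben', 'Cara', 'Dan', 'Elle'],
    'Year': [1, 2, 2, 3, 1],
    'GPA': [3.9, 3.2, 3.6, 2.8, 3.5],
    'Scholarship': ['Yes', 'No', 'Yes', 'No', 'No'],
})

# idxmax() returns the index label of the largest GPA; .loc fetches that row
top_student = students.loc[students['GPA'].idxmax()]

# How many students are in each year
year_counts = students['Year'].value_counts()

# Mean GPA within each scholarship group
gpa_by_scholarship = students.groupby('Scholarship')['GPA'].mean()

# Three highest GPAs
top3 = students.sort_values('GPA', ascending=False).head(3)

print(top_student)
print(year_counts)
print(gpa_by_scholarship)
print(top3)
```

Note that idxmax() returns an index label rather than a position, which is why it pairs with .loc and not .iloc.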
🔥 Challenge
Exercise 3 — GPA Categories with pd.cut()
Create a new column 'GPA_Category' using pd.cut() that labels each student’s GPA:
- Excellent: GPA ≥ 3.7
- Good: 3.3 ≤ GPA < 3.7
- Satisfactory: 3.0 ≤ GPA < 3.3
- Needs Improvement: GPA < 3.0
Then count how many students fall into each category.
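💡 Hint: bins is the list [0, 3.0, 3.3, 3.7, 4.01] (slightly above 4 so that 4.0 falls inside the last bin). Pass right=False so that each bin includes its lower edge, which matches the ≥ thresholds above; with the default right=True, a GPA of exactly 3.7 would land in Good. labels is the list of 4 category strings in the same ascending order as the bins, from 'Needs Improvement' to 'Excellent'.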
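One way the binning could be written is sketched below. The students rows are hypothetical, chosen to sit exactly on the category boundaries, and right=False is used so that each bin includes its lower edge, as the ≥ thresholds in the exercise require.

```python
import pandas as pd

# Hypothetical rows chosen to sit exactly on the category boundaries
students = pd.DataFrame({
    'Name': ['Ava', 'Ben', 'Cara', 'Dan', 'Elle'],
    'GPA': [4.0, 3.7, 3.3, 3.0, 2.5],
})

bins = [0, 3.0, 3.3, 3.7, 4.01]
labels = ['Needs Improvement', 'Satisfactory', 'Good', 'Excellent']

# right=False makes the intervals [0, 3.0), [3.0, 3.3), [3.3, 3.7), [3.7, 4.01),
# so a GPA equal to a threshold lands in the higher category
students['GPA_Category'] = pd.cut(
    students['GPA'], bins=bins, labels=labels, right=False
)

# Number of students in each category
category_counts = students['GPA_Category'].value_counts()

print(students)
print(category_counts)
```

pd.cut() returns a categorical column, so value_counts() reports every label, including any category that happens to have zero students.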
🏆 Mini Project
Mini Project — Sales Dataset Explorer
Analyze the sales DataFrame below. Answer all 5 business questions using pandas and print a formatted report:
- Total revenue and total quota across all rows
- Top salesperson by total revenue
- Best month by total revenue
- Revenue by region (sorted highest to lowest)
- Percentage of rows where rep exceeded their quota (Revenue > Quota)
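The five questions above could be approached along these lines. The lab's sales DataFrame is not reproduced here, so the rows below are invented. The 'Rep', 'Revenue', and 'Quota' column names come from the lab's hint, while 'Region' and 'Month' are assumed names.

```python
import pandas as pd

# Hypothetical stand-in for the lab's sales DataFrame.
# 'Region' and 'Month' are assumed column names; all rows are invented.
sales = pd.DataFrame({
    'Rep': ['Kim', 'Lee', 'Kim', 'Lee'],
    'Region': ['East', 'West', 'East', 'West'],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'Revenue': [12000, 9000, 15000, 11000],
    'Quota': [10000, 10000, 14000, 12000],
})

# Q1: totals across all rows
total_revenue = sales['Revenue'].sum()
total_quota = sales['Quota'].sum()

# Q2 and Q3: sum revenue per group, then take the label of the largest sum
top_rep = sales.groupby('Rep')['Revenue'].sum().idxmax()
best_month = sales.groupby('Month')['Revenue'].sum().idxmax()

# Q4: revenue per region, highest first
revenue_by_region = (
    sales.groupby('Region')['Revenue'].sum().sort_values(ascending=False)
)

# Q5: the mean of a boolean column is the fraction of True values
sales['Hit_Quota'] = sales['Revenue'] > sales['Quota']
pct_hit = sales['Hit_Quota'].mean() * 100

print(f"Total revenue: {int(total_revenue):,}")
print(f"Total quota:   {int(total_quota):,}")
print(f"Top rep:       {top_rep}")
print(f"Best month:    {best_month}")
print(revenue_by_region)
print(f"Rows over quota: {pct_hit:.1f}%")
```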
💡 Hint: Q1: sales['Revenue'].sum(). Q2: sales.groupby('Rep')['Revenue'].sum().idxmax(). Q5: sales['Hit_Quota'] = sales['Revenue'] > sales['Quota'] then .mean() * 100.
✅ Lab 8 Complete!
You’ve created DataFrames, filtered rows, grouped data, and answered real business questions with pandas. This is the core skill of a data analyst.