Learn Without Walls
Lab 2 of 10

Data Frames

R’s version of a spreadsheet — and so much more

📖 Concept Recap

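A data frame is a table in which each column is a vector, and all columns have the same length. Different columns can hold different types (character, numeric, logical), which makes the data frame R’s core structure for tabular data analysis.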

👀 Worked Example

Creating and exploring a data frame from scratch:

    employees <- data.frame(
      name = c("Alice", "Bob", "Carol", "David", "Eve"),
      dept = c("Eng", "Marketing", "Eng", "HR", "Marketing"),
      salary = c(95000, 72000, 105000, 68000, 78000),
      years = c(5, 3, 8, 2, 6),
      stringsAsFactors = FALSE
    )

    # Explore
    str(employees)
    cat("\nDimensions:", dim(employees), "\n")
    cat("\nEngineering staff:\n")
    print(employees[employees$dept == "Eng", ])
    cat("\nAverage salary:", mean(employees$salary), "\n")
    cat("Highest paid:", employees$name[which.max(employees$salary)], "\n")

✏️ Guided

Exercise 1 — Students Data Frame Explorer

Fill in the blanks to complete the data frame operations.

💡 Hint: The comma inside [ , ] separates row conditions from column selections. students[condition, ] filters rows. students[, c("col1","col2")] selects columns.
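
To make the row/column split concrete, here is a minimal sketch that reuses the employees data frame from the worked example rather than the exercise's students data, so the blanks are still yours to fill. The variable names veterans, pay, and veteran_pay are made up for this illustration.

```r
# Same data as the worked example, repeated so this block runs on its own
employees <- data.frame(
  name = c("Alice", "Bob", "Carol", "David", "Eve"),
  dept = c("Eng", "Marketing", "Eng", "HR", "Marketing"),
  salary = c(95000, 72000, 105000, 68000, 78000),
  years = c(5, 3, 8, 2, 6),
  stringsAsFactors = FALSE
)

# Rows only: condition before the comma, nothing after it (all columns kept)
veterans <- employees[employees$years > 4, ]

# Columns only: nothing before the comma (all rows kept), names after it
pay <- employees[, c("name", "salary")]

# Both at once: row condition before the comma, column names after it
veteran_pay <- employees[employees$years > 4, c("name", "salary")]

print(veterans)
print(pay)
print(veteran_pay)
```

The same three patterns should carry over to students once you swap in its column names.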
💪 Independent

Exercise 2 — Data Frame Analysis

Using the students data frame above, write code to:

  1. Calculate average GPA by major using tapply(students$gpa, students$major, mean)
  2. Find the student with the lowest GPA
  3. Count students per year using table(students$year)
  4. Create a new column gpa_letter using ifelse() — “A” if gpa ≥ 3.7, “B” if ≥ 3.3, “C” otherwise
💡 Hint: tapply(students$gpa, students$major, mean) computes mean GPA per major. For lowest GPA: students[which.min(students$gpa), ]. Nested ifelse(): ifelse(gpa >= 3.7, "A", ifelse(gpa >= 3.3, "B", "C")).
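
If you want to see these functions run before applying them to students, the sketch below tries the same four techniques on the employees data from the worked example. The salary cutoffs and the band labels ("High", "Mid", "Entry") are invented purely for illustration.

```r
# Same data as the worked example, repeated so this block runs on its own
employees <- data.frame(
  name = c("Alice", "Bob", "Carol", "David", "Eve"),
  dept = c("Eng", "Marketing", "Eng", "HR", "Marketing"),
  salary = c(95000, 72000, 105000, 68000, 78000),
  years = c(5, 3, 8, 2, 6),
  stringsAsFactors = FALSE
)

# Grouped summary: mean salary for each department
avg_by_dept <- tapply(employees$salary, employees$dept, mean)

# Row holding the minimum: which.min() returns the position of the smallest value
newest <- employees[which.min(employees$years), ]

# Counts per category
dept_counts <- table(employees$dept)

# New column from nested ifelse(): the inner call handles whatever
# the outer condition did not catch (cutoffs are hypothetical)
employees$band <- ifelse(employees$salary >= 100000, "High",
                  ifelse(employees$salary >= 75000, "Mid", "Entry"))

print(avg_by_dept)
print(newest)
print(dept_counts)
print(employees[, c("name", "salary", "band")])
```

Note that the order of the ifelse() tests matters: the highest cutoff is checked first, so each value falls into the first band it qualifies for.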
🔥 Challenge

Exercise 3 — Sales Data Frame

Create a sales data frame with columns: rep (5 names, some repeated across 15 rows), region (West/East/South), month (Jan/Feb/Mar), revenue (15 numeric values). Then:

  1. Calculate total revenue by rep using aggregate(revenue ~ rep, data=sales, FUN=sum)
  2. Find the best month (highest total revenue)
  3. Filter to only West region and show the results
💡 Hint: aggregate(revenue ~ rep, data=sales, FUN=sum) sums revenue by rep. For best month: use aggregate then find the max row. West: sales[sales$region == "West", ].
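
The "aggregate, then find the max row" pattern can be sketched on the employees data from the worked example, leaving the sales data frame for you to build. Here salary stands in for revenue and dept stands in for the grouping column; total_by_dept and top_dept are names chosen for this illustration.

```r
# Same data as the worked example, repeated so this block runs on its own
employees <- data.frame(
  name = c("Alice", "Bob", "Carol", "David", "Eve"),
  dept = c("Eng", "Marketing", "Eng", "HR", "Marketing"),
  salary = c(95000, 72000, 105000, 68000, 78000),
  years = c(5, 3, 8, 2, 6),
  stringsAsFactors = FALSE
)

# Formula reads "salary grouped by dept"; the result is a data frame
# with one row per department and a summed salary column
total_by_dept <- aggregate(salary ~ dept, data = employees, FUN = sum)

# Pick the row whose total is largest
top_dept <- total_by_dept[which.max(total_by_dept$salary), ]

print(total_by_dept)
print(top_dept)
```

Because aggregate() returns an ordinary data frame, everything you already know about indexing with [ , ] applies to its result.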
🏆 Mini Project — Class Report

Build a Complete Class Dataset & Report

Build a class dataset with 8 students and these columns: name, major, gpa, year, scholarship (logical), credits_completed (numeric). Then write code to print a complete class report covering:

💡 Hint: Fill in the c() vectors with 8 values each. Use students$name[students$gpa >= 3.7] for honor roll. names(which.max(table(students$major))) finds the most common major.
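
As a rough sketch of the building blocks in this hint, the code below runs them on the employees data from the worked example, so your own class dataset is still left to write. The 90000 cutoff and the senior vector are assumptions made for this illustration; senior plays the role a logical column such as scholarship would play in your report.

```r
# Same data as the worked example, repeated so this block runs on its own
employees <- data.frame(
  name = c("Alice", "Bob", "Carol", "David", "Eve"),
  dept = c("Eng", "Marketing", "Eng", "HR", "Marketing"),
  salary = c(95000, 72000, 105000, 68000, 78000),
  years = c(5, 3, 8, 2, 6),
  stringsAsFactors = FALSE
)

# Names that meet a condition: index one column by a test on another
high_earners <- employees$name[employees$salary >= 90000]

# Most common category: table() counts, which.max() picks the first
# largest count, names() recovers its label
top_dept <- names(which.max(table(employees$dept)))

# Logical vectors can be summed: TRUE counts as 1, FALSE as 0
senior <- employees$years >= 5
n_senior <- sum(senior)

cat("High earners:", paste(high_earners, collapse = ", "), "\n")
cat("Most common department:", top_dept, "\n")
cat("Employees with 5+ years:", n_senior, "of", nrow(employees), "\n")
```

One caveat: when two categories tie for the highest count (as Eng and Marketing do here), which.max() reports only the first one in table order.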

✅ Lab 2 Complete!

You’ve created, explored, filtered, sorted, and analyzed data frames — the backbone of R data analysis. In Lab 3, you’ll supercharge this with dplyr’s elegant pipeline syntax.
