Learn Without Walls
Lab 3 of 10

dplyr: Data Wrangling

The grammar of data manipulation

📖 Concept Recap

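dplyr provides five main verbs that cover most data manipulation needs: filter() keeps rows that match a condition, select() keeps columns, mutate() adds or changes columns, arrange() sorts rows, and summarize() collapses rows into summary values (usually after group_by()).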

The pipe |> chains operations: the result of the expression on the left becomes the first argument of the function call on the right. In WebR there is no need to install dplyr; just call library(dplyr). Use n() inside summarize() to count rows per group.
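As a minimal sketch of what the pipe does, the snippet below writes the same grouped count twice, once as nested calls and once with |>. The scores data frame is a hypothetical toy example and is not part of the lab's data.

```r
library(dplyr)

# Hypothetical toy data, used only for this illustration
scores <- data.frame(
  team   = c("A", "A", "B", "B", "B"),
  points = c(10, 14, 7, 9, 11)
)

# Nested form: read from the inside out
nested <- summarize(group_by(scores, team), games = n())

# Piped form: read top to bottom; each result becomes the
# first argument of the next call
piped <- scores |>
  group_by(team) |>
  summarize(games = n())

print(nested)
print(piped)
```

Both forms should print the same two-row table, with n() reporting how many rows each team has.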

👀 Worked Example

Study this complete dplyr pipeline before starting the exercises:

library(dplyr)

# Eight monthly sales records: one row per rep per month
sales <- data.frame(
  rep     = c("Alice","Bob","Carol","Alice","David","Bob","Carol","David"),
  region  = c("West","East","West","West","South","East","West","South"),
  month   = c("Jan","Jan","Jan","Feb","Jan","Feb","Feb","Feb"),
  revenue = c(18500,12300,22100,19800,15600,16900,25400,13200),
  stringsAsFactors = FALSE
)

# Total, average, and number of sales per rep, highest total first
result <- sales |>
  group_by(rep) |>
  summarize(total = sum(revenue), avg = mean(revenue), n = n()) |>
  arrange(desc(total))
print(result)

# West-region sales with a flag for revenue above 15000
west <- sales |>
  filter(region == "West") |>
  mutate(quota_met = revenue > 15000) |>
  select(rep, month, revenue, quota_met)
print(west)
✏️ Guided

Exercise 1 — Student Wrangling

The blanks have been filled in for you — run the code and study the output, then experiment by changing the filter threshold or sort order.

💡 Hint: filter(gpa > 3.5) keeps rows where GPA exceeds 3.5. group_by(major) splits the data before summarize() computes per-group stats. rank(-gpa) ranks from highest to lowest.
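The three ideas in this hint can be tried on a small table. The lab's students data frame is not reproduced in this text, so the sketch below uses a hypothetical stand-in called students_demo; its column names (name, major, gpa) are assumptions.

```r
library(dplyr)

# Hypothetical stand-in for the lab's students data (columns are assumptions)
students_demo <- data.frame(
  name  = c("Ana", "Ben", "Cy", "Dee", "Eli"),
  major = c("Math", "Math", "Bio", "Bio", "Bio"),
  gpa   = c(3.9, 3.2, 3.6, 3.8, 3.4)
)

# Keep only rows where GPA exceeds 3.5
high <- students_demo |>
  filter(gpa > 3.5)

# One row per major with the group size and mean GPA
by_major <- students_demo |>
  group_by(major) |>
  summarize(n = n(), avg_gpa = mean(gpa))

# rank(-gpa) gives rank 1 to the highest GPA
ranked <- students_demo |>
  mutate(gpa_rank = rank(-gpa)) |>
  arrange(gpa_rank)

print(high)
print(by_major)
print(ranked)
```

Changing 3.5 to another threshold, or replacing arrange(gpa_rank) with arrange(desc(gpa_rank)), is a quick way to see how each step affects the output.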
💪 Independent

Exercise 2 — Advanced Student Summaries

Using the same students data frame, write three separate pipelines:

  1. Find the top student per major using group_by(major) |> slice_max(gpa, n=1)
  2. Count students per year+major combination using group_by(year, major) |> summarize(count = n())
  3. Create a summary by major showing count, average GPA, and max GPA, arranged by avg_gpa descending
💡 Hint: slice_max(gpa, n=1) picks the row with the highest GPA per group; if several rows tie for the top value, all of them are returned unless you add with_ties = FALSE. List several columns in group_by() for crossed summaries. Use summarize(count=n(), avg=mean(gpa), top=max(gpa)) followed by arrange(desc(avg)).
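One detail worth seeing in action is how slice_max() handles ties. The sketch below uses a hypothetical students_demo table (not the lab's actual data; the columns are assumed) in which two Bio students share the top GPA.

```r
library(dplyr)

# Hypothetical stand-in data; the lab's real students data frame may differ
students_demo <- data.frame(
  name  = c("Ana", "Ben", "Cy", "Dee", "Eli", "Fay"),
  major = c("Math", "Math", "Bio", "Bio", "Bio", "Math"),
  year  = c(1, 2, 2, 2, 3, 2),
  gpa   = c(3.9, 3.2, 3.8, 3.8, 3.4, 3.5)
)

# Top GPA per major; Cy and Dee tie in Bio, so both rows come back
top <- students_demo |>
  group_by(major) |>
  slice_max(gpa, n = 1)

# with_ties = FALSE returns exactly one row per group
top_one <- students_demo |>
  group_by(major) |>
  slice_max(gpa, n = 1, with_ties = FALSE)

# Two grouping columns give one count per year + major combination
counts <- students_demo |>
  group_by(year, major) |>
  summarize(count = n(), .groups = "drop")

print(top)
print(top_one)
print(counts)
```

Here top has three rows while top_one has two, which is the difference to keep in mind when a question asks for "the" top student.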
🔥 Challenge

Exercise 3 — Performance Classification Pipeline

Build a single pipeline on students that: (1) filters to year ≥ 2, (2) adds a performance column using case_when() — “Excellent” if gpa ≥ 3.7, “Good” if gpa ≥ 3.3, “Fair” otherwise, (3) groups by major and performance, (4) counts, (5) arranges by major then count descending.

💡 Hint: case_when() works like nested if/else: conditions are checked in order, and each one uses ~ to separate the test from the result. TRUE ~ "Fair" is the catch-all default. Pass .groups = "drop" inside summarize() to silence the grouping message and return an ungrouped result.
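To show the same pattern without giving away the challenge, here is a small sketch on a different, hypothetical data set. The orders table, its columns, and the size labels are all invented for illustration.

```r
library(dplyr)

# Hypothetical data, unrelated to the lab's students table
orders <- data.frame(
  store  = c("North", "North", "South", "South", "South"),
  amount = c(250, 80, 40, 120, 300)
)

sized <- orders |>
  mutate(size = case_when(
    amount >= 200 ~ "Large",   # checked first
    amount >= 100 ~ "Medium",  # reached only if the first test is FALSE
    TRUE          ~ "Small"    # catch-all for everything else
  ))

# .groups = "drop" goes inside summarize(), not after it
size_counts <- sized |>
  group_by(store, size) |>
  summarize(count = n(), .groups = "drop") |>
  arrange(store, desc(count))

print(sized)
print(size_counts)
```

Because the conditions are evaluated top to bottom, the order matters: putting amount >= 100 first would label the 250 and 300 orders "Medium".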
🏆 Mini Project — Sales Analysis

Full Sales Pipeline Analysis

Using the sales data frame below, use dplyr pipelines to answer all 5 questions. Use |> throughout.

💡 Hint: Q4: filter(revenue > 15000) |> distinct(rep). Q5: compute team average with mean(revenue) then calculate percentage above. Use pull() to extract a single value from a pipeline result.
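The sketch below illustrates pull() and distinct() on the sales data from the worked example. The exact wording of the five questions is not shown in this text, so the percentage step is only one possible reading of the Q5 hint (the share of sales rows above the overall average), and the variable names are illustrative.

```r
library(dplyr)

# Same sales data as in the worked example
sales <- data.frame(
  rep     = c("Alice","Bob","Carol","Alice","David","Bob","Carol","David"),
  region  = c("West","East","West","West","South","East","West","South"),
  month   = c("Jan","Jan","Jan","Feb","Jan","Feb","Feb","Feb"),
  revenue = c(18500,12300,22100,19800,15600,16900,25400,13200)
)

# pull() returns a plain numeric vector instead of a one-column data frame
team_avg <- sales |>
  summarize(avg = mean(revenue)) |>
  pull(avg)

# distinct() drops duplicates, so each rep appears at most once
big_sellers <- sales |>
  filter(revenue > 15000) |>
  distinct(rep)

# Assumed reading of Q5: percentage of sales rows above the team average
pct_above <- mean(sales$revenue > team_avg) * 100

print(team_avg)
print(big_sellers)
print(pct_above)
```

Extracting team_avg with pull() matters because a bare number can be used directly in a later comparison such as revenue > team_avg, whereas a one-column data frame cannot.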

✅ Lab 3 Complete!

You’ve practiced the core of dplyr, one of the most widely used R packages for data wrangling. The five verbs plus pipes let you express complex transformations in clean, readable code.

Continue to Lab 4: Functions & Control Flow →
