Learn Without Walls
Lab 8 of 10

Tidyr & Data Reshaping

Tidy data is happy data

📖 Concept Recap

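Tidy data follows one core principle: each variable is a column, each observation is a row, and each cell holds a single value. Real-world data often arrives “wide” (one row per subject, one column per time point), but ggplot2 and dplyr generally work best with the tidy “long” format, where each row is one subject at one time point.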

Use library(tidyr) together with library(dplyr). Both load without installation in WebR.

👀 Worked Example

Pivoting exam scores from wide format to long, then summarizing per student:

library(tidyr)
library(dplyr)

# Wide format: one row per student, one column per exam
wide <- data.frame(
  student = c("Alice", "Bob", "Carol"),
  exam_1 = c(88, 75, 92),
  exam_2 = c(91, 80, 88),
  exam_3 = c(85, 78, 95)
)
cat("Wide format:\n")
print(wide)

# Stack the three exam columns into exam/score pairs (one row per student-exam)
long <- wide |>
  pivot_longer(
    cols = starts_with("exam"),
    names_to = "exam",
    values_to = "score"
  )
cat("\nLong format:\n")
print(long)

# Summarize per student: mean and best score across exams
long |>
  group_by(student) |>
  summarize(avg = mean(score), best = max(score)) |>
  print()
✏️ Guided

Exercise 1 — Monthly Sales Pivot

Pivot the wide monthly sales table to long format so each row represents one rep–month combination.

💡 Hint: pivot_longer(cols = c(Jan, Feb, Mar), ...) stacks the three month columns. Alternatively use cols = -rep to pivot all columns except rep. The names_to argument names the new key column.
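The hint above can be sketched as follows. This is a minimal example, not the lab's reference solution: the sales_wide table, its rep names, and its figures are made up for illustration, since the exercise's actual data is not shown here. Only the column names rep, Jan, Feb, and Mar come from the hint.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: names and numbers are invented for this sketch.
sales_wide <- data.frame(
  rep = c("Ana", "Ben"),
  Jan = c(100, 150),
  Feb = c(120, 130),
  Mar = c(90, 160)
)

# Stack the month columns: one row per rep-month combination.
sales_long <- sales_wide |>
  pivot_longer(
    cols = c(Jan, Feb, Mar),   # cols = -rep selects the same columns
    names_to = "month",        # old column names go here
    values_to = "sales"        # old cell values go here
  )

print(sales_long)
```

Two reps and three months give six rows. Listing the months explicitly is safer if the table may gain unrelated columns later, while cols = -rep is shorter when every other column is a month.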
💪 Independent

Exercise 2 — City Population Growth

Start with a wide city population table (2020–2023). Pivot to long, calculate year-over-year growth rates, and find the fastest-growing city each year.

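💡 Hint: After group_by(city), and with rows sorted by year, lag(population) returns the previous year’s value for the same city, which is what year-over-year growth needs: (population - lag(population)) / lag(population). The first year of each city has no previous value, so its growth is NA. If the pivoted year values look like pop_2020, sub("pop_", "", year) strips the prefix to leave just the year number.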
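One way the steps in this exercise might fit together is sketched below. The city names and population figures are hypothetical, and the pop_ column prefix is assumed from the hint; the lab's real table may differ.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: cities and figures are invented for this sketch.
pop_wide <- data.frame(
  city = c("Avalon", "Brookfield"),
  pop_2020 = c(1000, 2000),
  pop_2021 = c(1100, 2100),
  pop_2022 = c(1210, 2268),
  pop_2023 = c(1270, 2500)
)

growth <- pop_wide |>
  pivot_longer(
    cols = starts_with("pop_"),
    names_to = "year",
    values_to = "population"
  ) |>
  # "pop_2020" -> 2020
  mutate(year = as.integer(sub("pop_", "", year))) |>
  group_by(city) |>
  # lag() depends on row order, so sort by year within each city first
  arrange(year, .by_group = TRUE) |>
  mutate(growth_pct = 100 * (population - lag(population)) / lag(population)) |>
  ungroup()

# Fastest-growing city per year; the first year has no growth value (NA)
fastest <- growth |>
  filter(!is.na(growth_pct)) |>
  group_by(year) |>
  slice_max(growth_pct, n = 1) |>
  ungroup() |>
  arrange(year)

print(fastest)
```

Note that slice_max() keeps all tied rows by default, so a year in which two cities grow at exactly the same rate would return both.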
🔥 Challenge

Exercise 3 — separate() and unite()

Use separate() to split full_name into first and last columns, and split date_range into start and end dates. Then use unite() to create a name_id column by combining last name and ID.

💡 Hint: separate(col, into = c("a", "b"), sep = "pattern") splits a column wherever the regex pattern matches. unite("new_col", col1, col2, sep = "_") pastes two columns together with a separator. Both functions return a new data frame rather than modifying the original, so they chain naturally in a pipeline.
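A minimal sketch of the separate() and unite() steps follows. The people table is hypothetical, and so are the assumptions that names are split by a single space and that date ranges use " to " as the delimiter; adjust sep to match the exercise's actual data.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: the delimiters (" " and " to ") are assumptions.
people <- data.frame(
  id = c(101, 102),
  full_name = c("Ada Lovelace", "Alan Turing"),
  date_range = c("2024-01-01 to 2024-03-31", "2024-02-15 to 2024-06-30")
)

result <- people |>
  # "Ada Lovelace" -> first = "Ada", last = "Lovelace"
  separate(full_name, into = c("first", "last"), sep = " ") |>
  # "2024-01-01 to 2024-03-31" -> start, end (still character here)
  separate(date_range, into = c("start", "end"), sep = " to ") |>
  # Convert the split strings to real Date values
  mutate(across(c(start, end), as.Date)) |>
  # remove = FALSE keeps the source columns alongside name_id
  unite("name_id", last, id, sep = "_", remove = FALSE)

print(result)
```

By default unite() drops the columns it combines; remove = FALSE is used here so last and id stay available. In tidyr 1.3.0 and later, separate() is marked as superseded in favor of separate_wider_delim(), though it still works.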
🏆 Mini Project — Gradebook Transformation

Wide → Long → Letter Grades → Wide Again

Transform a wide gradebook through a complete pipeline: pivot to long, add a letter-grade column, pivot back to wide with letter grades as the cell values, and generate a summary table.

💡 Hint: When pivoting back wide, use pivot_wider(names_from = assignment, values_from = grade). The names_from column provides the new column names and values_from provides the cell values.
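The full pipeline might look something like the sketch below. The gradebook data, the assignment names, and the 90/80/70/60 letter cutoffs are all assumptions made for illustration; the project's actual data and grading scale may differ.

```r
library(tidyr)
library(dplyr)

# Hypothetical gradebook; cutoffs below are an assumed grading scale.
gradebook <- data.frame(
  student = c("Alice", "Bob"),
  hw1 = c(95, 72),
  quiz1 = c(88, 65),
  final = c(91, 58)
)

# Wide -> long, then attach a letter grade to every score
long <- gradebook |>
  pivot_longer(cols = -student, names_to = "assignment", values_to = "score") |>
  mutate(grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    score >= 70 ~ "C",
    score >= 60 ~ "D",
    TRUE ~ "F"
  ))

# Long -> wide with letters as cell values.
# Drop score first: otherwise pivot_wider() treats it as an identifier
# and produces one row per student-score pair instead of one per student.
letters_wide <- long |>
  select(student, assignment, grade) |>
  pivot_wider(names_from = assignment, values_from = grade)

# Per-student summary computed from the long table
summary_tbl <- long |>
  group_by(student) |>
  summarize(avg = mean(score), n_A = sum(grade == "A"))

print(letters_wide)
print(summary_tbl)
```

Computing the summary from the long table rather than the re-widened one keeps the numeric scores available for mean().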

✅ Lab 8 Complete!

You can now reshape datasets between wide and long formats, split and combine columns, and build complete data pipelines. Tidy data is the input format that the visualization and analysis techniques from earlier labs expect.

Continue to Lab 9: Professional Output & Reporting →
