Data Frames
R’s version of a spreadsheet — and so much more
📖 Concept Recap
A data frame is a table where each column is a vector of the same length. It’s R’s most important data structure for real analysis.
- Create: data.frame(col1 = vec1, col2 = vec2, ...)
- Access columns: df$colname or df[["colname"]]
- Filter rows: df[df$col == "value", ]
- Dimensions: nrow(df), ncol(df), dim(df)
- Explore: head(df), tail(df), str(df), summary(df)
- Sort: df[order(df$col), ] or df[order(df$col, decreasing = TRUE), ]
- Add columns: df$new_col <- expression
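The operations listed above can be sketched on a small table. The inventory data and the variable names here (df, cheap, stock_value, and so on) are invented for illustration and are not part of the lab's own dataset.

```r
# Hypothetical sample data, invented for illustration only
df <- data.frame(
  item  = c("pen", "notebook", "stapler", "folder"),
  price = c(1.50, 4.00, 12.00, 2.50),
  stock = c(40, 15, 5, 30)
)

# Access a column: both forms return the same vector
prices_dollar  <- df$price
prices_bracket <- df[["price"]]

# Filter rows: keep rows where the condition is TRUE (note the trailing comma)
cheap <- df[df$price < 3, ]

# Dimensions, taken before any column is added
n_rows <- nrow(df)
n_cols <- ncol(df)

# Sort by price, ascending and then descending
by_price_asc  <- df[order(df$price), ]
by_price_desc <- df[order(df$price, decreasing = TRUE), ]

# Add a column computed from existing columns
df$stock_value <- df$price * df$stock
```

Filtering and sorting return new data frames and leave df unchanged, whereas assigning to df$stock_value modifies df itself.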
👀 Worked Example
Creating and exploring a data frame from scratch:
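A minimal sketch of this workflow, assuming a small students table with name, major, gpa, and year columns. The five rows below are invented placeholder values, not data supplied by the lab.

```r
# Hypothetical students data, invented for illustration only
students <- data.frame(
  name  = c("Ana", "Ben", "Chloe", "Dev", "Elena"),
  major = c("Biology", "Math", "Biology", "History", "Math"),
  gpa   = c(3.8, 3.2, 3.5, 2.9, 3.9),
  year  = c(2, 1, 3, 2, 4)
)

# Size of the table: number of rows, then number of columns
print(dim(students))

# First three and last two rows
print(head(students, 3))
print(tail(students, 2))

# Column types and a preview of the values
str(students)

# Per-column summary statistics
print(summary(students))
```

Running str() first is a quick way to confirm that each column has the type you expect before filtering or summarizing.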
Exercise 1 — Students Data Frame Explorer
Fill in the blanks to complete the data frame operations.
[ , ] separates row conditions from column selections. students[condition, ] filters rows. students[, c("col1","col2")] selects columns.
Exercise 2 — Data Frame Analysis
Using the students data frame above, write code to:
- Calculate average GPA by major using tapply(students$gpa, students$major, mean)
- Find the student with the lowest GPA
- Count students per year using table(students$year)
- Create a new column gpa_letter using ifelse(): "A" if gpa ≥ 3.7, "B" if gpa ≥ 3.3, "C" otherwise
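The four techniques named in this list can be sketched on a small stand-in table. The four rows below are invented for illustration, so results on the lab's actual students data frame will differ.

```r
# Hypothetical stand-in for the students data frame (values are invented)
students <- data.frame(
  name  = c("Ana", "Ben", "Chloe", "Dev"),
  major = c("Biology", "Math", "Biology", "Math"),
  gpa   = c(3.8, 3.2, 3.4, 3.0),
  year  = c(2, 1, 2, 3)
)

# Mean GPA within each major
gpa_by_major <- tapply(students$gpa, students$major, mean)

# Row of the student with the lowest GPA (which.min returns the row index)
lowest <- students[which.min(students$gpa), ]

# Number of students in each year
per_year <- table(students$year)

# Letter grade from nested ifelse(): the inner call handles everything below 3.7
students$gpa_letter <- ifelse(students$gpa >= 3.7, "A",
                       ifelse(students$gpa >= 3.3, "B", "C"))
```

Note that which.min() returns only the first minimum, so a tie for lowest GPA would report a single student.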
tapply(students$gpa, students$major, mean) computes mean GPA per major. For lowest GPA: students[which.min(students$gpa), ]. Nested ifelse(): ifelse(students$gpa >= 3.7, "A", ifelse(students$gpa >= 3.3, "B", "C")).
Exercise 3 — Sales Data Frame
Create a sales data frame with columns: rep (5 names, some repeated across 15 rows), region (West/East/South), month (Jan/Feb/Mar), revenue (15 numeric values). Then:
- Calculate total revenue by rep using aggregate(revenue ~ rep, data = sales, FUN = sum)
- Find the best month (highest total revenue)
- Filter to only the West region and show the results
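As a rough illustration of these steps, here is a sketch on a six-row table rather than the 15 rows the exercise asks for. The rep names and revenue figures are invented, and by_rep, by_month, best_month, and west are hypothetical variable names.

```r
# Hypothetical, shortened sales data (the exercise itself calls for 15 rows)
sales <- data.frame(
  rep     = c("Kim", "Lee", "Kim", "Lee", "Kim", "Lee"),
  region  = c("West", "East", "West", "South", "East", "West"),
  month   = c("Jan", "Jan", "Feb", "Feb", "Mar", "Mar"),
  revenue = c(100, 200, 150, 50, 300, 250)
)

# Total revenue per rep: one row per rep, with summed revenue
by_rep <- aggregate(revenue ~ rep, data = sales, FUN = sum)

# Total revenue per month, then the row with the largest total
by_month   <- aggregate(revenue ~ month, data = sales, FUN = sum)
best_month <- by_month[which.max(by_month$revenue), ]

# Rows from the West region only
west <- sales[sales$region == "West", ]

print(by_rep)
print(best_month)
print(west)
```

aggregate() returns an ordinary data frame, which is why the same bracket indexing used for filtering also works for picking out the best month.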
aggregate(revenue ~ rep, data = sales, FUN = sum) sums revenue by rep. For best month: aggregate revenue by month, then pick the row with the highest total using which.max(). West: sales[sales$region == "West", ].
Build a Complete Class Dataset & Report
Build a class dataset with 8 students and these columns: name, major, gpa, year, scholarship (logical), credits_completed (numeric). Then write code to print a complete class report covering:
- Class size and GPA distribution (min, mean, max)
- Names of honor roll students (gpa ≥ 3.7)
- Names of scholarship recipients
- Most common major (use table() and which.max())
- A ranked table sorted by GPA descending (name, major, gpa columns only)
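One possible shape for such a report, sketched on four invented students with a reduced set of columns. The full task calls for 8 students plus year and credits_completed, and every value below is a placeholder.

```r
# Hypothetical, shortened class data (the task itself calls for 8 students)
students <- data.frame(
  name        = c("Ana", "Ben", "Chloe", "Dev"),
  major       = c("Biology", "Math", "Biology", "History"),
  gpa         = c(3.8, 3.2, 3.5, 2.9),
  scholarship = c(TRUE, FALSE, FALSE, TRUE)
)

# Class size and GPA distribution
cat("Class size:", nrow(students), "\n")
cat("GPA min / mean / max:",
    min(students$gpa), mean(students$gpa), max(students$gpa), "\n")

# Names selected by a logical condition on another column
honor_roll <- students$name[students$gpa >= 3.7]
scholars   <- students$name[students$scholarship]

# Most common major: count each major, then take the name of the largest count
top_major <- names(which.max(table(students$major)))

# Ranked table: order rows by GPA descending and keep three columns
ranked <- students[order(students$gpa, decreasing = TRUE),
                   c("name", "major", "gpa")]
print(ranked)
```

Because scholarship is already logical, it can be used directly as the row condition without comparing it to TRUE.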
Build each column from a c() vector with 8 values. Use students$name[students$gpa >= 3.7] for honor roll. names(which.max(table(students$major))) finds the most common major.
✅ Lab 2 Complete!
You’ve created, explored, filtered, sorted, and analyzed data frames — the backbone of R data analysis. In Lab 3, you’ll supercharge this with dplyr’s elegant pipeline syntax.