Data Frames

R’s version of a spreadsheet — and so much more

← Lab 1: Vectors Lab 2 of 10 Lab 3: dplyr →


Concept Recap

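A data frame is a table in which every column is a vector and all columns have the same length. Columns can differ in type (numeric, character, logical), which is why the data frame is R’s standard structure for tabular data analysis.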
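As a quick check of the definition above, the sketch below builds a tiny data frame and inspects its columns. The data frame `df` and its column names are made up for illustration and are not part of the lab.

```r
# Hypothetical example data, invented for illustration only
df <- data.frame(
  x = c(1, 2, 3),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Each column is an ordinary vector, and all columns share one length
print(is.numeric(df$x))
print(is.character(df$label))
print(length(df$x) == nrow(df))

# Columns of different types can sit side by side
col_types <- sapply(df, class)
print(col_types)

# Vectors whose lengths cannot be reconciled are rejected.
# (A shorter vector is only recycled when it fits a whole number of times.)
result <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error: columns have incompatible lengths"
)
print(result)
```

The `tryCatch()` wrapper is only there so the script keeps running after the failed call; in interactive use the error message appears directly.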

Create: data.frame(col1 = vec1, col2 = vec2, ...)
Access columns: df$colname or df[["colname"]]
Filter rows: df[df$col == "value", ]
Dimensions: nrow(df), ncol(df), dim(df)
Explore: head(df), tail(df), str(df), summary(df)
Sort: df[order(df$col), ] or df[order(df$col, decreasing = TRUE), ]
Add columns: df$new_col <- expression
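The operations in this list can be tried together on one small table. The sketch below uses a hypothetical `pets` data frame (not part of the lab) so that each line of the recap has a concrete counterpart.

```r
# Hypothetical data, invented for illustration only
pets <- data.frame(
  name = c("Rex", "Milo", "Luna", "Coco"),
  species = c("dog", "cat", "cat", "dog"),
  weight_kg = c(30, 4, 5, 8),
  stringsAsFactors = FALSE
)

# Access a column two equivalent ways; both return a plain vector
weights_a <- pets$weight_kg
weights_b <- pets[["weight_kg"]]
print(identical(weights_a, weights_b))

# Filter rows: keep only cats (note the trailing comma)
cats <- pets[pets$species == "cat", ]
print(cats)

# Sort by weight, heaviest first
by_weight <- pets[order(pets$weight_kg, decreasing = TRUE), ]
print(by_weight$name)

# Add a column computed from an existing one
pets$heavy <- pets$weight_kg > 6

# Dimensions and a quick look at the result
print(dim(pets))   # number of rows, then number of columns
str(pets)
```

Filtering and sorting return new data frames and leave `pets` unchanged; only the `pets$heavy <- ...` assignment modifies the original.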

Worked Example

Creating and exploring a data frame from scratch:

employees <- data.frame(
  name = c("Alice", "Bob", "Carol", "David", "Eve"),
  dept = c("Eng", "Marketing", "Eng", "HR", "Marketing"),
  salary = c(95000, 72000, 105000, 68000, 78000),
  years = c(5, 3, 8, 2, 6),
  stringsAsFactors = FALSE
)

# Explore
str(employees)
cat("\nDimensions:", dim(employees), "\n")
cat("\nEngineering staff:\n")
print(employees[employees$dept == "Eng", ])
cat("\nAverage salary:", mean(employees$salary), "\n")
cat("Highest paid:", employees$name[which.max(employees$salary)], "\n")

Guided

Exercise 1 — Students Data Frame Explorer

Fill in the blanks to complete the data frame operations.

students <- data.frame(
  name = c("Alex","Beth","Carlos","Diana","Ethan","Fiona"),
  major = c("CS","Math","CS","English","CS","Math"),
  gpa = c(3.8, 3.2, 3.9, 2.9, 3.5, 3.7),
  year = c(3, 2, 4, 1, 2, 3),
  stringsAsFactors = FALSE
)

# 1. Show structure
str(students)

# 2. Filter CS students only
cs_students <- students[students$major == "CS", ]
print(cs_students)

# 3. Sort by GPA descending
ranked <- students[order(students$gpa, decreasing = TRUE), ]
print(ranked[, c("name", "gpa")])

# 4. Add a new column: honor_roll (TRUE if gpa >= 3.7)
students$honor_roll <- students$gpa >= 3.7
print(students)


Hint: The comma inside [ , ] separates row conditions from column selections. students[condition, ] filters rows. students[, c("col1","col2")] selects columns.
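To make the role of the comma concrete, here is a minimal sketch on a hypothetical `scores` table (invented for this illustration). It also shows one detail the hint does not mention: selecting a single column with `[ , ]` returns a vector unless `drop = FALSE` is given.

```r
# Hypothetical scores table, invented for illustration only
scores <- data.frame(
  id = c(1, 2, 3, 4),
  group = c("a", "b", "a", "b"),
  score = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Before the comma: which rows to keep (a logical condition here)
rows_a <- scores[scores$group == "a", ]

# After the comma: which columns to keep
cols_only <- scores[, c("id", "score")]

# Both at once: rows where score > 15, columns id and score
both <- scores[scores$score > 15, c("id", "score")]

# A single column selected with [ , ] is simplified to a vector;
# drop = FALSE keeps it as a one-column data frame
as_vector <- scores[, "score"]
as_frame <- scores[, "score", drop = FALSE]

print(rows_a)
print(cols_only)
print(both)
print(class(as_vector))
print(class(as_frame))
```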

Independent

Exercise 2 — Data Frame Analysis

Using the students data frame above, write code to:

Calculate average GPA by major using tapply(students$gpa, students$major, mean)
Find the student with the lowest GPA
Count students per year using table(students$year)
Create a new column gpa_letter using ifelse() — “A” if gpa ≥ 3.7, “B” if ≥ 3.3, “C” otherwise

students <- data.frame(
  name = c("Alex","Beth","Carlos","Diana","Ethan","Fiona"),
  major = c("CS","Math","CS","English","CS","Math"),
  gpa = c(3.8, 3.2, 3.9, 2.9, 3.5, 3.7),
  year = c(3, 2, 4, 1, 2, 3),
  stringsAsFactors = FALSE
)

# 1. Average GPA by major

# 2. Student with lowest GPA

# 3. Count students per year

# 4. Add gpa_letter column


Hint: tapply(students$gpa, students$major, mean) computes mean GPA per major. For lowest GPA: students[which.min(students$gpa), ]. Nested ifelse(): ifelse(students$gpa >= 3.7, "A", ifelse(students$gpa >= 3.3, "B", "C")).
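The same four techniques can be practiced on different data before applying them to students. The sketch below reuses the employees data frame from the Worked Example; the `band` column and its year thresholds are assumptions made up for this illustration.

```r
# Same data as the Worked Example
employees <- data.frame(
  name = c("Alice", "Bob", "Carol", "David", "Eve"),
  dept = c("Eng", "Marketing", "Eng", "HR", "Marketing"),
  salary = c(95000, 72000, 105000, 68000, 78000),
  years = c(5, 3, 8, 2, 6),
  stringsAsFactors = FALSE
)

# Group summary: tapply(values, groups, function)
avg_salary <- tapply(employees$salary, employees$dept, mean)
print(avg_salary)

# Row holding the minimum of a column
lowest <- employees[which.min(employees$salary), ]
print(lowest)

# Counts per category
print(table(employees$dept))

# Nested ifelse(): test the highest threshold first, then fall through.
# The band names and cut-offs are hypothetical.
employees$band <- ifelse(employees$years >= 6, "senior",
                         ifelse(employees$years >= 3, "mid", "junior"))
print(employees[, c("name", "years", "band")])
```

The order of the conditions matters: if the `>= 3` test came first, every employee with six or more years would be labeled "mid" and never reach the "senior" branch.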

Challenge

Exercise 3 — Sales Data Frame

Create a sales data frame with columns: rep (5 names, some repeated across 15 rows), region (West/East/South/North), month (Jan/Feb/Mar), revenue (15 numeric values). Then:

Calculate total revenue by rep using aggregate(revenue ~ rep, data=sales, FUN=sum)
Find the best month (highest total revenue)
Filter to only West region and show the results

# Create the sales data frame (15 rows)
sales <- data.frame(
  rep = c("Alice","Bob","Carol","Alice","David","Bob","Carol","David","Eve",
          "Alice","Bob","Carol","David","Eve","Eve"),
  region = c("West","East","West","West","South","East","West","South","North",
             "West","East","South","West","North","East"),
  month = c("Jan","Jan","Jan","Feb","Jan","Feb","Feb","Feb","Jan",
            "Mar","Mar","Mar","Mar","Mar","Feb"),
  revenue = c(18500,12300,22100,19800,15600,16900,25400,13200,11800,
              20500,14200,26800,15100,12900,18200)
)

# 1. Total revenue by rep

# 2. Best month

# 3. West region only


Hint: aggregate(revenue ~ rep, data=sales, FUN=sum) sums revenue by rep. For best month: use aggregate then find the max row. West: sales[sales$region == "West", ].
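The "aggregate, then find the max row" pattern from the hint can be sketched on a smaller table first. The `orders` data frame below, with its `shop`, `quarter`, and `amount` columns, is hypothetical and stands in for the sales data.

```r
# Hypothetical orders table, invented for illustration only
orders <- data.frame(
  shop = c("A", "B", "A", "C", "B", "C"),
  quarter = c("Q1", "Q1", "Q2", "Q2", "Q2", "Q1"),
  amount = c(100, 250, 300, 50, 120, 200),
  stringsAsFactors = FALSE
)

# Formula interface: sum amount within each shop
by_shop <- aggregate(amount ~ shop, data = orders, FUN = sum)
print(by_shop)

# Same idea grouped by quarter, then pick the row with the largest total
by_quarter <- aggregate(amount ~ quarter, data = orders, FUN = sum)
best <- by_quarter[which.max(by_quarter$amount), ]
print(best)

# Plain row filter for a single shop
shop_a <- orders[orders$shop == "A", ]
print(shop_a)
```

`aggregate()` returns an ordinary data frame with one row per group, so the usual `[rows, ]` indexing applies to its result.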

Mini Project — Class Report

Build a Complete Class Dataset & Report

Build a class dataset with 8 students and these columns: name, major, gpa, year, scholarship (logical), credits_completed (numeric). Then write code to print a complete class report covering:

Class size and GPA distribution (min, mean, max)
Names of honor roll students (gpa ≥ 3.7)
Names of scholarship recipients
Most common major (use table() and which.max())
A ranked table sorted by GPA descending (name, major, gpa columns only)

# Build your class dataset
students <- data.frame(
  name = c(),
  major = c(),
  gpa = c(),
  year = c(),
  scholarship = c(),
  credits_completed = c(),
  stringsAsFactors = FALSE
)

# Print the complete class report
cat("=== CLASS REPORT ===\n")


Hint: Fill in the c() vectors with 8 values each. Use students$name[students$gpa >= 3.7] for honor roll. names(which.max(table(students$major))) finds the most common major.
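One possible shape for the report code is sketched below on a hypothetical four-row `roster` (deliberately not the eight-student dataset the project asks for). The names, majors, and GPAs are invented; only the function calls carry over.

```r
# Hypothetical four-student roster, invented for illustration only
roster <- data.frame(
  name = c("Ana", "Ben", "Cy", "Dee"),
  major = c("Bio", "Art", "Bio", "Bio"),
  gpa = c(3.9, 3.1, 3.6, 3.75),
  scholarship = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

cat("=== SAMPLE REPORT ===\n")
cat("Class size:", nrow(roster), "\n")
cat("GPA min / mean / max:",
    min(roster$gpa), "/", mean(roster$gpa), "/", max(roster$gpa), "\n")

# A logical condition inside [ ] picks the matching names
honor <- roster$name[roster$gpa >= 3.7]
cat("Honor roll:", paste(honor, collapse = ", "), "\n")

# A logical column can be used as the condition directly
funded <- roster$name[roster$scholarship]
cat("Scholarships:", paste(funded, collapse = ", "), "\n")

# table() counts each major; which.max() finds the largest count;
# names() recovers the label attached to that count
major_counts <- table(roster$major)
top_major <- names(which.max(major_counts))
cat("Most common major:", top_major, "\n")

# Ranked table: sort rows by GPA, keep three columns
ranked <- roster[order(roster$gpa, decreasing = TRUE), c("name", "major", "gpa")]
print(ranked, row.names = FALSE)
```

Note that `which.max()` returns only the first maximum, so if two majors are tied for the top count this approach reports just one of them.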

Lab 2 Complete!

You’ve created, explored, filtered, sorted, and analyzed data frames — the backbone of R data analysis. In Lab 3, you’ll supercharge this with dplyr’s elegant pipeline syntax.

