Learn Without Walls
Lab 5 of 10

String Manipulation

Cleaning and transforming text in R


📖 Concept Recap

R has two worlds of string tools: base R and the stringr package (part of the tidyverse).

stringr functions are consistent: they take the string as the first argument and share one pattern syntax. Load the package with library(stringr); no install is needed in WebR.
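To make the comparison concrete, here is a small sketch pairing a few base R functions with their stringr counterparts. The sample vector x is made up for illustration, and the pairs shown are only a selection, not a complete mapping.

```r
library(stringr)

# Hypothetical sample data, not from the lab
x <- c("  Hello World  ", "data_cleaning")

# Trim surrounding whitespace
base_trim    <- trimws(x)
stringr_trim <- str_trim(x)

# Detect a pattern (base R takes the pattern first, stringr takes the string first)
base_detect    <- grepl("World", x)
stringr_detect <- str_detect(x, "World")

# Replace every match
base_replace    <- gsub("_", " ", x)
stringr_replace <- str_replace_all(x, "_", " ")

# Convert to upper case
base_upper    <- toupper(x)
stringr_upper <- str_to_upper(x)

# Each pair gives the same result on this input
print(identical(base_trim, stringr_trim))
print(identical(base_detect, stringr_detect))
print(identical(base_replace, stringr_replace))
print(identical(base_upper, stringr_upper))
```

The practical difference is mostly argument order and naming: every stringr call above starts with the string, which makes the functions easier to chain.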

👀 Worked Example

A name-cleaning function handling multiple messy formats:

library(stringr)

raw_names <- c(" john SMITH ", "Sarah_Lee", "DR. Chen, Wei", "mike.brown@email.com")

clean_name <- function(raw) {
  name <- str_trim(raw)
  # Keep only the part before "@" for email addresses
  if (str_detect(name, "@")) name <- str_split(name, "@")[[1]][1]
  # Strip the title before periods are replaced, or "DR\\. " no longer matches
  name <- str_remove(name, "DR\\. ")
  # Treat underscores and periods as word separators
  name <- str_replace_all(name, "[_.]", " ")
  # Reorder "Last, First" as "First Last"
  if (str_detect(name, ",")) {
    parts <- str_split(name, ", ")[[1]]
    name <- paste(parts[2], parts[1])
  }
  str_to_title(name)
}

cleaned <- sapply(raw_names, clean_name)
print(cleaned)
✏️ Guided

Exercise 1 — Text Analyzer

Run this text analysis code and study what each stringr function returns. Experiment by changing my_text.

💡 Hint: \\b in a regex pattern means “word boundary”. \\w+ matches one or more word characters (letters, digits, or underscore). \\w{5,} matches 5 or more word characters (words at least 5 characters long).
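The patterns in the hint can be tried on a short string. This is a minimal sketch; the value of my_text here is a made-up sentence, not the text used in the exercise.

```r
library(stringr)

# Hypothetical sample sentence
my_text <- "The quick brown fox jumps over the lazy dog"

# Every whole word: \\b anchors the match at word boundaries
words <- str_extract_all(my_text, "\\b\\w+\\b")[[1]]

# Only words of 5 or more characters
long_words <- str_extract_all(my_text, "\\b\\w{5,}\\b")[[1]]

# str_count() returns the number of matches instead of the matches themselves
n_words <- str_count(my_text, "\\w+")

print(length(words))   # 9
print(long_words)      # "quick" "brown" "jumps"
print(n_words)         # 9
```

Note that str_extract_all() returns a list with one element per input string, which is why the sketch indexes it with [[1]].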
💪 Independent

Exercise 2 — Phone Number Formatter

Write a format_phone(raw) function that takes messy phone number strings and returns them in standard 213-555-0123 format. Test on all 4 formats below.

💡 Hint: str_replace_all(x, "[^0-9]", "") removes everything that is NOT a digit. substr(digits, 1, 3) extracts the area code. nchar() gives the string length.
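One possible way to combine the hinted functions is sketched below. The test strings are hypothetical stand-ins for the exercise's formats, and the sketch assumes a valid number has exactly 10 digits; it returns NA for anything else and does not handle country codes.

```r
library(stringr)

format_phone <- function(raw) {
  # Drop everything that is not a digit
  digits <- str_replace_all(raw, "[^0-9]", "")

  # Assumption: only 10-digit numbers are valid; others become NA
  ifelse(
    nchar(digits) == 10,
    paste(substr(digits, 1, 3),    # area code
          substr(digits, 4, 6),    # exchange
          substr(digits, 7, 10),   # line number
          sep = "-"),
    NA_character_
  )
}

# Hypothetical inputs
result <- format_phone(c("(213) 555-0123", "213.555.0123", "555-0123"))
print(result)   # "213-555-0123" "213-555-0123" NA
```

Because str_replace_all(), nchar(), substr(), and ifelse() are all vectorized, the function works on a whole character vector without a loop.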
🔥 Challenge

Exercise 3 — Email List Parser

Write parse_email_list(emails) that takes a character vector of "Name <email@domain.com>" strings and returns a data frame with columns: display_name, username, domain.

💡 Hint: (?<=<)[^>]+ is a lookbehind regex: match characters after < but before >. str_extract() returns NA if no match is found, which is helpful for error detection.
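The lookbehind pattern from the hint can be seen in action in the sketch below. The two input strings are invented for illustration, and the patterns used for display_name, username, and domain are one reasonable choice among several, not a required solution.

```r
library(stringr)

# Hypothetical inputs: one well-formed entry, one with no address
emails <- c("Ada Lovelace <ada@example.com>", "no address here")

# Text after "<" and before ">"; NA when there is no match
address <- str_extract(emails, "(?<=<)[^>]+")

# Display name: remove the bracketed part, then trim
display_name <- str_trim(str_remove(emails, "<.*>"))

# Split the address around "@"; NA inputs stay NA
username <- str_extract(address, "^[^@]+")
domain   <- str_extract(address, "(?<=@).+$")

parsed <- data.frame(display_name, username, domain,
                     stringsAsFactors = FALSE)
print(parsed)
```

The second row keeps its raw text as display_name but has NA for username and domain, which makes malformed entries easy to find with is.na().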
🏆 Mini Project — Text Analysis

Full Paragraph Analysis

Perform a complete text analysis of the paragraph below. Compute all 6 metrics and print a formatted report.

💡 Hint: table(words) counts word frequencies. sort(..., decreasing=TRUE)[1:5] gets the top 5. nchar() works on vectors — mean(nchar(words) > 6) gives the proportion of long words.
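The hinted functions fit together roughly as follows. This is a sketch on a made-up two-sentence paragraph, not the paragraph from the project, and it covers only word counts, frequencies, and the long-word proportion.

```r
library(stringr)

# Hypothetical paragraph
paragraph <- "Data cleaning takes patience. Cleaning data rewards patience."

# Lower-case first so "Data" and "data" count as the same word
words <- str_extract_all(str_to_lower(paragraph), "\\b\\w+\\b")[[1]]

# Word frequencies, most common first
freq <- table(words)
top5 <- sort(freq, decreasing = TRUE)[1:5]

# Proportion of words longer than 6 characters
long_share <- mean(nchar(words) > 6)

cat("Total words: ", length(words), "\n")          # 8
cat("Unique words:", length(freq), "\n")           # 5
cat("Long-word share:", long_share, "\n")          # 0.625
print(top5)
```

mean() works on the logical vector because TRUE counts as 1 and FALSE as 0, so the average is the share of TRUE values.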

✅ Lab 5 Complete!

You can now clean messy text, extract patterns with regex, and transform strings for real-world data cleaning tasks. String manipulation is essential for working with survey data, web scraping, and log files.

Continue to Lab 6: ggplot2 Visualization →
