Lesson 7.4: Common String Patterns
By the end of this lesson, you will be able to:
- Split strings into lists using split()
- Join lists back into strings using join()
- Parse structured data from strings
- Build formatted strings efficiently
- Work with multiline strings and raw strings
Splitting Strings
split(): Breaks a string into a list of substrings based on a separator. By default, it splits on whitespace.
# Split on whitespace (default)
sentence = "Python is a great language"
words = sentence.split()
print(words)
print(f"Word count: {len(words)}")
['Python', 'is', 'a', 'great', 'language']
Word count: 5
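The default behavior is not the same as passing a single space explicitly. A minimal sketch of the difference (the input string and variable names are illustrative, not from the lesson):

```python
# Illustrative input with a double space and a trailing space
spaced = "a  b c "

# No argument: a run of whitespace counts as one separator,
# and leading/trailing whitespace is ignored
default_parts = spaced.split()

# Explicit " ": every single space is a separator, so empty strings
# appear between adjacent spaces and after the trailing space
explicit_parts = spaced.split(" ")

print(default_parts)   # ['a', 'b', 'c']
print(explicit_parts)  # ['a', '', 'b', 'c', '']
```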
Splitting on a Specific Separator
# Split CSV data
csv_line = "Alice,25,Engineer,New York"
fields = csv_line.split(",")
print(fields)

# Split a URL path
url = "https://example.com/blog/2024/post"
parts = url.split("/")
print(parts)

# Split on any string
data = "red::green::blue::yellow"
colors = data.split("::")
print(colors)
['Alice', '25', 'Engineer', 'New York']
['https:', '', 'example.com', 'blog', '2024', 'post']
['red', 'green', 'blue', 'yellow']
Limiting Splits
# Split only on the first occurrence
assignment = "name = John Smith"
key, value = assignment.split(" = ", 1)
print(f"Key: {key}")
print(f"Value: {value}")
Key: name
Value: John Smith
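The limit matters most when the value itself contains the separator. A brief sketch, using a made-up setting string, comparing an unlimited split with a split limited to one:

```python
# Hypothetical setting whose value also contains "="
setting = "query=a=1&b=2"

# Without a limit, the value is broken apart as well
all_parts = setting.split("=")

# With a limit of 1, only the first "=" acts as the separator
key, value = setting.split("=", 1)

print(all_parts)  # ['query', 'a', '1&b', '2']
print(key)        # query
print(value)      # a=1&b=2
```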
Joining Strings
join(): The opposite of split(). Combines a list of strings into a single string with a separator between each element. Syntax:
separator.join(list)
words = ["Python", "is", "awesome"]

# Join with spaces
sentence = " ".join(words)
print(sentence)

# Join with commas
csv = ",".join(words)
print(csv)

# Join with no separator
letters = ["P", "y", "t", "h", "o", "n"]
word = "".join(letters)
print(word)

# Join with newlines
lines = ["Line 1", "Line 2", "Line 3"]
text = "\n".join(lines)
print(text)
Python is awesome
Python,is,awesome
Python
Line 1
Line 2
Line 3
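join() only accepts strings, so a list of numbers has to be converted first. A minimal sketch with illustrative data:

```python
scores = [90, 85, 77]  # illustrative data, not from the lesson

# ", ".join(scores) would raise TypeError because the items are ints.
# Convert each item to a string while joining:
score_text = ", ".join(str(score) for score in scores)

print(score_text)  # 90, 85, 77
```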
split() and join() Together
# Normalize spacing in a sentence
messy = "  too   many   spaces   here  "
clean = " ".join(messy.split())
print(f"'{clean}'")

# Convert between separators
csv_data = "Alice,Bob,Charlie"
tab_data = "\t".join(csv_data.split(","))
print(tab_data)
'too many spaces here'
Alice Bob Charlie
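The same split-then-join pattern can be chained with other string methods. As one possible illustration (the title and the name slug are assumptions for this sketch), here is a way to turn a loosely spaced title into a lowercase, hyphen-separated string:

```python
title = "  Common   String Patterns "  # illustrative input

# lower() normalizes case, split() drops the extra whitespace,
# and "-".join() glues the words back together with hyphens
slug = "-".join(title.lower().split())

print(slug)  # common-string-patterns
```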
Parsing Structured Data
Combining split(), slicing, and other methods lets you extract information from structured text.
Parsing a Log Line
log = "2024-03-15 14:30:22 ERROR Database connection failed"
parts = log.split(" ", 3)  # Split into 4 parts max
date = parts[0]
time = parts[1]
level = parts[2]
message = parts[3]
print(f"Date: {date}")
print(f"Time: {time}")
print(f"Level: {level}")
print(f"Message: {message}")
Date: 2024-03-15
Time: 14:30:22
Level: ERROR
Message: Database connection failed
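If many lines share this layout, the parsing steps can be wrapped in a function. The sketch below is one way to do it; parse_log_line and the dictionary keys are hypothetical names chosen for this example, and a line with fewer than four parts would raise ValueError at the unpacking step.

```python
def parse_log_line(line):
    """Split a 'date time level message' line into a dictionary."""
    # A limit of 3 splits keeps any spaces inside the message intact
    date, time_of_day, level, message = line.split(" ", 3)
    return {
        "date": date,
        "time": time_of_day,
        "level": level,
        "message": message,
    }

entry = parse_log_line("2024-03-15 14:30:22 ERROR Database connection failed")
print(entry["level"])    # ERROR
print(entry["message"])  # Database connection failed
```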
Parsing CSV Data
csv_text = "Alice,25,Engineer\nBob,30,Designer\nCharlie,28,Teacher"
for line in csv_text.split("\n"):
    name, age, job = line.split(",")
    print(f"{name} is {age} years old and works as a {job}")
Alice is 25 years old and works as a Engineer
Bob is 30 years old and works as a Designer
Charlie is 28 years old and works as a Teacher
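Splitting on "," works for simple data like this, but it breaks when a field contains a comma inside quotes. A short sketch with made-up data, comparing plain split(",") with the standard library's csv module, which handles quoted fields:

```python
import csv

# Illustrative data: the second field contains a comma inside quotes
quoted_text = 'Alice,"New York, NY",25\nBob,"Austin, TX",30'

# Plain split(",") cuts the quoted field in two
naive = quoted_text.splitlines()[0].split(",")

# csv.reader accepts any iterable of lines and respects the quotes
rows = list(csv.reader(quoted_text.splitlines()))

print(naive)    # ['Alice', '"New York', ' NY"', '25']
print(rows[0])  # ['Alice', 'New York, NY', '25']
```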
Building Strings
There are several ways to build strings from data in Python.
String Concatenation (+ operator)
first = "Hello"
last = "World"
result = first + ", " + last + "!"
print(result)
Hello, World!
f-strings (Recommended)
name = "Alice"
age = 25
print(f"{name} is {age} years old")
print(f"In 5 years, {name} will be {age + 5}")
Alice is 25 years old
In 5 years, Alice will be 30
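One practical reason to prefer f-strings over the + operator is that they convert non-string values automatically. A small sketch of the difference (the variable names with_plus and with_fstring are illustrative):

```python
name = "Alice"
age = 25

# Concatenation needs an explicit str() for non-string values;
# name + " is " + age would raise TypeError
with_plus = name + " is " + str(age)

# An f-string converts the value as part of formatting
with_fstring = f"{name} is {age}"

print(with_plus)                  # Alice is 25
print(with_plus == with_fstring)  # True
```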
Building with join() (Most Efficient for Many Items)
# More efficient than concatenation in a loop
parts = []
for i in range(5):
    parts.append(f"item{i}")
result = ", ".join(parts)
print(result)
item0, item1, item2, item3, item4
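For comparison, here is a sketch of the same result built with += in a loop next to a one-line join(). Both produce identical output; the loop version may create a new intermediate string on each pass and needs extra logic to avoid a trailing separator.

```python
# Concatenation in a loop: needs a check to place separators correctly
result_concat = ""
for i in range(5):
    if i > 0:
        result_concat += ", "
    result_concat += f"item{i}"

# join() with a generator expression: separators are handled for you
result_join = ", ".join(f"item{i}" for i in range(5))

print(result_join)                   # item0, item1, item2, item3, item4
print(result_concat == result_join)  # True
```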
Try It Yourself
Given data = "John:Doe:35:Engineer:NYC", split it by ":" and create a formatted sentence with the information.
Multiline Strings
Use triple quotes (""" or ''') to create strings that span multiple lines.
poem = """Roses are red,
Violets are blue,
Python is great,
And so are you!"""
print(poem)
Roses are red,
Violets are blue,
Python is great,
And so are you!
splitlines()
The splitlines() method splits a multiline string into a list of lines.
text = """Line one
Line two
Line three"""
lines = text.splitlines()
print(lines)
print(f"Number of lines: {len(lines)}")
['Line one', 'Line two', 'Line three']
Number of lines: 3
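splitlines() and split("\n") often give the same result, but they differ on trailing newlines and on Windows-style line endings. A minimal sketch with illustrative strings:

```python
trailing = "Line one\nLine two\n"   # note the trailing newline
windows = "Line one\r\nLine two"    # Windows-style line ending

# split("\n") leaves an empty string after the trailing newline
split_trailing = trailing.split("\n")
# splitlines() does not
lines_trailing = trailing.splitlines()

# split("\n") leaves the "\r" attached; splitlines() removes it
split_windows = windows.split("\n")
lines_windows = windows.splitlines()

print(split_trailing)  # ['Line one', 'Line two', '']
print(lines_trailing)  # ['Line one', 'Line two']
print(split_windows)   # ['Line one\r', 'Line two']
print(lines_windows)   # ['Line one', 'Line two']
```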
Raw Strings
Raw String: A string prefixed with r that treats backslashes as literal characters instead of escape characters. Useful for file paths and regular expressions.
# Regular string: \n is a newline
print("Hello\nWorld")

# Raw string: \n is literally \n
print(r"Hello\nWorld")

# Useful for Windows file paths
path = r"C:\Users\name\documents\file.txt"
print(path)
Hello
World
Hello\nWorld
C:\Users\name\documents\file.txt
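For the regular-expression use case, a short sketch with Python's built-in re module (the sample sentence is made up). The pattern \d+ matches one or more digits, and the r prefix keeps its backslash intact:

```python
import re

# The r prefix passes the backslash through to the regex engine unchanged
pattern = r"\d+"
numbers = re.findall(pattern, "Order 66 shipped 3 items")
print(numbers)  # ['66', '3']

# A raw string and an escaped regular string hold the same characters
same = r"C:\temp" == "C:\\temp"
print(same)  # True
```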
Common Escape Characters
- \n -- newline
- \t -- tab
- \\ -- literal backslash
- \" -- literal double quote inside a double-quoted string
- \' -- literal single quote inside a single-quoted string
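A quick way to see these escapes in action is to print a few strings and check their lengths. The sketch below uses illustrative strings; each escape sequence counts as a single character, while the same text in a raw string counts as two.

```python
quoted = "She said \"hi\""
backslash = "C:\\temp"
tabbed = "Name\tAge"

print(quoted)     # She said "hi"
print(backslash)  # C:\temp
print(tabbed)     # Name and Age separated by a tab

# An escape sequence is one character; in a raw string it is two
newline_length = len("\n")
raw_length = len(r"\n")
print(newline_length)  # 1
print(raw_length)      # 2
```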
Check Your Understanding
- What does "a,b,c".split(",") return?
- What does "-".join(["2024", "03", "15"]) return?
- How do you split a string into individual lines?
- What does the r prefix do before a string?
Answers:
- ["a", "b", "c"]
- "2024-03-15"
- Use .splitlines() or .split("\n")
- It makes the string "raw" -- backslashes are treated as literal characters, not escape sequences
Key Takeaways
- split() breaks a string into a list; join() combines a list into a string
- split() without arguments splits on runs of whitespace and ignores leading and trailing whitespace
- join() is called on the separator: ",".join(list)
- Use split() and join() together to clean up and transform text
- Triple quotes create multiline strings; splitlines() breaks them apart
- Raw strings (r"...") treat backslashes literally -- useful for paths and regular expressions
- f-strings are the recommended way to build strings with variables