Regular Expressions in Python
Regular expressions (regex) are patterns used to match character combinations in strings. Python’s re
module provides functions for working with regular expressions.
Basic Pattern Matching
The re.search()
function searches for a pattern in a string and returns a match object if found:
import re
text = "Hello, my phone number is 123-456-7890"
pattern = r"\d{3}-\d{3}-\d{4}" # Pattern for phone number
match = re.search(pattern, text)
if match:
print(match.group()) # Output: 123-456-7890
Finding All Matches
The re.findall()
function returns all matches as a list:
import re
text = "Contact us at alice@email.com or bob@company.org"
pattern = r"\w+@\w+\.\w+" # Simple email pattern
emails = re.findall(pattern, text)
print(emails) # Output: ['alice@email.com', 'bob@company.org']
Common Pattern Characters
Here are the most useful regex characters:
import re
# . matches any character except newline
re.search(r"h.t", "hat") # Matches "hat"
# * matches zero or more of the preceding character
re.search(r"colou*r", "color") # Matches "color"
# + matches one or more of the preceding character
re.search(r"go+d", "good") # Matches "good"
# ? matches zero or one of the preceding character
re.search(r"colou?r", "colour") # Matches "colour"
# \d matches any digit (0-9)
re.search(r"\d+", "I have 5 apples") # Matches "5"
# \w matches any word character (letters, digits, underscore)
re.search(r"\w+", "hello_world") # Matches "hello_world"
# \s matches any whitespace character
re.search(r"\s+", "hello world") # Matches the space
Character Classes and Ranges
Square brackets define character classes:
import re
# Match vowels
re.findall(r"[aeiou]", "hello world") # Output: ['e', 'o', 'o']
# Match digits from 1 to 5
re.findall(r"[1-5]", "123456789") # Output: ['1', '2', '3', '4', '5']
# Match uppercase letters
re.findall(r"[A-Z]", "Hello World") # Output: ['H', 'W']
# Match anything except digits
re.findall(r"[^\d]", "abc123") # Output: ['a', 'b', 'c']
Substitution
The re.sub()
function replaces matches with a replacement string:
import re
text = "The quick brown fox"
# Replace 'fox' with 'dog'
result = re.sub(r"fox", "dog", text)
print(result) # Output: The quick brown dog
# Remove all digits
text = "abc123def456"
result = re.sub(r"\d", "", text)
print(result) # Output: abcdef
Groups and Capturing
Parentheses create groups to capture parts of the match:
import re
text = "John Doe was born on 1990-05-15"
pattern = r"(\w+) (\w+) was born on (\d{4})-(\d{2})-(\d{2})"
match = re.search(pattern, text)
if match:
print(match.group(1)) # Output: John (first name)
print(match.group(2)) # Output: Doe (last name)
print(match.group(3)) # Output: 1990 (year)
Practical Examples
Validating Email
import re
def is_valid_email(email):
pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
return bool(re.match(pattern, email))
print(is_valid_email("user@example.com")) # Output: True
print(is_valid_email("invalid.email")) # Output: False
Extracting Numbers from Text
import re
text = "The price is $29.99 and shipping costs $5.50"
numbers = re.findall(r"\d+\.\d+", text)
print(numbers) # Output: ['29.99', '5.50']
Cleaning Text
import re
text = "Hello!!! How are you??? Fine."
# Remove multiple punctuation and extra spaces
cleaned = re.sub(r"[!?]+", "", text) # Remove multiple ! or ?
cleaned = re.sub(r"\s+", " ", cleaned) # Replace multiple spaces with single space
print(cleaned.strip()) # Output: Hello How are you Fine.
Next example: Args and Kwargs