Regular Expressions in Python Basics
Regular expressions (regex) are powerful patterns for matching and manipulating text. Python's re module provides tools for searching, extracting, and replacing text based on patterns.
Basic Pattern Matching
import re
text = "My email is [email protected]"
# Search for pattern
match = re.search(r'[email protected]', text)
if match:
    print("Email found!") # Output: Email found!
# Check if pattern is at start
if re.match(r'My', text):
    print("Starts with 'My'") # Output: Starts with 'My'
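The difference between searching anywhere, matching at the start, and matching the whole string can be sketched as follows. The sample text is made up for illustration; re.fullmatch is a standard re function that the examples above do not use.

```python
import re

sample = "My phone is 555-1234"  # hypothetical sample text
number = r'\d{3}-\d{4}'

# re.search scans the whole string and returns the first match it finds
found_anywhere = re.search(number, sample)

# re.match only succeeds if the pattern matches starting at position 0
found_at_start = re.match(number, sample)

# re.fullmatch requires the pattern to cover the entire string
covers_everything = re.fullmatch(number, "555-1234")

print(found_anywhere.group())     # 555-1234
print(found_at_start)             # None
print(covers_everything.group())  # 555-1234
```

All three return None when nothing matches, so check the result before calling .group() on it.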
Common Regex Patterns
# Metacharacters
# . = any character (except newline)
# ^ = start of string (or of each line with re.MULTILINE)
# $ = end of string (or of each line with re.MULTILINE)
# * = 0 or more repetitions
# + = 1 or more repetitions
# ? = 0 or 1 repetition
# {n} = exactly n repetitions
# [abc] = any character in brackets
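# \d = digit (0-9; also other Unicode digits unless re.ASCII is set)
# \w = word character (a-z, A-Z, 0-9, _; also other Unicode letters and digits unless re.ASCII is set)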
# \s = whitespace
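To see the metacharacters in action, here is a small sketch that runs re.findall with one pattern per metacharacter. The sample strings are invented for illustration only.

```python
import re

# (pattern, sample text) pairs; the sample strings are made up for illustration
examples = [
    (r'c.t', "cat cot c\nt"),        # . does not match the newline: ['cat', 'cot']
    (r'^\w+', "hello world"),        # ^ anchors at the start: ['hello']
    (r'\w+$', "hello world"),        # $ anchors at the end: ['world']
    (r'ab*', "a ab abbb"),           # * allows zero b's: ['a', 'ab', 'abbb']
    (r'ab+', "a ab abbb"),           # + needs at least one b: ['ab', 'abbb']
    (r'colou?r', "color colour"),    # ? makes the u optional: ['color', 'colour']
    (r'\d{4}', "2024 and 12345"),    # {4} takes exactly four digits: ['2024', '1234']
    (r'[aeiou]', "regex"),           # character class: ['e', 'e']
    (r'\s', "a b\tc"),               # whitespace: [' ', '\t']
]

results = [re.findall(pattern, sample) for pattern, sample in examples]
for (pattern, _), found in zip(examples, results):
    print(f"{pattern:10} -> {found}")
```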
Finding Emails
text = "Contact us at [email protected] or [email protected]"
# Find all emails
emails = re.findall(r'\w+@\w+\.\w+', text)
print(emails) # ['[email protected]', '[email protected]']
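The simple pattern above only allows word characters on each side of the @, so it truncates addresses that contain dots in the local part or more than one dot in the domain. The sketch below shows this with a hypothetical address; the broader pattern is one possible loosening, not a complete email grammar.

```python
import re

simple = re.compile(r'\w+@\w+\.\w+')
# Assumed, slightly broader pattern: allows . + - in the local part
# and any number of dot-separated domain labels
broader = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

sample = "Write to first.last@mail.example.com today"  # hypothetical address

simple_hits = simple.findall(sample)
broader_hits = broader.findall(sample)

print(simple_hits)   # ['last@mail.example']
print(broader_hits)  # ['first.last@mail.example.com']
```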
Extracting Phone Numbers
text = "Call me at 555-123-4567 or 555.987.6543"
# Find phone numbers (flexible format)
phones = re.findall(r'\d{3}[-\.]\d{3}[-\.]\d{4}', text)
print(phones) # ['555-123-4567', '555.987.6543']
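One thing to watch for: once a pattern contains capturing groups, re.findall returns tuples of the groups rather than the full matches. A minimal sketch, reusing the phone text above, uses that behavior to normalize both numbers to one separator.

```python
import re

text = "Call me at 555-123-4567 or 555.987.6543"

# With capturing groups, findall returns one tuple of groups per match
parts = re.findall(r'(\d{3})[-.](\d{3})[-.](\d{4})', text)
print(parts)  # [('555', '123', '4567'), ('555', '987', '6543')]

# Rejoin each tuple with a single, consistent separator
normalized = ["-".join(p) for p in parts]
print(normalized)  # ['555-123-4567', '555-987-6543']
```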
Groups and Capturing
text = "John Doe, age 30"
# Capture groups with parentheses
match = re.search(r'(\w+) (\w+), age (\d+)', text)
if match:
    first_name = match.group(1) # 'John'
    last_name = match.group(2) # 'Doe'
    age = match.group(3) # '30'
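Numbered groups get hard to track as patterns grow. As an alternative sketch, the same pattern can use named groups with the (?P<name>...) syntax; the group names first, last, and age are chosen here for illustration.

```python
import re

text = "John Doe, age 30"

# (?P<name>...) gives each group a name in addition to its number
match = re.search(r'(?P<first>\w+) (?P<last>\w+), age (?P<age>\d+)', text)
if match:
    print(match.group('first'))  # John
    print(match.groupdict())     # {'first': 'John', 'last': 'Doe', 'age': '30'}
    age = int(match.group('age'))  # captured text is always a string; convert as needed
```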
Search and Replace
text = "The price is $100 and $200"
# Replace all dollar amounts
result = re.sub(r'\$\d+', '$REDACTED', text)
print(result) # "The price is $REDACTED and $REDACTED"
# Replace with function
def double_price(match):
    amount = int(match.group(0)[1:]) # Remove $, convert to int
    return f"${amount * 2}"
result = re.sub(r'\$\d+', double_price, text)
print(result) # "The price is $200 and $400"
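A replacement string can also refer back to captured groups with \1, \2, and so on, and the count argument limits how many matches are replaced. A short sketch with made-up dates:

```python
import re

text = "Released 2024-01-15, updated 2024-03-02"  # hypothetical sample text
iso_date = r'(\d{4})-(\d{2})-(\d{2})'

# \3/\2/\1 reorders the captured year, month, and day
swapped = re.sub(iso_date, r'\3/\2/\1', text)
print(swapped)  # Released 15/01/2024, updated 02/03/2024

# count=1 replaces only the first match
first_only = re.sub(iso_date, r'\3/\2/\1', text, count=1)
print(first_only)  # Released 15/01/2024, updated 2024-03-02
```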
Splitting Text
# Split on multiple delimiters
text = "apple,banana;orange|grape"
fruits = re.split(r'[,;|]', text)
print(fruits) # ['apple', 'banana', 'orange', 'grape']
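Two variations on splitting may be useful, sketched here with invented input: absorbing whitespace around the delimiters, and keeping the delimiters in the result by wrapping them in a capturing group.

```python
import re

# Absorb optional whitespace around each delimiter
messy = "apple , banana;orange | grape"
fruits_clean = re.split(r'\s*[,;|]\s*', messy)
print(fruits_clean)  # ['apple', 'banana', 'orange', 'grape']

# A capturing group keeps the delimiters in the output list
with_delimiters = re.split(r'([,;|])', "a,b;c")
print(with_delimiters)  # ['a', ',', 'b', ';', 'c']
```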
Validation Examples
Email Validation
def is_valid_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    # fullmatch rejects a trailing newline, which re.match with $ would accept
    return bool(re.fullmatch(pattern, email))
print(is_valid_email("[email protected]")) # True
print(is_valid_email("invalid.email")) # False
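A subtlety when anchoring a validator with ^ and $: the $ anchor also matches just before a trailing newline, so re.match accepts input that ends in "\n", while re.fullmatch does not. The sketch below shows the difference with a hypothetical address.

```python
import re

pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
candidate = "user@example.com\n"  # hypothetical input with a trailing newline

accepted_by_match = bool(re.match(pattern, candidate))
accepted_by_fullmatch = bool(re.fullmatch(pattern, candidate))

print(accepted_by_match)      # True: $ matches before the final newline
print(accepted_by_fullmatch)  # False: the newline is left unmatched
```

This matters mostly when validating raw input, such as lines read from a file, that has not been stripped.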
Password Strength Check
def check_password(password):
    # At least 8 chars, 1 uppercase, 1 lowercase, 1 digit
    if len(password) < 8:
        return False
    if not re.search(r'[A-Z]', password):
        return False
    if not re.search(r'[a-z]', password):
        return False
    if not re.search(r'\d', password):
        return False
    return True
print(check_password("Abc12345")) # True
print(check_password("weak")) # False
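The same rules can be folded into a single pattern using lookaheads, at some cost in readability. This is a sketch of that alternative, not a replacement for the step-by-step version; the names STRONG_PASSWORD and check_password_single_pattern are made up here.

```python
import re

# Each (?=...) lookahead asserts one rule without consuming characters;
# .{8,} then enforces the minimum length. re.DOTALL lets . match newlines
# so the result agrees with the len()-based check.
STRONG_PASSWORD = re.compile(r'(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}', re.DOTALL)

def check_password_single_pattern(password):
    return bool(STRONG_PASSWORD.fullmatch(password))

print(check_password_single_pattern("Abc12345"))   # True
print(check_password_single_pattern("weak"))       # False
print(check_password_single_pattern("abcdefgh1"))  # False (no uppercase letter)
```

The separate checks are easier to extend and can report which rule failed, which a single pattern cannot do.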
Compiled Patterns (Better Performance)
# Compile once, use many times
email_pattern = re.compile(r'\w+@\w+\.\w+')
text1 = "Email: [email protected]"
text2 = "Contact: [email protected]"
print(email_pattern.search(text1).group()) # [email protected]
print(email_pattern.search(text2).group()) # [email protected]
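Calling .group() directly on the result of search raises AttributeError when there is no match, because search returns None. A defensive sketch follows; the helper first_email and the sample lines are hypothetical.

```python
import re

email_pattern = re.compile(r'\w+@\w+\.\w+')

def first_email(text):
    """Return the first email-like substring, or None if there is none."""
    match = email_pattern.search(text)
    return match.group() if match else None

lines = ["Email: alice@example.com", "No address here"]  # hypothetical input
found = [first_email(line) for line in lines]
print(found)  # ['alice@example.com', None]
```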
Common Use Cases
- Validating input formats (emails, phone numbers, ZIP codes)
- Extracting data from text (URLs, dates, prices)
- Cleaning and normalizing text
- Log file parsing and analysis
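As an illustration of the log parsing use case, here is a minimal sketch that counts log levels and collects error messages. The log format (date, time, level, message) and the sample lines are assumptions made for this example, not a standard.

```python
import re
from collections import Counter

# Assumed line format: "YYYY-MM-DD HH:MM:SS LEVEL message"
LOG_LINE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>[A-Z]+) (?P<message>.*)'
)

log = """\
2024-05-01 12:30:45 INFO Service started
2024-05-01 12:31:02 ERROR Disk full
not a log line
2024-05-01 12:31:10 ERROR Retry failed
"""

levels = Counter()
errors = []
for line in log.splitlines():
    match = LOG_LINE.fullmatch(line)
    if match is None:
        continue  # skip lines that do not follow the assumed format
    levels[match.group('level')] += 1
    if match.group('level') == 'ERROR':
        errors.append(match.group('message'))

print(dict(levels))  # {'INFO': 1, 'ERROR': 2}
print(errors)        # ['Disk full', 'Retry failed']
```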
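Pro Tip: Write regex patterns as raw strings (r'pattern') so backslashes reach the regex engine unchanged and don't need doubling. Compile patterns you reuse often: the re module already caches recently used patterns, so the speedup is usually small, but a compiled pattern gives the regex a name and skips the cache lookup on every call. Test complex patterns at regex101.com (select the Python flavor)!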