Useful Data Tips

Regular Expressions in Python Basics


Regular expressions (regex) are powerful patterns for matching and manipulating text. Python's re module provides tools for searching, extracting, and replacing text based on patterns.

Basic Pattern Matching

import re

text = "My email is user@example.com"  # placeholder address

# Search for pattern (the dot is escaped so it matches a literal ".")
match = re.search(r'user@example\.com', text)
if match:
    print("Email found!")  # Output: Email found!

# Check if pattern is at start
if re.match(r'My', text):
    print("Starts with 'My'")  # Output: Starts with 'My'
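The difference between re.search and re.match is easy to miss, so here is a minimal sketch that also shows re.fullmatch. The sample string is made up for illustration.

```python
import re

sample = "My email is listed below"  # hypothetical sample text

# re.match only tries the pattern at position 0
at_start = re.match(r'email', sample)
# re.search scans forward until it finds the first match
anywhere = re.search(r'email', sample)
# re.fullmatch requires the pattern to cover the entire string
whole = re.fullmatch(r'My .*', sample)

print(at_start)           # None
print(anywhere.span())    # (3, 8)
print(whole is not None)  # True
```

In short: use re.search to find a pattern anywhere, re.match to test a prefix, and re.fullmatch to test the whole string.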

Common Regex Patterns

# Metacharacters
# . = any character (except newline)
# ^ = start of string (start of each line with re.MULTILINE)
# $ = end of string (or just before a trailing newline)
# * = 0 or more repetitions
# + = 1 or more repetitions
# ? = 0 or 1 repetition
# {n} = exactly n repetitions
# [abc] = any character in brackets
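# \d = digit (0-9; any Unicode digit in str patterns unless re.ASCII is set)
# \w = word character (a-z, A-Z, 0-9, _; also Unicode letters and digits unless re.ASCII is set)
# \s = whitespace (space, tab, newline, etc.)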
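To make the list above concrete, the following sketch runs a few of these metacharacters against short strings. All sample strings and variable names are invented for this illustration.

```python
import re

# Hypothetical sample strings, chosen only to exercise each metacharacter
numbers = re.findall(r'\d+', "Order 66 shipped in 3 days")
optional = re.findall(r'colou?r', "color colour")
char_set = re.findall(r'b[aeiou]t', "bat bet bit bxt")
exact = re.findall(r'\d{4}', "Years: 1999, 2024, 42")
any_char = re.findall(r'a.c', "abc a-c ac")
starts = bool(re.search(r'^Hello', "Hello world"))
ends = bool(re.search(r'world$', "Hello world"))

print(numbers)   # ['66', '3']
print(optional)  # ['color', 'colour']
print(char_set)  # ['bat', 'bet', 'bit']
print(exact)     # ['1999', '2024']
print(any_char)  # ['abc', 'a-c']
print(starts)    # True
print(ends)      # True
```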

Finding Emails

text = "Contact us at support@example.com or sales@example.org"  # placeholder addresses

# Find all emails (simplified pattern: it misses dots or hyphens in names)
emails = re.findall(r'\w+@\w+\.\w+', text)
print(emails)  # ['support@example.com', 'sales@example.org']

Extracting Phone Numbers

text = "Call me at 555-123-4567 or 555.987.6543"

# Find phone numbers (flexible format)
phones = re.findall(r'\d{3}[-\.]\d{3}[-\.]\d{4}', text)
print(phones)  # ['555-123-4567', '555.987.6543']
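Note that the pattern above also accepts mixed separators such as 555-123.4567. If that is not wanted, one possible approach (a sketch, not the only option) is to capture the first separator and require the same one again with a backreference.

```python
import re

text = "Call 555-123-4567, 555.987.6543, or 555-123.4567"

# ([-.]) captures the first separator; \1 requires the same character again
pattern = re.compile(r'\d{3}([-.])\d{3}\1\d{4}')

# finditer is used because findall would return only the captured separator
phones = [m.group(0) for m in pattern.finditer(text)]
print(phones)  # ['555-123-4567', '555.987.6543']
```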

Groups and Capturing

text = "John Doe, age 30"

# Capture groups with parentheses
match = re.search(r'(\w+) (\w+), age (\d+)', text)
if match:
    first_name = match.group(1)   # 'John'
    last_name = match.group(2)    # 'Doe'
    age = match.group(3)          # '30'
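Numbered groups get hard to track as patterns grow. As an alternative sketch, named groups with (?P<name>...) let you read the captures by name. The group names here are chosen for illustration.

```python
import re

text = "John Doe, age 30"

person = None
match = re.search(r'(?P<first>\w+) (?P<last>\w+), age (?P<age>\d+)', text)
if match:
    person = match.groupdict()          # all named groups as a dict
    person['age'] = int(person['age'])  # captures are always strings
    print(person)  # {'first': 'John', 'last': 'Doe', 'age': 30}
```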

Search and Replace

text = "The price is $100 and $200"

# Replace all dollar amounts
result = re.sub(r'\$\d+', '$REDACTED', text)
print(result)  # "The price is $REDACTED and $REDACTED"

# Replace with function
def double_price(match):
    amount = int(match.group(0)[1:])  # Remove $, convert to int
    return f"${amount * 2}"

result = re.sub(r'\$\d+', double_price, text)
print(result)  # "The price is $200 and $400"
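The replacement string can also refer back to captured groups with \1, \2, and so on. A small sketch, using a made-up date string, that reorders year-month-day into day/month/year:

```python
import re

text = "Released 2024-01-15, patched 2024-02-03"  # hypothetical sample

# \3/\2/\1 rebuilds each match from its captured groups in reverse order
result = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', text)
print(result)  # Released 15/01/2024, patched 03/02/2024
```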

Splitting Text

# Split on multiple delimiters
text = "apple,banana;orange|grape"
fruits = re.split(r'[,;|]', text)
print(fruits)  # ['apple', 'banana', 'orange', 'grape']
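Real input often has stray spaces around the delimiters. The sketch below absorbs them with \s* and uses the maxsplit argument to split off only the first item. The input string is invented for the example.

```python
import re

text = "apple , banana;orange |  grape"

# \s* on both sides swallows whitespace around each delimiter
fruits = re.split(r'\s*[,;|]\s*', text)
print(fruits)  # ['apple', 'banana', 'orange', 'grape']

# maxsplit=1 stops after the first delimiter
first, rest = re.split(r'\s*[,;|]\s*', text, maxsplit=1)
print(first)  # apple
print(rest)   # banana;orange |  grape
```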

Validation Examples

Email Validation

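def is_valid_email(email):
    # Simplified check; real-world address rules are more permissive
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    # fullmatch requires the whole string to match, so no ^ or $ is needed
    return bool(re.fullmatch(pattern, email))

print(is_valid_email("user@example.com"))  # True (placeholder address)
print(is_valid_email("invalid.email"))     # False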
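One subtlety when anchoring validation patterns: $ also matches just before a trailing newline, so re.match with ^...$ can accept input that ends in "\n". The sketch below, using a placeholder address, compares it with re.fullmatch.

```python
import re

pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
candidate = "user@example.com\n"  # placeholder address with a trailing newline

# $ matches before the final newline, so re.match accepts the string
loose = bool(re.match(pattern, candidate))
# fullmatch must consume every character, including the newline
strict = bool(re.fullmatch(pattern, candidate))

print(loose)   # True
print(strict)  # False
```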

Password Strength Check

def check_password(password):
    # At least 8 chars, 1 uppercase, 1 lowercase, 1 digit
    if len(password) < 8:
        return False
    if not re.search(r'[A-Z]', password):
        return False
    if not re.search(r'[a-z]', password):
        return False
    if not re.search(r'\d', password):
        return False
    return True

print(check_password("Abc12345"))  # True
print(check_password("weak"))       # False
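The same rules can be folded into a single pattern with lookaheads. This is a sketch of an alternative, not a replacement for the step-by-step version, which is easier to read and extend. The helper name check_password_regex is hypothetical.

```python
import re

# Each (?=...) lookahead checks one requirement without consuming characters;
# .{8,} then enforces the minimum length.
STRONG = re.compile(r'(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}')

def check_password_regex(password):  # hypothetical helper name
    return STRONG.fullmatch(password) is not None

print(check_password_regex("Abc12345"))   # True
print(check_password_regex("weak"))       # False
print(check_password_regex("abcdefgh1"))  # False (no uppercase letter)
```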

Compiled Patterns (Better Performance)

# Compile once, use many times
email_pattern = re.compile(r'\w+@\w+\.\w+')

text1 = "Email: alice@example.com"    # placeholder addresses
text2 = "Contact: bob@example.org"

print(email_pattern.search(text1).group())  # alice@example.com
print(email_pattern.search(text2).group())  # bob@example.org
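Calling .group() directly on the result of search() raises AttributeError when nothing matches, because search() returns None. A more defensive sketch, with made-up input lines and placeholder addresses, checks the result first.

```python
import re

email_pattern = re.compile(r'\w+@\w+\.\w+')

# Hypothetical input; the middle line contains no address
lines = ["Email: alice@example.com", "No address here", "Contact: bob@example.org"]

found = []
for line in lines:
    m = email_pattern.search(line)
    if m is not None:  # search() returns None when there is no match
        found.append(m.group())

print(found)  # ['alice@example.com', 'bob@example.org']
```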

Common Use Cases

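Pro Tip: Use raw strings (r'pattern') for regex patterns so backslashes reach the regex engine unchanged and do not have to be doubled. Compile patterns you'll use repeatedly: the re module already caches recently used patterns, so the speedup is usually modest, but a compiled pattern gives you a reusable, named object. Test complex patterns at regex101.com!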
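To see why raw strings matter, consider \b: in a regular Python string it is the backspace character, while in a raw string it reaches the regex engine as a word boundary. A short sketch with an invented sample string:

```python
import re

sample = "cat concat cat."  # hypothetical sample text

# Raw string: \b is passed through as the regex word-boundary token
with_raw = re.findall(r'\bcat\b', sample)
# Regular string: '\b' becomes a backspace character before re ever sees it
without_raw = re.findall('\bcat\b', sample)

print(with_raw)     # ['cat', 'cat']
print(without_raw)  # []
print(len('\n'), len(r'\n'))  # 1 2
```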
