Useful Data Tips

Essential SQL String Functions for Data Cleaning

⏱️ 30 sec read 📊 SQL

String functions are essential for cleaning, formatting, and transforming text data. Here are the most useful ones you'll use daily.

Combining Strings

CONCAT / || (Concatenation)

-- Standard SQL
SELECT CONCAT(first_name, ' ', last_name) as full_name
FROM users;

-- PostgreSQL, SQLite
SELECT first_name || ' ' || last_name as full_name
FROM users;

Extracting Parts of Strings

SUBSTRING

-- Extract area code from phone
SELECT SUBSTRING(phone, 1, 3) as area_code
FROM customers;

-- Get first initial
SELECT SUBSTRING(first_name, 1, 1) as initial
FROM users;

LEFT / RIGHT

SELECT
    LEFT(product_code, 3) as category,
    RIGHT(product_code, 4) as item_number
FROM products;

Cleaning Strings

TRIM, LTRIM, RTRIM

-- Remove leading/trailing spaces
SELECT TRIM(email) as clean_email
FROM users;

-- Remove specific characters
SELECT TRIM('$' FROM price_text) as clean_price
FROM products;

REPLACE

-- Standardize phone formats
SELECT REPLACE(REPLACE(phone, '-', ''), ' ', '') as phone_clean
FROM customers;

-- Fix common typos
UPDATE products
SET description = REPLACE(description, 'teh', 'the')
WHERE description LIKE '%teh%';

Changing Case

SELECT
    UPPER(email) as email_upper,
    LOWER(email) as email_lower,
    INITCAP(name) as name_titlecase  -- PostgreSQL
FROM users;

Finding and Searching

LENGTH

-- Find invalid phone numbers
SELECT phone
FROM customers
WHERE LENGTH(REPLACE(REPLACE(phone, '-', ''), ' ', '')) != 10;

POSITION / CHARINDEX / INSTR

-- PostgreSQL/MySQL
SELECT email, POSITION('@' IN email) as at_position
FROM users;

-- Find domain
SELECT SUBSTRING(email, POSITION('@' IN email) + 1) as domain
FROM users;

Practical Data Cleaning Example

-- Clean and standardize email addresses
SELECT
    TRIM(LOWER(email)) as cleaned_email,
    CASE
        WHEN email LIKE '%@%' AND email LIKE '%.%' THEN 'Valid'
        ELSE 'Invalid'
    END as email_status
FROM users;

Performance Note: String functions can be slow on large datasets. Consider adding computed columns or materialized views for frequently used transformations. Always use LIKE or pattern matching with leading wildcards sparingly.

← Back to SQL Tips