Useful Data Tips

SQL DISTINCT and COUNT DISTINCT: Find Unique Values

⏱️ 26 sec read 📊 SQL

DISTINCT removes duplicate rows from your results, while COUNT DISTINCT counts unique values. These are essential for data analysis and quality checks.

DISTINCT - Remove Duplicates

Use DISTINCT to return only unique rows:

-- Get list of unique cities
SELECT DISTINCT city
FROM customers
ORDER BY city;

-- Unique combinations of multiple columns
SELECT DISTINCT country, state, city
FROM customers
ORDER BY country, state, city;

COUNT DISTINCT - Count Unique Values

Find how many unique values exist in a column:

-- How many unique customers placed orders?
SELECT COUNT(DISTINCT customer_id) as unique_customers
FROM orders;

-- Total orders vs unique customers
SELECT
    COUNT(*) as total_orders,
    COUNT(DISTINCT customer_id) as unique_customers,
    COUNT(*) / COUNT(DISTINCT customer_id) as orders_per_customer
FROM orders;

DISTINCT in Different Contexts

Find Unique Values in a Column

-- What payment methods do we accept?
SELECT DISTINCT payment_method
FROM orders
WHERE order_date >= '2024-01-01';

Unique Combinations

-- Find all product-category pairs
SELECT DISTINCT
    category,
    subcategory
FROM products
WHERE active = true
ORDER BY category, subcategory;

COUNT DISTINCT with GROUP BY

Combine COUNT DISTINCT with GROUP BY for powerful analysis:

Customer Reach by Product

SELECT
    product_name,
    COUNT(*) as times_ordered,
    COUNT(DISTINCT customer_id) as unique_buyers,
    SUM(quantity) as total_units_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY product_name
HAVING COUNT(DISTINCT customer_id) >= 10
ORDER BY unique_buyers DESC;

Daily Active Users

SELECT
    DATE(login_time) as date,
    COUNT(DISTINCT user_id) as daily_active_users,
    COUNT(*) as total_logins,
    COUNT(*) / COUNT(DISTINCT user_id) as avg_logins_per_user
FROM user_activity
WHERE login_time >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
GROUP BY DATE(login_time)
ORDER BY date;

Multiple COUNT DISTINCT in One Query

SELECT
    DATE_TRUNC('month', order_date) as month,
    COUNT(*) as total_orders,
    COUNT(DISTINCT customer_id) as unique_customers,
    COUNT(DISTINCT product_id) as unique_products,
    SUM(total) as revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Finding Duplicates

Use DISTINCT to identify and analyze duplicate records:

Detect Duplicate Emails

SELECT
    email,
    COUNT(*) as duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;

Find Unique Values Excluding Duplicates

-- Get clean list without duplicates
SELECT DISTINCT email
FROM users
WHERE email IS NOT NULL
    AND email LIKE '%@%.%'
ORDER BY email;

Practical Business Use Cases

Customer Acquisition Report

SELECT
    DATE_TRUNC('week', signup_date) as week,
    COUNT(*) as signups,
    COUNT(DISTINCT email) as unique_emails,
    COUNT(DISTINCT referral_source) as marketing_channels,
    COUNT(*) - COUNT(DISTINCT email) as duplicate_signups
FROM users
WHERE signup_date >= DATE_SUB(CURDATE(), INTERVAL 12 WEEK)
GROUP BY DATE_TRUNC('week', signup_date)
ORDER BY week;

Product Catalog Analysis

SELECT
    category,
    COUNT(*) as total_products,
    COUNT(DISTINCT brand) as unique_brands,
    COUNT(DISTINCT SUBSTRING(sku, 1, 3)) as sku_prefixes,
    MIN(price) as min_price,
    MAX(price) as max_price
FROM products
GROUP BY category
ORDER BY total_products DESC;

Cross-Sell Analysis

-- Customers who bought multiple product categories
SELECT
    customer_id,
    COUNT(DISTINCT category) as categories_purchased,
    COUNT(DISTINCT product_id) as unique_products,
    COUNT(*) as total_purchases,
    SUM(total) as lifetime_value
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY customer_id
HAVING COUNT(DISTINCT category) >= 3
ORDER BY lifetime_value DESC;

DISTINCT vs GROUP BY

These queries produce the same result:

-- Using DISTINCT
SELECT DISTINCT category
FROM products;

-- Using GROUP BY
SELECT category
FROM products
GROUP BY category;

Use GROUP BY when you also need aggregates (COUNT, SUM, etc.). Use DISTINCT for simple deduplication.

Performance Note: COUNT DISTINCT can be slow on large datasets because it must track all unique values. Consider using approximate counting functions like HyperLogLog for massive datasets where exact counts aren't critical.

← Back to SQL Tips