SQL DISTINCT and COUNT DISTINCT: Find Unique Values
DISTINCT removes duplicate rows from your results, while COUNT DISTINCT counts unique values. These are essential for data analysis and quality checks.
DISTINCT - Remove Duplicates
Use DISTINCT to return only unique rows:
-- Get list of unique cities
SELECT DISTINCT city
FROM customers
ORDER BY city;
-- Unique combinations of multiple columns
SELECT DISTINCT country, state, city
FROM customers
ORDER BY country, state, city;
COUNT DISTINCT - Count Unique Values
Find how many unique values exist in a column:
-- How many unique customers placed orders?
SELECT COUNT(DISTINCT customer_id) as unique_customers
FROM orders;
-- Total orders vs unique customers
SELECT
COUNT(*) as total_orders,
COUNT(DISTINCT customer_id) as unique_customers,
COUNT(*) / COUNT(DISTINCT customer_id) as orders_per_customer
FROM orders;
DISTINCT in Different Contexts
Find Unique Values in a Column
-- What payment methods do we accept?
SELECT DISTINCT payment_method
FROM orders
WHERE order_date >= '2024-01-01';
Unique Combinations
-- Find all product-category pairs
SELECT DISTINCT
category,
subcategory
FROM products
WHERE active = true
ORDER BY category, subcategory;
COUNT DISTINCT with GROUP BY
Combine COUNT DISTINCT with GROUP BY for powerful analysis:
Customer Reach by Product
SELECT
product_name,
COUNT(*) as times_ordered,
COUNT(DISTINCT customer_id) as unique_buyers,
SUM(quantity) as total_units_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY product_name
HAVING COUNT(DISTINCT customer_id) >= 10
ORDER BY unique_buyers DESC;
Daily Active Users
SELECT
DATE(login_time) as date,
COUNT(DISTINCT user_id) as daily_active_users,
COUNT(*) as total_logins,
COUNT(*) / COUNT(DISTINCT user_id) as avg_logins_per_user
FROM user_activity
WHERE login_time >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
GROUP BY DATE(login_time)
ORDER BY date;
Multiple COUNT DISTINCT in One Query
SELECT
DATE_TRUNC('month', order_date) as month,
COUNT(*) as total_orders,
COUNT(DISTINCT customer_id) as unique_customers,
COUNT(DISTINCT product_id) as unique_products,
SUM(total) as revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
Finding Duplicates
Use DISTINCT to identify and analyze duplicate records:
Detect Duplicate Emails
SELECT
email,
COUNT(*) as duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;
Find Unique Values Excluding Duplicates
-- Get clean list without duplicates
SELECT DISTINCT email
FROM users
WHERE email IS NOT NULL
AND email LIKE '%@%.%'
ORDER BY email;
Practical Business Use Cases
Customer Acquisition Report
SELECT
DATE_TRUNC('week', signup_date) as week,
COUNT(*) as signups,
COUNT(DISTINCT email) as unique_emails,
COUNT(DISTINCT referral_source) as marketing_channels,
COUNT(*) - COUNT(DISTINCT email) as duplicate_signups
FROM users
WHERE signup_date >= DATE_SUB(CURDATE(), INTERVAL 12 WEEK)
GROUP BY DATE_TRUNC('week', signup_date)
ORDER BY week;
Product Catalog Analysis
SELECT
category,
COUNT(*) as total_products,
COUNT(DISTINCT brand) as unique_brands,
COUNT(DISTINCT SUBSTRING(sku, 1, 3)) as sku_prefixes,
MIN(price) as min_price,
MAX(price) as max_price
FROM products
GROUP BY category
ORDER BY total_products DESC;
Cross-Sell Analysis
-- Customers who bought multiple product categories
SELECT
customer_id,
COUNT(DISTINCT category) as categories_purchased,
COUNT(DISTINCT product_id) as unique_products,
COUNT(*) as total_purchases,
SUM(total) as lifetime_value
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY customer_id
HAVING COUNT(DISTINCT category) >= 3
ORDER BY lifetime_value DESC;
DISTINCT vs GROUP BY
These queries produce the same result:
-- Using DISTINCT
SELECT DISTINCT category
FROM products;
-- Using GROUP BY
SELECT category
FROM products
GROUP BY category;
Use GROUP BY when you also need aggregates (COUNT, SUM, etc.). Use DISTINCT for simple deduplication.
Performance Note: COUNT DISTINCT can be slow on large datasets because it must track all unique values. Consider using approximate counting functions like HyperLogLog for massive datasets where exact counts aren't critical.
← Back to SQL Tips