How to Count Duplicates in SQL
The standard pattern for finding duplicates in SQL is GROUP BY ... HAVING COUNT(*) > 1 on the columns that should be unique. To tag each duplicate row (so you can delete or inspect them) you use a window function — usually ROW_NUMBER(). Both work in every major dialect.
How to Count Duplicates by a Single Column
SELECT email, COUNT(*) AS duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;
-- email | duplicate_count
-- [email protected] | 4
-- [email protected] | 2
HAVING filters after aggregation — that's the part that drops unique rows from the result.
How to Count Duplicates Across Multiple Columns
List every column that defines "the same row" in the GROUP BY:
SELECT first_name, last_name, birthdate, COUNT(*) AS dupe_count
FROM customers
GROUP BY first_name, last_name, birthdate
HAVING COUNT(*) > 1;
Total Number of Duplicate Rows
Two slightly different questions — answer carefully:
-- How many DUPLICATE GROUPS are there?
SELECT COUNT(*) AS duplicate_groups
FROM (
SELECT email
FROM users
GROUP BY email
HAVING COUNT(*) > 1
) AS dupes;
-- How many EXTRA rows are there (i.e., rows you'd delete to make email unique)?
SELECT SUM(cnt - 1) AS extra_rows
FROM (
SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1
) AS dupes;
How to See the Actual Duplicate Rows (Not Just the Count)
Use ROW_NUMBER() to tag each row within its duplicate group. Rows where rn > 1 are the extras:
SELECT *
FROM (
SELECT
u.*,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at
) AS rn
FROM users u
) ranked
WHERE rn > 1;
The ORDER BY in the window decides which copy is "rn = 1" (the keeper). Pick the column that makes sense — usually created_at ASC to keep the original.
Delete Duplicates and Keep One Copy
Same idea, wrapped in a DELETE. Always run the SELECT version first.
-- Postgres / SQL Server / modern MySQL syntax (with CTE):
WITH ranked AS (
SELECT
ctid, -- Postgres row id; use a PK in other dialects
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at
) AS rn
FROM users
)
DELETE FROM users
WHERE ctid IN (SELECT ctid FROM ranked WHERE rn > 1);
In SQL Server replace ctid with the primary key. In MySQL 8+ this same CTE syntax works.
Count Duplicates Without GROUP BY (Window Function)
Sometimes you want every row tagged with its group size, not collapsed. COUNT(*) OVER (PARTITION BY ...) does that:
SELECT
email,
created_at,
COUNT(*) OVER (PARTITION BY email) AS rows_with_this_email
FROM users
ORDER BY rows_with_this_email DESC, email;
Filter to rows_with_this_email > 1 with an outer query if you only want duplicates.
The "Approximately Duplicate" Case
If duplicates differ by whitespace or case, normalize first:
SELECT
LOWER(TRIM(email)) AS normalized_email,
COUNT(*) AS dupe_count
FROM users
GROUP BY LOWER(TRIM(email))
HAVING COUNT(*) > 1;
Common Pitfalls
- Using
WHERE COUNT(*) > 1: illegal —WHEREruns before aggregation. UseHAVING. - Forgetting NULL semantics:
NULL = NULLis unknown, butGROUP BYtreats all NULLs as the same group. Two rows with NULL email will be flagged as duplicates of each other. SELECT *withGROUP BY: not portable. List the columns you grouped on.- Deleting without a tiebreaker: if your
ORDER BYin the window can produce ties, the "kept" row is non-deterministic. Add a unique column as a tiebreaker.
Pro Tip: Before any DELETE, run the same query as a SELECT wrapped in a transaction. BEGIN; DELETE ...; SELECT COUNT(*) FROM users; — verify the new count, then COMMIT or ROLLBACK. A duplicate-cleanup query that targets the wrong column has erased real data more than once.