How to Count Duplicates in SQL

⏱️ 35 sec read 🗄️ SQL

The standard pattern for finding duplicates in SQL is GROUP BY ... HAVING COUNT(*) > 1 on the columns that should be unique. To tag each duplicate row (so you can delete or inspect them) you use a window function — usually ROW_NUMBER(). Both work in every major dialect.

How to Count Duplicates by a Single Column

SELECT email, COUNT(*) AS duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;

-- email                | duplicate_count
-- [email protected]    | 4
-- [email protected]      | 2

HAVING filters after aggregation — that's the part that drops unique rows from the result.

How to Count Duplicates Across Multiple Columns

List every column that defines "the same row" in the GROUP BY:

SELECT first_name, last_name, birthdate, COUNT(*) AS dupe_count
FROM customers
GROUP BY first_name, last_name, birthdate
HAVING COUNT(*) > 1;

Total Number of Duplicate Rows

Two slightly different questions — answer carefully:

-- How many DUPLICATE GROUPS are there?
SELECT COUNT(*) AS duplicate_groups
FROM (
    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
) AS dupes;

-- How many EXTRA rows are there (i.e., rows you'd delete to make email unique)?
SELECT SUM(cnt - 1) AS extra_rows
FROM (
    SELECT email, COUNT(*) AS cnt
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
) AS dupes;

How to See the Actual Duplicate Rows (Not Just the Count)

Use ROW_NUMBER() to tag each row within its duplicate group. Rows where rn > 1 are the extras:

SELECT *
FROM (
    SELECT
        u.*,
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY created_at
        ) AS rn
    FROM users u
) ranked
WHERE rn > 1;

The ORDER BY in the window decides which copy is "rn = 1" (the keeper). Pick the column that makes sense — usually created_at ASC to keep the original.

Delete Duplicates and Keep One Copy

Same idea, wrapped in a DELETE. Always run the SELECT version first.

-- Postgres / SQL Server / modern MySQL syntax (with CTE):
WITH ranked AS (
    SELECT
        ctid,                          -- Postgres row id; use a PK in other dialects
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY created_at
        ) AS rn
    FROM users
)
DELETE FROM users
WHERE ctid IN (SELECT ctid FROM ranked WHERE rn > 1);

In SQL Server replace ctid with the primary key. In MySQL 8+ this same CTE syntax works.

Count Duplicates Without GROUP BY (Window Function)

Sometimes you want every row tagged with its group size, not collapsed. COUNT(*) OVER (PARTITION BY ...) does that:

SELECT
    email,
    created_at,
    COUNT(*) OVER (PARTITION BY email) AS rows_with_this_email
FROM users
ORDER BY rows_with_this_email DESC, email;

Filter to rows_with_this_email > 1 with an outer query if you only want duplicates.

The "Approximately Duplicate" Case

If duplicates differ by whitespace or case, normalize first:

SELECT
    LOWER(TRIM(email)) AS normalized_email,
    COUNT(*) AS dupe_count
FROM users
GROUP BY LOWER(TRIM(email))
HAVING COUNT(*) > 1;

Common Pitfalls

Pro Tip: Before any DELETE, run the same query as a SELECT wrapped in a transaction. BEGIN; DELETE ...; SELECT COUNT(*) FROM users; — verify the new count, then COMMIT or ROLLBACK. A duplicate-cleanup query that targets the wrong column has erased real data more than once.

← Back to SQL Tips