How to Count Duplicates in SQL (GROUP BY and Window Functions)

Q: How to Count Duplicates in SQL

The standard pattern for finding duplicates in SQL is GROUP BY ... HAVING COUNT(*) > 1 on the columns that should be unique. To tag each duplicate row (so you can delete or inspect them) you use a window function — usually ROW_NUMBER(). Both work in every major dialect.

Q: How to Count Duplicates by a Single Column

SELECT email, COUNT(*) AS duplicate_count FROM users GROUP BY email HAVING COUNT(*) > 1 ORDER BY duplicate_count DESC; -- email | duplicate_count -- alice@example.com | 4 -- bob@example.com | 2 HAVING filters after aggregation — that's the part that drops unique rows from the result.

Q: How to Count Duplicates Across Multiple Columns

List every column that defines "the same row" in the GROUP BY:

The standard pattern for finding duplicates in SQL is GROUP BY ... HAVING COUNT(*) > 1 on the columns that should be unique. To tag each duplicate row (so you can delete or inspect them) you use a window function — usually ROW_NUMBER(). Both work in every major dialect.

How to Count Duplicates by a Single Column

SELECT email, COUNT(*) AS duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;

-- email                | duplicate_count
-- [email protected]    | 4
-- [email protected]      | 2

HAVING filters after aggregation — that's the part that drops unique rows from the result.

How to Count Duplicates Across Multiple Columns

List every column that defines "the same row" in the GROUP BY:

SELECT first_name, last_name, birthdate, COUNT(*) AS dupe_count
FROM customers
GROUP BY first_name, last_name, birthdate
HAVING COUNT(*) > 1;

Total Number of Duplicate Rows

Two slightly different questions — answer carefully:

-- How many DUPLICATE GROUPS are there?
SELECT COUNT(*) AS duplicate_groups
FROM (
    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
) AS dupes;

-- How many EXTRA rows are there (i.e., rows you'd delete to make email unique)?
SELECT SUM(cnt - 1) AS extra_rows
FROM (
    SELECT email, COUNT(*) AS cnt
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
) AS dupes;

How to See the Actual Duplicate Rows (Not Just the Count)

Use ROW_NUMBER() to tag each row within its duplicate group. Rows where rn > 1 are the extras:

SELECT *
FROM (
    SELECT
        u.*,
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY created_at
        ) AS rn
    FROM users u
) ranked
WHERE rn > 1;

The ORDER BY in the window decides which copy is "rn = 1" (the keeper). Pick the column that makes sense — usually created_at ASC to keep the original.

Delete Duplicates and Keep One Copy

Same idea, wrapped in a DELETE. Always run the SELECT version first.

-- Postgres / SQL Server / modern MySQL syntax (with CTE):
WITH ranked AS (
    SELECT
        ctid,                          -- Postgres row id; use a PK in other dialects
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY created_at
        ) AS rn
    FROM users
)
DELETE FROM users
WHERE ctid IN (SELECT ctid FROM ranked WHERE rn > 1);

In SQL Server replace ctid with the primary key. In MySQL 8+ this same CTE syntax works.

Count Duplicates Without GROUP BY (Window Function)

Sometimes you want every row tagged with its group size, not collapsed. COUNT(*) OVER (PARTITION BY ...) does that:

SELECT
    email,
    created_at,
    COUNT(*) OVER (PARTITION BY email) AS rows_with_this_email
FROM users
ORDER BY rows_with_this_email DESC, email;

Filter to rows_with_this_email > 1 with an outer query if you only want duplicates.

The "Approximately Duplicate" Case

If duplicates differ by whitespace or case, normalize first:

SELECT
    LOWER(TRIM(email)) AS normalized_email,
    COUNT(*) AS dupe_count
FROM users
GROUP BY LOWER(TRIM(email))
HAVING COUNT(*) > 1;

Common Pitfalls

Using WHERE COUNT(*) > 1: illegal — WHERE runs before aggregation. Use HAVING.
Forgetting NULL semantics: NULL = NULL is unknown, but GROUP BY treats all NULLs as the same group. Two rows with NULL email will be flagged as duplicates of each other.
SELECT * with GROUP BY: not portable. List the columns you grouped on.
Deleting without a tiebreaker: if your ORDER BY in the window can produce ties, the "kept" row is non-deterministic. Add a unique column as a tiebreaker.

Pro Tip: Before any DELETE, run the same query as a SELECT wrapped in a transaction. BEGIN; DELETE ...; SELECT COUNT(*) FROM users; — verify the new count, then COMMIT or ROLLBACK. A duplicate-cleanup query that targets the wrong column has erased real data more than once.

← Back to SQL Tips

Related Tips

Written by Aaron Pitt · Useful Data Tips