Find Duplicate Records in Table SQL

Find Duplicate Records in Table SQL

Added by Nasir Mahmood updated on Friday, November 17, 2017

Problem:

Having duplicate records in a table is not a problem, but finding duplicate records using SQL query is a problem which we face a lot of time while working with sql databases.

In this tutorial we will discuss different methods to get duplicate records in table. For this tutorial we are creating a Test table having columns ID, Name and Email by following SQL script

CREATE TABLE Test
(
    [ID] [int] NOT NULL,
    [Name] [varchar](30) NULL,
    [Email] [varchar](30) NULL
)
				

Now add some dummy data for testing purpose as shown below.

INSERT [dbo].[Test] ([ID], [Name], [Email]) VALUES (1, N'Name 1', N'e1@a.com')
INSERT [dbo].[Test] ([ID], [Name], [Email]) VALUES (2, N'Name 2', N'e1@a.com')
INSERT [dbo].[Test] ([ID], [Name], [Email]) VALUES (3, N'Name 1', N'e1@a.com')
INSERT [dbo].[Test] ([ID], [Name], [Email]) VALUES (4, N'Name 4', N'e2@a.com')
INSERT [dbo].[Test] ([ID], [Name], [Email]) VALUES (5, N'Name 5', N'e2@a.com')
INSERT [dbo].[Test] ([ID], [Name], [Email]) VALUES (6, N'Name 2', N'e2@a.com')
INSERT [dbo].[Test] ([ID], [Name], [Email]) VALUES (7, N'Name 5', N'e3@a.com')				
				

After executing command

SELECT * FROM TEST
					

we get result as shown

We can see from result that we have some duplicate records in table, now we want to isolate these duplicate records.

select all from table

GROUP BY CLAUSE:

The GROUP BY clause will gather all of the rows together that contain data in the specified column(s) and will allow aggregate functions to be performed on the one or more columns. One row is returned for each group.

Now by using GROUP BY clause we can find out how many records are duplicated in Name column.

SELECT NAME,COUNT(NAME) AS C0UNT FROM TEST GROUP BY NAME				
				

group by name

Similarly by executing following SQL command we can find how many duplicate records in Email column.

group by email

HAVING CLAUSE:

Specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement. HAVING is typically used in a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.

The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.

By executing following SQL queries we can find duplicate records in Name column as shown

duplicate name records

By executing following SQL queries we can find duplicate records in Email column as shown

duplicate email records

Find Duplicate Records Having more than one column:

Now what if we want to find duplicate records with both Name and Email columns, by using following SQL query we can find our required result.

SELECT NAME,EMAIL,COUNT(NAME) AS C0UNT FROM TEST
GROUP BY NAME,EMAIL HAVING COUNT(NAME) > 1				
				

duplicate name email records

By Using Inner Join:

Another method to get duplicate records in Name and Email columns is using inner join as following SQL query

SELECT T1.* FROM TEST T1 INNER JOIN TEST AS T2
 ON T1.NAME=T2.NAME AND T1.EMAIL=T2.EMAIL 
 WHERE T1.ID <> T2.ID ORDER BY T1.NAME				
				

duplicate records using inner join

By Using Ranking Functions:

Ranking functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive the same value as other rows. Ranking functions are nondeterministic.

Another method to get duplicate records in Name and Email columns is using ranking functions as following SQL query

WITH CTE AS
(
 SELECT *,C0UNT=ROW_NUMBER() 
  OVER (PARTITION BY NAME,EMAIL ORDER BY NAME) FROM TEST 
)
SELECT * FROM CTE WHERE C0UNT > 1				
				

duplicate records using partition

Related Tags

About

29 Tutorials
25 Snippets
6 Products

More

Contact Us

Contact us

Stay Connected