Friday, April 17, 2015

Searching and Removing Duplicate values using SQL Server

Duplicate values is always a nuisance for any database. Duplicate values can sometimes affect your analysis and result in wrong results. We can follow below steps to search and remove duplicate rows in database.

I am using SQL Server to search and remove duplicate values.

The result of query select * from Student1 is as shown below
 











In the table shown above, row 4 is similar to row 1. Both contain same SName and SCity. They both have different StdNo.

Searching Duplicate values:

Let us see how many values are duplicate in the table:

select Sname, SCity,count(*) as count from  Student1 group by Sname, Scity;












We can see that Sname: Joe and SCity: Portland appears twice.

To see only duplicate values, we can use below code:

select Sname, SCity,count(*) as count from  Student1 group by Sname, Scity having count(*)  > 1;









Removing Duplicate Values:

To remove duplicate values you will need atleast one column which is unique. In our case it was StdNo. In case you do not have any of such columns, you can use row number as a unique column as shown below:

select ROW_NUMBER() over (order by stdno) as Rownum,stdno,SName, SCity from Student1;














Now, you have one column which has unique values. Now you can proceed which removal of duplicate rows as shown below:

delete from Student1 where stdno not in (select min(stdno) from student1 group by Sname, Scity)

or you can add column rownum to your table and then remove duplicate values as shown below:

alter table Student1
add rownum int identity(1,1)

The table will have an extra column rownum.
















Now,  use the delete statement

delete from Student1 where rownum not in (select min(rownum) from student1 group by Sname, Scity)

Finally the duplicate data gets removed as shown below:













Now, you can drop the extra rownum column created as shown below :

Alter table Student1
drop column rownum

So, the resultant table is as shown below:











References

Add a column to existing table and uniquely number them. (2009, n.p. n.p.). Retrieved from http://stackoverflow.com: http://stackoverflow.com/questions/108211/add-a-column-to-existing-table-and-uniquely-number-them
Chauhan, S. (2012, Aug 24). Remove duplicate records from a table in SQL Server. Retrieved from http://www.dotnet-tricks.com: http://www.dotnet-tricks.com/Tutorial/sqlserver/IL3S290812-Remove-duplicate-records-from-a-table-in-SQL-Server.html
Microsoft . (n.p., n.p. n.p.). ROW_NUMBER (Transact-SQL). Retrieved from msdn.microsoft.com: https://msdn.microsoft.com/en-us/library/ms186734.aspx







No comments:

Post a Comment