Duplicate values is always a nuisance for any database. Duplicate values can sometimes affect your analysis and result in wrong results. We can follow below steps to search and remove duplicate rows in database.
I am using SQL Server to search and remove duplicate values.
The result of query select * from Student1 is as shown below
In the table shown above, row 4 is similar to row 1. Both contain same SName and SCity. They both have different StdNo.
Searching Duplicate values:
Let us see how many values are duplicate in the table:
select Sname, SCity,count(*) as count from Student1 group by Sname, Scity;
We can see that Sname: Joe and SCity: Portland appears twice.
To see only duplicate values, we can use below code:
select Sname, SCity,count(*) as count from Student1 group by Sname, Scity having count(*) > 1;
Removing Duplicate Values:
To remove duplicate values you will need atleast one column which is unique. In our case it was StdNo. In case you do not have any of such columns, you can use row number as a unique column as shown below:
select ROW_NUMBER() over (order by stdno) as Rownum,stdno,SName, SCity from Student1;
Now, you have one column which has unique values. Now you can proceed which removal of duplicate rows as shown below:
delete from Student1 where stdno not in (select min(stdno) from student1 group by Sname, Scity)
or you can add column rownum to your table and then remove duplicate values as shown below:
alter table Student1
add rownum int identity(1,1)
The table will have an extra column rownum.
Now, use the delete statement
delete from Student1 where rownum not in (select min(rownum) from student1 group by Sname, Scity)
Finally the duplicate data gets removed as shown below:
Now, you can drop the extra rownum column created as shown below :
Alter table Student1
drop column rownum
So, the resultant table is as shown below:
I am using SQL Server to search and remove duplicate values.
The result of query select * from Student1 is as shown below
In the table shown above, row 4 is similar to row 1. Both contain same SName and SCity. They both have different StdNo.
Searching Duplicate values:
Let us see how many values are duplicate in the table:
select Sname, SCity,count(*) as count from Student1 group by Sname, Scity;
We can see that Sname: Joe and SCity: Portland appears twice.
To see only duplicate values, we can use below code:
select Sname, SCity,count(*) as count from Student1 group by Sname, Scity having count(*) > 1;
Removing Duplicate Values:
To remove duplicate values you will need atleast one column which is unique. In our case it was StdNo. In case you do not have any of such columns, you can use row number as a unique column as shown below:
select ROW_NUMBER() over (order by stdno) as Rownum,stdno,SName, SCity from Student1;
Now, you have one column which has unique values. Now you can proceed which removal of duplicate rows as shown below:
delete from Student1 where stdno not in (select min(stdno) from student1 group by Sname, Scity)
or you can add column rownum to your table and then remove duplicate values as shown below:
alter table Student1
add rownum int identity(1,1)
The table will have an extra column rownum.
Now, use the delete statement
delete from Student1 where rownum not in (select min(rownum) from student1 group by Sname, Scity)
Finally the duplicate data gets removed as shown below:
Now, you can drop the extra rownum column created as shown below :
Alter table Student1
drop column rownum
So, the resultant table is as shown below:
References
Add a column to existing table and uniquely number
them. (2009, n.p. n.p.). Retrieved
from http://stackoverflow.com:
http://stackoverflow.com/questions/108211/add-a-column-to-existing-table-and-uniquely-number-them
Chauhan, S. (2012, Aug 24). Remove duplicate
records from a table in SQL Server. Retrieved from
http://www.dotnet-tricks.com:
http://www.dotnet-tricks.com/Tutorial/sqlserver/IL3S290812-Remove-duplicate-records-from-a-table-in-SQL-Server.html
Microsoft . (n.p., n.p. n.p.). ROW_NUMBER (Transact-SQL).
Retrieved from msdn.microsoft.com:
https://msdn.microsoft.com/en-us/library/ms186734.aspx
No comments:
Post a Comment