Monday, July 23, 2007

Delete single row from duplicate rows in SQL Server 2005 and 2000

        Lets assume that you are using SQL Server 2005 for your current project. You found that you have few rows which have duplicate data in all the columns. Lets consider that you have table name "Example" which has two columns ID and Name.

CREATE TABLE [dbo].[Example]
(
[ID] [int] NOT NULL,
[Name] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
)
ON [PRIMARY]

Fig - (1) Create Statement for Table     

INSERT INTO [Example] ([ID],[Name]) VALUES (1,Chirag)
INSERT INTO [Example] ([ID],[Name]) VALUES (1,Chirag)
INSERT INTO [Example] ([ID],[Name]) VALUES (2,'Shailesh')
INSERT INTO [Example] ([ID],[Name]) VALUES (3,'Dipak')
INSERT INTO [Example] ([ID],[Name]) VALUES (4,'Mihir')
INSERT INTO [Example] ([ID],[Name]) VALUES (4,'Mihir')
INSERT INTO [Example] ([ID],[Name]) VALUES (4,'Piyush')

Fig - (2) Insert Script.

          You can see that first two and last three rows are duplicate rows. All the values in these rows are same.  Here is the insert script, if you want to do this practically in your local database.

       Now you want to delete duplicate rows in such a way that only one row will be exist after delete statement. First let me write the query which will give return all the duplicate rows from table.

SELECT
        [ID],[NAME],COUNT([ID])
FROM
        [Example]
GROUP BY
        [ID],[NAME]
HAVING
        COUNT([ID]) > 1

Fig - (3) Query to identify duplicate rows in table.

       Here I have used COUNT([ID]) in select statement as ID is not null filed. You can use any column which is not NULL. If all the columns in your table allows NULL value than you can use COUNT(*). The Difference between COUNT(Column Name) and COUNT(*) is, if your column allows null value and in table you have 5 records with 2 null values in ColumnA. If you use COUNT(ColumnA) it will returns 3 and if you use COUNT(*) it will returns 5. So COUNT(Column Name) ignores NULL value. Lets get back to our query. I have used all the column in SELECT and GROUP BY clause. You also have to write all the columns of your table in SELECT and GROUP BY clause. This way you can identify all the duplicates row from table.

        Lets assume that you have to delete the row which has value (1, 'Chirag') so that only one row remains. Here is the query, (Note: This will work only in SQL Sever 2005)

      DELETE TOP(1) FROM [Example] WHERE [ID] = 1

Fig - (3) Delete single row from duplicate rows.

          Here I have used TOP(1) , If you have n rows which has all the values same than you have to use TOP(n-1) so that only 1 row will be remain after delete statement. To delete all the duplicate rows you need to write a cursor as shown below,

DECLARE @ID int
DECLARE @NAME NVARCHAR(50)
DECLARE @COUNT int

DECLARE CUR_DELETE CURSOR FOR
SELECT [ID],[NAME],COUNT([ID]) FROM [Example] GROUP BY [ID],[NAME] HAVING COUNT([ID]) > 1

OPEN CUR_DELETE

FETCH NEXT FROM CUR_DELETE INTO @ID,@NAME,@COUNT
/* Loop through cursor for remaining ID */
WHILE @@FETCH_STATUS = 0
BEGIN

DELETE TOP(@COUNT -1) FROM [Example] WHERE ID = @ID

FETCH NEXT FROM CUR_DELETE INTO @ID,@NAME,@COUNT
END

CLOSE CUR_DELETE
DEALLOCATE CUR_DELETE

Fig - (4) Cursor to delete all duplicate  records

         This is all about deleting duplicate rows in SQL Server 2005.

       Now to do the same in SQL server 2000.  There is function called ROWCOUNT in SQL.  I have used same [Example] table. You can do this by,

SET ROWCOUNT 1
DELETE FROM [Example] WHERE [ID] = 1

Fig - (5) Delete duplicate row in SQL Server 2000

      ROWCOUNT function specify that how many rows will be affected by the statement which is immediately written below.  Here also you have to write  ROWCOUNT (n -1) to delete n duplicate rows such that only  1 row will remain in database.

Happy Programming !!

4 comments:

Anonymous said...

dsfsd sfdf dfsdf sdf sdfsd fsdf sdf sdf dsf sdf saadgdfgdfgdfg

Anonymous said...

Thanks,very useful to me right now,
would u like to contact me?
my MSN: henryliuus@hotmail.com
Glad to be friends with u

Yogesh said...

The below article showed me a simple way
Delete Duplicate Rows in Sql Server

Anonymous said...

actually you said only to delete one duplicate row for example:if we have 10 duplicate rows should delete only one row and should show 9 rows and if we have 5 should delete one row and should show 4 rows.