Lets assume that you are using SQL Server 2005 for your current project. You found that you have few rows which have duplicate data in all the columns. Lets consider that you have table name "Example" which has two columns ID and Name.
CREATE TABLE [dbo].[Example]
(
[ID] [int] NOT NULL,
[Name] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
)
ON [PRIMARY]
Fig - (1) Create Statement for Table
INSERT INTO [Example] ([ID],[Name]) VALUES (1,Chirag)
INSERT INTO [Example] ([ID],[Name]) VALUES (1,Chirag)
INSERT INTO [Example] ([ID],[Name]) VALUES (2,'Shailesh')
INSERT INTO [Example] ([ID],[Name]) VALUES (3,'Dipak')
INSERT INTO [Example] ([ID],[Name]) VALUES (4,'Mihir')
INSERT INTO [Example] ([ID],[Name]) VALUES (4,'Mihir')
INSERT INTO [Example] ([ID],[Name]) VALUES (4,'Piyush')
Fig - (2) Insert Script.
You can see that first two and last three rows are duplicate rows. All the values in these rows are same. Here is the insert script, if you want to do this practically in your local database.
Now you want to delete duplicate rows in such a way that only one row will be exist after delete statement. First let me write the query which will give return all the duplicate rows from table.
SELECT
[ID],[NAME],COUNT([ID])
FROM
[Example]
GROUP BY
[ID],[NAME]
HAVING
COUNT([ID]) > 1
Fig - (3) Query to identify duplicate rows in table.
Here I have used COUNT([ID]) in select statement as ID is not null filed. You can use any column which is not NULL. If all the columns in your table allows NULL value than you can use COUNT(*). The Difference between COUNT(Column Name) and COUNT(*) is, if your column allows null value and in table you have 5 records with 2 null values in ColumnA. If you use COUNT(ColumnA) it will returns 3 and if you use COUNT(*) it will returns 5. So COUNT(Column Name) ignores NULL value. Lets get back to our query. I have used all the column in SELECT and GROUP BY clause. You also have to write all the columns of your table in SELECT and GROUP BY clause. This way you can identify all the duplicates row from table.
Lets assume that you have to delete the row which has value (1, 'Chirag') so that only one row remains. Here is the query, (Note: This will work only in SQL Sever 2005)
DELETE TOP(1) FROM [Example] WHERE [ID] = 1
Fig - (3) Delete single row from duplicate rows.
Here I have used TOP(1) , If you have n rows which has all the values same than you have to use TOP(n-1) so that only 1 row will be remain after delete statement. To delete all the duplicate rows you need to write a cursor as shown below,
DECLARE @ID int
DECLARE @NAME NVARCHAR(50)
DECLARE @COUNT int
DECLARE CUR_DELETE CURSOR FOR
SELECT [ID],[NAME],COUNT([ID]) FROM [Example] GROUP BY [ID],[NAME] HAVING COUNT([ID]) > 1
OPEN CUR_DELETE
FETCH NEXT FROM CUR_DELETE INTO @ID,@NAME,@COUNT
/* Loop through cursor for remaining ID */
WHILE @@FETCH_STATUS = 0
BEGIN
DELETE TOP(@COUNT -1) FROM [Example] WHERE ID = @ID
FETCH NEXT FROM CUR_DELETE INTO @ID,@NAME,@COUNT
END
CLOSE CUR_DELETE
DEALLOCATE CUR_DELETE
Fig - (4) Cursor to delete all duplicate records
This is all about deleting duplicate rows in SQL Server 2005.
Now to do the same in SQL server 2000. There is function called ROWCOUNT in SQL. I have used same [Example] table. You can do this by,
SET ROWCOUNT 1
DELETE FROM [Example] WHERE [ID] = 1
Fig - (5) Delete duplicate row in SQL Server 2000
ROWCOUNT function specify that how many rows will be affected by the statement which is immediately written below. Here also you have to write ROWCOUNT (n -1) to delete n duplicate rows such that only 1 row will remain in database.
Happy Programming !!