Friday, August 3, 2007

Delete Duplicate Rows using single query

Before I am going to discuss how to delete duplicate rows using single query on SQL Server 2005, I am going to explain what is CTE (Common Table Partitioning) and how to use Paritioning table.

CTE is Common Table Expressions (CTEs).

CTE is a "temporary result set" that exists only within the scope of a single SQL statement.It allows access to functionality within that single SQL statement that was previously only available through use of functions, temp tables, cursors, and so on.

Let's take a Example of CTE

WITH CTE(Name) AS (SELECT Name = 'DIPAK') SELECT [Name] FROM CTE

when you run above query on query anlyzer you can see the output as

Name
Dipak

Another Example of CTE is

USE AdventureWorks;
WITH OrderedOrders
AS( SELECT SalesOrderID, OrderDate, ROW_NUMBER() OVER (ORDER BY OrderDate) AS 'RowNumber' FROM Sales.SalesOrderHeader )
SELECT * FROM OrderedOrders WHERE RowNumber BETWEEN 50 AND 60;


Now Using CTE and partitioning table I am doing to delete duplicate rows on SQL Server 2005 with single query

Let's create the sample table

CREATE TABLE [dbo].temp(ID int NOT NULL,FirstName varchar(50),LastName varchar(50)) ON [PRIMARY]

Insert into temp values(1,'Dipak','Patel')
Insert into temp values(1,'Dipak','Patel')
Insert into temp values(2,'Chirag','Darji')
Insert into temp values(2,'Chirag','Darji')
Insert into temp values(3,'SK','Patel')


Now when you run select * from temp on query analyzer you could see below result with some duplicare records

ID FirstName LastName
1 Dipak Patel
1 Dipak Patel
2 Chirag Darji
2 Chirag Darji
3 SK Patel

Now when run below query

select ID,FirstName,Lastname, row_number() over (partition by ID,FirstName,LastName Order by ID) as Number from temp

you could see below result

ID FirstName LastName Number
1 Dipak Patel 1
1 Dipak Patel 2
2 Chirag Darji 1
2 Chirag Darji 2
3 SK Patel 1

Now here what happenes with partition by exactly, with the use of Partition by it Divides the result set into partitions with same records, Now we can easily remove the records using CTE,

WITH T1 AS (Select Id,FirstName,LastName, ROW_NUMBER() OVER (PARTITION BY Id,FirstName,LastName Order By Id) AS NUMBER From TEMP)Delete From T1 Where Number >1

For More Information about RowNumber and Partition By you can refer below URL

http://msdn2.microsoft.com/en-us/library/ms189461.aspx
http://msdn2.microsoft.com/en-us/library/ms186734.aspx


Thanks
Dipak Patel

4 comments:

Mihir said...

Thank you Mr. Dipak Patel... after searching numerous sites and forums, u finally gave the solution I was searching for.

Thanks once again

Shailesh Patel said...

we found this article very useful... Looking for more useful post....

Unknown said...

Thanks...It was very Useful For Me

jayachandra said...

Thank you deepak. It is very useful for developers.

--Jayachandra.