At times I come across a situation where there’s duplicate data in a table. There can be many reasons for this, but unless it’s intentional, duplicate data is a problem.
In this case, the duplicates were caused (we believe) by an issue with GoldenGate replication in which some data was loaded twice. Because the table doesn’t have a unique index, there was no mechanism to prevent this.
Deleting duplicates can be tedious, but it can also be quite quick (depending on your table size). The idea behind this mechanism is that duplicates are identified and then ranked (first copy, second copy, and so on), and we use that ranking to decide what to delete. The first copy of each row gets rank 1, so deleting every row with a rank of 2 or more removes all the duplicates while keeping one copy of each.
It is essential that every column in the table appears in the PARTITION BY list; otherwise, rows that merely match on some of their columns would be treated as duplicates of each other.
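Before deleting anything, it’s worth seeing which rows are duplicated and how many copies of each exist. A simple GROUP BY across every column does this – note that ORDER_LINES is an assumed table name here, so substitute your own:

```sql
-- ORDER_LINES is a placeholder table name; replace it with the real one.
SELECT ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UNIT_PRICE, QUANTITY,
       DISPATCH_DATE, RETURN_DATE, GIFT_WRAP, CONDITION,
       SUPPLIER_ID, ESTIMATED_DELIVERY,
       COUNT(*) AS copies
FROM   ORDER_LINES
GROUP  BY ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UNIT_PRICE, QUANTITY,
          DISPATCH_DATE, RETURN_DATE, GIFT_WRAP, CONDITION,
          SUPPLIER_ID, ESTIMATED_DELIVERY
HAVING COUNT(*) > 1;
```

Each row this returns is a group of identical rows, with `copies` telling you how many times it appears.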
Here’s the code. ORDER_LINES is a placeholder table name – substitute your own.

DELETE FROM ORDER_LINES
WHERE rowid IN
(SELECT rid
 FROM (SELECT rowid rid,
              dense_rank() over (partition BY ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UNIT_PRICE, QUANTITY,
                                 DISPATCH_DATE, RETURN_DATE, GIFT_WRAP, CONDITION, SUPPLIER_ID, ESTIMATED_DELIVERY
                                 order by rowid) rownumber
       FROM ORDER_LINES)
 WHERE rownumber > 1);
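You can also work out in advance how many rows the DELETE should remove: each duplicated group contributes its copy count minus one. A sketch, again assuming the ORDER_LINES table name:

```sql
-- Surplus copies = rows the DELETE should remove (ORDER_LINES is a placeholder).
SELECT SUM(copies - 1) AS expected_deletes
FROM  (SELECT COUNT(*) AS copies
       FROM   ORDER_LINES
       GROUP  BY ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UNIT_PRICE, QUANTITY,
                 DISPATCH_DATE, RETURN_DATE, GIFT_WRAP, CONDITION,
                 SUPPLIER_ID, ESTIMATED_DELIVERY
       HAVING COUNT(*) > 1);
```

If the row count the DELETE reports matches this number, the ranking did what we expected.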
If the number of rows deleted looks right, issue a COMMIT and go back to what you were doing.