Pages

Wednesday, 3 October 2012

Remove Duplicate Records Using SSIS


 Remove Duplicate Records Using SSIS Step by Step



It is very easy task to remove the duplicate rows from the table using SSIS package. To remove the duplicate records in SSIS there are multiple ways and I am using SORT transformation. Below I am showing you a simple example for removing the duplicates.

Go to the BIDS >> choose new project >> choose Integration Service Project >> give your package name and press OK button.
Now inside the Toolbox from Control flow item select the Data Flow Task and drag and drop into design area.
Rename the Data Flow Task.


Double click on Data Flow Task OR Right Click on Data Flow Task and choose edit.


Inside the Data Flow Task choose OLEDB Data Source and Drag and drop it into design area.
Rename it.
Double click on OLEDB Data Source.



 OLEDB Data Source Editor will open. Now click on NEW button.

 Again a new Editor will open here we again click on NEW button to configure the connection from the Database.



Here we create  connection to Database. For this I am using AdventureWorks Database So I use our local Server and Database AdventureWorks. After choosing the Database just check that everything is ok or not by clicking Test Connection button. If everything is ok then press OK button.


Click on Name of the Table or View and choose the table then click on column to select the column.


Now choose the columns which you want to display and click on OK.



Drag the SORT Transformation and double click on it for configure.


Here enable the check box i.e. at the bottom of the SORT  Transformation to remove the duplicate row and press OK.
Now you can choose any destination to load the data. I am using derived column to show you the result.
Here Data Viewer Shows the output rows.