Monday 24 October 2011

Incremental Loading in SSIS

Incremental Loads in SSIS 
Let's take a look at how you can accomplish this in SSIS using the Lookup Transformation (for the join functionality) combined with the Conditional Split (for the WHERE clause conditions) transformations.
Before we begin, let's reset our database tables to their original state using the following query:
USE SSISIncrementalLoad_SourceGOTRUNCATE TABLE dbo.tblSource-- insert an "unchanged" rowINSERT INTO dbo.tblSource(ColID,ColA,ColB,ColC)VALUES(0, 'A', '1/1/2007 12:01 AM', -1)-- insert a "changed" row

Once the project loads, open Solution Explorer and rename Package1.dtsx to SSISIncrementalLoad.dtsx:

When prompted to rename the package object, click the Yes button. From the toolbox, drag a Data Flow onto the Control Flow canvas:
 
Double-click the Data Flow task to edit it. From the toolbox, drag and drop an OLE DB Source onto the Data Flow canvas:
 
Double-click the OLE DB Source connection adapter to edit it:

Click the New button beside the OLE DB Connection Manager dropdown:

Click the New button here to create a new Data Connection:

Enter or select your server name. Connect to the SSISIncrementalLoad_Source database you created earlier. Click the OK button to return to the Connection Manager configuration dialog. Click the OK button to accept your newly created Data Connection as the Connection Manager you wish to define. Select "dbo.tblSource" from the Table dropdown:

Click the OK button to complete defining the OLE DB Source Adapter.
Drag and drop a Lookup Transformation from the toolbox onto the Data Flow canvas. Connect the OLE DB connection adapter to the Lookup transformation by clicking on the OLE DB Source and dragging the green arrow over the Lookup and dropping it. Right-click the Lookup transformation and click Edit (or double-click the Lookup transformation) to edit:

When the editor opens, click the New button beside the OLE DB Connection Manager dropdown (as you did earlier for the OLE DB Source Adapter). Define a new Data Connection - this time to the SSISIncrementalLoad_Dest database. After setting up the new Data Connection and Connection Manager, configure the Lookup transformation to connect to "dbo.tblDest":

Click the Columns tab. On the left side are the columns currently in the SSIS data flow pipeline (from SSISIncrementalLoad_Source.dbo.tblSource). On the right side are columns available from the Lookup destination you just configured (from SSISIncrementalLoad_Dest.dbo.tblDest). Follow the following steps:
1. We'll need all the rows returned from the destination table, so check all the checkboxes beside the rows in the destination. We need these rows for our WHERE clauses and for our JOIN ON clauses.
2. We do not want to map all the rows between the source and destination - we only want to map the columns named ColID between the database tables. The Mappings drawn between the Available Input Columns and Available Lookup Columns define the JOIN ON clause. Multi-select the Mappings between ColA, ColB, and ColC by clicking on them while holding the Ctrl key. Right-click any of them and click "Delete Selected Mappings" to delete these columns from our JOIN ON clause.
3. Add the text "Dest_" to each column's Output Alias. These rows are being appended to the data flow pipeline. This is so we can distinguish between Source and Destination rows farther down the pipeline:

Next we need to modify our Lookup transformation behavior. By default, the Lookup operates as an INNER JOIN - but we need a LEFT (OUTER) JOIN. Click the "Configure Error Output" button to open the "Configure Error Output" screen. On the "Lookup Output" row, change the Error column from "Fail component" to "Ignore failure". This tells the Lookup transformation "If you don't find an INNER JOIN match in the destination table for the Source table's ColID value, don't fail." - which also effectively tells the Lookup "Don't act like an INNER JOIN, behave like a LEFT JOIN":

Click OK to complete the Lookup transformation configuration.
From the toolbox, drag and drop a Conditional Split Transformation onto the Data Flow canvas. Connect the Lookup to the Conditional Split as shown. Right-click the Conditional Split and click Edit to open the Conditional Split Editor:

Expand the NULL Functions folder in the upper right of the Conditional Split Transformation Editor. Expand the Columns folder in the upper left side of the Conditional Split Transformation Editor. Click in the "Output Name" column and enter "New Rows" as the name of the first output. From the NULL Functions folder, drag and drop the "ISNULL( <<expression>> )" function to the Condition column of the New Rows condition:

Next, drag Dest_ColID from the columns folder and drop it onto the "<<expression>>" text in the Condition column. "New Rows" should now be defined by the condition "ISNULL( [Dest_ColID] )". This defines the WHERE clause for new rows - setting it to "WHERE Dest_ColID Is NULL".
Type "Changed Rows" into a second Output Name column. Add the expression "(ColA != Dest_ColA) || (ColB != Dest_ColB) || (ColC != Dest_ColC)" to the Condition column for the Changed Rows output. This defines our WHERE clause for detecting changed rows - setting it to "WHERE ((Dest_ColA != ColA) OR (Dest_ColB != ColB) OR (Dest_ColC != ColC))". Note "||" is used to convey "OR" in SSIS Expressions:

Change the "Default output name" from "Conditional Split Default Output" to "Unchanged Rows":

Click the OK button to complete configuration of the Conditional Split transformation.
Drag and drop an OLE DB Destination connection adapter and an OLE DB Command transformation onto the Data Flow canvas. Click on the Conditional Split and connect it to the OLE DB Destination. A dialog will display prompting you to select a Conditional Split Output (those outputs you defined in the last step). Select the New Rows output:

Next connect the OLE DB Command transformation to the Conditional Split's "Changed Rows" output:

 Your Data Flow canvas should appear similar to the following:

Configure the OLE DB Destination by aiming at the SSISIncrementalLoad_Dest.dbo.tblDest table:

Click the Mappings item in the list to the left. Make sure the ColID, ColA, ColB, and ColC source columns are mapped to their matching destination columns (aren't you glad we prepended "Dest_" to the destination columns?):
 
Click the OK button to complete configuring the OLE DB Destination connection adapter.
Double-click the OLE DB Command to open the "Advanced Editor for OLE DB Command" dialog. Set the Connection Manager column to your SSISIncrementalLoad_Dest connection manager:
 
Click on the "Component Properties" tab. Click the elipsis (button with "...") beside the SQLCommand property:

 The String Value Editor displays. Enter the following parameterized T-SQL statement into the String Value textbox:
UPDATE dbo.tblDest
SET
ColA = ?
,ColB = ?
,ColC = ?
WHERE ColID = ?

 The question marks in the previous parameterized T-SQL statement map by ordinal to columns named "Param_0" through "Param_3". Map them as shown below - effectively altering the UPDATE statement for each row to read:
UPDATE SSISIncrementalLoad_Dest.dbo.tblDest
SET
ColA = SSISIncrementalLoad_Source.dbo.ColA
,ColB = SSISIncrementalLoad_Source.dbo.ColB
,ColC = SSISIncrementalLoad_Source.dbo.ColC
WHERE ColID = SSISIncrementalLoad_Source.dbo.ColID
Note the query is executed on a row-by-row basis. For performance with large amounts of data, you will want to employ set-based updates instead.

 Click the OK button when mapping is completed.
Your Data Flow canvas should look like that pictured below:

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

Note one row takes the "New Rows" output from the Conditional Split, and one row takes the "Changed Rows" output from the Conditional Split transformation. Although not visible, our third source row doesn't change, and would be sent to the "Unchanged Rows" output - which is simply the default Conditional Split output renamed. Any row that doesn't meet any of the predefined conditions in the Conditional Split is sent to the default output.INSERT INTO dbo.tblSource(ColID,ColA,ColB,ColC)VALUES(1, 'B', '1/1/2007 12:02 AM', -2)-- insert a "new" rowINSERT INTO dbo.tblSource(ColID,ColA,ColB,ColC)VALUES(2, 'N', '1/1/2007 12:03 AM', -3)USE SSISIncrementalLoad_DestGOTRUNCATE TABLE dbo.tblDest-- insert an "unchanged" rowINSERT INTO dbo.tblDest(ColID,ColA,ColB,ColC)VALUES(0, 'A', '1/1/2007 12:01 AM', -1)-- insert a "changed" rowINSERT INTO dbo.tblDest(ColID,ColA,ColB,ColC)VALUES(1, 'C', '1/1/2007 12:02 AM', -2)Next, create a new project using Business Intelligence Development Studio (BIDS). Name the project SSISIncrementalLoad:

No comments:

Post a Comment

SSIS: Creating Package Configurations

This post discusses the creation of Configuration Files and how they can be useful while migrating a package from one environment to an...