Sunday 30 October 2011

SSIS architecture

SSIS is a component of SQL Server 2005/2008 and is the successor of DTS (Data Transformation Services) which formed part of SQL Server 7.0/2000. From an end-user perspective DTS and SSIS appear similar, however they are quite different. SSIS has been completely written from the scratch (it is a new enterprise ETL product) and overcomes several limitations of DTS. Though the list of differences between DTS and SSIS is quite large, one thing to note is the internal architecture of SSIS is completely different from DTS. It has segregated the Data Flow Engine from the Control Flow Engine or SSIS Runtime Engine; designed to achieve a high degree of parallelism and improve the overall performance (see the architecture image below).




 
The SSIS architecture consists of two main components as given below:

SSIS Runtime Engine – The SSIS runtime engine handles the control flow of a package. It saves the layout of packages, runs packages and provides support for logging, breakpoints, configuration, connections and transactions. The run-time engine is a parallel control flow engine that coordinates the execution of tasks or units of work within SSIS and manages the engine threads that carry out those tasks.

The SSIS runtime engine executes the tasks inside a package in an orderly fashion. When the runtime engine encounters a data flow task in a package during execution it creates a data flow pipeline and lets that data flow task run in the pipeline.

Note:
The Integration Services service (a windows service) is not the same as the SSISruntime engine/service. It is not required if only the design and execute Integration Services packages are wanted. This windows service can be started to manage SSIS packages, for example to connect to multiple SSIS servers, start/stop package remotely/locally, manage the package store, import/export packages etc.

SSIS Data Flow Engine/Pipeline – SSIS Data Flow Engine or Data Flow Pipeline or Transformation pipeline engine manages the flow of data from data sources, through transformations, and on to destination targets. When the Data Flow task executes, the SSIS data flow engine extracts data from one or more data sources, performs any necessary transformations on the extracted data and then delivers the data to one or more destinations.

The Data flow engine is buffer oriented architecture (more details will be discussed in a later section), it pulls data from the source and stores it in a buffer (memory structure) and does the transformation in buffer/memory itself instead of processing on a row-by-row basis. The benefit of this in-memory processing is that processing is much faster as there is no need to physically copy/stage the data at each step of the data integration; the data flow engine manipulates data as it is transferred from source to destination.

No comments:

Post a Comment

SSIS: Creating Package Configurations

This post discusses the creation of Configuration Files and how they can be useful while migrating a package from one environment to an...