Here are the 10 SSIS best practices that would be good to
follow during any SSIS package development
§ The most desired feature in SSIS packages development is
re-usability. In other ways, we can call them as standard packages that can be
re-used during different ETL component development. In SSIS, this can be easily
achieved using template features. SSIS template packages are the re-usable
packages that one can use in any SSIS project at any number of times.
To know more about how to configure this, please see http://support.microsoft.com/kb/908018
§ Avoid using dot (.) naming convention for your package
names. Dot (.) naming convention sometime confuses with the SQL Server object
naming convention and hence should be avoided. Good approach would be to use
underscore (_) instead of using dot. Also make sure that package names should
not exceed 100 characters. During package deployment in SQLServer type mode, it
is noticed that any character over 100 are automatically removed from package
name. This might result your SSIS package failure during runtime, especially
when you are using ‘Execute Package Tasks’ in your package.
§ The flow of data from upstream to downstream in a
package is a memory intensive task, at most of the steps and component level we
have to carefully check and make sure that any unnecessary columns are not
passed to downstream. This helps in avoiding extra execution time overhead of
package and in turn improves overall performance of package execution.
§ While configuring any OLEDB connection manager as a
source, avoid using ‘Table or view’ as data access mode, this is
similar to ‘SELECT * FROM <TABLE_NAME>,
and as most of us know, SELECT * is our enemy, it takes all the columns in
account including those which are not even required. Always try to use ‘SQL command’ data access mode and only include
required column names in your SELECT T-SQL statement. In this way you can block
passing unnecessary columns to downstream.
§ In your Data Flow Tasks, use Flat File connection
manager very carefully, creating Flat File connection manager with default
setting will use data type string [DT_STR] as a default for all the column
values. This always might not be a right option because you might have some numeric,
integer or Boolean columns in your source, passing them as a string to
downstream would take unnecessary memory space and may cause some error at the
later stages of package execution.
§ Sorting of data is a time consuming
operation, in SSIS you can sort data coming from upstream using ‘Sort’ transformation, however this is a memory intensive task
and sometime result in degrade in overall package execution performance. As a
best practice, at most of the places where we know that data is coming from SQL
Server database tables, it’s better to perform the sorting
operation at the database level where sorting can be performed within the query.
This is in fact good because SQL Server database sorting is much refined and
happens at SQL Server level. This in turn sometime results overall performance
improvement in package
execution.
§ During SSIS packages development, most of the time one
has to share his package with other team members or one has to deploy same
package to any other dev, UAT or production systems. One thing that a developer
has to make sure is to use correct package protection level. If someone goes
with the default package protection level ‘EncryptSenstiveWithUserKey’then same package might not execute
as expected in other environments because package was encrypted with user’s personal key. To make package
execution smooth across environment, one has to first understand the package
protection level property behaviour,
please see http://msdn2.microsoft.com/enus/library/microsoft.sqlserver.dts.runtime.dtsprote
ctionlevel.aspx .
In general, to avoid most of the package deployment error
from one system to another system, set package protection level to ‘DontSaveSenstive’.
§ It’s a best practice to take use of
Sequence containers in SSIS packages to group different components at ‘Control Flow’ level. This offers a rich set of
facilities
o Provides a scope for variables that a group of related
tasks and containers can use
o Provides facility to manage properties of multiple
tasks by setting property at Sequence container
level
o Provide facility to set transaction isolation level at
Sequence container level.
For more information on Sequence containers, please see http://msdn2.microsoft.com/en-us/library/ms139855.aspx
.
§ If you are designing an ETL solution for a small, medium
or large enterprise business need, it’s always good to have a feature of
restarting failed packages from the point of failure. SSIS have an out of the
box feature called ‘Checkpoint’ to support restart of failed
packages from the point of failure. However, you have to configure the
checkpoint feature at the package level.
For more information, please see
http://msdn2.microsoft.com/en-us/library/ms140226.aspx
.
§ Execute SQL Task is our best friend in SSIS; we can use
this to run a single or multiple SQL statement at a time. The beauty of this
component is that it can return results in different ways e.g. single row, full
result set and XML. You can create different type of connection using this component
like OLEDB, ODBC, ADO, ADO.NET and SQL Mobile type etc. I prefer to use this component
most of the time with my FOR Each Loop container to define iteration loop on
the basis of result returned by Execute SQL Task. For more information, please
see
http://msdn2.microsoft.com/en-us/library/ms141003.aspx
&
http://www.sqlis.com/58.aspx
No comments:
Post a Comment