Sunday, 30 October 2011

SSIS Transformations-1

SSIS supports numerous transformations that allow you to combine data originating from multiple sources, cleanse the data and give it the shape your data destination expects. Then you can import the data into a single or multiple destinations.

Transformation | Description | Examples of When the Transformation Would Be Used |
---|---|---|
Aggregate | Calculates aggregations such as SUM, COUNT, AVG, MIN and MAX over the values of a given numeric column. The output consists of new rows containing the aggregated values. | Adding aggregated information to your output. This can be useful for adding totals and sub-totals to your output (a T-SQL sketch of the equivalent logic appears after this table). |
Audit | Adds auditing information to the data flow, such as the name of the computer where the package runs, the package version ID, the task name, etc. | Creating detailed logs that indicate where and when the package was executed, how long it took to run, and the outcome of the execution. |
Character Map | Performs minor manipulations on string columns, such as converting all letters to uppercase or lowercase, reversing bytes, etc. | Applying string manipulations prior to loading data into the data warehouse. You can also apply the same manipulations to the data while it is being loaded into the warehouse. |
Conditional Split | Accepts an input and determines which destination to pipe the data into based on the result of an expression. | Cleansing the data to extract specific rows from the source. If a specific column does not conform to the predefined format (perhaps it has leading spaces or zeros), move such records to the error file. |
Copy Column | Makes a copy of one or more columns, which can then be further transformed by subsequent components in the package. | Extracting columns that need to be cleansed of leading / trailing spaces, applying a Character Map transformation to uppercase the copied data, and then loading it into the table. |
Data Conversion | Converts input columns from one data type to another. | Converting columns extracted from the data source to the data type expected by the data warehouse. Having this transformation available gives you the freedom to move data directly from its source into the destination without an intermediary staging database. |
Data Mining Query | Queries a data mining model. Includes a query builder to assist you with development of Data Mining Extensions (DMX) prediction queries. | Evaluating the input data set against a data mining model developed with Analysis Services. |
Derived Column | Calculates a new column value based on one or more existing columns. | Removing leading and trailing spaces from a column, or adding a title of courtesy (Mr., Mrs., Dr., etc.) to a name. |
Export Column | Exports contents of large columns (TEXT, NTEXT, IMAGE data types) into files. | Saving large strings or images into files while moving the rest of the columns into a transactional database or data warehouse. |
Fuzzy Grouping | Finds close or exact matches between multiple rows in the data source. Adds columns to the output including the values and similarity scores. | Cleansing data by translating various versions of the same value to a common identifier. For example, "Dr", "Dr.", "doctor", "M.D." should all be considered equivalent. |
Fuzzy Lookup | Compares values in the input data source rows to values in the lookup table. Finds the exact matches as well as those values that are similar. | Cleansing data by translating various versions of the same value to a common identifier. For example, "Dr", "Dr.", "doctor", "M.D." should all be considered equivalent. |
Import Column | Imports the contents of a file and appends them to the output. Can be used to append TEXT, NTEXT and IMAGE data columns to the input obtained from a separate data source. | This transformation could be useful for web content developers. For example, suppose you offer college courses online. Normalized course metadata, such as course_id, name, and description, is stored in a typical relational table. Unstructured course metadata, on the other hand, is stored in XML files. You can use the Import Column transformation to add the XML metadata to a text column in your course table. |
Lookup | Joins the input data set to a reference table, view, or row set created by a SQL statement to look up corresponding values. If some rows in the input data do not have corresponding rows in the lookup table, then you must redirect such rows to a different output. | Obtaining additional data columns. For example, the majority of employee demographic information might be available in a flat file, but other data, such as the department where each employee works, their employment start date and job grade, might be available from a table in a relational database (a T-SQL sketch of the equivalent join appears after this table). |
Merge | Merges two sorted inputs into a single output based on the values of the key columns in each data set. Merged columns must have either identical or compatible data types; for example, you can merge VARCHAR(30) and VARCHAR(50) columns, but you cannot merge INT and DATETIME columns. | Combining the columns from multiple data sources into a single row set prior to populating a dimension table in a data warehouse. Using the Merge transformation saves the step of having a temporary staging area. With prior versions of SQL Server you had to populate the staging area first if your data warehouse had multiple transactional data sources. |
Merge Join | Joins two sorted inputs using an INNER JOIN, LEFT OUTER JOIN or FULL OUTER JOIN algorithm. You can specify the columns used for joining the inputs. | Combining the columns from multiple data sources into a single row set prior to populating a dimension table in a data warehouse. Using the Merge Join transformation saves the step of having a temporary staging area. With prior versions of SQL Server you had to populate the staging area first if your data warehouse had multiple transactional data sources. Note that the Merge and Merge Join transformations can only combine two data sets at a time; however, you can chain multiple Merge Join transformations to include additional data sets. |
Multicast | Similar to the conditional split transformation, but the entire data set is piped to multiple destinations. | Populating the relational warehouse as well as the source file with the output of a derived column transformation. |
OLE DB Command | Runs a SQL command for each input data row. Normally your SQL statement will include a parameter (denoted by a question mark), for example: UPDATE employee_source SET has_been_loaded=1 WHERE employee_id=? | Setting the value of a column with the BIT data type (perhaps called "has_been_loaded") to 1 after the data row has been loaded into the warehouse. This way, subsequent loads will only attempt to import the rows that haven't yet made it into the warehouse. |
Percentage Sampling | Loads only a subset of your data, defined as a percentage of all rows in the data source. Note that rows are chosen randomly. | Limiting the data set during the development phases of your project. Your data sources might contain billions of rows. Processing cubes against the entire data set can be prohibitively lengthy. If you're simply trying to ensure that your warehouse functions properly and that data values on transactional reports match the values obtained from your Analysis Services cubes, you might wish to load only a subset of data into your cubes. |
Pivot | Pivots the normalized data set by a certain column to create a more easily readable output. Similar to the PIVOT command in Transact-SQL. You can think of this transformation as converting rows into columns. For example, if your input rows have customer, account number and account balance columns, the output will have the customer and one column for each account. | Creating a row set that displays the table data in a more user-friendly format. The data set could be consumed by a web service or could be distributed to users through email (a T-SQL PIVOT sketch appears after this table). |
Row Count | Counts the number of transformed rows and stores the result in a variable. | Determining the total size of your data set. You could also execute a different set of tasks based on the number of rows you have transformed. For example, if you increase the number of rows in your fact table by 5% you might perform no maintenance, but if you increase the size of the table by 50% you might wish to rebuild the clustered index. |
Row Sampling | Loads only a subset of your data, defined as a number of rows. Note that rows are chosen randomly. | Limiting the data set during the development phases of your project. Your data warehouse might contain billions of rows. Processing cubes against the entire data set can be prohibitively lengthy. If you're simply trying to ensure that your warehouse functions properly and that data values on transactional reports match the values obtained from your Analysis Services cubes, you might wish to load only a subset of data into your cubes. |
Script Component | Every data flow consists of three main components: sources, destinations and transformations. The Script Component allows you to write transformations for otherwise unsupported source and destination file formats. It also allows you to perform transformations not directly available through the built-in transformation algorithms. | Custom transformations can call functions in managed assemblies, including the .NET Framework. This type of transformation can be used when the data source (or destination) file format cannot be handled by the typical connection managers. For example, some log files might not have tabular data structures. At times you might also need to parse strings one character at a time to import only the needed data elements. Much like the Script Task, the Script Component must be written in Visual Basic .NET. |
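To give a feel for what the Aggregate transformation produces, here is a rough T-SQL analogue of the totals and sub-totals scenario mentioned in the table. The table and column names (dbo.fact_sales, region, product_category, sales_amount) are hypothetical; the real transformation is configured in the data flow designer rather than written as SQL.

```sql
-- T-SQL analogue of the Aggregate transformation (hypothetical table and columns).
-- ROLLUP adds sub-total rows per region and a grand-total row, similar to the
-- extra aggregate rows the transformation emits into the data flow.
SELECT
    region,
    product_category,
    SUM(sales_amount) AS total_sales,
    COUNT(*)          AS order_count
FROM dbo.fact_sales
GROUP BY ROLLUP (region, product_category);
```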
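The Lookup example above (employee demographics in a flat file, department and job-grade data in a relational table) behaves roughly like the join below once the flat file has been read into the data flow. The schema, table and column names are made up for illustration; in the actual transformation, unmatched rows are redirected to a separate output rather than returned with NULLs.

```sql
-- Rough T-SQL equivalent of the Lookup transformation (hypothetical names).
SELECT
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,
    d.employment_start_date,
    d.job_grade
FROM staging.employee_flat_file AS e         -- rows arriving from the flat file source
LEFT OUTER JOIN dbo.employee_detail AS d     -- the lookup (reference) table
    ON d.employee_id = e.employee_id;
-- Rows where d.employee_id IS NULL correspond to lookup failures,
-- i.e. the rows the transformation would send to its "no match" output.
```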
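Finally, the Pivot row compares the transformation to the PIVOT command in Transact-SQL. The sketch below shows that comparison using a hypothetical customer/account table, turning one row per account into one column per account number.

```sql
-- T-SQL PIVOT illustrating what the Pivot transformation does to the
-- customer / account number / account balance example (hypothetical data).
SELECT customer_name,
       [1001] AS account_1001,
       [1002] AS account_1002,
       [1003] AS account_1003
FROM
(
    SELECT customer_name, account_number, account_balance
    FROM dbo.customer_account
) AS src
PIVOT
(
    SUM(account_balance)
    FOR account_number IN ([1001], [1002], [1003])
) AS p;
```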