Over the years, Microsoft has built an end-to-end BI platform and is now taking it to the next level. BI technologies-in the form of Data Transformation Services (DTS), Analysis Services and Reporting Services-were first seen along with SQL Server2005.in SQL Server 2005, DTS was rewritten as SQL Server Integration Services (SSIS), which is an enterprise class Extract-Transform-Load (ETL) Tool. Reporting Services were further improved with SQLServer2005 by introducing features like ad hoc reporting, role based security, end-user sorting, direct client printing, etc. With Analysis Services, SQL Server2005 introduced Universal Dimension Model (UDM), cube partitioning, data mining and predictive analysis, what-if modeling, key performance indicators, etc. Business Intelligence Development Studio (BIDS) was introduced with SQL Server2005 to give a single and integrated tool for development of SSIS, SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS) objects.
The new features in SQL Server2008 are added around the following four major themes.
1. Enterprise Data Platform: SQL Server2008 provides more reliable, secure, trusted and scalable platform. it improves availability, using enhanced data mirroring, predictable
Query performance, data and backup compression, and many development features. Another innovative feature is policy-based management, which enables an easier SQlServer administration.
2. Beyond Relational: The features in this category can handle any ,even those that are unstructured and non relational, New data types are introduced to handle geo spatial data, documents and image files as well. it enables the developers to design location intelligent applications and enhances the ability to handle the document management.
3. Dynamic Development: Using the new .NET Framework 3.5 reduces the complexity of the development with ADO.NET Entity Framework and Language Integrated Query(LINQ) to SQL.ADO.NET Entity Framework enables the developers to become more productive by directly interacting with the business entities. Developers can write synchronizing applications by using features like change tracking.
4. Pervasive Insight: These features further enhance the SSIS, SSAS and SSRS, making them more scalable. They enable the enterprises to integrate all the data into data warehouse more efficiently and enables real-time data analysis. it also empowers every user with actionable insights.
Top 5 Enhancements for effective data integration using, SSIS
The following are top 5 enhancements that help in integrating the data more effectively, using SSIS.
1. Change Data Capture(CDC)
2. Data Profiling Task
3. SSIS Performance Improvements
a. Persistent lookups
b. SSIS Pipeline improvements(Better Parallel Execution)
4. Merge SQL Statement
5. Enhanced Supportability
1. Change Data Capture(CDC)
CDC is a generic database engine feature added in SQL Server2008.it is useful in data integration, to incrementally load the data into the data ware house. This enables users to get the latest information from the warehouse.CDC is enabled first at the database and then at the table level. once enabled, it makes the changed data available in changed tables-including schema changes. The CDC feature also provides APIs/functions, which can be used in SSIS package to capture the changed schema and data.CDC works with the database log,asynchronously,to get the changes in the change tables, the below figure shows the components involved in CDC.
Advantages
Minimizes the overhead on the source Online Transaction
Processing (OLTP) databases, compared to the earlier
approaches
Reduces complexity in capturing the changes
Makes real-time data warehousing possible
Helps in automating the task of incremental data warehouse
loading, thereby improving the productivity
What you should also know
Plan for more database space due to data redundancy caused by CDC feature
Additional maintenance efforts, for the changed tables and the maintenance of jobs created by CDC, for capturing and purging the data
2. Data Profiling Task
Data profiling is statistical analysis and validation of data. it helps in analyzing the data quality and identifies issues like null, duplicate, missing values, etc. Data profiling is necessary in most BI applications as the business uses the warehouse data for data mining. SIS provides out-of-box transformation for these kind of tasks, which can be embedded in the ETL tool. The Data profiling task can be easily configured and used with script task to detect the data quality issues in the source data.
Advantages
Effective analysis to identify source data quality issues
Minimize the efforts in preventing the data quality issues as
Compared to earlier approaches
Complex logic in erroneous data handling is partially taken off
from ETL
What you should also know
Works with SQLServer2000 and later versions only
Does not support third party databases and fine-based datasources.
3.SSIS Performance Improvements
There are two key enhancements made in SSIS 2008 for performance
Improvements, which contribute even to scalability.
a. Persistent Lookups
The Lookup transformation in this version of SSIS will scale to large datasets compared to the earlier version, and hence improves the performance. SSIS 2008 has cache connection manager, which can be shared across the data flow tasks of a package. The cache can be populated in one data flow task, and be used in subsequent data flow tasks. it can also be persisted to a file and can be reused. The cache can also be rebuilt with new rows.
Advantages
Explicit control over sharing and the lifetime of the lookup data
Improved performance as the same cache can be used across the
package
b. SSIS Pipeline improvements(Better Parallel Execution)
Package execution in SSIS 2008 adopts a parallel execution mechanism. The parallel execution is handled by SQL Server internally. Until the earlier version, as per the architecture, only one thread was handling the execution of all the tasks.it has new thread scheduler,which will process multiple tasks in parallel and improves performance.
Advantages
Increased performance
Better runtime stability
Benefits for the multicast and conditional split transformations
Execution
4. Merge SQL Statement
Most of the data warehouse scenarios include a requirement like load the data into warehouse if it doesn’t exist; if the data exist, then update the latest data into warehouse based on certain key fields.SQL Server2008 has a TSQL enhancement, called “UPDATE”, which combines action of the multiple DML statements. This is used in scenarios like: when the data does not exist,” INSERT” it and when data exist “UPDATE” it. The entire job is done in single statement, reducing (IO), and hence can be used in integration of large data in a data warehousing scenario.
Advantages
Reduces the IO in large data integrations scenario
Handles the complex scenarios in single statement,there by reducing complexity
Reduces round trips to SQL Server in large data integration scenarios
5.Enhanced Supportability
This is to Improve the productivity of the developer and give more debugging capabilities.Imagine a scenario where you have a package that is in a deadlocked or hang state.It requires an expert to debug the problem.SQL Server2008 has a feature called “Super Dump” that helps in such scenarios.Dumps can be created on specific errors.the Dump can be either a text or a binary file. The binary file can be opened using the “WinDbg’ utility and you can look at the exact call stacks,the function where the error is occurring,the variable values, etc.
Advantages
Improves the debudding capability on the package,and hence the productivity
Can be used to find out the issues with running packages
Current Limitations
No comments:
Post a Comment