SQL Server Integration Services (SSIS) is used mostly
as an Extract-Transform-Load (ETL) tool. In this article, we are concerned mostly with the data source part of the SSIS.
SSIS was first
introduced in SQL Server 2005 as the successor to Data Transformation
Services (DTS) in SQL Server 2000. In SQL Server 2005, SSIS was written
from scratch, and SSIS came in the SQL Server Enterprise, Standard,
Workgroup, Web, Express, and, of course, Developer editions. However,
the Raw file source and XML sources are available only in Enterprise and
Standard editions.
|
With
the installation of SQL Server 2008, you will get an additional tool
called SQL Server Business Intelligence Development Studio, which is
used to create SSIS packages, SSAS cubes, and SSRS reports. Creating the
SSIS package is described in Exercise 1.
Exercise 1. How to Create an SSIS Package
Start SQL Server Business Intelligence Development Studio from Program Files | SQL Server 2008. Select Projects... from File | New at the main menu. You will be taken to the New Project dialog, which is shown in Figure 1.
Select Business Intelligence Projects as a project type and select Integration Services Projects from Visual Studio installed templates. Enter a name for the SSIS project and the folder location of the SSIS project and click OK. After clicking the OK button, you will see the Control Flow of the package (see Figure 2).
By default, the package name Package.dtsx is added to the project. You can rename the package and add new packages.
|
Here are the available sources in SSIS.
ADO Net source
This uses the ADO.NET connection manager to access data. In this
source, you need to provide the table name or command string directly.
Unlike OLE DB source, you cannot supply source in a variable in ADO net
source. Excel source
Excel source uses an Excel connection manager. This can be either a
worksheet or a name space. Data types are assigned to each column by
sampling a few rows. Flat File source Flat File source uses a Flat File manager, which can be delimited or of fixed length. OLE DB source OLE DB source uses the OLE DB connection manager. Raw File source Raw files are files written by the SSIS. Raw File Source uses these raw files. XML source Unlike other sources, XML Source does not use a connection manager; instead it uses an input file.
Exercise 2 will guide you in creating a connection manager.
Exercise 2. How to Create a Connection Manager
Right-click the area under the connection manager, and you will get the context menu shown in Figure 3.
Select New OLE DB Connection from the context menu. You will be taken to the Configure Connection Manager
dialog box. You have the option of selecting already existing
connection managers. You can create a new connection manager by clicking
the New button. Select the appropriate provider from the providers list. To create an MS Access connection manager, select Microsoft Jet 4.0 OLE DB Provider. Browse and select the Access database. In case there is a database password associated with the MS Access database, go to the All tab and enter the database password at the Jet OLEDB: Database Password as shown in Figure 4.
By clicking the Test Connection button, you can verify whether the connection manager is correct. Clicking the OK button will create the connection manager. You will see a connection manager at the bottom of the screen. You can rename the connection manager by right-clicking it.
|
In SQL Server 2005, if you want to access an Excel 2007 file from SSIS, it is not straightforward.
If you want to import an
Excel file, you can use Excel Source from the Data Flow Sources in SQL
Server Integration Services (SSIS) and select the correct version from
the available list. In SQL Server 2005, you will find only an Excel 3.0,
4.0, 5.0, and 97–2005 versions. To access an Excel 2007 file, you need
to configure an OLE DB Source and select the provider Native OLE DB\Microsoft Office 12.0 Access Database Engine OLE DB Provider and extended properties as Excel 12.0.
This is much easier in SQL
Server 2008 inasmuch as you have the additional version of Excel 2007 in
the Excel source control and you simply have to select that version
along with the filename.
|
SSIS provides several data flow controls that can fit your requirements (see Table 1).
Table 1. Data Flow ControlsOperation | Data Flow Controls |
---|
Union Two Data sets from SQL Server and MS Access | Union | Get Customer Name from Access database where you have the ID in the SQL Server database | Lookup | Inner Join | Merge Join | Outer Join | Merge Join | Full Outer Join | Merge Join |
Exercise 3
will illustrate how to create an SSIS package to extract a customer
name from the MS Access database where you have the Customer ID in the
SQL Server table. Let us assume that the SQL Server and MS Access table
names are Customer.
Exercise 3. Sample Ssis Packge to Get Selected Columns From Remote Database
Create an SSIS project and a package as mentioned in Exercise 1. Create two connection managers, one for MS Access and one for MS SQL Server databases, as mentioned in Exercise 2. Drag and drop a Data Flow Control
from the data flow tasks. By default, the data flow will take Data Flow
Task as its name and you can rename it any name you want. Double-click the dragged control and you will be taken to the data flow control. Drag and drop and an OLE DB source and rename it Access, as we are going to configure it to access the MS Access database. Double-click the Access OLE DB source to configure the access OLE DB source and you will be taken to Figure 5.
First
select the previously created Access OLE DB connection manager. Select
the data access mode as table or view. You have the option of writing a
query by selecting SQL Command as the data access mode. Next, select the required table from the table or view list. From the column list you can select only the wanted columns so that unnecessary columns will not be used. Add another OLE DB Source to configure SQL Server. The configuration will be the same as specified for MS Access. For
the Merge Join, you need sorted data sources; hence you need to add two
Sort data flow transformation controls and connect each OLE DB source
to sort and select the columns you need to order. Those columns should
be the ones that are going to be joined. Next, add another data flow transformation control named Merge Join and connect both data sources to it. Double-click the Merge Join control and configure as shown in Figure 6.
You have three join types, Inner Join, Left Outer Join, and Fuller Outer Join, to suit your requirements. Next you have to select the columns you need to add to the data flow, and finally your data flow should look like Figure 7.
|
With implementation of SSIS, you will have the features of package configuration, package logging, and checkpoint configuration.
|