History of Data Transfer in SQL Server
Before we jump right
into Integration Services, it is important to review the historical
context for the Integration Services design goals. Ten years ago in SQL
Server 6.5, one of the few ways to import data to or export data from
SQL Server outside of T-SQL and Bulk Insert was through the command-line
bcp utility. Although bcp did the job, it exported in a simple text
format by default and was difficult to automate. To master bcp, you had to go to the command line, set many parameters, and usually set up schema (format) files. In addition, bcp provided no easy way to manipulate data going into or out of SQL Server.
When SQL Server 7.0
shipped in 1998, we were first introduced to Data Transformation
Services. DTS was a robust and easy-to-use replacement for bcp built on OLE DB technology. (Of course, bcp
still ships with SQL Server, as does DTS.) With DTS, you could easily
import and export between SQL Server and any OLE DB data source. You
could also move data from one non-SQL Server–based data source to
another, so moving data from a text file to SQL Server or from Microsoft
Access to Oracle was simple—all you had to do was use DTS on your local
machine or remotely on a server. In addition, you could apply
transformations to your data and save the functionality into one
location called a DTS Package. DTS shipped with a wizard and interface
that made the package creation process pretty straightforward. This
opened a new door that bcp had kept nailed shut.
DTS Packages
DTS was a huge
improvement over its predecessors and whatever else was available in the
marketplace at the time, so millions of DTS packages were developed
around the world. Developers set up packages to process sets of data in a
succession of steps. SQL Server 2000 expanded on this methodology with
performance and wizard improvements, but packages were still created in
SQL Server Enterprise Manager. Developers now had many packages with
several steps in them, but they quickly learned that logic and control
flow were still far from perfect, and no form of debugging was available.
Developers could make steps in a package dependent on each other and
run simple ActiveX scripts as steps in a package, but other than that,
they had to turn to third-party components or fancy T-SQL hacks to get
the control flow that they desired.
Working with Integration Services Packages
SQL Server 2005 ships
with SQL Server Integration Services. Integration Services is a
replacement for DTS that extends the same metaphor of a package, but the
designer has been radically changed to use Visual Studio and to focus
on controlling the flow of the data, giving the developer more control
over the design of the package. In this section, we introduce the
concepts you must understand to work with Integration Services.
Control Flow
Control flow is the new
model for representing the process embodied by the package. When you
open a package in the Integration Services designer, the first thing you
see is the control flow design surface. The visual representation of
control flow is a flowchart. The shapes in the flowchart are called tasks. Tasks are the atomic units of execution in an Integration Services package. The lines connecting tasks are the precedence constraints, which govern the order in which tasks are executed.
In addition to tasks, the control flow can contain a special type of object called a container.
You can think of containers as tightly integrated built-in tasks; because
they are internally wired into the package execution engine, they can
provide some interesting features beyond what is possible for ordinary tasks.
The Sequence container supports nesting of the control flow. When you
add a Sequence container to your package, it is as if you have another
package within the current package—you can add tasks and precedence
constraints, all scoped within that container. In fact, the package
itself is a top-level Sequence container. The For Loop container
supports iterative execution of the control flow it contains for as long
as the loop expression evaluates to true. The Foreach Loop container supports
looping over a collection of objects supplied by the chosen Foreach Loop
enumerator. Figure 1 shows a control flow containing a Sequence container and use of precedence constraints.
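To make the looping behavior concrete, the For Loop container is driven by three expression properties: InitExpression runs once before the loop starts, EvalExpression is tested before each iteration (the loop runs while it is true), and AssignExpression runs after each iteration. A minimal sketch, assuming an integer package variable named Counter (the variable name is illustrative):

```
InitExpression:    @Counter = 0
EvalExpression:    @Counter < 10
AssignExpression:  @Counter = @Counter + 1
```

With this configuration, the tasks inside the container execute ten times, once for each value of Counter from 0 through 9.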
The control flow can
also contain a few other things to help you control execution of your
process. The first thing you need to do with almost any package is to
define connections to be used by your tasks. Structurally, there is a Connections
collection associated with the package, where you can define and manage
all the connections that the package uses. When a package grows large,
as experience shows it often does when handling complex extract,
transform, and load (ETL) processes, you can appreciate the benefit of
controlling the properties of all your connections in one central spot.
Compare this to DTS, where tasks often had to be configured individually
with connection information.
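For instance, an OLE DB connection in the Connections collection is defined by a connection string that you manage in this one place. A hypothetical connection string using the SQL Native Client OLE DB provider (the server and database names here are placeholders):

```
Provider=SQLNCLI;Data Source=(local);Initial Catalog=AdventureWorks;Integrated Security=SSPI;
```

Every task that references this connection picks up a change to, say, Data Source automatically, instead of each task carrying its own copy of the connection information.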
Variables are
another important feature. For every Sequence container you have a
collection of variables. You can set the data type, the initial value,
and a few other properties on the variable. During execution, tasks can
read and write variables. Precedence constraints that control ordering
of execution of tasks can also refer to variables. Using variables, you
can specify, for example, that a particular task be executed only if the
preceding task succeeded and the value of a specific variable meets a
certain condition.
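Concretely, a precedence constraint can be configured to combine the execution result of the preceding task with an expression over variables. A sketch of such a constraint's settings, assuming a package variable named RowCount (the variable name is illustrative):

```
Evaluation operation: Expression and Constraint
Value:                Success
Expression:           @[User::RowCount] > 0
```

The constrained task runs only when the preceding task succeeded and the expression evaluates to true at the time the constraint is checked.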
Major Tasks
Integration Services
comes with more than 20 items in the Control Flow toolbox. Let’s take a
look at some of the most significant and commonly used tasks.
Execute SQL task
Runs queries against databases. This task is very flexible and supports
many features, such as parameterized queries, input and output
parameters for stored procedures, and multiple ways of returning
results.
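As a sketch of the parameterized-query support, an Execute SQL task using an OLE DB connection marks parameters with ? placeholders and maps them, by ordinal, to package variables; a single-row result set can likewise be mapped back into a variable. The table, column, and variable names below are hypothetical:

```
SQLStatement:  SELECT COUNT(*) FROM Sales.SalesOrderHeader
               WHERE CustomerID = ?

Parameter Mapping:
  Variable Name      Direction   Parameter Name
  User::CustomerID   Input       0

Result Set (Single row):
  Result Name   Variable Name
  0             User::OrderCount
```

After the task runs, User::OrderCount holds the query result, and a later task or precedence constraint can act on it.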
Execute Process task Executes a program. This is helpful for launching various command-line utilities from the package.
Execute Package task Executes another Integration Services package, either in the same or a different process space.
File System task Performs various file system operations.
FTP task Transfers files between systems using File Transfer Protocol (FTP).
Script task Executes custom scripts written in Microsoft Visual Basic .NET.
Data Flow task
Transfers and transforms data between multiple sources and
destinations. This is the biggest task in the product and is the subject
of the next section.