There are several ways of moving large amounts of
data either within a single server or between servers. Importing and
exporting tasks vary in SQL Server, and each task has an appropriate
implementation. You can move data by using two techniques:
Connected— Connected techniques, often called “live” updates, occur directly against the database.
Disconnected— Disconnected techniques, in general, store data into a file as an intermediate step in the transfer.
One
of the easier disconnected methods to use is a backup/restore that
involves the backing up of data from a database and the subsequent
restore to a different database or server. This technique is often used
as a method of initiating replication or log shipping. This is not a
selective technique, however, and other techniques are better suited to
importing and exporting information. In a similar fashion, you can use
the Copy Database Wizard as a connected mechanism to copy objects from
one server to another, but again, this is not usually best suited to
selectively importing and exporting.
The import and export of data to and from a file can use one of three basic methods: the bcp command, the BULK INSERT statement, and the INSERT ... SELECT * FROM OPENROWSET(BULK)
statement. Other methods can also be used for import and export, as
discussed later in this chapter. This chapter also discusses the use of
bcp to export data from SQL Server. A bcp command can export data from anywhere that a SELECT statement works, including partitioned views.
Tip
You can use bcp
with no options. When you do, you are prompted for the information
needed for the format file. You can then optionally save the format
file. The resulting files can be used as input for any of the three
operations bcp, OPENROWSET, and BULK INSERT.
The following section discusses a traditional mechanism for moving data: using bcp.
Using the Bulk Copy Program (BCP)
You
most often use the Bulk Copy Program (BCP) when the database can be
temporarily taken out of production to import large amounts of data. Of
course, BCP also allows for the export of data, in which case the
database can remain online. Using BCP for importing data is extremely
fast. As the name indicates, BCP is a process used to copy large
amounts of data.
BCP
is not a specific implementation of SQL Server. The BCP command-line
tool is often used from batch files. BCP’s biggest asset is its speed.
It moves information into and out of a database by using data files.
BCP is normally used to do the following:
Bulk copy from a table, view, or result set into a native-mode data file in the same format as the table or view.
Bulk
copy from a table, view, or result set into a data file in a format
other than the one that the table or view is in. In this instance, a
format file is created that defines the characteristics of each column.
If all columns are converted to character format, the resulting file is
called a character-mode data file.
Bulk copy from a data file into a table or view. A format file can be used to determine the layout of the data file.
Load data into program variables and then import the data into a table or view one row at a time.
As
stated earlier, BCP is best suited for quickly loading data into a
database. BCP does not create tables. You must have a table set up and
waiting before you run BCP. The basic syntax of the bcp command is as follows:
bcp <table|query> <in|out|queryout|format> filename
To avoid unnecessary confusion, this chapter avoids a lot of the specifics of each implementation of bcp and variation of the coding because you do not need to know about them for the 70-431 exam.
Exam Alert
To
prepare for the 70-431 exam, you should know several standard BCP
operations, as discussed in this chapter. Putting data into sequence,
use of a format file, and what happens in default scenarios with no
options specified are all potential topic areas for the exam.
The bcp
command provides a set of switches you can use to specify the options
to be used for the operation. If you do not supply any switches that
define the field information, you are prompted for formatting
information that you can then optionally save into a format file. The
prompts ask you to define the data type for each field, the field
prefix length, and the field terminator.
If you allow the defaults for the fields, the data type will default to the type supplied by the switches (-c for character, -w for Unicode, -n for native, and -N
for native Unicode for character data types). A tab is the default
field delimiter, and a newline (carriage return) is the default record
delimiter.
When
exporting data, you can easily select all data to be exported. If you
want to filter horizontally (that is, only some rows), filter
vertically (that is, only some columns), or save the data in a specific
order within the file, you use queryout and define a SQL query to be used as the basis for the export. The column list of the SELECT statement provides the vertical filter, the WHERE clause provides the horizontal filter, and the ORDER BY statement arranges the data as desired.
Importing Data with BCP Format File
When
you use BCP in a controlled fashion, the process utilizes two separate
files: One contains the data and the other, the format file, contains
the definition of the data. You can use the format file during an
import operation to help define the properties of the data being
imported. Using the bcp command with a format file would look like this:
BCP ONE.dbo.CustCopy IN C:\CustTwoBCP.dat -T -f C:\BCP2.fmt
At the
top of the format file are two header rows: The first identifies the
SQL version (SQL Server 2005 is version 9.0), and the second contains
the number of fields stored in the file. Each field is allocated a row
within the format file and includes the data type, field prefix length,
field size, field delimiter, field name, and collating sequence.
Microsoft
recommends that you include the collating sequence for every field.
During the actual import, you have limited control over the process,
but before the import, you can alter the data and format files. After
you start the BCP process to import the data, the process proceeds
unabated.
You
can have a minimal amount of control over the reactions to an import
through the use of import hints. Constraints are usually ignored during
BCP imports. To ensure that data conforms to constraints, you can
supply the CHECK_CONSTRAINTS hint,
although it significantly slows the process. Also, the server ignores
insert triggers by default during import; if you want these triggers to
fire, you must specify the FIRE_TRIGGERS hint.
Note
Because
constraints are by default not applied and triggers do not fire, BCP
activities could affect the standard data and work flows in a
production environment. Make sure you take into consideration any
trigger activities that should take place. Also, pre-check imported
data for errors.
You can aid in improving the performance of BCP by using the ORDER
hint. This hint identifies the order of the data within the data file.
It has no effect on the order of the data, and it is ignored when it
cannot be of any benefit. You can also use the TABLOCK
hint to change the default locking behavior from row-level locking.
Table level locks perform much more efficiently than row-level locks.
For large files with more than about a couple thousand rows, you should turn on the SELECT INTO/BULKCOPY option for the database or set the database recovery mode to SIMPLE.
Either of these methods disables all transaction log backups while they
are turned on, and you must do a full backup to get transaction log
backups to work afterward. These methods affect certain operations—that
is, those involving SELECT INTO and BULK COPY—by changing how transaction logging works.
If you set the recovery mode to SIMPLE, it is recommended that when you have completed your process, you turn it back to FULL and perform a full database backup.
Using BULK INSERT as an Alternative to BCP
A closely related technique to using BCP involves the use of BULK INSERT.
This is really just an extension of the BCP functionality that you can
use to perform bulk operations from within a query environment without
having to use the command prompt.
The BULK INSERT statement implements part of BCP inside SQL Server, so it has all the speed of BCP, but an easier-to-use interface. BULK INSERT allows for the copying of a data file into a database table or view with a user-specified format. However, the BULK INSERT statement cannot copy data from SQL Server out to a data file.
Although BULK INSERT
uses most of the same options as BPC, it doesn’t need to know which
server to use or what security to use because you use it from within
T-SQL, so it runs on that server, with the security context with which
you logged in. Using BULK INSERT
for importing data offers identical functionality as BCP. You could use
the following, for example, to import data in a similar fashion to the
way you use bcp:
BULK INSERT ONE.dbo.CustCopy
FROM C:\CustTwoBCP.dat
WITH FORMATFILE = C:\BCP2.fmt
Using OPENROWSET for Importing Data
You can use OPENROWSET
to gain access to any OLE DB data source that will return a rowset.
However, you should use it only for data that is accessed infrequently
to avoid the overhead of its use. In situations in which a data source
needs to be accessed repeatedly, a linked server should be defined.
Besides the associated overhead, OPENROWSET does not provide the functionality of linked server definitions.
You use a format file with the OPENROWSET(BULK...) option in the same manner in which you would for a bcp or BULK INSERT operation. Here is an example:
SELECT A.* FROM OPENROWSET(BULK N'C:\CustTwoBCP.dat',
FORMATFILE = 'C:\bcp2.fmt') AS A
In
this example, the user must have appropriate permissions to access the
file. When this user accesses other OLE DB data sources, he or she must
provide connection information to gain access to the external data.
You can import XML data as a single document by using OPENROWSET with the SINGLE_BLOB directive. This directive causes the data to be read as a single-row, single-column result set. The data is treated as a varbinary(max) data type, which can be used as XML without conversion. This is not the same as storing the data in a field that uses the XML data type and is often a preferred technique. You could also use the SINGLE_CLOB or SINGLE_NCLOB
to perform the action with character or Unicode data, which could then
be cast directly into an XML variable or database column.
Large-scale
importing and exporting of data are regularly recurring operations in
most production environments. For repeated access to another data
source, you should configure and use a linked server. For solutions
that involve data extraction, transformation, and loading (ETL) on
regular basis, you should use SQL Server Integration Services (SSIS),
as described in the following section.
Using SQL Server Integration Services (SSIS)
Many
import and export activities involve much more than reading data from
one location and pushing it into another. Some form of alteration or
massaging of the data is usually needed between the source and
destination. In previous versions of SQL Server, this is where Data
Transformation Services (DTS) came into play.
Like
many of the other SQL Server features, DTS has undergone a
transformation. Now known as SSIS, it has become a full-featured
platform for handling data transfer. Figure 1 shows an example of a package saved in SSIS.