SQL Server 2005 : Importing and Exporting Data

10/11/2010 2:09:56 PM

There are several ways of moving large amounts of data either within a single server or between servers. Importing and exporting tasks vary in SQL Server, and each task has an appropriate implementation. You can move data by using two techniques:

Connected— Connected techniques, often called “live” updates, occur directly against the database.
Disconnected— Disconnected techniques, in general, store data into a file as an intermediate step in the transfer.

One of the easier disconnected methods to use is a backup/restore that involves the backing up of data from a database and the subsequent restore to a different database or server. This technique is often used as a method of initiating replication or log shipping. This is not a selective technique, however, and other techniques are better suited to importing and exporting information. In a similar fashion, you can use the Copy Database Wizard as a connected mechanism to copy objects from one server to another, but again, this is not usually best suited to selectively importing and exporting.

The import and export of data to and from a file can use one of three basic methods: the bcp command, the BULK INSERT statement, and the INSERT ... SELECT * FROM OPENROWSET(BULK) statement. Other methods can also be used for import and export, as discussed later in this chapter. This chapter also discusses the use of bcp to export data from SQL Server. A bcp command can export data from anywhere that a SELECT statement works, including partitioned views.

Tip

You can use bcp with no options. When you do, you are prompted for the information needed for the format file. You can then optionally save the format file. The resulting files can be used as input for any of the three operations bcp, OPENROWSET, and BULK INSERT.

The following section discusses a traditional mechanism for moving data: using bcp.

Using the Bulk Copy Program (BCP)

You most often use the Bulk Copy Program (BCP) when the database can be temporarily taken out of production to import large amounts of data. Of course, BCP also allows for the export of data, in which case the database can remain online. Using BCP for importing data is extremely fast. As the name indicates, BCP is a process used to copy large amounts of data.

BCP is not a specific implementation of SQL Server. The BCP command-line tool is often used from batch files. BCP’s biggest asset is its speed. It moves information into and out of a database by using data files. BCP is normally used to do the following:

Bulk copy from a table, view, or result set into a native-mode data file in the same format as the table or view.
Bulk copy from a table, view, or result set into a data file in a format other than the one that the table or view is in. In this instance, a format file is created that defines the characteristics of each column. If all columns are converted to character format, the resulting file is called a character-mode data file.
Bulk copy from a data file into a table or view. A format file can be used to determine the layout of the data file.
Load data into program variables and then import the data into a table or view one row at a time.

As stated earlier, BCP is best suited for quickly loading data into a database. BCP does not create tables. You must have a table set up and waiting before you run BCP. The basic syntax of the bcp command is as follows:

bcp <table|query> <in|out|queryout|format> filename

To avoid unnecessary confusion, this chapter avoids a lot of the specifics of each implementation of bcp and variation of the coding because you do not need to know about them for the 70-431 exam.

Exam Alert

To prepare for the 70-431 exam, you should know several standard BCP operations, as discussed in this chapter. Putting data into sequence, use of a format file, and what happens in default scenarios with no options specified are all potential topic areas for the exam.

The bcp command provides a set of switches you can use to specify the options to be used for the operation. If you do not supply any switches that define the field information, you are prompted for formatting information that you can then optionally save into a format file. The prompts ask you to define the data type for each field, the field prefix length, and the field terminator.

If you allow the defaults for the fields, the data type will default to the type supplied by the switches (-c for character, -w for Unicode, -n for native, and -N for native Unicode for character data types). A tab is the default field delimiter, and a newline (carriage return) is the default record delimiter.

When exporting data, you can easily select all data to be exported. If you want to filter horizontally (that is, only some rows), filter vertically (that is, only some columns), or save the data in a specific order within the file, you use queryout and define a SQL query to be used as the basis for the export. The column list of the SELECT statement provides the vertical filter, the WHERE clause provides the horizontal filter, and the ORDER BY statement arranges the data as desired.

Importing Data with BCP Format File

When you use BCP in a controlled fashion, the process utilizes two separate files: One contains the data and the other, the format file, contains the definition of the data. You can use the format file during an import operation to help define the properties of the data being imported. Using the bcp command with a format file would look like this:

BCP ONE.dbo.CustCopy IN C:\CustTwoBCP.dat -T -f C:\BCP2.fmt

At the top of the format file are two header rows: The first identifies the SQL version (SQL Server 2005 is version 9.0), and the second contains the number of fields stored in the file. Each field is allocated a row within the format file and includes the data type, field prefix length, field size, field delimiter, field name, and collating sequence.

Microsoft recommends that you include the collating sequence for every field. During the actual import, you have limited control over the process, but before the import, you can alter the data and format files. After you start the BCP process to import the data, the process proceeds unabated.

You can have a minimal amount of control over the reactions to an import through the use of import hints. Constraints are usually ignored during BCP imports. To ensure that data conforms to constraints, you can supply the CHECK_CONSTRAINTS hint, although it significantly slows the process. Also, the server ignores insert triggers by default during import; if you want these triggers to fire, you must specify the FIRE_TRIGGERS hint.

Note

Because constraints are by default not applied and triggers do not fire, BCP activities could affect the standard data and work flows in a production environment. Make sure you take into consideration any trigger activities that should take place. Also, pre-check imported data for errors.

You can aid in improving the performance of BCP by using the ORDER hint. This hint identifies the order of the data within the data file. It has no effect on the order of the data, and it is ignored when it cannot be of any benefit. You can also use the TABLOCK hint to change the default locking behavior from row-level locking. Table level locks perform much more efficiently than row-level locks.

For large files with more than about a couple thousand rows, you should turn on the SELECT INTO/BULKCOPY option for the database or set the database recovery mode to SIMPLE. Either of these methods disables all transaction log backups while they are turned on, and you must do a full backup to get transaction log backups to work afterward. These methods affect certain operations—that is, those involving SELECT INTO and BULK COPY—by changing how transaction logging works.

If you set the recovery mode to SIMPLE, it is recommended that when you have completed your process, you turn it back to FULL and perform a full database backup.

Using BULK INSERT as an Alternative to BCP

A closely related technique to using BCP involves the use of BULK INSERT. This is really just an extension of the BCP functionality that you can use to perform bulk operations from within a query environment without having to use the command prompt.

The BULK INSERT statement implements part of BCP inside SQL Server, so it has all the speed of BCP, but an easier-to-use interface. BULK INSERT allows for the copying of a data file into a database table or view with a user-specified format. However, the BULK INSERT statement cannot copy data from SQL Server out to a data file.

Although BULK INSERT uses most of the same options as BPC, it doesn’t need to know which server to use or what security to use because you use it from within T-SQL, so it runs on that server, with the security context with which you logged in. Using BULK INSERT for importing data offers identical functionality as BCP. You could use the following, for example, to import data in a similar fashion to the way you use bcp:

BULK INSERT ONE.dbo.CustCopy
 FROM C:\CustTwoBCP.dat
 WITH FORMATFILE = C:\BCP2.fmt

Using OPENROWSET for Importing Data

You can use OPENROWSET to gain access to any OLE DB data source that will return a rowset. However, you should use it only for data that is accessed infrequently to avoid the overhead of its use. In situations in which a data source needs to be accessed repeatedly, a linked server should be defined. Besides the associated overhead, OPENROWSET does not provide the functionality of linked server definitions.

You use a format file with the OPENROWSET(BULK...) option in the same manner in which you would for a bcp or BULK INSERT operation. Here is an example:

SELECT A.* FROM OPENROWSET(BULK N'C:\CustTwoBCP.dat',
 FORMATFILE = 'C:\bcp2.fmt') AS A

In this example, the user must have appropriate permissions to access the file. When this user accesses other OLE DB data sources, he or she must provide connection information to gain access to the external data.

You can import XML data as a single document by using OPENROWSET with the SINGLE_BLOB directive. This directive causes the data to be read as a single-row, single-column result set. The data is treated as a varbinary(max) data type, which can be used as XML without conversion. This is not the same as storing the data in a field that uses the XML data type and is often a preferred technique. You could also use the SINGLE_CLOB or SINGLE_NCLOB to perform the action with character or Unicode data, which could then be cast directly into an XML variable or database column.

Large-scale importing and exporting of data are regularly recurring operations in most production environments. For repeated access to another data source, you should configure and use a linked server. For solutions that involve data extraction, transformation, and loading (ETL) on regular basis, you should use SQL Server Integration Services (SSIS), as described in the following section.

Using SQL Server Integration Services (SSIS)

Many import and export activities involve much more than reading data from one location and pushing it into another. Some form of alteration or massaging of the data is usually needed between the source and destination. In previous versions of SQL Server, this is where Data Transformation Services (DTS) came into play.

Like many of the other SQL Server features, DTS has undergone a transformation. Now known as SSIS, it has become a full-featured platform for handling data transfer. Figure 1 shows an example of a package saved in SSIS.

Figure 1. A sample SSIS package.

This package illustrates both the transactional and workflow capabilities of SSIS. Operations can be performed within transaction boundaries. If errors occur, then rollback processing and additional flow can be designed. In the event of successful execution, alternative processing can occur.

You can easily use SSIS to perform a number of common ETL activities. It can also assist in automating many of the data transfer tasks already discussed in this chapter. The following are some of the most common uses of SSIS:

Merging data from heterogeneous data stores— SSIS can be used to take data from different systems, extract the data, and then merge the data into a consistent source.
Populating data warehouses and data marts— With SSIS, updates of large amounts of data can be performed frequently.
Cleaning and standardizing data— With SSIS, abbreviations and data structures (names and addresses) can be processed to agree with corporate conventions.
Building business intelligence into data transformation— SSIS can be used for data summarization, conversion, or distribution based on data values.
Automating Data Transfers— Rarely are data transfer processes a one-time thing. SSIS packages can be stored and executed when needed. They can also be scheduled using SQL Server Agent jobs.

SSIS is a diverse tool that can be applied to almost any data movement process. Most of the mechanisms that you use to import and export data by using SQL Server Management Studio now use SSIS principles. When you select to save these data transfers, you save an SSIS package.

Exam Alert

Although SSIS is a versatile and powerful tool for importing and exporting data, it is not a particular focus of the 70-431 exam, and you are unlikely to get any questions dealing with the specifics of its use. You need to recognize only the basics of where it would be used and what it can do.

You use the SQL Server Business Intelligence Development Studio to create and edit SSIS packages. You can use it to provide a full workflow definition as well as data flow definitions and event handlers. Anyone comfortable with DTS will see many similarities in SSIS, although the tool does have a significant amount of improved functionality.

Other

Understanding Snapshot Isolation

SQL Server 2008 : Programming Objects - Implementing Triggers

SQL Server 2008 : Programming Objects - Implementing Stored Procedures

SQL Server 2008 Command-Line Utilities : The sqlservr Command-Line Utility

SQL Server 2008 Command-Line Utilities : The sqldiag Command-Line Utility

SQL Server 2008 Command-Line Utilities : The bcp Command-Line Utility

SQL Server 2008 Command-Line Utilities : The tablediff Command-Line Utility

SQL Server 2008 : Programming Objects - Implementing Functions

Surviving Changes to Columns

Surviving Changes to the Signature of a Stored Procedure