DATABASE

Transact-SQL in SQL Server 2008 : Change Data Capture (part 1)

8/6/2012 6:03:16 PM
In SQL Server 2008, Microsoft introduced a new feature called Change Data Capture (CDC), which is designed to make it much easier and less resource intensive to identify and retrieve changed data from tables in an online transaction processing (OLTP) database. In a nutshell, CDC captures and records INSERT, UPDATE, and DELETE activity in an OLTP database and stores it in a form that is easily consumed by an application, such as a SQL Server Integration Services (SSIS) package.

In the past, capturing data changes for your tables for auditing or extract, transform, and load (ETL) purposes required using replication, time stamp columns, triggers, complex queries, or expensive third-party tools. None of these other methods are easy to implement, and many of them use a lot of server resources, negatively affecting the performance of the OLTP server.

Change Data Capture provides for a more efficient mechanism for capturing the data changes in a table.

Note

Change Data Capture is available only in the SQL Server 2008 Developer, Enterprise, and Datacenter Editions.


The source of change data for Change Data Capture is the SQL Server transaction log. As inserts, updates, and deletes are applied to tables, entries that describe those changes are added to the transaction log. When Change Data Capture is enabled for a database, a SQL Server Agent capture job is created to invoke the sp_replcmds system procedure. This procedure is an internal server function and is the same mechanism used by transactional replication to harvest changes from the transaction log.

Note

If replication is already enabled for the database, the transactional log reader used for replication is also used for CDC. This strategy significantly reduces log contention when both replication and Change Data Capture are enabled for the same database.


The principal task of the Change Data Capture process is to scan the log and identify changes to data rows in any tables configured for Change Data Capture. As these changes are identified, the process writes column data and transaction-related information to the Change Data Capture tables. The changes can then be read from these change tables to be applied as needed.

The Change Data Capture Tables

When CDC is enabled for a database and one or more tables, an associated Change Data Capture table is created for each table being monitored. The Change Data Capture tables are used to store the changes made to the data in corresponding source tables, along with some metadata used to track the changes. By default, the name of the CDC change table is schemaname_tablename_CT and is based on the name of the source table.

The first five columns of a Change Data Capture change table are metadata columns and contain additional information relevant to the recorded change:

  • __$start_lsn— Identifies the commit log sequence number (LSN) assigned to the change. This value can be used to determine the order of the transactions.

  • __$end_lsn— Is currently not used and in SQL Server 2008 is always NULL.

  • __$seqval— Can be used to order changes that occur within the same transaction.

  • __$operation— Records the operation associated with the change: 1 = delete, 2 = insert, 3 = update before image(delete), and 4 = update after image(insert)

  • __$update_mask— Is a variable bit mask with one defined bit for each captured column to identify what columns were changed. For insert and delete entries, the update mask always has all bits set. Update rows have the bits set only for the columns that were modified.

The remaining columns in the Change Data Capture change table are identical to the columns from the source table in name and type and are used to store the column data gathered from the source table when an insert, update, or delete operation is performed on the table.

For every row inserted into the source table, a single row a single row is inserted into the change table, and this row contains the column values inserted into the source table. Every row deleted from the source table is also inserted as a single row into the change table but contains the column values in the row before the delete operation. An update operation is captured as a delete followed by an insert, so two rows are captured for each update: one row entry to capture the column values before the update, and a second row entry to capture the column values after the update.

In addition to the Change Data Capture tables, the following Change Data Capture metadata tables are also created:

  • cdc.change_tables— Contains one row for each change table in the created when Change Data Capture is enabled on a source table.

  • cdc.index_columns— Contains one row for each index column used by Change Data Capture to uniquely identify rows in the source table. By default, this is the column of the primary key of the source table, but a different unique index on the source table can be specified when Change Data Capture is enabled on the source table. A primary key or unique index is required on the source table only if Net Change Tracking is enabled.

  • cdc.captured_columns— Contains one row for each column tracked in each source table. By default, all columns of the source table are captured, but you can include or exclude columns when enabling Change Data Capture for a table by specifying a column list.

  • cdc.ddl_history— Contains a row for each Data Definition Language (DDL) change made to any table enabled for Change Data Capture. You can use this table to determine when a DDL change occurred on a source table and what the change was.

  • cdc.lsn_time_mapping— Contains a row for each transaction stored in a change table and is used to map between log sequence number (LSN) commit values and the actual time the transaction was committed.

Although you can query the Change Data Capture tables directly, it is not recommended. Instead, you should use the Change Data Capture functions, which are discussed later.

All these objects associated with a CDC instance are created in the special schema called cdc when Change Data Capture is enabled for a database.

Enabling CDC for a Database

Before you can begin capturing data changes for a table, you must first enable the database for Change Data Capture. You do this by running the stored procedure sys.sp_cdc_enable_db within the desired database context. When a database is enabled for Change Data Capture, the cdc schema, cdc user, metadata tables, as well as the system functions, are used to query for change data.

Note

To determine whether a database is already enabled for CDC, you can check the value in the is_cdc_enabled column in the sys.databases catalog view. A value of 1 indicates that CDC is enabled for the specified database.


The following SQL code enables CDC for the AdventureWorks2008R2 database and then checks that CDC is enabled by querying the sys.databases catalog view:

use AdventureWorks2008R2
go
exec  sys.sp_cdc_enable_db
go
select is_cdc_enabled
from sys.databases
where name = 'AdventureWorks2008R2'
go

is_cdc_enabled
--------------
1

Note

Although the examples presented here are run against the AdventureWorks2008R2 database, they can also be run against the AdventureWorks2008 database. However, you should be aware that some of the column values displayed may not be exactly the same.


Enabling CDC for a Table

When the database is enabled for Change Data Capture, you can use the sys.sp_cdc_enable_table stored procedure to enable a Change Data Capture instance for any tables in that database. The sp_cdc_enable_Table stored procedure supports the following parameters:

  • @source_schema— Specifies the name of the schema in which the source table resides.

  • @source_name— Specifies the name of the source table.

  • @role_name— Indicates the name of the database role used to control access to Change Data Capture tables. If this parameter is set to NULL, no role is used to limit access to the change data. If the specified role does not exist, SQL Server creates a database role with the specified name.

  • @capture_instance— Specifies the name of the capture instance used to name the instance-specific Change Data Capture objects. By default, this is the source schema name plus the source table name in the format schemaname_sourcename. A source table can have a maximum of two capture instances.

  • @supports_net_changes— Is set to 1 or 0 to indicate whether support for querying for net changes is to be enabled for this capture instance. If this parameter is set to 1, the source table must have a defined primary key, or an alternate unique index must be specified for the @index_name parameter.

  • @index_name— Specifies the name of a unique index to use to uniquely identify rows in the source table.

  • @captured_column_list— Specifies the source table columns to be included in the change table. By default, all columns are included in the change table.

  • @filegroup_name— Specifies the filegroup to be used for the change table created for the capture instance. If this parameter is NULL or not specified, the default filegroup is used. If possible, it is recommended you create a separate filegroup from your source tables for the Change Data Capture change tables.

  • @allow_partition_switch— Indicates whether the SWITCH PARTITION command of ALTER TABLE can be executed against a table that is enabled for Change Data Capture. The default is 1 (enabled). If any partition switches occur, Change Data Capture does not track the changes resulting from the switch. This causes data inconsistencies when the change data is consumed.

The @source_schema, @source_name, and @role_name parameters are the only required parameters. All the others are optional and apply default values if not specified.

To implement basic change data tracking for a table, let’s first create a copy of the Customer table to play around with:

select * into MyCustomer from Sales.Customer
alter table MyCustomer add Primary key (CUstomerID)

Now, to enable CDC on the MyCustomer table, you can execute the following:

EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name   = N'MyCustomer',
@role_name     = NULL

Note

If this is the first time you are enabling CDC for a table in the database, you may see the following messages, which indicate that SQL Server is enabling the SQL Agent jobs to begin capturing the data changes in the database:

Job 'cdc.AdventureWorks2008R2_capture' started successfully.
Job 'cdc.AdventureWorks2008R2_cleanup' started successfully.

The Capture job that is created generally runs continuously and is used to move changed data to the CDC tables from the transaction log. The Cleanup job runs on a scheduled basis to remove older data from the CDC tables so that they don’t grow too large. By default, it automatically removes data that is more than three days old. The properties of these jobs can be viewed and modified using the sys.sp_cdc_help_jobs and sys.sp_cdc_change_job procedures, respectively.


To determine whether or not a source table has been enabled for Change Data Capture, you can query the is_tracked_by_cdc column in the sys.tables catalog view for that table:

select is_tracked_by_cdc
from sys.tables
where name = 'MyCustomer'
go

is_tracked_by_cdc
-----------------
1

Tip

To get information on which tables are configured for CDC and what the settings for each are, you can execute the sys.sp_cdc_help_change_data_capture stored procedure. It reports the name and ID of the source and Change Tracking tables, the CDC table properties, the columns included in the capture, and the date the CDC was enabled/created for the source table.

Other  
  •  Transact-SQL in SQL Server 2008 : Spatial Data Types (part 3) - Spatial Data Support in SSMS
  •  Transact-SQL in SQL Server 2008 : Spatial Data Types (part 2) - Working with Geography Data
  •  Transact-SQL in SQL Server 2008 : Spatial Data Types (part 1) - Representing Spatial Data, Working with Geometry Data
  •  Transact-SQL in SQL Server 2008 : Sparse Columns
  •  SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 4) - Analysis Services CLR Support: Server-Side ADO MD.NET
  •  SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 3) - XMLA at Your Service
  •  SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 2) - OLAP Development with ADO MD.NET
  •  SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 1) - Management Studio as an MDX Client
  •  Microsoft SQL Server 2008 R2 : Using FILESTREAM Storage (part 2) - Using FILESTREAM Storage for Data Columns
  •  Microsoft SQL Server 2008 R2 : Using FILESTREAM Storage (part 1) - Enabling FILESTREAM Storage
  •  
    Video
    Top 10
    SG50 Ferrari F12berlinetta : Prancing Horse for Lion City's 50th
    The latest Audi TT : New angles for TT
    Era of million-dollar luxury cars
    Game Review : Hearthstone - Blackrock Mountain
    Game Review : Battlefield Hardline
    Google Chromecast
    Keyboards for Apple iPad Air 2 (part 3) - Logitech Ultrathin Keyboard Cover for iPad Air 2
    Keyboards for Apple iPad Air 2 (part 2) - Zagg Slim Book for iPad Air 2
    Keyboards for Apple iPad Air 2 (part 1) - Belkin Qode Ultimate Pro Keyboard Case for iPad Air 2
    Michael Kors Designs Stylish Tech Products for Women
    REVIEW
    - First look: Apple Watch

    - 3 Tips for Maintaining Your Cell Phone Battery (part 1)

    - 3 Tips for Maintaining Your Cell Phone Battery (part 2)
    Popular Tags
    Video Tutorail Microsoft Access Microsoft Excel Microsoft OneNote Microsoft PowerPoint Microsoft Project Microsoft Visio Microsoft Word Active Directory Exchange Server Sharepoint Sql Server Windows Server 2008 Windows Server 2012 Windows 7 Windows 8 Adobe Flash Professional Dreamweaver Adobe Illustrator Adobe Photoshop CorelDRAW X5 CorelDraw 10 windows Phone 7 windows Phone 8 Iphone