DATABASE

Transact-SQL in SQL Server 2008 : Change Data Capture (part 2) - Querying the CDC Tables

8/6/2012 6:04:34 PM

Querying the CDC Tables

After you enable change data tracking for a table, SQL Server begins capturing any data changes for the table in the Change Data Capture tables. To identify the data changes, you need to query the Change Data Capture tables. Although you can query the Change Data Capture tables directly, it is recommended that you use the CDC functions instead. The main CDC table-valued functions (TVFs) are

  • cdc.fn_cdc_get_all_changes_capture_instance

  • cdc.fn_cdc_get_net_changes_capture_instance

Note

The Change Data Capture change table and associated CDC table-valued functions created along with it constitute what is referred to as a capture instance. A capture instance is created for every source table that is enabled for CDC.

Each capture instance is given a unique name based on the schema and table names. For example, if the table named sales.products is CDC enabled, the capture instance created is named sales_products. The name of the CDC change table within the capture instance is sales_products_CT, and the names of the two associated CDC query functions are cdc.fn_cdc_get_all_changes_sales_products and cdc.fn_cdc_get_net_changes_sales_products.


Both of the CDC table-valued functions require two parameters to define the range of log sequence numbers to use as the upper and lower bounds to determine which records are to be included in the returned result set. A third required parameter, the row_filter_option, specifies the content of the metadata columns as well as the rows to be returned in the result set. Two values can be specified for the row_filter for the cdc.fn_cdc_get_all_changes_capture_instance function: "all" and "all update old".

If "all" is specified, the function returns all changes within the specified log sequence number (LSN) range. For changes due to an update operation, only the row containing the new values after the update are returned. If "all update old" is specified, the function returns all changes within the specified LSN range. For changes due to an update operation, this option returns both the before and after update copies of the row.

For the cdc.fn_cdc_get_net_changes_capture_instance function, three values can be specified for the row_filter parameter: "all", "all with mask", and "all with merge". If "all" is specified, the function returns the LSN of the final change to the row, and the operation needed to apply the change to the row is returned in the __$start_lsn and __$operation metadata columns. The __$update_mask column is always NULL. If "all with mask" is specified, the function returns the LSN of the final change to the row and the operation needed to apply the change to the row. Plus, if the __$operation equals 4 (that is, it contains the after update row values), the columns actually modified in the update are identified by the bit mask returned in the __$update_mask column.

If the "all with merge" option is passed, the function returns the LSN of the final change to the row and the operation needed to apply the change to the row. The __$operation column will have one of two values: 1 for delete and 5 to indicate that the operation needed to apply the change is either an insert or update. The column __$update_mask is always NULL.

So how do you determine what LSNs to specify to return the rows you need? Fortunately, SQL Server provides several functions to help determine the appropriate LSN values for use in querying the TVFs:

  • sys.fn_cdc_get_min_lsn— Returns the smallest LSN associated with a capture instance validity interval. The validity interval is the time interval for which change data is currently available for its capture instances.

  • sys.fn_cdc_get_max_lsn— Returns the largest LSN in the validity interval.

  • sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time— Are used to correlate LSN values with a standard time value.

  • sys.fn_cdc_increment_lsn and sys.fn_cdc_decrement_lsn— Can be used to make an incremental adjustment to an LSN value. This adjustment is sometimes necessary to ensure that changes are not duplicated in consecutive query windows.

So, before you can start querying the CDC tables, you need to generate some records in them by running some data modifications against the source tables. First, you need to run the statements in Listing 1 against the MyCustomer table to generate some records in the dbo_MyCustomer_CT Change Data Capture change table.

Listing 1. Some Data Modifications to Populate the MyCustomer CDC Capture Table
delete MyCustomer where CustomerID = 22

Insert MyCustomer (PersonID, StoreID, TerritoryID,
                   AccountNumber, rowguid, ModifiedDate)
Values (20778, null, 9,
        'AW' + RIGHT('00000000'
        + convert(varchar(8), IDENT_Current('MyCustomer')), 8),
        NEWID(),
        GETDATE())

declare @ident int
select @ident = SCOPE_IDENTITY()

update MyCustomer
set TerritoryID = 3,
    ModifiedDate = GETDATE()
where CustomerID = @ident

Now that you have some rows in the CDC capture table, you can start retrieving them. First, you need to identify the min and max LSN values to pass to the cdc.fn_cdc_get_all_changes_dbo_MyCustomer function. This can be done using the sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn functions. Listing 2 puts all these pieces together to return the records stored in the CDC capture table.

Listing 2. Querying the MyCustomer CDC Capture Table
USE AdventureWorks2008R2
GO

--declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)

-- get the first LSN for table changes
SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')

-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()

-- get all changes in the range using "all update old" parameter
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_MyCustomer
         (@from_lsn, @to_lsn, 'all update old');

GO

__$start_lsn           __$seqval              __$operation
__$update_mask CustomerID  PersonID    StoreID     TerritoryID
AccountNumber rowguid
ModifiedDate
---------------------- ---------------------- ------------
-------------- ----------- ----------- ----------- -----------
------------- ------------------------------------
-----------------------
0x00000039000014400004 0x00000039000014400002 1
0x7F           22          NULL        494         3
AW00000022    9774AED6-D673-412D-B481-2573E470B478
2008-10-13 11:15:07.263
0x00000039000014410004 0x00000039000014410003 2
0x7F           30119       20778       NULL        9
AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74
2010-04-27 22:38:44.267
0x000000390000144C0004 0x000000390000144C0002 3
0x48           30119       20778       NULL        9
AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74
2010-04-27 22:38:44.267
ccc0x000000390000144C0004 0x000000390000144C0002 4
ccc0x48           30119       20778       NULL        3
cccAW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74
ccc2010-04-27 22:38:48.263


					  

Because the option "all update old" is specified in Listing 2, all the rows in the dbo_MyCustomer_CT capture table are returned, including the deleted row, inserted row, and both the before and after copies of the row updated.

If you want to return only the final version of each row within the LSN range (and the @supports_net_changes was set to 1 when CDC was enabled for the table), you can use the cdc.fn_cdc_get_net_changes_capture_instance function, as shown in Listing 3.

Listing 3. Querying the MyCustomer CDC Capture Table for Net Changes
USE AdventureWorks2008R2
GO

--declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)

-- get the first LSN for table changes
SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')

-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()

-- get all changes in the range using "all with_merge" parameter
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
           (@from_lsn, @to_lsn, 'all with merge');

GO

__$start_lsn           __$operation __$update_mask CustomerID
PersonID    StoreID     TerritoryID AccountNumber
rowguid                              ModifiedDate
---------------------- ------------ -------------- -----------
----------- ----------- ----------- -------------
------------------------------------ -----------------------
0x00000039000014400004 1            NULL           22
NULL        494         3           AW00000022
9774AED6-D673-412D-B481-2573E470B478 2008-10-13 11:15:07.263
ccc0x000000390000144C0004 5            NULL           30119
ccc20778       NULL        3           AW00030119
ccc2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263


					  

For typical ETL-type applications, querying for change data is an ongoing process, making periodic requests for all the changes that occurred since the last request which need to be applied to the target. For these types of queries, you can use the sys.fn_cdc_increment_lsn function to determine the next lowest LSN boundary that is greater than the max LSN boundary of the previous query. To demonstrate this, let’s first execute some additional data modifications against the MyCustomer table:

Insert MyCustomer (PersonID, StoreID, TerritoryID,
                   AccountNumber, rowguid, ModifiedDate)
Values (20779, null, 12,
        'AW' + RIGHT('00000000'
        + convert(varchar(8), IDENT_Current('MyCustomer')), 8),
        NEWID(),
        GETDATE())

delete MyCustomer where CustomerID = 30119

The max LSN from the previous examples is 0x000000390000144C0004. We want to increment from this LSN to find the next set of changes. In Listing 4, you pass this value to the sys.fn_cdc_increment_lsn to set the min LSN value you’ll use with the cdc.fn_cdc_get_net_changes_dbo_MyCustomer function as the lower bound.

Listing 4. Using sys.fn_cdc_increment_lsn to Return the Net Changes to the MyCustomer CDC Capture Table Since the Last Retrieval
--declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)

-- get the Next lowest LSN after the previous Max LSN
SELECT @from_lsn = sys.fn_cdc_increment_lsn(0x000000390000144C0004)

-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()

-- get all changes in the range using "all with_merge" parameter
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
       (@from_lsn, @to_lsn, 'all with merge');

GO

__$start_lsn           __$operation __$update_mask CustomerID
PersonID    StoreID     TerritoryID AccountNumber
rowguid                              ModifiedDate
---------------------- ------------ -------------- -----------
----------- ----------- ----------- ------------- ---------------------------------
--- -----------------------
0x00000039000017D30004 5            NULL           30120
20779       NULL        12          AW00030120
CE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477
ccc0x00000039000017E50004 1            NULL           30119
ccc20778       NULL        3           AW00030119
ccc2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263


					  

If you want to retrieve the changes captured during a specific time period, you can use the sys.fn_cdc_map_time_to_lsn function, as shown in Listing 5.

Listing 5. Retrieving all Changes to MyCustomer During a Specific Time Period
DECLARE @begin_time datetime,
        @end_time datetime,
        @begin_lsn binary(10),
        @end_lsn binary(10);

SET @begin_time = '2010-04-27 22:38:48.250'
SET @end_time = '2010-04-27 23:52:36.500'

SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn
                      ('smallest greater than', @begin_time);

SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn
                     ('largest less than or equal', @end_time);

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
         (@begin_lsn, @end_lsn, 'all');
Go

__$start_lsn           __$operation __$update_mask CustomerID
PersonID    StoreID     TerritoryID AccountNumber
rowguid                              ModifiedDate
---------------------- ------------ -------------- -----------
----------- ----------- ----------- -------------
------------------------------------ -----------------------
0x000000390000144C0004 4            NULL           30119
20778       NULL        3           AW00030119
2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
ccc0x00000039000017D30004 2            NULL           30120
ccc20779       NULL        12          AW00030120
cccCE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477


					  

CDC and DDL Changes to Source Tables

One of the common challenges when capturing data changes from your source tables is how to handle DDL changes to the source tables. This can be an issue if the downstream consumer of the changes has not reflected the same DDL changes for its destination tables.

Enabling Change Data Capture on a source table in SQL Server 2008 does not prevent DDL changes from occurring. However, Change Data Capture does help to mitigate the effect on the downstream consumers by allowing the delivered result sets that are returned from the CDC capture tables to remain unchanged even as the column structure of the underlying source table changes. Essentially, the capture process responsible for populating the change table ignores any new columns not present when the source table was enabled for Change Data Capture. If a tracked column is dropped, NULL values are supplied for the column in the subsequent change entries.

However, if the data type of a tracked column is modified, the data type change is also propagated to the change table to ensure that the capture mechanism does not introduce data loss in tracked columns as a result of mismatched data types. When a column is modified, the capture process posts any detected changes to the cdc.ddl_history table. Downstream consumers of the change data from the source tables that may need to be alerted of the column changes (and make similar adjustments to the destination tables) can use the stored procedure sys.sp_cdc_get_ddl_history to identify any modifications to the source table columns.

So how do you modify the capture instance to recognize any added or dropped columns in the source table? Unfortunately, the only way to do this is to disable CDC on the table and re-enable it. However, in an active source environment where it’s not possible to suspend processing while CDC is being disabled and re-enabled, there is the possibility of data loss between when CDC is disabled and re-enabled.

Fortunately, CDC allows two capture instances to be associated with a single source table. This makes it possible to create a second capture instance for the table that reflects the new column structure. The capture process then captures changes to the same source table into two distinct change tables having two different column structures. While the original change table continues to feed current operational programs, the new change table feeds environments that have been modified to incorporate the new column data. Allowing the capture mechanism to populate both change tables in tandem provides a mechanism for smoothly transitioning from one table structure to the other without any loss of change data. When the transition to the new table structure has been fully effected, the obsolete capture instance can be removed.

Other  
  •  Transact-SQL in SQL Server 2008 : Spatial Data Types (part 3) - Spatial Data Support in SSMS
  •  Transact-SQL in SQL Server 2008 : Spatial Data Types (part 2) - Working with Geography Data
  •  Transact-SQL in SQL Server 2008 : Spatial Data Types (part 1) - Representing Spatial Data, Working with Geometry Data
  •  Transact-SQL in SQL Server 2008 : Sparse Columns
  •  SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 4) - Analysis Services CLR Support: Server-Side ADO MD.NET
  •  SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 3) - XMLA at Your Service
  •  SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 2) - OLAP Development with ADO MD.NET
  •  SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 1) - Management Studio as an MDX Client
  •  Microsoft SQL Server 2008 R2 : Using FILESTREAM Storage (part 2) - Using FILESTREAM Storage for Data Columns
  •  Microsoft SQL Server 2008 R2 : Using FILESTREAM Storage (part 1) - Enabling FILESTREAM Storage
  •  
    Video
    Top 10
    SG50 Ferrari F12berlinetta : Prancing Horse for Lion City's 50th
    The latest Audi TT : New angles for TT
    Era of million-dollar luxury cars
    Game Review : Hearthstone - Blackrock Mountain
    Game Review : Battlefield Hardline
    Google Chromecast
    Keyboards for Apple iPad Air 2 (part 3) - Logitech Ultrathin Keyboard Cover for iPad Air 2
    Keyboards for Apple iPad Air 2 (part 2) - Zagg Slim Book for iPad Air 2
    Keyboards for Apple iPad Air 2 (part 1) - Belkin Qode Ultimate Pro Keyboard Case for iPad Air 2
    Michael Kors Designs Stylish Tech Products for Women
    REVIEW
    - First look: Apple Watch

    - 3 Tips for Maintaining Your Cell Phone Battery (part 1)

    - 3 Tips for Maintaining Your Cell Phone Battery (part 2)
    Popular Tags
    Video Tutorail Microsoft Access Microsoft Excel Microsoft OneNote Microsoft PowerPoint Microsoft Project Microsoft Visio Microsoft Word Active Directory Exchange Server Sharepoint Sql Server Windows Server 2008 Windows Server 2012 Windows 7 Windows 8 Adobe Flash Professional Dreamweaver Adobe Illustrator Adobe Photoshop CorelDRAW X5 CorelDraw 10 windows Phone 7 windows Phone 8 Iphone