Querying the CDC Tables
After you enable change data
tracking for a table, SQL Server begins capturing any data changes for
the table in the Change Data Capture tables. To identify the data
changes, you need to query the Change Data Capture tables. Although you
can query the Change Data Capture tables directly, it is recommended
that you use the CDC functions instead. The main CDC table-valued
functions (TVFs) are
Note
The Change Data Capture
change table and associated CDC table-valued functions created along
with it constitute what is referred to as a capture instance. A capture instance is created for every source table that is enabled for CDC.
Each capture instance is given a unique name based on the schema and table names. For example, if the table named sales.products is CDC enabled, the capture instance created is named sales_products. The name of the CDC change table within the capture instance is sales_products_CT, and the names of the two associated CDC query functions are cdc.fn_cdc_get_all_changes_sales_products and cdc.fn_cdc_get_net_changes_sales_products.
Both of the CDC table-valued
functions require two parameters to define the range of log sequence
numbers to use as the upper and lower bounds to determine which records
are to be included in the returned result set. A third required
parameter, the row_filter_option,
specifies the content of the metadata columns as well as the rows to be
returned in the result set. Two values can be specified for the row_filter for the cdc.fn_cdc_get_all_changes_capture_instance function: "all" and "all update old".
If "all" is
specified, the function returns all changes within the specified log
sequence number (LSN) range. For changes due to an update operation,
only the row containing the new values after the update are returned. If
"all update old" is specified, the
function returns all changes within the specified LSN range. For
changes due to an update operation, this option returns both the before
and after update copies of the row.
For the cdc.fn_cdc_get_net_changes_capture_instance function, three values can be specified for the row_filter parameter: "all", "all with mask", and "all with merge". If "all"
is specified, the function returns the LSN of the final change to the
row, and the operation needed to apply the change to the row is returned
in the __$start_lsn and __$operation metadata columns. The __$update_mask column is always NULL. If "all with mask"
is specified, the function returns the LSN of the final change to the
row and the operation needed to apply the change to the row. Plus, if
the __$operation equals 4
(that is, it contains the after update row values), the columns
actually modified in the update are identified by the bit mask returned
in the __$update_mask column.
If the "all with merge"
option is passed, the function returns the LSN of the final change to
the row and the operation needed to apply the change to the row. The __$operation column will have one of two values: 1 for delete and 5 to indicate that the operation needed to apply the change is either an insert or update. The column __$update_mask is always NULL.
So how do you determine
what LSNs to specify to return the rows you need? Fortunately, SQL
Server provides several functions to help determine the appropriate LSN
values for use in querying the TVFs:
sys.fn_cdc_get_min_lsn—
Returns the smallest LSN associated with a capture instance validity
interval. The validity interval is the time interval for which change
data is currently available for its capture instances.
sys.fn_cdc_get_max_lsn— Returns the largest LSN in the validity interval.
sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time— Are used to correlate LSN values with a standard time value.
sys.fn_cdc_increment_lsn and sys.fn_cdc_decrement_lsn—
Can be used to make an incremental adjustment to an LSN value. This
adjustment is sometimes necessary to ensure that changes are not
duplicated in consecutive query windows.
So, before you can start
querying the CDC tables, you need to generate some records in them by
running some data modifications against the source tables. First, you
need to run the statements in Listing 1 against the MyCustomer table to generate some records in the dbo_MyCustomer_CT Change Data Capture change table.
Listing 1. Some Data Modifications to Populate the MyCustomer CDC Capture Table
delete MyCustomer where CustomerID = 22
Insert MyCustomer (PersonID, StoreID, TerritoryID,
AccountNumber, rowguid, ModifiedDate)
Values (20778, null, 9,
'AW' + RIGHT('00000000'
+ convert(varchar(8), IDENT_Current('MyCustomer')), 8),
NEWID(),
GETDATE())
declare @ident int
select @ident = SCOPE_IDENTITY()
update MyCustomer
set TerritoryID = 3,
ModifiedDate = GETDATE()
where CustomerID = @ident
|
Now that you have some rows in the CDC capture table, you can start retrieving them. First, you need to identify the min and max LSN values to pass to the cdc.fn_cdc_get_all_changes_dbo_MyCustomer function. This can be done using the sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn functions. Listing 2 puts all these pieces together to return the records stored in the CDC capture table.
Listing 2. Querying the MyCustomer CDC Capture Table
USE AdventureWorks2008R2
GO
--declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
-- get the first LSN for table changes
SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
-- get all changes in the range using "all update old" parameter
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_MyCustomer
(@from_lsn, @to_lsn, 'all update old');
GO
__$start_lsn __$seqval __$operation
__$update_mask CustomerID PersonID StoreID TerritoryID
AccountNumber rowguid
ModifiedDate
---------------------- ---------------------- ------------
-------------- ----------- ----------- ----------- -----------
------------- ------------------------------------
-----------------------
0x00000039000014400004 0x00000039000014400002 1
0x7F 22 NULL 494 3
AW00000022 9774AED6-D673-412D-B481-2573E470B478
2008-10-13 11:15:07.263
0x00000039000014410004 0x00000039000014410003 2
0x7F 30119 20778 NULL 9
AW00030119 2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74
2010-04-27 22:38:44.267
0x000000390000144C0004 0x000000390000144C0002 3
0x48 30119 20778 NULL 9
AW00030119 2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74
2010-04-27 22:38:44.267
ccc0x000000390000144C0004 0x000000390000144C0002 4
ccc0x48 30119 20778 NULL 3
cccAW00030119 2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74
ccc2010-04-27 22:38:48.263
|
Because the option "all update old" is specified in Listing 2, all the rows in the dbo_MyCustomer_CT capture table are returned, including the deleted row, inserted row, and both the before and after copies of the row updated.
If you want to return only the final version of each row within the LSN range (and the @supports_net_changes was set to 1 when CDC was enabled for the table), you can use the cdc.fn_cdc_get_net_changes_capture_instance function, as shown in Listing 3.
Listing 3. Querying the MyCustomer CDC Capture Table for Net Changes
USE AdventureWorks2008R2
GO
--declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
-- get the first LSN for table changes
SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
-- get all changes in the range using "all with_merge" parameter
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
(@from_lsn, @to_lsn, 'all with merge');
GO
__$start_lsn __$operation __$update_mask CustomerID
PersonID StoreID TerritoryID AccountNumber
rowguid ModifiedDate
---------------------- ------------ -------------- -----------
----------- ----------- ----------- -------------
------------------------------------ -----------------------
0x00000039000014400004 1 NULL 22
NULL 494 3 AW00000022
9774AED6-D673-412D-B481-2573E470B478 2008-10-13 11:15:07.263
ccc0x000000390000144C0004 5 NULL 30119
ccc20778 NULL 3 AW00030119
ccc2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
|
For typical ETL-type
applications, querying for change data is an ongoing process, making
periodic requests for all the changes that occurred since the last
request which need to be applied to the target. For these types of
queries, you can use the sys.fn_cdc_increment_lsn function to determine the next lowest LSN boundary that is greater than the max LSN boundary of the previous query. To demonstrate this, let’s first execute some additional data modifications against the MyCustomer table:
Insert MyCustomer (PersonID, StoreID, TerritoryID,
AccountNumber, rowguid, ModifiedDate)
Values (20779, null, 12,
'AW' + RIGHT('00000000'
+ convert(varchar(8), IDENT_Current('MyCustomer')), 8),
NEWID(),
GETDATE())
delete MyCustomer where CustomerID = 30119
The max LSN
from the previous examples is 0x000000390000144C0004. We want to
increment from this LSN to find the next set of changes. In Listing 4, you pass this value to the sys.fn_cdc_increment_lsn to set the min LSN value you’ll use with the cdc.fn_cdc_get_net_changes_dbo_MyCustomer function as the lower bound.
Listing 4. Using sys.fn_cdc_increment_lsn to Return the Net Changes to the MyCustomer CDC Capture Table Since the Last Retrieval
--declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
-- get the Next lowest LSN after the previous Max LSN
SELECT @from_lsn = sys.fn_cdc_increment_lsn(0x000000390000144C0004)
-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
-- get all changes in the range using "all with_merge" parameter
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
(@from_lsn, @to_lsn, 'all with merge');
GO
__$start_lsn __$operation __$update_mask CustomerID
PersonID StoreID TerritoryID AccountNumber
rowguid ModifiedDate
---------------------- ------------ -------------- -----------
----------- ----------- ----------- ------------- ---------------------------------
--- -----------------------
0x00000039000017D30004 5 NULL 30120
20779 NULL 12 AW00030120
CE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477
ccc0x00000039000017E50004 1 NULL 30119
ccc20778 NULL 3 AW00030119
ccc2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
|
If you want to retrieve the changes captured during a specific time period, you can use the sys.fn_cdc_map_time_to_lsn function, as shown in Listing 5.
Listing 5. Retrieving all Changes to MyCustomer During a Specific Time Period
DECLARE @begin_time datetime,
@end_time datetime,
@begin_lsn binary(10),
@end_lsn binary(10);
SET @begin_time = '2010-04-27 22:38:48.250'
SET @end_time = '2010-04-27 23:52:36.500'
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn
('smallest greater than', @begin_time);
SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn
('largest less than or equal', @end_time);
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
(@begin_lsn, @end_lsn, 'all');
Go
__$start_lsn __$operation __$update_mask CustomerID
PersonID StoreID TerritoryID AccountNumber
rowguid ModifiedDate
---------------------- ------------ -------------- -----------
----------- ----------- ----------- -------------
------------------------------------ -----------------------
0x000000390000144C0004 4 NULL 30119
20778 NULL 3 AW00030119
2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
ccc0x00000039000017D30004 2 NULL 30120
ccc20779 NULL 12 AW00030120
cccCE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477
|
CDC and DDL Changes to Source Tables
One
of the common challenges when capturing data changes from your source
tables is how to handle DDL changes to the source tables. This can be an
issue if the downstream consumer of the changes has not reflected the
same DDL changes for its destination tables.
Enabling Change Data Capture
on a source table in SQL Server 2008 does not prevent DDL changes from
occurring. However, Change Data Capture does help to mitigate the effect
on the downstream consumers by allowing the delivered result sets that
are returned from the CDC capture tables to remain unchanged even as the
column structure of the underlying source table changes. Essentially,
the capture process responsible for populating the change table ignores
any new columns not present when the source table was enabled for Change
Data Capture. If a tracked column is dropped, NULL values are supplied for the column in the subsequent change entries.
However, if the data type of a
tracked column is modified, the data type change is also propagated to
the change table to ensure that the capture mechanism does not introduce
data loss in tracked columns as a result of mismatched data types. When
a column is modified, the capture process posts any detected changes to
the cdc.ddl_history table.
Downstream consumers of the change data from the source tables that may
need to be alerted of the column changes (and make similar adjustments
to the destination tables) can use the stored procedure sys.sp_cdc_get_ddl_history to identify any modifications to the source table columns.
So how do you modify the
capture instance to recognize any added or dropped columns in the source
table? Unfortunately, the only way to do this is to disable CDC on the
table and re-enable it. However, in an active source environment where
it’s not possible to suspend processing while CDC is being disabled and
re-enabled, there is the possibility of data loss between when CDC is
disabled and re-enabled.
Fortunately, CDC allows
two capture instances to be associated with a single source table. This
makes it possible to create a second capture instance for the table that
reflects the new column structure. The capture process then captures
changes to the same source table into two distinct change tables having
two different column structures. While the original change table
continues to feed current operational programs, the new change table
feeds environments that have been modified to incorporate the new column
data. Allowing the capture mechanism to populate both change tables in
tandem provides a mechanism for smoothly transitioning from one table
structure to the other without any loss of change data. When the
transition to the new table structure has been fully effected, the
obsolete capture instance can be removed.