Transact-SQL in SQL Server 2008 : Change Tracking (part 2) - Identifying Tracked Changes, Identifying Changed Columns, Change Tracking Overhead

8/24/2012 3:05:59 AM

Identifying Tracked Changes

After Change Tracking is enabled for a table, any data modification statements that affect rows in the table cause Change Tracking information for each modified row to be recorded. To query for the rows that have changed and to obtain information about the changes, you can use the built-in Change Tracking functions.

Unless you enabled the TRACK_COLUMNS_UPDATED option, only the values of the primary key column are recorded with the change information to allow you to identify the rows that have been changed. To identify the changed rows, use the CHANGETABLE (CHANGES ...) Change Tracking function. The CHANGETABLE (CHANGES ...) function takes two parameters: the first is the table name, and the second is the last synchronization version number.

If you pass 0 for the last synchronization version parameter, you get a list of all the rows that have been modified since version 0, which means all the changes to the table since first enabling Change Tracking. Typically, however, you do not want all the rows that have changed from the beginning of Change Tracking, but only those rows that have changed since the last time you retrieved the changed rows.

Rather than having to keep track of the version numbers, you can use the CHANGE_TRACKING_CURRENT_VERSION() function to obtain the current version that will be used the next time you query for changes. The version returned represents the version of the last committed transaction.

Before an application can obtain changes for the first time, the application must first execute a query to obtain the initial data from the table and a query to retrieve the initial synchronization version using CHANGE_TRACKING_CURRENT_VERSION() function. The version number that is retrieved is passed to the CHANGETABLE(CHANGES ...) function the next time it is invoked.

The following example illustrates how to obtain the initial synchronization version and initial data set:

USE AdventureWorks2008R2
Go
declare @synchronization_version bigint

Select change_tracking_version =  CHANGE_TRACKING_CURRENT_VERSION();

-- Obtain initial data set.

select CustomerID, TerritoryID, @synchronization_version as version
 from MyCustomer
where CustomerID <= 5
go

change_tracking_version
-----------------------
0

CustomerID  TerritoryID
----------- -----------
1           1
2           1
3           4
4           4
5           4

As you can see, because no updates have been performed since Change Tracking was enabled, the initial version is 0.

Now let’s perform some updates on these rows to effect some changes:

update MyCustomer
    set TerritoryID = 5
    where CustomerID = 4

update MyCustomer
    set TerritoryID = 4
    where CustomerID = 5

Now you can use the CHANGETABLE(CHANGES ...) function to find the rows that have changed since the last version (0):

declare @last_synchronization_version bigint
set @last_synchronization_version = 0
SELECT
    CT.CustomerID as CustID, CT.SYS_CHANGE_OPERATION,
    CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM
    CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
Go

CustID SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS SYS_CHANGE_CONTEXT
------ -------------------- ------------------ ------------------
4      U                    0x0000000004000000 NULL
5      U                    0x0000000004000000 NULL

You can see in these results that this query returns the CustomerIDs of the two rows that were changed. However, most applications also want the data from these rows as well. To return the data, you can join the results from CHANGETABLE(CHANGES ...) with the data in the user table. For example, the following query joins with the MyCustomer table to obtain the values for the PersonID, StoredID, and TerritoryID columns. Note that the query uses an OUTER JOIN to make sure that the change information is returned for any rows that may have been deleted from the user table. Also, at the same time you are retrieving the data rows, you also want to retrieve the current version as well to use the next time the application comes back to retrieve the latest changes:

declare @last_synchronization_version bigint
set @last_synchronization_version = 0
select current_version = CHANGE_TRACKING_CURRENT_VERSION()

SELECT
    CT.CustomerID as CustID,
    C.PersonID,
    C.StoreID,
    C.TerritoryID,
    CT.SYS_CHANGE_OPERATION,
    CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM
    MyCustomer C
        RIGHT OUTER JOIN
    CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
        on C.CustomerID = CT.CustomerID
go

current_version
--------------------
2

CustID      PersonID    StoreID     TerritoryID
SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS SYS_CHANGE_CONTEXT
----------- ----------- ----------- -----------
-------------------- ------------------ -------------------
4           NULL        932         5
U                    0x0000000004000000 NULL
5           NULL        1026        4
U                    0x0000000004000000 NULL

You can see in the output from this query that the current version is now 2. The next time the application issues a query to identify the rows that have been changed since this query, it will pass the value of 2 as the @last_synchronization_version to the CHANGETABLE(CHANGES ...) function.

Caution

The version number is NOT specific to a table or user session. The Change Tracking version number is maintained across the entire database for all users and change tracked tables. Whenever a data modification is performed by any user on any table that has Change Tracking enabled, the version number is incremented.

For example, immediately after running an update on change tracked table A in the current application and incrementing the version to 3, another application could run an update on change tracked table B and increment the version to 4, and so on. This is why you should always capture the current version number whenever you are retrieving the latest set of changes from the change tracked tables.

If an application has not synchronized with the database in a while, the stored version number could no longer be valid if the Change Tracking retention period has expired for any row modifications that have occurred since that version. To validate the version number, you can use the CHANGE_TRACKING_MIN_VALID_VERSION() function. This function returns the minimum valid version that a client can have and still obtain valid results from CHANGETABLE(). Your client applications should check the last synchronization version obtained against the value returned by this function and if the last synchronization version is less than the version returned by this function, that version is invalid. The client application has to reinitialize all the data rows from the table. The following T-SQL code snippet can be used to validate the last_synchronization_version:

-- Check individual table.
IF (@last_synchronization_version <
    CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('MyCustomer')))
    BEGIN
      -- Handle invalid version and do not enumerate changes.
      -- Client must be reinitialized.
    END

Identifying Changed Columns

In addition to information about which rows were changed and the operation that caused the change (insert, update, or delete—reported as I, U, or D in the SYS_CHANGE_OPERATION), the CHANGETABLE(CHANGES ...) function also provides information on which columns were modified if you enabled the TRACK_COLUMNS_UPDATED option. You can use this information to determine whether any action is needed in your client application based on which columns changed.

To identify whether a specific column has changed, you can use the CHANGE_TRACKING_IS_COLUMN_IN_MASK (column_id, change_columns) function. This function interprets the SYS_CHANGE_COLUMNS bitmap value returned by the CHANGETABLE(CHANGES ...) function and returns a 1 if the column was modified or 0 if it was not:

declare @last_synchronization_version bigint
set @last_synchronization_version = 0

SELECT
    CT.CustomerID as CustID,
    TerritoryChanged = CHANGE_TRACKING_IS_COLUMN_IN_MASK
                        (COLUMNPROPERTY(OBJECT_ID('MyCustomer'),
                         'TerritoryID', 'ColumnId'),
                         CT.SYS_CHANGE_COLUMNS),
    CT.SYS_CHANGE_OPERATION,
    CT.SYS_CHANGE_COLUMNS
FROM
    CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
go

CustID      TerritoryChanged SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS
----------- ---------------- -------------------- ------------------
4           1                U                    0x0000000004000000
5           1                U                    0x0000000004000000

In the query results, you can see that both update operations (SYS_CHANGE_OPERATION = 'U') modified the TerritoryID column (TerritoryChanged = 1).

Change Tracking Overhead

Although Change Tracking has been optimized to minimize the performance overhead on DML operations, it is important to know that there are some performance overhead and space requirements within the application databases when implementing Change Tracking.

The performance overhead associated with using Change Tracking on a table is similar to the index maintenance overhead incurred for insert, update, and delete operations. For each row changed by a DML operation, a row is added to the internal Change Tracking table. The amount of overhead incurred depends on various factors, such as

The number of primary key columns
The amount of data being changed in the user table row
The number of operations being performed in a transaction
Whether column Change Tracking is enabled

Change Tracking also consumes some space in the databases where it is enabled as well. Change Tracking data is stored in the following types of internal tables:

Internal change tables— There is one internal change table for each user table that has Change Tracking enabled.
Internal transaction table— There is one internal transaction table for the database.

These internal tables affect storage requirements in the following ways:

For each change to each row in the user table, a row is added to the internal change table. This row has a small fixed overhead plus a variable overhead equal to the size of the primary key columns. The row can contain optional context information set by an application. In addition, if column tracking is enabled, each changed column requires an additional 4 bytes per row in the tracking table.
For each committed transaction, a row is added to an internal transaction table.

If you are concerned about the space usage requirements of the internal Change Tracking tables, you can determine the space they use by executing the sp_spaceused stored procedure. The internal transaction table is called sys.syscommittab. The names of the internal change tables for each table are in the form change_tracking_object_id. The following example returns the size of the internal transaction table and internal change table for the MyCustomer table:

exec sp_spaceused 'sys.syscommittab'
declare @tablename varchar(128)
set @tablename = 'sys.change_tracking_'
                  + CONVERT(varchar(16), object_id('MyCustomer'))
exec sp_spaceused @tablename

Transact-SQL in SQL Server 2008 : Change Tracking (part 1) - Implementing Change Tracking

Other

SQL Server 2005 : Report Definition and Design (part 3) - Report Builder

SQL Server 2005 : Report Definition and Design (part 2) - Report Designer

SQL Server 2005 : Report Definition and Design (part 1) - Data Sources, Report Layouts

Monitoring MySQL : Database Performance (part 2) - Database Optimization Best Practices

Monitoring MySQL : Database Performance (part 1) - Measuring Database Performance

Transact-SQL in SQL Server 2008 : Change Data Capture (part 2) - Querying the CDC Tables

Transact-SQL in SQL Server 2008 : Change Data Capture (part 1)

Transact-SQL in SQL Server 2008 : Spatial Data Types (part 3) - Spatial Data Support in SSMS

Transact-SQL in SQL Server 2008 : Spatial Data Types (part 2) - Working with Geography Data

Transact-SQL in SQL Server 2008 : Spatial Data Types (part 1) - Representing Spatial Data, Working with Geometry Data