Transact-SQL in SQL Server 2008 : MERGE Statement

7/6/2011 5:49:51 PM

In versions of SQL Server prior to SQL Server 2008, if you had a set of data rows in a source table that you wanted to synchronize with a target table, you had to perform at least three operations: one scan of the source table to find matching rows to update in the target table, another scan of the source table to find nonmatching rows to insert into the target table, and a third scan to find rows in the target table not contained in the source table that needed to be deleted. SQL Server 2008, however, introduces the MERGE statement. With the MERGE statement, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table, all in just a single statement, minimizing the number of times that rows in the source and target tables need to be processed. The MERGE statement can also be used for performing conditional inserts or updates of rows in a target table from a source table.

The MERGE syntax consists of the following primary clauses:

The MERGE clause specifies the table or view that is the target of the insert, update, or delete operations.
The USING clause specifies the data source being joined with the target.
The ON clause specifies the join conditions that determine how the target and source match.
The WHEN MATCHED clause specifies either the update or delete operation to perform when rows of target table match rows in the source table and any additional search conditions.
WHEN NOT MATCHED BY TARGET specifies the insert operation when a row in the source table does not have a match in the target table.
WHEN NOT MATCHED BY SOURCE specifies the update or delete operation to perform when rows of the target table do not have matches in the source table.
The OUTPUT clause returns a row for each row in the target that is inserted, updated, or deleted.

The basic syntax of the MERGE statement is as follows:

[ WITH common_table_expression [,...n] ]
MERGE
    [ TOP ( N ) [ PERCENT ] ]
    [ INTO ] target_table  [ [ AS ] table_alias ]
    USING table_or_view_name [ [ AS ] table_alias ]
    ON merge_search_condition
    [ WHEN MATCHED [ AND search_condition ]
        THEN { UPDATE SET set_clause | DELETE } ] [ ...n ]
    [ WHEN NOT MATCHED [ BY TARGET ] [ AND search_condition ]
        THEN { INSERT [ ( column_list ) ] { VALUES ( values_list ) | DEFAULT VALUES
}} ]
    [ WHEN NOT MATCHED BY SOURCE [ AND search_condition ]
        THEN { UPDATE SET set_clause | DELETE } ] [ ...n ]
     [ OUTPUT column_name | scalar_expression
INTO { @table_variable | output_table } [ (column_list) ] ]
    [ OUTPUT column_name | scalar_expression   [ [AS] column_alias_identifier ] [
,...n ]  ] ;

The WHEN clauses specify the actions to take on the rows identified by the conditions specified in the ON clause. The conditions specified in the ON clause determine the full result set that will be operated on. Additional filtering to restrict the affected rows can be specified in the WHEN clauses. Multiple WHEN clauses with different search conditions can be specified. However, if there is a MATCH clause that includes a search condition, it must be specified before all other WHEN MATCH clauses.

Note that the MERGE command must be terminated with a semicolon (;). Otherwise, you receive a syntax error.

When you run a MERGE statement, rows in the source are matched with rows in the target based on the join predicate that you specify in the ON clause. The rows are processed in a single pass, and one insert, update, or delete operation is performed per input row depending on the WHEN clauses specified. The WHEN clauses determine which of the following matches exist in the result set:

A matched pair consisting of one row from the target and one from the source as a result of matching condition in the WHEN MATCHED clause
A row from the source that has no matching row in the target as a result of the condition specified the WHEN NOT MATCHED BY TARGET clause
A row from the target that has no corresponding row in the source as a result of the condition specified in the WHEN NOT MATCHED BY SOURCE clause

The combination of WHEN clauses specified in the MERGE statement determines the join method that SQL Server will use to process the query (see Table 1).

Table 1. Join Methods Used for WHEN Clauses
Specified WHEN Clauses	Join Method
WHEN MATCHED clause only	INNER JOIN
WHEN NOT MATCHED BY TARGET clause, but notWHEN NOT MATCHED BY SOURCE clause the	LEFT OUTER JOIN from source to target
WHEN MATCHED clause and the WHEN NOT MATCHED BY SOURCE clause, but not the WHEN NOT MATCHED BY TARGET clause	RIGHT OUTER JOIN from source to target
WHEN NOT MATCHED BY TARGET clause and the WHEN NOT MATCHED BY SOURCE clause	FULL OUTER JOIN
WHEN NOT MATCHED BY SOURCE clause only	ANTI SEMI JOIN

To improve the performance of the MERGE statement, you should make sure you have appropriate indexes to support the join columns between the source table and target table. Any additional columns in the source table index that will help to cover the query may help improve performance even more . The indexes should ensure that the join keys are unique and, if possible, sort the data in the tables in the order it will be processed so additional sort operations are not necessary. Unique indexes supporting the join conditions for the MERGE statement will improve query performance because the query optimizer does not need to perform extra validation processing to locate and update duplicate rows.

To better understand how the MERGE statement works, let’s look at an example. First, you need to set up some data in a source table. In the bigpubs2008 database, there is a table called stores. For this example, let’s assume you want to set up a new table that keeps track of each store’s inventory to support an application that can monitor each store’s inventory and send notifications when certain items run low, as well as to support the ability of each store to search other store inventories to locate rare and out-of-print books that other stores may have available. On a daily basis, each store uploads a full refresh of its current inventory to a staging table (inventory_load), which is the source table for the MERGE. You then use the inventory_load table to modify the store’s inventory in the store_inventory table (which is the target table for the MERGE operation).

First, let’s create the new store_inventory table (see Listing 1 ). Just for sake of the example, you can create and populate it with the existing data from the sales table for stor_id 'A011' and create a primary key constraint on the stor_id and title_id columns. The next step is to load the inventory_load table. Normally, in a real-world scenario, this table would likely be populated via a BULK INSERT statement or SQL Server Integration Services. However, for the sake of this example, you simply are going to create some test data by creating and populating the inventory_load table using SELECT INTO with data merged from the sales data for both stor_id 'A011' and 'A017'.

When the inventory_load table is created and populated, you can create a primary key on the stor_id and title_id columns as well to support the join with the store_inventory table.

The next step is to build out the MERGE statement. Following are the rules to be applied:

If there is a matching row between the source and target tables and the qty value is different, update the qty value in the target table to the value in the source table.
If a row in the source table doesn’t have a match in the target table, this is a new inventory item, so insert the new row to the target table.
If a row in the target table doesn’t have a matching row in the source table, that inventory item no longer exists, so delete it from the target table.

Also for the sake of the example so that you can see just what the MERGE statement ends up doing, the OUTPUT clause has been added with the $action column included. The $action column displays what operation (INSERT, UPDATE, DELETE) was performed on each row, and displays the title_id and qty values for both the source and target tables for each row processed (note that if the title_id and qty columns are NULL, that was a nonmatching row).

Listing 1. A MERGE Example

use bigpubs2008
go
if OBJECT_ID('store_inventory') is not null
    drop table store_inventory
go
-- Create and populate the store_inventory table
select stor_id, title_id, qty = SUM(qty), update_dt = GETDATE()
into store_inventory
from sales s
where stor_id = 'A011'
group by stor_id, title_id
go
-- add primary key on store_inventory to support the join to source table
alter table store_inventory add constraint PK_store_inventory primary key
  (stor_id, title_id)
Go
if OBJECT_ID('inventory_load') is not null
    drop table inventory_load
go

-- Now, create and populate the inventory_load table
select stor_id = 'A011',
          title_id,
          qty = SUM(qty)
into inventory_load
from sales s
where stor_id like 'A01[17]'
and title_id not like '%8'
group by title_id
go
— add primary key on store_inventory to support the join to target  table
alter table inventory_load add constraint PK_inventory_load primary key
  (stor_id, title_id)
go
select * from store_inventory
go

-- perform the marge, updating any matching rows with different quantities
--  adding any rows in source not in the target, and deleting any rows from the
--  target that are not in the source.
-- Output clause is specified to display the results of the MERGE
MERGE
     INTO store_inventory as s
     USING inventory_load as i
     ON s.stor_id = i.stor_id
     and s.title_id = i.title_id
     WHEN MATCHED and s.qty <> i.qty
          THEN UPDATE
              SET s.qty = i.qty,
          update_dt = getdate()
     WHEN NOT MATCHED
          THEN INSERT (stor_id, title_id, qty, update_dt)
          VALUES (i.stor_id, i.title_id, i.qty, getdate())
     WHEN NOT MATCHED BY SOURCE
          THEN DELETE
     OUTPUT $action,
     isnull(inserted.title_id, '') as src_titleid, isnull(str(inserted.qty, 5),
 '') as src_qty,
     isnull(deleted.title_id, '') as tgt_titleid, isnull(str(deleted.qty, 5),
 '') as tgt_qty
;
go
select * from store_inventory
go

If you run the script in Listing 1, you should see output like the following.

stor_id title_id qty         update_dt
------- -------- ----------- -----------------------
A011    CH0741   1452        2010-03-25 00:34:25.597
A011    CH3348   24          2010-03-25 00:34:25.597
A011    FI0324   1392        2010-03-25 00:34:25.597
A011    FI0392   1176        2010-03-25 00:34:25.597
A011    FI1552   1476        2010-03-25 00:34:25.597
A011    FI1872   540         2010-03-25 00:34:25.597
A011    FI3484   1428        2010-03-25 00:34:25.597
A011    FI3660   984         2010-03-25 00:34:25.597
A011    FI4020   1704        2010-03-25 00:34:25.597
A011    FI4970   1140        2010-03-25 00:34:25.597
A011    FI4992   180         2010-03-25 00:34:25.597
A011    FI5832   1632        2010-03-25 00:34:25.597
A011    NF8918   1140        2010-03-25 00:34:25.597
A011    PC9999   1272        2010-03-25 00:34:25.597
A011    TC7777   1692        2010-03-25 00:34:25.597

(15 row(s) affected)

$action
---------- ------ ----- ------ -----
INSERT     BU2075  1536
DELETE                  CH3348    24
INSERT     CH5390   888
INSERT     CH7553   540
INSERT     FI1950  1308
INSERT     FI2100  1104
INSERT     FI3822   996
UPDATE     FI4970  1632 FI4970  1140
INSERT     FI7040  1596
INSERT     LC8400   732
DELETE                  NF8918  1140

(11 row(s) affected)
stor_id title_id qty         update_dt
------- -------- ----------- -----------------------
A011    BU2075   1536        2010-03-25 00:54:54.547
A011    CH0741   1452        2010-03-25 00:34:25.597
A011    CH5390   888         2010-03-25 00:54:54.547
A011    CH7553   540         2010-03-25 00:54:54.547
A011    FI0324   1392        2010-03-25 00:34:25.597
A011    FI0392   1176        2010-03-25 00:34:25.597
A011    FI1552   1476        2010-03-25 00:34:25.597
A011    FI1872   540         2010-03-25 00:34:25.597
A011    FI1950   1308        2010-03-25 00:54:54.547
A011    FI2100   1104        2010-03-25 00:54:54.547
A011    FI3484   1428        2010-03-25 00:34:25.597
A011    FI3660   984         2010-03-25 00:34:25.597
A011    FI3822   996         2010-03-25 00:54:54.547
A011    FI4020   1704        2010-03-25 00:34:25.597
A011    FI4970   1632        2010-03-25 00:54:54.547
A011    FI4992   180         2010-03-25 00:34:25.597
A011    FI5832   1632        2010-03-25 00:34:25.597
A011    FI7040   1596        2010-03-25 00:54:54.547
A011    LC8400   732         2010-03-25 00:54:54.547
A011    PC9999   1272        2010-03-25 00:34:25.597
A011    TC7777   1692        2010-03-25 00:34:25.597

(21 row(s) affected)

If you examine the results and compare the before and after contents of the store_inventory, you see that eight new rows were inserted to store_inventory, two rows were deleted, and one row was updated.

MERGE Statement Best Practices and Guidelines

The MERGE statement is a great addition to the T-SQL language. It provides a concise and efficient mechanism to perform multiple operations on a table based on contents in a source table without having to resort to using a cursor or running multiple set-oriented operations against the table. However, there are some guidelines and best practices you should keep in mind to help ensure you get the best performance from your MERGE statements.

First, you should try to reduce the number of rows accessed by the MERGE statement early in the process by specifying any additional search condition to the ON clause that filters out rows that do not need to be processed. You should avoid using the conditions in the WHEN clauses as row filters. However, you need to be careful if you are using any of the WHEN NOT MATCHED clauses because the elimination of rows via the ON clause may cause unexpected and incorrect results. Because the additional search conditions specified in the ON clause are not used for matching the source and target data, they can be misapplied.

To ensure correct results are obtained, you should specify only search conditions in the ON clause that determine the criteria for matching data in the source and target tables. That is, specify only columns from the target table that are compared to the corresponding columns of the source table. Do not include comparisons to other values such as a constant.

To filter out rows from the source or target tables, you should consider using one of the following methods.

Specify the search condition for row filtering in the appropriate WHEN clause. For example, WHEN NOT MATCHED AND qty > 0 THEN INSERT....
Define a view on the source or target that returns the filtered rows and reference the view as the source or target table. If the view is used as the target, make sure the view is updateable .
Use the WITH <common table expression> clause to filter out rows from the source or target tables. However, if you are not careful, this method is similar to specifying additional search criteria in the ON clause and may produce incorrect results. You should test this approach thoroughly before implementing it .

Other

SQL Server 2008 : Transact-SQL Programming - The TABLESAMPLE Clause

SQL Server 2008 : Transact-SQL Programming - TRY...CATCH Logic for Error Handling

SQL Server 2008 : Transact-SQL Programming - The APPLY Operator

SQL Server 2008 : Transact-SQL Programming - PIVOT and UNPIVOT

SQL Server 2008 : Transact-SQL Programming - Ranking Functions

SQL Server 2008 : Transact-SQL Programming - Common Table Expressions

SQL Server 2008 : Transact-SQL Programming - The OUTPUT Clause

SQL Server 2008 : Transact-SQL Programming - TOP Enhancements

SQL Server 2008 : Transact-SQL Programming - The max Specifier

SQL Server 2008 : T-SQL Tips and Tricks (part 3) - Generating T-SQL Statements with T-SQL & De-Duping Data with Ranking Functions

Transact-SQL in SQL Server 2008 : MERGE Statement

Table 1. Join Methods Used for WHEN Clauses

Listing 1. A MERGE Example

MERGE Statement Best Practices and Guidelines