In versions of SQL Server prior to SQL Server 2008,
if you had a set of data rows in a source table that you wanted to
synchronize with a target table, you had to perform at least three
operations: one scan of the source table to find matching rows to update
in the target table, another scan of the source table to find
nonmatching rows to insert into the target table, and a third scan to
find rows in the target table not contained in the source table that
needed to be deleted. SQL Server 2008, however, introduces the MERGE statement. With the MERGE
statement, you can synchronize two tables by inserting, updating, or
deleting rows in one table based on differences found in the other
table, all in just a single statement, minimizing the number of times
that rows in the source and target tables need to be processed. The MERGE statement can also be used for performing conditional inserts or updates of rows in a target table from a source table.
The MERGE syntax consists of the following primary clauses:
The MERGE clause specifies the table or view that is the target of the insert, update, or delete operations.
The USING clause specifies the data source being joined with the target.
The ON clause specifies the join conditions that determine how the target and source match.
The WHEN MATCHED
clause specifies either the update or delete operation to perform when
rows of target table match rows in the source table and any additional
search conditions.
WHEN NOT MATCHED BY TARGET specifies the insert operation when a row in the source table does not have a match in the target table.
WHEN NOT MATCHED BY SOURCE specifies the update or delete operation to perform when rows of the target table do not have matches in the source table.
The OUTPUT clause returns a row for each row in the target that is inserted, updated, or deleted.
The basic syntax of the MERGE statement is as follows:
[ WITH common_table_expression [,...n] ]
MERGE
[ TOP ( N ) [ PERCENT ] ]
[ INTO ] target_table [ [ AS ] table_alias ]
USING table_or_view_name [ [ AS ] table_alias ]
ON merge_search_condition
[ WHEN MATCHED [ AND search_condition ]
THEN { UPDATE SET set_clause | DELETE } ] [ ...n ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND search_condition ]
THEN { INSERT [ ( column_list ) ] { VALUES ( values_list ) | DEFAULT VALUES
}} ]
[ WHEN NOT MATCHED BY SOURCE [ AND search_condition ]
THEN { UPDATE SET set_clause | DELETE } ] [ ...n ]
[ OUTPUT column_name | scalar_expression
INTO { @table_variable | output_table } [ (column_list) ] ]
[ OUTPUT column_name | scalar_expression [ [AS] column_alias_identifier ] [
,...n ] ] ;
The WHEN clauses specify the actions to take on the rows identified by the conditions specified in the ON clause. The conditions specified in the ON
clause determine the full result set that will be operated on.
Additional filtering to restrict the affected rows can be specified in
the WHEN clauses. Multiple WHEN clauses with different search conditions can be specified. However, if there is a MATCH clause that includes a search condition, it must be specified before all other WHEN MATCH clauses.
Note that the MERGE command must be terminated with a semicolon (;). Otherwise, you receive a syntax error.
When you run a MERGE statement, rows in the source are matched with rows in the target based on the join predicate that you specify in the ON
clause. The rows are processed in a single pass, and one insert,
update, or delete operation is performed per input row depending on the WHEN clauses specified. The WHEN clauses determine which of the following matches exist in the result set:
A matched pair consisting of one row from the target and one from the source as a result of matching condition in the WHEN MATCHED clause
A row from the source that has no matching row in the target as a result of the condition specified the WHEN NOT MATCHED BY TARGET clause
A row from the target that has no corresponding row in the source as a result of the condition specified in the WHEN NOT MATCHED BY SOURCE clause
The combination of WHEN clauses specified in the MERGE statement determines the join method that SQL Server will use to process the query (see Table 1).
Table 1. Join Methods Used for WHEN Clauses
Specified WHEN Clauses | Join Method |
---|
WHEN MATCHED clause only | INNER JOIN |
WHEN NOT MATCHED BY TARGET clause, but notWHEN NOT MATCHED BY SOURCE clause the | LEFT OUTER JOIN from source to target |
WHEN MATCHED clause and the WHEN NOT MATCHED BY SOURCE clause, but not the WHEN NOT MATCHED BY TARGET clause | RIGHT OUTER JOIN from source to target |
WHEN NOT MATCHED BY TARGET clause and the WHEN NOT MATCHED BY SOURCE clause | FULL OUTER JOIN |
WHEN NOT MATCHED BY SOURCE clause only | ANTI SEMI JOIN |
To improve the performance of the MERGE
statement, you should make sure you have appropriate indexes to support
the join columns between the source table and target table. Any
additional columns in the source table index that will help to cover the
query may help improve performance even more . The indexes should ensure
that the join keys are unique and, if possible, sort the data in the
tables in the order it will be processed so additional sort operations
are not necessary. Unique indexes supporting the join conditions for the
MERGE statement will improve
query performance because the query optimizer does not need to perform
extra validation processing to locate and update duplicate rows.
To better understand how the MERGE statement works, let’s look at an example. First, you need to set up some data in a source table. In the bigpubs2008 database, there is a table called stores.
For this example, let’s assume you want to set up a new table that
keeps track of each store’s inventory to support an application that can
monitor each store’s inventory and send notifications when certain
items run low, as well as to support the ability of each store to search
other store inventories to locate rare and out-of-print books that
other stores may have available. On a daily basis, each store uploads a
full refresh of its current inventory to a staging table (inventory_load), which is the source table for the MERGE. You then use the inventory_load table to modify the store’s inventory in the store_inventory table (which is the target table for the MERGE operation).
First, let’s create the new store_inventory table (see Listing 1). Just for sake of the example, you can create and populate it with the existing data from the sales table for stor_id 'A011' and create a primary key constraint on the stor_id and title_id columns. The next step is to load the inventory_load table. Normally, in a real-world scenario, this table would likely be populated via a BULK INSERT
statement or SQL Server Integration Services. However, for the sake of
this example, you simply are going to create some test data by creating
and populating the inventory_load table using SELECT INTO with data merged from the sales data for both stor_id 'A011' and 'A017'.
When the inventory_load table is created and populated, you can create a primary key on the stor_id and title_id columns as well to support the join with the store_inventory table.
The next step is to build out the MERGE statement. Following are the rules to be applied:
- If there is a matching row between the source and target tables and the qty value is different, update the qty value in the target table to the value in the source table.
- If
a row in the source table doesn’t have a match in the target table,
this is a new inventory item, so insert the new row to the target table.
- If
a row in the target table doesn’t have a matching row in the source
table, that inventory item no longer exists, so delete it from the
target table.
Also for the sake of the example so that you can see just what the MERGE statement ends up doing, the OUTPUT clause has been added with the $action column included. The $action column displays what operation (INSERT, UPDATE, DELETE) was performed on each row, and displays the title_id and qty values for both the source and target tables for each row processed (note that if the title_id and qty columns are NULL, that was a nonmatching row).
Listing 1. A MERGE Example
use bigpubs2008 go if OBJECT_ID('store_inventory') is not null drop table store_inventory go -- Create and populate the store_inventory table select stor_id, title_id, qty = SUM(qty), update_dt = GETDATE() into store_inventory from sales s where stor_id = 'A011' group by stor_id, title_id go -- add primary key on store_inventory to support the join to source table alter table store_inventory add constraint PK_store_inventory primary key (stor_id, title_id) Go if OBJECT_ID('inventory_load') is not null drop table inventory_load go
-- Now, create and populate the inventory_load table select stor_id = 'A011', title_id, qty = SUM(qty) into inventory_load from sales s where stor_id like 'A01[17]' and title_id not like '%8' group by title_id go — add primary key on store_inventory to support the join to target table alter table inventory_load add constraint PK_inventory_load primary key (stor_id, title_id) go select * from store_inventory go
-- perform the marge, updating any matching rows with different quantities -- adding any rows in source not in the target, and deleting any rows from the -- target that are not in the source. -- Output clause is specified to display the results of the MERGE MERGE INTO store_inventory as s USING inventory_load as i ON s.stor_id = i.stor_id and s.title_id = i.title_id WHEN MATCHED and s.qty <> i.qty THEN UPDATE SET s.qty = i.qty, update_dt = getdate() WHEN NOT MATCHED THEN INSERT (stor_id, title_id, qty, update_dt) VALUES (i.stor_id, i.title_id, i.qty, getdate()) WHEN NOT MATCHED BY SOURCE THEN DELETE OUTPUT $action, isnull(inserted.title_id, '') as src_titleid, isnull(str(inserted.qty, 5), '') as src_qty, isnull(deleted.title_id, '') as tgt_titleid, isnull(str(deleted.qty, 5), '') as tgt_qty ; go select * from store_inventory go
|
If you run the script in Listing 1, you should see output like the following.
stor_id title_id qty update_dt
------- -------- ----------- -----------------------
A011 CH0741 1452 2010-03-25 00:34:25.597
A011 CH3348 24 2010-03-25 00:34:25.597
A011 FI0324 1392 2010-03-25 00:34:25.597
A011 FI0392 1176 2010-03-25 00:34:25.597
A011 FI1552 1476 2010-03-25 00:34:25.597
A011 FI1872 540 2010-03-25 00:34:25.597
A011 FI3484 1428 2010-03-25 00:34:25.597
A011 FI3660 984 2010-03-25 00:34:25.597
A011 FI4020 1704 2010-03-25 00:34:25.597
A011 FI4970 1140 2010-03-25 00:34:25.597
A011 FI4992 180 2010-03-25 00:34:25.597
A011 FI5832 1632 2010-03-25 00:34:25.597
A011 NF8918 1140 2010-03-25 00:34:25.597
A011 PC9999 1272 2010-03-25 00:34:25.597
A011 TC7777 1692 2010-03-25 00:34:25.597
(15 row(s) affected)
$action
---------- ------ ----- ------ -----
INSERT BU2075 1536
DELETE CH3348 24
INSERT CH5390 888
INSERT CH7553 540
INSERT FI1950 1308
INSERT FI2100 1104
INSERT FI3822 996
UPDATE FI4970 1632 FI4970 1140
INSERT FI7040 1596
INSERT LC8400 732
DELETE NF8918 1140
(11 row(s) affected)
stor_id title_id qty update_dt
------- -------- ----------- -----------------------
A011 BU2075 1536 2010-03-25 00:54:54.547
A011 CH0741 1452 2010-03-25 00:34:25.597
A011 CH5390 888 2010-03-25 00:54:54.547
A011 CH7553 540 2010-03-25 00:54:54.547
A011 FI0324 1392 2010-03-25 00:34:25.597
A011 FI0392 1176 2010-03-25 00:34:25.597
A011 FI1552 1476 2010-03-25 00:34:25.597
A011 FI1872 540 2010-03-25 00:34:25.597
A011 FI1950 1308 2010-03-25 00:54:54.547
A011 FI2100 1104 2010-03-25 00:54:54.547
A011 FI3484 1428 2010-03-25 00:34:25.597
A011 FI3660 984 2010-03-25 00:34:25.597
A011 FI3822 996 2010-03-25 00:54:54.547
A011 FI4020 1704 2010-03-25 00:34:25.597
A011 FI4970 1632 2010-03-25 00:54:54.547
A011 FI4992 180 2010-03-25 00:34:25.597
A011 FI5832 1632 2010-03-25 00:34:25.597
A011 FI7040 1596 2010-03-25 00:54:54.547
A011 LC8400 732 2010-03-25 00:54:54.547
A011 PC9999 1272 2010-03-25 00:34:25.597
A011 TC7777 1692 2010-03-25 00:34:25.597
(21 row(s) affected)
If you examine the results and compare the before and after contents of the store_inventory, you see that eight new rows were inserted to store_inventory, two rows were deleted, and one row was updated.
MERGE Statement Best Practices and Guidelines
The MERGE
statement is a great addition to the T-SQL language. It provides a
concise and efficient mechanism to perform multiple operations on a
table based on contents in a source table without having to resort to
using a cursor or running multiple set-oriented operations against the
table. However, there are some guidelines and best practices you should
keep in mind to help ensure you get the best performance from your MERGE statements.
First, you should try to reduce the number of rows accessed by the MERGE statement early in the process by specifying any additional search condition to the ON clause that filters out rows that do not need to be processed. You should avoid using the conditions in the WHEN clauses as row filters. However, you need to be careful if you are using any of the WHEN NOT MATCHED clauses because the elimination of rows via the ON clause may cause unexpected and incorrect results. Because the additional search conditions specified in the ON clause are not used for matching the source and target data, they can be misapplied.
To ensure correct results are obtained, you should specify only search conditions in the ON
clause that determine the criteria for matching data in the source and
target tables. That is, specify only columns from the target table that
are compared to the corresponding columns of the source table. Do not
include comparisons to other values such as a constant.
To filter out rows from the source or target tables, you should consider using one of the following methods.
Specify the search condition for row filtering in the appropriate WHEN clause. For example, WHEN NOT MATCHED AND qty > 0 THEN INSERT....
Define
a view on the source or target that returns the filtered rows and
reference the view as the source or target table. If the view is used as
the target, make sure the view is updateable .
Use the WITH <common table expression>
clause to filter out rows from the source or target tables. However, if
you are not careful, this method is similar to specifying additional
search criteria in the ON
clause and may produce incorrect results. You should test this approach
thoroughly before implementing it .