programming4us
programming4us
DATABASE

Transact-SQL in SQL Server 2008 : MERGE Statement

- How To Install Windows Server 2012 On VirtualBox
- How To Bypass Torrent Connection Blocking By Your ISP
- How To Install Actual Facebook App On Kindle Fire
7/6/2011 5:49:51 PM
In versions of SQL Server prior to SQL Server 2008, if you had a set of data rows in a source table that you wanted to synchronize with a target table, you had to perform at least three operations: one scan of the source table to find matching rows to update in the target table, another scan of the source table to find nonmatching rows to insert into the target table, and a third scan to find rows in the target table not contained in the source table that needed to be deleted. SQL Server 2008, however, introduces the MERGE statement. With the MERGE statement, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table, all in just a single statement, minimizing the number of times that rows in the source and target tables need to be processed. The MERGE statement can also be used for performing conditional inserts or updates of rows in a target table from a source table.

The MERGE syntax consists of the following primary clauses:

  • The MERGE clause specifies the table or view that is the target of the insert, update, or delete operations.

  • The USING clause specifies the data source being joined with the target.

  • The ON clause specifies the join conditions that determine how the target and source match.

  • The WHEN MATCHED clause specifies either the update or delete operation to perform when rows of target table match rows in the source table and any additional search conditions.

  • WHEN NOT MATCHED BY TARGET specifies the insert operation when a row in the source table does not have a match in the target table.

  • WHEN NOT MATCHED BY SOURCE specifies the update or delete operation to perform when rows of the target table do not have matches in the source table.

  • The OUTPUT clause returns a row for each row in the target that is inserted, updated, or deleted.

The basic syntax of the MERGE statement is as follows:

[ WITH common_table_expression [,...n] ]
MERGE
[ TOP ( N ) [ PERCENT ] ]
[ INTO ] target_table [ [ AS ] table_alias ]
USING table_or_view_name [ [ AS ] table_alias ]
ON merge_search_condition
[ WHEN MATCHED [ AND search_condition ]
THEN { UPDATE SET set_clause | DELETE } ] [ ...n ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND search_condition ]
THEN { INSERT [ ( column_list ) ] { VALUES ( values_list ) | DEFAULT VALUES
}} ]
[ WHEN NOT MATCHED BY SOURCE [ AND search_condition ]
THEN { UPDATE SET set_clause | DELETE } ] [ ...n ]
[ OUTPUT column_name | scalar_expression
INTO { @table_variable | output_table } [ (column_list) ] ]
[ OUTPUT column_name | scalar_expression [ [AS] column_alias_identifier ] [
,...n ] ] ;



The WHEN clauses specify the actions to take on the rows identified by the conditions specified in the ON clause. The conditions specified in the ON clause determine the full result set that will be operated on. Additional filtering to restrict the affected rows can be specified in the WHEN clauses. Multiple WHEN clauses with different search conditions can be specified. However, if there is a MATCH clause that includes a search condition, it must be specified before all other WHEN MATCH clauses.

Note that the MERGE command must be terminated with a semicolon (;). Otherwise, you receive a syntax error.

When you run a MERGE statement, rows in the source are matched with rows in the target based on the join predicate that you specify in the ON clause. The rows are processed in a single pass, and one insert, update, or delete operation is performed per input row depending on the WHEN clauses specified. The WHEN clauses determine which of the following matches exist in the result set:

  • A matched pair consisting of one row from the target and one from the source as a result of matching condition in the WHEN MATCHED clause

  • A row from the source that has no matching row in the target as a result of the condition specified the WHEN NOT MATCHED BY TARGET clause

  • A row from the target that has no corresponding row in the source as a result of the condition specified in the WHEN NOT MATCHED BY SOURCE clause

The combination of WHEN clauses specified in the MERGE statement determines the join method that SQL Server will use to process the query (see Table 1).

Table 1. Join Methods Used for WHEN Clauses
Specified WHEN ClausesJoin Method
WHEN MATCHED clause onlyINNER JOIN
WHEN NOT MATCHED BY TARGET clause, but notWHEN NOT MATCHED BY SOURCE clause the LEFT OUTER JOIN from source to target
WHEN MATCHED clause and the WHEN NOT MATCHED BY SOURCE clause, but not the WHEN NOT MATCHED BY TARGET clauseRIGHT OUTER JOIN from source to target
WHEN NOT MATCHED BY TARGET clause and the WHEN NOT MATCHED BY SOURCE clauseFULL OUTER JOIN
WHEN NOT MATCHED BY SOURCE clause onlyANTI SEMI JOIN

To improve the performance of the MERGE statement, you should make sure you have appropriate indexes to support the join columns between the source table and target table. Any additional columns in the source table index that will help to cover the query may help improve performance even more . The indexes should ensure that the join keys are unique and, if possible, sort the data in the tables in the order it will be processed so additional sort operations are not necessary. Unique indexes supporting the join conditions for the MERGE statement will improve query performance because the query optimizer does not need to perform extra validation processing to locate and update duplicate rows.

To better understand how the MERGE statement works, let’s look at an example. First, you need to set up some data in a source table. In the bigpubs2008 database, there is a table called stores. For this example, let’s assume you want to set up a new table that keeps track of each store’s inventory to support an application that can monitor each store’s inventory and send notifications when certain items run low, as well as to support the ability of each store to search other store inventories to locate rare and out-of-print books that other stores may have available. On a daily basis, each store uploads a full refresh of its current inventory to a staging table (inventory_load), which is the source table for the MERGE. You then use the inventory_load table to modify the store’s inventory in the store_inventory table (which is the target table for the MERGE operation).

First, let’s create the new store_inventory table (see Listing 1). Just for sake of the example, you can create and populate it with the existing data from the sales table for stor_id 'A011' and create a primary key constraint on the stor_id and title_id columns. The next step is to load the inventory_load table. Normally, in a real-world scenario, this table would likely be populated via a BULK INSERT statement or SQL Server Integration Services. However, for the sake of this example, you simply are going to create some test data by creating and populating the inventory_load table using SELECT INTO with data merged from the sales data for both stor_id 'A011' and 'A017'.

When the inventory_load table is created and populated, you can create a primary key on the stor_id and title_id columns as well to support the join with the store_inventory table.

The next step is to build out the MERGE statement. Following are the rules to be applied:

  • If there is a matching row between the source and target tables and the qty value is different, update the qty value in the target table to the value in the source table.

  • If a row in the source table doesn’t have a match in the target table, this is a new inventory item, so insert the new row to the target table.

  • If a row in the target table doesn’t have a matching row in the source table, that inventory item no longer exists, so delete it from the target table.

Also for the sake of the example so that you can see just what the MERGE statement ends up doing, the OUTPUT clause has been added with the $action column included. The $action column displays what operation (INSERT, UPDATE, DELETE) was performed on each row, and displays the title_id and qty values for both the source and target tables for each row processed (note that if the title_id and qty columns are NULL, that was a nonmatching row).

Listing 1. A MERGE Example
use bigpubs2008
go
if OBJECT_ID('store_inventory') is not null
drop table store_inventory
go
-- Create and populate the store_inventory table
select stor_id, title_id, qty = SUM(qty), update_dt = GETDATE()
into store_inventory
from sales s
where stor_id = 'A011'
group by stor_id, title_id
go
-- add primary key on store_inventory to support the join to source table
alter table store_inventory add constraint PK_store_inventory primary key
(stor_id, title_id)
Go
if OBJECT_ID('inventory_load') is not null
drop table inventory_load
go

-- Now, create and populate the inventory_load table
select stor_id = 'A011',
title_id,
qty = SUM(qty)
into inventory_load
from sales s
where stor_id like 'A01[17]'
and title_id not like '%8'
group by title_id
go
— add primary key on store_inventory to support the join to target table
alter table inventory_load add constraint PK_inventory_load primary key
(stor_id, title_id)
go
select * from store_inventory
go

-- perform the marge, updating any matching rows with different quantities
-- adding any rows in source not in the target, and deleting any rows from the
-- target that are not in the source.
-- Output clause is specified to display the results of the MERGE
MERGE
INTO store_inventory as s
USING inventory_load as i
ON s.stor_id = i.stor_id
and s.title_id = i.title_id
WHEN MATCHED and s.qty <> i.qty
THEN UPDATE
SET s.qty = i.qty,
update_dt = getdate()
WHEN NOT MATCHED
THEN INSERT (stor_id, title_id, qty, update_dt)
VALUES (i.stor_id, i.title_id, i.qty, getdate())
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT $action,
isnull(inserted.title_id, '') as src_titleid, isnull(str(inserted.qty, 5),
'') as src_qty,
isnull(deleted.title_id, '') as tgt_titleid, isnull(str(deleted.qty, 5),
'') as tgt_qty
;
go
select * from store_inventory
go



If you run the script in Listing 1, you should see output like the following.

stor_id title_id qty         update_dt
------- -------- ----------- -----------------------
A011 CH0741 1452 2010-03-25 00:34:25.597
A011 CH3348 24 2010-03-25 00:34:25.597
A011 FI0324 1392 2010-03-25 00:34:25.597
A011 FI0392 1176 2010-03-25 00:34:25.597
A011 FI1552 1476 2010-03-25 00:34:25.597
A011 FI1872 540 2010-03-25 00:34:25.597
A011 FI3484 1428 2010-03-25 00:34:25.597
A011 FI3660 984 2010-03-25 00:34:25.597
A011 FI4020 1704 2010-03-25 00:34:25.597
A011 FI4970 1140 2010-03-25 00:34:25.597
A011 FI4992 180 2010-03-25 00:34:25.597
A011 FI5832 1632 2010-03-25 00:34:25.597
A011 NF8918 1140 2010-03-25 00:34:25.597
A011 PC9999 1272 2010-03-25 00:34:25.597
A011 TC7777 1692 2010-03-25 00:34:25.597

(15 row(s) affected)

$action
---------- ------ ----- ------ -----
INSERT BU2075 1536
DELETE CH3348 24
INSERT CH5390 888
INSERT CH7553 540
INSERT FI1950 1308
INSERT FI2100 1104
INSERT FI3822 996
UPDATE FI4970 1632 FI4970 1140
INSERT FI7040 1596
INSERT LC8400 732
DELETE NF8918 1140

(11 row(s) affected)
stor_id title_id qty update_dt
------- -------- ----------- -----------------------
A011 BU2075 1536 2010-03-25 00:54:54.547
A011 CH0741 1452 2010-03-25 00:34:25.597
A011 CH5390 888 2010-03-25 00:54:54.547
A011 CH7553 540 2010-03-25 00:54:54.547
A011 FI0324 1392 2010-03-25 00:34:25.597
A011 FI0392 1176 2010-03-25 00:34:25.597
A011 FI1552 1476 2010-03-25 00:34:25.597
A011 FI1872 540 2010-03-25 00:34:25.597
A011 FI1950 1308 2010-03-25 00:54:54.547
A011 FI2100 1104 2010-03-25 00:54:54.547
A011 FI3484 1428 2010-03-25 00:34:25.597
A011 FI3660 984 2010-03-25 00:34:25.597
A011 FI3822 996 2010-03-25 00:54:54.547
A011 FI4020 1704 2010-03-25 00:34:25.597
A011 FI4970 1632 2010-03-25 00:54:54.547
A011 FI4992 180 2010-03-25 00:34:25.597
A011 FI5832 1632 2010-03-25 00:34:25.597
A011 FI7040 1596 2010-03-25 00:54:54.547
A011 LC8400 732 2010-03-25 00:54:54.547
A011 PC9999 1272 2010-03-25 00:34:25.597
A011 TC7777 1692 2010-03-25 00:34:25.597

(21 row(s) affected)



If you examine the results and compare the before and after contents of the store_inventory, you see that eight new rows were inserted to store_inventory, two rows were deleted, and one row was updated.

MERGE Statement Best Practices and Guidelines

The MERGE statement is a great addition to the T-SQL language. It provides a concise and efficient mechanism to perform multiple operations on a table based on contents in a source table without having to resort to using a cursor or running multiple set-oriented operations against the table. However, there are some guidelines and best practices you should keep in mind to help ensure you get the best performance from your MERGE statements.

First, you should try to reduce the number of rows accessed by the MERGE statement early in the process by specifying any additional search condition to the ON clause that filters out rows that do not need to be processed. You should avoid using the conditions in the WHEN clauses as row filters. However, you need to be careful if you are using any of the WHEN NOT MATCHED clauses because the elimination of rows via the ON clause may cause unexpected and incorrect results. Because the additional search conditions specified in the ON clause are not used for matching the source and target data, they can be misapplied.

To ensure correct results are obtained, you should specify only search conditions in the ON clause that determine the criteria for matching data in the source and target tables. That is, specify only columns from the target table that are compared to the corresponding columns of the source table. Do not include comparisons to other values such as a constant.

To filter out rows from the source or target tables, you should consider using one of the following methods.

  • Specify the search condition for row filtering in the appropriate WHEN clause. For example, WHEN NOT MATCHED AND qty > 0 THEN INSERT....

  • Define a view on the source or target that returns the filtered rows and reference the view as the source or target table. If the view is used as the target, make sure the view is updateable .

  • Use the WITH <common table expression> clause to filter out rows from the source or target tables. However, if you are not careful, this method is similar to specifying additional search criteria in the ON clause and may produce incorrect results. You should test this approach thoroughly before implementing it .

Other  
 
programming4us
 
 
programming4us