Prior to SQL Server 2008, finding and tuning similar
queries was a daunting task. Prior to SQL Server 2008, you would have to
capture a set of queries using a trace or Dynamic Management View
(DMV), perform some logic to replace literal values within those
queries, and then group by some or all of the query text. The concept is
to try to replace literal values with something, such as a pound sign
(for example, SELECT col1 FROM Table1 WHERE col1 = # AND col2 = #),
so you can group the queries such that the only difference is the
parameter values. This approach is very time consuming and in many cases
less than accurate. Microsoft has added the query_hash and
query_plan_hash columns to the sys.dm_exec_query_stats and the sys.dm_exec_requests DMVs to provide a far more accurate way of finding similar queries.
The sys.dm_exec_query_stats DMV returns aggregate performance statistics about query plans that are currently in cache, and the sys.dm_exec_requests DMV returns information about queries that are currently executing. A query hash is a binary hash value that is calculated on a query based on the query text prior to being sent to the optimizer. A query plan hash
is a binary hash value that is calculated based on a compiled query
plan. Two queries may have the same query hash values and different plan
hash values. For example, if you execute two queries that are identical
except for the literal values, the query hash value will be the same.
However, one query may produce an index scan, while the other may
produce an index seek in the query plan (depending on the range of
values being requested), resulting in different query plan hash values.
Now that you understand the mechanics of using hash values, let's take a look at a few examples. Run the script in Listing 1 to clear the plan cache and execute some similar queries.
Example 1. Sample Queries to Create Similar Hash Values
DBCC FREEPROCCACHE GO SELECT * FROM sys.messages WHERE severity = 23 GO SELECT * FROM sys.messages WHERE severity = 23 GO SELECT * FROM sys.messages WHERE severity = 23; GO SELECT * FROM sys.messages WHERE severity=23 GO SELECT * FROM sys.messages WHERE severity = 24 GO SELECT * FROM sys.messages WHERE severity IN (24) GO SELECT * FROM sys.messages WHERE severity in (23,24) GO
|
Now that you have executed a few similar queries, you can review the output from the sys.dm_exec_query_stats DMV. You will also use the sys.dm_exec_sql_text function to return the query text by passing in the plan handle used for the query. Run the query in Listing 2 to review the hash values generated by the queries in Listing 1.
Example 2. Query to Analyze the Hash Values Generated by Listing 1
SELECT qs.query_hash, qs.query_plan_hash, st.text , qs.execution_count, qs.total_worker_time FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text (plan_handle) st WHERE st.text like 'Select * From sys.messages%' ORDER BY qs.total_worker_time desc
|
The results from Listing 2 are displayed in Figure 1.
As you can see by the execution count, out of the seven queries we ran,
only two of them used the same cached execution plan. The query text
must be exactly the same in order for SQL Server to reuse the execution
plan. There are only minor differences in the queries we are limiting on
severity 23, but even an extra space or a semicolon causes SQL Server
to compile a separate execution plan. It would be much more efficient to
parameterize these queries so that SQL Server could make use of the
same cached execution plan for each one. If you look at the hash values
in the first two columns, you will see they all match except for the
query in the second row. The query text and query plan were both
slightly different enough to produce different hash values.
Now let's run the query in Listing 3
to aggregate the data based on the hash values to get a more accurate
view of the execution count and total worker time for the queries run in
Listing 1.
Example 3. Query to Aggregate Execution Count and Total Worker Time Using Hash Values
SELECT qs.query_hash, qs.query_plan_hash, MIN(st.text) QueryTextLike, SUM(qs.execution_count) ExecutionCount, SUM(qs.total_worker_time)TotalWorkerTime FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text (plan_handle) st WHERE st.text LIKE 'SELECT * FROM sys.messages%' GROUP BY qs.query_hash, qs.query_plan_hash ORDER BY TotalWorkerTime DESC
|
Now look at the aggregated results in Figure 2.
This shows an execution count of six for the queries that share the
same hash values. This level of aggregation could raise queries to the
top of your result set that may otherwise go unnoticed. For example, if
you are looking for the top ten queries with the highest total worker
time, and if several like queries were run hundreds of times generating
different execution plans, they may not even make the list. Using the
new plan hash feature will allow you to focus on tuning the queries
actually using the most resources on your system instead of just those
heavy hitters that are easy to find.