SQL Server 2008 : Managing Query Performance - Finding Similar Queries

2/8/2011 4:04:05 PM

Prior to SQL Server 2008, finding and tuning similar queries was a daunting task. Prior to SQL Server 2008, you would have to capture a set of queries using a trace or Dynamic Management View (DMV), perform some logic to replace literal values within those queries, and then group by some or all of the query text. The concept is to try to replace literal values with something, such as a pound sign (for example, SELECT col1 FROM Table1 WHERE col1 = # AND col2 = #), so you can group the queries such that the only difference is the parameter values. This approach is very time consuming and in many cases less than accurate. Microsoft has added the query_hash and query_plan_hash columns to the sys.dm_exec_query_stats and the sys.dm_exec_requests DMVs to provide a far more accurate way of finding similar queries.

The sys.dm_exec_query_stats DMV returns aggregate performance statistics about query plans that are currently in cache, and the sys.dm_exec_requests DMV returns information about queries that are currently executing. A query hash is a binary hash value that is calculated on a query based on the query text prior to being sent to the optimizer. A query plan hash is a binary hash value that is calculated based on a compiled query plan. Two queries may have the same query hash values and different plan hash values. For example, if you execute two queries that are identical except for the literal values, the query hash value will be the same. However, one query may produce an index scan, while the other may produce an index seek in the query plan (depending on the range of values being requested), resulting in different query plan hash values.

Now that you understand the mechanics of using hash values, let's take a look at a few examples. Run the script in Listing 1 to clear the plan cache and execute some similar queries.

Example 1. Sample Queries to Create Similar Hash Values

DBCC FREEPROCCACHE
GO
SELECT * FROM sys.messages WHERE severity = 23
GO
SELECT * FROM sys.messages WHERE severity = 23
GO
SELECT * FROM sys.messages WHERE severity = 23;
GO
SELECT * FROM sys.messages WHERE severity=23
GO
SELECT * FROM sys.messages WHERE severity = 24
GO
SELECT * FROM sys.messages WHERE severity IN (24)
GO
SELECT * FROM sys.messages WHERE severity in (23,24)
GO

Now that you have executed a few similar queries, you can review the output from the sys.dm_exec_query_stats DMV. You will also use the sys.dm_exec_sql_text function to return the query text by passing in the plan handle used for the query. Run the query in Listing 2 to review the hash values generated by the queries in Listing 1.

Example 2. Query to Analyze the Hash Values Generated by Listing 1

SELECT qs.query_hash,
       qs.query_plan_hash,
       st.text ,
       qs.execution_count,
       qs.total_worker_time
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text (plan_handle) st
WHERE st.text like 'Select * From sys.messages%'
ORDER BY qs.total_worker_time desc

The results from Listing 2 are displayed in Figure 1. As you can see by the execution count, out of the seven queries we ran, only two of them used the same cached execution plan. The query text must be exactly the same in order for SQL Server to reuse the execution plan. There are only minor differences in the queries we are limiting on severity 23, but even an extra space or a semicolon causes SQL Server to compile a separate execution plan. It would be much more efficient to parameterize these queries so that SQL Server could make use of the same cached execution plan for each one. If you look at the hash values in the first two columns, you will see they all match except for the query in the second row. The query text and query plan were both slightly different enough to produce different hash values.

Figure 1. Results returned from running the query in Listing 2

Now let's run the query in Listing 3 to aggregate the data based on the hash values to get a more accurate view of the execution count and total worker time for the queries run in Listing 1.

Example 3. Query to Aggregate Execution Count and Total Worker Time Using Hash Values

SELECT qs.query_hash,
       qs.query_plan_hash,
       MIN(st.text) QueryTextLike,
       SUM(qs.execution_count) ExecutionCount,
       SUM(qs.total_worker_time)TotalWorkerTime
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text (plan_handle) st
WHERE st.text LIKE 'SELECT * FROM sys.messages%'
GROUP BY qs.query_hash, qs.query_plan_hash
ORDER BY TotalWorkerTime DESC

Now look at the aggregated results in Figure 2 . This shows an execution count of six for the queries that share the same hash values. This level of aggregation could raise queries to the top of your result set that may otherwise go unnoticed. For example, if you are looking for the top ten queries with the highest total worker time, and if several like queries were run hundreds of times generating different execution plans, they may not even make the list. Using the new plan hash feature will allow you to focus on tuning the queries actually using the most resources on your system instead of just those heavy hitters that are easy to find.