Tuning SQL Server queries can be as
much of an art as a science. However, you can use a number of tools and
methods to make tuning your queries easier. The first thing to realize
is that most of the time, when queries are running slowly, it’s because
the T-SQL code within them is incorrect or badly structured.
Frequently, queries run slowly because a well-written query is not
using an index correctly or an index is missing from the table.
Sometimes, you even run into odd bits of behavior that just require
extra work from you to speed up the query.
Regardless of the cause of the performance
problem, you’ll need a mechanism to identify what is occurring within
the T-SQL query. SQL Server provides just such a mechanism in the form
of execution plans. You’ll also need some method of retrieving query
performance data and other query information directly from SQL Server.
You can capture query execution times using Extended Events. You may
not have the time to learn all the latest methods and tricks for tuning
your system, but you’re going to want it tuned anyway. This is where
the Database Tuning Advisor comes in. These three tools—execution
plans, Extended Events, and Database Tuning Advisor—provide the means
for you to identify queries for tuning, understand what’s occurring
within the query, and automatically provide some level of tuning to the
query.
1. Understanding Execution Plans
There are two types of execution plans in SQL
Server: estimated and actual. Queries that manipulate data, also known
as Data Manipulation Language (DML) queries, are the only ones that
generate execution plans. When a query is submitted to SQL Server, it
goes through a process known as query optimization. The query
optimization process uses the statistics about the data, the indexes
inside the databases, and the constraints within and between the tables
in SQL Server to figure out the best method for accessing the data that
was defined by the query. It makes these estimates based on the
estimated cost to the system in terms of the length of time that the
query will run. The cost-based estimate that comes out of the
optimization process is the estimated execution plan. The query and the
estimated execution plan are passed to the data access engine within
SQL Server. The data access engine will, most of the time, use the
estimated execution plan to gather the data. Sometimes, it will find
conditions that cause it to request a different plan from the
optimizer. Either way, the plan that is used to access the data becomes
the actual execution plan.
Each plan is useful in its own way. The best
reason to use an estimated plan is because it doesn’t actually execute
the query involved. This means that if you have a very large query or a
query that is running for very excessive amounts of time, rather than
waiting for the query to complete its execution and an actual plan to
be generated, you can immediately generate an estimated plan. The main
reason to use actual plans is that they show some actual metrics from
the query execution as well as all the information supplied with the
estimated plan. When the data access engine gets a changed plan, you
will see the changed execution plan, not the estimated plan, when you
look at the actual execution plan.
There are a number of
possible ways to generate both estimated and actual execution plans.
There are also a number of different formats that the plans can be
generated in. These include the following:
- Graphical: This is one of the most frequently used execution
plans and one of the easiest to browse. Most of the time, you’ll be
reading this type of execution plan.
- XML: SQL Server stores and manipulates its plans as XML. It
is possible for you to get to this raw data underneath the graphical
plan when you need to do so. By itself, the XML format is extremely
difficult to read. However, it can be converted into a graphical plan
quite easily. This format for the execution plan is very handy for
sending to coworkers, consultants, or Microsoft Support when someone is
helping you troubleshoot bad performance.
- Text: The text execution plans are being phased out of SQL
Server. They can be easy to read as long as the plan is not very big,
and they are quite mobile for transmitting to others. However, since
this format is on the deprecation list for SQL Server, no time will be
spent on it here.
The easiest and most frequently used method for
generating a graphical execution plan is through the query window in
SQL Server Management Studio. Open Management Studio, connect to your
server, and right-click a database. From the context menu, select New
Query. A new query window will open. For this example, we’re using
Microsoft’s test database, AdventureWorks2008R2. Type a query into the
window that selects from a table or executes a stored procedure. Here’s
the query we’re using (salesquery.sql
in the download):
SELECT p.[Name],
soh.OrderDate,
soh.AccountNumber,
sod.OrderQty,
sod.UnitPrice
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID
JOIN Production.Product AS p
ON sod.ProductID = p.ProductID
WHERE p.[Name] LIKE 'LL%'
AND soh.OrderDate BETWEEN '1/1/2008' AND '1/6/2008' ;
You can run this query and get results. To see
the estimated execution plan, click the appropriate icon on the SQL
Editor toolbar. It’s the circled icon on the toolbar in Figure 1.
Figure 1. SQL Editor toolbar with the Display Estimated Execution Plan icon and tooltip
This will immediately open a new tab in the
results pane of the Query Editor window. On this tab will be displayed
the estimated execution plan. Figure 2 shows the estimated execution plan.
Figure 2. Estimated execution plan
The first thing to note is that the query was
not executed. Instead, the query was passed to the optimizer inside SQL
Server, and the output of the optimizer, this execution plan, was
returned. There’s a lot of information to understand on this execution
plan. At the top of Figure 3,
you see the text “Query 1: Query cost (relative to the batch): 100%.”
When there is more than one statement inside a query, meaning two SELECT
statements, a SELECT
statement and an INSERT
statement, and so on, each of the individual statements within the
query batch will show its estimated cost to the entire batch. In this
case, there’s only one query in the batch, so it takes 100 percent of
the cost. Just below that, the text of the query is listed. Next,
printed in green, is Missing Index information. This will only be
visible if the optimizer has identified a potential missing index. In
some instances, the optimizer can recognize that an index may improve
performance. When it does, it will return that information with the
execution plan. Immediately below this is the graphical execution plan.
A graphical plan consists of icons representing operations, or
operators, within the query and arrows connecting these operations. The
arrows present the flow of data from one operator to the next.
There is a lot more to be seen within the
execution plan, but instead of exploring the estimated plan in detail,
we’ll drill down on the actual execution plan. To enable the actual
execution plan, refer to Figure 1.
Use the icon second from the right of the figure to enable the display
of actual execution plans. When you click it, nothing will happen, but
it’s a switch. It will stay selected. Now execute the query. When the
query completes, the result set and/or the Messages tab will be
displayed as it normally would. In addition, the Execution Plan tab is
visible. Click that, and you will see something similar to Figure 3.
Figure 3. Actual execution plan, including operator order
The numbers displayed to the right of each of
the operators were added and will be explained a little later. You’ll
see that this actual execution plan looks more or less identical to the
estimated execution plan shown in Figure 1.
In lots of instances, the statistics on the indexes and data within the
database are good enough that the estimated plan will be the same as
the actual execution plan. There are, however, large differences not
immediately visible, but we’ll get to those later.
Graphical execution plans show two different
flows of information. They are displayed in a manner that defines the
logical flow of data; there is a SELECT
statement that
has to pull information from a hash match operator, and so on. The
physical flow of information is read from the top, right, and then down
and to the left. But you have to take into account that some operations
are being fed from other operators. We’ve shown the sequence that this
particular execution plan is following through the numbers to the right
of the operators. The first operator in sequence is the Clustered Index
Scan operator at the top of the execution plan. This particular
operator represents the reads necessary from the clustered index,
detailed on the graphical plan, SalesOrderheader.PK_SaleOrderHeaderId
.
You can see a number below that: “Cost: 51%.” That number represents
the optimizer’s estimates of how much this operator will cost, compared
to all the other operations in the execution plan. But it’s not an
actual number; it represents the number of seconds that the optimizer
estimates this operation will take. These estimates are based on the
statistics that the optimizer deals with and the data returned by
preceding operations. When this varies, and it does frequently, these
estimated costs will be wrong. However, they don’t change as the
execution plan changes. The output from the Clustered Index Scan
is represented by the thin little arrow pointing to operator 5. That arrow represents the rows of data coming out of the Clustered Index Scan
operator. The size of the arrow is emblematic of the number of rows being moved. Because the next operation, Hash Match
, relies on two feeds of data, you must resolve the feed before resolving the Hash Match
operator. That’s why you then move back over to the right to find operation 2, Index Seek
. The output from 2, Index Seek
, feeds into 4, the Nested Loop
operator. Since the Nested Loop
operator has two feeds, you again must find the source of the other feed, which is 3, the other Index Seek
operator. Operations 2 and 3 combine in operation 4, and then output
from operation 4 combines with that of operation 1 inside operation 5.
The final output goes to operation 6, the SELECT
statement.
It can sound daunting and
possibly even confusing to explain how the data flows from one operator
to the next, but the arrows representing the rows of data should help
show the order. There are more than 100 different operations and
operators, so we won’t detail them here. In this instance, the
operators that are taking multiple feeds represent the JOIN
operations within the query that combines the data from multiple
tables. A lot more information is available within the graphical
execution plan. If you hover over an operator with the mouse pointer,
you’ll get a tooltip displaying details about the operator. Figure 4 shows the tooltip for Nested Loops (Inner Join)
.
Figure 4. Nested Loops tooltip
The tooltip gives you a
lot more information about the operator. Each of the operator tooltips
is laid out in roughly the same way, although the details will vary.
You can see a description at the top window that names the operator and
succinctly describes what it does and how it works. Next is a listing
of measurements about the operation. In these measurements, you can
begin drilling down on the operators to understand what each individual
operator is doing and how well it’s doing it. You can see some of the
differences between the estimated and actual execution plans here. Near
the top of Figure 4
is the measurement Actual Number of Rows and a value of 2207. Just
below halfway down is the Estimated Number of Rows measurement and a
value of 2576.3. This means that although there are an estimated 2576.3
rows being returned, the actual number of rows is a slightly less, at
2207. At the bottom of the tooltip are details about the operator: the
output, the input, or the objects on which the operator is working. In
this case, it was the output of the operator and the references used to
do the loop join. When you move the mouse again, the tooltip closes.
You can also get a tooltip about the rows of information. Again, hover over the arrow instead of the operator. Figure 5 shows an example of the data flow tooltip.
Figure 5. Data flow tooltip
The data flow tooltip just shows the
information you see. The actual plan shows the number of rows in
addition to the estimated rows available in the estimated execution
plan. It’s useful to see how much data is being moved through the query.
Even more details about the operators are
available. Right-click one of the operators, and select Properties from
the context menu to open a properties view similar to the one shown in Figure 6.
Figure 6. Execution plan operator properties
A lot of the information available on the
tooltip is repeated here in the properties, and the Properties window
has even more information available. You can open the pieces of data
that have a plus sign next to them to get more information. All this is
available to you so you can understand what’s happening within a query.
However, getting to the information from the graphical execution plan a
little bit of work. If you want to deal with nothing but raw data, you
need to look at the XML execution plan.
There are a few ways to generate an XML
execution plan, but since even the graphical execution plans we’ve been
working with have XML behind the scenes, it’s possible to simply use a
graphical plan to generate XML. Right-click inside the execution plan,
and select Show Execution Plan XML. This will open a new window. Figure 7 shows a partial representation of an XML execution plan.
Figure 7. The XML execution plan
All the information available through
the graphical part of the plan and from the properties of the plan is
available within the XML. Unfortunately, XML is somewhat difficult to
read. Primarily, you’ll use the XML plans as a means to transmit the
execution plan to coworkers or support personnel.