One of the most powerful and beneficial features of
SQL Server is its query optimizer. Part of what we are paying for when
we purchase SQL Server is the expertise of the SQL Server product
engineers in the form of the query optimizer. The optimizer uses a
number of internal mechanisms to determine what statistically appears to
be the most efficient way to execute each statement. It generates a
plan for the statement’s execution based on its determinations and then
hands the plan off to be executed. The majority of the time we are
better off letting the optimizer do its job and to just stay out of its
way. There are times, however, when for one or more reasons we may need
to either offer guidance to the optimizer or completely override it.
This section discusses Hints, the Query Governor, and the new Resource
Governor. These are tools that we can use to either guide or control the
query optimizer, to limit which statements it is allowed to execute
based on their cost, or to limit what resources are available to those
statements as they execute.
Understanding Hints
Hints are syntax
elements we can use to override the optimizer’s behavior and force
certain behaviors that the optimizer may not otherwise choose
automatically. It is the query optimizer’s job to determine the most
efficient way of executing your statements. It has a number of factors
to consider as it goes through the optimization process. Before the
optimizer can optimize our statements, it first needs to understand the
intent of the query. It does this by normalizing the syntax you submit
into a more concise internal representation. It is this normalization of
our syntax that makes it possible for us to write statements in ways
that make sense to us rather
than requiring us to adhere to specific syntax coding rules in order to
optimize performance. Once the optimizer has standardized the syntax
into a more normal form, it can then create an optimal execution plan
for it.
As the optimizer works for
the most efficient query plan, it deals with the estimated cost of the
query. The cost is a number made up of both I/O and CPU costs. The
optimizer uses statistics about the data in our tables and indexes. The
statistics are calculated periodically by the data engine and stored as
metadata in the database. Because the statistics are an approximate
model of the data, they can sometimes be misleading. That leaves a small
chance that the optimizer can be misguided by those statistics and
actually come up with a plan that is not optimal. It is in those rare
cases that we may consider overriding the optimizer with hints mixed
into our syntax to guide or override its behavior. There are three
different types of hints: Query Hints, Join Hints, and Table Hints. We
will discuss them here.
Tip
Hints are to be used
only in exceptional circumstances when creating production code. For
that reason, unless a question is asking you specifically about how to
do a hint, the answer with a hint in it is likely the wrong answer.
Using Query Hints
Query hints are included in your statements as part of the OPTION clause. The OPTION clause can be used in all of the DML statements—SELECT, INSERT, UPDATE, DELETE, and MERGE. The OPTION clause is always the last clause in the statement, with the exception of the INSERT statement where it is very near the end. The OPTION clause takes the format :
OPTION (<query_hint>[,...n])
We talked about a specific query hint in the Common Table Expression section when we talked about the MAXRECURSION option. You may recall from that discussion that the default MAXRECURSION value is 100, but we can override it with the MAXRECURSION
query option. Typically, we would use that to allow hierarchies that
are deeper than 100 levels to be processed. We could of course also use
it to ensure that a query did not go deeper than a certain level by setting it to the desired depth. Here is an example of a recursive query with the MAXRECURSION option included:
USE AdventureWorks2008;
GO
WITH OrgChart (OrganizationNode,LoginID,MLoginID) AS
(SELECT OrganizationNode, LoginID, CAST(NULL AS nvarchar(256))
FROM HumanResources.Employee
WHERE OrganizationNode = HierarchyID::GetRoot()
UNION ALL
SELECT E.OrganizationNode, E.LoginID, M.LoginID
FROM HumanResources.Employee AS E
JOIN OrgChart AS M
ON E.OrganizationNode.GetAncestor(1) = M.OrganizationNode)
SELECT * FROM OrgChart
OPTION(MAXRECURSION 4);
You may be tempted to try this query out yourself. Go ahead and give it a try! If you do, try it a second time with a MAXRECURSION value of 3 rather than 4 and see what happens. The important thing to take from the example is the position and syntax of the OPTION clause. A number of query hints can be specified in the OPTION
clause, far too many for us to discuss in detail. You should review the
SQL Server 2008 documentation regarding query hints. We will cover them
at a high level, however.
One more comment about the OPTION clause should be discussed before we list some of the hints. That is that the OPTION clause affects the entire statement. There can also only be one OPTION clause per statement. If the statement is a UNION, the OPTION clause needs to be stated at the end of the last SELECT statement. Now let’s take a quick look at some of the options.
The GROUP, UNION, and JOIN hints allow us to specify the mechanism by the optimizer to affect GROUP BY, UNION, and JOIN operations. The FAST hint allows us to request that the query be optimized for a certain number of rows to be returned first. The FORCE ORDER hint allows us to require that the optimizer use tables of multiple joins in the same order we specify them. The KEEP PLAN, KEEPFIXED PLAN, and RECOMPILE
hints allow us to reduce the frequency of, disable, or require the
recompilation of the plan for a given statement. There is even a USE PLAN
hint that allows us to specify the precise plan we would like the
optimizer to use. Of course, if we state a hint that makes the query
unexecutable, SQL Server will raise an error (Error 8622). Again, you
should review the SQL Server 2008 documentation regarding the use of
query hints, which presents more complete explanations of the hints and
examples of their use.
Remember that we only get one OPTION clause per statement, and the OPTION
clause affects the entire statement. If we need a way to more
explicitly override the behavior of different tables or views referenced
in a statement, then we can use table hints.
Using Table Hints
Table hints can be used in
our syntax to indicate specific behaviors on a table-by-table basis.
They are much more specific than query hints in that they target a
single table in a statement rather than the entire statement. As with
query hints, table hints can be used in SELECT, INSERT, UPDATE, DELETE, and MERGE statements. Table hints are specified in a WITH
clause that is stated immediately after each table reference in a
statement. Note that SQL Server will permit you to state table hints
without using the keyword WITH,
but only in some cases. This style of syntax, however, has been
deprecated, meaning that it will no longer be supported in future
versions. For that reason we should always include the WITH keyword in our table hints. The WITH clause has the following format:
WITH ( <table_hint> [ [,]...n ])
The available table hints fall into different categories; see Table 1 for a list of categories and their associated table hints.
Table 1. Table Hint Categories
Hint Category | Table Hints |
---|
Index Usage |
|
Lock Contention |
|
Lock Grain | NOLOCK, ROWLOCK,PAGLOCK, TABLOCK
|
Lock Mode |
|
Transaction Isolation Levels | READUNCOMMITTED,READCOMMITTED, READCOMMITTEDLOCK, REPEATABLEREAD, SERIALIZABLE, HOLDLOCK
|
As you can see, there are
a number of table hints. As with the section on query hints, you should
refer to the SQL Server 2008 documentation for a complete explanation
of each of them with examples for their usage. However, we can look at a
couple of them here.
One hint that can be a useful tool when we are testing indexes and their benefits is the INDEX
hint. This hint allows us to specify a specific index or set of indexes
we want the optimizer to use as it accesses data in the given table.
This tool should generally be used during index implementation and
development. Rarely is it a wise choice to implement query hints in
production code. There are exceptions to that rule, but they require
significant testing and frequent review. Rather than overriding the
optimizer, we should make sure it has the tools it needs (see the
sidebar “Indexes and Statistics”) to help it make better decisions. With
that disclaimer out of the way, let’s look at an example. Figure 1 shows two queries and their execution plans. The two queries are identical except for the table hint in the second query.
SQL
Server uses indexes in much the same way that we as humans use indexes
in a book. Indexes can help SQL Server to find just the data it needs to
service our queries. Books, however, usually only have a single index,
where tables can have numerous indexes. The optimizer will evaluate the
indexes that exist in a table as part of its optimization process to
determine if any of them would be helpful in processing the statement.
The optimizer evaluates
the indexes based on their statistics. The statistics are a measure of
the distribution of data in the index. It tells SQL Server if there are a
lot of unique values in the index or if there are a lot of duplicate
values. The more unique values there are in the index, the more likely
it can be used to find a few unique values out of the many. As the data
in our tables change, we need to make sure the statistics of the data
are being updated as well. We can use the Auto Create Statistics and
Auto Update Statistics database options as well as the CREATE STATISTICS and UPDATE STATISTICS statements to ensure the optimizer has current statistics about our data.
|
If you read the execution plan information of the screenshot shown in Figure 2, you will see some interesting things. First of all, remember that the first query is the one where we did not
include a hint. Its plan then represents the plan that the optimizer
chose on its own. The second query is the one where we forced the
optimizer to use index(0). Index 0
is the table itself. In other words, we requested that the optimizer
scan the entire table (or clustered index in this case) rather than
navigating the index to find just the rows it needed. The execution plan
shows that the optimizer’s plan was 21% the total cost of the batch,
whereas our query with the forced table scan was 79% of the total batch
cost. That means that the optimizer’s plan was cheaper and therefore the
better choice.
What is important to take away from the example shown in Figure 2
is how the table hint was stated and how it did in fact have an effect
on how the query was optimized. You should refer to the SQL Server 2008
documentation to review the other table hints and their purposes and to
see examples of their use.