2. Identifying indexes to add
The analysis and potential removal of duplicate,
unused, or infrequently used indexes can be periodically scheduled as a
weekly or monthly maintenance task. In contrast, the addition of
indexes is usually the result of a performance-tuning exercise, often
started to investigate the sudden or gradual performance decline in one
or more queries.
When a query is submitted to SQL Server for
execution, the query optimizer determines the ideal index structures for
executing the query. If such indexes exist, they're used; otherwise, a
suboptimal plan is created with the details of the missing/ideal indexes
stored for later reference. Such details can be accessed using one of
two methods: the XML query plan and Dynamic Management Views.
A common performance problem is related to the
absence of foreign key indexes. In almost all cases, an index on a
foreign key column will improve performance. For example, in an orders
table, there is typically a foreign key to a customers table, with a
very common query being something like select * from orders where customerId = 623901. In this example, an index on the customerId foreign key would most likely be beneficial.
|
XML QUERY PLANS
In its simplest form, the missing-indexes feature can be used for individual queries by inspecting the <MissingIndexes>
element of the XML execution plan of a query executed in SQL Server
Management Studio. As an example, let's imagine an application change
was recently made to support searching AdventureWorks orders by a
partial string match on the PurchaseOrderNumber column. Such a query
would look like this:
-- Generate a missing index event
SELECT OrderDate, CustomerID, SubTotal, SalesPersonID
FROM Sales.SalesOrderHeader
WHERE PurchaseOrderNumber like 'PO166%'
With no clustered or nonclustered index on the
PurchaseOrderNumber column, this query is forced to use a clustered
index scan. Let's execute the query after running SET STATISTICS XML ON. The results are shown in figure 2.
Note the XML that's returned below the result
set. Clicking on the XML will open the graphical query execution plan.
Right-click the resultant plan and choose Show Execution Plan XML. An
example of the resultant XML is shown in figure 3.
The part of the XML plan we're interested in is highlighted. The <MissingIndexes>
element contains the details of the ideal index that the query
optimizer identified as being missing. In this particular case, the
suggested index contains the PurchaseOrderNumber column as an index key
with the other columns as included columns. Further, the Impact attribute of the <MissingIndexGroup> element estimates a 91.9624 percent improvement in query performance if this index is in place.
Using this information, the create statement for the suggested index would be created as follows:
CREATE NONCLUSTERED INDEX ixSalesOrderHeader_PurchaseOrderNumber
ON Sales.SalesOrderHeader(PurchaseOrderNumber)
INCLUDE (OrderDate, CustomerID, SubTotal, SalesPersonID)
Rerunning the query with the above index in
place changes the execution plan from a clustered index scan to a
nonclustered index seek without the need for key lookups (courtesy of
the included columns). Running the query with SET STATISTICS IO ON confirms the reduction of page reads and the estimated 91 percent improvement suggested in the XML we saw earlier.
Now, this does not mean we should run out and
immediately add this index. The recommendation for this particular
example is made in isolation from the rest of the workload on the
server. If this was a one-off ad hoc query, then adding this index would
likely result in a net decrease in performance, given its maintenance
requirements and the fact that it would be used infrequently. However,
if this is a common query, then perhaps it's a candidate for addition.
In SQL Server versions prior to 2005, query
execution plans could be viewed in text or graphical format. Graphical
plans are nice, except when they span more than one screen and you need
to send them to someone for analysis. On the other hand, text plans are
sometimes difficult to read, especially complex plans. XML plans offer
the best of both worlds. They can be saved as XML files for
distribution, and when viewed in SQL Server Management studio, they're
shown graphically with a right-click option for displaying the XML.
Further, the XML can be inspected for details on missing indexes and
viewing the compiled-versus-runtime parameter values for assistance in
diagnosing parameter-sniffing problems.
|
In essence, XML query plans allow us to access
missing index information for a particular query. While this is
valuable, what would be really nice would be the ability to look back on
previous production workload to identify all of the missing indexes.
This is possible using the sys.dm_db_missing_index DMVs.
SYS.DM_DB_MISSING_INDEX DMVS
As we discussed earlier, when the query
optimizer uses a suboptimal query plan, it records the details of the
missing indexes that it believes are optimal for the query. In addition
to viewing these details for a particular query using the XML plans, we
can access these details for all queries since the SQL instance last restarted through the sys.dm_db_missing_index DMVs. There are four DMVs in this group:
sys.dm_db_missing_index_details—This
DMV returns one record for each missing index and contains columns for
suggested key and included columns. Each missing index is identified
with an index_handle.
sys.dm_db_missing_index_groups—This
DMV acts as a mapping between sys.dm_db_missing_index_details and
sys.dm_db_missing_index_group_stats. It contains two columns:
index_handle and group_handle.
sys.dm_db_missing_index_group_stats—Each
time the same index is identified by the query optimizer as missing,
its potential value grows. This DMV stores the accumulated statistics on
the index's potential value. Identified with group_handle, the
accumulated stats can be traced back to the details of the missing index
via the sys.dm_db_missing_index_groups DMV.
sys.dm_db_missing_index_columns—This
dynamic management function takes an index_handle as a parameter and
returns the missing index's columns in a table format with a column
indicating the suggested index column type.
To understand how each of these DMVs operates, let's use our example query from earlier:
SELECT OrderDate, CustomerID, SubTotal, SalesPersonID
FROM Sales.SalesOrderHeader
WHERE PurchaseOrderNumber like 'PO166%'
After restarting our SQL instance to clear the
contents of the DMVs, we'll run this query, along with inspecting the
results of the four DMVs. There are a couple of points to note about
what we've done here. First, like the sys.dm_db_index_usage_stats DMV,
these DMVs are cleared with each instance restart, so any decision based
on their content should be made after the instance has been up and
running for a period of time that covers the full range of activity.
Second, the value of the sys.dm_db_missing_ index_group_stats DMV is
that it accumulates statistics on the potential value of a missing index
after repeat misses. In our case, we've missed the index only once
since the instance was started, so the figures shown for this DMV are
for a single miss. So with these points in mind, let's examine the
output from the DMVs, shown in figure 4.
First, you'll note that the equality column for
sys.dm_db_missing_index_details is empty. This is because there are no
equality predicates in our query, for example, where x=10. We used like,
so this appears as an inequality column. Second, note that
sys.dm_db_missing_index_columns is a function that takes an index_handle
as an input and returns index columns in a tabular fashion along with
the column usage (equality, inequality, include). In contrast,
sys.dm_db_missing_index_details returns one row per missing index.
The real value of these DMVs is when the
instance has been up and running for a period of time. This allows us to
take advantage of the accumulated statistics from the
sys.dm_db_missing_index_group_stats DMV.
avg_total_user_cost—This column represents the reduction in query cost if the suggested index was present.
avg_user_impact—This represents the estimated percentage improvement of the query with the index present.
user_seeks and user_scans—Each
time the query is executed without the suggested index, these values
increment to represent the number of times the index could have been
used in either a seek or scan capacity.
When selecting from the
sys.dm_db_missing_index_group_stats DMV, we can order by the above
columns to show the indexes with the greatest potential for improvement.
To simulate this, we can rerun our SalesOrderHeader query multiple times and then query the DMV ordering by the above columns. Listing 1 shows the code for this. Note the GO 10 command after the SalesOrderHeader
query, used to execute the query 10 times. In addition to this query,
we execute another query on the DatabaseLog table, which also generates a
missing index event. This allows us to view multiple missing indexes
ordered by potential value.
Example 1. Simulating activity to view missing indexes by priority
-- Simulate a weighted query workload by using GO 10 for 1 statement
SELECT OrderDate, CustomerID, SubTotal, SalesPersonID
FROM Sales.SalesOrderHeader
WHERE PurchaseOrderNumber like 'PO166%'
GO 10
SELECT DatabaseUser, TSQL
FROM dbo.DatabaseLog
WHERE Event = 'ALTER_TABLE'
GO
|
Listing 2 shows the code to view the missing index information generated by the above two queries, weighted by potential value.
Example 2. Missing index DMV query with potential weighting
-- View missing indexes Weighted by potential value
SELECT
sc.name + '.' + OBJECT_NAME(details.object_id)
, details.equality_columns
, details.inequality_columns
, details.included_columns
, stats.avg_total_user_cost
, stats.user_seeks
, stats.avg_user_impact
, stats.avg_total_user_cost * stats.avg_user_impact
* (stats.user_seeks + stats.user_scans) as potential
FROM sys.dm_db_missing_index_group_stats stats
INNER JOIN sys.dm_db_missing_index_groups groups
ON stats.group_handle = groups.index_group_handle
INNER JOIN sys.dm_db_missing_index_details details
ON details.index_handle = groups.index_handle
INNER JOIN sys.objects o
ON o.object_id = details.object_id
INNER JOIN sys.schemas sc
ON o.schema_id = sc.schema_id
ORDER BY potential desc
|
The calculated potential column takes into
account the columns from the sys.dm_db_missing_index_group_stats that we
covered earlier. By executing the query on the SalesOrderHeader table
10 times (using the GO 10 command), the potential value of the
missing index on this table is increased by virtue of the user_seeks
column, together with the avg_total_user_cost and avg_user_impact
columns. The output from the query in listing 6 is shown in figure 5.
The potential value for the missing indexes is
calculated by multiplying the sum of seeks/scans against the user cost
and impact columns. Because the SalesOrderHeaderquery was
executed 11 times, its potential weighting is greater compared to a
single execution of the query on the DatabaseLog table.
The full value of the missing index DMVs can be
exploited as part of a regular maintenance routine. For example, weekly
checks on index usage to remove duplicate/unused indexes could be
combined with checks on the potential for missing indexes using a query
similar to the previous one.
Despite the power of the missing index DMVs,
there are some limitations. The
major ones are as follows:
A maximum of 500 missing indexes will be kept.
The DMVs will be cleared when the SQL instance is restarted or the table definition changes.
While
the missing indexes report the columns to include (key and included
columns), they don't suggest a column order for the index.
Certain index types, including filtered indexes, clustered indexes, and indexed views, are not suggested.
The Database Engine Tuning Advisor, covered next, addresses some of these shortcomings.
Database Engine Tuning Advisor
The Database Engine Tuning Advisor,
accessible in the Performance Tools folder of the Microsoft SQL Server
2008 program group, analyzes workload from either a T-SQL file or SQL
Trace file/table. T-SQL files are typically used to analyze a limited
set of commands, whereas SQL traces, are used to analyze workload over a period of time, for example, a 24-hour period covering typical production activity.
Once the workload input has been specified,
options can be configured to limit the scope of recommendations that the
tuning advisor will consider. Figure 6 shows the broad range of options available for consideration.
As shown in figure 6,
the analysis options are far greater than what can be achieved using
the missing index DMVs that we covered earlier. Further, the analysis
can be based on a wide range of production activity captured in a trace
file (or table) for future analysis.
Once the workload input has been selected and a
target database chosen, analysis can begin. When the analysis is
complete, recommendations can be viewed, an example of which is shown in
figure 7
One of the clear advantages of using the tuning
advisor over other index analysis methods is the ease with which the
recommendations can be turned into actual T-SQL code for implementation.
As shown in figure 7,
you can click the Definition column to display the appropriate T-SQL
code to effect the recommendation. Further, the tool can be used for
recommendations on indexes to drop as well as those to add.
As per the recommendations from the missing
index DMVs, you should carefully consider indexes suggested by the
tuning advisor before implementing them. Among others, considerations
include these: