SQL Server 2008 : Index analysis (part 1) - Identifying indexes to drop/disable

2/20/2013 8:37:21 PM

An unfortunate, all-too-common indexing approach is to carpet bomb the database with indexes in the hope that performance will (eventually) improve. Not only will such an approach fail, but it usually ends in tears with the accumulated performance and maintenance costs of unnecessary indexes eventually having a paralyzing effect. Fortunately, there is a much more effective approach, made all the easier using several index-related Dynamic Management Views (DMVs).

In this section, we'll approach index analysis from three angles, identifying indexes to drop, add, and defragment. Let's begin with a look at using DMVs to identify indexes that are candidates for removal.

Dynamic Management Views

Dynamic Management Views return server state information useful in diagnosing server health and tuning performance. The values returned by some DMVs, including the ones in this section, are reset when the SQL Server instance restarts. Before making any decisions that are based on DMV results, consider when the instance was last restarted to ensure the DMV results are representative of the full range of database access patterns, for example, daily, weekly, and monthly reports. Taking a database offline, or using the AUTOCLOSE option, will also reset certain DMV values. Links to various DMV scripts and further details are available at http://www.sqlCrunch.com/index.

1. Identifying indexes to drop/disable

Indexes that are either not used or used infrequently not only consume additional space, but they also lengthen maintenance routines and slow performance, given the need to keep them updated in line with the base table data. As an example, consider an index on a FirstName column in a very large customer table and a query such as select * from customers where FirstName = 'John'. A lack of understanding as to why SQL Server is unlikely to use such an index may cause a junior DBA to create it in the hope that it may improve performance. Usually, such indexes are left in place without any follow-up analysis as to whether or not the index is being used.

Before we look at techniques for removing indexes, let's cover a very important DMV, sys.dm_db_index_usage_stats, to help us in this task.

SYS.DM_DB_INDEX_USAGE_STATS

As the name implies, the sys.dm_db_index_usage_stats DMV returns information on how indexes are being used. For each index, counts are kept on the number of times the index has been scanned, updated, and used for lookup or seek purposes, since the SQL instance was last restarted.

Let's cover the important ones:

user_seeks—Each time an index is used for seek purposes, that is, navigating through levels of its B-tree, this counter is incremented. A high value here usually represents an effective index.
user_scans—When a index is scanned at the leaf level (as opposed to seeking through the B-tree), this counter is incremented.
user_lookups—Each time a nonclustered index is used for a lookup into a heap or clustered index, this counter is incremented.
user_updates—Insert, update, and delete operations on a table must maintain the appropriate indexes. Every insert and delete operation will have a corresponding action for each nonclustered index, with updates effecting certain indexes, depending on the columns that have changed. Each time an index is maintained for any of these actions, this counter is incremented.

In addition to these columns, sys.dm_db_index_usage_stats returns many others, including the last date and time for each of the actions covered previously. The major value of this DMV is using it to identify indexes that are rarely used but have a high maintenance overhead. Such indexes are candidates for removal, and we'll walk through a script that uses this DMV shortly.

Unused indexes

Like many other DMVs, sys.dm_db_index_usage_stats holds index usage stats only since the SQL instance was last started. One of the implications that can be drawn from this is that any index that does not appear in this list has not been used since the instance was started. If the instance has been online for long enough to cover the full range of access patterns—for example, daily, weekly and monthly reports—and the database is not using the AUTOCLOSE option (or has been taken offline), then an index not appearing in this DMV is unlikely to be used at all.

The script in listing 1 uses sys.dm_db_index_usage_stats along with a number of system tables to return indexes not used since the last instance restart.

Example 1. Indexes not used since last instance restart

-- Identify unused indexes (since last restart)
SELECT
    sc.name + '.' + object_name(i.object_id) as objectName
    , i.name as indexName
    , i.type_desc as indexType
FROM sys.indexes i
    INNER JOIN sys.objects o on o.object_id = i.object_id
    INNER JOIN sys.schemas sc on o.schema_id = sc.schema_id
WHERE
    objectproperty(i.object_id,'IsUserTable') = 1
    AND i.index_id not in (
        SELECT s.index_id
        FROM sys.dm_db_index_usage_stats s
        WHERE
            s.object_id = i.object_id
            AND i.index_id = s.index_id
            AND database_id = db_id()
    )
ORDER BY objectName, indexName ASC

This script uses sys.indexes, sys.objects, and sys.schemas to return index information followed by a simple not in clause to exclude those indexes in the DMV. The end result includes indexes not used since the SQL instance last started.

We also need to identify indexes whose maintenance cost outweighs their value.

High-maintenance/low-value indexes

In our next script, shown in listing 2, let's use the count values returned from sys.dm_db_index_usage_stats to compare the update count to the use count, that is, their maintenance-versus-usage ratio.

Example 2. High-maintenance/low-value indexes

-- Identify high maintenance indexes with low usage
SELECT
    sc.name + '.' + object_name(i.object_id) as objectName
    , i.name as indexName
    , user_seeks
    , user_scans
    , user_lookups
    , (user_seeks + user_scans + user_lookups) as indexReads
    , user_updates as indexWrites
    , user_updates - (user_seeks + user_scans + user_lookups) as usageDiff
FROM sys.dm_db_index_usage_stats s
    INNER JOIN sys.indexes i on i.index_id = s.index_id
    INNER JOIN sys.objects o on o.object_id = i.object_id
    INNER JOIN sys.schemas sc on o.schema_id = sc.schema_id
WHERE
    database_id = db_id()
    AND objectproperty(s.object_id,'IsUserTable') = 1
    AND i.object_id = s.object_id
    AND i.type_desc = 'NONCLUSTERED'
ORDER BY
    usageDiff DESC

By ordering the results descending on the usageDiff column, this script identifies indexes with the biggest differential between their read and write counts. In the extreme cases where indexReads is zero, the index is being maintained for no read benefits at all. Note that the previous script filters for nonclustered indexes. We can remove this condition to display information on the base table/clustered index as well, including the last accessed stats to display when a table was last used in any capacity.

The result of this script, run against the AdventureWorks database, is shown in figure 1. Note that negative values for usageDiff represent cases where the index has been used for read purposes more than for updates.

Figure 1. Large differences between the update and usage counts are indicative of high-maintenance/low-value indexes.

It's also possible for some indexes, while valuable and frequently used, to be duplicated or overlapping.

Duplicate and overlapping indexes

A duplicate index, that is, an index with the same columns, defined in the same order, presents a very easy and clear case for removal. In a similar manner, an index that shares the same leading columns can also be considered a duplicate of another index. For example, Index B on lastName, firstName is really a duplicate of Index A on lastName, firstName, dateOfBirth.

Before dropping duplicate indexes, you should take care to ensure that no existing T-SQL or stored procedure code uses the index(es) as part of an index hint. Dropping such indexes without updating referencing code will obviously result in errors when the (now nonexistent) indexes are referenced.

SQL Server 2008 : Index analysis (part 3) - Identifying index fragmentation

SQL Server 2008 : Index analysis (part 2) - Identifying indexes to add

Other

ADO.NET Programming : Microsoft SQL Server CE (part 5) - Querying Schema Information

ADO.NET Programming : Microsoft SQL Server CE (part 4) - Updating a SQL Server CE Database, The SqlCeDataAdapter Class

ADO.NET Programming : Microsoft SQL Server CE (part 3) - Retrieving and Displaying Data

ADO.NET Programming : Microsoft SQL Server CE (part 2) - SQL Server CE Query Analyzer, Creating a SQL Server CE Database, Populating a SQL Server CE Database

ADO.NET Programming : Microsoft SQL Server CE (part 1) - SQL Server CE Files, SQL Server CE Syntax

SQL Server 2008 : Index design (part 3) - Indexed views

SQL Server 2008 : Index design (part 2) - Improving nonclustered index efficiency

SQL Server 2008 : Index design (part 1) - Selecting a clustered index

SQL Server 2008 : Index design and maintenance - An introduction to indexes

USB Hard Drive Hangout (Part 2) - Toshiba Canvio plus 3.0 1.5TB, Adata Dashdrive Elite 500GB HE720