SQL Server 2008 : Index design (part 2) - Improving nonclustered index efficiency

2/17/2013 6:45:21 PM

2. Improving nonclustered index efficiency

The accumulated cost of random I/O involved in key/RID lookups often leads to nonclustered indexes being ignored in favor of sequential I/O with clustered index scans. To illustrate this and explore options for avoiding the key lookup process, let's walk through a number of examples using the Person. Contact table in the sample AdventureWorks database. In demonstrating how SQL Server uses different indexes for different queries, we'll view the graphical execution plans, which use different icons, as shown in figure 1, to represent different actions (lookups, scans, seeks, and so forth).

Seek vs. scan

Several important terms are used when discussing index usage. An index seek is used when the query optimizer chooses to navigate through the levels of a clustered or nonclustered index B-tree to quickly reach the appropriate leaf level pages. In contrast, an index scan, as the name suggests, scans the leaf level, left to right, one page at a time.

Figure 1. Common icons used in graphical execution plans

The Person.Contact table, as defined below (abbreviated table definition), contains approximately 20,000 rows. For the purposes of this test, we'll create a nonunique, nonclustered index on the LastName column:

-- Create a contact table with a nonclustered index on LastName
CREATE TABLE [Person].[Contact](
   , [Title] [nvarchar](8) NULL
   , [FirstName] [dbo].[Name] NOT NULL
   , [LastName] [dbo].[Name] NOT NULL
   , [EmailAddress] [nvarchar](50) NULL

CREATE NONCLUSTERED INDEX [ixContactLastName] ON [Person].[Contact]
   ([LastName] ASC)


For our first example, let's run a query to return all contacts with a LastName starting with C:

-- Statistics indicate too many rows for an index lookup
FROM Person.Contact
WHERE LastName like 'C%'

Despite the presence of a nonclustered index on LastName, which in theory could be used for this query, SQL Server correctly ignores it in favor of a clustered index scan. If we execute this query in SQL Server Management Studio using the Include Actual Execution Plan option (Ctrl+M, or select from the Query menu), we can see the graphical representation of the query execution, as shown in figure 2.

Figure 2. A clustered index scan is favored for this query in place of a nonclustered index seek plus key lookup.

No great surprises here; SQL Server is performing a clustered index scan to retrieve the results. Using an index hint, let's rerun this query and force SQL Server to use the ixContactLastName index:

-- Force the index lookup with an index hint
FROM Person.Contact WITH (index=ixContactLastName)
WHERE LastName like 'C%'

Looking at the graphical execution plan, we can confirm that the index is being used, as per figure 3.

Figure 3. Adding an index hint to the previous query results in an index seek plus key lookup.

On a small database such as AdventureWorks, the performance difference between these two methods is negligible; both complete in under a second. To better understand how much slower the index lookup method is, we can use the SET STATISTICS IO option, which returns disk usage statistics[] alongside the query results. Consider the script in listing 12.

[] Not to be confused with index statistics, query statistics refer to disk usage, such as the number of pages read from buffer or physical disk reads.

Example 1. Comparing query execution methods
-- Compare the Disk I/O with and without an index lookup


FROM Person.Contact
WHERE LastName like 'C%'


FROM Person.Contact with (index=ixContactLastName)
WHERE LastName like 'C%'

This script will run the query with and without the index hint. Before each query, we'll clear the buffer cache using DBCC DROPCLEANBUFFERS to eliminate the memory cache effects. The STATISTICS IO option will produce, for each query, the number of logical, physical, and read-ahead pages, defined as follows:

  • Logical Reads—Represents the number of pages read from the data cache.

  • Physical Reads—If the required page is not in cache, it will be read from disk. It follows that this value will be the same or less than the Logical Reads counter.

    Figure 4. Forcing a nonclustered index seek plus key lookup significantly increases the number of pages read.
  • Read Ahead Reads—The SQL Server storage engine uses a performance optimization technique called Read Ahead, which anticipates a query's future page needs and prefetches those pages from disk into the data cache. In doing so, the pages are available in cache when required, avoiding the need for the query to wait on future physical page reads.

So with these definitions in mind, let's look at the STATISTICS IO output in figure 4.

These statistics make for some very interesting reading. Note the big increase in logical reads (3326 versus 569) for the second query, which contains the (index= ixContactLastName) hint. Why such a big increase? A quick check of sys.dm_ db_index_physical_stats, reveals there are only 570 pages in the table/clustered index. This is consistent with the statistics from the query that used the clustered index scan. So how can the query using the nonclustered index read so many more pages? The answer lies in the key lookup.

What's actually occurring here is that a number of clustered index pages are being read more than once. In addition to reading the nonclustered index pages for matching records, each key lookup reads pages from the clustered index to compete the query. In this case, a number of the key lookups are rereading the same clustered index page. Clearly a single clustered index scan is more efficient, and SQL Server was right to ignore the nonclustered index.

Let's move on to look at an example where SQL Server uses the nonclustered index without any index hints:

FROM Person.Contact
WHERE LastName like 'Carter%'

The graphical execution plan for this query is shown in figure 5, and it confirms the index is being used.

We can see that of the overall query cost, 98 percent is the key lookup. Eliminating this step will derive a further performance increase. You'll note that in our queries so far we've been using select *; what if we reduced the required columns for the query to only those actually required and included them in the index? Such an index is called a covering index.

Figure 5. This particular query uses the nonclustered index without any hints. Note the major cost of the query is the key lookup at 98 percent.

Covering indexes

Let's assume we actually need only FirstName, LastName, and EmailAddress. If we created a composite index containing these three columns, the key lookup wouldn't be required. Let's modify the index to include the columns and rerun the query:

-- Create a covering index
DROP INDEX [ixContactLastName] ON [Person].[Contact]
CREATE NONCLUSTERED INDEX [ixContactLastName] ON [Person].[Contact]
   [LastName] ASC
   , [FirstName] ASC
   , [EmailAddress] ASC

SELECT LastName, FirstName, EmailAddress
FROM Person.Contact
WHERE LastName LIKE 'Carter%'

The execution plan from the query with the new index is shown in figure 6.

As you can see, the query is now satisfied from the contents of the nonclustered index alone. No key lookups are necessary, as all of the required columns are contained in the nonclustered index. In some ways, this index can be considered a mini, alternatively clustered version of the table.

Figure 6. Covering the index eliminates the key lookup, significantly improving query performance.

Figure 7. By listing the required columns in the select clause and including them in the nonclustered index, the key lookups are eliminated, with logical reads dropping from 279 to 5.

Confirming the improvement from a disk-statistics perspective, the logical reads drop significantly, from 279 to 5, as shown in figure 7.

Including additional columns in the nonclustered index to avoid the key lookup process makes it a covering index. While this is an excellent performance-tuning technique, the one downside is that the additional columns are included at all levels of the index (root, all intermediate levels, and the leaf level). In our query above, given that we're not using the additional columns as predicates, that is, where clause conditions, they're not required at any level of the index other than the leaf level to avoid the key lookup. In small indexes, this is not really an issue. However, for very large indexes, the additional space taken up by the additional columns at each index level not only increases the index size but makes the index seek process less efficient. The included columns feature, introduced in SQL Server 2005, enhances covering indexes in several ways.

Included columns

While they're a relatively simple and very effective performance-tuning mechanism, covering indexes are not without their limitations; there can be a maximum of 16 columns in a composite index with a maximum combined size of 900 bytes. Further, columns of certain data types, including n/varchar(max), n/varbinary(max), n/text, XML, and image cannot be specified as index key columns.

Recognizing the value of covering indexes, SQL Server 2005 and above circumvent the size and data type limitations through indexes with included columns. Such indexes allow additional columns to be added to the leaf level of nonclustered indexes. In doing so, the additional columns are not counted in the 16 column and 900 byte maximum, and additional data types are allowed for these columns (n/varchar(max), n/varbinary(max), and XML). Consider the following create index statement:

-- Include columns at the leaf level of the index
ON Person.Contact (LastName)
INCLUDE (FirstName, EmailAddress)

Notice the additional INCLUDE clause at the end of the statement; this index will offer all the benefits of the previous covering index. Further, if appropriate, we could add columns with data types not supported in traditional indexes, and we wouldn't be restricted by the 16-column maximum.

When deciding whether to place a column in the index definition as a key column or as an included column, the determining factor is whether the column will be used as a predicate, that is, a search condition in the where clause of a query. If a column is added purely to avoid the key lookup because of inclusion in the select list, then it makes sense for it to be an included column. Alternatively, a column used for filtering/searching purposes should be included as a key column in the index definition.

Let's take our previous example of a surname search. If a common search condition was on the combination of surname and first name, then it would make sense for both columns to be included in the index as key columns for more efficient lookups when seeking through the intermediate index levels. If the email address column is used purely as return information, that is, in the query's select list, but not as a predicate (where clause condition), then it makes sense for it to be an included column. Such an index definition would look like this:

ON Person.Contact (LastName, FirstName)
INCLUDE (EmailAddress)

In summary, included column indexes retain the power of covering indexes while minimizing the index size and therefore maximizing lookup efficiency.

Before closing our section on nonclustered index design, let's spend some time covering an important new indexing feature included in SQL Server 2008: filtered indexes.

Filtered indexes

Filtered indexes are one of my favorite new features in SQL Server 2008. Before we investigate their many advantages, consider the following table used to store customer details, including a country code:

CREATE TABLE [Person].[Customer](

, [Title] [nvarchar](8) NULL
   , [FirstName] [nvarchar](100) NOT NULL
   , [LastName] [nvarchar](100) NOT NULL
   , [EmailAddress] [nvarchar](100) NULL
   , [CountryCode] char(2) NULL

Let's imagine this table is part of a database used around the globe on a 24/7 basis. The Customer table is used predominantly by a follow-the-sun call center, where customer details are accessed by call center staff from the same country or region as the calling customers.

Creating a nonclustered index on this table similar to the one earlier in the article where we included FirstName, LastName, and EmailAddress will enable lookups on customer name to return the required details. If this was a very large table, the size of the corresponding nonclustered indexes would also be large. Maintaining large indexes that are in use 24/7 presents some interesting challenges.

In our example here, a traditional (full table) index would be created similar to what we've already seen earlier in the article ; columns would be defined as key or included index columns, ideally as part of a covering index. All is fine so far, but wouldn't it be good if we could have separate versions of the index for specific countries? That would enable, for example, the Australian version of the index to be rebuilt when it's midnight in Australia and few, if any, Australian users are being accessed. Such an index design would reduce the impact on sections of users that are unlikely to be accessed at the time of the index maintenance.

Consider the following two index-creation statements:

-- Create 2 filtered indexes on the Customer table
ON Person.Customer (LastName, FirstName)
INCLUDE (EmailAddress)
WHERE CountryCode = 'AU'

ON Person.Customer (LastName, FirstName)
INCLUDE (EmailAddress)
WHERE CountryCode = 'UK'

The indexes we've created here are similar to ones from earlier in the article with one notable exception: they have a predicate (where clause filter) as part of their definition. When a search is performed using a matching predicate and index keys, the query optimizer will consider using the index, subject to the usual considerations. For example, the ixCustomerAustralia index could be used for a query that includes the CountryCode = 'AU' predicate such as this:

SELECT FirstName, LastName, EmailAddress
FROM Person.Customer

LastName = 'Colledge'
   AND FirstName like 'Rod%'
   AND CountryCode = 'AU'

Such indexes, known as filtered indexes, enable a whole range of benefits. Let's cover the major ones:

  • Segregated maintenance—As we've discussed, creating multiple smaller versions of a single larger index enables maintenance routines such as index rebuilds to be scheduled in isolation from other versions of the index that may be receiving heavy usage.

  • Smaller, faster indexes—Filtering an index makes it smaller and therefore faster. Best of all, covered filtered indexes support optimized lookups for specialized purposes. Consider a very large product table with a ProductCategory column; filtered indexes could be created for product categories, which include the appropriate columns specific to that category. When combined with application logic, such indexes enable fast, optimized lookups for sections of data within a table.

  • Creating unique indexes on nullable columns— Consider the Social Security number (SSN) column ; to support storing records for non-U.S. residents, we couldn't define the column as NOT NULL. This would mean that a percentage of the records would have a NULL SSN, but those that do have one should be unique. By creating a filtered unique nonclustered index, we can achieve both of these goals by defining the index with a WHERE SSN IS NOT NULL predicate.

  • More accurate statistics—Unless created with the FULLSCAN option , statistics work by sampling a subset of the index. In a filtered index, all of the sampled statistics are specific to the filter; therefore they are more accurate compared to an index that keeps statistics on all table data, some of which may never be used for index lookups.

  • Lower storage costs—The ability to exclude unwanted data from indexes enables the size, and therefore storage costs, to be reduced.

Some of the advantages of filtered indexes could be achieved in earlier versions of SQL Server using indexed views. While similar, there are important differences and restrictions to be aware of when choosing one method over the other.
Most View
Full-ATX Motherboard Gigabyte G1-SNIPER A88X
Get To Know Office 365
Synology DS213air - A Great Home-Network Storage Server
Buyer’s Guide: e-Readers That Fits Your Needs Best (Part 1)
PC Hardware Buyer's Guide - Silent Graphics Card (Part 2) : SapphireRadeon HD 6450 FleX, Asus GeForce GT 620 DirectCU Silent, HIS Radeon HD 5450 Silence
ZTE Grand X - Big And Bold
Understand Telephoto Lenses
SMBs & Cloud-Based Backups
Five Top Filters (Part 1)
SteelSeries Flux Luxury Edition Gaming Headphones
- First look: Apple Watch

- 3 Tips for Maintaining Your Cell Phone Battery (part 1)

- 3 Tips for Maintaining Your Cell Phone Battery (part 2)
- How to create your first Swimlane Diagram or Cross-Functional Flowchart Diagram by using Microsoft Visio 2010 (Part 1)

- How to create your first Swimlane Diagram or Cross-Functional Flowchart Diagram by using Microsoft Visio 2010 (Part 2)

- How to create your first Swimlane Diagram or Cross-Functional Flowchart Diagram by using Microsoft Visio 2010 (Part 3)
Popular Tags
Microsoft Access Microsoft Excel Microsoft OneNote Microsoft PowerPoint Microsoft Project Microsoft Visio Microsoft Word Active Directory Biztalk Exchange Server Microsoft LynC Server Microsoft Dynamic Sharepoint Sql Server Windows Server 2008 Windows Server 2012 Windows 7 Windows 8 Adobe Indesign Adobe Flash Professional Dreamweaver Adobe Illustrator Adobe After Effects Adobe Photoshop Adobe Fireworks Adobe Flash Catalyst Corel Painter X CorelDRAW X5 CorelDraw 10 QuarkXPress 8 windows Phone 7 windows Phone 8 BlackBerry Android Ipad Iphone iOS
Top 10
3 Tips for Maintaining Your Cell Phone Battery (part 2) - Discharge Smart, Use Smart
3 Tips for Maintaining Your Cell Phone Battery (part 1) - Charge Smart
OPEL MERIVA : Making a grand entrance
FORD MONDEO 2.0 ECOBOOST : Modern Mondeo
BMW 650i COUPE : Sexy retooling of BMW's 6-series
BMW 120d; M135i - Finely tuned
PHP Tutorials : Storing Images in MySQL with PHP (part 2) - Creating the HTML, Inserting the Image into MySQL
PHP Tutorials : Storing Images in MySQL with PHP (part 1) - Why store binary files in MySQL using PHP?
Java Tutorials : Nested For Loop (part 2) - Program to create a Two-Dimensional Array
Java Tutorials : Nested For Loop (part 1)