SQL Azure : Tuning Techniques (part 3) - Indexing

2/12/2011 4:13:16 PM

4. Indexing

Creating the right indexes can be complex; it can take a long time to fully understand indexing and fine-tune database queries. One of the most important things to remember with indexing is that its primary purpose is to help SQL Azure find the data it needs quickly.

Indexes are like smaller tables that contain a subset of the primary table. The tradeoff is that indexes consume space and must be maintained by the SQL Azure engine as the primary data changes, which can impact performance under certain scenarios.

Let's quickly review a simplified syntax to create an index:

CREATE INDEX [index_name] ON [table_name]

(col1, col2...)
INCLUDE (col3, col4...)

Although creating an index is simple, it's important to verify that it's being used by SQL Azure and that it has the desired effect. To continue the previous example (from Figure 2 ), you can create an index on the TestUsers table. But first, let's back up and review a few things about the table and the statement you wish to optimize.

It's important to realize that the columns included in the first part of the index are used as a key when searching for data in the index and when joining tables. And because it's acceptable to have multiple columns as part of the key, their order is absolutely critical! At times, you pick the columns that are included in the WHERE clause first and then those that are part of the JOIN. You may find, however, that sometimes it's best to start with the JOIN columns; this depends on your statement. Next come the columns in the INCLUDE section of the CREATE INDEX command; these columns are here for only one reason: they help avoid a lookup into the primary table for data that is needed by the SELECT clause. Performing lookups isn't always a performance issue, but it can become an issue if you have a lookup operation in a large batch of records.

NOTE

An index that contains all the columns needed to serve a query is called a covering index.

If you dissect the previous SELECT query, you obtain the columns in the following list (in order of placement in the index) that belong to the TestUsers table:

WHERE. Contains the AgeGroup field from TestUsers.
JOIN. Contains the UserType field from TestUsers.
SELECT. Contains the Name and UserType fields. The UserType column is already part of the JOIN clause; there is no need to count it twice.

Let's begin by creating an index that looks like this:

CREATE INDEX idx_testusers_001 ON TestUsers
   (AgeGroup, UserType)
INCLUDE (Name)

Running the statement now yields the execution plan shown in Figure 5 . This is a better plan than the one in Figure 12-2 because SQL Azure is performing an index seek instead of an index scan. Using an index seek means SQL Azure is able to locate the first record needed by the statement very quickly. However, you still have an index scan on the TestUserType table. Let's fix that.

Figure 5. Effect of adding an index on the TestUsers table

To remove the index scan on TestUserType, you add another index following the previous guidelines. Because no columns are needed in the SELECT clause, you can leave the INCLUDE section alone. Here's the index:

CREATE INDEX idx_testusertype_001 ON TestUserType
(UserTypeKey, UserType)

NOTE

Although it's minimal, there is a risk associated with adding new indexes in production systems. Certain routines, especially batch programs, typically depend on data being properly ordered to calculate running sums or carry out specific tasks. If an index is favored over another to run a query, it's possible for the new index to be used, which has the potential to change the order in which the data is returned. If all your statements include an ORDER BY clause, this problem won't affect you. But if some of your programs rely on the natural order of records, beware!

Your execution plan should now look like that in Figure 6 . Notice that the physical operator has been changed to a loop. Also notice that the cost of the query has shifted away from the JOIN operator: the highest relative cost (76%) of the plan is spent reading the data from the TestUserType index.

Figure 6. Effect of indexing on the physical operators

But the tuning exercise isn't over just yet. If you hover your cursor on the TestUserType_001 index, you see that the loop performed a lookup (index seek) on that index 50 times (look for Number of Executions in Figure 7 )! This isn't great, but it's probably better than without the index, because SQL Azure picked this new execution plan.

To reduce the number of lookups, you can change the order of the fields by creating a new index. Let's run this statement:

CREATE INDEX idx_testusers_002 ON TestUsers
       (UserType, AgeGroup) INCLUDE (Name)

Figure 7. Number of executions of an operation

NOTE

Instead of creating a new index, you could change the one previously created. However, when troubleshooting database performance issues, it's important to see how the SQL Azure query engine behaves; SQL Azure chooses the index it deems the most effective. So go ahead—create as many indexes as you want until you have a good execution plan, and then clean up the unnecessary indexes when you're finished.

If you run the statement again, you see a result close to Figure 8 . The execution plan is now well balanced. The data search is virtually equal on both tables (49%), with seek operations on both; the number of executions is 1 for both; and the cost of the loop operation is minimal (2%) because there is no real looping taking place.

Figure 8. Well-balanced execution plan

If you pay attention to the Actual Number of Rows value, you see that this statement fetches only 25 records from the underlying table, instead of 50; this reduces disk reads.

Last but not least, if you were to look at the SELECT statement from a design standpoint, you could ask whether the UserTypeKey value should be unique. The table schema doesn't carry a unique index on that field, but should it? Can there be only one Manager user type? If the answer is yes, then you know the TestUserType table will always return a single record for a Manager user type, in which case you may be able to remove the JOIN entirely and apply the WHERE clause on the TestUsers table directly:

SELECT T.Name, T.UserType
   FROM TestUsers T
   WHERE T.AgeGroup > 0 AND T.UserType = 1

Not only is this statement much simpler, but the execution plan becomes trivial, meaning that SQL Azure can serve this request without spending much time optimizing it. This change means you're moving the filter from a table with few records (only 3 in TestUserType) to a table with many records (100 in TestUsers). And whenever you have the option to make such a move, you should. SQL Azure spends far fewer resources this way. Of course, such a move isn't always possible, but you need to know the importance of having a proper database design before you begin tuning.

NOTE

Performance tuning can be fun. However, you may end up tuning forever if you don't set yourself performance objectives.

SQL Azure : Tuning Techniques (part 5) - Provider Statistics & Application Design

SQL Azure : Tuning Techniques (part 4) - Indexed Views & Stored Procedures

SQL Azure : Tuning Techniques (part 2) - Connection Pooling & Execution Plans

SQL Azure : Tuning Techniques (part 1) - Dynamic Management Views

Other

Synching an On-Premises Database with SQL Azure

SQL Server 2008 : Managing Query Performance - Finding Similar Queries

SQL Server 2008 : Correlating Profiler and the Performance Monitor

SQL Server 2008 : Explaining XML - XML Indexes

SQL Server 2008 : Explaining XML - XQuery and XPath

SQL Azure : Managing a Shard (part 2) - Working with Partial Shards

SQL Server 2008 : OPENXML, sp_xml_preparedocument, and sp_xml_removedocument

SQL Server 2008 : Retrieving and Transforming XML Data

SQL Azure: Building a Shard (part 4) - Updating and Deleting Records in the Shard & Adding Records to the Shard

SQL Azure: Building a Shard (part 3) - Reading Using the Shard & Caching