4. Indexing
Creating the right indexes
can be complex; it can take a long time to fully understand indexing and
fine-tune database queries. One of the most important things to
remember with indexing is that its primary purpose is to help SQL Azure
find the data it needs quickly.
Indexes are like smaller
tables that contain a subset of the primary table. The tradeoff is that
indexes consume space and must be maintained by the SQL Azure engine as
the primary data changes, which can impact performance under certain
scenarios.
Let's quickly review a simplified syntax to create an index:
CREATE INDEX [index_name] ON [table_name]
(col1, col2...)
INCLUDE (col3, col4...)
Although creating an
index is simple, it's important to verify that it's being used by SQL
Azure and that it has the desired effect. To continue the previous
example (from Figure 2),
you can create an index on the TestUsers table. But first, let's back
up and review a few things about the table and the statement you wish to
optimize.
It's important to realize that the columns included in the first part of
the index are used as a key when searching for data in the index and
when joining tables. And because it's acceptable to have multiple
columns as part of the key, their order is absolutely critical! At
times, you pick the columns that are included in the WHERE clause first and then those that are part of the JOIN. You may find, however, that sometimes it's best to start with the JOIN columns; this depends on your statement. Next come the columns in the INCLUDE section of the CREATE INDEX
command; these columns are here for only one reason: they help avoid a
lookup into the primary table for data that is needed by the SELECT
clause. Performing lookups isn't always a performance issue, but it can
become an issue if you have a lookup operation in a large batch of
records.
NOTE
An index that contains all the columns needed to serve a query is called a covering index.
If you dissect the previous SELECT query, you obtain the columns in the following list (in order of placement in the index) that belong to the TestUsers table:
WHERE. Contains the AgeGroup field from TestUsers.
JOIN. Contains the UserType field from TestUsers.
SELECT. Contains the Name and UserType fields. The UserType column is already part of the JOIN clause; there is no need to count it twice.
Let's begin by creating an index that looks like this:
CREATE INDEX idx_testusers_001 ON TestUsers
(AgeGroup, UserType)
INCLUDE (Name)
Running the statement now yields the execution plan shown in Figure 5. This is a better plan than the one in Figure 12-2
because SQL Azure is performing an index seek instead of an index scan.
Using an index seek means SQL Azure is able to locate the first record
needed by the statement very quickly. However, you still have an index
scan on the TestUserType table. Let's fix that.
To remove the index scan on
TestUserType, you add another index following the previous guidelines.
Because no columns are needed in the SELECT clause, you can leave the INCLUDE section alone. Here's the index:
CREATE INDEX idx_testusertype_001 ON TestUserType
(UserTypeKey, UserType)
NOTE
Although it's minimal, there
is a risk associated with adding new indexes in production systems.
Certain routines, especially batch programs, typically depend on data
being properly ordered to calculate running sums or carry out specific
tasks. If an index is favored over another to run a query, it's possible
for the new index to be used, which has the potential to change the
order in which the data is returned. If all your statements include an ORDER BY clause, this problem won't affect you. But if some of your programs rely on the natural order of records, beware!
Your execution plan should now look like that in Figure 6.
Notice that the physical operator has been changed to a loop. Also
notice that the cost of the query has shifted away from the JOIN operator: the highest relative cost (76%) of the plan is spent reading the data from the TestUserType index.
But the tuning exercise
isn't over just yet. If you hover your cursor on the TestUserType_001
index, you see that the loop performed a lookup (index seek) on that
index 50 times (look for Number of Executions in Figure 7)! This isn't great, but it's probably better than without the index, because SQL Azure picked this new execution plan.
To reduce the number of lookups, you can change the order of the fields by creating a new index. Let's run this statement:
CREATE INDEX idx_testusers_002 ON TestUsers
(UserType, AgeGroup) INCLUDE (Name)
NOTE
Instead of creating a new
index, you could change the one previously created. However, when
troubleshooting database performance issues, it's important to see how
the SQL Azure query engine behaves; SQL Azure chooses the index it deems
the most effective. So go ahead—create as many indexes as you want
until you have a good execution plan, and then clean up the unnecessary
indexes when you're finished.
If you run the statement again, you see a result close to Figure 8.
The execution plan is now well balanced. The data search is virtually
equal on both tables (49%), with seek operations on both; the number of
executions is 1 for both; and the cost of the loop operation is minimal
(2%) because there is no real looping taking place.
If you pay attention to the
Actual Number of Rows value, you see that this statement fetches only 25
records from the underlying table, instead of 50; this reduces disk
reads.
Last but not least, if you were to look at the SELECT
statement from a design standpoint, you could ask whether the
UserTypeKey value should be unique. The table schema doesn't carry a
unique index on that field, but should it? Can there be only one Manager
user type? If the answer is yes, then you know the TestUserType table
will always return a single record for a Manager user type, in which
case you may be able to remove the JOIN entirely and apply the WHERE clause on the TestUsers table directly:
SELECT T.Name, T.UserType
FROM TestUsers T
WHERE T.AgeGroup > 0 AND T.UserType = 1
Not only is this
statement much simpler, but the execution plan becomes trivial, meaning
that SQL Azure can serve this request without spending much time
optimizing it. This change means you're moving the filter from a table
with few records (only 3 in TestUserType) to a table with many records
(100 in TestUsers). And whenever you have the option to make such a
move, you should. SQL Azure spends far fewer resources this way. Of
course, such a move isn't always possible, but you need to know the
importance of having a proper database design before you begin tuning.
NOTE
Performance tuning can be fun. However, you may end up tuning forever if you don't set yourself performance objectives.