How you write T-SQL queries often affects how well those queries perform. The following sections provide some
general guidelines to keep in mind to help ensure that you get
optimal performance from your queries.
UNION Versus UNION ALL Performance
You should use UNION ALL instead of UNION if there is no need to eliminate duplicate rows from the result sets being combined. The UNION operator has to combine the result sets into a worktable to remove any duplicate rows. UNION ALL simply concatenates the result sets together, without the overhead of putting them into a worktable to remove duplicate rows.
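For example, assuming a hypothetical dbo.sales_archive table with the same structure as dbo.sales, the following query concatenates current and archived rows without the duplicate-elimination worktable:
select stor_id, ord_num, ord_date
from dbo.sales
union all    -- rows are simply concatenated; no worktable needed
select stor_id, ord_num, ord_date
from dbo.sales_archive    -- sales_archive is a hypothetical archive table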
Use IF EXISTS Instead of SELECT COUNT(*)
You should use IF EXISTS instead of SELECT COUNT(*)
when you only need to check whether any matching rows exist
and do not need the actual number of matching rows. IF EXISTS stops the processing of the select query as soon as the first matching row is found, whereas SELECT COUNT(*) continues searching until all matches are found, wasting I/O and CPU cycles. For example, you could replace
if (SELECT count(*) FROM dbo.sales WHERE stor_id = '6380') > 0
with an IF EXISTS check similar to
if exists (SELECT * FROM dbo.sales WHERE stor_id = '6380')
Avoid Unnecessary ORDER BY or DISTINCT Clauses
When a T-SQL query contains an ORDER BY or DISTINCT
clause, a worktable is often required to process the final result of
the query if SQL Server cannot determine that the rows will already be retrieved
in the desired sort order or that a unique key in the result already makes the
rows distinct. If a query requires a worktable, extra overhead
and I/O are incurred to put the results into the worktable in tempdb
and to perform the sorting needed to order the results or to eliminate
duplicate rows. This can extend the processing time of the
query and delay the return of the final result to the client application.
If it is not absolutely necessary
for the rows returned to the application to be in a specific order (for
example, returning rows to a grid control where the contents can be re-sorted by any column in the grid control itself), you should leave off the ORDER BY clause in your queries.
Likewise, you should not arbitrarily include the DISTINCT clause in all your queries unless it is absolutely necessary to eliminate any duplicate rows from the result set.
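For example, if the client grid sorts the rows itself, the ORDER BY in the first query below adds sort overhead for no benefit, and the second form avoids it:
-- Forces a sort (and possibly a worktable) to order the rows
select stor_id, ord_num, qty from dbo.sales order by ord_num

-- Sufficient if the grid control re-sorts the rows itself
select stor_id, ord_num, qty from dbo.sales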
Temp Tables Versus Table Variables Versus Common Table Expressions
SQL Server 2008 provides multiple options for working with temporary result sets in T-SQL code:
- Temporary tables
- Table variables
- Derived tables
- Common table expressions
One of the questions you
may consider is “Which method should I use and when?” Whether you use a
temporary table, table variable, derived table, or common table
expression depends, in part, on how often and for how long you intend to
use it. This section provides some general recommendations to consider.
You should use table
variables instead of temporary tables in stored procedures whenever
possible or feasible. Table variables are memory resident and do not
incur the I/O overhead or the system table and I/O contention that can
occur in tempdb with normal temporary
tables. However, remember that table variables exist only for the
duration of the SQL batch or stored procedure in which they are defined.
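For example, a table variable is declared and used as follows (the variable name and column list here are illustrative, based on the titles table used later in this chapter):
declare @expensive_titles table (title_id varchar(6), price money)

insert @expensive_titles (title_id, price)
select title_id, price
from dbo.titles
where price > $15

select title_id, price from @expensive_titles
-- @expensive_titles disappears when the batch or stored procedure ends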
In SQL Server 2005 and
later, you also have the option of using derived tables or common table
expressions in your queries to generate and hold intermediate result
sets that can be further processed by the main query. A derived table is
a subquery contained in a FROM
clause that can be referred to by an alias and used as a table in the
query. Derived tables and common table expressions can be thought of as
a kind of dynamic view that exists only for the duration of the query.
Derived tables are handy if you don’t need to use a result set more than
once in multiple queries. You should consider using derived tables or
common table expressions when possible to completely avoid the use of
table variables or temporary tables, especially if the temporary table
or table variable is used only once by a single query.
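For example, both of the following produce the same intermediate aggregate without a temporary table or table variable; the first uses a derived table and the second an equivalent common table expression (the names avg_price and TypePrices are illustrative):
-- Derived table: the subquery in the FROM clause is aliased and used as a table
select t.type, t.avg_price
from (select type, avg(price) as avg_price
      from dbo.titles
      group by type) as t
where t.avg_price > $12

-- Common table expression: same result, defined ahead of the main query
;with TypePrices (type, avg_price) as
(
    select type, avg(price)
    from dbo.titles
    group by type
)
select type, avg_price
from TypePrices
where avg_price > $12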
You should generally
consider using temporary tables only when you need to share data between
an application and stored procedures or between stored procedures.
Also, if the temporary result set is going to be very large (that is,
larger than can be held in SQL Server cache memory), you should consider
storing it in a temporary table rather than a table variable.
Note
In SQL Server 2008, you can define table
data types, which makes it possible to pass table variables to stored
procedures as table parameters, so temp tables aren’t the only way to
share data between stored procedures. However, there are some
limitations: primarily, the contents of a table parameter passed to a
stored procedure are read-only and cannot be modified within the stored
procedure. If you want to share data between stored procedures and have
the ability to add, remove, or modify rows in any of the stored
procedures, temporary tables are still the best solution.
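A minimal sketch of the table type and table parameter approach described in this note (the type, procedure, and column names are illustrative):
-- Define a table data type
create type dbo.StoreList as table (stor_id char(4) primary key)
go

-- Table parameters must be declared READONLY
create procedure dbo.get_store_sales
    @stores dbo.StoreList readonly
as
    select s.stor_id, s.ord_num, s.qty
    from dbo.sales s
    join @stores st on s.stor_id = st.stor_id
go

-- The caller populates a variable of the table type and passes it in
declare @stores dbo.StoreList
insert @stores values ('6380')
exec dbo.get_store_sales @stores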
If you need to use temporary tables, you can follow these general guidelines to help improve their performance (a brief sketch applying them follows the list):
- Select only the columns actually required by the subsequent SQL statements into the temp table (that is, avoid using select *). This helps reduce the size of the temp table, thereby reducing the number of writes to tempdb
and also speeding up access to the data within the temp table because
more rows will fit on a data page, reducing the number of data pages
that need to be accessed by the query.
- Select only the
rows needed by the subsequent queries, again to help limit the size of
the temp table and reduce the amount of I/O in tempdb.
- If
the temporary table will be accessed multiple times by queries using
search arguments (SARGs), consider creating an index on the temporary
table if it can be used to speed up the queries against the temp table
and reduce I/O. Of course, this option should be considered only if the
time and I/O saved by having an index on the temporary table
significantly exceeds the time and I/O required to create the index.
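The following sketch applies these guidelines: it selects only the needed rows and columns from dbo.sales into a temp table and indexes it for repeated access by stor_id (the temp table name, index name, and date filter are illustrative):
-- Select only the columns and rows the subsequent queries need
select stor_id, ord_num, qty
into #store_sales
from dbo.sales
where ord_date >= '20080101'

-- Index only if the temp table is queried repeatedly by stor_id
create index ix_store_sales on #store_sales (stor_id)

select sum(qty) from #store_sales where stor_id = '6380'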
Avoid Unnecessary Function Executions
If you call a SQL Server function (for example, suser_name(), getdate())
repeatedly within a procedure or in T-SQL code, you should consider
using a local variable to hold the value returned by the function and
use the local variable repeatedly throughout your SQL statements rather
than repeatedly executing the SQL Server function. This saves CPU cycles
within your stored procedure and T-SQL code.
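For example, rather than executing getdate() in each statement, you can capture its value once and reuse it:
declare @now datetime
select @now = getdate()

-- Reuse the variable instead of re-executing getdate() in every statement
select title_id from dbo.titles where pubdate <= @now
select title_id from dbo.titles where pubdate <= dateadd(yy, -1, @now)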
Cursors and Performance
In contrast to most other
programming languages, SQL is a set-based processing language. You
retrieve sets of rows, update sets of rows, and delete sets of rows. The
set of rows affected is determined by the search conditions specified
in the query. Unfortunately, most programmers are used to doing
record-oriented operations on data and often want to apply the same
technique to SQL Server data. Admittedly, at times, processing rows as a
single result set with a single query can seem difficult or impossible.
However, because of the performance implications, cursors should not be
used just because it’s easier to program that way.
Note
SQL Server 2008 introduces the new MERGE
statement, which provides another set-oriented option for processing a
set of input rows and making a row-by-row determination of which rows to
ignore and which to insert, update, or delete in the target table.
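A minimal sketch of the MERGE syntax, assuming a hypothetical dbo.titles_staging table with the same key and columns as titles:
merge dbo.titles as t
using dbo.titles_staging as s          -- titles_staging is hypothetical
    on t.title_id = s.title_id
when matched then
    update set t.price = s.price
when not matched by target then
    insert (title_id, title, type, price)
    values (s.title_id, s.title, s.type, s.price)
when not matched by source then
    delete;                            -- MERGE must end with a semicolon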
When to Use Cursors
Application performance can
sometimes be slow due to the improper use of cursors. You should always
try to write your T-SQL code so SQL Server can perform what it is good
at: set-based operations. It makes little sense to have an advanced
relational database management system (RDBMS) and use it only for
one-row-at-a-time retrievals. For example, many update operations
performed using cursors can be performed with a single UPDATE statement using the CASE expression. Consider the cursor shown in Listing 1.
Listing 1. Updating the titles Table by Using a Cursor
/* This is a SQL script to update book prices
   dependent on current price and ytd_sales */

/* declare cursor */
declare titles_curs cursor for
    select ytd_sales, price
    from dbo.titles
    for update of price

declare @ytd_sales int, @price money

open titles_curs
fetch next from titles_curs into @ytd_sales, @price
if (@@fetch_status = -1)
begin
    print 'No books found'
    close titles_curs
    deallocate titles_curs
    return
end
while (@@fetch_status = 0)
begin
    if @ytd_sales < 500
        update titles
        set price = @price * .75
        where current of titles_curs
    else if @price > $15
        update titles
        set price = @price * .9
        where current of titles_curs
    else
        update titles
        set price = @price * 1.15
        where current of titles_curs
    fetch next from titles_curs into @ytd_sales, @price
end
if (@@fetch_status = -2)
    raiserror ('Attempt to fetch a row failed', 16, 1)
close titles_curs
deallocate titles_curs
This cursor can be replaced with a simple, single UPDATE statement, using the CASE expression, as shown in Listing 2.
Listing 2. The titles Cursor Example Performed with a Single UPDATE Statement Using the CASE Expression
update titles
set price = case
                when ytd_sales < 500 then price * .75
                when price > $15 then price * .90
                else price * 1.15
            end
The advantages of this approach are a significant performance improvement and much cleaner,
simpler code. In testing the performance of the single update versus
the cursor using the bigpubs2008 database, the cursor required on average around 100 milliseconds (ms) to complete. The single update
statement required, on average, about 10 ms. (Your results may vary
depending on hardware capabilities.) Although both of these completed
within a subsecond response time, consider that the cursor took 10 times
longer to complete than the single update. Factor that out over
hundreds of thousands or millions of rows, and you could be looking at a
significant performance difference.
Why is the cursor so much slower? Well, for one thing, a table scan performed by an UPDATE, a DELETE, or a SELECT
uses internal, compiled C code to loop through the result set. A cursor
uses interpreted SQL code. In addition, with a cursor, you are
performing multiple lines of code per row retrieved. The titles
cursor example is a relatively simple one; it performs one or two
conditional checks and a single update per row, but it is still roughly
10 times slower. Because of the overhead required to process cursors,
set-oriented operations typically run much faster, even if multiple
passes of the table are required.
Although set-oriented
operations are almost always faster than cursor operations, the one
possible disadvantage of using a single update is locking concurrency.
Even though a single update runs faster than a cursor, while it is
running, the single update might end up locking the entire table for an
extended period of time. This would prevent other users from accessing
the table during the update. If concurrent access to the table is more
important than the time it takes for the update to complete, you might
want to consider using a cursor. A cursor locks the data only a row at a
time instead of locking the entire table (as long as each row is
committed individually and the entire cursor is not in a transaction).
Another situation in which
you might want to consider using cursors is for scrolling applications
when the result sets can be quite large. Consider a customer service
application. The customer representative might need to pull up a list of
cases and case contacts associated with a customer. If the result sets
are small, you can just pull the entire result set down into a list box
and let the user scroll through the rows without needing a cursor. However,
if thousands of rows of information are likely, you might want to pull
back only a block of rows at a time, especially if the user needs to
look at only a few of the rows to get the information he or she needs.
It probably wouldn’t be worth pulling back all that data across the
network just for a few rows.
In this type of situation,
you might want to use a scrollable API server cursor. This way, you can
retrieve the appropriate number of rows to populate the list box and
then use the available scrolling options to quickly fetch to the bottom
of the list, using the LAST or ABSOLUTE n options, or you can go backward or forward by using the RELATIVE option.
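In T-SQL terms, the scrolling fetch options look like the following; API server cursors expose equivalent positioning options through the database API (dbo.cases is a hypothetical table):
declare case_list scroll cursor for
    select case_id, case_status from dbo.cases
open case_list
fetch absolute 20 from case_list   -- jump directly to the 20th row
fetch relative 5 from case_list    -- move forward 5 rows
fetch last from case_list          -- jump to the bottom of the list
close case_list
deallocate case_list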
Note
You need to be careful
using the scrollable API server cursor approach in a multitier
environment. Many multitier architectures include a middle data layer
that often uses connection sharing for multiple clients, and the users
are typically assigned any available connection when they need to access
SQL Server. Users do not necessarily use the same connection each time.
Therefore, if a user created a cursor in one connection, the next time
the user submitted a fetch through the data layer, he or she might get a
different connection, and the cursor would not be available.
One solution for this problem is to go back to retrieving the entire
result set down to the client application. Another possible solution is
to use a global temp table as a type of homemade insensitive cursor to
hold the result set and grab the data from the global temp table in
chunks, as needed. With the temp table approach, you need to make sure a
sequential key is on the table so you can quickly grab the block of
rows you need. You need to be aware of the potential impact on tempdb performance and the size requirements of tempdb if the result sets are large and you have many concurrent users.
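A rough sketch of the global temp table approach, using an identity column as the sequential key (the table, column, and variable names are illustrative):
declare @cust_id int, @first int, @blocksize int
select @cust_id = 101, @first = 1, @blocksize = 50

-- Materialize the result set once, with a sequential key
select identity(int, 1, 1) as rownum, case_id, case_status
into ##cust_cases
from dbo.cases              -- dbo.cases is a hypothetical table
where cust_id = @cust_id

-- Each subsequent fetch grabs the next block of rows by key range
select case_id, case_status
from ##cust_cases
where rownum between @first and @first + @blocksize - 1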
As a general rule, you should use
cursors only as a last resort when no set-oriented solution is feasible.
If you have decided that a cursor is the appropriate solution, you
should try to make it as efficient as possible by limiting the number of
commands to be executed within the cursor loop as much as possible.
Also, you should try to keep the cursor processing on the server side
within stored procedures. If you will be performing multiple fetches
over the network (for example, to support a scrolling application), you
should use an API server cursor. You should avoid using client-side
cursors that will be performing many cursor operations in the client
application; otherwise, you will find your application making excessive
requests to the server, and the volume of network roundtrips will make
for a sloth-like application.
Variable Assignment in UPDATE Statements
One commonly overlooked feature in T-SQL is the ability to assign values to local variables in the SET clause of the UPDATE
statement. This capability can help improve query performance by
reducing locking and CPU contention and reducing the number of
statements required in a T-SQL batch.
The simplified syntax of the SET clause for assigning values to variables is as follows:
SET
{ column_name = { expression | DEFAULT | NULL }
| @variable = expression
| @variable = column = expression [ ,...n ]
} [ ,...n ]
One common use of variable assignment in UPDATE
statements is when you have a table that is used for storing and
generating unique key values. To demonstrate this, you can create the keygen table and populate it as shown in Listing 3.
Listing 3. Creating and Populating the keygen Table
create table keygen (keytype char(1), keyval int)
go
insert keygen (keytype, keyval) values ('x', 1)
go
The typical approach often used to perform the task of retrieving a key value and updating the keygen table to generate the next key is to issue a SELECT statement and UPDATE statement within a transaction. Listing 4 shows an example of this.
Listing 4. Retrieving and Updating keyval with SELECT and UPDATE
begin tran

declare @newkey int

-- Select current keyval into @newkey
select @newkey = keyval
from keygen (XLOCK)
where keytype = 'x'

update keygen
set keyval = keyval + 1
where keytype = 'x'

commit

select @newkey as newkey
go
newkey
-----------
1
Tip
In Listing 4, the XLOCK hint is specified in the SELECT
statement. This prevents two separate user processes from running this
T-SQL batch at the same time and both acquiring the same keyval. With the XLOCK
hint, only one of the processes can acquire an exclusive lock, and the
other process waits until the lock is released and acquires the next keyval.
The use of XLOCK is definitely preferable to HOLDLOCK because the use of HOLDLOCK in this type of scenario often leads to a deadlock situation.
By using variable assignment in an UPDATE statement, you can eliminate the SELECT statement altogether and capture the keyval in the same statement you use to update the keygen table, as shown in Listing 5.
Listing 5. Using Variable Assignment in an UPDATE to Update and Retrieve keyval
declare @newkey int

update keygen
set keyval = keyval + 1,
    @newkey = keyval
where keytype = 'x'

select @newkey as newkey
go
newkey
-----------
2
Notice that the value assigned to the local variable using the syntax shown in Listing 5 is the value of the keyval column prior to the update. If you prefer to assign the value of the column after the column is updated, you use the @variable = column = expression syntax, as shown in Listing 6.
Listing 6. Using Variable Assignment in an UPDATE to Update and Retrieve keyval After Update
declare @newkey int

update keygen
set @newkey = keyval = keyval + 1
where keytype = 'x'

select @newkey as newkey
go
newkey
-----------
4
You need to be
aware that the variable assignment is performed for every row that
qualifies in the update. The resulting value of the local variable is
the value of the last row updated.
Another use for variable assignment in UPDATE
statements is to accumulate the sum of a column into a local variable
for all the rows being updated. The alternative approach would be to use
a cursor, as shown in Listing 7.
Listing 7. Using a Cursor to Accumulate a Sum of a Column for Each Row Updated
declare c1 cursor for
    select isnull(ytd_sales, 0)
    from titles
    where type = 'business'
    for update of price
go
declare @ytd_sales_total int, @ytd_sales int
select @ytd_sales_total = 0
open c1
fetch c1 into @ytd_sales
while @@fetch_status = 0
begin
    update titles
    set price = price
    where current of c1
    select @ytd_sales_total = @ytd_sales_total + @ytd_sales
    fetch c1 into @ytd_sales
end
select @ytd_sales_total as ytd_sales_total
close c1
deallocate c1
go
ytd_sales_total
---------------
30788
By using variable assignment in an UPDATE statement, you can replace the cursor in Listing 7 with a single UPDATE statement, as shown in Listing 8.
Listing 8. Using Variable Assignment in an UPDATE Statement to Accumulate a Sum of a Column for Each Row Updated
declare @ytd_sales_total int
set @ytd_sales_total = 0

update titles
set price = price,
    @ytd_sales_total = @ytd_sales_total + isnull(ytd_sales, 0)
where type = 'business'

select @ytd_sales_total as ytd_sales_total
go
ytd_sales_total
---------------
30788