How you write T-SQL queries often affects how well those queries perform. The following sections provide some
general guidelines to keep in mind to help ensure that you get
optimal performance from your queries.
UNION Versus UNION ALL Performance
You should use UNION ALL instead of UNION if there is no need to eliminate duplicate rows from the result sets being combined. The UNION operator has to combine the result sets into a worktable to remove any duplicate rows. UNION ALL simply concatenates the result sets together, without the overhead of putting them into a worktable to remove duplicate rows.
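For example, assuming a hypothetical dbo.sales_archive table with the same structure as dbo.sales, the following query concatenates current and archived rows without the duplicate-elimination worktable:
select stor_id, ord_num, ord_date
from dbo.sales
union all    -- rows are simply concatenated; no worktable needed
select stor_id, ord_num, ord_date
from dbo.sales_archive    -- sales_archive is a hypothetical archive table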
Use IF EXISTS Instead of SELECT COUNT(*)
You should use IF EXISTS instead of SELECT COUNT(*)
when you only need to check whether any matching rows exist
and do not need the actual number of matching rows. IF EXISTS stops the processing of the select query as soon as the first matching row is found, whereas SELECT COUNT(*) continues searching until all matches are found, wasting I/O and CPU cycles. For example, you could replace
if (SELECT count(*) FROM dbo.sales WHERE stor_id = '6380') > 0
with an IF EXISTS check similar to
if exists (SELECT * FROM dbo.sales WHERE stor_id = '6380')
Avoid Unnecessary ORDER BY or DISTINCT Clauses
When a T-SQL query contains an ORDER BY or DISTINCT
clause, a worktable is often required to process the final result of
the query if SQL Server cannot determine that the rows will already be retrieved
in the desired sort order or that a unique key in the result already makes the
rows distinct. If a query requires a worktable, extra overhead
and I/O are incurred to put the results into the worktable in tempdb
and to perform the sorting needed to order the results or to eliminate
duplicate rows. This can extend the processing time of the
query and delay the return of the final result to the client application.
If it is not absolutely necessary
for the rows returned to the application to be in a specific order (for
example, returning rows to a grid control where the contents can be re-sorted by any column in the grid control itself), you should leave off the ORDER BY clause in your queries.
Likewise, you should not arbitrarily include the DISTINCT clause in all your queries unless it is absolutely necessary to eliminate any duplicate rows from the result set.
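For example, if the client grid sorts the rows itself, the ORDER BY in the first query below adds sort overhead for no benefit, and the second form avoids it:
-- Forces a sort (and possibly a worktable) to order the rows
select stor_id, ord_num, qty from dbo.sales order by ord_num

-- Sufficient if the grid control re-sorts the rows itself
select stor_id, ord_num, qty from dbo.sales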
Temp Tables Versus Table Variables Versus Common Table Expressions
SQL Server 2008 provides multiple options for working with temporary result sets in T-SQL code:
- Temporary tables
- Table variables
- Derived tables
- Common table expressions
One of the questions you
may consider is “Which method should I use and when?” Whether you use a
temporary table, table variable, derived table, or common table
expression depends, in part, on how often and for how long you intend to
use it. This section provides some general recommendations to consider.
You should use table
variables instead of temporary tables in stored procedures whenever
possible or feasible. Table variables are memory resident and do not
incur the I/O overhead or the system table and I/O contention that can
occur in tempdb with normal temporary
tables. However, remember that table variables exist only for the
duration of the SQL batch or stored procedure in which they are defined.
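For example, a table variable is declared and used as follows (the variable name and column list here are illustrative, based on the titles table used later in this chapter):
declare @expensive_titles table (title_id varchar(6), price money)

insert @expensive_titles (title_id, price)
select title_id, price
from dbo.titles
where price > $15

select title_id, price from @expensive_titles
-- @expensive_titles disappears when the batch or stored procedure ends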
In SQL Server 2005 and
later, you also have the option of using derived tables or common table
expressions in your queries to generate and hold intermediate result
sets that can be further processed by the main query. A derived table is
a subquery contained in a FROM
clause that can be referred to by an alias and used as a table in the
query. Derived tables and common table expressions can be thought of as
a kind of dynamic view that exists only for the duration of the query.
Derived tables are handy if you don’t need to use a result set more than
once in multiple queries. You should consider using derived tables or
common table expressions when possible to completely avoid the use of
table variables or temporary tables, especially if the temporary table
or table variable is used only once by a single query.
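For example, both of the following produce the same intermediate aggregate without a temporary table or table variable; the first uses a derived table and the second an equivalent common table expression (the names avg_price and TypePrices are illustrative):
-- Derived table: the subquery in the FROM clause is aliased and used as a table
select t.type, t.avg_price
from (select type, avg(price) as avg_price
      from dbo.titles
      group by type) as t
where t.avg_price > $12

-- Common table expression: same result, defined ahead of the main query
;with TypePrices (type, avg_price) as
(
    select type, avg(price)
    from dbo.titles
    group by type
)
select type, avg_price
from TypePrices
where avg_price > $12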
You should generally
consider using temporary tables only when you need to share data between
an application and stored procedures or between stored procedures.
Also, if the temporary result set is going to be very large (that is,
larger than can be held in SQL Server cache memory), you should consider
storing it in a temporary table rather than a table variable.
Note
In SQL Server 2008, you can define table
data types, which makes it possible to pass table variables to stored
procedures as table parameters, so temp tables aren’t the only way to
share data between stored procedures. However, there are some
limitations: primarily, the contents of a table parameter passed to a
stored procedure are read-only and cannot be modified within the stored
procedure. If you want to share data between stored procedures and have
the ability to add, remove, or modify rows in any of the stored
procedures, temporary tables are still the best solution.
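A minimal sketch of the table type and table parameter approach described in this note (the type, procedure, and column names are illustrative):
-- Define a table data type
create type dbo.StoreList as table (stor_id char(4) primary key)
go

-- Table parameters must be declared READONLY
create procedure dbo.get_store_sales
    @stores dbo.StoreList readonly
as
    select s.stor_id, s.ord_num, s.qty
    from dbo.sales s
    join @stores st on s.stor_id = st.stor_id
go

-- The caller populates a variable of the table type and passes it in
declare @stores dbo.StoreList
insert @stores values ('6380')
exec dbo.get_store_sales @stores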
If you need to use temporary tables, you can follow these general guidelines to help improve their performance (a brief sketch applying them follows the list):
- Select only the columns actually required by the subsequent SQL statements into the temp table (that is, avoid using select *). This helps reduce the size of the temp table, thereby reducing the number of writes to tempdb
and also speeding up access to the data within the temp table because
more rows will fit on a data page, reducing the number of data pages
that need to be accessed by the query.
- Select only the
rows needed by the subsequent queries, again to help limit the size of
the temp table and reduce the amount of I/O in tempdb.
- If
the temporary table will be accessed multiple times by queries using
search arguments (SARGs), consider creating an index on the temporary
table if it can be used to speed up the queries against the temp table
and reduce I/O. Of course, this option should be considered only if the
time and I/O saved by having an index on the temporary table
significantly exceeds the time and I/O required to create the index.
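The following sketch applies these guidelines: it selects only the needed rows and columns from dbo.sales into a temp table and indexes it for repeated access by stor_id (the temp table name, index name, and date filter are illustrative):
-- Select only the columns and rows the subsequent queries need
select stor_id, ord_num, qty
into #store_sales
from dbo.sales
where ord_date >= '20080101'

-- Index only if the temp table is queried repeatedly by stor_id
create index ix_store_sales on #store_sales (stor_id)

select sum(qty) from #store_sales where stor_id = '6380'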
Avoid Unnecessary Function Executions
If you call a SQL Server function (for example, suser_name(), getdate())
repeatedly within a procedure or in T-SQL code, you should consider
using a local variable to hold the value returned by the function and
use the local variable repeatedly throughout your SQL statements rather
than repeatedly executing the SQL Server function. This saves CPU cycles
within your stored procedure and T-SQL code.
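For example, rather than executing getdate() in each statement, you can capture its value once and reuse it:
declare @now datetime
select @now = getdate()

-- Reuse the variable instead of re-executing getdate() in every statement
select title_id from dbo.titles where pubdate <= @now
select title_id from dbo.titles where pubdate <= dateadd(yy, -1, @now)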
Cursors and Performance
In contrast to most other
programming languages, SQL is a set-based processing language. You
retrieve sets of rows, update sets of rows, and delete sets of rows. The
set of rows affected is determined by the search conditions specified
in the query. Unfortunately, most programmers are used to doing
record-oriented operations on data and often want to apply the same
technique to SQL Server data. Admittedly, at times, processing rows as a
single result set with a single query can seem difficult or impossible.
However, because of the performance implications, cursors should not be
used just because it’s easier to program that way.
Note
SQL Server 2008 introduces the new MERGE
statement, which provides another set-oriented option for processing a
set of input rows and making a row-by-row determination of which rows to
ignore and which to insert, update, or delete in the target table.
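A minimal sketch of the MERGE syntax, assuming a hypothetical dbo.titles_staging table with the same key and columns as titles:
merge dbo.titles as t
using dbo.titles_staging as s          -- titles_staging is hypothetical
    on t.title_id = s.title_id
when matched then
    update set t.price = s.price
when not matched by target then
    insert (title_id, title, type, price)
    values (s.title_id, s.title, s.type, s.price)
when not matched by source then
    delete;                            -- MERGE must end with a semicolon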
When to Use Cursors
Application performance can
sometimes be slow due to the improper use of cursors. You should always
try to write your T-SQL code so SQL Server can perform what it is good
at: set-based operations. It makes little sense to have an advanced
relational database management system (RDBMS) and use it only for
one-row-at-a-time retrievals. For example, many update operations
performed using cursors can be performed with a single UPDATE statement using the CASE expression. Consider the cursor shown in Listing 1.
Listing 1. Updating the titles Table by Using a Cursor
/* This is a SQL script to update book prices
   dependent on current price and ytd_sales */

/* declare cursor */
declare titles_curs cursor for
    select ytd_sales, price
    from dbo.titles
    for update of price

declare @ytd_sales int, @price money

open titles_curs
fetch next from titles_curs into @ytd_sales, @price
if (@@fetch_status = -1)
begin
    print 'No books found'
    close titles_curs
    deallocate titles_curs
    return
end
while (@@fetch_status = 0)
begin
    if @ytd_sales < 500
        update titles
        set price = @price * .75
        where current of titles_curs
    else if @price > $15
        update titles
        set price = @price * .9
        where current of titles_curs
    else
        update titles
        set price = @price * 1.15
        where current of titles_curs
    fetch next from titles_curs into @ytd_sales, @price
end
if (@@fetch_status = -2)
    raiserror ('Attempt to fetch a row failed', 16, 1)
close titles_curs
deallocate titles_curs
This cursor can be replaced with a simple, single UPDATE statement, using the CASE expression, as shown in Listing 2.
Listing 2. The titles Cursor Example Performed with a Single UPDATE Statement Using the CASE Expression
update titles
set price = case
                when ytd_sales < 500 then price * .75
                when price > $15 then price * .90
                else price * 1.15
            end
The advantages of this approach are a significant performance improvement and much cleaner,
simpler code. In testing the performance of the single update versus
the cursor using the bigpubs2008 database, the cursor required on average around 100 milliseconds (ms) to complete. The single update
statement required, on average, about 10 ms. (Your results may vary
depending on hardware capabilities.) Although both of these completed
within a subsecond response time, consider that the cursor took 10 times
longer to complete than the single update. Factor that out over
hundreds of thousands or millions of rows, and you could be looking at a
significant performance difference.
Why is the cursor so much slower? Well, for one thing, a table scan performed by an UPDATE, a DELETE, or a SELECT
uses internal, compiled C code to loop through the result set. A cursor
uses interpreted SQL code. In addition, with a cursor, you are
performing multiple lines of code per row retrieved. The titles
cursor example is a relatively simple one; it performs one or two
conditional checks and a single update per row, but it is still roughly
10 times slower. Because of the overhead required to process cursors,
set-oriented operations typically run much faster, even if multiple
passes of the table are required.
Although set-oriented
operations are almost always faster than cursor operations, the one
possible disadvantage of using a single update is locking concurrency.
Even though a single update runs faster than a cursor, while it is
running, the single update might end up locking the entire table for an
extended period of time. This would prevent other users from accessing
the table during the update. If concurrent access to the table is more
important than the time it takes for the update to complete, you might
want to consider using a cursor. A cursor locks the data only a row at a
time instead of locking the entire table (as long as each row is
committed individually and the entire cursor is not in a transaction).
Another situation in which
you might want to consider using cursors is for scrolling applications
when the result sets can be quite large. Consider a customer service
application. The customer representative might need to pull up a list of
cases and case contacts associated with a customer. If the result sets
are small, you can just pull the entire result set down into a list box
and let the user scroll through the rows without needing a cursor. However,
if thousands of rows of information are likely, you might want to pull
back only a block of rows at a time, especially if the user needs to
look at only a few of the rows to get the information he or she needs.
It probably wouldn’t be worth pulling back all that data across the
network just for a few rows.
In this type of situation,
you might want to use a scrollable API server cursor. This way, you can
retrieve the appropriate number of rows to populate the list box and
then use the available scrolling options to quickly fetch to the bottom
of the list, using the LAST or ABSOLUTE n options, or you can go backward or forward by using the RELATIVE option.
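In T-SQL terms, the scrolling fetch options look like the following; API server cursors expose equivalent positioning options through the database API (dbo.cases is a hypothetical table):
declare case_list scroll cursor for
    select case_id, case_status from dbo.cases
open case_list
fetch absolute 20 from case_list   -- jump directly to the 20th row
fetch relative 5 from case_list    -- move forward 5 rows
fetch last from case_list          -- jump to the bottom of the list
close case_list
deallocate case_list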
Note
You need to be careful
using the scrollable API server cursor approach in a multitier
environment. Many multitier architectures include a middle data layer
that often uses connection sharing for multiple clients, and the users
are typically assigned any available connection when they need to access
SQL Server. Users do not necessarily use the same connection each time.
Therefore, if a user created a cursor in one connection, the next time
the user submitted a fetch through the data layer, he or she might get a
different connection, and the cursor would not be available.
One solution for this problem is to go back to retrieving the entire
result set down to the client application. Another possible solution is
to use a global temp table as a type of homemade insensitive cursor to
hold the result set and grab the data from the global temp table in
chunks, as needed. With the temp table approach, you need to make sure a
sequential key is on the table so you can quickly grab the block of
rows you need. You need to be aware of the potential impact on tempdb performance and the size requirements of tempdb if the result sets are large and you have many concurrent users.
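A rough sketch of the global temp table approach, using an identity column as the sequential key (the table, column, and variable names are illustrative):
declare @cust_id int, @first int, @blocksize int
select @cust_id = 101, @first = 1, @blocksize = 50

-- Materialize the result set once, with a sequential key
select identity(int, 1, 1) as rownum, case_id, case_status
into ##cust_cases
from dbo.cases              -- dbo.cases is a hypothetical table
where cust_id = @cust_id

-- Each subsequent fetch grabs the next block of rows by key range
select case_id, case_status
from ##cust_cases
where rownum between @first and @first + @blocksize - 1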
As a general rule, you should use
cursors only as a last resort when no set-oriented solution is feasible.
If you have decided that a cursor is the appropriate solution, you
should try to make it as efficient as possible by limiting the number of
commands to be executed within the cursor loop as much as possible.
Also, you should try to keep the cursor processing on the server side
within stored procedures. If you will be performing multiple fetches
over the network (for example, to support a scrolling application), you
should use an API server cursor. You should avoid using client-side
cursors that will be performing many cursor operations in the client
application; otherwise, you will find your application making excessive
requests to the server, and the volume of network roundtrips will make
for a sloth-like application.
Variable Assignment in UPDATE Statements
One commonly overlooked feature in T-SQL is the ability to assign values to local variables in the SET clause of the UPDATE
statement. This capability can help improve query performance by
reducing locking and CPU contention and reducing the number of
statements required in a T-SQL batch.
The simplified syntax of the SET clause for assigning values to variables is as follows:
SET
{ column_name = { expression | DEFAULT | NULL }
| @variable = expression
| @variable = column = expression [ ,...n ]
} [ ,...n ]
One common use of variable assignment in UPDATE
statements is when you have a table that is used for storing and
generating unique key values. To demonstrate this, you can create the keygen table and populate it as shown in Listing 3.
Listing 3. Creating and Populating the keygen Table
create table keygen (keytype char(1), keyval int)
go
insert keygen (keytype, keyval) values ('x', 1)
go
The typical approach often used to perform the task of retrieving a key value and updating the keygen table to generate the next key is to issue a SELECT statement and UPDATE statement within a transaction. Listing 4 shows an example of this.
Listing 4. Retrieving and Updating keyval with SELECT and UPDATE
begin tran

declare @newkey int

-- Select current keyval into @newkey
select @newkey = keyval
from keygen (XLOCK)
where keytype = 'x'

update keygen
set keyval = keyval + 1
where keytype = 'x'

commit

select @newkey as newkey
go
newkey
-----------
1
Tip
In Listing 4, the XLOCK hint is specified in the SELECT
statement. This prevents two separate user processes from running this
T-SQL batch at the same time and both acquiring the same keyval. With the XLOCK
hint, only one of the processes can acquire an exclusive lock, and the
other process waits until the lock is released and acquires the next keyval.
The use of XLOCK is definitely preferable to HOLDLOCK because the use of HOLDLOCK in this type of scenario often leads to a deadlock situation.
By using variable assignment in an UPDATE statement, you can eliminate the SELECT statement altogether and capture the keyval in the same statement you use to update the keygen table, as shown in Listing 5.
Listing 5. Using Variable Assignment in an UPDATE to Update and Retrieve keyval
declare @newkey int

update keygen
set keyval = keyval + 1,
    @newkey = keyval
where keytype = 'x'

select @newkey as newkey
go
newkey
-----------
2
Notice that the value assigned to the local variable using the syntax shown in Listing 5 is the value of the keyval column prior to the update. If you prefer to assign the value of the column after the column is updated, you use the @variable = column = expression syntax, as shown in Listing 6.
Listing 6. Using Variable Assignment in an UPDATE to Update and Retrieve keyval After Update
declare @newkey int

update keygen
set @newkey = keyval = keyval + 1
where keytype = 'x'

select @newkey as newkey
go
newkey
-----------
4
You need to be
aware that the variable assignment is performed for every row that
qualifies in the update. The resulting value of the local variable is
the value of the last row updated.
Another use for variable assignment in UPDATE
statements is to accumulate the sum of a column into a local variable
for all the rows being updated. The alternative approach would be to use
a cursor, as shown in Listing 7.
Listing 7. Using a Cursor to Accumulate a Sum of a Column for Each Row Updated
declare c1 cursor for
    select isnull(ytd_sales, 0)
    from titles
    where type = 'business'
    for update of price
go
declare @ytd_sales_total int, @ytd_sales int
select @ytd_sales_total = 0
open c1
fetch c1 into @ytd_sales
while @@fetch_status = 0
begin
    update titles
    set price = price
    where current of c1
    select @ytd_sales_total = @ytd_sales_total + @ytd_sales
    fetch c1 into @ytd_sales
end
select @ytd_sales_total as ytd_sales_total
close c1
deallocate c1
go
ytd_sales_total
---------------
30788
By using variable assignment in an UPDATE statement, you can replace the cursor in Listing 7 with a single UPDATE statement, as shown in Listing 8.
Listing 8. Using Variable Assignment in an UPDATE Statement to Accumulate a Sum of a Column for Each Row Updated
declare @ytd_sales_total int
set @ytd_sales_total = 0

update titles
set price = price,
    @ytd_sales_total = @ytd_sales_total + isnull(ytd_sales, 0)
where type = 'business'

select @ytd_sales_total as ytd_sales_total
go
ytd_sales_total
---------------
30788