SQL Server 2005 : Dynamic T-SQL - Supporting Optional Parameters (part 1) - Optional Parameters via Static T-SQL

10/14/2012 7:26:37 PM

The primary use case for dynamic SQL is the ability to write stored procedures that can support optional parameters for queries in an efficient, maintainable manner. Although it is quite easy to write static stored procedures that handle optional query parameters, these are generally grossly inefficient or highly unmaintainable—as a developer, you can take your pick.

Optional Parameters via Static T-SQL

Before presenting the dynamic SQL solution to the optional parameter problem, a few demonstrations are necessary to illustrate why static SQL is not the right tool for the job. There are a few different methods of varying complexity and effectiveness, but none deliver consistently.

As a baseline, consider the following query, which selects data from the HumanResources.Employee table in the AdventureWorks database:

SELECT
    ContactId,
    LoginId,
    Title
FROM HumanResources.Employee
WHERE
    EmployeeId = 1
    AND NationalIdNumber = N'14417807'

This query uses predicates to filter on both the EmployeeId and NationalIdNumber columns. Executing the query produces the execution plan shown in Figure 1 , which has an estimated cost of 0.0032831, and which requires two logical reads. This plan involves a seek of the table's clustered index, which uses the EmployeeId column as its key.

Base execution plan with seek on EmployeeId clustered index

Since the query uses the clustered index, it does not need to do a lookup to get any additional data. Furthermore, since EmployeeId is the primary key for the table, the NationalIdNumber predicate is not used when physically identifying the row. Therefore, the following query, which uses only the EmployeeId predicate, produces the exact same query plan with the same cost and same number of reads:

SELECT
    ContactId,
    LoginId,
    Title
FROM HumanResources.Employee
WHERE
    EmployeeId = 1

Another form of this query involves removing EmployeeId and querying based only on NationalIdNumber:

SELECT
    ContactId,
    LoginId,
    Title
FROM HumanResources.Employee
WHERE
    NationalIdNumber = N'14417807'

This query results in a very different plan from the other two, due to the fact that a different index must be used to satisfy the query. Figure 2 shows the resultant plan, which involves a seek on a nonclustered index on the NationalIdNumber column, followed by a lookup to get the additional rows for the SELECT list. This plan has an estimated cost of 0.0065704, and does four logical reads.

Base execution plan with seek on NationalIdNumber nonclustered index followed by a lookup into the clustered index

The final form of the base query has no predicates at all:

SELECT
    ContactId,
    LoginId,
    Title
FROM HumanResources.Employee

As shown in Figure 3 , the query plan is a simple clustered index scan, with an estimated cost of 0.0080454, and nine logical reads. Since all of the rows need to be returned and no index covers every column required, a clustered index scan is the most efficient way to satisfy this query.

Base execution plan with scan on the clustered index

These baseline numbers will be used to compare the relative performance of various methods of creating a dynamic stored procedure that returns the same columns, but which optionally enables one or both predicates. To begin with, the query can be wrapped in a stored procedure:

CREATE PROCEDURE GetEmployeeData
    @EmployeeId INT = NULL,
    @NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
    SET NOCOUNT ON

    SELECT
        ContactId,
        LoginId,
        Title
    FROM HumanResources.Employee
    WHERE
        EmployeeId = @EmployeeId
        AND NationalIdNumber = @NationalIdNumber
END

This stored procedure uses the parameters @EmployeeId and @NationalIdNumber to support the predicates. Both of these parameters are optional, with NULL default values. However, this stored procedure does not really support the parameters optionally; not passing one of the parameters will mean that no rows will be returned by the stored procedure at all, since any comparison with NULL in a predicate will not result in a true answer.

As a first shot at making this stored procedure optionally enable the predicates, a developer might try control of flow and rewrite the procedure as follows:

CREATE PROCEDURE GetEmployeeData
    @EmployeeId INT = NULL,
    @NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
    SET NOCOUNT ON

    IF (@EmployeeId IS NOT NULL
        AND @NationalIdNumber IS NOT NULL)
    BEGIN
        SELECT
            ContactId,

Code View: Scroll / Show All

LoginId,
            Title
        FROM HumanResources.Employee
        WHERE
            EmployeeId = @EmployeeId
            AND NationalIdNumber = @NationalIdNumber
    END
    ELSE IF (@EmployeeId IS NOT NULL)
    BEGIN
        SELECT
            ContactId,
            LoginId,
            Title
        FROM HumanResources.Employee
        WHERE
            EmployeeId = @EmployeeId
    END
    ELSE IF (@NationalIdNumber IS NOT NULL)
    BEGIN
        SELECT
            ContactId,
            LoginId,
            Title
        FROM HumanResources.Employee
        WHERE
            NationalIdNumber = @NationalIdNumber
    END
    ELSE
    BEGIN
        SELECT
            ContactId,
            LoginId,
            Title
        FROM HumanResources.Employee
    END
END

Although executing this stored procedure produces the exact same query plans—and, therefore, the exact same performance—as the test batches, it has an unfortunate problem. Namely, taking this approach turns what was a very simple 10-line stored procedure into a 42-line monster. Consider that when adding a column to the SELECT list for this procedure, a change would have to be made in four places. Now consider what would happen if a third predicate were needed—the number of cases would jump from four to eight, meaning that any change such as adding or removing a column would have to be made in eight places. Now consider 10 or 20 predicates, and it's clear that this method has no place in the SQL Server developer's toolbox. It is simply not a manageable solution.

The next most common technique is one that has appeared in articles on several SQL Server web sites over the past few years. As a result, a lot of code has been written against it by developers who don't seem to realize that they're creating a performance time bomb. This technique takes advantage of the COALESCE function, as shown in the following rewritten version of the stored procedure:

CREATE PROCEDURE GetEmployeeData
    @EmployeeId INT = NULL,
    @NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
    SET NOCOUNT ON

    SELECT
        ContactId,
        LoginId,
        Title
    FROM HumanResources.Employee
    WHERE
        EmployeeId = COALESCE(@EmployeeId, EmployeeId)
        AND NationalIdNumber = COALESCE(@NationalIdNumber, NationalIdNumber)
END

This version of the stored procedure looks great and is easy to understand. The COALESCE function returns the first non-NULL value passed into its parameter list. So if either of the arguments to the stored procedure are NULL, the COALESCE will "pass through," comparing the value of the column to itself—and at least in theory, that seems like it should be a no-op.

Unfortunately, because the COALESCE function uses a column from the table as an input, it cannot be evaluated deterministically before execution of the query. The result is that the function is evaluated once for every row of the table, resulting in a table scan. This means consistent results, but probably not in a good way; all four combinations of parameters result in the same query plan, a clustered index scan with an estimated cost of 0.0080454 and nine logical reads. This is over four times the I/O for the queries involving the EmployeeId column—quite a performance drain.

Similar to the version that uses COALESCE is a version that uses OR to conditionally set the parameter only if the argument is not NULL:

CREATE PROCEDURE GetEmployeeData
    @EmployeeId INT = NULL,
    @NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
    SET NOCOUNT ON

    SELECT
        ContactId,
        LoginId,
        Title
    FROM HumanResources.Employee
    WHERE
        (@EmployeeId IS NULL

OR EmployeeId = @EmployeeId)
        AND (@NationalIdNumber IS NULL
            OR @NationalIdNumber = NationalIdNumber)
END

This version, while similar in idea to the version that uses COALESCE, has some interesting performance traits. Depending on which parameters you use the first time you call it, you'll see vastly different results. If you're lucky enough to call it the first time with no arguments, the result will be an index scan, producing nine logical reads—and the same number of reads will result for any combination of parameters passed in thereafter. If, however, you first call the stored procedure using only the @EmployeeId parameter, the resultant plan will use only four logical reads—until you happen to call the procedure with no arguments, and it produces a massive 582 reads.

Given the surprisingly huge jump in I/O that the bad plan can produce, as well as the unpredictable nature of what performance characteristics you'll end up with, this is undoubtedly the worst possible choice.

The final method that can be used is a bit more creative, and also can result in somewhat-better results. The following version of the stored procedure shows how it is implemented:

CREATE PROCEDURE GetEmployeeData
    @EmployeeId INT = NULL,
    @NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
    SET NOCOUNT ON

    SELECT
        ContactId,
        LoginId,
        Title
    FROM HumanResources.Employee
    WHERE
        EmployeeId BETWEEN
            COALESCE(@EmployeeId, −2147483648)
                AND COALESCE(@EmployeeId, 2147483647)
        AND NationalIdNumber LIKE COALESCE(@NationalIdNumber, N'%')
END

If you're a bit confused by the logic of this stored procedure, you're now familiar with the first reason that I don't recommend this technique: it's relatively unmaintainable if you don't understand exactly how it works. Using it almost certainly guarantees that you will produce stored procedures that will stump others who attempt to maintain them in the future. And while that might be good for job security, using it for that purpose is probably not a virtuous goal.

This stored procedure operates by using COALESCE to cancel out NULL arguments by substituting in minimum and maximum conditions for the integer predicate (EmployeeId) and a LIKE expression that will match anything for the string predicate (NationalIdNumber).

If @EmployeeId is NULL, the EmployeeId predicate effectively becomes EmployeeId BETWEEN −2147483648 AND 2147483647—in other words, all possible integers. If @EmployeeId is not NULL, the predicate becomes EmployeeId BETWEEN @EmployeeId AND @EmployeeId. This is equivalent with EmployeeId=@EmployeeId.

The same basic logic is true for the NationalIdNumber predicate, although because it's a string instead of an integer, LIKE is used instead of BETWEEN. If @NationalIdNumber is NULL, the predicate becomes NationalIdNumber LIKE N'%'. This will match any string in the NationalIdNumber column. On the other hand, if @NationalIdNumber is not NULL, the predicate becomes NationalIdNumber LIKE @NationalIdNumber, which is equivalent with NationalIdNumber=@NationalIdNumber—assuming that @NationalIdNumber contains no string expressions. This predicate can also be written using BETWEEN to avoid the string expression issue (for instance: BETWEEN N'' AND REPLICATE(NCHAR(1000), 15)). However, that method is both more difficult to read than the LIKE expression and fraught with potential problems due to collation issues (which is why I only went up to NCHAR(1000) instead of NCHAR(65535) in the example).

The real question, of course, is one of performance. Unfortunately, this stored procedure manages to confuse the query optimizer, resulting in the same plan being generated for every invocation. The plan, in every case, involves a clustered index seek on the table, with an estimated cost of 0.0048592, as shown in Figure 4 . Unfortunately, this estimate turns out to be highly inconsistent, as the number of actual logical reads varies widely based on the arguments passed to the procedure.

Every set of arguments passed to the stored procedure results in the same execution plan.

If both arguments are passed or @EmployeeId is passed but @NationalIdNumber is not, the number of logical reads is three. While this is much better than the nine logical reads required by the previous version of the stored procedure, it's still 50% more I/O than the two logical reads required by the baseline in both of these cases. This estimated plan really breaks down when passing only @NationalIdNumber, since there is no way to efficiently satisfy a query on the NationalIdNumber column using the clustered index. In both that case and when passing no arguments, nine logical reads are reported. For the NationalIdNumber predicate this is quite a failure, as the stored procedure does over twice as much work for the same results as the baseline.

SQL Server 2005 : Dynamic T-SQL - Supporting Optional Parameters (part 4) - sp_executesql: A Better EXECUTE

SQL Server 2005 : Dynamic T-SQL - Supporting Optional Parameters (part 3) - SQL Injection

SQL Server 2005 : Dynamic T-SQL - Supporting Optional Parameters (part 2) - Going Dynamic: Using EXECUTE

Other

SQL Server 2005 : Dynamic T-SQL - Why Go Dynamic?

MySQL for Python : Passing a query to MySQL

MySQL for Python : Forming a query in MySQL

SQL Injection : Platform-Level Defenses - Securing the Database

SQL Injection : Platform-Level Defenses - Using Runtime Protection (part 2) - Intercepting Filters

SQL Injection : Platform-Level Defenses - Using Runtime Protection (part 1) - Web Application Firewalls

SQL Injection : Code-Level Defenses - Designing to Avoid the Dangers of SQL Injection

SQL Injection : Code-Level Defenses - Canonicalization

SQL Injection : Code-Level Defenses - Encoding Output

Upgrading and Converting to Access 2010 : INSTALLING MULTIPLE VERSIONS OF ACCESS ON ONE PC, CHANGING FILE FORMATS