SQL Server 2005 : Dynamic T-SQL - Supporting Optional Parameters (part 2) - Going Dynamic: Using EXECUTE

10/14/2012 7:28:16 PM

Going Dynamic: Using EXECUTE

The solution to all of the static SQL problems is, of course, to go dynamic. Building dynamic SQL inside of a stored procedure is simple, the code is relatively easy to understand, and as I'll show, it can provide excellent performance. However, there are various potential issues to note, not the least of which being security concerns.

The real benefit of dynamic SQL is that the execution plans generated for each invocation of the query will be optimized for only the predicates that are actually being used at that moment. The main issue with the static SQL solutions, aside from maintainability, was that the additional predicates confused the query optimizer, causing it to create inefficient plans. Dynamic SQL gets around this issue by not including anything extra in the query.

The simplest way to implement dynamic SQL in a stored procedure is with the EXECUTE statement. This statement takes as input a string, and executes whatever SQL the string contains. The following batch shows this in its simplest—and least effective—form:

EXEC('SELECT
    ContactId,
    LoginId,
    Title
FROM HumanResources.Employee')

Note that in this example , I use the truncated form of EXECUTE. This seems to be a de facto standard for SQL Server code; I very rarely see code that uses the full form with the added "UTE." Although this is only a savings of three characters, I am very used to seeing it, and for some reason it makes a lot more sense to me when reading SQL than seeing the full EXECUTE keyword.

In this case, a string literal is passed to EXECUTE, and this doesn't really allow for anything very "dynamic." For instance, to add a predicate on EmployeeId to the query, the following would not work:

DECLARE @EmployeeId INT
SET @EmployeeId = 1

EXEC('SELECT
    ContactId,
    LoginId,
    Title
FROM HumanResources.Employee
WHERE EmployeeId = ' + CONVERT(VARCHAR, @EmployeeId))

This fails (with an "incorrect syntax" exception) because of the way EXECUTE is parsed by the SQL Server engine. SQL Server does only one pass to parse the syntax, and then tries to concatenate and execute the SQL in a second step. But due to the fact that the first step does not include a stage for inline expansion, the CONVERT is still a CONVERT, rather than a literal, when it's time for concatenation.

The solution to this issue is quite simple. Define a variable and assign the dynamic SQL to it, and then call EXECUTE:

DECLARE @EmployeeId INT
SET @EmployeeId = 1

DECLARE @sql NVARCHAR(MAX)

SET @sql = 'SELECT
    ContactId,

LoginId,
    Title
FROM HumanResources.Employee
WHERE EmployeeId = ' + CONVERT(VARCHAR, @EmployeeId)

EXECUTE(@sql)

The string variable, @sql, can be manipulated in any way in order to form the desired dynamic SQL string, and since it's a variable, various code paths can be created using control-of-flow statements. In other words, forming the dynamic SQL is now limited only by the tools available within the T-SQL language for string manipulation.

A first shot at optional inclusion of both the EmployeeId and NationalIdNumber predicates follows:

DECLARE @EmployeeId INT
SET @EmployeeId = 1

DECLARE @NationalIdNumber NVARCHAR(15)
SET @NationalIdNumber = N'14417807'

DECLARE @sql NVARCHAR(MAX)

SET @sql = 'SELECT
    ContactId,
    LoginId,
    Title
FROM HumanResources.Employee '

IF (@EmployeeId IS NOT NULL
    AND @NationalIdNumber IS NOT NULL)
BEGIN
    SET @sql = @sql +
        'WHERE EmployeeId = ' + CONVERT(NVARCHAR, @EmployeeId) +
            ' AND NationalIdNumber = N''' + @NationalIdNumber + ''''
END
ELSE IF (@EmployeeId IS NOT NULL)
BEGIN
    SET @sql = @sql +
        'WHERE EmployeeId = ' +
            CONVERT(NVARCHAR, @EmployeeId)
END
ELSE IF (@NationalIdNumber IS NOT NULL)
BEGIN
    SET @sql = @sql +
        'WHERE NationalIdNumber = N''' + @NationalIdNumber + ''''
END

EXEC(@sql)

If this looks sickeningly familiar, you've been doing a good job of paying attention; this example has the same maintenance issues as the first shot at a static SQL stored procedure. Adding additional parameters will create a combinatorial explosion, making this solution completely unmaintainable. In addition, the SQL statement has been broken up into two component parts, making it lack a good sense of flow. Think about how bad this might get if you had to add ORDER BY or GROUP BY clauses.

To solve this problem, I like to concatenate my dynamic SQL in one shot, using CASE expressions instead of control-of-flow statements in order to optionally concatenate sections. The following example should serve to illustrate how this works:

DECLARE @EmployeeId INT
SET @EmployeeId = 1

DECLARE @NationalIdNumber NVARCHAR(15)
SET @NationalIdNumber = N'14417807'

DECLARE @sql NVARCHAR(MAX)

SET @sql = 'SELECT
    ContactId,
    LoginId,
    Title
FROM HumanResources.Employee
WHERE 1=1' +
CASE
    WHEN @EmployeeId IS NULL THEN ''
    ELSE 'AND EmployeeId = ' + CONVERT(NVARCHAR, @EmployeeId)
END +
CASE
    WHEN @NationalIdNumber IS NULL THEN ''
    ELSE 'AND NationalIdNumber = N''' + @NationalIdNumber + ''''
END

EXEC(@sql)

In this example, the CASE expressions concatenate an empty string if one of the parameters is NULL. Otherwise, the parameter is formatted as a string and concatenated to the predicate.

Thanks to the CASE expressions, the code is much more compact, and the query is still generally formatted like a query instead of like procedural code. But the real trick here is the addition of 1=1 to the WHERE clause, in order to avoid the combinatorial explosion problem. The query optimizer will "optimize out" (i.e., discard) 1=1 in a WHERE clause, so it has no effect on the resultant query plan. What it does do is allow the optional predicates to use AND without having to be aware of whether other optional predicates are being concatenated. Each predicate can therefore be listed only once in the code, and combinations are not a problem.

The final maintainability issue with this code is one of formatting, and this is an area that I feel is extremely important when working with dynamic SQL. Careful, consistent formatting can mean the difference between quick one-minute changes to stored procedures, instead of several hours of trying to decipher messy code.

To see the problem with the way the code is currently formatted, add PRINT @sql to the end of the batch, to see the final string:

SELECT
    ContactId,
    LoginId,
    Title
FROM HumanResources.Employee
WHERE 1=1AND EmployeeId = 1AND NationalIdNumber = N'14417807'

Although this SQL is valid and executes as-is without exception, it has the potential for problems due to the lack of spacing between the predicates. Debugging spacing issues in dynamic SQL can be maddening, so I have developed a formatting standard that works for me to combat the issue. When I am working with dynamic SQL, I concatenate every line separately, ensuring that each line is terminated with a space. This adds a bit more complexity to the code, but I've found that it makes it much easier to debug. Following is an example of how I like to format my dynamic SQL:

DECLARE @EmployeeId INT
SET @EmployeeId = 1

DECLARE @NationalIdNumber NVARCHAR(15)
SET @NationalIdNumber = N'14417807'

DECLARE @sql NVARCHAR(MAX)

SET @sql = '' +
    'SELECT ' +
        'ContactId, ' +
        'LoginId, ' +
        'Title ' +
    'FROM HumanResources.Employee ' +
    'WHERE 1=1 ' +
    CASE
        WHEN @EmployeeId IS NULL THEN ''
        ELSE 'AND EmployeeId = ' + CONVERT(NVARCHAR, @EmployeeId) + ' '
    END +
    CASE
        WHEN @NationalIdNumber IS NULL THEN ''
        ELSE 'AND NationalIdNumber = N''' + @NationalIdNumber + ''' '
    END

EXEC(@sql)

NOTE

I developed this style when working with older versions of SQL Server, which did not have the MAX data types and therefore had stringent variable size limitations. Cutting everything up into individual tokens greatly reduced the amount of white space, meaning that I could fit a lot more code in each variable. Removal of extraneous white space is not necessary in SQL Server 2005, but I still feel that this technique is great for ensuring proper spacing.

Now that the code fragment is properly formatted, it can be transferred into a new version of the GetEmployeeData stored procedure:

CREATE PROCEDURE GetEmployeeData
    @EmployeeId INT = NULL,
    @NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @sql NVARCHAR(MAX)

    SET @sql = '' +
        'SELECT ' +
            'ContactId, ' +
            'LoginId, ' +
            'Title ' +
        'FROM HumanResources.Employee ' +
        'WHERE 1=1 ' +
        CASE
            WHEN @EmployeeId IS NULL THEN ''
            ELSE 'AND EmployeeId = ' + CONVERT(NVARCHAR, @EmployeeId) + ' '
        END +
        CASE
            WHEN @NationalIdNumber IS NULL THEN ''
            ELSE 'AND NationalIdNumber = N''' + @NationalIdNumber + ''' '
        END

    EXEC(@sql)
END

So that's it—a dynamic stored procedure with optional parameters. At first glance, this might look like a great solution, but it is still fraught with problems.

From a performance point of view, this procedure appears to be great when taken for a few test runs. Each set of input parameters produces the same execution plan as the baseline examples, with the same estimated costs and number of reads. However, under the covers, a major issue still exists: parameterization is not occurring. To illustrate this, start with the following T-SQL, which clears the query plan cache and then runs the procedure with the same optional parameter, for three different input values:

DBCC FREEPROCCACHE
GO

EXEC GetEmployeeData
    @EmployeeId = 1
GO

EXEC GetEmployeeData
    @EmployeeId = 2
GO

EXEC GetEmployeeData
    @EmployeeId = 3
GO

Now, query the sys.dm_exec_cached_plans view, and you will see the output shown in Figure 5 . In this image, notice that there is one cached plan for the procedure itself—which is expected for any stored procedure—and an additional ad hoc plan cached for each invocation of the stored procedure. This means that every time a new argument is passed, a compilation occurs, which is clearly going to kill performance.

The dynamic query is not being parameterized and is therefore producing duplicate query plans for different arguments.

The other issue with this stored procedure, as it currently stands, is a serious security hole. A stored procedure implemented similarly to this one but with a minor modification would open a simple attack vector that a hacker could exploit to easily pull information out of the database, or worse.

SQL Server 2005 : Dynamic T-SQL - Supporting Optional Parameters (part 4) - sp_executesql: A Better EXECUTE

SQL Server 2005 : Dynamic T-SQL - Supporting Optional Parameters (part 3) - SQL Injection

SQL Server 2005 : Dynamic T-SQL - Supporting Optional Parameters (part 1) - Optional Parameters via Static T-SQL

Other

SQL Server 2005 : Dynamic T-SQL - Why Go Dynamic?

MySQL for Python : Passing a query to MySQL

MySQL for Python : Forming a query in MySQL

SQL Injection : Platform-Level Defenses - Securing the Database

SQL Injection : Platform-Level Defenses - Using Runtime Protection (part 2) - Intercepting Filters

SQL Injection : Platform-Level Defenses - Using Runtime Protection (part 1) - Web Application Firewalls

SQL Injection : Code-Level Defenses - Designing to Avoid the Dangers of SQL Injection

SQL Injection : Code-Level Defenses - Canonicalization

SQL Injection : Code-Level Defenses - Encoding Output

Upgrading and Converting to Access 2010 : INSTALLING MULTIPLE VERSIONS OF ACCESS ON ONE PC, CHANGING FILE FORMATS