Going Dynamic: Using EXECUTE
The solution to all of
the static SQL problems is, of course, to go dynamic. Building dynamic
SQL inside of a stored procedure is simple, the code is relatively easy
to understand, and as I'll show, it can provide excellent performance.
However, there are various potential issues to note, not the least of
which being security concerns.
The real benefit of
dynamic SQL is that the execution plans generated for each invocation of
the query will be optimized for only the predicates that are actually
being used at that moment. The main issue with the static SQL solutions,
aside from maintainability, was that the additional predicates confused
the query optimizer, causing it to create inefficient plans. Dynamic
SQL gets around this issue by not including anything extra in the query.
The simplest way to implement dynamic SQL in a stored procedure is with the EXECUTE
statement. This statement takes as input a string, and executes
whatever SQL the string contains. The following batch shows this in its
simplest—and least effective—form:
EXEC('SELECT
ContactId,
LoginId,
Title
FROM HumanResources.Employee')
Note that in this example , I use the truncated form of EXECUTE.
This seems to be a de facto standard for SQL Server code; I very rarely
see code that uses the full form with the added "UTE." Although this is
only a savings of three characters, I am very used to seeing it, and
for some reason it makes a lot more sense to me when reading SQL than
seeing the full EXECUTE keyword.
In this case, a string literal is passed to EXECUTE, and this doesn't really allow for anything very "dynamic." For instance, to add a predicate on EmployeeId to the query, the following would not work:
DECLARE @EmployeeId INT
SET @EmployeeId = 1
EXEC('SELECT
ContactId,
LoginId,
Title
FROM HumanResources.Employee
WHERE EmployeeId = ' + CONVERT(VARCHAR, @EmployeeId))
This fails (with an "incorrect syntax" exception) because of the way EXECUTE
is parsed by the SQL Server engine. SQL Server does only one pass to
parse the syntax, and then tries to concatenate and execute the SQL in a
second step. But due to the fact that the first step does not include a
stage for inline expansion, the CONVERT is still a CONVERT, rather than a literal, when it's time for concatenation.
The solution to this issue is quite simple. Define a variable and assign the dynamic SQL to it, and then call EXECUTE:
DECLARE @EmployeeId INT
SET @EmployeeId = 1
DECLARE @sql NVARCHAR(MAX)
SET @sql = 'SELECT
ContactId,
LoginId,
Title
FROM HumanResources.Employee
WHERE EmployeeId = ' + CONVERT(VARCHAR, @EmployeeId)
EXECUTE(@sql)
The string variable, @sql,
can be manipulated in any way in order to form the desired dynamic SQL
string, and since it's a variable, various code paths can be created
using control-of-flow statements. In other words, forming the dynamic
SQL is now limited only by the tools available within the T-SQL language
for string manipulation.
A first shot at optional inclusion of both the EmployeeId and NationalIdNumber predicates follows:
DECLARE @EmployeeId INT
SET @EmployeeId = 1
DECLARE @NationalIdNumber NVARCHAR(15)
SET @NationalIdNumber = N'14417807'
DECLARE @sql NVARCHAR(MAX)
SET @sql = 'SELECT
ContactId,
LoginId,
Title
FROM HumanResources.Employee '
IF (@EmployeeId IS NOT NULL
AND @NationalIdNumber IS NOT NULL)
BEGIN
SET @sql = @sql +
'WHERE EmployeeId = ' + CONVERT(NVARCHAR, @EmployeeId) +
' AND NationalIdNumber = N''' + @NationalIdNumber + ''''
END
ELSE IF (@EmployeeId IS NOT NULL)
BEGIN
SET @sql = @sql +
'WHERE EmployeeId = ' +
CONVERT(NVARCHAR, @EmployeeId)
END
ELSE IF (@NationalIdNumber IS NOT NULL)
BEGIN
SET @sql = @sql +
'WHERE NationalIdNumber = N''' + @NationalIdNumber + ''''
END
EXEC(@sql)
If this looks sickeningly
familiar, you've been doing a good job of paying attention; this example has the same maintenance issues as the
first shot at a static SQL stored procedure. Adding additional
parameters will create a combinatorial explosion, making this solution
completely unmaintainable. In addition, the SQL statement has been
broken up into two component parts, making it lack a good sense of flow.
Think about how bad this might get if you had to add ORDER BY or GROUP BY clauses.
To solve this problem, I like to concatenate my dynamic SQL in one shot, using CASE
expressions instead of control-of-flow statements in order to
optionally concatenate sections. The following example should serve to
illustrate how this works:
DECLARE @EmployeeId INT
SET @EmployeeId = 1
DECLARE @NationalIdNumber NVARCHAR(15)
SET @NationalIdNumber = N'14417807'
DECLARE @sql NVARCHAR(MAX)
SET @sql = 'SELECT
ContactId,
LoginId,
Title
FROM HumanResources.Employee
WHERE 1=1' +
CASE
WHEN @EmployeeId IS NULL THEN ''
ELSE 'AND EmployeeId = ' + CONVERT(NVARCHAR, @EmployeeId)
END +
CASE
WHEN @NationalIdNumber IS NULL THEN ''
ELSE 'AND NationalIdNumber = N''' + @NationalIdNumber + ''''
END
EXEC(@sql)
In this example, the CASE expressions concatenate an empty string if one of the parameters is NULL. Otherwise, the parameter is formatted as a string and concatenated to the predicate.
Thanks to the CASE
expressions, the code is much more compact, and the query is still
generally formatted like a query instead of like procedural code. But
the real trick here is the addition of 1=1 to the WHERE clause, in order to avoid the combinatorial explosion problem. The query optimizer will "optimize out" (i.e., discard) 1=1 in a WHERE clause, so it has no effect on the resultant query plan. What it does do is allow the optional predicates to use AND
without having to be aware of whether other optional predicates are
being concatenated. Each predicate can therefore be listed only once in
the code, and combinations are not a problem.
The final maintainability
issue with this code is one of formatting, and this is an area that I
feel is extremely important when working with dynamic SQL. Careful,
consistent formatting can mean the difference between quick one-minute
changes to stored procedures, instead of several hours of trying to
decipher messy code.
To see the problem with the way the code is currently formatted, add PRINT @sql to the end of the batch, to see the final string:
SELECT
ContactId,
LoginId,
Title
FROM HumanResources.Employee
WHERE 1=1AND EmployeeId = 1AND NationalIdNumber = N'14417807'
Although this SQL is valid
and executes as-is without exception, it has the potential for problems
due to the lack of spacing between the predicates. Debugging spacing
issues in dynamic SQL can be maddening, so I have developed a formatting
standard that works for me to combat the issue. When I am working with
dynamic SQL, I concatenate every line separately, ensuring that each
line is terminated with a space. This adds a bit more complexity to the
code, but I've found that it makes it much easier to debug. Following is
an example of how I like to format my dynamic SQL:
DECLARE @EmployeeId INT
SET @EmployeeId = 1
DECLARE @NationalIdNumber NVARCHAR(15)
SET @NationalIdNumber = N'14417807'
DECLARE @sql NVARCHAR(MAX)
SET @sql = '' +
'SELECT ' +
'ContactId, ' +
'LoginId, ' +
'Title ' +
'FROM HumanResources.Employee ' +
'WHERE 1=1 ' +
CASE
WHEN @EmployeeId IS NULL THEN ''
ELSE 'AND EmployeeId = ' + CONVERT(NVARCHAR, @EmployeeId) + ' '
END +
CASE
WHEN @NationalIdNumber IS NULL THEN ''
ELSE 'AND NationalIdNumber = N''' + @NationalIdNumber + ''' '
END
EXEC(@sql)
NOTE
I developed this
style when working with older versions of SQL Server, which did not have
the MAX data types and therefore had stringent variable size
limitations. Cutting everything up into individual tokens greatly
reduced the amount of white space, meaning that I could fit a lot more
code in each variable. Removal of extraneous white space is not
necessary in SQL Server 2005, but I still feel that this technique is
great for ensuring proper spacing.
Now that the code fragment is properly formatted, it can be transferred into a new version of the GetEmployeeData stored procedure:
CREATE PROCEDURE GetEmployeeData
@EmployeeId INT = NULL,
@NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
SET NOCOUNT ON
DECLARE @sql NVARCHAR(MAX)
SET @sql = '' +
'SELECT ' +
'ContactId, ' +
'LoginId, ' +
'Title ' +
'FROM HumanResources.Employee ' +
'WHERE 1=1 ' +
CASE
WHEN @EmployeeId IS NULL THEN ''
ELSE 'AND EmployeeId = ' + CONVERT(NVARCHAR, @EmployeeId) + ' '
END +
CASE
WHEN @NationalIdNumber IS NULL THEN ''
ELSE 'AND NationalIdNumber = N''' + @NationalIdNumber + ''' '
END
EXEC(@sql)
END
So that's it—a dynamic
stored procedure with optional parameters. At first glance, this might
look like a great solution, but it is still fraught with problems.
From a
performance point of view, this procedure appears to be great when taken
for a few test runs. Each set of input parameters produces the same
execution plan as the baseline examples, with the same estimated costs
and number of reads. However, under the covers, a major issue still
exists: parameterization is not occurring. To illustrate this, start
with the following T-SQL, which clears the query plan cache and then
runs the procedure with the same optional parameter, for three different
input values:
DBCC FREEPROCCACHE
GO
EXEC GetEmployeeData
@EmployeeId = 1
GO
EXEC GetEmployeeData
@EmployeeId = 2
GO
EXEC GetEmployeeData
@EmployeeId = 3
GO
Now, query the sys.dm_exec_cached_plans view, and you will see the output shown in Figure 5.
In this image, notice that there is one cached plan for the procedure
itself—which is expected for any stored procedure—and an additional ad
hoc plan cached for each invocation of the stored procedure. This means
that every time a new argument is passed, a compilation occurs, which is
clearly going to kill performance.