sp_executesql: A Better EXECUTE
In the previous sections, I identified two major problems with building dynamic SQL statements and executing them using EXECUTE:
First of all, there is the issue of extraneous compilation and query
plan caching, which makes performance drag and uses up valuable system
resources. Second, and perhaps more important, is the threat of SQL
injection attacks.
Query parameterization, is the key to fixing both of these problems.
Parameterization is a way to build a query such that any parameters are
passed as strongly typed variables, rather than formatted as strings and
appended to the query. In addition to the performance benefits this can
bring by allowing SQL Server to do less work when processing the query,
parameterization also has the benefit of virtually eliminating SQL
injection attacks.
The first step in
parameterizing a query is to replace literals with variable names. For
instance, the injection-vulnerable query from the previous section could
be rewritten in a parameterized manner as follows (I've removed the
stored procedure creation code for simplicity):
DECLARE @String NVARCHAR(60)
SET @String = 'Stone'
DECLARE @sql NVARCHAR(MAX)
SET @sql = '' +
'SELECT AddressId ' +
'FROM Person.Address ' +
'WHERE AddressLine1 LIKE ''%'' + @String + ''%'''
The only thing that has
changed about this query compared to the version in the last section is
two additional single quotes added such that the literal value of @String is no longer concatenated with the rest of the query. Previously, the literal value of @sql after concatenation would have been as follows:
SELECT AddressId FROM Person.Address WHERE AddressLine1 LIKE '%Stone%'
As a result of this change, the literal value after concatenation is now the following:
SELECT AddressId FROM Person.Address WHERE AddressLine1 LIKE '%' + @Stone + '%'
Due to the fact that no
variables are getting formatted into strings for concatenation with the
rest of the query, the type of SQL injection described in the previous
section is impossible in this scenario. The only thing such an attempt
would yield for a hacker is a search in which no results are returned!
Trying to execute this SQL using EXECUTE results in the following exception:
Msg 137, Level 15, State 2, Line 1
Must declare the scalar variable "@String".
The reason for this is that EXECUTE runs the SQL in a different context than that in which it was created. In the other context, the variable @String has not been declared and is therefore unknown.
The solution to this problem is to use the sp_executesql
system stored procedure, which allows you to pass parameters to dynamic
SQL, much as you can to a stored procedure. The parameters for sp_executesql are a Unicode (NVARCHAR or NCHAR)
string containing a dynamic SQL batch, a second Unicode string that
defines the data types of the variables referenced in the dynamic SQL,
and a list of values or variables from the calling scope that correspond
to the variables defined in the data type list. The following T-SQL
shows how to execute the Person.Address query using sp_executesql:
DECLARE @String NVARCHAR(60)
SET @String = 'Stone'
DECLARE @sql NVARCHAR(MAX)
SET @sql = '' +
'SELECT AddressId ' +
'FROM Person.Address ' +
'WHERE AddressLine1 LIKE ''%'' + @String + ''%'''
EXEC sp_executesql
@sql,
N'@String NVARCHAR(60)',
@String
Running this batch will produce the same results as calling FindAddressByString and passing the string "Stone". The parameters to sp_executesql serve to map the @String
variable from the outer scope, into the new scope spawned when the
dynamic SQL is executed—without having to concatenate the literal value
of the variable.
For an example that uses multiple parameters, consider again the GetEmployeeData stored procedure, now rewritten to use sp_executesql instead of EXECUTE:
CREATE PROCEDURE GetEmployeeData
@EmployeeId INT = NULL,
@NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
SET NOCOUNT ON
DECLARE @sql NVARCHAR(MAX)
SET @sql = '' +
'SELECT ' +
'ContactId, ' +
'LoginId, ' +
'Title ' +
'FROM HumanResources.Employee ' +
'WHERE 1=1 ' +
CASE
WHEN @EmployeeId IS NULL THEN ''
ELSE 'AND EmployeeId = @EmployeeId '
END +
CASE
WHEN @NationalIdNumber IS NULL THEN ''
ELSE 'AND NationalIdNumber = @NationalIdNumber '
END
EXEC sp_executesql
@sql,
N'@EmployeeId INT, @NationalIdNumber NVARCHAR(60)',
@EmployeeId,
@NationalIdNumber
END
For multiple parameters,
simply comma-delimit their data type definitions in the second
parameter, and then pass as many outer parameters as necessary to define
every variable listed in the second parameter. Note that you can use a
string variable for the second parameter, which might make sense if you
are defining a long list—but I usually keep the list in a string literal
so that I can easily match the definitions with the variables passed in
from the outer scope.
Another important thing to
note here is that even though both parameters are optional, they will
both get passed to the query every time it is executed. This is
perfectly okay! There is very little overhead in passing parameters into
sp_executesql, and trying to work
around this issue would either bring back the combinatorial explosion
problem or require some very creative use of nested dynamic SQL. Neither
solution is maintainable or worth the time required, so save your
energy for more interesting pursuits.
Performance Comparison
As a first step in evaluating the relative performance of the sp_executesql solution against other solutions mentioned, you can verify that sp_executesql really is reusing query plans as expected. To verify, run the same code that was used to show that the EXECUTE method was not reusing plans:
DBCC FREEPROCCACHE
GO
EXEC GetEmployeeData
@EmployeeId = 1
GO
EXEC GetEmployeeData
@EmployeeId = 2
GO
EXEC GetEmployeeData
@EmployeeId = 3
GO
After running this code, query the sys.dm_exec_cached_plans view as before. The results should be similar to those shown in Figure 9. One plan is cached for the procedure itself, and one for the invocation of the dynamic query with the @EmployeeId
parameter. Invoking the query with a different combination of
parameters will result in creation of more cached plans, because the
resultant query text will be different. However, the maximum number of
plans that can be cached for the stored procedure is five: one for the
procedure itself and one for each possible combination of parameters.
To further
validate performance, open up the SQLQueryStress tool for some simple
load testing. But before configuring SQLQueryStress, create a renamed
version of the best performing (but worst for maintenance) static SQL
version of the stored procedure. Call it GetEmployeeData_Static:
CREATE PROCEDURE GetEmployeeData_Static
@EmployeeId INT = NULL,
@NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
SET NOCOUNT ON
IF (@EmployeeId IS NOT NULL
AND @NationalIdNumber IS NOT NULL)
BEGIN
SELECT
ContactId,
LoginId,
Title
FROM HumanResources.Employee
WHERE
EmployeeId = @EmployeeId
AND NationalIdNumber = @NationalIdNumber
END
ELSE IF (@EmployeeId IS NOT NULL)
BEGIN
SELECT
ContactId,
LoginId,
Title
FROM HumanResources.Employee
WHERE
EmployeeId = @EmployeeId
END
ELSE IF (@NationalIdNumber IS NOT NULL)
BEGIN
SELECT
ContactId,
LoginId,
Title
FROM HumanResources.Employee
WHERE
NationalIdNumber = @NationalIdNumber
END
ELSE
BEGIN
SELECT
ContactId,
LoginId,
Title
FROM HumanResources.Employee
END
END
This version produces the
best possible query plans, but of course has the issue of being
impossible to maintain. It also has no additional overhead associated
with context switching, which may make it slightly faster than a dynamic
SQL solution if the queries are very simple. For more complex queries
that take longer, any context switching overhead will be overshadowed by
the actual runtime of the query.
Once the stored procedure is created, enter the following into the SQLQueryStress Query textbox:
EXEC GetEmployeeData_Static
@EmployeeId,
@NationalIdNumber
Next, click the
Database button and configure your database connection to use
AdventureWorks as the default database. Once finished, click the
Parameter Substitution button and enter the following T-SQL into the
Parameter Query textbox:
SELECT EmployeeId, NationalIdNumber
FROM HumanResources.Employee
UNION ALL
SELECT NULL, NationalIdNumber
FROM HumanResources.Employee
UNION ALL
SELECT NULL, NationalIdNumber
FROM HumanResources.Employee
UNION ALL
SELECT NULL, NULL
This code selects one row
for every possible combination of input parameters to the stored
procedure, which will allow testing of every type of invocation. Click
the Get Columns button, and map the EmployeeId column to @EmployeeId, and the NationalIdNumber column to @NationalIdNumber. Then, click OK to go back to the main SQLQueryStress screen.
To keep the test
simple, configure the tool to use five threads and to do 8,710
iterations. Since there are 871 rows returned by the parameter
substitution query, this means that each thread will run through every
possible combination ten times. These numbers are somewhat arbitrary,
but the point is to simply validate the solutions against each other
with a consistent test, not to completely stress them until they break.
The results of the test as run on my system are shown in Figure 10.
Next, create a renamed version of the EXECUTE solution, called GetEmployeeData_Execute:
CREATE PROCEDURE GetEmployeeData_Execute
@EmployeeId INT = NULL,
@NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
SET NOCOUNT ON
DECLARE @sql NVARCHAR(MAX)
SET @sql = '' +
'SELECT ' +
'ContactId, ' +
'LoginId, ' +
'Title ' +
'FROM HumanResources.Employee ' +
'WHERE 1=1 ' +
CASE
WHEN @EmployeeId IS NULL THEN ''
ELSE
'AND EmployeeId = ' +
CONVERT(NVARCHAR, @EmployeeId) + ' '
END +
CASE
WHEN @NationalIdNumber IS NULL THEN ''
ELSE
'AND NationalIdNumber = N''' +
@NationalIdNumber + ''' '
END
EXEC(@sql)
END
Testing this stored procedure against the static solution and, later, the sp_executesql
solution will create a nice means by which to compare static SQL
against both parameterized and nonparameterized dynamic SQL, and will
show the effects of parameterization on performance. Back in
SQLQueryStress, change the text in the Query textbox to the following:
EXEC GetEmployeeData_Execute
@EmployeeId,
@NationalIdNumber
Once finished, click the
GO button to begin the test; do not reconfigure the parameter
substitution, number of threads, or number of iterations, in order to
keep things consistent. The results of the test as run on my system are
shown in Figure 11.
The final stored procedure to test is, of course, the sp_executesql solution. Once again, create a renamed version of the stored procedure in order to differentiate it. This time, call it GetEmployeeData_sp_executesql:
CREATE PROCEDURE GetEmployeeData_sp_executesql
@EmployeeId INT = NULL,
@NationalIdNumber NVARCHAR(15) = NULL
AS
BEGIN
SET NOCOUNT ON
DECLARE @sql NVARCHAR(MAX)
SET @sql = '' +
'SELECT ' +
'ContactId, ' +
'LoginId, ' +
'Title ' +
'FROM HumanResources.Employee ' +
'WHERE 1=1 ' +
CASE
WHEN @EmployeeId IS NULL THEN ''
ELSE 'AND EmployeeId = @EmployeeId '
END +
CASE
WHEN @NationalIdNumber IS NULL THEN ''
ELSE 'AND NationalIdNumber = @NationalIdNumber '
END
EXEC sp_executesql
@sql,
N'@EmployeeId INT, @NationalIdNumber NVARCHAR(60)',
@EmployeeId,
@NationalIdNumber
END
In SQLQueryStress, change the text of the Query textbox to the following:
EXEC GetEmployeeData_sp_executesql
@EmployeeId,
@NationalIdNumber
When finished, click GO to begin the test. Figure 12 shows the results from my system.
Interestingly, the results between the two dynamic SQL solutions are very close together, with the sp_executesql solution beating the EXECUTE
solution by only just over a second, even given the benefits of
parameterization for performance. Runs with a lower number of iterations
or against stored procedures that are more expensive for SQL Server to
compile will highlight the benefits more clearly.
The static SQL version, in
this case, clearly wins from a performance point of view (although all
three are extremely fast). Again, more complex stored procedures with
longer run times will naturally overshadow the difference between the
dynamic SQL and static SQL solutions, leaving the dynamic SQL vs. static
SQL question purely one of maintenance.
NOTE
When running these tests
on my system, I restarted my SQL Server service between each run in
order to ensure absolute consistency. Although this may be overkill for
this case, you may find it interesting to experiment on your end with
how restarting the service affects performance. This kind of test can
also be useful for general scalability testing, especially in clustered
environments. Restarting the service before testing is a technique that
you can use to find out how the application will behave if a failover
occurs, without having to have a clustered testing environment.
Output Parameters
Although it is somewhat of an aside to this discussion, I would like to point out one other feature that sp_executesql brings to the table compared with EXECUTE, which is often overlooked by users who are just getting started using it. sp_executesql allows you to pass parameters to dynamic SQL just like to a stored procedure—and this includes output parameters.
Output parameters
become quite useful when you need to use the output of a dynamic SQL
statement that perhaps only returns a single scalar value. An output
parameter is a much cleaner solution than having to insert the value
into a table and then read it back into a variable.
To define an output parameter, simply append the OUTPUT
keyword in both the parameter definition list and the parameter list
itself. The following T-SQL shows how to use an output parameter with sp_executesql:
DECLARE @SomeVariable INT
EXEC sp_executesql
N'SET @SomeVariable = 123',
N'@SomeVariable INT OUTPUT',
@SomeVariable OUTPUT
As a result of this T-SQL, the @SomeVariable variable will have a value of 123.
Since this is an especially contrived example, I will add that in practice I often use output parameters with sp_executesql
in search stored procedures with optional parameters. A common user
interface requirement is to return the number of total rows found by the
selected search criteria, and an output parameter is a quick way to get
the data back to the caller.