Changes
to the keys in your tables should, hopefully, be rare, but they can
cause trouble to the unwary when they happen. In a broader context, the
defensive programmer should always fully document and test any
assumptions about the underlying uniqueness of the column data. The following examples demonstrate what can
happen to perfectly correct code when changes are made to the
underlying unique or primary keys, thus invalidating assumptions in the
code regarding the uniqueness of the column data. We'll then discuss
how a query against the system views, or use of @@ROWCOUNT, can detect if such assumptions are still true.
In Listing 1, we create a table, Customers, using a UNIQUE constraint to guarantee the uniqueness of phone numbers, and then populate it with some test data.
We need to implement a simple stored procedure, shown in Listing 2,
which will allow users to find a customer based on their phone number,
and set their customer status (regular, preferred, or VIP). If no
customer exists for a given phone number, we don't need to raise an
exception; we simply do nothing.
This implementation assumes
that at most one customer has any given phone number. Clearly, right
now, this assumption is true as it is guaranteed by the UNQ_Customers constraint.
Suppose, however, that at some
later time we need to store data about customers from different
countries. At this point, the phone number alone no longer uniquely
indentifies a customer, but the combination of country code and phone
number does. In order to accommodate this requirement, our Customers table is altered to add the new column, CountryCode, and our UNQ_Customers constraint is modified so that it enforces uniqueness based on a combination of the CountryCode and PhoneNumber columns. These alterations are shown in Listing 3.
Note that, in reality, we should have added a lookup table, dbo.CountryCodes, referred to by a FOREIGN KEY constraint. However, I've avoided a lookup table in this case, in favor of keeping the example simple.
At this point, our constraint is no longer enforcing the uniqueness of values in the PhoneNumber
column, so we can insert a customer with an identical phone number to
an existing customer, but with a different country code, as shown in Listing 4.
Our Stored procedure, however, is still working on the assumption that a customer can be uniquely
identified by their phone number alone. Since this assumption is no
longer valid, the stored procedure, in its current form, could
erroneously update more than one row of data, as demonstrated in Listing 5.
Perhaps the most prevalent and damaging mistake made during the
development of SQL code is a failure to define or recognize the
assumptions on which the implementation relies. The result, as
demonstrated here, is code that is brittle, and liable to behave
unpredictably when these assumptions are invalidated by changes to the
underlying database objects.
Of course, the most
obvious lesson to be learned here is that whenever we change our unique
and/or primary keys, we need to review all the procedures that depend
on the modified tables. However, the manual process of reviewing the
potentially affected code is, like all manual processes, slow and prone
to error. It may be more efficient to automate the process of
identifying the modules that rely on particular assumptions about the
underlying schema. Unit tests allow us to accomplish exactly that; we
can easily, for example, write a unit test that succeeds if there is a UNIQUE constraint on the PhoneNumber column alone, and fails when this is no longer the case.
Using unit tests to document and test assumptions
Let's translate the assumption that the PhoneNumber
column uniquely identifies a customer into a query against the system
views. The query is rather complex, so we'll develop it in several
steps. First of all, we need to know if there are any constraints on
the PhoneNumber column, as shown in Listing 6.
This query returns 1,
confirming that there is a constraint on that column. Next, we need to
verify that the constraint is either a primary key or a unique
constraint:
Finally, we need to make sure that no other columns are included in that UNIQUE or PRIMARY KEY constraint, as follows:
When we run this query against the original database schema, with a UNIQUE constraint on the PhoneNumber column, it returns a value of 1 indicating that there is indeed a constraint built only on the PhoneNumber column. However, when we run it after the column CountryCode has been added to the definition of the unique constraint, the second subquery returns the value 2, which means that the UNIQUE constraint UNQ_Customers is built on two columns, and so the outer query returns a value of 0.
In short, this query provides us with a means to verify the validity of the assumption that the PhoneNumber column uniquely identifies a customer. By incorporating this query into our unit test harness, we can accomplish two goals:
our assumption is documented – the code in Listing 8 clearly documents the fact that the dbo.SetCustomerStatus stored procedure needs a unique or primary constraint on a single column, PhoneNumber
our assumption is tested
– if the required constraint is dropped, or includes more than one
column, we shall get a clear warning, because the unit test will fail.
Of course, we should wrap
this query in a stored procedure and reuse it, because there will
surely be other cases when we rely on the uniqueness of a column used
in our search condition.
We can use a similar
technique to verify whether or not a combination of columns, considered
together, are guaranteed to be unique. Implementing this query is left
as an exercise to the reader.
Using @@ROWCOUNT to verify assumptions
Alternatively, instead
of documenting our assumption as a unit test, we can have our stored
procedure detect how many rows it modified, and roll back if it updated
more than one row, as shown in Listing 9.
To see it in action, run Listing 10; the stored procedure raises an error and does not modify the data.
In general, this approach
could be useful although, in this particular case, it is less
preferable than a unit test. The reason is very simple: a unit test
will alert us about a problem before deployment,
allowing us fix the problem early, and to deploy without this
particular bug. The altered stored procedure might not indicate a
problem until the code has been deployed to production, which means
troubleshooting a production system and redeploying a fix; a situation
we usually want to avoid.
Using SET instead of SELECT when assigning variables
In the previous chapter, we discussed how important it is to understand the different behavior of SET and SELECT
when assigning values to variables. That same knowledge will help you
write application code that is resistant to changes to the underlying
schema objects.
Let's consider a
second example whereby a search condition contains an implied
assumption regarding the uniqueness of the underlying data column. The
search condition, and subsequent variable assignment, shown in Listing 11 assumes, again, that the PhoneNumber column can uniquely identify a customer.
In our original database schema, before we added CountryCode column to the Customers table, the result of this assignment was predictable. However, in our new schema, the UNQ_Customers constraint only guarantees the uniqueness of the values in the PhoneNumber and CountryCode
columns, considered together. As a result, we have two customers with
this phone number and so the variable assignment is unpredictable; we
do not, and cannot, know which of the two CustomerId values, 1 or 3, will populate the variable.
In most cases, such ambiguity is not acceptable. The simplest fix is to use SET instead of SELECT to populate the variable, as shown in Listing 12.