The word obfuscation is defined by the American Heritage Dictionary as follows:
"To make so confused or opaque as to be difficult to perceive or understand ... to render indistinct or dim; darken."
The word obfuscation, at times, can be used interchangeably with the term obscurity, meaning "the quality or condition of being unknown".
However, there is a subtle difference between the two terms and the
former definition is more appropriate since obscurity implies that the
hidden condition can be achieved without any additional effort.
Many methods of "disguise", or obfuscation, are
available to the Database Administrator that can contribute a level of
control to how sensitive data is stored and disclosed, in both
production and development environments.
Many of these methods rely on SQL Server's built-in system functions for string manipulation, such as SUBSTRING, REPLACE, and REPLICATE.
Prior to diving into the details of these
obfuscation methods we need to explore the unique value of another
system function, called RAND.
The Value of RAND
The RAND system function is not one that
directly manipulates values for the benefit of obfuscation, but its
ability to produce a reasonably random value makes it a valuable asset
when implementing character scrambling or producing a numeric variance.
One special consideration of the RAND
system function is that when it is included in a user defined function
an error will be returned when the user defined function is created.
This can be overcome by creating a view that contains the RAND system function and referencing the view in the user defined function. The script in Listing 1 will create a view in the HomeLending database that returns a random value, using the RAND system function. Since this view holds no security threat, we will make this available to the Sensitive_high, Sensitive_medium and Sensitive_low database roles with SELECT permissions on this view.
Now, we can obtain a random number in any user defined function with a simple call to our new view. In Listing 2, an example is provided that produces a random number between the values of 1 and 100.
Character Scrambling
Character scrambling is a process by which the
characters contained within a given statement are re-ordered in such a
way that its original value is obfuscated. For example, the name "Jane
Smith" might be scrambled into "nSem Jatih".
This option does have its vulnerabilities. The
process of cracking a scrambled word is often quite straightforward,
and indeed is a source of entertainment for many, as evidenced by
newspapers, puzzle publications and pre-movie entertainment.
Cracking a scrambled word can be made more
challenging by, for example, eliminating any repeating characters and
returning only lower case letters. However, not all values will contain
repeating values, so this technique may not be sufficient for
protecting highly sensitive data.
The Character Scrambling UDF
In the HomeLending database we will create a user defined function called Character_Scramble that performs character scrambling, which is shown in Listing 3. It will be referenced as needed in views and stored procedures that are selected to use this method of data obfuscation.
Included in this user defined function is a reference to the vwRandom view that was created in Listing 1. In essence, this user defined function will loop through all of the characters of the value that is passed through the @OrigVal
argument replacing each character with other randomly selected
characters from the same string. For example, the value of "John" may
result as "nJho".
In order for the appropriate users to utilize this user defined function permissions must be assigned. The GRANT EXECUTE command is included in the following script.
This user defined function takes advantage of system functions such as DATALENGTH which provides the length of a value, SUBSTRING which is used to obtain a portion of a value, REPLACE which replaces a value with another value and LOWER which returns the value in lowercase characters. All are valuable to string manipulation.
This UDF will be referenced in any views
and stored procedures that are selected to use this method of data
obfuscation, an example of which we'll see in the next section.