Protecting SQL Server Data: Obfuscation Methods (Part 4) - Truncation, Encoding, Aggregation

6/21/2014 9:48:36 PM

Truncation

Truncation is a method of protecting sensitive data in which a portion of its value is removed. The concept is very similar to the repeating character masking covered earlier, except that rather than replacing characters with a mask character, such as "x" or "*", truncation simply discards them. For example, a Social Security Number stored in plain text as "555-86-1234" might be truncated to "1234".
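
The difference between masking and truncation is easiest to see side by side. The following is a minimal Python sketch, not taken from the source; the function names `mask` and `truncate` are illustrative, and masking is assumed here to replace digits while keeping separators such as dashes.

```python
def mask(value: str, keep: int = 4, mask_char: str = "x") -> str:
    """Repeating character masking: replace each digit before the last `keep`
    characters with `mask_char`, preserving length and separators."""
    split = max(len(value) - keep, 0)
    head, tail = value[:split], value[split:]
    return "".join(mask_char if ch.isdigit() else ch for ch in head) + tail


def truncate(value: str, keep: int = 4) -> str:
    """Truncation: discard everything except the last `keep` characters
    (comparable to RIGHT(value, keep) in T-SQL)."""
    split = max(len(value) - keep, 0)
    return value[split:]


ssn = "555-86-1234"
print(mask(ssn))      # xxx-xx-1234
print(truncate(ssn))  # 1234
```

The masked value still reveals the length and layout of the original, while the truncated value reveals only the retained digits.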

One way to apply this method is to permanently modify the stored value in the database by executing an `UPDATE` statement that uses the `LEFT`, `RIGHT` or `SUBSTRING` system function to define the portion of the value that remains.

For example, the script in Listing 9 uses the `RIGHT` function to discard all but the last four digits of the `Identification_Value` column.

Listing 9. Permanently truncating the `Identification_Value` column.
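
Listing 9 itself is T-SQL and is not reproduced here. As a rough, self-contained stand-in, the sketch below uses Python's built-in `sqlite3` module: SQLite's `substr(X, -4)` takes the place of SQL Server's `RIGHT(X, 4)`, and the table name `Borrower_Identification`, its `Borrower_ID` column, and the sample values are assumptions for illustration.

```python
import sqlite3


def truncate_in_place(conn: sqlite3.Connection, keep: int = 4) -> None:
    """Permanently overwrite every Identification_Value with its last `keep` characters."""
    if keep <= 0:
        raise ValueError("keep must be positive")
    # SQLite's substr(X, -N) returns the last N characters, like RIGHT(X, N) in T-SQL.
    # Table name is hypothetical.
    conn.execute(
        "UPDATE Borrower_Identification "
        "SET Identification_Value = substr(Identification_Value, ?)",
        (-keep,),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Borrower_Identification ("
    "Borrower_ID INTEGER PRIMARY KEY, Identification_Value TEXT NOT NULL)"
)
# Invented sample values.
conn.executemany(
    "INSERT INTO Borrower_Identification (Identification_Value) VALUES (?)",
    [("555-86-1234",), ("555-12-9876",)],
)
truncate_in_place(conn)
rows = conn.execute(
    "SELECT Identification_Value FROM Borrower_Identification ORDER BY Borrower_ID"
).fetchall()
print(rows)  # [('1234',), ('9876',)]
```

Once the `UPDATE` commits, the discarded digits are gone, so this form only suits data whose full value is never needed again.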

Alternatively, to preserve the original value while still presenting it in truncated form, we can leave the stored data unchanged and apply the `LEFT`, `RIGHT` or `SUBSTRING` system function wherever the column is referenced in views and stored procedures. For example, Listing 10 returns only the last four digits of the values in the `Identification_Value` column.

Listing 10. Returning a truncated value.
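
A view-based version of the same idea can be sketched the same way, again with SQLite through Python's `sqlite3` and with the hypothetical table name `Borrower_Identification` and view name `vw_Borrower_Identification`. The view returns only the last four characters, while the base table keeps the original value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Borrower_Identification ("
    "Borrower_ID INTEGER PRIMARY KEY, Identification_Value TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO Borrower_Identification (Identification_Value) VALUES ('555-86-1234')"
)

# The view truncates on the way out; the base table still holds the full value.
# substr(X, -4) stands in for RIGHT(X, 4) in T-SQL.
conn.execute(
    "CREATE VIEW vw_Borrower_Identification AS "
    "SELECT Borrower_ID, substr(Identification_Value, -4) AS Identification_Value "
    "FROM Borrower_Identification"
)

shown = conn.execute("SELECT Identification_Value FROM vw_Borrower_Identification").fetchone()[0]
stored = conn.execute("SELECT Identification_Value FROM Borrower_Identification").fetchone()[0]
print(shown, stored)  # 1234 555-86-1234
```

In SQL Server, users would typically be granted `SELECT` on the view but not on the base table; SQLite has no comparable permission model, so this sketch shows only the truncation itself.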

Encoding

Encoding is a technique in which a series of characters is used to represent another value. It can be used to camouflage sensitive data, since a code has no meaning outside the system in which it is defined.

NOTE

Encoding has many benefits beyond securing sensitive data, such as overcoming language barriers when working in an international environment and providing an expedient means of entering data.

Encoding is used extensively in the health care industry. The World Health Organization maintains the International Classification of Diseases (ICD), an industry standard that defines codes representing diseases and health problems. These codes are used in health records and on death certificates. For example, the ICD-10 code for unspecified bacterial pneumonia is J15.9.

By establishing foreign keys in the `HomeLending` database, we have, at a basic level, implemented encoding. The `Loan` table, for example, contains two columns named `Purpose_Type_ID` and `Mortgage_Type_ID`, as illustrated in Figure 1.

Figure 1. Loan table with Foreign Key Relationships

The `Purpose_Type_ID` and `Mortgage_Type_ID` columns are foreign keys to the `Purpose_Type` and `Mortgage_Type` tables. These tables contain, respectively, the list of potential purposes for a loan and the available mortgage types, as defined in the Uniform Residential Loan Application developed by the Federal National Mortgage Association, commonly known as Fannie Mae.

In the `Purpose_Type` table, we have used a sequence of numbers to represent these purposes. For example, when a new loan record is created, the value "2" is captured instead of the value "Refinance".
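
The encoding that a foreign key provides can be sketched as follows, assuming simplified versions of the `Purpose_Type` and `Loan` tables in SQLite. Only the pairing of 2 with "Refinance" comes from the text; the description column name `Purpose_Type_Desc` and the other code values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Purpose_Type (
    Purpose_Type_ID INTEGER PRIMARY KEY,
    Purpose_Type_Desc TEXT NOT NULL          -- hypothetical column name
);
CREATE TABLE Loan (
    Loan_ID INTEGER PRIMARY KEY,
    Purpose_Type_ID INTEGER NOT NULL REFERENCES Purpose_Type (Purpose_Type_ID)
);
-- Only 2 = 'Refinance' comes from the text; the other rows are assumed.
INSERT INTO Purpose_Type VALUES (1, 'Purchase'), (2, 'Refinance'), (3, 'Construction');
INSERT INTO Loan (Purpose_Type_ID) VALUES (2);
""")

# The loan row stores only the code...
encoded = conn.execute("SELECT Purpose_Type_ID FROM Loan").fetchone()[0]
# ...and its meaning is recovered only by joining to the lookup table.
decoded = conn.execute(
    "SELECT p.Purpose_Type_Desc FROM Loan AS l "
    "JOIN Purpose_Type AS p ON p.Purpose_Type_ID = l.Purpose_Type_ID"
).fetchone()[0]
print(encoded, decoded)  # 2 Refinance
```

Anyone who sees only the `Loan` row sees the number 2; the description lives in the lookup table, whose access can be controlled separately.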

To further enhance this encoding, we may choose to use larger code values, such as 5,000, rather than a simple sequence starting at 1, so that the options can be organized into logical groups. For example, we may offer our borrowers various types of refinance options. With larger code values, we could use the range 5,000 through 5,100 to represent the available refinance options, while construction loans might fall in the 2,000 through 2,100 range.
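
A minimal sketch of how grouped code ranges can be interpreted, assuming the hypothetical ranges from the example above (2,000 to 2,100 for construction, 5,000 to 5,100 for refinance); the function name `purpose_group` and the sample codes are illustrative.

```python
from typing import Optional

# Hypothetical code ranges, following the example in the text (bounds inclusive).
PURPOSE_CODE_GROUPS = [
    (2000, 2100, "Construction"),
    (5000, 5100, "Refinance"),
]


def purpose_group(code: int) -> Optional[str]:
    """Return the logical group a purpose code belongs to, or None if it is outside every range."""
    for low, high, group in PURPOSE_CODE_GROUPS:
        if low <= code <= high:
            return group
    return None


print(purpose_group(5003))  # Refinance
print(purpose_group(2010))  # Construction
print(purpose_group(2))     # None
```

In SQL the same grouping is a simple range predicate, such as `WHERE Purpose_Type_ID BETWEEN 5000 AND 5100`, since `BETWEEN` is inclusive at both ends.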

Aggregation

Aggregation is a technique in which the identifying details of data are obfuscated by providing the data in a summarized format. A few examples of presenting data as an aggregation follow:

- As a percentage: 40% of the loans originated in the `HomeLending` database during the past quarter were refinance loans.
- As a calculated sum: $2.5 million in loans were originated in the `HomeLending` database during the past quarter.
- As a geographical statistic: the median home value in the city of Indianapolis, Indiana, is $150,000.
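
The three kinds of aggregate listed above can be sketched in Python as follows. The loan rows, cities, and amounts are invented sample data, and the dictionary keys and the `summarize` function are assumptions; the point is that callers receive only the summary, not the detail rows.

```python
from collections import defaultdict
from statistics import median

# Invented detail rows; only the summary below would be exposed to report users.
loans = [
    {"purpose": "Refinance", "amount": 200_000, "city": "Indianapolis", "home_value": 150_000},
    {"purpose": "Purchase", "amount": 300_000, "city": "Indianapolis", "home_value": 320_000},
    {"purpose": "Refinance", "amount": 120_000, "city": "Carmel", "home_value": 180_000},
    {"purpose": "Purchase", "amount": 250_000, "city": "Indianapolis", "home_value": 140_000},
    {"purpose": "Purchase", "amount": 130_000, "city": "Carmel", "home_value": 220_000},
]


def summarize(rows):
    """Reduce detail rows to a percentage, a sum, and a per-city median."""
    if not rows:
        raise ValueError("no rows to summarize")
    refinance = sum(1 for r in rows if r["purpose"] == "Refinance")
    values_by_city = defaultdict(list)
    for r in rows:
        values_by_city[r["city"]].append(r["home_value"])
    return {
        "refinance_pct": 100 * refinance / len(rows),       # as a percentage
        "total_originated": sum(r["amount"] for r in rows),  # as a calculated sum
        "median_home_value": {city: median(v) for city, v in values_by_city.items()},
    }


summary = summarize(loans)
print(summary["refinance_pct"])     # 40.0
print(summary["total_originated"])  # 1000000
```

No individual loan, borrower, or address can be read back out of `summary`, which is also why the detail cannot be reconstructed if an aggregate turns out to be wrong.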

Aggregation is a common technique for populating data warehouses used for data analysis. It not only protects the underlying sensitive data but also reduces the storage requirements for the data.

An advantage of this approach is that users receive only the data they need for their reporting and analysis requirements, so the potential for leakage of sensitive data is greatly reduced.

A disadvantage of this approach is that if the aggregations are found to be inaccurate, the detail data is not available to identify the cause. Another challenge is that a single aggregation may not meet everyone's needs, leading to requests for different views of the same aggregated data, which increases your maintenance footprint.

Within the `HomeLending` database, aggregation may be beneficial when collecting the borrower's liabilities. The current design of the `Borrower_Liability` table requires the capture of the monthly payment amount and remaining balance for each liability. As shown in Figure 2, the `Borrower_Liability` table is related to the `Liability_Account` table, which reveals the creditor and account number of each liability.

Figure 2. The Borrower_Liability and Liability_Account Tables.

An alternative approach would be to eliminate the `Liability_Account` table and simply capture a single record for the loan application that holds the sum of the borrower's monthly payments and remaining balances across all liabilities, as shown in Figure 3.

Figure 3. Alternate approach for Borrower_Liability table.
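
A rough sketch of the alternate design, using SQLite through Python's `sqlite3`. For illustration, the summary row is derived from detail rows, and the detail table is then dropped to mirror disposing of the account-level data; the table name `Borrower_Liability_Detail`, all column names, and the sample amounts are hypothetical, not the book's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail design (hypothetical columns): one row per creditor account.
CREATE TABLE Borrower_Liability_Detail (
    Loan_ID INTEGER NOT NULL,
    Liability_Account_ID INTEGER NOT NULL,
    Monthly_Payment NUMERIC NOT NULL,
    Remaining_Balance NUMERIC NOT NULL
);
INSERT INTO Borrower_Liability_Detail VALUES
    (1, 10, 350, 12000),
    (1, 11, 120, 2400),
    (2, 12, 500, 30000);

-- Alternate design: one summary row per loan application.
CREATE TABLE Borrower_Liability (
    Loan_ID INTEGER PRIMARY KEY,
    Total_Monthly_Payment NUMERIC NOT NULL,
    Total_Remaining_Balance NUMERIC NOT NULL
);
INSERT INTO Borrower_Liability
SELECT Loan_ID, SUM(Monthly_Payment), SUM(Remaining_Balance)
FROM Borrower_Liability_Detail
GROUP BY Loan_ID;

-- Discard the account-level detail once it has been summarized.
DROP TABLE Borrower_Liability_Detail;
""")

summary = conn.execute("SELECT * FROM Borrower_Liability ORDER BY Loan_ID").fetchall()
print(summary)  # [(1, 470, 14400), (2, 500, 30000)]
```

Once the detail table is dropped, the individual creditors and balances cannot be recovered from this database, which is the trade-off described in the disadvantages of aggregation above.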

This level of detail would suffice for most users of this database and would protect this sensitive information from inappropriate disclosure. Underwriters, who may need access to the detailed liability data for qualification purposes, would instead refer to the credit report data, which is stored in a separate database, to determine whether the borrower can be approved for the loan.
