Transact-SQL in SQL Server 2008 : Sparse Columns

8/1/2012 6:10:37 PM

SQL Server 2008 provides a new space-saving storage option referred to as sparse columns. Sparse columns can provide optimized and efficient storage for columns that contain predominately NULL values. The NULL values require no storage space, but these space savings come at a cost of increased space for storing non-NULL values (an additional 2–4 bytes of space is needed for non-NULL values). For this reason, Microsoft recommends using sparse columns only when the space saved is at least 20% to 40%. However, the consensus rule of thumb that is emerging from experience with sparse columns is that it is best to use them only when more than 90% of the values are NULL.

There are a number of restrictions and limitations regarding the use of sparse columns, including the following:

Sparse columns cannot be defined with the ROWGUIDCOL or IDENTITY properties.
Sparse columns cannot be defined with a default value.
Sparse columns cannot be used in a user-defined table type.
Although sparse columns allow up to 30,000 columns per table, the total row size is reduced to 8,018 bytes due to the additional overhead for sparse columns.
If a table has sparse columns, you can’t compress it at either the row or page level.
Columns defined with the geography, geometry, text, ntext, timestamp, image, or user-defined data types cannot be defined as sparse columns.
You can’t define varbinary(max) fields that use FILESTREAM storage as sparse columns.
You can’t define a computed column as sparse, but you can use a sparse column in the calculation of a computed column.
A table cannot have more than 1,024 non-sparse columns.

Column Sets

Column sets provide an alternative way to view and work with all the sparse columns in a table. The sparse columns are aggregated into a single untyped XML column, which simplifies working with many sparse columns in a table. The XML column used for a column set is similar to a calculated column in that it is not physically stored, but unlike calculated columns, it is updateable.

There are some restrictions on column sets:

You cannot add a column set to a table that already has sparse columns.
You can define only one column set per table.
Constraints or default values cannot be defined on a column set.
Computed columns cannot contain column set columns.
A column set cannot be changed; you must delete and re-create the column set. However, sparse columns can be added to the table after a column set has been defined and is automatically included in the column set.
Distributed queries, replication, and Change Data Capture do not support column sets.
A column set cannot be part of any kind of index, including XML indexes, full-text indexes, and indexed views.

Working with Sparse Columns

Querying and manipulation of sparse columns is the same as for regular columns. There’s nothing functionally different about a table that includes sparse columns, except the way the sparse columns are stored. You can still use all the standard INSERT, UPDATE, and DELETE statements on tables with sparse columns just like a table that doesn’t have sparse columns. You can also wrap operations on a table with sparse columns in transactions as usual.

To work with sparse columns, let’s first create a table with sparse columns. Listing 1 creates a version of the Product table in the AdventureWorks2008R2 database and then populates the table with data from the Production.Product table. The Color, Weight, and SellEndDate columns are defined as sparse columns (the source data contains a significant number of NULL values for these columns). These columns are also defined as part of the column set, ProductInfo.

Listing 1. Creating a Table with Sparse Columns

USE AdventureWorks2008R2
GO
CREATE TABLE Product_sparse
(
   ProductID INT NOT NULL PRIMARY KEY,
   ProductName NVARCHAR(50) NOT NULL,
   Color NVARCHAR(15) SPARSE NULL,
   Weight DECIMAL(8,2) SPARSE NULL,
   SellEndDate DATETIME SPARSE NULL,
   ProductInfo XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
GO
INSERT INTO Product_sparse
(ProductID, ProductName, Color, Weight, SellEndDate)
SELECT ProductID, Name, Color, Weight, SellEndDate
FROM Production.Product
GO

You can reference the sparse columns in your queries just as you would any type of column:

SELECT productID, productName, Color, Weight, SEllEndDate
FROM Product_sparse
where ProductID < 320
go

productID productName           Color Weight SEllEndDate
--------- --------------------- ------------ ------------- -----------
1         Adjustable Race       NULL         NULL          NULL
2         Bearing Ball          NULL         NULL          NULL
3         BB Ball Bearing       NULL         NULL          NULL
4         Headset Ball Bearings NULL         NULL          NULL
316       Blade                 NULL         NULL          NULL
317       LL Crankarm           Black        NULL          NULL
318       ML Crankarm           Black        NULL          NULL
319       HL Crankarm           Black        NULL          NULL

Note, however, that if you use SELECT * in a query and the table has a column set defined for the sparse columns, the column set is returned as a single XML column instead of the individual columns:

SELECT *
FROM Product_sparse
where ProductID < 320
go

ProductID   ProductName            ProductInfo
----------- ---------------------- ----------------------------------
1           Adjustable Race        NULL
2           Bearing Ball           NULL
3           BB Ball Bearing        NULL
4           Headset Ball Bearings  NULL
316         Blade                  NULL
317         LL Crankarm            <Color>Black</Color>
318         ML Crankarm            <Color>Black</Color>
319         HL Crankarm            <Color>Black</Color>

You need to explicitly list the columns in the SELECT clause to have the result columns returned as relational columns.

When the column set is defined, you can also operate on the column set by using XML operations instead of relational operations. For example, the following code inserts a row into the table by using the column set and specifying a value for Weight as XML:

INSERT Product_sparse(ProductID, ProductName, ProductInfo)
  VALUES(5, 'ValveStem', '<Weight>.12</Weight>')
go

SELECT productID, productName, Color, Weight, SEllEndDate
FROM Product_sparse
where productID = 5
go

productID   productName Color Weight SEllEndDate
----------- ----------- ----- ------ -----------
5           ValveStem   NULL  0.12   NULL

Notice that NULL is assumed for any column omitted from the XML value, such as Color and SellEndDate in this example.

When updating a column set using an XML value, you must include values for all the columns in the column set you want to set, including any existing values. Any values not specified in the XML string are set to NULL. For example, the following query sets both Color and Weight where ProductID = 5:

Update Product_sparse
set ProductInfo = '<Color>black</Color><Weight>.20</Weight>'
where productID = 5

SELECT productID, productName, Color, Weight, SEllEndDate
FROM Product_sparse
where productID = 5
go

productID   productName Color Weight SEllEndDate
----------- ----------- ----- ------ -----------
5           ValveStem   black 0.20   NULL

Now, if you run another update but only specify a value for Weight in the XML string, the Color column is set to NULL:

Update Product_sparse
set ProductInfo = '<Weight>.10</Weight>'
where productID = 5

SELECT productID, productName, Color, Weight, SEllEndDate
FROM Product_sparse
where productID = 5
go

productID   productName Color Weight SEllEndDate
----------- ----------- ----- ------ -----------
5           ValveStem   NULL  0.10   NULL

However, if you reference the sparse columns explicitly in an UPDATE statement, the other values remain unchanged:

Update Product_sparse
set Color = 'silver'
where ProductID = 5

SELECT productID, productName, Color, Weight, SEllEndDate
FROM Product_sparse
where productID = 5
go

productID   productName Color  Weight SEllEndDate
----------- ----------- ------ ------ -----------
5           ValveStem   silver 0.10   NULL

Column sets are most useful when you have many sparse columns in a table (for example, hundreds) and operating on them individually is cumbersome. Your client applications may more easily and efficiently generate the appropriate XML string to populate the column set rather than your having to build an UPDATE statement dynamically to determine which of the sparse columns need to be included in the SET clause. Applications might actually see some performance improvement when they select, insert, or update data by using column sets on tables that have lots of columns.

Sparse Columns: Good or Bad?

There is some disagreement in the SQL Server community whether or not sparse columns are appropriate. A number of professionals are of the opinion that any table design that requires sparse columns is a bad design that does not follow good relational design guidelines. Sparse columns, by their nature, are heavily denormalized. On the other hand, many times you have to live in the real world and make the best of a bad database design that you’ve inherited. Sparse columns can help solve performance and storage issues in databases that may have been poorly designed.

Although sparse columns can solve certain kinds of problems with database design, you should never use them as an alternative to proper database and table design. As cool as sparse columns are, they aren’t appropriate for every scenario, particularly when you’re tempted to violate normalization rules to be able to cram more fields into a table.

Other

SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 4) - Analysis Services CLR Support: Server-Side ADO MD.NET

SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 3) - XMLA at Your Service

SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 2) - OLAP Development with ADO MD.NET

SQL Server 2005 : Beyond OWC: Full-On OLAP Development (part 1) - Management Studio as an MDX Client

Microsoft SQL Server 2008 R2 : Using FILESTREAM Storage (part 2) - Using FILESTREAM Storage for Data Columns

Microsoft SQL Server 2008 R2 : Using FILESTREAM Storage (part 1) - Enabling FILESTREAM Storage

SQL Server 2005 : Using Excel (part 2) - Using PivotTables and Charts in Applications and Web Pages

SQL Server 2005 : Using Excel (part 1) - Working Within Excel

Microsoft SQL Server 2008 R2 : Hierarchyid Data Type (part 2) - Modifying the Hierarchy

Microsoft SQL Server 2008 R2 : Hierarchyid Data Type (part 1) - Creating a Hierarchy, Populating the Hierarchy, Querying the Hierarchy