SQL Server 2008 provides a new space-saving storage option referred to as sparse columns. Sparse columns can provide optimized and efficient storage for columns that contain predominately NULL values. The NULL values require no storage space, but these space savings come at a cost of increased space for storing non-NULL values (an additional 2–4 bytes of space is needed for non-NULL
values). For this reason, Microsoft recommends using sparse columns
only when the space saved is at least 20% to 40%. However, the consensus
rule of thumb that is emerging from experience with sparse columns is
that it is best to use them only when more than 90% of the values are NULL.
There are a number of restrictions and limitations regarding the use of sparse columns, including the following:
Sparse columns cannot be defined with the ROWGUIDCOL or IDENTITY properties.
Sparse columns cannot be defined with a default value.
Sparse columns cannot be used in a user-defined table type.
Although
sparse columns allow up to 30,000 columns per table, the total row size
is reduced to 8,018 bytes due to the additional overhead for sparse
columns.
If a table has sparse columns, you can’t compress it at either the row or page level.
Columns defined with the geography, geometry, text, ntext, timestamp, image, or user-defined data types cannot be defined as sparse columns.
You can’t define varbinary(max) fields that use FILESTREAM storage as sparse columns.
You can’t define a computed column as sparse, but you can use a sparse column in the calculation of a computed column.
A table cannot have more than 1,024 non-sparse columns.
Column Sets
Column sets provide an
alternative way to view and work with all the sparse columns in a table.
The sparse columns are aggregated into a single untyped XML column,
which simplifies working with many sparse columns in a table. The XML
column used for a column set is similar to a calculated column in that
it is not physically stored, but unlike calculated columns, it is
updateable.
There are some restrictions on column sets:
You cannot add a column set to a table that already has sparse columns.
You can define only one column set per table.
Constraints or default values cannot be defined on a column set.
Computed columns cannot contain column set columns.
A
column set cannot be changed; you must delete and re-create the column
set. However, sparse columns can be added to the table after a column
set has been defined and is automatically included in the column set.
Distributed queries, replication, and Change Data Capture do not support column sets.
A column set cannot be part of any kind of index, including XML indexes, full-text indexes, and indexed views.
Working with Sparse Columns
Querying and
manipulation of sparse columns is the same as for regular columns. There’s nothing
functionally different about a table that includes sparse columns,
except the way the sparse columns are stored. You can still use all the
standard INSERT, UPDATE, and DELETE
statements on tables with sparse columns just like a table that doesn’t
have sparse columns. You can also wrap operations on a table with
sparse columns in transactions as usual.
To work with sparse columns, let’s first create a table with sparse columns. Listing 1 creates a version of the Product table in the AdventureWorks2008R2 database and then populates the table with data from the Production.Product table. The Color, Weight, and SellEndDate columns are defined as sparse columns (the source data contains a significant number of NULL values for these columns). These columns are also defined as part of the column set, ProductInfo.
Listing 1. Creating a Table with Sparse Columns
USE AdventureWorks2008R2
GO
CREATE TABLE Product_sparse
(
ProductID INT NOT NULL PRIMARY KEY,
ProductName NVARCHAR(50) NOT NULL,
Color NVARCHAR(15) SPARSE NULL,
Weight DECIMAL(8,2) SPARSE NULL,
SellEndDate DATETIME SPARSE NULL,
ProductInfo XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
GO
INSERT INTO Product_sparse
(ProductID, ProductName, Color, Weight, SellEndDate)
SELECT ProductID, Name, Color, Weight, SellEndDate
FROM Production.Product
GO
|
You can reference the sparse columns in your queries just as you would any type of column:
SELECT productID, productName, Color, Weight, SEllEndDate
FROM Product_sparse
where ProductID < 320
go
productID productName Color Weight SEllEndDate
--------- --------------------- ------------ ------------- -----------
1 Adjustable Race NULL NULL NULL
2 Bearing Ball NULL NULL NULL
3 BB Ball Bearing NULL NULL NULL
4 Headset Ball Bearings NULL NULL NULL
316 Blade NULL NULL NULL
317 LL Crankarm Black NULL NULL
318 ML Crankarm Black NULL NULL
319 HL Crankarm Black NULL NULL
Note, however, that if you use SELECT *
in a query and the table has a column set defined for the sparse
columns, the column set is returned as a single XML column instead of
the individual columns:
SELECT *
FROM Product_sparse
where ProductID < 320
go
ProductID ProductName ProductInfo
----------- ---------------------- ----------------------------------
1 Adjustable Race NULL
2 Bearing Ball NULL
3 BB Ball Bearing NULL
4 Headset Ball Bearings NULL
316 Blade NULL
317 LL Crankarm <Color>Black</Color>
318 ML Crankarm <Color>Black</Color>
319 HL Crankarm <Color>Black</Color>
You need to explicitly list the columns in the SELECT clause to have the result columns returned as relational columns.
When the column set is
defined, you can also operate on the column set by using XML operations
instead of relational operations. For example, the following code
inserts a row into the table by using the column set and specifying a
value for Weight as XML:
INSERT Product_sparse(ProductID, ProductName, ProductInfo)
VALUES(5, 'ValveStem', '<Weight>.12</Weight>')
go
SELECT productID, productName, Color, Weight, SEllEndDate
FROM Product_sparse
where productID = 5
go
productID productName Color Weight SEllEndDate
----------- ----------- ----- ------ -----------
5 ValveStem NULL 0.12 NULL
Notice that NULL is assumed for any column omitted from the XML value, such as Color and SellEndDate in this example.
When updating a column set
using an XML value, you must include values for all the columns in the
column set you want to set, including any existing values. Any values
not specified in the XML string are set to NULL. For example, the following query sets both Color and Weight where ProductID = 5:
Update Product_sparse
set ProductInfo = '<Color>black</Color><Weight>.20</Weight>'
where productID = 5
SELECT productID, productName, Color, Weight, SEllEndDate
FROM Product_sparse
where productID = 5
go
productID productName Color Weight SEllEndDate
----------- ----------- ----- ------ -----------
5 ValveStem black 0.20 NULL
Now, if you run another update but only specify a value for Weight in the XML string, the Color column is set to NULL:
Update Product_sparse
set ProductInfo = '<Weight>.10</Weight>'
where productID = 5
SELECT productID, productName, Color, Weight, SEllEndDate
FROM Product_sparse
where productID = 5
go
productID productName Color Weight SEllEndDate
----------- ----------- ----- ------ -----------
5 ValveStem NULL 0.10 NULL
However, if you reference the sparse columns explicitly in an UPDATE statement, the other values remain unchanged:
Update Product_sparse
set Color = 'silver'
where ProductID = 5
SELECT productID, productName, Color, Weight, SEllEndDate
FROM Product_sparse
where productID = 5
go
productID productName Color Weight SEllEndDate
----------- ----------- ------ ------ -----------
5 ValveStem silver 0.10 NULL
Column sets are most useful
when you have many sparse columns in a table (for example, hundreds)
and operating on them individually is cumbersome. Your client
applications may more easily and efficiently generate the appropriate
XML string to populate the column set rather than your having to build
an UPDATE statement dynamically to determine which of the sparse columns need to be included in the SET
clause. Applications might actually see some performance improvement
when they select, insert, or update data by using column sets on tables
that have lots of columns.
Sparse Columns: Good or Bad?
There is some disagreement
in the SQL Server community whether or not sparse columns are
appropriate. A number of professionals are of the opinion that any table
design that requires sparse columns is a bad design that does not
follow good relational design guidelines. Sparse columns, by their
nature, are heavily denormalized. On the other hand, many times you have
to live in the real world and make the best of a bad database design that
you’ve inherited. Sparse columns can help solve performance and storage
issues in databases that may have been poorly designed.
Although sparse columns
can solve certain kinds of problems with database design, you should
never use them as an alternative to proper database and table design. As
cool as sparse columns are, they aren’t appropriate for every scenario,
particularly when you’re tempted to violate normalization rules to be
able to cram more fields into a table.