The hierarchyid Data TypeThe hierarchyid data type enables a robust hierarchical
structure over a relational table that can relieve you from much of
this maintenance burden. Like the self-joining approach, each row in
the table represents a node in the tree. However, instead of storing
just the parent value in a foreign key column, each row now has a
unique hierarchyid whose
encoded value describes the full path all the way from the root of the
tree down to that row’s node position within the tree. The hierarchyid value in each row therefore identifies each node’s location in the tree, relative to the hierarchyid values of its neighboring parent, sibling, and child nodes that are stored in other rows. In this manner, the hierarchyid values link rows to each other as nodes that form the tree structure. Using Transact-SQL (T-SQL) extensions for the hierarchyid
data type, you can invoke methods to arbitrarily insert, modify, and
delete nodes at any point within the structure with great efficiency.
You can also reparent entire subtrees in a single update. SQL Server
determines the nodes affected by the update and calculates the
necessary changes to their hierarchyid values automatically. And with the proper indexes in place , querying a tree-structured table based on the hierarchyid data type is significantly faster than running recursive queries on a traditional self-joining table using CTEs. SQL Server employs an algorithm that encodes the path information of a hierarchyid
value into an extremely compact variable-length format. The average
number of bits that are required to represent a node in a tree with n nodes depends on the average number of children of a node, known as the average fanout. For small fanouts (0 to 7), the size is about 6*log An
bits, where A is the average fanout. In practical terms, this means
that one node in a hierarchy of 100,000 nodes with an average fanout of
6 takes about 38 bits, which gets rounded up to 40 bits (5 bytes) for
storage. The encoded value has no meaningful display representation, of
course, but the ToString method will translate a hierarchyid value to a human-readable form that expresses a delimited node path, as you’ll see shortly. The hierarchyid type is implemented as a SQL common language runtime (CLR) user-defined type (UDT). This means that hierarchyid is hosted internally by the CLR, and not natively by SQL Server (this is also why functions invoked on it are called methods).
Due to its tight integration in SQL Server, the feature is available
even with SQL CLR disabled. So you don’t need to enable SQL CLR to use
the hierarchyid data type (or any of the other “system” CLR types, such as geometry and geography).
The SQL CLR feature increases your exposure to security risks by
allowing Microsoft .NET Framework code to run on the server, so you
should enable it only if you are creating your own SQL CLR objects
(stored procedures, user-defined functions, and so on). NoteMore Info The hierarchyid data type is available to CLR clients as the SqlHierarchyId data type in the Microsoft.SqlServer.Types namespace. This namespace is contained in the Microsoft.SqlServer.Types.dll
assembly, which can be found on the .NET tab in the Microsoft Visual
Studio Add Reference dialog box. In the database itself, the node
information represented by a hierarchyid value is stored as a varbinary data type, which can support a virtually unlimited number of child nodes per parent. Creating a Hierarchical TableLet’s begin by implementing the classic manager-employee tree scenario using the hierarchyid data type. Start with the structure of the Employee table, as shown in Example 1: Example 1. Creating a hierarchical table. USE master GO IF EXISTS(SELECT name FROM sys.databases WHERE name = 'MyDB') DROP DATABASE MyDB GO CREATE DATABASE MyDB GO
USE MyDB GO
CREATE TABLE Employee ( NodeId hierarchyid PRIMARY KEY CLUSTERED, NodeLevel AS NodeId.GetLevel() , EmployeeId int UNIQUE NOT NULL, EmployeeName varchar(20) NOT NULL, Title varchar(20) NULL ) GO Notice that the NodeId column is declared as a hierarchyid
type and is also defined as the table’s primary key using a clustered
index. The primary key is important for two reasons, the first of which
is that SQL Server doesn’t guarantee uniqueness of hierarchyid
values in the table. Don’t view this as a limitation—this fact actually
enables advanced scenarios in which you can house multiple distinct
hierarchies within a single table. With the single hierarchy in our
example, defining a primary key is one way to enforce unique NodeId
values. Defining an ordinary unique constraint instead would be another
way, and you can certainly do that if some column other than the hierarchyid is more suitable for the table’s primary key. The second reason for the primary key index is to support depth-first indexing, which we’ll talk about shortly. As we’ve started explaining, the value in the NodeId
column identifies each row’s unique node position within the structure.
It can serve as the table’s primary key, and you don’t technically need
to have any other identifying columns in the table. Practically,
however, you’ll most likely still want to define another unique column
to serve as a more conventional identifier, because values assigned
into the hierarchyid column as rows are added to the table based on their node positions within the tree. Those hierarchyid
values are essentially nondeterministic and will also shift
automatically as the tree’s node structure is changed using the
techniques that you’ll start learning about just ahead. For example, if
you relocate a node and all of its descendants to a new position within
the tree (also known as reparenting a subtree), the hierarchyid
values of those nodes are automatically updated by SQL Server to
reflect their new positions within the hierarchy. You therefore define EmployeeId
as a unique integer column assigned with values that you control and
that won’t change when the structure of nodes in the tree is
manipulated in the future. So when you add employees, their ID values
remain the same regardless of how their nodes get repositioned down the
line. NoteYou can also use IDENTITY values for the EmployeeId column, which puts SQL Server in control of assigning
key values using incrementing integers, and which would also not change
as the tree structure is altered. To stay focused on hierarchyid, the example here does not add the extra code for using IDENTITY, and you simply assign your own unique EmployeeId integer values explicitly. The second column defined in the Employee table is NodeLevel, which is declared as a calculated column that returns the 16-bit integer result of the GetLevel method applied to the hierarchyid value in the NodeId
column. This is optional and provides a read-only column that returns a
number indicating the level position of its row (node), extracted and
decoded from the compacted hierarchyid value in the NodeId
column. One and only one root row exists in any hierarchy and is
positioned at level zero. Its immediate child rows are positioned at
level one, grandchild rows at level two, and so on. Defining the NodeLevel column that calls the GetLevel method will allow you to later create a breadth-first index on the table. The EmployeeId column is defined with a unique constraint and doesn’t permit NULL values. You can therefore treat its integer value as an “alternate primary key” for your employees, because the hierarchyid value in the NodeId
column has already been defined as the table’s actual primary key. As
we mentioned, in many practical cases, you will want to define such an
alternate key to be used as an identifier with a value that you are in
control of and that doesn’t change unpredictably as the result of node
manipulations the way a hierarchyid value does. You’ll definitely need such identifiers if you want to establish relationships for joining with other tables,
or you might want them just for your own convenience. But it’s
important to understand that SQL Server requires no such identifier to
implement the hierarchy, and that you can certainly cast a hierarchical structure over a relational table with only the hierarchyid value as the primary key and no other unique columns.
|