Sql Server 2012 : Hierarchical Data and the Relational Database - The hierarchyid Data Type, Creating a Hierarchical Table

5/30/2014 4:31:40 AM

The hierarchyid Data Type

The hierarchyid data type enables a robust hierarchical structure over a relational table that can relieve you from much of this maintenance burden. Like the self-joining approach, each row in the table represents a node in the tree. However, instead of storing just the parent value in a foreign key column, each row now has a unique hierarchyid whose encoded value describes the full path all the way from the root of the tree down to that row’s node position within the tree. The hierarchyid value in each row therefore identifies each node’s location in the tree, relative to the hierarchyid values of its neighboring parent, sibling, and child nodes that are stored in other rows. In this manner, the hierarchyid values link rows to each other as nodes that form the tree structure.

Using Transact-SQL (T-SQL) extensions for the hierarchyid data type, you can invoke methods to arbitrarily insert, modify, and delete nodes at any point within the structure with great efficiency. You can also reparent entire subtrees in a single update. SQL Server determines the nodes affected by the update and calculates the necessary changes to their hierarchyid values automatically. And with the proper indexes in place , querying a tree-structured table based on the hierarchyid data type is significantly faster than running recursive queries on a traditional self-joining table using CTEs.

SQL Server employs an algorithm that encodes the path information of a hierarchyid value into an extremely compact variable-length format. The average number of bits that are required to represent a node in a tree with n nodes depends on the average number of children of a node, known as the average fanout. For small fanouts (0 to 7), the size is about 6*log _An bits, where A is the average fanout. In practical terms, this means that one node in a hierarchy of 100,000 nodes with an average fanout of 6 takes about 38 bits, which gets rounded up to 40 bits (5 bytes) for storage. The encoded value has no meaningful display representation, of course, but the ToString method will translate a hierarchyid value to a human-readable form that expresses a delimited node path, as you’ll see shortly.

The hierarchyid type is implemented as a SQL common language runtime (CLR) user-defined type (UDT). This means that hierarchyid is hosted internally by the CLR, and not natively by SQL Server (this is also why functions invoked on it are called methods). Due to its tight integration in SQL Server, the feature is available even with SQL CLR disabled. So you don’t need to enable SQL CLR to use the hierarchyid data type (or any of the other “system” CLR types, such as geometry and geography). The SQL CLR feature increases your exposure to security risks by allowing Microsoft .NET Framework code to run on the server, so you should enable it only if you are creating your own SQL CLR objects (stored procedures, user-defined functions, and so on).

Note

More Info The hierarchyid data type is available to CLR clients as the SqlHierarchyId data type in the Microsoft.SqlServer.Types namespace. This namespace is contained in the Microsoft.SqlServer.Types.dll assembly, which can be found on the .NET tab in the Microsoft Visual Studio Add Reference dialog box. In the database itself, the node information represented by a hierarchyid value is stored as a varbinary data type, which can support a virtually unlimited number of child nodes per parent.

Creating a Hierarchical Table

Let’s begin by implementing the classic manager-employee tree scenario using the hierarchyid data type. Start with the structure of the Employee table, as shown in Example 1:

Example 1. Creating a hierarchical table.

USE master
GO
IF EXISTS(SELECT name FROM sys.databases WHERE name = 'MyDB')
 DROP DATABASE MyDB
GO
CREATE DATABASE MyDB
GO

USE MyDB
GO

CREATE TABLE Employee
(
   NodeId        hierarchyid PRIMARY KEY CLUSTERED,
   NodeLevel     AS NodeId.GetLevel(),
   EmployeeId    int UNIQUE NOT NULL,
   EmployeeName  varchar(20) NOT NULL,
   Title         varchar(20) NULL
)
GO

Notice that the NodeId column is declared as a hierarchyid type and is also defined as the table’s primary key using a clustered index. The primary key is important for two reasons, the first of which is that SQL Server doesn’t guarantee uniqueness of hierarchyid values in the table. Don’t view this as a limitation—this fact actually enables advanced scenarios in which you can house multiple distinct hierarchies within a single table. With the single hierarchy in our example, defining a primary key is one way to enforce unique NodeId values. Defining an ordinary unique constraint instead would be another way, and you can certainly do that if some column other than the hierarchyid is more suitable for the table’s primary key. The second reason for the primary key index is to support depth-first indexing, which we’ll talk about shortly.

As we’ve started explaining, the value in the NodeId column identifies each row’s unique node position within the structure. It can serve as the table’s primary key, and you don’t technically need to have any other identifying columns in the table. Practically, however, you’ll most likely still want to define another unique column to serve as a more conventional identifier, because values assigned into the hierarchyid column as rows are added to the table based on their node positions within the tree. Those hierarchyid values are essentially nondeterministic and will also shift automatically as the tree’s node structure is changed using the techniques that you’ll start learning about just ahead. For example, if you relocate a node and all of its descendants to a new position within the tree (also known as reparenting a subtree), the hierarchyid values of those nodes are automatically updated by SQL Server to reflect their new positions within the hierarchy. You therefore define EmployeeId as a unique integer column assigned with values that you control and that won’t change when the structure of nodes in the tree is manipulated in the future. So when you add employees, their ID values remain the same regardless of how their nodes get repositioned down the line.

Note

You can also use IDENTITY values for the EmployeeId column, which puts SQL Server in control of assigning key values using incrementing integers, and which would also not change as the tree structure is altered. To stay focused on hierarchyid, the example here does not add the extra code for using IDENTITY, and you simply assign your own unique EmployeeId integer values explicitly.

The GetLevel Method

The second column defined in the Employee table is NodeLevel, which is declared as a calculated column that returns the 16-bit integer result of the GetLevel method applied to the hierarchyid value in the NodeId column. This is optional and provides a read-only column that returns a number indicating the level position of its row (node), extracted and decoded from the compacted hierarchyid value in the NodeId column. One and only one root row exists in any hierarchy and is positioned at level zero. Its immediate child rows are positioned at level one, grandchild rows at level two, and so on. Defining the NodeLevel column that calls the GetLevel method will allow you to later create a breadth-first index on the table.

The EmployeeId column is defined with a unique constraint and doesn’t permit NULL values. You can therefore treat its integer value as an “alternate primary key” for your employees, because the hierarchyid value in the NodeId column has already been defined as the table’s actual primary key. As we mentioned, in many practical cases, you will want to define such an alternate key to be used as an identifier with a value that you are in control of and that doesn’t change unpredictably as the result of node manipulations the way a hierarchyid value does. You’ll definitely need such identifiers if you want to establish relationships for joining with other tables, or you might want them just for your own convenience. But it’s important to understand that SQL Server requires no such identifier to implement the hierarchy, and that you can certainly cast a hierarchical structure over a relational table with only the hierarchyid value as the primary key and no other unique columns.

Other

Personal Cloud WD My Cloud EX2 Review

Synology DS213j Your Own Cloud Entertainment Starts Here

ADATA Premier Pro SP920 256GB

Netgear ReadyNAS 314 Enterprise Network Storage

Budget Portable Hard Drivers Review

Toshiba Q Series 256GB and 512GB, Toshiba Q Series Pro 256GB

SanDisk Ultra Plus 256GB And X110 256GB

SanDisk Extreme II 480GB, Plextor M5 Pro Extreme 256GB and 512GB

Samsung SSD 840 Evo 250GB, 500GB And 1TB

PNY XLR8 240GB and XLR8 PRO 240GB