XML Indexes
You can create an XML index
on an XML column using almost the same syntax as for a standard SQL
Server index. There are four types of XML indexes: a single “primary XML
index” that must be created and three types of optional “secondary XML
index” that are created over the primary index. An XML index is a little
different than a standard SQL index; it is a clustered index on an
internal table used by SQL Server to store XML data. This table is
called the node table and cannot be accessed by programmers.
To get started with an XML index, you must first create the primary index
of all the nodes. The primary index is a clustered index (over the node
table, not the base table) that associates each node of your XML column
with the SQL Primary Key column. It does this by indexing one row in
its B+tree structure for each node in your XML column, generating an
index usually about three times as large as your XML data. For your XML
data to work properly, your table must have an ordinary clustered
Primary Key column defined. That primary key is used in a join of the
XQuery results with the base table. To create a primary XML index, you first create a table with a
primary key and an XML column, as shown here:
--XML index Examples
USE AdventureWorks--but can be any DB
--Drop the table since we used it before
Drop Table OrdersXML
--Create the table with a PK
--and an XML column
CREATE TABLE OrdersXML
(OrderDocID INT PRIMARY KEY,
xOrders XML NOT NULL)
Next we insert some data into our new table:
--Insert some data into our new table
INSERT INTO OrdersXML
Values (1, '
<Orders>
<Order>
<OrderID>5</OrderID>
<CustomerName>Stephen Forte</CustomerName>
<OrderAmount>25</OrderAmount>
</Order>
</Orders>')
INSERT INTO OrdersXML
Values (2, '
<Orders>
<Order>
<OrderID>7</OrderID>
<CustomerName>Andrew Brust</CustomerName>
<OrderAmount>45</OrderAmount>
</Order>
</Orders>')
INSERT INTO OrdersXML
Values (3, '
<Orders>
<Order>
<OrderID>2</OrderID>
<CustomerName>Bill Zack</CustomerName>
<OrderAmount>65</OrderAmount>
</Order>
</Orders>')
To create a primary XML index, we use the CREATE Primary XML INDEX syntax on the table:
CREATE Primary XML INDEX idx_1
ON OrdersXML (xOrders)
We have successfully
created a new primary XML index called idx_1 on the OrdersXML table’s
xOrders column. The primary XML index, idx_1, now has the node table
populated. If you want to look at the node table’s columns, run this
piece of T-SQL:
-- display the columns in the node table
SELECT col.column_id, col.object_id,
col.name, col.system_type_id
FROM sys.columns col
inner JOIN sys.indexes idx
ON idx.object_id = col.object_id
WHERE idx.name = 'idx_1'
AND idx.type = 1
ORDER BY column_id
The results are shown in Table 1.
Table 1. Columns in a Typical Node Table
column_id | object_id | name | System_type_id |
---|
1 | 855674096 | id | 165 |
2 | 855674096 | nid | 56 |
3 | 855674096 | tagname | 231 |
4 | 855674096 | taguri | 231 |
5 | 855674096 | tid | 56 |
6 | 855674096 | value | 98 |
7 | 855674096 | lvalue | 231 |
8 | 855674096 | lvaluebin | 165 |
9 | 855674096 | hid | 167 |
10 | 855674096 | xsinil | 104 |
11 | 855674096 | xsitype | 104 |
12 | 855674096 | pk1 | 56 |
The three secondary
XML indexes are Path, Value, and Property. You can implement a secondary
XML index only after you have created a primary XML index because they
are both actually indexes over the node table. These indexes further
optimize XQuery statements made against the XML data.
A path index creates an index on the Path ID (HID in Table 4-1) and Value columns of the primary XML index, using the FOR PATH
keyword. This type of index is best when you have a fairly complex
document type and want to speed up XQuery XPath expressions that
reference a particular node in your XML data with an explicit value (as
explained in the later discussion of XQuery). If you are more concerned
about the values of the nodes queried with wildcards, you can create a
value index using the FOR VALUE XML index. The VALUE index contains the same index columns as the PATH index, Value, and Path ID (HID), but in the reverse order (as shown in Table 4-1). Using the property type index with the PROPERTY keyword optimizes hierarchies of elements or attributes that are name/value pairs. The PROPERTY
index contains the primary key of the base table, Path ID (HID), and
Value, in that order. The syntax to create these indexes is shown here;
you must specify that you are using the primary XML index by using the USING XML INDEX syntax, as shown here:
--structural (Path)
CREATE XML INDEX idx_a ON OrdersXML (xOrders)
USING XML INDEX idx_1 FOR PATH
--value
CREATE XML INDEX idx_b ON OrdersXML (xOrders)
USING XML INDEX idx_1 FOR VALUE
--property
CREATE XML INDEX idx_c ON OrdersXML (xOrders)
USING XML INDEX idx_1 FOR PROPERTY
There are some additional restrictions that you must be aware of:
An XML index can contain only one XML column, so you cannot create a composite XML index (an index on more than one XML column).
Using
XML indexes requires that the primary key be clustered, and because you
can have only one clustered index per table, you cannot create a
clustered XML index.
SQL Server 2005
Management Studio allows you to view the indexes on a table by drilling
down from the Database node to the table name and down to the Indexes
node, as shown in Figure 3.
Using this dialog box, you can also create a new primary XML index by
right-clicking on the Indexes node and clicking New Index.
Armed with this new
index type, you can write some very efficient queries using XQuery.
Before we get to XQuery, however, let’s take a look at some other XML
features that will help you get XML data in and out of the database.