XML Schemas
One powerful feature of XML
is its ability to strongly type the data in an XML document. XML Schema
Definition (XSD) defines a standard set of data types that must be
supported in all XML documents. You can use XSD to create an XML schema
for your data, requiring that your data conform to a set of rules that
you specify. This gives XML an advantage over just about all other data
transfer/data description methods and is a major contributing factor to
the success of the XML standard.
Without XML Schema, XML
would just be a more modern, more verbose text-delimited format. An XML
schema defines what your data should look like, what elements are
required, and of what data types those elements will be. Analogous to
how a table definition in SQL Server provides structure and type
validation for relational data, an XML schema provides structure and
type validation for the XML data.
We
won’t fully describe all the features of XML Schema here—that would
require a book of its own. You can find the XSD specifications at the
W3C at http://www.w3.org/2001/XMLSchema.
Several popular XSDs are publicly available, including Really Simple
Syndication (RSS), the protocol that powers weblogs, blogcasts, and
other forms of binary and text syndication.
You can choose how to
structure your XSD. XSD can create required elements and set limits on
what data types and ranges are allowed. It can even allow document
fragments.
SQL Server Schema Collections
SQL Server 2005 allows
you to create your own schemas and store them in the database as
database objects. For example, you can create the following simple and
common XSD and add it to the schemas collection in AdventureWorks:
USE AdventureWorks
GO
--new syntax for adding a schema
--schema can disallow fragments if you say so
CREATE XML SCHEMA COLLECTION dbo.customer_xsd
AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:tns="http://corzen.com/customer"
targetNamespace="http://corzen.com/customer" >
<xsd:simpleType name="customerAge" >
<xsd:restriction base="xsd:int" >
<xsd:maxInclusive value="65" />
<xsd:minExclusive value="18" />
</xsd:restriction>
</xsd:simpleType>
<xsd:element name="custage" type="tns:customerAge" />
</xsd:schema>'
GO
This schema is named
customer_xsd, and you can use it on any XML type, including variables,
parameters, return values, and especially columns in tables. This schema
defines an element named custage that uses the customerAge type, which is defined as an int
data type whose minimum value is anything greater than (but not
including) 18 and whose maximum value is 65 inclusive. Next create a
simple table and apply the schema to the XML column by referring to the
schema name in parentheses after your XML data type in the CREATE TABLE statement shown here:
USE AdventureWorks
GO
--create table with xml column
--and use schema as a rule for
--a particular column
CREATE TABLE xml_schematest (
Customer_ID int primary key,
CustomerAge_XML XML(customer_xsd) –-XML Schema Name
)
GO
As you can see in this example, the CustomerAge_XML column is defined not as simply XML but as XML(customer_xsd). The XML
data type has an optional parameter that allows you to specify the
bound schema. This same usage also applies if you want to bind a schema
to another use of an XML
data type, such as a variable or a parameter. SQL Server now allows
only a strongly typed XML document in the CustomerAge_XML column. This
is much better than our CHECK
constraint (which you can still add to this column). An advantage of
using an XML schema is that your data is validated against it and you
can enforce both data types (at the XML level) and make sure that only
valid data is allowed into the particular elements. If you were using a CHECK constraint, for example, you would need a separate CHECK constraint for each validation you wanted to perform. This example requires a few CHECK
constraints just to enforce the minimum and maximum ages. You need a
constraint requiring the element and then another constraint to verify
the allowed low end of the range (18) and another one to verify the high
end of the allowed range (65).
Note
When
you create an XML column, any defaults you define must be valid for and
not conflict with the bound XML schema; otherwise, the default value
will always fail. Also, be sure that any defined constraints are valid
for and do not conflict with the bound XML schema; otherwise, all
attempts to insert or update data will likely fail. |
Let’s take a look at the schema in action.
USE AdventureWorks
GO
-- works
INSERT INTO xml_schematest VALUES(1,
'<p:custage xmlns:p="http://corzen.com/customer">55</p:custage>')
GO
--works
INSERT INTO xml_schematest VALUES(2,
'<p:custage xmlns:p="http://corzen.com/customer">45</p:custage>')
GO
SQL Server enforces
the schema on insert and updates, ensuring data integrity. The data we
provided conforms to the schema, so the inserts work smoothly. If you
attempt to insert or update an invalid piece of data, you will receive
an error:
USE AdventureWorks
GO
-- fails, age > 65
INSERT INTO xml_schematest VALUES(3,
'<p:custage xmlns:p="http://corzen.com/customer">105</p:custage>')
GO
SQL Server reports back the following error:
Msg 6926, Level 16, State 1, Line 1
XML Validation: Invalid simple type value: '105'. Location: /*:custage[1]
We
have only touched the surface with using XML schemas in SQL Server
2005. These schemas can get quite complex, and further discussion is
beyond the scope of this book. You can easily enforce sophisticated XML
schemas in your database once you master the syntax. We believe that you
should always use an XML schema with your XML data to guarantee
consistency in your XML data.
More Info
See SQL Server 2005 Books Online and Michael J. Young’s XML Step by Step, Second Edition (Microsoft Press, 2002) for more information on schemas. |
The same error is reported back when you try to update already valid data with an invalid piece of XML:
-- fails
--column is validated on update also
UPDATE xml_schematest
SET CustomerAge_XML = '
<p:custage xmlns:p="http://corzen.com/customer">105</p:custage>'
WHERE Customer_id = 1