The syntax for using CTEs is broken down into two
parts: the declaration of the CTE itself and the statement that
references the CTE that was just declared:
-- CTE DECLARATION
WITH CTE_name [ ( column_name [,...n] ) ]
AS
( CTE_query_definition )
-- CTE USE
SELECT <column_list>
FROM CTE_NAME
The entire block of syntax in this example represents a single statement, and some syntax rules particular to CTEs should be pointed out:
When the CTE is not
the first statement in a batch, the statement preceding the CTE must
have been terminated with a semicolon (“;”). See the sidebar titled “The
Transact-SQL Statement Terminator” for more information.
The list of “column_name” values in the CTE declaration is optional unless there are expressions in the CTE_query_definition that do not have explicit column names provided.
The
CTE definition is valid only for a single statement. If we want to use
the same CTE for multiple statements, we must restate the CTE
declaration for each of the multiple queries.
The semicolon (“;”) Transact-SQL
statement terminator has historically been an optional syntax element
in SQL Server. However as the language becomes more involved, the parser
needs some help in properly interpreting our code. SQL Server 2008
Documentation states that the semicolon will be required in a future
version. It would be in our best interest then to begin the habit of
terminating all SQL statements with a semicolon now.
CTEs are an example of where the semicolon is required on the statement that precedes the CTE. A CTE’s definition begins with a WITH clause. As you will see later in this chapter, the SELECT statement can also have a WITH clause for a completely different purpose when using table locking and index hints. To help the SQL parser understand your intended use of WITH
when creating CTEs, you must make sure that any previous statements in
the same batch were properly terminated with a semicolon. It is for this
reason that you may see some developers precede the CTE’s WITH clause using a semicolon like this, “;WITH
”. It’s important to understand that the semicolon is not a part of the
CTE’s declaration but rather the explicit termination of the statement
that precedes the CTE’s declaration in the batch.
If you fail to
properly terminate the previous statement before declaring a CTE, SQL
Server will let you know with the following error message:
Msg 319, Level 15, State 1, Line 4 Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
|
Let’s look at a
sample CTE. The following CTE and its referencing query answer the
question, “How many products have sold a total quantity of greater than
5000 items across all orders?”
USE AdventureWorks2008;
GO
--CTE DECLARATION
WITH ProductsOver5000 AS
(
SELECT ProductID, SUM(OrderQty) AS TotalQty
FROM Sales.SalesOrderDetail
GROUP BY ProductID
HAVING SUM(OrderQty) > 5000
)
--CTE USE
SELECT COUNT(*) AS NumProductsOver5000
FROM ProductsOver5000;
The query that references
the CTE is clean and easy to read. It simply references the
ProductsOver5000 CTE declared above as its source. It is this clarity in
the final query that often makes CTEs much easier to understand than a
functionally equivalent statement written using subqueries. Let’s try the same query using a subquery rather than a CTE:
USE AdventureWorks2008;
GO
SELECT COUNT(*) AS NumProductsOver5000
FROM
(
SELECT ProductID, SUM(OrderQty) AS TotalQty
FROM Sales.SalesOrderDetail
GROUP BY ProductID
HAVING SUM(OrderQty) > 5000
) AS ProductsOver5000;
Many people find
the subquery version to be more difficult to understand because the
“inner” query is stated in the middle of the “outer” query. It can make
it difficult to figure out where one query starts and the other stops.
A single statement can have
multiple CTEs declared. If that is the case, all CTE declarations must
occur before the statement that references them. When defining multiple
CTEs in a single statement, the WITH
keyword only appears once, and each CTE declaration is separated by a
comma (“,”). A benefit of multiple CTEs is that one CTE can build on the
results of a previous CTE, much in the same way that nested subqueries
are used. Here is an example of a query that uses multiple CTEs to
answer the question, “What products were sold on the last day of
business, and what was the total quantity sold for each product?”:
USE AdventureWorks2008;
GO
WITH
-- First CTE: LastDayOrderIDs
-- Get the SalesOrderID values for all orders placed on the last day of
-- business (MAX(OrderDate))
LastDayOrderIDs AS
(
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE OrderDate = (SELECT MAX(OrderDate) FROM Sales.SalesOrderHeader)
),
-- Second CTE: LastDayProductQuantities
-- Now find the distinct ProductIDs and their total quantities for all
-- of the SalesOrderID values retrieve in the previous CTE
LastDayProductQuantities AS
(
SELECT ProductID, SUM(OrderQty) AS LastDayQuantity
FROM Sales.SalesOrderDetail
JOIN LastDayOrderIDs
ON Sales.SalesOrderDetail.SalesOrderID = LastDayOrderIDs.SalesOrderID
GROUP BY ProductID
),
-- Third CTE: LastDayProductDetails
-- Take the ProductIDs and last day quantities from the previous
-- CTE and join it to the Product table to retrieve the product names
LastDayProductDetails AS
(
SELECT
Product.ProductID,
Product.Name,
LastDayQuantity
FROM Production.Product
JOIN LastDayProductQuantities
ON Product.ProductID = LastDayProductQuantities.ProductID
)
-- Finally return the results of the last CTE
SELECT * FROM LastDayProductDetails
ORDER BY LastDayQuantity DESC;
The
preceding query may not be the most efficient syntax for retrieving the
desired result, but it does give us a way to break what may seem like a
complex query down into smaller bite-size pieces (CTEs). It also allows
us to do it in a top-down style notation rather than the sometimes more
confusing inside-out subquery syntax. It is just another way to
approach solving a problem and may allow you think around obstacles you
encounter in other methods.
Tip
When taking a test you are
provided with a note sheet and a pen. A helpful tip might be to write
down key things you want to remember onto the note sheet before you
start the test. This can help you focus your thoughts before the test as
well as provide you with reminders once the test starts.
Whether
we choose to write our statements using CTEs or subqueries is often
just a matter of personal preference. If it is possible for us to write a
statement using either a CTE or a subquery, we will probably find
little difference between the CTE version of the statement when compared
to the subquery version of the statement as far as SQL Server’s
optimizer is concerned. Assuming they both imply the same intentional
result, we will likely get the same execution plan returned for both
styles of syntax. Of course, that is a generalization, and when we do
have a choice we should always test to see if one version of the
statement has a better execution plan than the other. The CTE becomes a
requirement and not an option when we want to reference the CTE multiple
times in the same query (as in a self-join situation), when we want the
multiple CTEs to build on the single “base” CTE, and when we want the
CTE’s query to be recursive.
Creating Recursive CTEs
Recursive queries
are a powerful feature of CTEs. With recursive CTEs we have the ability
to perform in a single set-based statement what in previous versions has
typically required the implementation of procedures or functions. The
recursive nature of CTEs comes from their ability to be
self-referencing; something neither subqueries nor views can do.
Recursive queries, though powerful, can sometimes be difficult to
visualize, and if we aren’t careful can generate “runaway” code. SQL
Server prevents this from happening by default, but we need to be aware
of what recursion means and learn how to control it. We will cover
controlling recursion when we discuss the MAXRECURSION
option at the end of this section. If you are new to the concept of
recursion, read the sidebar titled “Recursion” for more information.
Recursion is a useful
programming practice for writing code that calls itself. Recursion
provides a powerful mechanism for processing trees and hierarchies,
building running totals, calculating factorials, and so on.
Recursion
is a common tool used in many languages. SQL Server has supported
recursion in stored procedures and triggers in previous versions, but
CTEs now give us the ability to create recursive queries. As explained
in this chapter, CTEs are named queries that can be referenced from a
SQL statement. The name of a CTE can in fact be referenced from within
its own query definition. By following some specific syntax rules, SQL
Server can process the self-referencing CTE query definitions and
traverse the parent/child relationships that are so common in relational
databases.
A common example of a
recursive CTE in the SQL world is one that uses the relationship between
an employee and a manager. In this scenario a manager has employees,
but those employees may also be managers of other employees, and so on
and so forth. With a recursive query, we can retrieve the top-level
(Level 0) manager in the “Anchor query” and then in the recursive
operation find all the employees that report to that manager (Level 1),
iterate the recursive query again to find all the employees for that
report to the Level 1 employees (Level 2), and again, and again.
|
The structure of a
recursive CTE is similar to that of a regular CTE with the exception of
the CTE’s query definition. Here is the basic layout of a recursive CTE
query:
-- CTE DECLARATION
WITH CTE_name [ ( column_name [,...n] ) ]
AS
(
CTE_Anchor_member_query_definition
UNION ALL
CTE_Recursive_member_query_definition
)
-- CTE USE
SELECT <column_list>
FROM CTE_NAME
The preceding syntax
shows that a recursive CTE’s query definition has two primary
components: the “anchor member” and the “recursive member.” The anchor
member is the query that returns rows that represent the top-level items
in a hierarchy. For example, when using CTEs to recursively iterate
through the
manager/employee relationships in an organization, the anchor member
query will return only the top-level employee (or employees if there are
multiple people at the very top of the organizational chart). Anchor
member queries specifically may not reference the CTE. Referencing the CTE is what the recursive member does.
The “recursive member”
is the query that returns the details for the lower level items and
maps to the “parents” of those records by joining back to the CTE
itself. The recursive member is the query that references the CTE by its
name and causes SQL server to perform the recursion.
The anchor member and the recursive member are separated by a UNION ALL operator. This continuously attaches the rows from each recursion to the final result set.
If you have read up
on recursive CTE queries before, you have likely run across the standard
manager / employee recursive query using the AdventureWorks. HumanResources.Employee table from SQL Server 2005. SQL Server 2008 has significantly changed the structure of the HumanResources.Employee table and how the hierarchy itself is represented. The most notable differences are the new BusinessEntityID and OrganizationNode columns. The OrganizationNode column uses the new HierarchyID
data type. A discussion of it is beyond the scope of this chapter, but
you can refer to the SQL Server 2008 documentation for details on it.
Without further ado, here is a sample recursive query:
USE AdventureWorks2008;
GO
WITH OrgChart --Notice the name of the CTE is "OrgChart"
(OrganizationNode,LoginID, ManagerLoginID, Level)
AS
(
--Anchor Member query:
SELECT OrganizationNode, LoginID, CAST(NULL AS nvarchar(256)),
OrganizationNode.GetLevel()
FROM HumanResources.Employee
WHERE OrganizationNode = HierarchyID::GetRoot()
UNION ALL
--Recursive Member Query:
SELECT E.OrganizationNode, E.LoginID, M.LoginID,
E.OrganizationNode.GetLevel()
FROM HumanResources.Employee AS E
JOIN OrgChart AS M
ON E.OrganizationNode.GetAncestor(1) = M.OrganizationNode
)
SELECT * FROM OrgChart;
Let’s dissect this query a little bit. First, we see that the name of the CTE is OrgChart.
Next is the explicit list of column names that the query will return.
In this example it is useful to state the column aliases here rather
than in the query definitions inside the CTE’s declaration. It just
helps clean things up a little. Also in this query, only the LoginIDs
of the employees and managers are being retrieved. This keeps the query
simple and clean, yet still gives us a sense of “who” the managers and
employees are. Remember, this is just an example.
The anchor member
query returns details for employees who are at the top level of the
organization. In this case we are determining that an employee is at the
top of the organization by testing to see if the employee’s OrganizationNode column value is the same as the root value of the HierarchyID data type (HierarchyID::GetRoot()). The last column returns the “Level” of the employee by calling the GetLevel() method of the OrganizationNode HierarchyID value (OrganizationNode.GetLevel()), When SQL Server processes the anchor member query, a single row is returned:
OrganizationNode LoginID ManagerLoginID Level
---------------- ------------------------ ----------------------- -----
0x adventure-works\ken0 NULL 0
Note that in the row returned from the anchor member query definition, the ManagerLoginID value is NULL.
This is because members at the top level of the hierarchy by definition
do not have managers. Therefore, any manager data for those employees
is empty, or NULL.
SQL server then
moves to process the recursive member. The recursive member query joins
back to the CTE by its name. In our example, the CTE now represents the
row that was returned from the anchor member. SQL Server finds all
employees that have an OrganizationNode value with the ancestor 1 level up (OrganizationNode.GetAncestor(1)) having the same value as the manager’s OrganziationNode value. The first pass through the recursive query then returns all employees who report to “adventure-works\ken0”:
OrganizationNode LoginID ManagerLoginID Level
---------------- ---------------------- -------------------- -----
0x58 adventure-works\terri0 adventure-works\ken0 1
0x68 adventure-works\david0 adventure-works\ken0 1
0x78 adventure-works\james1 adventure-works\ken0 1
0x84 adventure-works\laura1 adventure-works\ken0 1
0x8C adventure-works\jean0 adventure-works\ken0 1
0x94 adventure-works\brian3 adventure-works\ken0 1
SQL
Server then continues to iterate the recursive member query for each
row returned by its previous iteration. For example, the next iteration
would join any employees that report to “adventure-works\brian3” to
Brian’s record and show his direct reports, giving us the following
additional rows:
OrganizationNode LoginID ManagerLoginID Level
---------------- ----------------------- --------------------- -----
... ... ... ...
0x9560 adventure-works\stephen0 adventure-works\brian3 2
0x95A0 adventure-works\syed0 adventure-works\brian3 2
0x95E0 adventure-works\amy0 adventure-works\brian3 2
... ... ... ...
SQL Server then
continues recursing into the CTE for each row returned to find that
row’s employees until all rows have been exhausted or the maximum depth
of recursion has been reached. By default, SQL Server supports 100
levels of recursion along a single “branch” in a hierarchy. If we have
hierarchies with branches that are deeper than 100 levels, we will need
to override the default using the MAXRECURSION query option. Query options will be explained in more detail later in this chapter. We can tell if we have exceeded the MAXRECURSION by the system returning the following error message:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 3 has been exhausted
before statement completion.
Warning
Don’t forget about MAXRECURSION.
Make sure to pay attention to queries that could repeat indefinitely
during the exam. Recursive queries that go deeper than 100 levels along
any branch will cause errors unless the MAXRECURSION option is used.
In Exercise 1 you will create a recursive CTE and test the effect of the MAXRECURSION option.
Exercise 1. Creating A Recursive CTE
Open a new query in SQL Server Management Studio
Enter the following code into the window:
USE AdventureWorks2008;
GO
WIH OrgChart (OrganizationNode,LoginID,MLoginID) AS
(SELECT OrganizationNode, LoginID, CAST(NULL AS nvarchar(256))
FROM HumanResources.Employee
WHERE OrganizationNode = HierarchyID::GetRoot ()
UNION ALL
SELECT E.OrganizationNode, E.LoginID, M.LoginID
FROM HumanResources.Employee AS E
JOIN OrgChart AS M
ON E.OrganizationNode.GetAncestor(1) = M.OrganizationNode)
SELECT * FROM OrgChart
OPTION(MAXRECURSION 4);
Run the query and view the results.
Try the same query that is shown in step 2, but change the MAXRECURSION value from 4 to 3.