A common table expression (CTE) closely resembles a
nonpersistent view. It is a temporary named result set that you define
in your query that will be used by the FROM
clause of the query. Each CTE is defined only once (but can be referred
to as many times as it is in scope) and lives for as long as the query
lives. You can use CTEs to perform recursive operations. Here is the
syntax to create a CTE:
WITH <name of your CTE>(<column names>)
AS
(
<actual query>
)
SELECT * FROM <name of your CTE>
Note
SQL
Server 2005 also ships with a new database called AdventureWorks, which
is like a Northwind database on steroids. Most of our examples will use
AdventureWorks, which has much more functionality than Northwind. We
will call out anything new and exciting about it as we bump into the
features, but for most of this book all you have to know is that it is
a basic order-entry system. |
An example of a simple CTE using AdventureWorks is shown here:
Listing 1. A simple CTE
WITH AllMRContacts AS (SELECT * FROM Person.Contact WHERE Title='Mr.')
SELECT LastName + ', ' + FirstName AS Contact FROM AllMRContacts ORDER BY LastName
|
The results are shown here:
Contact
-------
Abbas, Syed
Achong, Gustavo
Adams, Jay
Adams, Ben
Adina, Ronald
Agcaoili, Samuel
[more rows]
The
following example gets a count of all the sales a salesperson made in
the AdventureWorks orders system as a CTE and then executes a simple
inner join with the SalesPerson table to return more information about
the salesperson, such as his quota. This demonstrates how a CTE is
joined to your calling query. You can do this without a CTE, but think
about all the times you have created a temp table or a throwaway view
and joined back to it—now you can use a CTE instead and keep the
complexity of aggregating in the CTE only, thereby simplifying your
code. The code is shown here:
Listing 2. CTE-to-query join
USE AdventureWorks GO
WITH OrderCountCTE(SalesPersonID, OrderCount) AS ( SELECT SalesPersonID, COUNT(*) FROM Sales.SalesOrderHeader WHERE SalesPersonID IS NOT NULL GROUP BY SalesPersonID ) SELECT sp.SalesPersonID, oc.OrderCount, sp.SalesYTD FROM Sales.SalesPerson sp INNER JOIN OrderCountCTE oc ON oc.SalesPersonID=sp.SalesPersonID ORDER BY oc.OrderCount
|
The results look like this:
SalesPersonID OrderCount SalesYTD
------------------------------------------
288 16 219088.8836
284 39 636440.251
268 48 677558.4653
280 60 0.00
290 109 1758385.926
289 130 2241204.0424
287 140 1931620.1835
[more rows]
CTEs
can also eliminate self-joins in some of your queries. Take a look at
this example. We will create a table called Products and insert
duplicates into the Product Name field.
Listing 3. Inserting duplicates into AdventureWorks’ Product_Name
CREATE TABLE Products ( Product_ID int NOT NULL, Product_Name varchar (25), Price money NULL, CONSTRAINT PK_Products PRIMARY KEY NONCLUSTERED (Product_ID) ) GO INSERT INTO Products (Product_ID, Product_Name, Price) VALUES (1, 'Widgets', 25) INSERT INTO Products (Product_ID, Product_Name, Price) VALUES (2, 'Gadgets', 50) INSERT INTO Products (Product_ID, Product_Name, Price) VALUES (3, 'Thingies', 75) INSERT INTO Products (Product_ID, Product_Name, Price) VALUES (4, 'Whoozits', 90) INSERT INTO Products (Product_ID, Product_Name, Price) VALUES (5, 'Whatzits', 5) INSERT INTO Products (Product_ID, Product_Name, Price) VALUES (6, 'Gizmos', 15) INSERT INTO Products (Product_ID, Product_Name, Price) VALUES (7, 'Widgets', 24) INSERT INTO Products (Product_ID, Product_Name, Price) VALUES (8, 'Gizmos', 36) INSERT INTO Products (Product_ID, Product_Name, Price) VALUES (9, 'Gizmos', 36) GO
|
One
common problem found in databases is having duplicate product names
with different product IDs. If you run a duplicate-finding query, that
query will return all the records (the duplicates and the good values).
This increases the difficulty of automatically deleting duplicates. If
you want to find the Product Name duplicates without also including the
first instance of the name in the table, you can use a self-join like
this:
Listing 4. Self-join without CTE
SELECT * FROM Products WHERE Product_ID NOT IN (SELECT MIN(Product_ID) FROM Products AS P WHERE Products.Product_Name = P.Product_Name)
|
The self-join returns data like this:
Product_ID Product_Name Price
------------------------------------------
8 Gizmos 36.00
9 Gizmos 36.00
7 Widgets 24.00
You
can also rewrite your query using a CTE to eliminate the
confusing-looking self-join and get the same results. This technique
does not offer a performance gain over self-joins; it is just a
convenience for code maintainability. The previous self-join example is
rewritten here as a CTE and yields the same results:
Listing 5. Self-join as a CTE
WITH CTEMinProductRecords AS ( SELECT MIN(Product_ID) AS Product_ID, Product_Name FROM Products GROUP BY Product_Name HAVING COUNT(*) > 1) SELECT * FROM Products JOIN CTEMinProductRecords ON Products.Product_Name = CTEMinProductRecords.Product_Name AND Products.Product_ID > CTEMinProductRecords.Product_ID
|
After
you investigate your duplicates using the preceding CTE, you might want
to delete the duplicate data. You might also want to update any foreign
keys in related tables to use the original Product_ID
value. If your duplicate data does not have any related child rows in
another table, or if you have updated them to the correct Product_ID, you can delete the duplicate data by just rewriting the CTE like this, replacing the SELECT * with a DELETE:
Listing 6. Deleting duplicates in a CTE
WITH CTEMinProductRecords AS ( SELECT MIN(Product_ID) AS Product_ID, Product_Name FROM Products GROUP BY Product_Name HAVING COUNT(*) > 1) DELETE Products FROM Products JOIN CTEMinProductRecords ON Products.Product_Name = CTEMinProductRecords.Product_Name AND Products.Product_ID > CTEMinProductRecords.Product_ID
|
Recursive Queries with CTEs
The
true power of CTEs emerges when you use them recursively to perform
hierarchical queries on tree-structured data. In fact, this was a major
reason that Microsoft built CTEs, in addition to SQL-92 compliance. A
recursive CTE is constructed from a minimum of two queries. The first,
the anchor member (AM), is a nonrecursive query; the second, the
recursive member (RM), is the recursive query. Within your CTE’s
parentheses (after the AS clause), you define queries that are independent or refer back to the same CTE. The AM and RM are separated by a UNION ALL
statement. AMs are invoked only once; RMs are invoked repeatedly until
the query returns no rows. You can append multiple AMs to each other
using a UNION or UNION ALL operator, depending on whether you want to eliminate duplicates. (You must append recursive members using a UNION ALL operator.) Here is the syntax:
WITH SimpleRecursive(field names)
AS
(
<Select Statement for the Anchor Member>
UNION ALL
<Select Statement for the Recursive Member>
)
SELECT * FROM SimpleRecursive
The following example demonstrates this feature. We’ll create a table of employees and a self-referencing field back to Employee_ID called ReportsTo. We’ll then write a query that returns all the employees who report to Stephen (Employee_ID=2) and all the employees who report to Stephen’s subordinates.
Listing 7. Example table for recursive CTE queries
--create a table with tree data CREATE TABLE Employee_Tree (Employee_NM nvarchar(50), Employee_ID int PRIMARY KEY, Repor tsTo int) --insert some data, build a reporting tree INSERT INTO Employee_Tree VALUES('Richard', 1, NULL) INSERT INTO Employee_Tree VALUES('Stephen', 2, 1) INSERT INTO Employee_Tree VALUES('Clemens', 3, 2) INSERT INTO Employee_Tree VALUES('Malek', 4, 2) INSERT INTO Employee_Tree VALUES('Goksin', 5, 4) INSERT INTO Employee_Tree VALUES('Kimberly', 6, 1) INSERT INTO Employee_Tree VALUES('Ramesh', 7, 5)
|
Our table looks like this:
Employee_NM | Employee_ID | ReportsTo |
---|
Richard | 1 | Null |
Stephen | 2 | 1 |
Clemens | 3 | 2 |
Malek | 4 | 2 |
Goksin | 5 | 4 |
Kimberly | 6 | 1 |
Ramesh | 7 | 5 |
Here’s the recursive query to determine which employees report to Stephen:
Listing 8. Recursive CTE query
--Recursive Query WITH SimpleRecursive(Employee_NM, Employee_ID, ReportsTo) AS (SELECT Employee_NM, Employee_ID, ReportsTo FROM Employee_Tree WHERE Employee_ID = 2 UNION ALL SELECT p.Employee_NM, p.Employee_ID, p.ReportsTo FROM Employee_Tree p INNER JOIN SimpleRecursive A ON A.Employee_ID = P.ReportsTo ) SELECT sr.Employee_NM AS Employee, et.Employee_NM AS Boss FROM SimpleRecursive sr INNER JOIN Employee_Tree et ON sr.ReportsTo = et.Employee_ID
|
Here are the results:
Employee Boss
-------------------
Stephen Richard
Clemens Stephen
Malek Stephen
Goksin Malek
Ramesh Goskin
This recursion starts where Employee_ID = 2 (the Anchor Member or the first SELECT). It picks up that record and then, using the Recursive Member (the SELECT after the UNION ALL),
picks up all the records that report to Stephen and that record’s
children. (Goksin reports to Malek, and Malek reports to Stephen.) Each
subsequent recursion tries to find more children that have as parents
the employees found by the previous recursion. Eventually the recursion
returns no results, and that is what causes the recursion to stop (the
reason why Kimberly is not returned). If the anchor member is changed
to Employee_ID = 1, then Kimberly will also be returned in the results.
By
design, the recursive member keeps looking for children and can cycle
on indefinitely. If you suspect many cycles will occur and want to
limit the number of recursive invocations, you can specify the MAXRECURSION option right after the outer query using the OPTION clause:
This
option causes SQL Server 2005 to raise an error when the CTE exceeds
the specified limit. By default, the limit is 100 (if you’ve omitted
the option). To specify no option, you must set MAXRECURSION
to 0. You can also run the same query to find direct reports and
subordinates only one level deep (including direct reports Clemens and
Malek and Malek’s subordinate Goksin but skipping Ramesh, which is
three levels deep).
Listing 9. Recursive query with MAXRECURSION
--Recursive Query with MAXRECURSION WITH SimpleRecursive(Employee_NM, Employee_ID, ReportsTo) AS (SELECT Employee_NM, Employee_ID, ReportsTo FROM Employee_Tree WHERE Employee_ID = 2 UNION ALL SELECT p.Employee_NM, p.Employee_ID, p.ReportsTo FROM Employee_Tree p INNER JOIN SimpleRecursive A ON A.Employee_ID = p.ReportsTo ) SELECT sr.Employee_NM AS Employee, et.Employee_NM AS Boss FROM SimpleRecursive sr INNER JOIN Employee_Tree et ON sr.ReportsTo = et.Employee_ID OPTION(MAXRECURSION 2)
|
Here are the results:
Employee Boss
-------------------
Stephen Richard
Clemens Stephen
Malek Stephen
Goksin Malek
You will also see that the query raises the following error message:
Msg 530, Level 16, State 1, Line 2
The statement terminated. The maximum recursion 2 has been exhausted before statement completion.
One way to avoid the exception is to use a generated column to keep track of the level you are on and include that in the WHERE clause instead of MAXRECURSION. The following example returns the same data as the previous example but without the error.
Listing 10. Controlling recursion without MAXRECURSION
WITH SimpleRecursive(Employee_NM, Employee_ID, ReportsTo, SubLevel)
AS
(SELECT Employee_NM, Employee_ID, ReportsTo, 0
FROM Employee_Tree WHERE Employee_ID = 2
UNION ALL
SELECT p.Employee_NM, p.Employee_ID, p.ReportsTo, SubLevel + 1
FROM Employee_Tree p INNER JOIN
SimpleRecursive A ON A.Employee_ID = p.ReportsTo
)
SELECT sr.Employee_NM AS Employee, et.Employee_NM AS Boss
FROM SimpleRecursive sr INNER JOIN Employee_Tree et
ON sr.ReportsTo = et.Employee_ID
WHERE SubLevel <=2