A common table expression (CTE) is an ANSI SQL-99
expression that produces a table that is referred to by name within the
context of a single query. The general syntax for a CTE is as follows:
WITH expression_name [ ( column_name [ ,...n ] ) ] AS ( CTE_query_definition )
The WITH clause, in effect, defines a table and its columns. Note that the syntax of the WITH
clause is similar to that of a view. You can think of a CTE as a
temporary view that lasts only for the life of the query that defines
the CTE. Listing 1
shows an example of a simple CTE. This CTE is used to return the
average and maximum sales quantities for each store. The CTE is then
joined to the sales table to return the average and maximum sales quantity for the store, along with sales records for a specific title_id.
Listing 1. An Example of a Simple CTE
with sales_avg (stor_id, avg_qty, max_qty) as (select stor_id, avg(qty), max(qty) from sales group by stor_id) select top 5 s.stor_id, s.ord_num, convert(varchar(10), ord_date, 101) as ord_date, qty, title_id, avg_qty, max_qty from sales s join sales_avg a on s.stor_id = a.stor_id where s.title_id = 'DR8514' go
stor_id ord_num ord_date qty title_id avg_qty max_qty ------- -------------------- ---------- ------ -------- ------- ------- A004 ONGGGGGGGGGGGGGGG 09/13/2006 1224 DR8514 1008 1716 A068 ONEEEEEEEEEEE 09/02/2007 1572 DR8514 961 1572 A071 ONWWWWWWWWWWWWWWWWWW 08/20/2006 1728 DR8514 948 1728 A161 ONDDDDDDDDDDDD 05/25/2006 624 DR8514 829 1668 A203 ONGGGGGGGGGGGGGGGGGG 11/16/2006 1572 DR8514 1056 1692
|
Note
If the WITH
clause for a CTE is not the first statement in the batch, you should
delimit it from the preceding statement by placing a semicolon (;) in front of it. The semicolon is used to avoid ambiguity with other uses of the WITH
clause (for example, for table hints). Including a semicolon is not
necessary in all cases, but it is recommended that you use it
consistently to avoid problems.
It is also possible to define
multiple CTEs in a single query, with each CTE delimited by a comma.
Each CTE is able to refer to previously defined CTEs. Listing 2 shows an example of a nested CTE that calculates the minimum, maximum, and difference of counts of store orders.
Listing 2. An Example of Multiple CTEs in a Single Query
WITH store_orders(stor_id, cnt) AS ( SELECT stor_id, COUNT(*) FROM sales GROUP BY stor_id ), MinMaxCTE(MN, MX, Diff) AS ( SELECT MIN(Cnt), MAX(Cnt), MAX(Cnt)-MIN(Cnt) FROM store_orders ) SELECT * FROM MinMaxCTE go
MN MX Diff ----------- ----------- ----------- 1 22 21
|
A CTE must be followed by a single SELECT, INSERT, UPDATE, or DELETE statement that references some or all of the CTE columns. A CTE can also be specified in a CREATE VIEW statement as part of the defining SELECT statement of the view.
Listing 3 shows an example of a CTE used in a DELETE statement.
Listing 3. An Example of a CTE in a DELETE
with oldest_sales (stor_id, ord_num, ord_date) as (select top 100 stor_id, ord_num, ord_date from sales_big order by ord_date) delete sales_big from sales_big s, oldest_sales o where s.stor_id = o.stor_id and s.ord_num = o.ord_num and s.ord_date = o.ord_date go
|
Most valid SELECT statement constructs are allowed in a CTE, except the following:
Recursive Queries with CTEs
Nonrecursive CTEs
are ANSI SQL-99 compliant expressions that provide T-SQL coding
flexibility. However, for each nonrecursive CTE, there is usually
another T-SQL construct that can be used to achieve the same results
(for example, derived tables). The real power and capability of CTEs is
revealed when you use them to create recursive queries.
A recursive CTE can help simplify the code required to run a recursive query within a SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW
statement. Recursive queries are often useful for expanding a hierarchy
stored in a relational table (for example, displaying employees in an
organizational chart). In previous versions of SQL Server, a recursive
query usually required using temporary tables, cursors, and logic to
control the flow of the recursive steps.
A CTE is considered
recursive when it refers to itself within the CTE definition. Recursive
CTEs are constructed from at least two queries. One is a nonrecursive
query, also referred to as the anchor member (AM). The other is the
recursive query, also referred to as the recursive member (RM). The
queries are combined using the UNION ALL operator.
The following pseudocode defines the basic structure of a recursive CTE:
WITH cte_name ( column_name [,...n] ) AS ( CTE_query_definition1 -- Anchor member (AM) is defined. UNION ALL CTE_query_definition2 -- Recursive member (RM) is referencing cte_name. ) -- Statement using the CTE SELECT col_list FROM cte_name ...
Logically, you can think of the algorithm implementing the recursive CTE as follows:
1. | The anchor member is activated, and the initial result set (R) is generated.
| 2. | The recursive member is activated, using the initial result set (Rn) as input and generating result set Rn+1.
| 3. | The logic of step 2 is run repeatedly, incrementing the step number (n) until an empty set is returned.
| 4. | The outer query is executed, getting the cumulative (UNION ALL) result of all the previous steps when referring to the recursive CTE.
|
You can have more than two members in a recursive CTE, but only the UNION ALL operator is allowed between a recursive member and another recursive or nonrecursive member. Other operators, such as UNION, are allowed only between nonrecursive members. Recursive CTEs also require an exact match of the columns in all members, including the same data type, length, and precision.
Listing 4
shows a simple recursive CTE that generates a list of sequential
numbers. Note that the AM generates the base result, and the RM
following the UNION ALL controls the recursion. It is important in this example that a valid endpoint be defined to avoid infinite recursion.
Listing 4. An Example of a Simple Recursive CTE
with numlist (val) as (select 1 union all select val + 1 from numlist where val < 10) select * from numlist go
val ----------- 1 2 3 4 5 6 7 8 9 10
|
The following sections present some examples and uses of recursive CTEs.
Using Recursive CTEs for Expanding a Hierarchy
For this hierarchy example, we use the parts table in the bigpubs2008 database. This table contains a simplified hierarchy of car parts, as shown in Figure 1.
In the parts table, any part that is a subpart of another part has the parent part ID stored in the parentpartid column. The parentpartid column is a foreign key that references the partid column. Therefore, the parentpartid must either correspond to a valid partid within the table or be NULL. For example, the car itself has NULL in the parentpartid column.
Following are some common requests that might be run on the parts table:
Return all the parts for the engine. Show me all parts that are two levels below the drivetrain. Show me all the parts in such a way that it is easy to see their hierarchical dependencies.
The first request is probably the most common one: returning a part (for example, the engine, which has partid = 2) and all subparts. The recursive CTE shown in Listing 5 provides a solution to this request.
Listing 5. A Recursive CTE to Return a Part and All Subparts
WITH PartsCTE(partid, partname, parentpartid, lvl) AS ( SELECT partid, partname, parentpartid, 0 FROM PARTS WHERE partid = 2 -- Engine UNION ALL
SELECT P.partid, P.partname, P.parentpartid, PP.lvl+1 FROM Parts as P JOIN PartsCTE as PP ON P.parentpartid = PP.Partid ) SELECT PartID, Partname, ParentPartid, lvl FROM PartsCTE go
PartID Partname ParentPartid lvl ----------- ------------------------------ ------------ ----------- 2 Engine 1 0 5 Radiator 2 1 6 Intake Manifold 2 1 7 Exhaust Manifold 2 1 8 Carburetor 2 1 13 Piston 2 1 14 Crankshaft 2 1 21 Piston Rings 13 2 11 Float Valve 8 2
|
Notice that the lvl
value is repeatedly incremented with each recursive invocation of the
CTE. You can use this level counter to limit the number of iterations in
the recursion. For example, Listing 6 shows an example of a CTE that returns all parts two levels below the drivetrain.
Listing 6. A Recursive CTE to Return All Subparts Two Levels Below a Part
WITH PartsCTE(partid, partname, parentpartid, lvl) AS ( SELECT partid, partname, parentpartid, 0 FROM PARTS WHERE partid = 1 -- Drivetrain UNION ALL
SELECT P.partid, P.partname, P.parentpartid, PP.lvl+1 FROM Parts as P JOIN PartsCTE as PP ON P.parentpartid = PP.Partid where lvl < 2 ) SELECT PartID, Partname, ParentPartid, lvl FROM PartsCTE where lvl = 2 go
PartID Partname ParentPartid lvl ----------- ------------------------------ ------------ ----------- 9 Flywheel 3 2 10 Clutch 3 2 16 Gear Box 3 2 5 Radiator 2 2 6 Intake Manifold 2 2 7 Exhaust Manifold 2 2 8 Carburetor 2 2 13 Piston 2 2 14 Crankshaft 2 2
|
In Listing 6, the filter WHERE lvl < 2 in the recursive member is used as a recursion termination check; recursion stops when lvl = 2. The filter on the outer query (WHERE lvl = 2) is used to remove all parts up to the second level. Logically, the filter in the outer query (lvl = 2)
is sufficient by itself to return only the desired rows, but for
performance reasons, you should include the filter in the recursive
member to stop the recursion as soon as two levels below the drivetrain
are returned.
SQL Server allows the use of local
variables in a CTE to help make the query more generic. For example,
you can use variables instead of constants for the part ID and level, as
shown in Listing 7.
Listing 7. Using Local Variables in a Recursive CTE
DECLARE @partid AS INT, @lvl AS INT; SET @partid = 22; -- Car SET @lvl = 2; -- two levels WITH PartsCTE(partid, partname, parentpartid, lvl) AS ( SELECT partid, partname, parentpartid, 0 FROM PARTS WHERE partid = @partid UNION ALL
SELECT P.partid, P.partname, P.parentpartid, PP.lvl+1 FROM Parts as P JOIN PartsCTE as PP ON P.parentpartid = PP.Partid WHERE lvl < @lvl ) SELECT PartID, Partname, ParentPartid, lvl FROM PartsCTE Go
PartID Partname ParentPartid lvl ----------- ------------------------------ ------------ ----------- 22 Car NULL 0 1 DriveTrain 22 1 23 Body 22 1 24 Frame 22 1 2 Engine 1 2 3 Transmission 1 2 4 Axle 1 2 12 Drive Shaft 1 2
|
You can also use recursive
CTEs to perform aggregations, such as counting the total number of
subparts that make up each parent part, as shown in Listing 8.
Listing 8. Performing Aggregation with a Recursive CTE
WITH PartsCTE(parentpartid, lvl) AS ( SELECT parentpartid, 0 FROM PARTS WHERE parentpartid is not null UNION ALL SELECT P.parentpartid, lvl+1 FROM Parts as P JOIN PartsCTE as PP ON PP.parentpartid = P.Partid WHERE P.parentpartid is not null ) SELECT C.parentpartid, P.PartName, COUNT(*) AS cnt FROM PartsCTE C JOIN PArts P on C.ParentPartID = P.PartID GROUP BY C.parentpartid, P.PArtName go
parentpartid PartName cnt ------------ ------------------------------ ----------- 1 DriveTrain 20 2 Engine 8 3 Transmission 8 8 Carburetor 1 13 Piston 1 16 Gear Box 5 22 Car 23
|
In the example in Listing 8, the anchor member returns a row with the parentpartid for each part, being sure to filter out the NULL value in the parentpartid column because it is essentially the top of the hierarchy and represents no parent part. The recursive member returns the parentpartid of each parent of the previously returned parts, again excluding any NULL
values. Eventually, the CTE contains, for each part, as many
occurrences as their direct or indirect number of subparts. The outer
query is then left with the tasks of grouping the results by parentpartid and returning the count of occurrences. A join to Parts is included to get the corresponding partname for each parent part to provide more meaningful results.
Suppose you want to generate a
report that is a bit more readable, with the subparts sorted and
indented according to hierarchical dependencies. Listing 9 provides a way you could accomplish this.
Listing 9. Generating a Formatted Report with a Recursive CTE
WITH PartsCTE(partid, partname, parentpartid, lvl, sortcol) AS ( SELECT partid, partname, parentpartid, 0, cast(partid as varbinary(max)) FROM Parts WHERE partid = 22 UNION ALL SELECT P.partid, P.partname, P.parentpartid, PP.lvl+1, CAST(sortcol + CAST(P.partid AS BINARY(4)) AS VARBINARY(max)) FROM Parts AS P JOIN PartsCTE AS PP ON P.parentpartID = PP.PartID ) SELECT REPLICATE('--', lvl) + right('>',lvl) + partname AS partname FROM PArtsCTE order by sortcol go
partname ----------------------------------- Car -->DriveTrain ---->Engine ------>Radiator ------>Intake Manifold ------>Exhaust Manifold ------>Carburetor -------->Float Valve ------>Piston -------->Piston Rings ------>Crankshaft ---->Transmission ------>Flywheel ------>Clutch ------>Gear Box -------->Reverse Gear -------->First Gear -------->Second Gear -------->Third Gear -------->Fourth Gear ---->Axle ---->Drive Shaft -->Body -->Frame
|
In this example, you use a varbinary string as the sortcol to sort subparts according to the partid value. The anchor member is the starting point, generating a binary value for the partid
of the root part. In each iteration, the recursive member appends the
current part ID, converted to a binary value, to the parent part ID’s sortcol. The outer query then sorts the result by sortcol, which groups the subparts under each immediate parent part.
Setting the MAXRECURSION Option
To help avoid infinite recursion in CTEs, SQL Server, by default, sets a MAXRECURSION value of 100. If a recursive CTE attempts to perform more than 100 recursions, it is aborted, with the following error message:
Msg 530, Level 16, State 1, Line 1 The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
You can override the default MAXRECURSION setting by using the OPTION(MAXRECURSION value) query hint to force termination of the query after a specific number of recursive iterations have been invoked. Listing 10 shows an example.
Listing 10. Controlling the Number of Recursions with MAXRECURSION
WITH PartsCTE(partid, partname, parentpartid, lvl) AS ( SELECT partid, partname, parentpartid, 0 FROM PARTS WHERE partid = 22 -- Car UNION ALL
SELECT P.partid, P.partname, P.parentpartid, PP.lvl+1 FROM Parts as P JOIN PartsCTE as PP ON P.partid = PP.Partid ) SELECT PartID, Partname, ParentPartid, lvl FROM PartsCTE OPTION (MAXRECURSION 10) go
Msg 530, Level 16, State 1, Line 2 The statement terminated. The maximum recursion 10 has been exhausted before statement completion.
|
Keep in mind that if you use MAXRECURSION
to control the number of levels of recursion in a CTE, your application
receives the error message. It is not considered good programming
practice to use code that returns errors in valid situations. Certain
applications may discard query results if an error message is received.
Instead, it is recommended that you use the level counter to limit
recursion, in Listing 7. You should use the MAXRECURSION hint as a safeguard against infinite loops due to bad data or as a coding safeguard.
|