Transact-SQL in SQL Server 2008 : GROUP BY Clause Enhancements

7/28/2011 3:49:17 PM

SQL Server 2008 introduces a number of enhancements and changes to the grouping aggregate relational result set. These changes include the following:

ROLLUP and CUBE operator syntax changes
New GROUPING SETS operator
New GROUPING_ID() function

ROLLUP and CUBE Operator Syntax Changes

The ROLLUP and CUBE operators produce additional aggregate groupings and are appended to the GROUP BY clause. Prior to SQL Server 2008, to include ROLLUP or CUBE groupings, you had to specify the WITH ROLLUP or WITH CUBE options in the GROUP BY clause after the list of grouping columns. In SQL Server 2008, the syntax now follows the ANSI standard for ROLLUP and CUBE; you first designate the ROLLUP or CUBE option and then provide the grouping columns to these operators as a comma-separated list enclosed in parentheses. The new syntax is

GROUP BY [ROLLUP | CUBE ( non-aggregate_column_list ) ]

Following are examples using the pre-2008 syntax:

SELECT type, pub_id, AVG(price) AS average
FROM titles
GROUP BY type, pub_id
WITH CUBE

SELECT pub_id, type, SUM(ytd_sales) as ytd_sales
   FROM dbo.titles
   where type like '%cook%' or type = 'business'
   GROUP BY type, pub_id
   WITH ROLLUP

An example of the new ANSI standard syntax supported in SQL Server 2008 is as follows:

SELECT type, pub_id, AVG(price) AS average
FROM titles
GROUP BY CUBE ( type, pub_id)

SELECT pub_id, type, SUM(ytd_sales) as ytd_sales
   FROM dbo.titles
   where type like '%cook%' or type = 'business'
   GROUP BY ROLLUP (type, pub_id)

Note

The old-style CUBE and ROLLUP syntax is still supported for backward-compatibility purposes but is being deprecated. You should convert any existing queries using the pre-2008 WITH CUBE or WITH ROLLUP syntax to the new syntax to ensure future compatibility.

GROUPING SETS

The CUBE and ROLLUP operators allow you to run a single query and generate multiple sets of groupings. However, the sets of groupings are fixed. For example, if you use GROUP BY ROLLUP (A, B, C), you get aggregates generated for the following groupings of nonaggregate columns:

GROUP BY A, B, C
GROUP BY A, B
GROUP BY A
A super-aggregate for all rows

If you use GROUP BY CUBE (A, B, C), you get aggregates generated for the following groupings of nonaggregate columns:

GROUP BY A, B, C
GROUP BY A, B
GROUP BY A, C
GROUP BY B, C
GROUP BY A
GROUP BY B
GROUP BY C
A super-aggregate for all rows

SQL Server 2008 introduces the GROUPING SETS operator in addition to the CUBE and ROLLUP operators for performing several groupings in a single query. With GROUPING SETS, only the specified groups are aggregated instead of the full set of aggregations generated by CUBE or ROLLUP. GROUPING SETS enables you to generate results with multiple groupings in a single query, without having to resort to writing multiple GROUP BY queries and combining the results using a UNION ALL statement.

The GROUPING SETS operator supports concatenating column groupings and an optional super aggregate row. The syntax for defining grouping sets is as follows:

GROUP BY [ GROUPING SETS ( ( ) | grouping_set_item | grouping_set_item_list
  [, ...n ]  ) ]

The GROUPING SETS items can be single columns or a list of columns. The null field list "( )" can also be used to generate a super-aggregate (that is, a grand total for the entire result set). A non-nested list of columns works as separate simple GROUP BY statements, which are then combined in an implied UNION ALL. A nested list of columns in parentheses within the GROUPING SETS item list works as a GROUP BY on that set of columns. Table 1 demonstrates examples of GROUPING SETS clauses and the corresponding groupings that the query generates.

Table 1. Grouping Sets Examples
GROUPING SETS Clause	Equivalent Statement
GROUP BY GROUPING SETS (A,B,C)	GROUP BY A UNION ALL GROUP BY B UNION ALL GROUP BY C
GROUP BY GROUPING SETS ((A,B,C))	GROUP BY A,B,C
GROUP BY GROUPING SETS (A,(B,C))	GROUP BY A UNION ALL GROUP BY B,C
GROUP BY GROUPING SETS ((A,C),(B,C))	GROUP BY A,C UNION ALL GROUP BY B,C

Listing 1 demonstrates how to use the GROUPING SETS operator to perform three groupings on three individual columns in a single query.

Listing 1. GROUPING SETS Example

/***
** Perform a grouping by type, grouping by pub_id, and grouping by price
***/
SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales
FROM titles
where pub_id < '9'
GROUP BY GROUPING SETS ( type, pub_id, price)
go

type         pub_id price                 ytd_sales
------------ ------ --------------------- -----------
NULL         NULL   NULL                  0
NULL         NULL   0.0006                111
NULL         NULL   0.0017                750
NULL         NULL   14.3279               4095
NULL         NULL   14.595                18972
NULL         NULL   14.9532               14294
NULL         NULL   14.9611               4095
NULL         NULL   15.894                40968
NULL         NULL   15.9329               3336
NULL         NULL   17.0884               2045
NULL         NULL   17.1675               8780
NULL         0736   NULL                  28286
NULL         0877   NULL                  44219
NULL         1389   NULL                  24941
business     NULL   NULL                  30788
mod_cook     NULL   NULL                  24278
popular_comp NULL   NULL                  12875
psychology   NULL   NULL                  9939
trad_cook    NULL   NULL                  19566

In the output in Listing 1 , the first 11 rows are the results grouped by price, the next 3 rows are grouped by pub_id, and the bottom 5 rows are grouped by type. Now, you can modify this query to include a super-aggregate for all rows by adding a null field list, as shown in Listing 2.

Listing 2. GROUPING SETS Example with Null Field List to Generate Super-Aggregate

SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales

FROM titles
where pub_id < '9'
GROUP BY GROUPING SETS ( type, pub_id, price, () )
go

type         pub_id price                 ytd_sales
------------ ------ --------------------- -----------
NULL         NULL   NULL                  0
NULL         NULL   0.0006                111
NULL         NULL   0.0017                750
NULL         NULL   14.3279               4095
NULL         NULL   14.595                18972
NULL         NULL   14.9532               14294
NULL         NULL   14.9611               4095
NULL         NULL   15.894                40968
NULL         NULL   15.9329               3336
NULL         NULL   17.0884               2045
NULL         NULL   17.1675               8780
NULL         NULL   NULL                  97446
NULL         0736   NULL                  28286
NULL         0877   NULL                  44219
NULL         1389   NULL                  24941
business     NULL   NULL                  30788
mod_cook     NULL   NULL                  24278
popular_comp NULL   NULL                  12875
psychology   NULL   NULL                  9939
trad_cook    NULL   NULL                  19566

If you look closely at the results in Listing 2 , you see there are two rows with NULL values for all three columns for type, pub_id, and price. How can you determine definitively which row is the super-aggregate of all three rows, and which is a row grouped by price where the value of price is NULL? This is where the new grouping_id() function comes in.

The grouping_id() Function

The grouping_id() function, new in SQL Server 2008, can be used to determine the level of grouping in a query using GROUPING SETS or the CUBE and ROLLUP operators. Unlike the GROUPING() function, which takes only a single column expression as an argument and returns a 1 or 0 to indicate whether that individual column is being aggregated, the grouping_id() function accepts multiple column expressions and returns a bitmap to indicate which columns are being aggregated for that row.

For example, you can add the grouping_id() and grouping() functions to the query in Listing 2 and examine the results (see Listing 3).

Listing 3. Using the grouping_id() Function

SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales,
grouping_id(type, pub_id, price) as grping_id,
grouping(type) type_rlp,
grouping(pub_id) pub_id_rlp,
grouping(price) price_rlp
FROM titles
where pub_id < '9'
GROUP BY GROUPING SETS ( type, pub_id, price, () )
go

type         pub_id price   ytd_sales grping_id type_rlp pub_id_rlp price_rlp
------------ ------ ------- --------- --------- -------- ---------- ---------
NULL         NULL   NULL    0         6         1        1          0
NULL         NULL   0.0006  111       6         1        1          0
NULL         NULL   0.0017  750       6         1        1          0
NULL         NULL   14.3279 4095      6         1        1          0
NULL         NULL   14.595  18972     6         1        1          0
NULL         NULL   14.9532 14294     6         1        1          0
NULL         NULL   14.9611 4095      6         1        1          0
NULL         NULL   15.894  40968     6         1        1          0
NULL         NULL   15.9329 3336      6         1        1          0
NULL         NULL   17.0884 2045      6         1        1          0
NULL         NULL   17.1675 8780      6         1        1          0
NULL         NULL   NULL    97446     7         1        1          1
NULL         0736   NULL    28286     5         1        0          1
NULL         0877   NULL    44219     5         1        0          1
NULL         1389   NULL    24941     5         1        0          1
business     NULL   NULL    30788     3         0        1          1
mod_cook     NULL   NULL    24278     3         0        1          1
popular_comp NULL   NULL    12875     3         0        1          1
psychology   NULL   NULL    9939      3         0        1          1
trad_cook    NULL   NULL    19566     3         0        1          1

Unlike the grouping() function, which takes only a single column name as an argument, the grouping_id() function accepts all columns that participate in any grouping set. The grouping_id() function produces an integer result that is a bitmap, where each bit represents a different column, producing a unique integer for each grouping set. The bits in the bitmap indicate whether the columns are being aggregated in the grouping set (bit value is 1) or if the column is used to determine the grouping set (bit value is 0) used to calculate the aggregate value.

The bit values are assigned to columns from right to left in the order the columns are listed in the grouping_id() function. For example, in the query in Listing 42.5, price is the rightmost bit value, bit 1; pub_id is assigned the next bit value, bit 2, and type is assigned the leftmost bit value, bit 3. When the grouping_id() value equals 6, that means the bits 2 and 3 are turned on (4 + 2 + 0 = 6). This indicates that the type and pub_id columns are being aggregated in the grouping set, and the price column defines the grouping set.

The grouping_id() column can thus be used to determine which of the two rows where type, pub_id, and price are all NULL is the row with the super-aggregate of all three columns (grouping_id = 7), and which row is an aggregate rolled up where the value of price is NULL (grouping_id = 6).

The values returned by the grouping_id() function can also be used for further filtering your grouping set results or for sorting your grouping set results, as shown in Listing 4.

Listing 4. Using the grouping_id() Function to Sort Results

SELECT type, pub_id, price, sum(isnull(ytd_sales, 0)) AS ytd_sales,
grouping_id(type, pub_id, price) as grping_id
FROM titles
where pub_id < '9'
GROUP BY GROUPING SETS ( type, pub_id, price, () )
order by grping_id
go

type         pub_id price    ytd_sales   grping_id
------------ ------ -------- ----------- -----------
business     NULL   NULL     30788       3
mod_cook     NULL   NULL     24278       3
popular_comp NULL   NULL     12875       3
psychology   NULL   NULL     9939        3
trad_cook    NULL   NULL     19566       3
NULL         0736   NULL     28286       5
NULL         0877   NULL     44219       5
NULL         1389   NULL     24941       5
NULL         NULL   NULL     0           6
NULL         NULL   0.0006   111         6
NULL         NULL   0.0017   750         6
NULL         NULL   14.3279  4095        6
NULL         NULL   14.595   18972       6
NULL         NULL   14.9532  14294       6
NULL         NULL   14.9611  4095        6
NULL         NULL   15.894   40968       6
NULL         NULL   15.9329  3336        6
NULL         NULL   17.0884  2045        6
NULL         NULL   17.1675  8780        6
NULL         NULL   NULL     97446       7