SQL Server 2008 : Transact-SQL Programming - Ranking Functions

5/30/2011 5:26:32 PM

SQL Server 2005 introduced four new ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. These functions allow you to analyze data and provide ranking values to result rows of a query. For example, you might use these ranking functions for assigning sequential integer row IDs to result rows or for presentation, paging, or scoring purposes.

All four ranking functions follow a similar syntax pattern:

function_name() OVER(
  [PARTITION BY partition_by_list]
  ORDER BY order_by_list)

The ROW_NUMBER Function

The ROW_NUMBER function allows you to provide sequential integer values to the result rows of a query, based on the order of the rows in the result. The result set must be ordered using an OVER clause, with an ORDER BY clause as a variable.

The ROW_NUMBER function has been a feature long desired by SQL Server developers. For example, suppose you want to return the publishers and total number of titles per publisher and list the result rows, in descending order, with a numeric score assigned to each row. The query shown in Listing 1 generates the desired results by using the ROW_NUMBER function, specifying ordering over the num_titles column, in descending order.

Listing 1. Using ROW_NUMBER to Rank Publishers by Number of Titles

select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles,
ROW_NUMBER () OVER (order by count(*) DESC) as Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
order by count(*) desc
go

pub_id pub_name                                 num_titles  Rank
------ ---------------------------------------- ----------- --------------------
9911   Jones Jones and Johnson                  44          1
9904   Strawberry Publications                  34          2
9907   Incandescent Imprints                    33          3
9905   Gooseberry Titles                        32          4
9909   North American Press                     30          5
9912   Landlocked Books                         30          6
9913   Blackberry's                             28          7
9914   Normanskill Printing Company             28          8
9910   Sidney's Books and More                  28          9
9906   Tomato Books                             28          10
9903   Kumquat Technical Publishing             28          11

In this example, the publishers with the highest number of titles got row number 1, and the publisher with the tenth-highest number of titles got row number 10. The ROW_NUMBER function always generates a distinct row number for each row, according to the requested sort.

If the ORDER BY list specified within the OVER() option is not on a unique key, the ordering of the row numbers is nondeterministic. For publishers that may have the same number of titles, each row would be assigned a different unique row number. The sequence of the row numbers assigned to those publishers could be different in different invocations of the query. In the results for Listing 1 , for example, five different publishers have the same number of titles (28). Because SQL Server has to assign different row numbers to the different publishers, you should assume that the row numbers were assigned in arbitrary order among those publishers.

To ensure that the result is always deterministic, specify a unique ORDER BY list. For example, adding pub_id to the ORDER BY list ensures that in the case of a tie between publishers, the lowest pub_id is always assigned the lower row number, as shown in Listing 2.

Listing 2. Using a Unique ORDER BY List for Deterministic ROW_NUMBER Results

select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles,
ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
order by count(*) desc
go
pub_id pub_name                                 num_titles  Rank
------ ---------------------------------------- ----------- --------------------
9911   Jones Jones and Johnson                  44          1
9904   Strawberry Publications                  34          2
9907   Incandescent Imprints                    33          3
9905   Gooseberry Titles                        32          4
9909   North American Press                     30          5
9912   Landlocked Books                         30          6
9903   Kumquat Technical Publishing             28          7
9906   Tomato Books                             28          8
9910   Sidney's Books and More                  28          9
9913   Blackberry's                             28          10
9914   Normanskill Printing Company             28          11

In the previous two examples, the sequence of row numbers is generated across the entire result set as one group. You can also have ranking values calculated independently within groups of rows as opposed to being calculated for all table rows as one group by using the PARTITION BY clause.

Partitioning by ROW_NUMBER()

PARTITION BY allows you to specify a list of expressions that identify the groups of rows for which ranking values should be calculated independently. For example, the query in Listing 3 assigns row numbers within each type of book separately, in num_titles and pub_id order.

Listing 3. Using PARTITION BY to Rank Rows Within Groups

Code View: Scroll / Show All

select top 20 WITH TIES p.pub_id, pub_name, type, count(*) as num_titles,
ROW_NUMBER () OVER (partition by type order by count(*) DESC, p.pub_id) as Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name, type
order by type, count(*) desc
go

pub_id pub_name                                 type         num_titles  Rank
------ ---------------------------------------- ------------ ----------- --------
9906   Tomato Books                             biography    4           1
9911   Jones Jones and Johnson                  biography    4           2
9905   Gooseberry Titles                        biography    2           3
9900   Boysenberry Books                        biography    1           4
9903   Kumquat Technical Publishing             biography    1           5
9904   Strawberry Publications                  biography    1           6
9909   North American Press                     biography    1           7
9913   Blackberry's                             biography    1           8
9914   Normanskill Printing Company             biography    1           9
9916   Nordome Titles                           biography    1           10
9918   Significant Titles Company               biography    1           11
1389   Algodata Infosystems                     business     3           1
0736   New Moon Books                           business     1           2
9911   Jones Jones and Johnson                  children     21          1
9914   Normanskill Printing Company             children     13          2
9905   Gooseberry Titles                        children     12          3
9901   GGG&G                                    children     11          4
9903   Kumquat Technical Publishing             children     11          5
9915   Beanplant General                        children     9           6
9900   Boysenberry Books                        children     8           7
9913   Blackberry's                             children     8           8

The RANK and DENSE_RANK Functions

The RANK and DENSE_RANK functions are similar to the ROW_NUMBER function in the sense that they also provide ranking values according to a specified sort. The difference is that rather than assign a unique ranking value to each row, RANK and DENSE_RANK assign the same ranking value to rows with the same values in the specified sort columns when the ORDER BY list is not unique.

The difference between RANK and DENSE_RANK is that with the DENSE_RANK function, there are no gaps in the ranking. The RANK function skips the next number if there is a tie in the ranking value. Listing 4 modifies the query shown in Listing 1 by replacing the ROW_NUMBER function with RANK and DENSE_RANK and provides a good example of the differences between the two.

Listing 4. Using RANK and DENSE_RANK

select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles,
RANK() OVER (order by count(*) DESC) as Rank,
DENSE_RANK() OVER (order by count(*) DESC) as Dense_Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
order by count(*) desc
go

pub_id pub_name                       num_titles  Rank  Dense_Rank
------ ------------------------------ ----------- ----- ----------
9911   Jones Jones and Johnson        44          1     1
9904   Strawberry Publications        34          2     2
9907   Incandescent Imprints          33          3     3
9905   Gooseberry Titles              32          4     4
9909   North American Press           30          5     5
9912   Landlocked Books               30          5     5
9913   Blackberry's                   28          7     6
9914   Normanskill Printing Company   28          7     6
9910   Sidney's Books and More        28          7     6
9906   Tomato Books                   28          7     6
9903   Kumquat Technical Publishing   28          7     6

Notice that in this result set, all publishers with the same number of titles get the same RANK and DENSE_RANK values.

Note

If the ORDER BY list for a ranking function is unique, ROW_NUMBER, RANK, and DENSE_RANK produce exactly the same values.

The NTILE Function

The NTILE function assigns a ranking value by separating the result rows of a query into a specified number of approximately even-sized groups. Each group of rows is assigned the same ranking number, starting with 1 for the first group, 2 for the second, and so on. You specify the number of groups you want the result set divided into as the argument to the NTILE function. The number of rows in a group is determined by dividing the total number of rows in the result set by the number of groups. If there’s a remainder, n, the first n groups have an additional row assigned to them. Listing 5 provides an example of using the NTILE function, so you can compare it to the ROW_NUMBER function.

Listing 5. Using the NTILE Function

select p.pub_id, pub_name, count(*) as num_titles,
NTILE(3) OVER (order by count(*) DESC) as NTILE,
ROW_NUMBER() OVER (order by count(*) DESC) as RowNum
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
order by count(*) desc
go

pub_id pub_name                                 num_titles  NTILE RowNum
------ ---------------------------------------- ----------- ----- ------
9911   Jones Jones and Johnson                  44          1     1
9904   Strawberry Publications                  34          1     2
9907   Incandescent Imprints                    33          1     3
9905   Gooseberry Titles                        32          1     4
9909   North American Press                     30          1     5
9912   Landlocked Books                         30          2     6
9913   Blackberry's                             28          2     7
9914   Normanskill Printing Company             28          2     8
9910   Sidney's Books and More                  28          2     9
9906   Tomato Books                             28          2     10
9903   Kumquat Technical Publishing             28          3     11
9902   Lemon Legal Publishing                   27          3     12
9901   GGG&G                                    25          3     13
9908   Springfield Publishing                   25          3     14
9900   Boysenberry Books                        23          4     15
9916   Nordome Titles                           22          4     16
9915   Beanplant General                        21          4     17
9917   BFG Books                                17          4     18
9918   Significant Titles Company               17          5     19
0877   Binnet & Hardley                         6           5     20
1389   Algodata Infosystems                     6           5     21
0736   New Moon Books                           5           5     22

In this example, NTILE is used to divide the result set into five groups. Because there are 22 rows in the publishers table, there are 4 rows in each group, with 2 left over. The 2 extra rows are added to the first two groups.

The NTILE function provides a way to generate a histogram with an even distribution of items for each step. In the previous example, the first step represents the publishers with the highest number of titles, and the last step represents the publishers with the lowest number of titles. You can use this information in a CASE expression to provide descriptive meaningful alternatives to the ranking numbers, as shown in Listing 6.

Listing 6. Using a CASE Expression to Provide Meaningful Labels to Ranking Values

select p.pub_id, pub_name, count(*) as num_titles,
    case NTILE(5) OVER (order by count(*) DESC)
        when 1 then 'Highest'
        when 2 then 'Above Average'
        when 3 then 'Average'
        when 4 then 'Below Average'
        when 5 then 'Lowest'
        end as Ranking
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
order by pub_id
go

pub_id pub_name                                 num_titles  Ranking
------ ---------------------------------------- ----------- -------------
0736   New Moon Books                           5           Lowest
0877   Binnet & Hardley                         6           Lowest
1389   Algodata Infosystems                     6           Lowest
9900   Boysenberry Books                        23          Below Average
9901   GGG&G                                    25          Average
9902   Lemon Legal Publishing                   27          Average
9903   Kumquat Technical Publishing             28          Average
9904   Strawberry Publications                  34          Highest
9905   Gooseberry Titles                        32          Highest
9906   Tomato Books                             28          Above Average
9907   Incandescent Imprints                    33          Highest
9908   Springfield Publishing                   25          Average
9909   North American Press                     30          Highest
9910   Sidney's Books and More                  28          Above Average
9911   Jones Jones and Johnson                  44          Highest
9912   Landlocked Books                         30          Above Average
9913   Blackberry's                             28          Above Average
9914   Normanskill Printing Company             28          Above Average
9915   Beanplant General                        21          Below Average
9916   Nordome Titles                           22          Below Average
9917   BFG Books                                17          Below Average
9918   Significant Titles Company               17          Lowest

Using Row Numbers for Paging Results

Typical uses for row numbers are for paging through the results of a query and for selecting a specific subset of rows from within the result set. Essentially, given a page size in terms of number of rows, and a page number, you can return the rows that belong to that given page.

For example, suppose you want to return the second page of rows from a query similar to the one shown in Listing 2 . Assuming a page size of five rows, the query shown in Listing 7 uses a CTE to first calculate the row numbers according to the ranking by number of titles, and then only those rows with numbers 6 through 10, which belong to the second page, are returned.

Listing 7. Using ROW_NUMBER to Page Through Results

with pub_titles as
(
select p.pub_id, pub_name, count(*) as num_titles,
ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
)
select * from pub_titles
where Rank between 6 and 10
go

pub_id pub_name                                 num_titles  Rank
------ ---------------------------------------- ----------- --------------------
9912   Landlocked Books                         30          6
9903   Kumquat Technical Publishing             28          7
9906   Tomato Books                             28          8
9910   Sidney's Books and More                  28          9
9913   Blackberry's                             28          10

You could make this query more generic by using local variables for the page number and page size and using them to calculate the proper set of rows to return, as shown in Listing 8.

Listing 8. Using Local Variables for Determining Page Size When Paging Through Results

Code View: Scroll / Show All

declare @pagesize tinyint,
        @pagenum tinyint;
set @pagesize = 6;
set @pagenum = 2;

with pub_titles as
(
select p.pub_id, pub_name, count(*) as num_titles,
ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
)
select * from pub_titles
where Rank between ((@pagenum - 1) * @pagesize) + 1
               and @pagenum * @pagesize
go

pub_id pub_name                                 num_titles  Rank
------ ---------------------------------------- ----------- --------------------
9903   Kumquat Technical Publishing             28          7
9906   Tomato Books                             28          8
9910   Sidney's Books and More                  28          9
9913   Blackberry's                             28          10
9914   Normanskill Printing Company             28          11
9902   Lemon Legal Publishing                   27          12

The example in Listing 8 is adequate for ad hoc requests when you’re interested in retrieving only one specific page of the result set. However, this approach is not adequate for most applications that would issue multiple requests for individual pages of data because each invocation of the query would require a complete scan of the table to calculate the row numbers. A more efficient method for when the user might repeatedly request different pages would be to first populate a temporary table with all the base table rows, including the calculated row numbers, and then create a clustered index on the column in the temp table that contains the row numbers. An example is presented in Listing 9.

Listing 9. Using a Temp Table for Paging Through Results

select p.pub_id, pub_name, count(*) as num_titles,
       ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank
   into #paging_table
   from publishers p join titles t on p.pub_id = t.pub_id
   group by p.pub_id, p.pub_name
go
create unique clustered index idx1 on #paging_table(Rank)
go
declare @pagesize tinyint,
        @pagenum tinyint;
set @pagesize = 6;
set @pagenum = 2;
SELECT *
FROM #paging_table
WHERE Rank BETWEEN (@pagenum-1)*@pagesize+1 AND @pagenum*@pagesize
ORDER BY Rank
set @pagesize = 6;
set @pagenum = 4;
SELECT *
FROM #paging_table
WHERE Rank BETWEEN (@pagenum-1)*@pagesize+1 AND @pagenum*@pagesize
ORDER BY Rank
go

pub_id pub_name                                 num_titles  Rank
------ ---------------------------------------- ----------- --------------------
9903   Kumquat Technical Publishing             28          7
9906   Tomato Books                             28          8
9910   Sidney's Books and More                  28          9
9913   Blackberry's                             28          10
9914   Normanskill Printing Company             28          11
9902   Lemon Legal Publishing                   27          12

pub_id pub_name                                 num_titles  Rank
------ ---------------------------------------- ----------- --------------------
9918   Significant Titles Company               17          19
0877   Binnet & Hardley                         6           20
1389   Algodata Infosystems                     6           21
0736   New Moon Books                           5           22

If you are limiting the result set to a specific number of rows and are using a fixed page size, an alternative to using ROW_NUMBER would be to use the NTILE function to calculate the actual page numbers. For example, if you are using TOP to limit the result set to the first 500 rows and each page contains 10 rows, the total number of pages would be 500 / 10, or 50 pages. If you use 50 as the argument to the NTILE function, the query generates 50 distinct ranking values with 10 rows each. An example of this solution is presented in Listing 10.

Listing 10. Using NTILE to Generate Page Numbers

select TOP 500 t.title_id,
       left(title, 20) as title,
       sum(qty) as total_sales,
       NTILE(50) OVER(ORDER BY sum(qty) desc) AS pagenum
   into #title_list
   from titles t join sales s on t.title_id = s.title_id
   group by t.title_id, title
go
create clustered index page_index on #title_list(pagenum)
go
select * from #title_list
where pagenum = 11
go

title_id title                total_sales pagenum
-------- -------------------- ----------- --------------------
FI1704   Journey              295872      11
FI2784   Rhoda: A Life in Sto 295836      11
FI4524   The Unconsoled       295584      11
FI4554   The Spy Who Came in  295500      11
FI0897   Polar Star           295308      11
CH0126   Little Bear          295296      11
FI5040   The Tombs of Atuan   295284      11
PS1372   Computer Phobic AND  295172      11
FI7820   Tinker, Tailor, Sold 295092      11
FI2816   Zuckerman Unbound    294960      11
CH0623   The Black Cauldron   294960      11

Other

SQL Server 2008 : Transact-SQL Programming - Common Table Expressions

SQL Server 2008 : Transact-SQL Programming - The OUTPUT Clause

SQL Server 2008 : Transact-SQL Programming - TOP Enhancements

SQL Server 2008 : Transact-SQL Programming - The max Specifier

SQL Server 2008 : T-SQL Tips and Tricks (part 3) - Generating T-SQL Statements with T-SQL & De-Duping Data with Ranking Functions

SQL Server 2008 : T-SQL Tips and Tricks (part 2) - Using CONTEXT_INFO & Working with Outer Joins

SQL Server 2008 : T-SQL Tips and Tricks (part 1) - Date Calculations & Sorting Results with the GROUPING Function

SQL Server 2008 : General T-SQL Performance Recommendations

SQL Server 2008 : General T-SQL Coding Recommendations (part 2) - Avoid SQL Injection Attacks When Using Dynamic SQL & Comment Your T-SQL Code

SQL Server 2008 : General T-SQL Coding Recommendations (part 1) - Provide Explicit Column Lists & Qualify Object Names with a Schema Name