SQL Server 2005 introduced four new ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE.
These functions allow you to analyze data and provide ranking values to
result rows of a query. For example, you might use these ranking
functions for assigning sequential integer row IDs to result rows or for
presentation, paging, or scoring purposes.
All four ranking functions follow a similar syntax pattern:
function_name() OVER(
[PARTITION BY partition_by_list]
ORDER BY order_by_list)
The ROW_NUMBER Function
The ROW_NUMBER
function allows you to provide sequential integer values to the result
rows of a query, based on the order of the rows in the result. The
result set must be ordered using an OVER clause, with an ORDER BY clause as a variable.
The ROW_NUMBER
function has been a feature long desired by SQL Server developers. For
example, suppose you want to return the publishers and total number of
titles per publisher and list the result rows, in descending order, with
a numeric score assigned to each row. The query shown in Listing 1 generates the desired results by using the ROW_NUMBER function, specifying ordering over the num_titles column, in descending order.
Listing 1. Using ROW_NUMBER to Rank Publishers by Number of Titles
select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles, ROW_NUMBER () OVER (order by count(*) DESC) as Rank from publishers p join titles t on p.pub_id = t.pub_id group by p.pub_id, p.pub_name order by count(*) desc go
pub_id pub_name num_titles Rank ------ ---------------------------------------- ----------- -------------------- 9911 Jones Jones and Johnson 44 1 9904 Strawberry Publications 34 2 9907 Incandescent Imprints 33 3 9905 Gooseberry Titles 32 4 9909 North American Press 30 5 9912 Landlocked Books 30 6 9913 Blackberry's 28 7 9914 Normanskill Printing Company 28 8 9910 Sidney's Books and More 28 9 9906 Tomato Books 28 10 9903 Kumquat Technical Publishing 28 11
|
In this example, the
publishers with the highest number of titles got row number 1, and the
publisher with the tenth-highest number of titles got row number 10. The
ROW_NUMBER function always generates a distinct row number for each row, according to the requested sort.
If the ORDER BY list specified within the OVER()
option is not on a unique key, the ordering of the row numbers is
nondeterministic. For publishers that may have the same number of
titles, each row would be assigned a different unique row number. The
sequence of the row numbers assigned to those publishers could be
different in different invocations of the query. In the results for Listing 1,
for example, five different publishers have the same number of titles
(28). Because SQL Server has to assign different row numbers to the
different publishers, you should assume that the row numbers were
assigned in arbitrary order among those publishers.
To ensure that the result is always deterministic, specify a unique ORDER BY list. For example, adding pub_id to the ORDER BY list ensures that in the case of a tie between publishers, the lowest pub_id is always assigned the lower row number, as shown in Listing 2.
Listing 2. Using a Unique ORDER BY List for Deterministic ROW_NUMBER Results
select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles, ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank from publishers p join titles t on p.pub_id = t.pub_id group by p.pub_id, p.pub_name order by count(*) desc go pub_id pub_name num_titles Rank ------ ---------------------------------------- ----------- -------------------- 9911 Jones Jones and Johnson 44 1 9904 Strawberry Publications 34 2 9907 Incandescent Imprints 33 3 9905 Gooseberry Titles 32 4 9909 North American Press 30 5 9912 Landlocked Books 30 6 9903 Kumquat Technical Publishing 28 7 9906 Tomato Books 28 8 9910 Sidney's Books and More 28 9 9913 Blackberry's 28 10 9914 Normanskill Printing Company 28 11
|
In
the previous two examples, the sequence of row numbers is generated
across the entire result set as one group. You can also have ranking
values calculated independently within groups of rows as opposed to
being calculated for all table rows as one group by using the PARTITION BY clause.
Partitioning by ROW_NUMBER()
PARTITION BY allows you
to specify a list of expressions that identify the groups of rows for
which ranking values should be calculated independently. For example,
the query in Listing 3 assigns row numbers within each type of book separately, in num_titles and pub_id order.
Listing 3. Using PARTITION BY to Rank Rows Within Groups
select top 20 WITH TIES p.pub_id, pub_name, type, count(*) as num_titles, ROW_NUMBER () OVER (partition by type order by count(*) DESC, p.pub_id) as Rank from publishers p join titles t on p.pub_id = t.pub_id group by p.pub_id, p.pub_name, type order by type, count(*) desc go
pub_id pub_name type num_titles Rank ------ ---------------------------------------- ------------ ----------- -------- 9906 Tomato Books biography 4 1 9911 Jones Jones and Johnson biography 4 2 9905 Gooseberry Titles biography 2 3 9900 Boysenberry Books biography 1 4 9903 Kumquat Technical Publishing biography 1 5 9904 Strawberry Publications biography 1 6 9909 North American Press biography 1 7 9913 Blackberry's biography 1 8 9914 Normanskill Printing Company biography 1 9 9916 Nordome Titles biography 1 10 9918 Significant Titles Company biography 1 11 1389 Algodata Infosystems business 3 1 0736 New Moon Books business 1 2 9911 Jones Jones and Johnson children 21 1 9914 Normanskill Printing Company children 13 2 9905 Gooseberry Titles children 12 3 9901 GGG&G children 11 4 9903 Kumquat Technical Publishing children 11 5 9915 Beanplant General children 9 6 9900 Boysenberry Books children 8 7 9913 Blackberry's children 8 8
|
The RANK and DENSE_RANK Functions
The RANK and DENSE_RANK functions are similar to the ROW_NUMBER
function in the sense that they also provide ranking values according
to a specified sort. The difference is that rather than assign a unique
ranking value to each row, RANK and DENSE_RANK assign the same ranking value to rows with the same values in the specified sort columns when the ORDER BY list is not unique.
The difference between RANK and DENSE_RANK is that with the DENSE_RANK function, there are no gaps in the ranking. The RANK function skips the next number if there is a tie in the ranking value. Listing 4 modifies the query shown in Listing 1 by replacing the ROW_NUMBER function with RANK and DENSE_RANK and provides a good example of the differences between the two.
Listing 4. Using RANK and DENSE_RANK
select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles, RANK() OVER (order by count(*) DESC) as Rank, DENSE_RANK() OVER (order by count(*) DESC) as Dense_Rank from publishers p join titles t on p.pub_id = t.pub_id group by p.pub_id, p.pub_name order by count(*) desc go
pub_id pub_name num_titles Rank Dense_Rank ------ ------------------------------ ----------- ----- ---------- 9911 Jones Jones and Johnson 44 1 1 9904 Strawberry Publications 34 2 2 9907 Incandescent Imprints 33 3 3 9905 Gooseberry Titles 32 4 4 9909 North American Press 30 5 5 9912 Landlocked Books 30 5 5 9913 Blackberry's 28 7 6 9914 Normanskill Printing Company 28 7 6 9910 Sidney's Books and More 28 7 6 9906 Tomato Books 28 7 6 9903 Kumquat Technical Publishing 28 7 6
|
Notice that in this result set, all publishers with the same number of titles get the same RANK and DENSE_RANK values.
Note
If the ORDER BY list for a ranking function is unique, ROW_NUMBER, RANK, and DENSE_RANK produce exactly the same values.
The NTILE Function
The NTILE function
assigns a ranking value by separating the result rows of a query into a
specified number of approximately even-sized groups. Each group of rows
is assigned the same ranking number, starting with 1 for the first group, 2 for the second, and so on. You specify the number of groups you want the result set divided into as the argument to the NTILE
function. The number of rows in a group is determined by dividing the
total number of rows in the result set by the number of groups. If
there’s a remainder, n, the first n groups have an additional row assigned to them. Listing 5 provides an example of using the NTILE function, so you can compare it to the ROW_NUMBER function.
Listing 5. Using the NTILE Function
select p.pub_id, pub_name, count(*) as num_titles, NTILE(3) OVER (order by count(*) DESC) as NTILE, ROW_NUMBER() OVER (order by count(*) DESC) as RowNum from publishers p join titles t on p.pub_id = t.pub_id group by p.pub_id, p.pub_name order by count(*) desc go
pub_id pub_name num_titles NTILE RowNum ------ ---------------------------------------- ----------- ----- ------ 9911 Jones Jones and Johnson 44 1 1 9904 Strawberry Publications 34 1 2 9907 Incandescent Imprints 33 1 3 9905 Gooseberry Titles 32 1 4 9909 North American Press 30 1 5 9912 Landlocked Books 30 2 6 9913 Blackberry's 28 2 7 9914 Normanskill Printing Company 28 2 8 9910 Sidney's Books and More 28 2 9 9906 Tomato Books 28 2 10 9903 Kumquat Technical Publishing 28 3 11 9902 Lemon Legal Publishing 27 3 12 9901 GGG&G 25 3 13 9908 Springfield Publishing 25 3 14 9900 Boysenberry Books 23 4 15 9916 Nordome Titles 22 4 16 9915 Beanplant General 21 4 17 9917 BFG Books 17 4 18 9918 Significant Titles Company 17 5 19 0877 Binnet & Hardley 6 5 20 1389 Algodata Infosystems 6 5 21 0736 New Moon Books 5 5 22
|
In this example, NTILE is used to divide the result set into five groups. Because there are 22 rows in the publishers table, there are 4 rows in each group, with 2 left over. The 2 extra rows are added to the first two groups.
The NTILE function provides a way to
generate a histogram with an even distribution of items for each step.
In the previous example, the first step represents the publishers with
the highest number of titles, and the last step represents the
publishers with the lowest number of titles. You can use this
information in a CASE expression to provide descriptive meaningful alternatives to the ranking numbers, as shown in Listing 6.
Listing 6. Using a CASE Expression to Provide Meaningful Labels to Ranking Values
select p.pub_id, pub_name, count(*) as num_titles, case NTILE(5) OVER (order by count(*) DESC) when 1 then 'Highest' when 2 then 'Above Average' when 3 then 'Average' when 4 then 'Below Average' when 5 then 'Lowest' end as Ranking from publishers p join titles t on p.pub_id = t.pub_id group by p.pub_id, p.pub_name order by pub_id go
pub_id pub_name num_titles Ranking ------ ---------------------------------------- ----------- ------------- 0736 New Moon Books 5 Lowest 0877 Binnet & Hardley 6 Lowest 1389 Algodata Infosystems 6 Lowest 9900 Boysenberry Books 23 Below Average 9901 GGG&G 25 Average 9902 Lemon Legal Publishing 27 Average 9903 Kumquat Technical Publishing 28 Average 9904 Strawberry Publications 34 Highest 9905 Gooseberry Titles 32 Highest 9906 Tomato Books 28 Above Average 9907 Incandescent Imprints 33 Highest 9908 Springfield Publishing 25 Average 9909 North American Press 30 Highest 9910 Sidney's Books and More 28 Above Average 9911 Jones Jones and Johnson 44 Highest 9912 Landlocked Books 30 Above Average 9913 Blackberry's 28 Above Average 9914 Normanskill Printing Company 28 Above Average 9915 Beanplant General 21 Below Average 9916 Nordome Titles 22 Below Average 9917 BFG Books 17 Below Average 9918 Significant Titles Company 17 Lowest
|
Using Row Numbers for Paging Results
Typical
uses for row numbers are for paging through the results of a query and
for selecting a specific subset of rows from within the result set.
Essentially, given a page size in terms of number of rows, and a page
number, you can return the rows that belong to that given page.
For example, suppose you want to return the second page of rows from a query similar to the one shown in Listing 2. Assuming a page size of five rows, the query shown in Listing 7
uses a CTE to first calculate the row numbers according to the ranking
by number of titles, and then only those rows with numbers 6 through 10,
which belong to the second page, are returned.
Listing 7. Using ROW_NUMBER to Page Through Results
with pub_titles as ( select p.pub_id, pub_name, count(*) as num_titles, ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank from publishers p join titles t on p.pub_id = t.pub_id group by p.pub_id, p.pub_name ) select * from pub_titles where Rank between 6 and 10 go
pub_id pub_name num_titles Rank ------ ---------------------------------------- ----------- -------------------- 9912 Landlocked Books 30 6 9903 Kumquat Technical Publishing 28 7 9906 Tomato Books 28 8 9910 Sidney's Books and More 28 9 9913 Blackberry's 28 10
|
You could make this query
more generic by using local variables for the page number and page size
and using them to calculate the proper set of rows to return, as shown
in Listing 8.
Listing 8. Using Local Variables for Determining Page Size When Paging Through Results
declare @pagesize tinyint, @pagenum tinyint; set @pagesize = 6; set @pagenum = 2;
with pub_titles as ( select p.pub_id, pub_name, count(*) as num_titles, ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank from publishers p join titles t on p.pub_id = t.pub_id group by p.pub_id, p.pub_name ) select * from pub_titles where Rank between ((@pagenum - 1) * @pagesize) + 1 and @pagenum * @pagesize go
pub_id pub_name num_titles Rank ------ ---------------------------------------- ----------- -------------------- 9903 Kumquat Technical Publishing 28 7 9906 Tomato Books 28 8 9910 Sidney's Books and More 28 9 9913 Blackberry's 28 10 9914 Normanskill Printing Company 28 11 9902 Lemon Legal Publishing 27 12
|
The example in Listing 8
is adequate for ad hoc requests when you’re interested in retrieving
only one specific page of the result set. However, this approach is not
adequate for most applications that would issue multiple requests for
individual pages of data because each invocation of the query would
require a complete scan of the table to calculate the row numbers. A
more efficient method for when the user might repeatedly request
different pages would be to first populate a temporary table with all
the base table rows, including the calculated row numbers, and then
create a clustered index on the column in the temp table that contains
the row numbers. An example is presented in Listing 9.
Listing 9. Using a Temp Table for Paging Through Results
select p.pub_id, pub_name, count(*) as num_titles, ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank into #paging_table from publishers p join titles t on p.pub_id = t.pub_id group by p.pub_id, p.pub_name go create unique clustered index idx1 on #paging_table(Rank) go declare @pagesize tinyint, @pagenum tinyint; set @pagesize = 6; set @pagenum = 2; SELECT * FROM #paging_table WHERE Rank BETWEEN (@pagenum-1)*@pagesize+1 AND @pagenum*@pagesize ORDER BY Rank set @pagesize = 6; set @pagenum = 4; SELECT * FROM #paging_table WHERE Rank BETWEEN (@pagenum-1)*@pagesize+1 AND @pagenum*@pagesize ORDER BY Rank go
pub_id pub_name num_titles Rank ------ ---------------------------------------- ----------- -------------------- 9903 Kumquat Technical Publishing 28 7 9906 Tomato Books 28 8 9910 Sidney's Books and More 28 9 9913 Blackberry's 28 10 9914 Normanskill Printing Company 28 11 9902 Lemon Legal Publishing 27 12
pub_id pub_name num_titles Rank ------ ---------------------------------------- ----------- -------------------- 9918 Significant Titles Company 17 19 0877 Binnet & Hardley 6 20 1389 Algodata Infosystems 6 21 0736 New Moon Books 5 22
|
If you are limiting the result set to a specific number of rows and are using a fixed page size, an alternative to using ROW_NUMBER would be to use the NTILE function to calculate the actual page numbers. For example, if you are using TOP
to limit the result set to the first 500 rows and each page contains 10
rows, the total number of pages would be 500 / 10, or 50 pages. If you
use 50 as the argument to the NTILE function, the query generates 50 distinct ranking values with 10 rows each. An example of this solution is presented in Listing 10.
Listing 10. Using NTILE to Generate Page Numbers
select TOP 500 t.title_id,
left(title, 20) as title,
sum(qty) as total_sales,
NTILE(50) OVER(ORDER BY sum(qty) desc) AS pagenum
into #title_list
from titles t join sales s on t.title_id = s.title_id
group by t.title_id, title
go
create clustered index page_index on #title_list(pagenum)
go
select * from #title_list
where pagenum = 11
go
title_id title total_sales pagenum
-------- -------------------- ----------- --------------------
FI1704 Journey 295872 11
FI2784 Rhoda: A Life in Sto 295836 11
FI4524 The Unconsoled 295584 11
FI4554 The Spy Who Came in 295500 11
FI0897 Polar Star 295308 11
CH0126 Little Bear 295296 11
FI5040 The Tombs of Atuan 295284 11
PS1372 Computer Phobic AND 295172 11
FI7820 Tinker, Tailor, Sold 295092 11
FI2816 Zuckerman Unbound 294960 11
CH0623 The Black Cauldron 294960 11