programming4us
programming4us
DATABASE

SQL Server 2008 : Transact-SQL Programming - Ranking Functions

- How To Install Windows Server 2012 On VirtualBox
- How To Bypass Torrent Connection Blocking By Your ISP
- How To Install Actual Facebook App On Kindle Fire
5/30/2011 5:26:32 PM
SQL Server 2005 introduced four new ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. These functions allow you to analyze data and provide ranking values to result rows of a query. For example, you might use these ranking functions for assigning sequential integer row IDs to result rows or for presentation, paging, or scoring purposes.

All four ranking functions follow a similar syntax pattern:

function_name() OVER(
[PARTITION BY partition_by_list]
ORDER BY order_by_list)

The ROW_NUMBER Function

The ROW_NUMBER function allows you to provide sequential integer values to the result rows of a query, based on the order of the rows in the result. The result set must be ordered using an OVER clause, with an ORDER BY clause as a variable.

The ROW_NUMBER function has been a feature long desired by SQL Server developers. For example, suppose you want to return the publishers and total number of titles per publisher and list the result rows, in descending order, with a numeric score assigned to each row. The query shown in Listing 1 generates the desired results by using the ROW_NUMBER function, specifying ordering over the num_titles column, in descending order.

Listing 1. Using ROW_NUMBER to Rank Publishers by Number of Titles
select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles,
ROW_NUMBER () OVER (order by count(*) DESC) as Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
order by count(*) desc
go

pub_id pub_name num_titles Rank
------ ---------------------------------------- ----------- --------------------
9911 Jones Jones and Johnson 44 1
9904 Strawberry Publications 34 2
9907 Incandescent Imprints 33 3
9905 Gooseberry Titles 32 4
9909 North American Press 30 5
9912 Landlocked Books 30 6
9913 Blackberry's 28 7
9914 Normanskill Printing Company 28 8
9910 Sidney's Books and More 28 9
9906 Tomato Books 28 10
9903 Kumquat Technical Publishing 28 11



In this example, the publishers with the highest number of titles got row number 1, and the publisher with the tenth-highest number of titles got row number 10. The ROW_NUMBER function always generates a distinct row number for each row, according to the requested sort.

If the ORDER BY list specified within the OVER() option is not on a unique key, the ordering of the row numbers is nondeterministic. For publishers that may have the same number of titles, each row would be assigned a different unique row number. The sequence of the row numbers assigned to those publishers could be different in different invocations of the query. In the results for Listing 1, for example, five different publishers have the same number of titles (28). Because SQL Server has to assign different row numbers to the different publishers, you should assume that the row numbers were assigned in arbitrary order among those publishers.

To ensure that the result is always deterministic, specify a unique ORDER BY list. For example, adding pub_id to the ORDER BY list ensures that in the case of a tie between publishers, the lowest pub_id is always assigned the lower row number, as shown in Listing 2.

Listing 2. Using a Unique ORDER BY List for Deterministic ROW_NUMBER Results
select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles,
ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
order by count(*) desc
go
pub_id pub_name num_titles Rank
------ ---------------------------------------- ----------- --------------------
9911 Jones Jones and Johnson 44 1
9904 Strawberry Publications 34 2
9907 Incandescent Imprints 33 3
9905 Gooseberry Titles 32 4
9909 North American Press 30 5
9912 Landlocked Books 30 6
9903 Kumquat Technical Publishing 28 7
9906 Tomato Books 28 8
9910 Sidney's Books and More 28 9
9913 Blackberry's 28 10
9914 Normanskill Printing Company 28 11



In the previous two examples, the sequence of row numbers is generated across the entire result set as one group. You can also have ranking values calculated independently within groups of rows as opposed to being calculated for all table rows as one group by using the PARTITION BY clause.

Partitioning by ROW_NUMBER()

PARTITION BY allows you to specify a list of expressions that identify the groups of rows for which ranking values should be calculated independently. For example, the query in Listing 3 assigns row numbers within each type of book separately, in num_titles and pub_id order.

Listing 3. Using PARTITION BY to Rank Rows Within Groups
select top 20 WITH TIES p.pub_id, pub_name, type, count(*) as num_titles,
ROW_NUMBER () OVER (partition by type order by count(*) DESC, p.pub_id) as Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name, type
order by type, count(*) desc
go

pub_id pub_name type num_titles Rank
------ ---------------------------------------- ------------ ----------- --------
9906 Tomato Books biography 4 1
9911 Jones Jones and Johnson biography 4 2
9905 Gooseberry Titles biography 2 3
9900 Boysenberry Books biography 1 4
9903 Kumquat Technical Publishing biography 1 5
9904 Strawberry Publications biography 1 6
9909 North American Press biography 1 7
9913 Blackberry's biography 1 8
9914 Normanskill Printing Company biography 1 9
9916 Nordome Titles biography 1 10
9918 Significant Titles Company biography 1 11
1389 Algodata Infosystems business 3 1
0736 New Moon Books business 1 2
9911 Jones Jones and Johnson children 21 1
9914 Normanskill Printing Company children 13 2
9905 Gooseberry Titles children 12 3
9901 GGG&G children 11 4
9903 Kumquat Technical Publishing children 11 5
9915 Beanplant General children 9 6
9900 Boysenberry Books children 8 7
9913 Blackberry's children 8 8


The RANK and DENSE_RANK Functions

The RANK and DENSE_RANK functions are similar to the ROW_NUMBER function in the sense that they also provide ranking values according to a specified sort. The difference is that rather than assign a unique ranking value to each row, RANK and DENSE_RANK assign the same ranking value to rows with the same values in the specified sort columns when the ORDER BY list is not unique.

The difference between RANK and DENSE_RANK is that with the DENSE_RANK function, there are no gaps in the ranking. The RANK function skips the next number if there is a tie in the ranking value. Listing 4 modifies the query shown in Listing 1 by replacing the ROW_NUMBER function with RANK and DENSE_RANK and provides a good example of the differences between the two.

Listing 4. Using RANK and DENSE_RANK
select top 10 WITH TIES p.pub_id, pub_name, count(*) as num_titles,
RANK() OVER (order by count(*) DESC) as Rank,
DENSE_RANK() OVER (order by count(*) DESC) as Dense_Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
order by count(*) desc
go

pub_id pub_name num_titles Rank Dense_Rank
------ ------------------------------ ----------- ----- ----------
9911 Jones Jones and Johnson 44 1 1
9904 Strawberry Publications 34 2 2
9907 Incandescent Imprints 33 3 3
9905 Gooseberry Titles 32 4 4
9909 North American Press 30 5 5
9912 Landlocked Books 30 5 5
9913 Blackberry's 28 7 6
9914 Normanskill Printing Company 28 7 6
9910 Sidney's Books and More 28 7 6
9906 Tomato Books 28 7 6
9903 Kumquat Technical Publishing 28 7 6

Notice that in this result set, all publishers with the same number of titles get the same RANK and DENSE_RANK values.

Note

If the ORDER BY list for a ranking function is unique, ROW_NUMBER, RANK, and DENSE_RANK produce exactly the same values.


The NTILE Function

The NTILE function assigns a ranking value by separating the result rows of a query into a specified number of approximately even-sized groups. Each group of rows is assigned the same ranking number, starting with 1 for the first group, 2 for the second, and so on. You specify the number of groups you want the result set divided into as the argument to the NTILE function. The number of rows in a group is determined by dividing the total number of rows in the result set by the number of groups. If there’s a remainder, n, the first n groups have an additional row assigned to them. Listing 5 provides an example of using the NTILE function, so you can compare it to the ROW_NUMBER function.

Listing 5. Using the NTILE Function
select p.pub_id, pub_name, count(*) as num_titles,
NTILE(3) OVER (order by count(*) DESC) as NTILE,
ROW_NUMBER() OVER (order by count(*) DESC) as RowNum
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
order by count(*) desc
go

pub_id pub_name num_titles NTILE RowNum
------ ---------------------------------------- ----------- ----- ------
9911 Jones Jones and Johnson 44 1 1
9904 Strawberry Publications 34 1 2
9907 Incandescent Imprints 33 1 3
9905 Gooseberry Titles 32 1 4
9909 North American Press 30 1 5
9912 Landlocked Books 30 2 6
9913 Blackberry's 28 2 7
9914 Normanskill Printing Company 28 2 8
9910 Sidney's Books and More 28 2 9
9906 Tomato Books 28 2 10
9903 Kumquat Technical Publishing 28 3 11
9902 Lemon Legal Publishing 27 3 12
9901 GGG&G 25 3 13
9908 Springfield Publishing 25 3 14
9900 Boysenberry Books 23 4 15
9916 Nordome Titles 22 4 16
9915 Beanplant General 21 4 17
9917 BFG Books 17 4 18
9918 Significant Titles Company 17 5 19
0877 Binnet & Hardley 6 5 20
1389 Algodata Infosystems 6 5 21
0736 New Moon Books 5 5 22



In this example, NTILE is used to divide the result set into five groups. Because there are 22 rows in the publishers table, there are 4 rows in each group, with 2 left over. The 2 extra rows are added to the first two groups.

The NTILE function provides a way to generate a histogram with an even distribution of items for each step. In the previous example, the first step represents the publishers with the highest number of titles, and the last step represents the publishers with the lowest number of titles. You can use this information in a CASE expression to provide descriptive meaningful alternatives to the ranking numbers, as shown in Listing 6.

Listing 6. Using a CASE Expression to Provide Meaningful Labels to Ranking Values
select p.pub_id, pub_name, count(*) as num_titles,
case NTILE(5) OVER (order by count(*) DESC)
when 1 then 'Highest'
when 2 then 'Above Average'
when 3 then 'Average'
when 4 then 'Below Average'
when 5 then 'Lowest'
end as Ranking
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
order by pub_id
go

pub_id pub_name num_titles Ranking
------ ---------------------------------------- ----------- -------------
0736 New Moon Books 5 Lowest
0877 Binnet & Hardley 6 Lowest
1389 Algodata Infosystems 6 Lowest
9900 Boysenberry Books 23 Below Average
9901 GGG&G 25 Average
9902 Lemon Legal Publishing 27 Average
9903 Kumquat Technical Publishing 28 Average
9904 Strawberry Publications 34 Highest
9905 Gooseberry Titles 32 Highest
9906 Tomato Books 28 Above Average
9907 Incandescent Imprints 33 Highest
9908 Springfield Publishing 25 Average
9909 North American Press 30 Highest
9910 Sidney's Books and More 28 Above Average
9911 Jones Jones and Johnson 44 Highest
9912 Landlocked Books 30 Above Average
9913 Blackberry's 28 Above Average
9914 Normanskill Printing Company 28 Above Average
9915 Beanplant General 21 Below Average
9916 Nordome Titles 22 Below Average
9917 BFG Books 17 Below Average
9918 Significant Titles Company 17 Lowest



Using Row Numbers for Paging Results

Typical uses for row numbers are for paging through the results of a query and for selecting a specific subset of rows from within the result set. Essentially, given a page size in terms of number of rows, and a page number, you can return the rows that belong to that given page.

For example, suppose you want to return the second page of rows from a query similar to the one shown in Listing 2. Assuming a page size of five rows, the query shown in Listing 7 uses a CTE to first calculate the row numbers according to the ranking by number of titles, and then only those rows with numbers 6 through 10, which belong to the second page, are returned.

Listing 7. Using ROW_NUMBER to Page Through Results
with pub_titles as
(
select p.pub_id, pub_name, count(*) as num_titles,
ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
)
select * from pub_titles
where Rank between 6 and 10
go

pub_id pub_name num_titles Rank
------ ---------------------------------------- ----------- --------------------
9912 Landlocked Books 30 6
9903 Kumquat Technical Publishing 28 7
9906 Tomato Books 28 8
9910 Sidney's Books and More 28 9
9913 Blackberry's 28 10



You could make this query more generic by using local variables for the page number and page size and using them to calculate the proper set of rows to return, as shown in Listing 8.

Listing 8. Using Local Variables for Determining Page Size When Paging Through Results
declare @pagesize tinyint,
@pagenum tinyint;
set @pagesize = 6;
set @pagenum = 2;

with pub_titles as
(
select p.pub_id, pub_name, count(*) as num_titles,
ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
)
select * from pub_titles
where Rank between ((@pagenum - 1) * @pagesize) + 1
and @pagenum * @pagesize
go

pub_id pub_name num_titles Rank
------ ---------------------------------------- ----------- --------------------
9903 Kumquat Technical Publishing 28 7
9906 Tomato Books 28 8
9910 Sidney's Books and More 28 9
9913 Blackberry's 28 10
9914 Normanskill Printing Company 28 11
9902 Lemon Legal Publishing 27 12



The example in Listing 8 is adequate for ad hoc requests when you’re interested in retrieving only one specific page of the result set. However, this approach is not adequate for most applications that would issue multiple requests for individual pages of data because each invocation of the query would require a complete scan of the table to calculate the row numbers. A more efficient method for when the user might repeatedly request different pages would be to first populate a temporary table with all the base table rows, including the calculated row numbers, and then create a clustered index on the column in the temp table that contains the row numbers. An example is presented in Listing 9.

Listing 9. Using a Temp Table for Paging Through Results
select p.pub_id, pub_name, count(*) as num_titles,
ROW_NUMBER () OVER (order by count(*) DESC, p.pub_id) as Rank
into #paging_table
from publishers p join titles t on p.pub_id = t.pub_id
group by p.pub_id, p.pub_name
go
create unique clustered index idx1 on #paging_table(Rank)
go
declare @pagesize tinyint,
@pagenum tinyint;
set @pagesize = 6;
set @pagenum = 2;
SELECT *
FROM #paging_table
WHERE Rank BETWEEN (@pagenum-1)*@pagesize+1 AND @pagenum*@pagesize
ORDER BY Rank
set @pagesize = 6;
set @pagenum = 4;
SELECT *
FROM #paging_table
WHERE Rank BETWEEN (@pagenum-1)*@pagesize+1 AND @pagenum*@pagesize
ORDER BY Rank
go

pub_id pub_name num_titles Rank
------ ---------------------------------------- ----------- --------------------
9903 Kumquat Technical Publishing 28 7
9906 Tomato Books 28 8
9910 Sidney's Books and More 28 9
9913 Blackberry's 28 10
9914 Normanskill Printing Company 28 11
9902 Lemon Legal Publishing 27 12

pub_id pub_name num_titles Rank
------ ---------------------------------------- ----------- --------------------
9918 Significant Titles Company 17 19
0877 Binnet & Hardley 6 20
1389 Algodata Infosystems 6 21
0736 New Moon Books 5 22



If you are limiting the result set to a specific number of rows and are using a fixed page size, an alternative to using ROW_NUMBER would be to use the NTILE function to calculate the actual page numbers. For example, if you are using TOP to limit the result set to the first 500 rows and each page contains 10 rows, the total number of pages would be 500 / 10, or 50 pages. If you use 50 as the argument to the NTILE function, the query generates 50 distinct ranking values with 10 rows each. An example of this solution is presented in Listing 10.

Listing 10. Using NTILE to Generate Page Numbers
select TOP 500 t.title_id,
left(title, 20) as title,
sum(qty) as total_sales,
NTILE(50) OVER(ORDER BY sum(qty) desc) AS pagenum
into #title_list
from titles t join sales s on t.title_id = s.title_id
group by t.title_id, title
go
create clustered index page_index on #title_list(pagenum)
go
select * from #title_list
where pagenum = 11
go

title_id title total_sales pagenum
-------- -------------------- ----------- --------------------
FI1704 Journey 295872 11
FI2784 Rhoda: A Life in Sto 295836 11
FI4524 The Unconsoled 295584 11
FI4554 The Spy Who Came in 295500 11
FI0897 Polar Star 295308 11
CH0126 Little Bear 295296 11
FI5040 The Tombs of Atuan 295284 11
PS1372 Computer Phobic AND 295172 11
FI7820 Tinker, Tailor, Sold 295092 11
FI2816 Zuckerman Unbound 294960 11
CH0623 The Black Cauldron 294960 11
Other  
  •  SQL Server 2008 : Transact-SQL Programming - Common Table Expressions
  •  SQL Server 2008 : Transact-SQL Programming - The OUTPUT Clause
  •  SQL Server 2008 : Transact-SQL Programming - TOP Enhancements
  •  SQL Server 2008 : Transact-SQL Programming - The max Specifier
  •  SQL Server 2008 : T-SQL Tips and Tricks (part 3) - Generating T-SQL Statements with T-SQL & De-Duping Data with Ranking Functions
  •  SQL Server 2008 : T-SQL Tips and Tricks (part 2) - Using CONTEXT_INFO & Working with Outer Joins
  •  SQL Server 2008 : T-SQL Tips and Tricks (part 1) - Date Calculations & Sorting Results with the GROUPING Function
  •  SQL Server 2008 : General T-SQL Performance Recommendations
  •  SQL Server 2008 : General T-SQL Coding Recommendations (part 2) - Avoid SQL Injection Attacks When Using Dynamic SQL & Comment Your T-SQL Code
  •  SQL Server 2008 : General T-SQL Coding Recommendations (part 1) - Provide Explicit Column Lists & Qualify Object Names with a Schema Name
  •  
    programming4us
     
     
    programming4us