DATABASE

SQL Server 2008 : Explaining Advanced Query Techniques - Applying Ranking Functions (part 2) - Using RANK, DENSE_RANK and NTILE

1/24/2011 4:48:02 PM

Using RANK, DENSE_RANK and NTILE

The ROW_NUMBER function returned a sequential number based on the number of rows in the given “partition” and ordered by the some value. The RANK AND DENSE_RANK functions provide more complex numbering based on the values themselves, not just the number of rows, whereas the NTILE function generates numbers by dividing the rows into a given number of groups based on the total number of rows (or the number of rows in the partition if the <partition_by_clause> is used) and then numbering each row with its group number. We will explain each of the functions in more detail and then present a single sample query that shows their values.

The RANK function generates numbers based on the values in the columns specified by the <order_by_clause>. The RANK value for a given row will be one greater than the number of rows that precede it in the current partition, except when it has the same value in the order by columns as the previous row, in which case it will have the same rank value as the previous row. This means that “gaps” can occur in the RANK values if one or more rows have duplicate values as other rows in columns being ranked. The following set of rows is an excerpt of a result set that shows the gaps that can occur in the rank value with the RANK function call of

RANK() OVER (PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice):
SubName ProdName Price ROW RANK
------- ---------------------- ------- --- ----
Gloves Half-Finger Gloves, S 24.49 1 1
Gloves Half-Finger Gloves, M 24.49 2 1
Gloves Half-Finger Gloves, L 24.49 3 1
Gloves Full-Finger Gloves, S 37.99 4 4
Gloves Full-Finger Gloves, M 37.99 5 4
Gloves Full-Finger Gloves, L 37.99 6 4

Notice how the RANK value for the first three rows is the same (1). The column being ranked is the ListPrice column (as specified in the ORDER BY ListPrice syntax). The first three rows all have the same value for the ListPrice (24.49), so they all get the same rank value. The fourth row, however, gets a rank value that is one greater than the number of rows that preceded it in the current partition. In this case, three rows preceded it, so it gets a value of 3 + 1 or 4. The rows that follow it, however, again have the same ListPrice value (37.99), so they get the same rank value as the first row with that value (4). The end result is that gaps can show up in the ranking value. If you don’t want the gaps, use DENSE_RANK.

DENSE RANK is similar to RANK in that it generates its ranking based on the column values specified by the <order_by_clause>, but it does not introduce gaps. DENSE RANK will generate rank values that are one greater than the previous rank value except where two rows have the same values in the columns being ranked. In that case, rows with the same values in the ranked columns will have the same rank values. Because subsequent DENSE RANK values are either the same or one greater than previous DENSE RANK values in the partition, no “gaps” appear. Here is the same result set excerpt as was shown for the RANK function above, but this time we have added a column with the DENSE_RANK function call of DENSE_RANK() OVER (PARTITION BY S. ProductSubcategoryID ORDER

SubName      ProdName                  Price      ROW    RANK     DENSE
------- ---------------------- ------- --- ---- -----
Gloves Half-Finger Gloves, S 24.49 1 1 1
Gloves Half-Finger Gloves, M 24.49 2 1 1
Gloves Half-Finger Gloves, L 24.49 3 1 1
Gloves Full-Finger Gloves, S 37.99 4 4 2
Gloves Full-Finger Gloves, M 37.99 5 4 2
Gloves Full-Finger Gloves, L 37.99 6 4 2

You can see the difference between RANK and DENSE_RANK values. When rows have distinct values in the columns being ranked, RANK generates a value that is one greater than the number of rows that precede it in the current partition, whereas DENSE_RANK generates a value that is one greater than the DENSE_RANK value that precedes it in the current partition. Both behave the same way when a row has the same value as the previous row for the column being ranked in that they receive the same rank value as the previous row. Also, if a given partition had no rows with duplicate values in the columns being ranked RANK and DENSE_RANK would return the same values for the partition.

The NTILE is based more on the number of rows in the current partition than on the actual values in the columns being ranked. In this way, it is more akin to ROW_NUMBER than to RANK. DENSE_RANK NTILE’s ranking isn’t based on the magnitude of the actual value in the column being ranked, but rather the value’s sort order position in the current partition. NTILE is different from the other ranking functions in that it does expect an argument to be passed in. The argument is an integer value that specifies how many “groups” or “tiles” each partition should be broken into. Following is again the same result set excerpt we looked at for the RANK and DENSE_RANK functions above, but this time with a call to the NTILE function of NTILE (3) OVER (PARTITION BY S. Name ORDER BY ListPrice) :

SubName     ProdName                 Price       ROW      RANK      DENSE       NTILE
------- --------------------- ----- --- ---- ----- -----
Gloves Half-Finger Gloves, S 24.49 1 1 1 1
Gloves Half-Finger Gloves, M 24.49 2 1 1 1
Gloves Half-Finger Gloves, L 24.49 3 1 1 2
Gloves Full-Finger Gloves, S 37.99 4 4 2 2

Gloves Full-Finger Gloves, M 37.99 5 4 2 3
Gloves Full-Finger Gloves, L 37.99 6 4 2 3


There are a total of six rows in the Gloves subcategory. We called the NTILE function with an argument of “3.” This means it has to break the six rows into three groups. Six divided by three is two, meaning we will have three groups with two rows each. The groups are then numbered based on the values in the columns specified in the <order_by_clause>. The first two rows are in group 1, the second two rows are in group 2, and so on.

If the number of rows is not evenly divisible by the number of groups requested, the additional rows are distributed among the lower order groups. This means that lower groups will have one row greater in each row than the higher groups depending on how many “remainder” rows there were. For example, if we called the above NTILE function with an argument of 4 rather than 3, we would get the following result:

SubName      ProdName                 Price     ROW    RANK    DENSE    NTILE
------- --------------------- ------ --- ---- ----- -----
Gloves Half-Finger Gloves, S 24.49 1 1 1 1
Gloves Half-Finger Gloves, M 24.49 2 1 1 1
Gloves Half-Finger Gloves, L 24.49 3 1 1 2
Gloves Full-Finger Gloves, S 37.99 4 4 2 2
Gloves Full-Finger Gloves, M 37.99 5 4 2 3
Gloves Full-Finger Gloves, L 37.99 6 4 2 4


Pay close attention to the NTILE value of the last row; it is a “4” rather than a “3” as before. With six rows being divided into four groups, we end up with four groups of one row each and two rows remaining. The two additional rows are distributed to the lower groups (group one and two) giving them each two rows, and the higher two groups (groups three and four) remain at one row only.

Warning

The NTILE() function is the only one of the four ranking functions that requires or even accepts a parameter value. Don’t be fooled by versions of the other ranking functions that have parameter values being passed in on the test.


That about covers ranking functions now that we have talked about all four of the ranking functions, ROW_NUMBER, RANK, DENSE_RANK, and NTILE. To wrap it up, let’s look at a final query that shows all of them together so that we can easily compare their results:

USE AdventureWorks2008;
GO
SELECT
S.Name AS SubName, P.Name ProdName, ListPrice AS Price,
ROW_NUMBER() OVER
(PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice) AS ROW,
RANK() OVER
(PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice) AS RANK,
DENSE_RANK() OVER
(PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice) AS DENSE,
NTILE(3) OVER
(PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice) AS NTILE
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS S
ON P.ProductSubcategoryID = S.ProductSubcategoryID
WHERE S.ProductSubCategoryID IN (5,18,20)
ORDER BY S.Name, ListPrice;
SubName ProdName Price ROW RANK DENSE NTILE
---------- ---------------------- ------- --- ---- ----- -----
Bib-Shorts Men's Bib-Shorts, S 89.99 1 1 1 1
Bib-Shorts Men's Bib-Shorts, M 89.99 2 1 1 2
Bib-Shorts Men's Bib-Shorts, L 89.99 3 1 1 3
Bottom Brackets LL Bottom Bracket 53.99 1 1 1 1
Bottom Brackets ML Bottom Bracket 101.24 2 2 2 2
Bottom Brackets HL Bottom Bracket 121.49 3 3 3 3
Gloves Half-Finger Gloves, S 24.49 1 1 1 1
Gloves Half-Finger Gloves, M 24.49 2 1 1 1
Gloves Half-Finger Gloves, L 24.49 3 1 1 2
Gloves Full-Finger Gloves, S 37.99 4 4 2 2
Gloves Full-Finger Gloves, M 37.99 5 4 2 3
Gloves Full-Finger Gloves, L 37.99 6 4 2 3

Other  
 
Video
Top 10
SG50 Ferrari F12berlinetta : Prancing Horse for Lion City's 50th
The latest Audi TT : New angles for TT
Era of million-dollar luxury cars
Game Review : Hearthstone - Blackrock Mountain
Game Review : Battlefield Hardline
Google Chromecast
Keyboards for Apple iPad Air 2 (part 3) - Logitech Ultrathin Keyboard Cover for iPad Air 2
Keyboards for Apple iPad Air 2 (part 2) - Zagg Slim Book for iPad Air 2
Keyboards for Apple iPad Air 2 (part 1) - Belkin Qode Ultimate Pro Keyboard Case for iPad Air 2
Michael Kors Designs Stylish Tech Products for Women
REVIEW
- First look: Apple Watch

- 3 Tips for Maintaining Your Cell Phone Battery (part 1)

- 3 Tips for Maintaining Your Cell Phone Battery (part 2)
Popular Tags
Video Tutorail Microsoft Access Microsoft Excel Microsoft OneNote Microsoft PowerPoint Microsoft Project Microsoft Visio Microsoft Word Active Directory Exchange Server Sharepoint Sql Server Windows Server 2008 Windows Server 2012 Windows 7 Windows 8 Adobe Flash Professional Dreamweaver Adobe Illustrator Adobe Photoshop CorelDRAW X5 CorelDraw 10 windows Phone 7 windows Phone 8 Iphone