Using RANK, DENSE_RANK and NTILE
The ROW_NUMBER function returned a sequential number based on the number of rows in the given “partition” and ordered by the some value. The RANK AND DENSE_RANK functions provide more complex numbering based on the values themselves, not just the number of rows, whereas the NTILE
function generates numbers by dividing the rows into a given number of
groups based on the total number of rows (or the number of rows in the
partition if the <partition_by_clause>
is used) and then
numbering each row with its group number. We will explain each of the
functions in more detail and then present a single sample query that
shows their values.
The RANK function generates numbers based on the values in the columns specified by the <order_by_clause>. The RANK
value for a given row will be one greater than the number of rows that
precede it in the current partition, except when it has the same value
in the order by columns as the previous row, in which case it will have
the same rank value as the previous row. This means that “gaps” can
occur in the RANK
values if one or more rows have duplicate values as other rows in
columns being ranked. The following set of rows is an excerpt of a
result set that shows the gaps that can occur in the rank value with the
RANK function call of
RANK() OVER (PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice):
SubName ProdName Price ROW RANK
------- ---------------------- ------- --- ----
Gloves Half-Finger Gloves, S 24.49 1 1
Gloves Half-Finger Gloves, M 24.49 2 1
Gloves Half-Finger Gloves, L 24.49 3 1
Gloves Full-Finger Gloves, S 37.99 4 4
Gloves Full-Finger Gloves, M 37.99 5 4
Gloves Full-Finger Gloves, L 37.99 6 4
Notice how the RANK value for the first three rows is the same (1). The column being ranked is the ListPrice column (as specified in the ORDER BY
ListPrice syntax). The first three
rows all have the same value for the ListPrice (24.49), so they all get
the same rank value. The fourth row, however, gets a rank value that is
one greater than the number of rows that preceded it in the current
partition. In this case, three rows preceded it, so it gets a value of 3
+ 1 or 4. The rows that follow it, however, again have the same
ListPrice value (37.99), so they get the same rank value as the first
row with that value (4). The end result is that gaps can show up in the
ranking value. If you don’t want the gaps, use DENSE_RANK.
DENSE RANK is similar to RANK in that it generates its ranking based on the column values specified by the <order_by_clause>, but it does not introduce gaps. DENSE RANK
will generate rank values that are one greater than the previous rank
value except where two rows have the same values in the columns being
ranked. In that case, rows with the same values in the ranked columns
will have the same rank values. Because subsequent DENSE RANK values are either the same or one greater than previous DENSE RANK values in the partition, no “gaps” appear. Here is the same result set excerpt as was shown for the RANK function above, but this time we have added a column with the DENSE_RANK function call of DENSE_RANK() OVER (PARTITION BY S.
ProductSubcategoryID ORDER
SubName ProdName Price ROW RANK DENSE
------- ---------------------- ------- --- ---- -----
Gloves Half-Finger Gloves, S 24.49 1 1 1
Gloves Half-Finger Gloves, M 24.49 2 1 1
Gloves Half-Finger Gloves, L 24.49 3 1 1
Gloves Full-Finger Gloves, S 37.99 4 4 2
Gloves Full-Finger Gloves, M 37.99 5 4 2
Gloves Full-Finger Gloves, L 37.99 6 4 2
You can see the difference between RANK and DENSE_RANK values. When rows have distinct values in the columns being ranked, RANK generates a value that is one greater than the number of rows that precede it in the current partition, whereas DENSE_RANK generates a value that is one greater than the DENSE_RANK value
that precedes it in the current partition. Both behave the same way
when a row has the same value as the previous row for the column being
ranked in that they receive the same rank value as the previous row.
Also, if a given partition had no rows with duplicate values in the
columns being ranked RANK and DENSE_RANK would return the same values for the partition.
The NTILE
is based more on the number of rows in the current partition than on
the actual values in the columns being ranked. In this way, it is more
akin to ROW_NUMBER than to RANK. DENSE_RANK NTILE’s
ranking isn’t based on the magnitude of the actual value in the column
being ranked, but rather the value’s sort order position in the current
partition. NTILE
is different from the other ranking functions in that it does expect an
argument to be passed in. The argument is an integer value that
specifies how many “groups” or “tiles” each partition should be broken
into. Following is again the same result set excerpt we looked at for
the RANK and DENSE_RANK functions above, but this time with a call to the NTILE function of NTILE
(3)
OVER (PARTITION BY S. Name ORDER BY ListPrice) :
SubName ProdName Price ROW RANK DENSE NTILE
------- --------------------- ----- --- ---- ----- -----
Gloves Half-Finger Gloves, S 24.49 1 1 1 1
Gloves Half-Finger Gloves, M 24.49 2 1 1 1
Gloves Half-Finger Gloves, L 24.49 3 1 1 2
Gloves Full-Finger Gloves, S 37.99 4 4 2 2
Gloves Full-Finger Gloves, M 37.99 5 4 2 3
Gloves Full-Finger Gloves, L 37.99 6 4 2 3
There are a total of six rows in the Gloves subcategory. We called the NTILE
function with an argument of “3.” This means it has to break the six
rows into three groups. Six divided by three is two, meaning we will
have three groups with two rows each. The groups are then numbered based
on the values in the columns specified in the <order_by_clause>. The first two rows are in group 1, the second two rows are in group 2, and so on.
If the number of rows is
not evenly divisible by the number of groups requested, the additional
rows are distributed among the lower order groups. This means that lower
groups will have one row greater in each row than the higher groups
depending on how many “remainder” rows there were. For example, if we
called the above NTILE function with an argument of 4 rather than 3, we would get the following result:
SubName ProdName Price ROW RANK DENSE NTILE
------- --------------------- ------ --- ---- ----- -----
Gloves Half-Finger Gloves, S 24.49 1 1 1 1
Gloves Half-Finger Gloves, M 24.49 2 1 1 1
Gloves Half-Finger Gloves, L 24.49 3 1 1 2
Gloves Full-Finger Gloves, S 37.99 4 4 2 2
Gloves Full-Finger Gloves, M 37.99 5 4 2 3
Gloves Full-Finger Gloves, L 37.99 6 4 2 4
Pay close attention to the NTILE
value of the last row; it is a “4” rather than a “3” as before. With
six rows being divided into four groups, we end up with four groups of
one row each and two rows remaining. The two additional rows are
distributed to the lower groups (group one and two) giving them each two
rows, and the higher two groups (groups three and four) remain at one
row only.
Warning
The NTILE()
function is the only one of the four ranking functions that requires or
even accepts a parameter value. Don’t be fooled by versions of the
other ranking functions that have parameter values being passed in on
the test.
That about covers ranking functions now that we have talked about all four of the ranking functions, ROW_NUMBER, RANK, DENSE_RANK, and NTILE. To wrap it up, let’s look at a final query that shows all of them together so that we can easily compare their results:
USE AdventureWorks2008;
GO
SELECT
S.Name AS SubName, P.Name ProdName, ListPrice AS Price,
ROW_NUMBER() OVER
(PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice) AS ROW,
RANK() OVER
(PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice) AS RANK,
DENSE_RANK() OVER
(PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice) AS DENSE,
NTILE(3) OVER
(PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice) AS NTILE
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS S
ON P.ProductSubcategoryID = S.ProductSubcategoryID
WHERE S.ProductSubCategoryID IN (5,18,20)
ORDER BY S.Name, ListPrice;
SubName ProdName Price ROW RANK DENSE NTILE
---------- ---------------------- ------- --- ---- ----- -----
Bib-Shorts Men's Bib-Shorts, S 89.99 1 1 1 1
Bib-Shorts Men's Bib-Shorts, M 89.99 2 1 1 2
Bib-Shorts Men's Bib-Shorts, L 89.99 3 1 1 3
Bottom Brackets LL Bottom Bracket 53.99 1 1 1 1
Bottom Brackets ML Bottom Bracket 101.24 2 2 2 2
Bottom Brackets HL Bottom Bracket 121.49 3 3 3 3
Gloves Half-Finger Gloves, S 24.49 1 1 1 1
Gloves Half-Finger Gloves, M 24.49 2 1 1 1
Gloves Half-Finger Gloves, L 24.49 3 1 1 2
Gloves Full-Finger Gloves, S 37.99 4 4 2 2
Gloves Full-Finger Gloves, M 37.99 5 4 2 3
Gloves Full-Finger Gloves, L 37.99 6 4 2 3