Ranking
functions are built-in system functions that allow us to include
numeric columns in our result sets where the value in the column
reflects the “rank” of the row. We determine how to rank the rows by
choosing first which of the four possible ranking functions (ROW_NUMBER, RANK, DENSE_RANK, or NTILE
E)
we wish to use, what value the rank will be based on, and optionally
over what ranges or “partitions” we want the ranking to be performed.
All of the functions follow the same basic format:
<ranking_function > OVER ( [ <partition_by_clause> ] <order_by_clause> )
We’ll look at the general use of the functions by starting with the first one, ROW_NUMBER.
Using ROW_NUMBER
We will start with the ROW_NUMBER
function because it is the most basic of the four ranking functions. It
simply generates a numeric value based on the number of rows in the
result.
The following examples of
the ranking functions will build on a query that returns products from
three specific subcategories using the Product and ProductSubcategory tables in the AdventureWorks.Production
schema. The subcategories we see in the query are Bib-shorts, Bottom
Brackets, and Gloves. These subcategories were chosen to help keep the
result set small for demo purposes but still provide a set of data that
can be used to show all the ranking functions. Here is the initial query
we will use without any ranking function yet used:
USE AdventureWorks2008;
GO
SELECT
S.Name AS SubName, P.Name ProdName, ListPrice AS Price
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS S
ON P.ProductSubcategoryID = S.ProductSubcategoryID
WHERE S.ProductSubCategoryID IN (5,18,20)
ORDER BY S.Name, ListPrice;
The ROW_NUMBER function allows us to return a sequential row number value in our result sets. We have to indicate to the ROW_NUMBER
function the order in which we want the rows to be numbered. For
example, should they be numbered in order of list price, or should they
be listed by some other sort order? We state our sorting desires using
required <order_by_clause>. Because the ranking function gets its own <order_by_clause>,
the ordering of the row number sequence can be different than the
ordering of result set. In all of the examples, we will order the
ranking functions by the ListPrice of the products where lower list
prices have a lower rank value and higher list prices will have a higher
rank value. Here is the same query with the ROW_NUMBER function added and its results:
USE AdventureWorks2008
GO
SELECT
S.Name AS SubName, P.Name ProdName, ListPrice AS Price,
ROW_NUMBER() OVER (ORDER BY ListPrice) AS ROW
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS S
ON P.ProductSubcategoryID = S.ProductSubcategoryID
WHERE S.ProductSubCategoryID IN (5,18,20)
ORDER BY S.Name, ListPrice;
SubName ProdName Price ROW
---------------- ---------------------- ------- ---
Bib-Shorts Men's Bib-Shorts, S 89.99 8
Bib-Shorts Men's Bib-Shorts, M 89.99 9
Bib-Shorts Men's Bib-Shorts, L 89.99 10
Bottom Brackets LL Bottom Bracket 53.99 7
Bottom Brackets ML Bottom Bracket 101.24 11
Bottom Brackets HL Bottom Bracket 121.49 12
Gloves Half-Finger Gloves, S 24.49 1
Gloves Half-Finger Gloves, M 24.49 2
Gloves Half-Finger Gloves, L 24.49 3
Gloves Full-Finger Gloves, S 37.99 4
Gloves Full-Finger Gloves, M 37.99 5
Gloves Full-Finger Gloves, L 37.99 6
Notice that the values in the ROW
column are not ordered by the position of the rows, but rather by the
magnitude of the ListPrice value. This is because the call to the ROW_NUMBER
function indicated that the ranking should be ordered by the ListPrice,
whereas the rows themselves are being sorted first by the subcategory
name, then by list price. The ROW_NUMBER function call has been italicized in the preceding query to help you find it easily.
If we want the row number to number the rows in the same order as they are returned, we need to ask the ROW_NUMBER function to order its values using the same ordering as the ORDER BY clause on the query itself (S.Name, P.ListPrice), as follows:
USE AdventureWorks2008
GO
SELECT
S.Name AS SubName, P.Name ProdName, ListPrice AS Price,
ROW_NUMBER() OVER
(ORDER BY S.Name, ListPrice) AS ROW
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS S
ON P.ProductSubcategoryID = S.ProductSubcategoryID
WHERE S.ProductSubCategoryID IN (5,18,20)
ORDER BY S.Name, ListPrice;
SubName ProdName Price ROW
---------------- --------------------- ------ ----
Bib-Shorts Men's Bib-Shorts, S 89.99 1
Bib-Shorts Men's Bib-Shorts, M 89.99 2
Bib-Shorts Men's Bib-Shorts, L 89.99 3
Bottom Brackets LL Bottom Bracket 53.99 4
Bottom Brackets ML Bottom Bracket 101.24 5
Bottom Brackets HL Bottom Bracket 121.49 6
... ... ... ...
So
far, the row numbers generated are based on the entire result set (all
12 rows). We can break the ranking functions down into different subsets
or “partitions” of the total number of rows much in the same way that
we can group rows with the group by clause when using aggregate
functions. We indicate what rows are in the same partition by stating
the columns whose values the partitions will be based on in the <partition_by_clause>. For example, if we wanted to have the row number start over with each subcategory, we would use the following query:
USE AdventureWorks2008
GO
SELECT
S.Name AS SubName, P.Name ProdName, ListPrice AS Price,
ROW_NUMBER() OVER
(PARTITION BY S.ProductSubcategoryID ORDER BY ListPrice) AS ROW
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS S
ON P.ProductSubcategoryID = S.ProductSubcategoryID
WHERE S.ProductSubCategoryID IN (5,18,20)
ORDER BY S.Name, ListPrice;
SubName ProdName Price ROW
--------------- --------------------- ------ ---
Bib-Shorts Men's Bib-Shorts, S 89.99 1
Bib-Shorts Men's Bib-Shorts, M 89.99 2
Bib-Shorts Men's Bib-Shorts, L 89.99 3
Bottom Brackets LL Bottom Bracket 53.99 1
Bottom Brackets ML Bottom Bracket 101.24 2
Bottom Brackets HL Bottom Bracket 121.49 3
Gloves Half-Finger Gloves, S 24.49 1
Gloves Half-Finger Gloves, M 24.49 2
Gloves Half-Finger Gloves, L 24.49 3
Gloves Full-Finger Gloves, S 37.99 4
Gloves Full-Finger Gloves, M 37.99 5
Gloves Full-Finger Gloves, L 37.99 6
Notice that the values in the ROW column now start back at one with each new subcategory. This is because of the <partition_by_clause> of PARTITION BY S.ProductSubcategoryID
.
The new ROW_NUMBER() function provides a much more flexible way to number rows in our result sets that does the IDENTITY() function.
The IDENTITY() function (similar to the IDENTITY property in a table column) provides a way to automatically number rows in a result set. However, the IDENTITY() function can only be used in SELECT ... INTO queries. The ROW_NUMBER() function, on the other hand, can be used in any SELECT statement and is quite configurable via the partition and order by clauses.
|
That is the basics of using ROW_NUMBER
and how to use the <order_by_clause> and
<partition_by_clause> for all ranking functions. Let’s now take a
look at those other functions.