Databases hold data. Users sometimes want to perform
simple calculations or algorithms on that data to rank the results in a
specific order—like gold, silver, and bronze medals in the Olympics or
the top 10 customers by region. SQL Server 2005 introduces
functionality for using ranking expressions with your result set. You
can select a number of ranking algorithms, which are then applied to a
column that you specify. This comes in handy in .NET Framework
applications for paging and sorting in a grid, as well as in many other
scenarios.
ROW_NUMBER()
The most basic ranking function is ROW_NUMBER().
It returns a column as an expression that contains the row’s number in
the result set. This number is used only in the context of the result
set; if the result changes, the ROW_NUMBER() changes. The ROW_NUMBER()
expression takes an ORDER BY statement with the column you want to use for the row count and the new OVER operator, which links the ORDER BY to the specific ranking function you are using. The ORDER BYOVER clause replaces an ORDER BY at the end of the SQL statement. in the
The following simple example gives a row number to each row in the result set, ordering by SalesOrderID:
Listing 1. Row number ranking
USE AdventureWorks GO SELECT SalesOrderID, CustomerID, ROW_NUMBER() OVER (ORDER BY SalesOrderID) AS RunningCount FROM Sales.SalesOrderHeader WHERE SalesOrderID>10000
|
Results are shown here:
SalesOrderID CustomerID RunningCount
-------------------------------------
43659 676 1
43660 117 2
43661 442 3
43662 227 4
43663 510 5
43664 397 6
43665 146 7
43666 511 8
43667 646 9
[more rows]
The ranking functions order your result set by the fields specified in the ORDER BY statement contained in the OVER clause. Alternatively, you can include an additional ORDER BY statement in your result set; this optional statement is distinct from the ORDER BY clause in the OVERROW_NUMBER() function’s results are displayed in the order they are determined by in the additional ORDER BY statement, not by the ORDER BY statement contained within the OVER clause. The results can, therefore, be confusing, as in this example: expression. SQL Server 2005 allows this, but if you choose this option, the
--Row_Number using a unique value, different order by SELECT SalesOrderID, CustomerID, ROW_NUMBER() OVER (ORDER BY SalesOrderID) AS RunningCount FROM Sales.SalesOrderHeader WHERE SalesOrderID>10000 ORDER BY CustomerID --Different ORDER BY than in Row_Number
|
Here are the results:
SalesOrderID CustomerID RunningCount
------------------------------------------
43860 1 202
44501 1 843
45283 1 1625
46042 1 2384
46976 2 3318
47997 2 4339
49054 2 5396
50216 2 6558
51728 2 8070
57044 2 13386
63198 2 19540
69488 2 25830
65310 3 21652
71889 3 28231
53616 3 9958
[more rows]
As you can see, if you expect the results to be sorted by the OVER clause’s ORDER BY statement, you’d expect results ranked by SalesOrderID, when in fact they’re ordered by CustomerID.
If you choose the ROW_NUMBER()
function to run against a non-unique column that contains multiple
copies of the same value (also known as “ties,” such as the same amount
of items sold and the same time in a race), ROW_NUMBER( ) breaks the tie and still produces a running count so that no rows have the same number. For example, CustomerID
can repeat in this example, which will generate several ties; SQL
Server simply increases the running count for each row, regardless of
how many ties exist:
Listing 2. Row number ranking with ties
SELECT SalesOrderID, CustomerID, ROW_NUMBER() OVER (ORDER BY CustomerID) AS RunningCount FROM Sales.SalesOrderHeader WHERE SalesOrderID>10000
|
The results are shown here:
SalesOrderID CustomerID RunningCount
------------------------------------------------
43860 1 1
44501 1 2
45283 1 3
46042 1 4
46976 2 5
47997 2 6
49054 2 7
50216 2 8
51728 2 9
57044 2 10
63198 2 11
69488 2 12
44124 3 13
[more rows]
Grouping and Filtering with a Row_Number
When you want to include a GROUP BY function in your query, ranking functions do not work. The easy way around this limitation is to create your GROUP BY in a CTE and then perform your ranking on the results, as shown here:
Listing 3. Grouping by row number
USE AdventureWorks GO --rank by totaldue, summed --need a CTE to do the sum --so this example will have a --customerID summed with all of --their orders WITH CustomerSum AS ( SELECT CustomerID, SUM(totaldue) AS totalamt FROM Sales.SalesOrderHeader GROUP BY CustomerID ) --this appends a row_number to the end of the resultset SELECT *, ROW_NUMBER() OVER (ORDER BY totalamt DESC) AS RowNumber FROM CustomerSum
|
Here are the results:
CustomerID totalamt RowNumber
-----------------------------------------
678 1179857.4657 1
697 1179475.8399 2
170 1134747.4413 3
328 1084439.0265 4
514 1074154.3035 5
155 1045197.0498 6
72 1005539.7181 7
[more rows]
To filter by a ROW_NUMBER, you have to put the ROW_NUMBER function in a CTE.
Listing 4. Filtering by row number
USE AdventureWorks GO --use a common table expression if you want --to filter by one of the rows that contain a --ranking function since ranking functions --are not allowed in where or having clauses WITH NumberRows AS ( SELECT SalesOrderID, CustomerID, ROW_NUMBER() OVER (ORDER BY SalesOrderID) AS RowNumber FROM Sales.SalesOrderHeader )
SELECT * FROM NumberRows WHERE RowNumber BETWEEN 100 AND 200
|
Here are the results:
SalesOrderID CustomerID RowNumber
--------------------------------------------
43759 13257 100
43760 16352 101
43761 16493 102
43762 27578 103
[more rows]
RANK()
The ranking function you will probably use the most is RANK(). RANK() ranks the data in the ORDER BY clause in the order you specify. RANK() is syntactically exactly like ROW_NUMBER()
but with true ranking results. It works just like in the Olympics, when
two people tie for the gold medal—the next rank is bronze. Similarly,
with the RANK() function, if four rows are tied with the value ‘1’, the
next row value for the rank column will be ‘5’. Consider this code:
Listing 5. The RANK() function
USE AdventureWorks GO SELECT SalesOrderID, CustomerID, RANK() OVER (ORDER BY CustomerID) AS RunningCount FROM Sales.SalesOrderHeader WHERE SalesOrderID > 10000
|
Here are the results:
SalesOrderID CustomerID RunningCount
----------------------------------------------
43860 1 1
44501 1 1
45283 1 1
46042 1 1
46976 2 5
47997 2 5
49054 2 5
50216 2 5
51728 2 5
57044 2 5
63198 2 5
69488 2 5
44124 3 13
[more rows]
Just like with the other ranking functions, RANK()
needs the aid of a CTE to work with aggregates. Consider this query
that ranks the customers from highest to lowest by total sales. We have
to use a CTE to perform the aggregate first and then rank over the
newly created aggregate expression:
Listing 6. Ranked aggregates
USE AdventureWorks GO --rank by totaldue, summed --need a CTE to do a sum --so this example will have a --customerID summed with all of --their orders WITH CustomerSum AS ( SELECT CustomerID, SUM(totaldue) AS totalamt FROM Sales.SalesOrderHeader GROUP BY CustomerID ) SELECT *, RANK() OVER (ORDER BY totalamt DESC) AS Rank FROM CustomerSum
|
The results are shown here. Notice that customer 697 is in first place:
CustomerID totalamt Rank
-------------------------------------
697 1179475.8399 1
678 1164761.0209 2
170 1134747.4413 3
328 1084439.0265 4
514 1074154.3035 5
[more rows]
It
is important to remember that the ranking functions provided by SQL
Server 2005 are valid only for the scope of the running query. If the
underlying data changes and then you run the same query again, you will
get different results. For example, let’s modify a record from customer
678:
Listing 7. RANK() results change with underlying data changes
USE AdventureWorks GO UPDATE Sales.SalesOrderDetail SET OrderQty = 13 --3 is the orig amt WHERE SalesOrderID = 44086 AND SalesOrderDetailID = 1542
|
The results are shown here. Notice that customer 678 is now the top customer:
CustomerID totalamt Rank
-------------------------------------
678 1185160.9609 1
697 1179475.8399 2
170 1134747.4413 3
328 1084439.0265 4
514 1074154.3035 5
DENSE_RANK() and NTILE(n)
The last two ranking functions we will cover are DENSE_RANK() and NTILE(n). DENSE_RANK() works exactly like RANK(), except it increments only on distinct rank changes.
Listing 8. Ranking with DENSE_RANK()
USE AdventureWorks GO SELECT SalesOrderID, CustomerID, DENSE_RANK() OVER (ORDER BY CustomerID) AS RunningCount FROM Sales.SalesOrderHeader WHERE SalesOrderID > 10000
|
The results are shown here:
SalesOrderID CustomerID RunningCount
---------------------------------------------
43860 1 1
44501 1 1
45283 1 1
46042 1 1
46976 2 2
47997 2 2
49054 2 2
50216 2 2
51728 2 2
57044 2 2
63198 2 2
69488 2 2
44124 3 3
[more rows]
The following example shows the difference between RANK() and DENSE_RANK().
We will round the customers’ sales to the nearest hundred (because
managers always like to look at whole numbers in their reports!) and
look at the difference when we run into a tie. The code is here:
Listing 9. RANK() versus DENSE_RANK()
USE AdventureWorks GO WITH CustomerSum AS ( SELECT CustomerID, ROUND(CONVERT(int, SUM(totaldue))/100,8) *100 as totalamt FROM Sales.SalesOrderHeader GROUP BY CustomerID ) SELECT *, RANK() OVER (ORDER BY totalamt DESC) AS Rank, DENSE_RANK() OVER (ORDER BY totalamt DESC) AS DenseRank FROM CustomerSum
|
And here are the results:
CustomerID totalamt Rank DenseRank
----------------------------------------------------
678 1185100 1 1
697 1179400 2 2
170 1134700 3 3
328 1084400 4 4
[more rows]
87 213300 170 170
667 210600 171 171
196 207700 172 172
451 206100 173 173
672 206100 173 173
272 05200 175 174
687 205200 175 174
163 204000 177 175
102 203900 178 176
[more rows]
Notice that customers 451 and 672 are tied, with the same total sales amount. They are ranked 173 by both the RANK() and DENSE_RANK()
functions. What happens next is what’s important. Customers 27 and 687
are tied for the next position, and they are both assigned 175 by RANK() and 174 by DENSE_RANK(). Customer 163 is the next non-tie, and it is assigned 177 by RANK() and 175 by DENSE_RANK().
NTILE(n)
divides the returned rows into approximately evenly sized groups, the
number of which you specify as a parameter to the function. It assigns
each member of a group the same number in the result set. A perfect
example of this is the percentile ranking in a college examination or a
road race.
Listing 10. Ranking with NTILE()
SELECT SalesOrderID, CustomerID, NTILE(10000) OVER (ORDER BY CustomerID) AS RunningCount FROM Sales.SalesOrderHeader WHERE SalesOrderID > 10000
|
The results are shown here:
SalesOrderID CustomerID RunningCount
---------------------------------------------
43860 1 1
44501 1 1
45283 1 1
46042 1 1
46976 2 2
47997 2 2
49054 2 2
50216 2 2
51728 2 3
57044 2 3
63198 2 3
69488 2 3
44124 3 4
[more rows]
Using All the Ranking Functions Together
So
far we have looked at the ranking functions in isolation. The ranking
functions are just regular SQL Server expressions, so you can have as
many of them as you want in a single SELECT
statement. We’ll look at one last example that brings these all
together into one SQL statement and shows the differences between the
four ranking functions.
Listing 11. Contrasting SQL Server 2005 ranking functions
--Ranking All USE AdventureWorks GO SELECT SalesOrderID AS OrderID, CustomerID, ROW_NUMBER() OVER(ORDER BY CustomerID) AS RowNum, RANK() OVER (ORDER BY CustomerID) AS Rank, DENSE_RANK() OVER (ORDER BY CustomerID) as DRank, NTILE(10000) OVER (ORDER BY CustomerID) as NTile FROM Sales.SalesOrderHeader WHERE SalesOrderID>10000 ORDER BY CustomerID
|
The results are shown here:
OrderID CustomerID RowNum Rank Drank NTile
--------------------------------------------------------------------
43860 1 1 1 1 1
44501 1 2 1 1 1
45283 1 3 1 1 1
46042 1 4 1 1 1
46976 2 5 5 2 2
47997 2 6 5 2 2
49054 2 7 5 2 2
50216 2 8 5 2 2
51728 2 9 5 2 3
57044 2 10 5 2 3
63198 2 11 5 2 3
69488 2 12 5 2 3
44124 3 13 13 3 4
44791 3 14 13 3 4
[more rows]
Ranking over Groups: PARTITION BY
The
ranking functions can also be combined with windowing functions. A
windowing function divides a result set into equal partitions based on
the values of your PARTITION BY statement in conjunction with the OVER clause in your ranking function. This is like applying a GROUP BY to your ranking function—you get a separate ranking for each partition. The following example uses ROW_NUMBER with PARTITION BY to count the number of orders by order date by salesperson. We do this by using PARTITION BY SalesPersonID OVER OrderDate. You can do this with any of the four ranking functions.
Listing 12. Ranking over groups with PARTITION BY
USE AdventureWorks GO SELECT SalesOrderID, SalesPersonID, OrderDate, ROW_NUMBER() OVER (PARTITION BY SalesPersonID ORDER BY OrderDate) AS OrderRank FROM Sales.SalesOrderHeader WHERE SalesPersonID is not null
|
The results are shown here. You might find that the order of your rows varies slightly from the results shown.
SalesOrderID SalesPersonID OrderDate OrderRank
-----------------------------------------------------------------------------
[more rows]
43659 279 2001-07-01 00:00:00.000 1
43660 279 2001-07-01 00:00:00.000 2
43681 279 2001-07-01 00:00:00.000 3
43684 279 2001-07-01 00:00:00.000 4
43685 279 2001-07-01 00:00:00.000 5
43694 279 2001-07-01 00:00:00.000 6
43695 279 2001-07-01 00:00:00.000 7
43696 279 2001-07-01 00:00:00.000 8
43845 279 2001-08-01 00:00:00.000 9
43861 279 2001-08-01 00:00:00.000 10
[more rows]
48079 287 2002-11-01 00:00:00.000 1
48064 287 2002-11-01 00:00:00.000 2
48057 287 2002-11-01 00:00:00.000 3
47998 287 2002-11-01 00:00:00.000 4
48001 287 2002-11-01 00:00:00.000 5
48014 287 2002-11-01 00:00:00.000 6
47982 287 2002-11-01 00:00:00.000 7
47992 287 2002-11-01 00:00:00.000 8
48390 287 2002-12-01 00:00:00.000 9
48308 287 2002-12-01 00:00:00.000 10
[more rows]
Let’s
partition our ranking function by country. We’ll create a CTE to
aggregate the sales by customer and by country. Then we will apply the
ranking function over the TotalAmt field and the Customer ID field,
partitioned by the CountryName.
Listing 13. Aggregates with PARTITION BY
USE AdventureWorks GO WITH CTETerritory AS ( SELECT cr.Name AS CountryName, CustomerID, SUM(TotalDue) AS TotalAmt FROM Sales.SalesOrderHeader soh INNER JOIN Sales.SalesTerritory ter ON soh.TerritoryID=ter.TerritoryID INNER JOIN Person.CountryRegion cr ON cr.CountryRegionCode=ter.CountryRegionCode GROUP BY cr.Name, CustomerID ) SELECT *, RANK() OVER(PARTITION BY CountryName ORDER BY TotalAmt, CustomerID DESC) As Rank FROM CTETerritory
|
The results look like this:
CountryName CustomerID TotalAmt Rank
-----------------------------------------------------
Australia 29083 4.409 1
Australia 29061 4.409 2
Australia 29290 5.514 3
Australia 29287 5.514 4
Australia 28924 5.514 5
[more rows]
Canada 29267 5.514 1
Canada 29230 5.514 2
Canada 28248 5.514 3
Canada 27628 5.514 4
Canada 27414 5.514 5
[more rows]
France 24538 4.409 1
France 24535 4.409 2
France 23623 4.409 3
France 23611 4.409 4
France 20961 4.409 5
[more rows]
PARTITION BY supports other SQL Server aggregate functions, including MIN and MAX
as well as your own scalar functions. You can apply your aggregate
function in the same way that you apply the ranking functions, with a PARTITION BY statement. Let’s apply this technique to the current sample by adding a column to our results set using the AVG function. We will get the same results but with an additional column showing the average by country:
Listing 14. Using AVG with PARTITION BY
USE AdventureWorks GO WITH CTETerritory AS ( SELECT cr.Name AS CountryName, CustomerID, SUM(TotalDue) AS TotalAmt FROM Sales.SalesOrderHeader soh INNER JOIN Sales.SalesTerritory ter ON soh.TerritoryID=ter.TerritoryID INNER JOIN Person.CountryRegion cr ON cr.CountryRegionCode=ter.CountryRegionCode GROUP BY cr.Name, CustomerID ) SELECT *, RANK() OVER(PARTITION BY CountryName ORDER BY TotalAmt, CustomerID DESC) As Rank, AVG(TotalAmt) OVER(PARTITION BY CountryName) AS Average FROM CTETerritory
|
Here are the results:
CountryName CustomerID TotalAmt Rank Average
-------------------------------------------------------------------
Australia 29083 4.409 1 3364.8318
Australia 29061 4.409 2 3364.8318
Australia 29290 5.514 3 3364.8318
[more rows]
Canada 29267 5.514 1 12824.756
Canada 29230 5.514 2 12824.756
Canada 28248 5.514 3 12824.756
[more rows]