DATABASE

Exploring the T-SQL Enhancements in SQL Server 2005 : Ranking Functions

10/12/2010 9:39:26 AM
Databases hold data. Users sometimes want to perform simple calculations or algorithms on that data to rank the results in a specific order—like gold, silver, and bronze medals in the Olympics or the top 10 customers by region. SQL Server 2005 introduces functionality for using ranking expressions with your result set. You can select a number of ranking algorithms, which are then applied to a column that you specify. This comes in handy in .NET Framework applications for paging and sorting in a grid, as well as in many other scenarios.

ROW_NUMBER()

The most basic ranking function is ROW_NUMBER(). It returns a column as an expression that contains the row’s number in the result set. This number is used only in the context of the result set; if the result changes, the ROW_NUMBER() changes. The ROW_NUMBER() expression takes an ORDER BY statement with the column you want to use for the row count and the new OVER operator, which links the ORDER BY to the specific ranking function you are using. The ORDER BYOVER clause replaces an ORDER BY at the end of the SQL statement. in the

The following simple example gives a row number to each row in the result set, ordering by SalesOrderID:

Listing 1. Row number ranking
USE AdventureWorks
GO
SELECT SalesOrderID, CustomerID,
ROW_NUMBER() OVER (ORDER BY SalesOrderID) AS RunningCount
FROM Sales.SalesOrderHeader
WHERE SalesOrderID>10000

Results are shown here:

SalesOrderID       CustomerID      RunningCount
-------------------------------------
43659 676 1
43660 117 2
43661 442 3
43662 227 4
43663 510 5
43664 397 6
43665 146 7
43666 511 8
43667 646 9
[more rows]

ORDER BY Options

The ranking functions order your result set by the fields specified in the ORDER BY statement contained in the OVER clause. Alternatively, you can include an additional ORDER BY statement in your result set; this optional statement is distinct from the ORDER BY clause in the OVERROW_NUMBER() function’s results are displayed in the order they are determined by in the additional ORDER BY statement, not by the ORDER BY statement contained within the OVER clause. The results can, therefore, be confusing, as in this example: expression. SQL Server 2005 allows this, but if you choose this option, the

--Row_Number using a unique value, different order by
SELECT SalesOrderID, CustomerID,
ROW_NUMBER() OVER (ORDER BY SalesOrderID) AS RunningCount
FROM Sales.SalesOrderHeader
WHERE SalesOrderID>10000
ORDER BY CustomerID --Different ORDER BY than in Row_Number


Here are the results:

SalesOrderID    CustomerID    RunningCount
------------------------------------------
43860 1 202
44501 1 843
45283 1 1625
46042 1 2384
46976 2 3318
47997 2 4339
49054 2 5396
50216 2 6558
51728 2 8070
57044 2 13386
63198 2 19540
69488 2 25830
65310 3 21652
71889 3 28231
53616 3 9958
[more rows]

As you can see, if you expect the results to be sorted by the OVER clause’s ORDER BY statement, you’d expect results ranked by SalesOrderID, when in fact they’re ordered by CustomerID.

If you choose the ROW_NUMBER() function to run against a non-unique column that contains multiple copies of the same value (also known as “ties,” such as the same amount of items sold and the same time in a race), ROW_NUMBER( ) breaks the tie and still produces a running count so that no rows have the same number. For example, CustomerID can repeat in this example, which will generate several ties; SQL Server simply increases the running count for each row, regardless of how many ties exist:

Listing 2. Row number ranking with ties
SELECT SalesOrderID, CustomerID,
ROW_NUMBER() OVER (ORDER BY CustomerID) AS RunningCount
FROM Sales.SalesOrderHeader
WHERE SalesOrderID>10000

The results are shown here:

SalesOrderID       CustomerID       RunningCount
------------------------------------------------
43860 1 1
44501 1 2
45283 1 3
46042 1 4
46976 2 5
47997 2 6
49054 2 7
50216 2 8
51728 2 9
57044 2 10
63198 2 11
69488 2 12
44124 3 13
[more rows]

Grouping and Filtering with a Row_Number

When you want to include a GROUP BY function in your query, ranking functions do not work. The easy way around this limitation is to create your GROUP BY in a CTE and then perform your ranking on the results, as shown here:

Listing 3. Grouping by row number
USE AdventureWorks
GO
--rank by totaldue, summed
--need a CTE to do the sum
--so this example will have a
--customerID summed with all of
--their orders
WITH CustomerSum
AS
(
SELECT CustomerID, SUM(totaldue) AS totalamt
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
--this appends a row_number to the end of the resultset
SELECT *,
ROW_NUMBER() OVER (ORDER BY totalamt DESC) AS RowNumber
FROM CustomerSum

Here are the results:

CustomerID      totalamt        RowNumber
-----------------------------------------
678 1179857.4657 1
697 1179475.8399 2
170 1134747.4413 3
328 1084439.0265 4
514 1074154.3035 5
155 1045197.0498 6
72 1005539.7181 7
[more rows]

To filter by a ROW_NUMBER, you have to put the ROW_NUMBER function in a CTE.

Listing 4. Filtering by row number
USE AdventureWorks
GO
--use a common table expression if you want
--to filter by one of the rows that contain a
--ranking function since ranking functions
--are not allowed in where or having clauses
WITH NumberRows
AS
(
SELECT SalesOrderID, CustomerID,
ROW_NUMBER() OVER (ORDER BY SalesOrderID) AS RowNumber
FROM Sales.SalesOrderHeader
)

SELECT * FROM NumberRows
WHERE RowNumber BETWEEN 100 AND 200

Here are the results:

SalesOrderID       CustomerID      RowNumber
--------------------------------------------
43759 13257 100
43760 16352 101
43761 16493 102
43762 27578 103
[more rows]

RANK()

The ranking function you will probably use the most is RANK(). RANK() ranks the data in the ORDER BY clause in the order you specify. RANK() is syntactically exactly like ROW_NUMBER() but with true ranking results. It works just like in the Olympics, when two people tie for the gold medal—the next rank is bronze. Similarly, with the RANK() function, if four rows are tied with the value ‘1’, the next row value for the rank column will be ‘5’. Consider this code:

Listing 5. The RANK() function
USE AdventureWorks
GO
SELECT SalesOrderID, CustomerID,
RANK() OVER (ORDER BY CustomerID) AS RunningCount
FROM Sales.SalesOrderHeader
WHERE SalesOrderID > 10000

Here are the results:

SalesOrderID      CustomerID      RunningCount
----------------------------------------------
43860 1 1
44501 1 1
45283 1 1
46042 1 1
46976 2 5
47997 2 5
49054 2 5
50216 2 5
51728 2 5
57044 2 5
63198 2 5
69488 2 5
44124 3 13
[more rows]

Just like with the other ranking functions, RANK() needs the aid of a CTE to work with aggregates. Consider this query that ranks the customers from highest to lowest by total sales. We have to use a CTE to perform the aggregate first and then rank over the newly created aggregate expression:

Listing 6. Ranked aggregates
USE AdventureWorks
GO
--rank by totaldue, summed
--need a CTE to do a sum
--so this example will have a
--customerID summed with all of
--their orders
WITH CustomerSum
AS
(
SELECT CustomerID, SUM(totaldue) AS totalamt
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
SELECT *,
RANK() OVER (ORDER BY totalamt DESC) AS Rank
FROM CustomerSum

The results are shown here. Notice that customer 697 is in first place:

CustomerID     totalamt         Rank
-------------------------------------
697 1179475.8399 1
678 1164761.0209 2
170 1134747.4413 3
328 1084439.0265 4
514 1074154.3035 5
[more rows]

It is important to remember that the ranking functions provided by SQL Server 2005 are valid only for the scope of the running query. If the underlying data changes and then you run the same query again, you will get different results. For example, let’s modify a record from customer 678:

Listing 7. RANK() results change with underlying data changes
USE AdventureWorks
GO
UPDATE Sales.SalesOrderDetail
SET OrderQty = 13 --3 is the orig amt
WHERE SalesOrderID = 44086 AND SalesOrderDetailID = 1542

The results are shown here. Notice that customer 678 is now the top customer:

CustomerID      totalamt         Rank
-------------------------------------
678 1185160.9609 1
697 1179475.8399 2
170 1134747.4413 3
328 1084439.0265 4
514 1074154.3035 5

DENSE_RANK() and NTILE(n)

The last two ranking functions we will cover are DENSE_RANK() and NTILE(n). DENSE_RANK() works exactly like RANK(), except it increments only on distinct rank changes.

Listing 8. Ranking with DENSE_RANK()
USE AdventureWorks
GO
SELECT SalesOrderID, CustomerID,
DENSE_RANK() OVER (ORDER BY CustomerID) AS RunningCount
FROM Sales.SalesOrderHeader
WHERE SalesOrderID > 10000

The results are shown here:

SalesOrderID     CustomerID      RunningCount
---------------------------------------------
43860 1 1
44501 1 1
45283 1 1
46042 1 1
46976 2 2
47997 2 2
49054 2 2
50216 2 2
51728 2 2
57044 2 2
63198 2 2
69488 2 2
44124 3 3
[more rows]

The following example shows the difference between RANK() and DENSE_RANK(). We will round the customers’ sales to the nearest hundred (because managers always like to look at whole numbers in their reports!) and look at the difference when we run into a tie. The code is here:

Listing 9. RANK() versus DENSE_RANK()
USE AdventureWorks
GO
WITH CustomerSum
AS
(
SELECT CustomerID, ROUND(CONVERT(int, SUM(totaldue))/100,8) *100 as totalamt
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
)
SELECT *,
RANK() OVER (ORDER BY totalamt DESC) AS Rank,
DENSE_RANK() OVER (ORDER BY totalamt DESC) AS DenseRank
FROM CustomerSum


And here are the results:

CustomerID       totalamt       Rank       DenseRank
----------------------------------------------------
678 1185100 1 1
697 1179400 2 2
170 1134700 3 3
328 1084400 4 4
[more rows]
87 213300 170 170
667 210600 171 171
196 207700 172 172
451 206100 173 173
672 206100 173 173
272 05200 175 174
687 205200 175 174
163 204000 177 175
102 203900 178 176
[more rows]

Notice that customers 451 and 672 are tied, with the same total sales amount. They are ranked 173 by both the RANK() and DENSE_RANK() functions. What happens next is what’s important. Customers 27 and 687 are tied for the next position, and they are both assigned 175 by RANK() and 174 by DENSE_RANK(). Customer 163 is the next non-tie, and it is assigned 177 by RANK() and 175 by DENSE_RANK().

NTILE(n) divides the returned rows into approximately evenly sized groups, the number of which you specify as a parameter to the function. It assigns each member of a group the same number in the result set. A perfect example of this is the percentile ranking in a college examination or a road race.

Listing 10. Ranking with NTILE()
SELECT SalesOrderID, CustomerID,
NTILE(10000) OVER (ORDER BY CustomerID) AS RunningCount
FROM Sales.SalesOrderHeader
WHERE SalesOrderID > 10000

The results are shown here:

SalesOrderID      CustomerID     RunningCount
---------------------------------------------
43860 1 1
44501 1 1
45283 1 1
46042 1 1
46976 2 2
47997 2 2
49054 2 2
50216 2 2
51728 2 3
57044 2 3
63198 2 3
69488 2 3
44124 3 4
[more rows]

Using All the Ranking Functions Together

So far we have looked at the ranking functions in isolation. The ranking functions are just regular SQL Server expressions, so you can have as many of them as you want in a single SELECT statement. We’ll look at one last example that brings these all together into one SQL statement and shows the differences between the four ranking functions.

Listing 11. Contrasting SQL Server 2005 ranking functions
--Ranking All
USE AdventureWorks
GO
SELECT SalesOrderID AS OrderID, CustomerID,
ROW_NUMBER() OVER(ORDER BY CustomerID) AS RowNum,
RANK() OVER (ORDER BY CustomerID) AS Rank,
DENSE_RANK() OVER (ORDER BY CustomerID) as DRank,
NTILE(10000) OVER (ORDER BY CustomerID) as NTile
FROM Sales.SalesOrderHeader
WHERE SalesOrderID>10000
ORDER BY CustomerID

The results are shown here:

OrderID      CustomerID      RowNum      Rank      Drank      NTile
--------------------------------------------------------------------
43860 1 1 1 1 1
44501 1 2 1 1 1
45283 1 3 1 1 1
46042 1 4 1 1 1
46976 2 5 5 2 2
47997 2 6 5 2 2
49054 2 7 5 2 2
50216 2 8 5 2 2
51728 2 9 5 2 3
57044 2 10 5 2 3
63198 2 11 5 2 3
69488 2 12 5 2 3
44124 3 13 13 3 4
44791 3 14 13 3 4
[more rows]

Ranking over Groups: PARTITION BY

The ranking functions can also be combined with windowing functions. A windowing function divides a result set into equal partitions based on the values of your PARTITION BY statement in conjunction with the OVER clause in your ranking function. This is like applying a GROUP BY to your ranking function—you get a separate ranking for each partition. The following example uses ROW_NUMBER with PARTITION BY to count the number of orders by order date by salesperson. We do this by using PARTITION BY SalesPersonID OVER OrderDate. You can do this with any of the four ranking functions.

Listing 12. Ranking over groups with PARTITION BY
USE AdventureWorks
GO
SELECT SalesOrderID, SalesPersonID, OrderDate,
ROW_NUMBER() OVER (PARTITION BY SalesPersonID ORDER BY OrderDate) AS OrderRank
FROM Sales.SalesOrderHeader
WHERE SalesPersonID is not null


The results are shown here. You might find that the order of your rows varies slightly from the results shown.

SalesOrderID       SalesPersonID       OrderDate                    OrderRank
-----------------------------------------------------------------------------
[more rows]
43659 279 2001-07-01 00:00:00.000 1
43660 279 2001-07-01 00:00:00.000 2
43681 279 2001-07-01 00:00:00.000 3
43684 279 2001-07-01 00:00:00.000 4
43685 279 2001-07-01 00:00:00.000 5
43694 279 2001-07-01 00:00:00.000 6
43695 279 2001-07-01 00:00:00.000 7
43696 279 2001-07-01 00:00:00.000 8
43845 279 2001-08-01 00:00:00.000 9
43861 279 2001-08-01 00:00:00.000 10
[more rows]
48079 287 2002-11-01 00:00:00.000 1
48064 287 2002-11-01 00:00:00.000 2
48057 287 2002-11-01 00:00:00.000 3
47998 287 2002-11-01 00:00:00.000 4
48001 287 2002-11-01 00:00:00.000 5
48014 287 2002-11-01 00:00:00.000 6
47982 287 2002-11-01 00:00:00.000 7
47992 287 2002-11-01 00:00:00.000 8
48390 287 2002-12-01 00:00:00.000 9
48308 287 2002-12-01 00:00:00.000 10
[more rows]


Let’s partition our ranking function by country. We’ll create a CTE to aggregate the sales by customer and by country. Then we will apply the ranking function over the TotalAmt field and the Customer ID field, partitioned by the CountryName.

Listing 13. Aggregates with PARTITION BY
USE AdventureWorks
GO
WITH CTETerritory
AS
(
SELECT cr.Name AS CountryName, CustomerID, SUM(TotalDue) AS TotalAmt
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory ter
ON soh.TerritoryID=ter.TerritoryID
INNER JOIN Person.CountryRegion cr
ON cr.CountryRegionCode=ter.CountryRegionCode
GROUP BY cr.Name, CustomerID
)
SELECT *, RANK() OVER(PARTITION BY CountryName ORDER BY TotalAmt, CustomerID DESC)
As Rank
FROM CTETerritory


The results look like this:

CountryName       CustomerID       TotalAmt      Rank
-----------------------------------------------------
Australia 29083 4.409 1
Australia 29061 4.409 2
Australia 29290 5.514 3
Australia 29287 5.514 4
Australia 28924 5.514 5
[more rows]
Canada 29267 5.514 1
Canada 29230 5.514 2
Canada 28248 5.514 3
Canada 27628 5.514 4
Canada 27414 5.514 5
[more rows]
France 24538 4.409 1
France 24535 4.409 2
France 23623 4.409 3
France 23611 4.409 4
France 20961 4.409 5
[more rows]

PARTITION BY supports other SQL Server aggregate functions, including MIN and MAX as well as your own scalar functions. You can apply your aggregate function in the same way that you apply the ranking functions, with a PARTITION BY statement. Let’s apply this technique to the current sample by adding a column to our results set using the AVG function. We will get the same results but with an additional column showing the average by country:

Listing 14. Using AVG with PARTITION BY
USE AdventureWorks
GO
WITH CTETerritory
AS
(
SELECT cr.Name AS CountryName, CustomerID, SUM(TotalDue) AS TotalAmt
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesTerritory ter
ON soh.TerritoryID=ter.TerritoryID
INNER JOIN Person.CountryRegion cr
ON cr.CountryRegionCode=ter.CountryRegionCode
GROUP BY cr.Name, CustomerID
)
SELECT *, RANK() OVER(PARTITION BY CountryName ORDER BY TotalAmt, CustomerID DESC)
As Rank,
AVG(TotalAmt) OVER(PARTITION BY CountryName) AS Average
FROM CTETerritory


Here are the results:

CountryName       CustomerID      TotalAmt      Rank      Average
-------------------------------------------------------------------
Australia 29083 4.409 1 3364.8318
Australia 29061 4.409 2 3364.8318
Australia 29290 5.514 3 3364.8318
[more rows]
Canada 29267 5.514 1 12824.756
Canada 29230 5.514 2 12824.756
Canada 28248 5.514 3 12824.756
[more rows]
Other  
 
Video
Top 10
SG50 Ferrari F12berlinetta : Prancing Horse for Lion City's 50th
The latest Audi TT : New angles for TT
Era of million-dollar luxury cars
Game Review : Hearthstone - Blackrock Mountain
Game Review : Battlefield Hardline
Google Chromecast
Keyboards for Apple iPad Air 2 (part 3) - Logitech Ultrathin Keyboard Cover for iPad Air 2
Keyboards for Apple iPad Air 2 (part 2) - Zagg Slim Book for iPad Air 2
Keyboards for Apple iPad Air 2 (part 1) - Belkin Qode Ultimate Pro Keyboard Case for iPad Air 2
Michael Kors Designs Stylish Tech Products for Women
REVIEW
- First look: Apple Watch

- 3 Tips for Maintaining Your Cell Phone Battery (part 1)

- 3 Tips for Maintaining Your Cell Phone Battery (part 2)
Popular Tags
Video Tutorail Microsoft Access Microsoft Excel Microsoft OneNote Microsoft PowerPoint Microsoft Project Microsoft Visio Microsoft Word Active Directory Exchange Server Sharepoint Sql Server Windows Server 2008 Windows Server 2012 Windows 7 Windows 8 Adobe Flash Professional Dreamweaver Adobe Illustrator Adobe Photoshop CorelDRAW X5 CorelDraw 10 windows Phone 7 windows Phone 8 Iphone