T-SQL has a number of built-in aggregates, such as SUM, AVG, and MAX, but that set of built-in functions is not always sufficient. Luckily, the SQL
CLR features in SQL Server 2012 allow us to implement user-defined
aggregates in .NET code and use them from T-SQL. User-defined
aggregates can be implemented only in SQL CLR code; they have no T-SQL
equivalent. Because aggregates tend to perform computation only, they
provide an excellent use case for SQL CLR code. As it turns out, they
are also quite easy to build.
At
first, aggregates look and feel like functions because they accept and
return values. In fact, if you use an aggregate in a non-data-querying
T-SQL call (for example, SELECT SUM(8)),
you are actually treating the aggregate as if it were a function. The
thing to remember is that the argument passed to an aggregate is
typically a column, and so each discrete value for that column—for
whichever WHERE, HAVING, ORDER BY, and/or GROUP BY
scope applies—gets passed in to the aggregate. It is the aggregate’s
job to update a variable, which eventually will be the return value, as
each discrete value is passed to it.
CLR aggregates require you to apply the SqlUserDefinedAggregate attribute to them. The SqlUserDefinedAggregate attribute accepts a number of parameters, but all of them are optional except Format. In our example, we will use the value Format.Native for the Format
parameter.
Note
User-defined aggregates greater than 8,000 bytes in size require use of the Format.UserDefined value for the Format parameter, rather than the Format.Native value used in this example.
Unlike the SqlProcedure, SqlFunction, and SqlTrigger attributes, the SqlUserDefinedAggregate attribute is required
by SQL Server for your class to be eligible for use as an aggregate.
Visual Studio SQL Server Database Projects do use this attribute for
deployment, and the attribute is included in the aggregate template,
but it also must be used in generic Class Library project code in order
for T-SQL registration of the aggregate to succeed.
Aggregate classes must have four methods: Init, Accumulate, Merge, and Terminate. The Init
method is used to clean up as necessary from previous uses of this
aggregate instance, allowing it to re-start a new aggregate
computation. The Accumulate method accepts a SQL type for processing. The Terminate method returns a SQL type representing the result. And finally, the Merge method accepts an object typed as the aggregate class itself so that it can be combined with the executing instance.
The Accumulate method handles the processing of a discrete value into the aggregate value, and the Terminate method returns the final aggregated value after all discrete values have been processed. The Init method provides startup code, typically initializing a class-level private variable that will be used by the AccumulateMerge method is called in a specific multithreading scenario, which we will describe later in this section. method. The
Just
to be perfectly clear, your aggregate class will not implement an
interface to supply these methods; you must create them to meet what we
might term the “conventions” that are expected of SQL
CLR aggregate classes (as opposed to a “contract” with which they must
comply). When you develop your code in a Visual Studio SQL Server
Database Project, the Aggregate template includes stubs for these four
methods as well as the proper application of the SqlUserDefinedAggregate attribute.
Creating your own aggregates
is fairly straightforward, but thinking through aggregation logic can
be a bit confusing at first. Imagine that you want to create a special
aggregate named BakersDozen
that increments its accumulated value by 1 for every discrete value’s
multiple of 12 (much as a baker, in simpler times, would throw in a
free 13th donut when you ordered 12). By using what you now know about
CLR aggregates and combining that with integer division, you can
implement a BakersDozen aggregate quite easily. Example 1, the code from struct BakersDozen in aggTest.cs in the sample project, contains the entire implementation of the aggregate BakersDozen.
Example 1. BakersDozen struct from aggTest.cs.
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)]
public struct BakersDozen
{
private SqlInt32 DonutCount;
public void Init()
{
DonutCount = 0;
}
public void Accumulate(SqlInt32 Value)
{
DonutCount += Value + ((Int32)Value) / 12;
}
public void Merge(BakersDozen Group)
{
DonutCount += Group.DonutCount;
}
public SqlInt32 Terminate()
{
return DonutCount;
}
}
The code here is fairly straightforward. The private variable DonutCount is used to track the BakersDozen-adjusted
sum of items ordered, adding the actual items ordered value and
incrementing the running total by the integer quotient of the ordered
value divided by 12. By this logic, bonus items are added only when an
individual value equals or exceeds a multiple of 12. Twelve includes a
full dozen, and so would 13. Twenty-four includes two dozen, and so
would 27. Two individual orders of 6 items each would not generate any
bonus items because a minimum of 12 items must be contained in a
discrete value to qualify for a bonus.
To deploy the aggregate,
use attribute-based deployment in the SQL Server Database Project or
the following command for the Class Library version:
CREATE AGGREGATE BakersDozen (@input int)
RETURNS int
EXTERNAL NAME SQLCLRDemo.BakersDozen
Notice
that no method name is specified because the aggregate is implemented
by an entire class rather than an individual function. Notice also that
the return value data type must be declared as the data type of the
values this aggregate function will process. The @input parameter acts as a placeholder, and its name is inconsequential. Note that aggregates can be built on SQL CLR types (covered in the next section) as well as SQL scalar types.
To see the aggregate work, first run the CreateTblAggregateTest.sql script file in the SSMS sample project to create a table named AggregateTest with columns OrderItemId, OrderId, and ItemsOrdered and several rows of data, as shown here:
CREATE TABLE tblAggregateTest(
[OrderItemId] [int] IDENTITY(1,1) NOT NULL,
[OrderId] [int] NULL,
[ItemsOrdered] [int] NOT NULL
)
GO
INSERT INTO tblAggregateTest VALUES (1,2)
INSERT INTO tblAggregateTest VALUES (1,4)
INSERT INTO tblAggregateTest VALUES (2,1)
INSERT INTO tblAggregateTest VALUES (2,12)
INSERT INTO tblAggregateTest VALUES (3,3)
INSERT INTO tblAggregateTest VALUES (3,2)
With such a table built, use the following T-SQL DML statement in a query window to test the aggregate function:
SELECT
OrderId,
SUM(ItemsOrdered) AS SUM,
dbo.BakersDozen(ItemsOrdered) AS BakersDozen
FROM tblAggregateTest
GROUP BY OrderId
For each distinct value in the OrderId column, this query effectively uses the CLR code under the following algorithm:
Call Init().
Call Accumulate once for each row with the same OrderId value, passing it that row’s value of the ItemsOrdered column.
Call Terminate to retrieve the aggregated value that the query will return to the client.
The results should be as follows:
OrderId SUM BakersDozen
----------- ----------- -----------
1 6 6
2 13 14
3 5 5
By including the built-in T-SQL aggregate SUM in the query, you can see how many bonus items were added. In this case, for OrderId 2, a single bonus item was added, due to one row in the table with the following values:
OrderItemId OrderId ItemsOrdered
----------- ----------- ------------
4 2 12
All the other rows contain ItemsOrdered values of less than 12, so no bonus items were added for them.
Because
SQL Server sometimes segments the work required to satisfy a query over
multiple threads, the query processor might need to execute your
aggregate function multiple times for a single query and then merge the results together. For your aggregate to work properly in this scenario, you must implement a Merge method.
The Merge
method takes the result of one thread’s aggregation and merges it into
the current thread’s aggregation. The calculation required to do this
could be complicated for some aggregates; in our case, you simply added the DonutCount value from the secondary thread’s aggregate (accessible via the Group input parameter) to your own. There is no need to add bonus items because they would have been added in the individual Accumulate
calls on the other thread. Simple addition is all that’s required. An
aggregate that calculated some type of average, for example, would
require more complex Merge code.
Don’t forget that aggregates
can be passed scalar values and can be used from T-SQL without
referencing a row set. Your aggregate must accommodate this scenario,
even if it seems impractical. In the case of BakersDozen, single scalar values are easily handled. To see for yourself, try executing the following table-less T-SQL query:
SELECT dbo.BakersDozen(13)
You will see that it returns the value 14.
Note
The TestAggregate.sql script file in the SSMS project contains both aggregate-testing queries.
Aggregates are an excellent use of SQL
CLR programming. Because they are passed data values to be processed,
they typically perform only computational tasks and no data access of
their own. They consist of compiled CLR code, so they perform well, and
unlike stored
procedures, triggers, and functions, they cannot be implemented at all
in T-SQL. That said, you must still make your aggregate code,
especially in the Accumulate
method, as “lean and mean” as possible. Injecting your own code into
the query processor’s stream of work is an honor, a privilege, and a
significant responsibility. Take that responsibility seriously, and
make sure that your code is as low-impact as possible.
Note
More Info SQL Server 2012 provides (as did SQL Server 2008R2 and 2008 before it) support for aggregates that accept multiple input parameters. Implementing such aggregate functions involves accepting two or more parameters in the Accumulate method, processing them in your code, and then declaring them in the CREATE AGGREGATE T-SQL statement, if you are not using SSDT auto-deployment. Aggregates that accept multiple input parameters still must, of course, return a single value.