Having created a shard and reached the point of being
able to run queries and add data, you can begin to think about
higher-level issues: how to handle exceptions, manage performance,
control transactions, and more.
1. Managing Exceptions
So far, you've learned
the basic principles of the sample shard library. You saw how to
select, insert, update, and delete records in various ways through the
methods provided by the library. Let's discuss how you can manage
exceptions that the shard may throw at you.
The current library
doesn't handle rollbacks, but it may throw exceptions that your code
needs to capture. In the earlier insert example, for instance, all the records were inserted except Jim Nastic: that name was too long for its SqlParameter object, so the insert threw a "Value Would Be Truncated" exception.
The library handles exceptions through the AggregateException
class provided by the TPL; this class holds a collection of exceptions.
This is necessary because the library executes database calls in
parallel, so more than one exception may occur at the same time. You
need to aggregate these exceptions and return them to the client for
further processing.
For example, the shard library's ExecuteSingleNonQuery method takes a ConcurrentQueue<Exception>
parameter, a thread-safe queue in which exceptions are stored. Because
the queue is thread-safe, all running threads can add new exceptions to
it without running into concurrency issues. The following code shows
that if an exception is caught in the ExecuteSingleNonQuery method, the
code adds the exception to the queue on line 14. Also, as a convention,
the exception is rethrown if the queue isn't provided (line 16):
1) private static long ExecuteSingleNonQuery(
2) SqlCommand command,
3) SqlConnection connectionToUse,
4) System.Collections.Concurrent.ConcurrentQueue<Exception> exceptions
5) )
6) {
7) try
8) {
9) // ... execute the command on connectionToUse and return the affected row count
10) }
11) catch (Exception ex)
12) {
13) if (exceptions != null)
14) exceptions.Enqueue(ex);
15) else
16) throw;
17) }
18) }
The following code shows the core of the ExecuteShardNonQuery method, which calls the ExecuteSingleNonQuery method just described. Line 1 creates the exception queue (a ConcurrentQueue<Exception>), which is passed as an argument to ExecuteSingleNonQuery on line 5 (the res accumulator and the alock synchronization object are declared earlier in the method and aren't shown here).
After the parallel execution of the database calls is complete, the
code checks whether the exception queue is empty. If it isn't empty, an AggregateException is thrown that contains the collection of exceptions stored in the exception queue (lines 13 and 14):
1) var exceptions = new System.Collections.Concurrent.ConcurrentQueue<Exception>();
2)
3) Parallel.ForEach(connections, delegate(SqlConnection c)
4) {
5) long rowsAffected = ExecuteSingleNonQuery(command, c, exceptions);
6)
7) lock (alock)
8) res += rowsAffected;
9)
10) }
11) );
12)
13) if (!exceptions.IsEmpty)
14) throw new AggregateException(exceptions);
As you can see,
managing exceptions can be a bit tricky. However, ConcurrentQueue<Exception> and
AggregateException provide a good mechanism for storing exceptions and returning
them to the client code as a single collection.
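For example, client code could catch the AggregateException and walk its InnerExceptions collection. The following is a minimal sketch, not part of the library itself; it assumes ExecuteShardNonQuery is exposed as an extension method on SqlCommand, and the DELETE statement and table are placeholders, so adjust both to match your actual library and schema:

using System;
using System.Data.SqlClient;

class ShardClientSample
{
    static void Main()
    {
        // Placeholder statement and table; replace with your own schema.
        var command = new SqlCommand("DELETE FROM DOCS WHERE Title = 'Draft'");

        try
        {
            // Hypothetical call site: adjust to the signature your copy of the
            // shard library actually exposes for ExecuteShardNonQuery.
            long rowsAffected = command.ExecuteShardNonQuery();
            Console.WriteLine("Rows affected across all databases: " + rowsAffected);
        }
        catch (AggregateException ae)
        {
            // Each database call that failed contributes one entry to InnerExceptions.
            foreach (Exception inner in ae.InnerExceptions)
                Console.WriteLine("Shard error: " + inner.Message);
        }
    }
}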
2. Managing Performance
So far, you've seen how the
shard library works and how you can use it in your code. But it's
important to keep in mind why you go through all this trouble; after all,
creating a shard library is no trivial task. This shard
library does something important: it allows a client application to scale
parts (or all) of a database horizontally, with the intention of
improving performance, scalability, or both.
What does this mean? It means
the shard library can help an application maintain reasonably consistent
performance as the number of users grows (scalability), or it can help
the application perform faster under a given load (performance). If
you're lucky, the shard library may be able to achieve both. However,
this won't happen without proper planning. The shard library by itself is
only a splitter, in the sense that it spreads calls across multiple
databases.
Shards don't necessarily
help performance; in certain cases, a shard hurts both performance and
scalability. The reason is that a shard imposes an overhead that
wouldn't otherwise exist. Figure 10-9
shows the difference between a standard ADO.NET call selecting records
and the best-case and worst-case scenarios when fetching the same records
from a shard. In the best-case scenario, all records are assumed to be
split across three distinct databases; the shard is able to access all
three databases concurrently, aggregate the three resultsets, and filter
and/or sort the data. The shard must then manage all of the following,
which consumes processing time (a simplified sketch follows the list):
Loops for connecting to the underlying databases
Loops for fetching the data
Data aggregation, sorting, and filtering
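To make the best-case flow more concrete, here is a simplified sketch of that work: connect to each database in parallel, fetch a partial resultset from each one, and then aggregate and sort the merged data. This is an illustration only, not the library's actual implementation; the connection strings, query, and column names are placeholders:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

class ShardFetchSketch
{
    static void Main()
    {
        // Placeholder connection strings; the real library maintains its own
        // list of registered shard connections.
        string[] connectionStrings =
        {
            "connection string to shard database 1",
            "connection string to shard database 2",
            "connection string to shard database 3"
        };

        var merged = new DataTable();
        object mergeLock = new object();

        // Loop over the underlying databases in parallel and fetch each resultset.
        Parallel.ForEach(connectionStrings, connectionString =>
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT Title, AuthorID FROM DOCS", connection))
            using (var adapter = new SqlDataAdapter(command))
            {
                var partial = new DataTable();
                adapter.Fill(partial);

                // Aggregating the partial results must be serialized.
                lock (mergeLock)
                    merged.Merge(partial);
            }
        });

        // Sorting (and filtering, if needed) happens on the aggregated data.
        DataRow[] sorted = merged.Select(string.Empty, "Title ASC");
        Console.WriteLine("Total rows across all shards: " + sorted.Length);
    }
}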
In the worst-case
scenario, these operations can't be executed in parallel and must run
serially. This may be the case if the TPL detects that only a
single processor is available. Finally, you may end up in a situation
that mixes the best and worst cases, where some of the calls can be
made in parallel, but not all.
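If you want to approximate the worst case for testing purposes, you can force the TPL to run the database calls serially. The following variation reuses the variables from the ExecuteShardNonQuery excerpt above (connections, command, exceptions, alock, and res); the MaxDegreeOfParallelism setting is only for measurement, not something you would normally ship:

// Force serial execution to approximate the worst-case scenario.
// In normal operation, the TPL picks the degree of parallelism based on
// the processors it detects.
var serialOptions = new ParallelOptions { MaxDegreeOfParallelism = 1 };

Parallel.ForEach(connections, serialOptions, delegate(SqlConnection c)
{
    // Same per-database work as before; each call now runs one after another.
    long rowsAffected = ExecuteSingleNonQuery(command, c, exceptions);

    lock (alock)
        res += rowsAffected;
});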
Now that all the warnings
are laid out, let's look at a scenario for which a shard makes sense and
probably improves both performance and scalability. Imagine a DOCS table
that contains only two records. The table contains a few fields that
represent document metadata, such as Title and Author ID. However, this
table also contains a large field: a varbinary column called Document that holds a PDF file. Each PDF file is a few megabytes in size. Figure 10-10 shows the records returned from the table. Because this database is loaded in SQL Azure, the SELECT * FROM DOCS
statement returns a few megabytes of data over an SSL-encrypted link. The
execution of this statement takes about 2.5 seconds on average, or
roughly 1.25 seconds per record.
Both records come from the same
database; you can see this by looking at the database GUID, which is
identical for both records. However, if you move the second record to
another SQL Azure database, the average execution time drops to about
1.8 seconds. Figure 10-11
shows the result of the same statement, which executed in 1.4 seconds
(you can see that the database GUIDs are now different). This is roughly
half the execution time of the first result.
You can execute this statement much more quickly still, because almost the entire execution time is spent returning the Document field. Figure 10-12
shows that returning all the fields from both databases, without
the Document field, takes only 103 milliseconds. This shows that using a
shard can provide performance benefits even with its processing
overhead; however, this won't always be the case. Be sure to carefully
evaluate your database design to determine which tables, if any, can
take advantage of parallel execution.
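If you want to reproduce this kind of measurement, a simple Stopwatch comparison of the two statements is enough. The sketch below uses plain ADO.NET against a single connection string; the connection string and the ID, Title, and AuthorID column names are assumptions based on the DOCS table described above:

using System;
using System.Data.SqlClient;
using System.Diagnostics;

class DocsTimingSketch
{
    static long TimeQuery(string connectionString, string sql)
    {
        var watch = Stopwatch.StartNew();

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                // Read every row so the payload actually travels over the wire.
                while (reader.Read()) { }
            }
        }

        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        string connectionString = "connection string to the SQL Azure database";

        // Pulling the large varbinary column dominates the execution time.
        Console.WriteLine("With Document column:    " +
            TimeQuery(connectionString, "SELECT * FROM DOCS") + " ms");

        // Listing the metadata columns explicitly avoids shipping the PDF payload.
        Console.WriteLine("Without Document column: " +
            TimeQuery(connectionString, "SELECT ID, Title, AuthorID FROM DOCS") + " ms");
    }
}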