SQL Azure : Managing a Shard (part 1) - Managing Exceptions & Managing Performance

2/5/2011 5:19:17 PM
Having created a shard and reached the point of being able to run queries and add data, you can begin to think about higher-level issues: how to handle exceptions, manage performance, control transactions, and more.

1. Managing Exceptions

So far, you've learned the basic principles of the sample shard library. You saw how to select, insert, update, and delete records in various ways through the methods provided by the library. Let's discuss how you can manage exceptions that the shard may throw at you.

The current library doesn't handle rollbacks, but it may throw exceptions that your code needs to capture. All the records were inserted except Jim Nastic: that name was too long for the SqlParameter object (hence it threw a "Value Would Be Truncated" exception).

The library handles exceptions through the AggregateException class provided by the TPL; this class holds a collection of exceptions. This is necessary because the library executes database calls in parallel. As a result, more than one exception may be taking place at the same time. You need to aggregate these exceptions and return them to the client for further processing.

For example, the shard library's ExecuteSingleNonQuery method takes a ConcurrentQueue<Exception> parameter, which represents an object that stores exceptions. This object is thread-safe, meaning that all running threads can add new exceptions to it safely without running into concurrency issues. The following code shows that if an exception is detected in the ExecuteSingleNonQuery method, the code adds the exception to the queue on line 14. Also, as a convention, the exception is rethrown if the queue isn't provided (line 16):

1) private static long ExecuteSingleNonQuery(
2) SqlCommand command,
3) SqlConnection connectionToUse,
4) System.Collections.Concurrent.ConcurrentQueue<Exception> exceptions
5) )
6) {
7) try
8) {
9) // ...
10) }
11) catch (Exception ex)
12) {
13) if (exceptions != null)
14) exceptions.Enqueue(ex);
15) else
16) throw;
17) }
18) }

The following code shows the ExecuteShardNonQuery method, which calls the ExecuteSingleNonQuery method just described. Line 1 creates the exception queue (ConcurrentQueue), which is passed as a variable to ExecuteSingleNonQuery. After the parallel execution of the database calls is complete, the code checks whether the exception queue is empty. If it isn't empty, an AggregateException is thrown, which contains the collection of exceptions stored in the exception queue (lines 13 and 14):

1) var exceptions = new System.Collections.Concurrent.ConcurrentQueue<Exception>();
3) Parallel.ForEach(connections, delegate(SqlConnection c)
4) {
5) long rowsAffected = ExecuteSingleNonQuery(command, c, exceptions);
7) lock (alock)
8) res += rowsAffected;
10) }
11) );
13) if (!exceptions.IsEmpty)
14) throw new AggregateException(exceptions);

As you can see, managing exceptions can be a bit tricky. However, these exception helper classes provide a good mechanism to store exceptions and return a collection of exceptions that the client code can consume.

2. Managing Performance

So far, you've seen how the shard library works and how you can use it in your code. But it's important to keep in mind why you go through all this trouble—after all, there is nothing trivial in creating a shard library. This shard library does something important: it allows a client application to grow parts (or all) of a database horizontally, with the intention of improving performance, scalability, or both.

What does this mean? It means the shard library can help an application keep its performance characteristics in a somewhat consistent manner as more users use the application (scalability), or it can help an application perform faster under a given load (performance). If you're lucky, the shard library may be able to achieve both. However, this won't happen without proper planning. The shard library by itself is only a splitter, in the sense that it spreads calls to multiple databases.

Shards don't necessarily help performance; in certain cases, a shard hurts both performance and scalability. The reason is that a shard imposes an overhead that wouldn't otherwise exist. Figure 10-9 shows the difference between a standard ADO.NET call selecting records and best case and worst case scenarios when fetching the same records from a shard. In the best case scenario, all records are assumed to be split in three distinct databases; the shard is able to concurrently access all three databases, aggregate the three resultsets, and filter and/or sort the data. The shard must then manage all of the following, which consumes processing time:

  • Loops for connecting to the underlying databases

  • Loops for fetching the data

  • Data aggregation, sorting and filtering

In the worst case scenario, all these operations can't be executed in parallel and require serial execution. This may be the case if the TPL detects that only a single processor is available. Finally, you may end up in a situation that mixes worst and best case scenarios, where some of the calls can be made in parallel, but not all.

Figure 1. Data access overhead comparison

Now that all the warnings are laid out, let's look at a scenario for which a shard makes sense and probably improves both performance and scalability. Imagine a DOC table that contains only two records. The table contains a few fields that represent document metadata, such as Title and Author ID. However, this table also contains a large field: a varbinary column called Document that holds a PDF file. Each PDF file is a few megabytes in size. Figure 10-10 shows the output of the table. Because this database is loaded in SQL Azure, the SELECT * FROM DOCS statement returns a few megabytes of data on an SSL encrypted link. The execution of this statement takes about 2.5 seconds on average, or roughly 1.25 seconds per record.

Figure 2. Sample table containing documents in one database

Both records come from the database; you can see this by looking at the database GUID, which is similar for both records. However, if you move the second record to another SQL Azure database, the average execution time drops to about 1.8 seconds. Figure 10-11 shows the result of the same statement that executed in 1.4 seconds (you can see that the database GUIDs are now different). This is half the execution time of the first result.

Figure 3. Sample table containing documents in two databases

You can execute this statement much more quickly because almost the entire time is spent returning the Document field. Figure 10-12 tells you that returning all the fields against both databases without the Document field takes only 103 milliseconds. This shows that using a shard can provide performance benefits even if there is a processing overhead; however, this may not always be the case. Be sure to carefully evaluate your database design to determine which tables, if any, can take advantage of parallel execution.

Figure 4. Excluding the Document field from the SELECT
  •  SQL Server 2008 : OPENXML, sp_xml_preparedocument, and sp_xml_removedocument
  •  SQL Server 2008 : Retrieving and Transforming XML Data
  •  SQL Azure: Building a Shard (part 4) - Updating and Deleting Records in the Shard & Adding Records to the Shard
  •  SQL Azure: Building a Shard (part 3) - Reading Using the Shard & Caching
  •  SQL Azure: Building a Shard (part 2) - Managing Database Connections
  •  SQL Azure: Building a Shard (part 1) - Designing the Shard Library Object
  •  SQL Azure: Designing for High Performance - General Performance Concepts
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Creating and Altering Tables
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Managing Internationalization Considerations
  •  Programming Microsoft SQL Server 2005 : Deployment (part 2) - Testing Your Stored Procedures
  •  Programming Microsoft SQL Server 2005 : Deployment (part 1) - Deploying Your Assembly
  •  Programming Microsoft SQL Server 2005 : CLR Stored Procedures and Server-Side Data Access
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Controlling Execution Plans (part 3) - Using the Resource Governor
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Controlling Execution Plans (part 2)
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Controlling Execution Plans (part 1)
  •  Reporting Services with SQL Azure : Deploying the Report & Creating a Subreport
  •  Reporting Services with SQL Azure : Creating the Report Design
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Applying Ranking Functions (part 2) - Using RANK, DENSE_RANK and NTILE
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Applying Ranking Functions (part 1) - Using ROW_NUMBER
  •  SQL Server 2008 : Demystifying Data Types - Computed Columns
    Top 10
    Designing a Windows Server 2008 R2 Active Directory : Understanding the Federated Forests Design Model
    Monitoring a SharePoint 2010 Environment : Using the SharePoint Health Analyzer
    Building Your First Windows Phone 7 Application (part 2) - Using Your First Windows Phone Silverlight Controls
    Resolve a Hostname to an IP Address
    Windows System Programming : File Locking
    Managing Windows Firewall in Windows Vista
    Windows 7 : General Maintenance Tools (part 1) - Updating Your Computer
    Windows Server 2008 : Installing and Configuring Websites (part 2) - Configuring IIS 7.5 Website Properties
    Programming with DirectX : Game Math - Bounding Geometry (part 2) - Bounding Spheres & Bounding Hierarchies
    iPhone 3D Programming : Image-Processing Example: Bloom
    Most View
    iPad SDK : Popovers - The Font Size Popover
    Windows Phone 7 Development : Using GeoCoordinateWatcher and the Bing Maps Control to Track Your Movements
    Hacking :System Daemons
    Programming Microsoft SQL Server 2005 : Querying XML Data Using XQuery (part 1) - XQuery Defined
    iPhone Application Development : Basic User Input and Output
    Infrastructure Security: The Application Level
    Assembly vs. C
    Programming with DirectX : Shading and Surfaces - Types of Textures
    SQL Server 2008 : Advanced Stored Procedure Programming and Optimization - Using Cursors in Stored Procedures
    Backing Up and Restoring a SharePoint Environment : Using the Recycle Bin for Recovery
    Setting up Joomla! as a blog
    Caching User Controls
    Excel Programmer : Change Recorded Code
    Rollout Strategy in Group Policy of Windows Vista
    Managing Local Area Connections
    Windows Phone 7 Development : Push Notifications - Understanding Push Notifications
    Algorithms for Compiler Design: PREDICTIVE PARSING ERROR RECOVERY
    Exchange Server 2007 : Configure the Client Access Server - Create and Apply ActiveSync Mailbox Policies
    Installing Networking Components in Vista
    The SQL Programming Language : Creating Tables and Entering Data