SQL Azure: Building a Shard (part 3) - Reading Using the Shard & Caching

1/31/2011 5:26:29 PM

3. Reading Using the Shard

Now that you've reviewed how connection strings are handled in the application and the library, you need to know how the shard handles a SELECT operation against multiple databases. In its simplest form, the library executes the SELECT operation against each connection in the list defined previously.

The client application calls ExecuteShardQuery, which in turn loops over the list of SqlConnection objects (see Figure 4). If you look at the code, you see that a copy of each connection object is made first; this avoids potential collisions if the client code calls this method multiple times (a connection can handle only one call at a time). Then, for each connection, the code calls ExecuteSingleQuery, which is the method in the shard library that makes the call to the database.
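The excerpt doesn't reproduce the cloning step itself; as a minimal sketch (the connection list and its contents here are illustrative, not the library's own), copying a SqlConnection amounts to creating a new instance from the same connection string:

```csharp
// Hypothetical sketch: clone each shard connection so that concurrent
// calls to ExecuteShardQuery never share a live SqlConnection object.
// Reusing the connection string is enough; the pool does the real work.
List<SqlConnection> shardConnections = new List<SqlConnection>
{
    new SqlConnection("Server=db1;Database=Sales;Integrated Security=true"),
    new SqlConnection("Server=db2;Database=Sales;Integrated Security=true")
};

List<SqlConnection> copies = new List<SqlConnection>();
foreach (SqlConnection c in shardConnections)
    copies.Add(new SqlConnection(c.ConnectionString));
```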

Figure 4. Reading from the shard

The ExecuteShardQuery method is designed to call the ExecuteSingleQuery method multiple times in parallel using the Task Parallel Library (TPL). The TPL provides many useful methods for parallel processing without having to manage threads directly. The shard library uses Parallel.ForEach, a construct that executes an inline method concurrently and automatically adjusts the number of threads to your computer's hardware configuration. So, the more CPUs you have, the faster the following code executes, assuming you have enough connections to loop through. Note, however, that the code must lock the data object (line 5), which is a DataTable, because it can be accessed by multiple threads at once. Finally, the Merge method on the DataTable object concatenates the resultsets from the various connections. After the loop has executed, the resulting data object contains the records retrieved from the shard, in no guaranteed order:

1) Parallel.ForEach(connections,
2)    delegate(SqlConnection c)
3)    {
4)       DataTable dt = ExecuteSingleQuery(command, c, exceptions);
5)       lock (data)
6)          data.Merge(dt, true, MissingSchemaAction.Add);
7)    }
8) );

The following code is a simplified version of the actual sample application. (For clarity, some code that calculates execution time and performs exception handling has been removed.) Line 4 sets the command text to be executed, such as a SELECT statement, and line 5 executes it against the shard. Instead of calling ExecuteReader, the code calls ExecuteShardQuery to use the shard. Line 7 binds the resulting DataTable and displays the records returned by the shard:

1) SqlCommand cmd = new SqlCommand();
2) DataTable dataRes = new DataTable();
4) cmd.CommandText = this.textBox1.Text;
5) dataRes = cmd.ExecuteShardQuery();
7) dataGridView2.DataSource = dataRes;
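The sample can call ExecuteShardQuery directly on a SqlCommand because the shard library surfaces it as an extension method. The exact signature isn't shown in this excerpt; a plausible shape, with a stub Shard class standing in for the real library, might look like this sketch:

```csharp
// Hypothetical sketch (the real library's types and signatures may
// differ): a stub Shard class plus the extension method that forwards
// a SqlCommand to it.
public static class Shard
{
    // Stub for illustration only; the real method fans the command
    // out across every connection registered with the shard.
    public static DataTable ExecuteShardQuery(SqlCommand command)
    {
        return new DataTable();
    }
}

public static class ShardExtensions
{
    public static DataTable ExecuteShardQuery(this SqlCommand command)
    {
        return Shard.ExecuteShardQuery(command);
    }
}
```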

Figure 5 shows the outcome of this code. The SELECT statement is designed to return database object names and types. Executing the statement against the shard performs as expected. However, notice that an extra column has been added to the display: __guidDB__. This is the name of the GUID column introduced previously. This column doesn't help much for reading, but it enables updates and deletes, as you see later.

Figure 5. Showing records returned by the shard

The GUID is unique for each database connection, as long as at least one key parameter differs in each connection's string. It's added by the ExecuteSingleQuery method described previously: within this method, a column carrying the GUID is added in front of all the others. In the following code extract, line 3 creates the data column of type string, and line 4 sets its default value to the connection's GUID. Line 7 fills the data table with the query's result, along with the added GUID column:

1) // Add the connection GUID to this set of records
2) // This helps us identify which row came from which connection
3) DataColumn col = dt.Columns.Add(_GUID_, typeof(string));
4) col.DefaultValue = connection.ConnectionGuid();
6) // Get the data
7) da.Fill(dt);

4. Caching

To minimize round trips to the source databases, the shard library provides an optional caching mechanism. The caching technique used in this library offers basic capabilities and can be extended to address more complex scenarios. The objective is to cache the entire DataTable returned by each database whenever caching is requested. Figure 6 shows the logical decision tree of the caching approach. It's important to note that the library computes a cache key from the SQL statement, its parameter names and values, and the database's GUID.
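The excerpt doesn't show the key-building code; a sketch of how such a composite key could be computed (the method name and separator are illustrative, not the library's own):

```csharp
// Hypothetical sketch: derive one cache key per SQL statement +
// parameter values + database GUID, so the same query run against two
// shard members is cached separately.
private static string BuildCacheKey(SqlCommand cmd, string connectionGuid)
{
    StringBuilder sb = new StringBuilder(cmd.CommandText);
    foreach (SqlParameter p in cmd.Parameters)
        sb.Append('|').Append(p.ParameterName).Append('=').Append(p.Value);
    sb.Append('|').Append(connectionGuid);
    return sb.ToString();
}
```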

Figure 6. Caching logic

The effect of the cache is most visible when you connect to SQL Azure: the first connection to a SQL Azure database can take up to 250 milliseconds, so serving a result from memory is significantly faster. The cache becomes even more important as the number of records and the number of databases in the shard grow.

The cache provided by this library also offers a time to live (TTL) mechanism that implements either an absolute or a sliding expiration scheme. An absolute expiration clears the cached item automatically at a specific time in the future, whereas a sliding expiration pushes the expiration time back each time the cached item is used before it expires. The following code shows how the caching is implemented. Line 1 creates a CacheItemPolicy used to define the behavior of the cache. Line 3 implements the sliding expiration, and line 5 implements the absolute approach:

1) CacheItemPolicy cip = new CacheItemPolicy();
2) if (UseSlidingWindow)
3)    cip.SlidingExpiration = defaultTTL;
4) else
5)    cip.AbsoluteExpiration =
         new DateTimeOffset(System.DateTime.Now.Add(defaultTTL));
6) MemoryCache.Default.Add(cacheKey, dt, cip);
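The extract above shows only how items enter the cache. The lookup half of Figure 6's decision tree (serve from memory on a hit, query the database only on a miss) could look like this sketch, where the helper name and the fetch delegate are illustrative; in the library, the fetch step would be the ExecuteSingleQuery call:

```csharp
// Hypothetical sketch of the read path: on a cache hit the database is
// never touched; on a miss, the fetch delegate runs and its result is
// cached under the supplied policy.
private static DataTable GetOrFetch(string cacheKey, CacheItemPolicy cip,
                                    Func<DataTable> fetch)
{
    DataTable dt = MemoryCache.Default.Get(cacheKey) as DataTable;
    if (dt == null)
    {
        dt = fetch();  // normally ExecuteSingleQuery(command, c, exceptions)
        MemoryCache.Default.Add(cacheKey, dt, cip);
    }
    return dt;
}
```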

You can enhance this caching technique in multiple ways. For example, the DataTable object stored in the cache can be compressed when it contains many rows. Compression algorithms tend to increase latency, but the overall performance benefits may be worth a slight delay.
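As a sketch of the compression idea (not part of the library; the method name is illustrative), a DataTable can be serialized to XML and run through GZipStream before being placed in the cache:

```csharp
// Hypothetical sketch: compress a DataTable with GZip before caching.
// The table must have a TableName set for WriteXml to succeed.
private static byte[] CompressTable(DataTable dt)
{
    using (MemoryStream raw = new MemoryStream())
    {
        // Serialize with schema so the table can be rebuilt on read.
        dt.WriteXml(raw, XmlWriteMode.WriteSchema);
        raw.Position = 0;
        using (MemoryStream packed = new MemoryStream())
        {
            using (GZipStream gz = new GZipStream(packed, CompressionMode.Compress))
                raw.CopyTo(gz);
            return packed.ToArray();
        }
    }
}
```

Decompression follows the same path in reverse (GZipStream in Decompress mode, then DataTable.ReadXml), at the cost of the extra CPU time on every cache hit.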

Another way to enhance this caching storage is to create different cache containers, so you can control at a more granular level which container holds which kind of data. Doing so lets you apply different settings per container; for example, you may decide to always compress one cache container but not another.
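Separate containers are straightforward with System.Runtime.Caching, because MemoryCache instances other than Default can be created by name; a sketch (container names, keys, and TTL values are illustrative):

```csharp
// Hypothetical sketch: one named MemoryCache per data category, each
// with its own expiration policy.
MemoryCache referenceCache = new MemoryCache("ReferenceData");
MemoryCache volatileCache = new MemoryCache("VolatileData");

DataTable countriesTable = new DataTable("Countries");
DataTable ordersTable = new DataTable("Orders");

// Slowly changing reference data can live for hours...
CacheItemPolicy refPolicy = new CacheItemPolicy
{ AbsoluteExpiration = DateTimeOffset.Now.AddHours(12) };

// ...while frequently changing rows get a short sliding window.
CacheItemPolicy volPolicy = new CacheItemPolicy
{ SlidingExpiration = TimeSpan.FromMinutes(2) };

referenceCache.Add("countries", countriesTable, refPolicy);
volatileCache.Add("orders", ordersTable, volPolicy);
```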

Finally, the cache provided in this library isn't distributed; it's local to the machine running the library. If you need a more robust cache, consider looking into Windows Server AppFabric; its caching technology provides enterprise-level capabilities.

