3. Reading Using the Shard
Now that you've reviewed how connection strings are handled in the application and the library, you need to know how the shard handles a SELECT operation against multiple databases. In its simplest form, the library executes the SELECT operation against the list of connections defined previously.
The client application calls ExecuteShardQuery, which in turn loops over the list of SqlConnection objects (see Figure 4).
If you look at the code, you see that a copy of each connection object is made first; this avoids potential collisions if the client code calls this method multiple times concurrently (a connection can handle only one call at a time). Then, for each connection, the code calls ExecuteSingleQuery, which is the method in the shard library that makes the call to the database.
The ExecuteShardQuery method is designed to call the ExecuteSingleQuery method multiple times in parallel using the Task Parallel Library (TPL). The TPL provides many useful methods that handle parallel processing without requiring you to manage threads directly. The shard library uses Parallel.ForEach,
which is a construct that allows the code to execute an inline method
concurrently, and automatically adjusts the number of threads depending
on your computer's hardware configuration. So, the more CPUs you have, the faster the following code executes, provided you have enough connections to loop through. Note, however, that you need to lock the data object (line 5), which is a DataTable, because it could be accessed concurrently by other threads. Finally, the Merge method on the DataTable object concatenates the resultsets from the various connections. After the loop has executed, the resulting data object contains the records retrieved from the shard, in no guaranteed order:
1) Parallel.ForEach(connections,
2) delegate(SqlConnection c)
3) {
4) DataTable dt = ExecuteSingleQuery(command, c, exceptions);
5) lock (data)
6) data.Merge(dt, true, MissingSchemaAction.Add);
7) }
8) );
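The ExecuteSingleQuery method called on line 4 isn't shown in full in this article; only its GUID-column logic appears in a later extract. As a rough sketch of what it might look like, assuming the library clones the command, fills a DataTable through a SqlDataAdapter, and collects per-database errors in the shared exceptions list (the stubbed ConnectionGuid helper stands in for the library's own extension method):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class ShardSketch
{
    // Stub for the shard library's ConnectionGuid() extension; the real
    // one derives a stable GUID from the connection's key parameters.
    public static string ConnectionGuid(this SqlConnection c) =>
        c.ConnectionString.GetHashCode().ToString();

    // Hedged sketch of ExecuteSingleQuery: run the command against one
    // connection and return the resulting DataTable.
    public static DataTable ExecuteSingleQuery(SqlCommand command,
        SqlConnection connection, List<Exception> exceptions)
    {
        DataTable dt = new DataTable();
        try
        {
            using (SqlCommand cmd = command.Clone()) // leave the caller's command untouched
            {
                cmd.Connection = connection;
                using (SqlDataAdapter da = new SqlDataAdapter(cmd))
                {
                    // Tag every row with the connection's GUID
                    // (the __guidDB__ column discussed later)
                    DataColumn col = dt.Columns.Add("__guidDB__", typeof(string));
                    col.DefaultValue = connection.ConnectionGuid();
                    da.Fill(dt); // the adapter opens and closes the connection
                }
            }
        }
        catch (Exception ex)
        {
            lock (exceptions) { exceptions.Add(ex); } // per-database errors
        }
        return dt;
    }
}
```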
The following code is a simplified version of the actual sample application. (For clarity, some code that calculates execution time and performs exception handling has been removed.) Line 4 sets the command text to be executed, such as a SELECT statement, and line 5 executes it against the shard. Instead of calling ExecuteReader, the code calls ExecuteShardQuery to use the shard. Line 7 binds the resulting DataTable to a grid and displays the records returned by the shard:
1) SqlCommand cmd = new SqlCommand();
2) DataTable dataRes = new DataTable();
3)
4) cmd.CommandText = this.textBox1.Text;
5) dataRes = cmd.ExecuteShardQuery();
6)
7) dataGridView2.DataSource = dataRes;
Figure 5 shows the outcome of this code. The SELECT
statement is designed to return database object names and types.
Executing the statement against the shard performs as expected. However,
notice that an extra column has been added to the display: __guidDB__.
This is the name of the GUID column introduced previously. This column doesn't help much for reading, but it enables updates and deletes, as you'll see later.
The GUID provided is unique for
each database connection, as long as one of the key parameters is
different in the connection string of each connection. It's added by the
ExecuteSingleQuery method described
previously. Within this method, a column is added in front of all the
others, which carries the GUID. In the following code extract, line 3
creates the data column of type string,
and line 4 sets its default value to the connection's GUID. Line 7
fills the data table with the query's result, along with the added GUID
column. The following is the logic used to add this GUID:
1) // Add the connection GUID to this set of records
2) // This helps us identify which row came from which connection
3) DataColumn col = dt.Columns.Add(_GUID_, typeof(string));
4) col.DefaultValue = connection.ConnectionGuid();
5)
6) // Get the data
7) da.Fill(dt);
4. Caching
To minimize roundtrips to
the source databases, the shard library provides an optional caching
mechanism. The caching technique used in this library offers basic
capabilities and can be extended to address more complex scenarios. The
objective of this library is to cache the entire DataTable of each database backend whenever requested. Figure 6
shows the logical decision tree of the caching approach. It's important to note that this library calculates a cache key based on the SQL statement, each parameter name and value, and the database's GUID.
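A cache key along those lines could be composed as follows. This is a hypothetical sketch, not the library's actual key format; the method name and parameter shape are assumptions:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CacheKeySketch
{
    // Hypothetical sketch: build a cache key from the SQL statement,
    // each parameter name and value, and the database's GUID.
    public static string BuildCacheKey(string sql,
        IDictionary<string, object> parameters, string databaseGuid)
    {
        var sb = new StringBuilder();
        sb.Append(databaseGuid).Append('|').Append(sql);
        foreach (var p in parameters)
            sb.Append('|').Append(p.Key).Append('=').Append(p.Value);
        return sb.ToString();
    }
}
```

Because the database GUID is part of the key, the same statement cached for one database in the shard never collides with the resultset of another.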
The effect of the cache is
visible when you connect to SQL Azure. Considering that connecting to a
SQL Azure database takes up to 250 milliseconds the first time, memory
access is significantly faster. The importance of the cache grows as the number of records and the number of databases in the shard increase.
The cache provided by this
library also provides a time to live (TTL) mechanism that implements an
absolute expiration or a sliding expiration scheme. An absolute
expiration resets the cache automatically at a specific time in the
future, whereas the sliding setting moves the expiration time if the
cache items are used before expiring. The following code shows how the
caching is implemented. Line 1 creates a CacheItemPolicy
used to define the behavior of the cache. Line 3 implements the sliding
window cache, and line 5 implements the absolute approach:
1) CacheItemPolicy cip = new CacheItemPolicy();
2) if (UseSlidingWindow)
3) cip.SlidingExpiration = defaultTTL;
4) else
5) cip.AbsoluteExpiration =
new DateTimeOffset(System.DateTime.Now.Add(defaultTTL));
6) MemoryCache.Default.Add(cacheKey, dt, cip);
You can enhance this caching technique in multiple ways. For example, the DataTable
object stored in the cache can be compressed when it contains many
rows. Compression algorithms tend to increase latency, but the overall
performance benefits may be worth a slight delay.
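One way to sketch such compression, assuming the DataTable is serialized to XML (with its schema, so it can be rebuilt) and then run through GZip before being placed in the cache; the class and method names here are illustrative, not part of the library:

```csharp
using System.Data;
using System.IO;
using System.IO.Compression;

public static class TableCompression
{
    // Illustrative: serialize the DataTable to XML, then GZip the bytes.
    public static byte[] Compress(DataTable dt)
    {
        using (var raw = new MemoryStream())
        {
            dt.WriteXml(raw, XmlWriteMode.WriteSchema); // schema needed to rebuild
            using (var packed = new MemoryStream())
            {
                using (var gz = new GZipStream(packed, CompressionMode.Compress))
                    gz.Write(raw.ToArray(), 0, (int)raw.Length);
                return packed.ToArray();
            }
        }
    }

    // Reverse path: decompress, then let ReadXml rebuild the table.
    public static DataTable Decompress(byte[] bytes)
    {
        using (var gz = new GZipStream(new MemoryStream(bytes),
                                       CompressionMode.Decompress))
        {
            var dt = new DataTable();
            dt.ReadXml(gz);
            return dt;
        }
    }
}
```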
Another way to enhance this caching storage is to create different cache containers, so you can control at a more granular level which container holds which kind of data. Doing so lets you apply different settings to each container; for example, you might always compress one cache container but not another.
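With System.Runtime.Caching (the same namespace as the MemoryCache.Default used earlier), separate containers are simply separate named MemoryCache instances. A sketch, in which the container names and the expiration values are made up for illustration:

```csharp
using System;
using System.Runtime.Caching;

public static class ShardCaches
{
    // Two named containers with different policy defaults; the names
    // "ReferenceData" and "VolatileData" are illustrative.
    public static readonly MemoryCache ReferenceCache = new MemoryCache("ReferenceData");
    public static readonly MemoryCache VolatileCache  = new MemoryCache("VolatileData");

    public static void CacheItem(string key, object value, bool isReferenceData)
    {
        var cip = new CacheItemPolicy();
        if (isReferenceData)
            // rarely changes: long absolute expiration
            cip.AbsoluteExpiration = new DateTimeOffset(DateTime.Now.AddHours(12));
        else
            // changes often: short sliding expiration
            cip.SlidingExpiration = TimeSpan.FromMinutes(5);

        (isReferenceData ? ReferenceCache : VolatileCache).Add(key, value, cip);
    }
}
```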
Finally, the cache provided in
this library isn't distributed; it's local to the machine running the
library. If you need to develop a more robust cache, consider looking into Windows Server AppFabric; its caching technology provides enterprise-level capabilities.