3. Working with Partial Shards
Note that building a shard
isn't an all-or-nothing approach. You can easily create a partial shard
for a set of tables. Depending on how your code is structured, you may
or may not need to build logic that uses the shard library, depending on
which tables you need to access. This logic is best built in a data
access layer (DAL), where the physical organization of tables is
separated from the organization of business objects.
For example, you can
design an application that consumes business objects directly. These
business objects in turn consume command objects, which are specialized
routines smart enough to load data in memory structures by calling
execution objects. Figure 10-13
shows the Authors object calling two command objects that load data
from two separate libraries: the standard ADO.NET library and the shard
library. The complexity of determining which library to call is deferred
to the lowest level possible, protecting the application and business
objects from database structural changes.
4. Managing Transaction Consistency
Because this library is meant
for SQL Azure, and distributed transactions aren't supported in SQL
Azure, the shard library doesn't offer transactional consistency by
default. But you should look carefully at your transactional needs and
what this means to your application design.
You can add transactional capabilities in the shard library fairly easily by changing the ExecuteShardNonQuery and ExecuteParallelRoundRobinLoad
methods. To do so, you need to add a separate transaction context to
all connection objects and commit them in a loop at the end of the last
execution. If any exception occurs, you must roll back all the changes.
NOTE
As mentioned earlier, the
shard library is an open-source project and is likely to evolve over
time. Check for the latest release to see which features are supported.
5. Managing Foreign Key Constraints
Another interesting
issue to consider in shard databases is related to foreign key
constraints. Because the shard library proposed in this book splits
tables horizontally, you may quickly realize that maintaining
referential integrity can be challenging.
To maintain relational integrity, the following concerns apply:
Data duplication.
Because you don't know which records are where in the shard, the parent
tables needs to be duplicated in every database. For example, a table
that contains the list of states (Florida, Illinois, and so on) may need
to be replicated across all databases.
Identity values.
Adding records in one database can't be easily replicated across to
other databases. Thus, using an identity value as the primary key may be
difficult because you aren't guaranteed to have the same value in all
databases in the shard. For example, the StateID value for Florida may
be 10 in one database and 11 in another.
When it comes to data
duplication, you can either treat the parent tables as overhead and
duplicate them across databases, allowing you to maintain strong
referential integrity (RI), or sacrifice RI in the database by sharding
the parent tables as well. If you decide to shard parent tables, you can
no longer enforce RI in the database; but you may still be able to
enforce RI in your code by adding RI constraints to your DataTable objects. You can do so by creating a DataRelation object in the DataTable's ParentRelations collection. For example, the following code adds RI to the DOCS and AUTHORS DataTable objects:
1) SqlCommand cmd1 = new SqlCommand("SELECT * FROM Authors");
2) SqlCommand cmd2 = new SqlCommand("SELECT * FROM Docs");
3) DataTable authors = ExecuteShardQuery(cmd1);
4) DataTable docs = ExecuteShardQuery(cmd2);
5) DataRelation dr = new DataRelation("RI",
6) authors.Columns["authorId"],
7) docs.Columns["authorId"]);
8) docs.ParentRelations.Add(dr);
The issue with identity
values lies in the fact that an automatic identity is created for each
record. But because the tables are split across databases, you aren't
guaranteed to have the same values over time. To solve this issue, you
need to create RI rules that depend not on identity values, but on
codes. In the case of the table that stores states, you create a
StateCode column (that stores FL for Florida) and use that column as
your primary key and in your RI rules. This ensures that all databases
in the shard use the same values to maintain integrity.