SQL Azure : Managing a Shard (part 2) - Working with Partial Shards

2/5/2011 5:19:56 PM

3. Working with Partial Shards

Note that building a shard isn't an all-or-nothing approach. You can easily create a partial shard for a set of tables. Depending on how your code is structured, you may or may not need to build logic that uses the shard library, depending on which tables you need to access. This logic is best built in a data access layer (DAL), where the physical organization of tables is separated from the organization of business objects.

For example, you can design an application that consumes business objects directly. These business objects in turn consume command objects, which are specialized routines smart enough to load data in memory structures by calling execution objects. Figure 10-13 shows the Authors object calling two command objects that load data from two separate libraries: the standard ADO.NET library and the shard library. The complexity of determining which library to call is deferred to the lowest level possible, protecting the application and business objects from database structural changes.

Figure 5. Example application design implementing a partial shard

4. Managing Transaction Consistency

Because this library is meant for SQL Azure, and distributed transactions aren't supported in SQL Azure, the shard library doesn't offer transactional consistency by default. But you should look carefully at your transactional needs and what this means to your application design.

You can add transactional capabilities in the shard library fairly easily by changing the ExecuteShardNonQuery and ExecuteParallelRoundRobinLoad methods. To do so, you need to add a separate transaction context to all connection objects and commit them in a loop at the end of the last execution. If any exception occurs, you must roll back all the changes.


As mentioned earlier, the shard library is an open-source project and is likely to evolve over time. Check for the latest release to see which features are supported.

5. Managing Foreign Key Constraints

Another interesting issue to consider in shard databases is related to foreign key constraints. Because the shard library proposed in this book splits tables horizontally, you may quickly realize that maintaining referential integrity can be challenging.

To maintain relational integrity, the following concerns apply:

  • Data duplication. Because you don't know which records are where in the shard, the parent tables needs to be duplicated in every database. For example, a table that contains the list of states (Florida, Illinois, and so on) may need to be replicated across all databases.

  • Identity values. Adding records in one database can't be easily replicated across to other databases. Thus, using an identity value as the primary key may be difficult because you aren't guaranteed to have the same value in all databases in the shard. For example, the StateID value for Florida may be 10 in one database and 11 in another.

When it comes to data duplication, you can either treat the parent tables as overhead and duplicate them across databases, allowing you to maintain strong referential integrity (RI), or sacrifice RI in the database by sharding the parent tables as well. If you decide to shard parent tables, you can no longer enforce RI in the database; but you may still be able to enforce RI in your code by adding RI constraints to your DataTable objects. You can do so by creating a DataRelation object in the DataTable's ParentRelations collection. For example, the following code adds RI to the DOCS and AUTHORS DataTable objects:

1) SqlCommand cmd1 = new SqlCommand("SELECT * FROM Authors");
2) SqlCommand cmd2 = new SqlCommand("SELECT * FROM Docs");
3) DataTable authors = ExecuteShardQuery(cmd1);
4) DataTable docs = ExecuteShardQuery(cmd2);
5) DataRelation dr = new DataRelation("RI",
6) authors.Columns["authorId"],
7) docs.Columns["authorId"]);
8) docs.ParentRelations.Add(dr);

The issue with identity values lies in the fact that an automatic identity is created for each record. But because the tables are split across databases, you aren't guaranteed to have the same values over time. To solve this issue, you need to create RI rules that depend not on identity values, but on codes. In the case of the table that stores states, you create a StateCode column (that stores FL for Florida) and use that column as your primary key and in your RI rules. This ensures that all databases in the shard use the same values to maintain integrity.

  •  SQL Azure : Managing a Shard (part 1) - Managing Exceptions & Managing Performance
  •  SQL Server 2008 : OPENXML, sp_xml_preparedocument, and sp_xml_removedocument
  •  SQL Server 2008 : Retrieving and Transforming XML Data
  •  SQL Azure: Building a Shard (part 4) - Updating and Deleting Records in the Shard & Adding Records to the Shard
  •  SQL Azure: Building a Shard (part 3) - Reading Using the Shard & Caching
  •  SQL Azure: Building a Shard (part 2) - Managing Database Connections
  •  SQL Azure: Building a Shard (part 1) - Designing the Shard Library Object
  •  SQL Azure: Designing for High Performance - General Performance Concepts
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Creating and Altering Tables
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Managing Internationalization Considerations
  •  Programming Microsoft SQL Server 2005 : Deployment (part 2) - Testing Your Stored Procedures
  •  Programming Microsoft SQL Server 2005 : Deployment (part 1) - Deploying Your Assembly
  •  Programming Microsoft SQL Server 2005 : CLR Stored Procedures and Server-Side Data Access
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Controlling Execution Plans (part 3) - Using the Resource Governor
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Controlling Execution Plans (part 2)
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Controlling Execution Plans (part 1)
  •  Reporting Services with SQL Azure : Deploying the Report & Creating a Subreport
  •  Reporting Services with SQL Azure : Creating the Report Design
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Applying Ranking Functions (part 2) - Using RANK, DENSE_RANK and NTILE
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Applying Ranking Functions (part 1) - Using ROW_NUMBER
    Top 10
    Exchange Server 2010 : Active Manager - Automatic database transitions & Best copy selection
    Exchange Server 2010 : Breaking the link between database and server
    iPhone 3D Programming : Drawing an FPS Counter (part 2) - Rendering the FPS Text
    iPhone 3D Programming : Drawing an FPS Counter (part 1) - Generating a Glyphs Texture with Python
    Mobile Application Security : Mobile Geolocation - Geolocation Methods & Geolocation Implementation
    Mobile Application Security : SMS Security - Application Attacks & Walkthroughs
    Transact-SQL in SQL Server 2008 : Table-Valued Parameters
    Transact-SQL in SQL Server 2008 : New date and time Data Types and Functions
    Windows 7 : Working with User Accounts (part 2)
    Windows 7 : Working with User Accounts (part 1)
    Most View
    Sync Your iPad with iTunes : Troubleshooting iTunes and the Sync
    Web Security Testing : Manipulating Sessions - Analyzing Session Identifiers with Burp
    SQL Server 2005 : Importing and Exporting Data
    iPhone Application Development : Implementing a Custom Picker View (part 4) - Tweaking the Picker UI
    Programming the Mobile Web : HTML 5 (part 2) - The canvas Element
    Exchange Server 2010 : Administering Mailbox Content - Monitor and Restrict Communication (part 2) - Apply Common Monitoring and Restriction Scenarios
    SQL Server 2008 Command-Line Utilities : The bcp Command-Line Utility
    SQL Server 2008 : Leveraging the Microsoft Sync Framework
    Mobile Phone Game Programming : Getting to Know Mobile Platforms
    Migrating from Legacy SharePoint to SharePoint Server 2010 : Using Visual Upgrade
    The Art of SEO : How Links Influence Search Engine Rankings (part 1) - The Original PageRank Algorithm
    Exchange Server 2007 : Administrate Transport Settings - Manage Connectors (Send and Receive)
    Creating and Managing Views in SQL Server 2008 : Partitioned Views
    Reporting Services with SQL Azure : Starting a SQL Azure–Based Report
    Evidence and Code Identity: Extending the .NET Framework
    SQL Azure : Tuning Techniques (part 2) - Connection Pooling & Execution Plans
    Understanding Active Directory Certificate Services (AD CS) in Windows Server 2008 R2
    Advanced ASP.NET : LINQ and the Entity Framework - LINQ Basics
    Database Availability Group Replication in Exchange Server 2010 : Load Balancing in Exchange Server 2010
    Windows 7 : Maintaining Your System Configuration (part 1) - Configuring the Computer Name and Membership