DATABASE

SQL Azure : Managing a Shard (part 2) - Working with Partial Shards

2/5/2011 5:19:56 PM

3. Working with Partial Shards

Note that building a shard isn't an all-or-nothing approach. You can easily create a partial shard for a set of tables. Depending on how your code is structured, you may or may not need to build logic that uses the shard library, depending on which tables you need to access. This logic is best built in a data access layer (DAL), where the physical organization of tables is separated from the organization of business objects.

For example, you can design an application that consumes business objects directly. These business objects in turn consume command objects, which are specialized routines smart enough to load data in memory structures by calling execution objects. Figure 10-13 shows the Authors object calling two command objects that load data from two separate libraries: the standard ADO.NET library and the shard library. The complexity of determining which library to call is deferred to the lowest level possible, protecting the application and business objects from database structural changes.

Figure 5. Example application design implementing a partial shard

4. Managing Transaction Consistency

Because this library is meant for SQL Azure, and distributed transactions aren't supported in SQL Azure, the shard library doesn't offer transactional consistency by default. But you should look carefully at your transactional needs and what this means to your application design.

You can add transactional capabilities in the shard library fairly easily by changing the ExecuteShardNonQuery and ExecuteParallelRoundRobinLoad methods. To do so, you need to add a separate transaction context to all connection objects and commit them in a loop at the end of the last execution. If any exception occurs, you must roll back all the changes.

NOTE

As mentioned earlier, the shard library is an open-source project and is likely to evolve over time. Check for the latest release to see which features are supported.

5. Managing Foreign Key Constraints

Another interesting issue to consider in shard databases is related to foreign key constraints. Because the shard library proposed in this book splits tables horizontally, you may quickly realize that maintaining referential integrity can be challenging.

To maintain relational integrity, the following concerns apply:

  • Data duplication. Because you don't know which records are where in the shard, the parent tables needs to be duplicated in every database. For example, a table that contains the list of states (Florida, Illinois, and so on) may need to be replicated across all databases.

  • Identity values. Adding records in one database can't be easily replicated across to other databases. Thus, using an identity value as the primary key may be difficult because you aren't guaranteed to have the same value in all databases in the shard. For example, the StateID value for Florida may be 10 in one database and 11 in another.

When it comes to data duplication, you can either treat the parent tables as overhead and duplicate them across databases, allowing you to maintain strong referential integrity (RI), or sacrifice RI in the database by sharding the parent tables as well. If you decide to shard parent tables, you can no longer enforce RI in the database; but you may still be able to enforce RI in your code by adding RI constraints to your DataTable objects. You can do so by creating a DataRelation object in the DataTable's ParentRelations collection. For example, the following code adds RI to the DOCS and AUTHORS DataTable objects:

1) SqlCommand cmd1 = new SqlCommand("SELECT * FROM Authors");
2) SqlCommand cmd2 = new SqlCommand("SELECT * FROM Docs");
3) DataTable authors = ExecuteShardQuery(cmd1);
4) DataTable docs = ExecuteShardQuery(cmd2);
5) DataRelation dr = new DataRelation("RI",
6) authors.Columns["authorId"],
7) docs.Columns["authorId"]);
8) docs.ParentRelations.Add(dr);

The issue with identity values lies in the fact that an automatic identity is created for each record. But because the tables are split across databases, you aren't guaranteed to have the same values over time. To solve this issue, you need to create RI rules that depend not on identity values, but on codes. In the case of the table that stores states, you create a StateCode column (that stores FL for Florida) and use that column as your primary key and in your RI rules. This ensures that all databases in the shard use the same values to maintain integrity.

Other  
  •  SQL Server 2008 : OPENXML, sp_xml_preparedocument, and sp_xml_removedocument
  •  SQL Server 2008 : Retrieving and Transforming XML Data
  •  SQL Azure: Building a Shard (part 4) - Updating and Deleting Records in the Shard & Adding Records to the Shard
  •  SQL Azure: Building a Shard (part 3) - Reading Using the Shard & Caching
  •  SQL Azure: Building a Shard (part 2) - Managing Database Connections
  •  SQL Azure: Building a Shard (part 1) - Designing the Shard Library Object
  •  SQL Azure: Designing for High Performance - General Performance Concepts
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Creating and Altering Tables
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Managing Internationalization Considerations
  •  Programming Microsoft SQL Server 2005 : Deployment (part 2) - Testing Your Stored Procedures
  •  Programming Microsoft SQL Server 2005 : Deployment (part 1) - Deploying Your Assembly
  •  Programming Microsoft SQL Server 2005 : CLR Stored Procedures and Server-Side Data Access
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Controlling Execution Plans (part 3) - Using the Resource Governor
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Controlling Execution Plans (part 2)
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Controlling Execution Plans (part 1)
  •  Reporting Services with SQL Azure : Deploying the Report & Creating a Subreport
  •  Reporting Services with SQL Azure : Creating the Report Design
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Applying Ranking Functions (part 2) - Using RANK, DENSE_RANK and NTILE
  •  SQL Server 2008 : Explaining Advanced Query Techniques - Applying Ranking Functions (part 1) - Using ROW_NUMBER
  •  SQL Server 2008 : Demystifying Data Types - Computed Columns
  •  
    Most View
    Compact Digital Cameras Under $300 (Part 2) - BenQ LM100
    Windows Server 2003 : Configuring a Windows IPSec Policy (part 1) - Using the IPSec Policy Wizard to Create a Policy
    Connectivity Matters
    HP Spectre XT Touchsmart Review - Everything Is Fine Except Battery Life (Part 2)
    Microsoft Dynamics AX 2009 : Building Lookups - Creating an automatic lookup
    Optimus Vu - Battery Test
    iPhone Accessories : Gorillamobile, Olloclip Macro Len,Belkin LiveAction Camera Grip, Vtec Zoom Lens
    Tough Phones : Work or play in harsh environments
    The Great Software Payola (Part 3)
    DirectX 10 : The 2D Resurgence - Getting the Sprites Moving
    Top 10
    G-360 And G-550 Power Supply Devices Review (Part 4)
    G-360 And G-550 Power Supply Devices Review (Part 2)
    Canon IXUS 140 Camera - Great Color Reproduction
    Nikon Coolpix S5200 Camera - 10fps Continuous Shooting Mode
    Corsair Neutron GTX 240GB - A Fast Performing SSD
    G-360 And G-550 Power Supply Devices Review (Part 3)
    G-360 And G-550 Power Supply Devices Review (Part 1)
    OCZ Vector 256GB - One Of The Dominant Names In SSD
    Don’t Pay For Office 2013 (Part 2)
    Don’t Pay For Office 2013 (Part 1)