Windows Azure : Storing static reference data with dynamic data

2/21/2011 9:00:46 AM

In the previous section, we looked at how you could represent static data for the purposes of data retrieval. This is only one side of the picture, because typically you’ll want to associate the static data with some dynamic data.

For example, people viewing the product details web page shown in figure 1 will hopefully purchase the shirt displayed. When they do, how should you store that data so that you can easily retrieve it?

Depending on the implementation of the web page, you can either allow the user to purchase the item directly or add the item to a shopping cart. In either case, the method of storing the data will be the same, so let’s look at storing items in a shopping cart.

1. Representing the shopping cart in SQL Azure

You’re probably familiar with relational databases, so we’ll first look at how you can store shopping cart data in SQL Azure.

The Shopping Cart Data Model

The shopping cart can be persisted across sessions, and if the user is a registered logged-in user, you can associate this account with their user ID. Figure 1 represents a typical data model for a shopping cart.

Figure 1. Data model for a shopping cart

In figure 1 , the shopping cart is represented as two tables (ShoppingCart and ShoppingCartItems). The ShoppingCart table represents the shopping cart for each user, and each item in the shopping cart is stored in the ShoppingCartItems table. For each item in the cart, the ShirtId, MaterialId, and SizeId are stored, and the appropriate foreign-key relationships between tables are established.

If the website user is a registered user, the UserId would be stored in the ShoppingCart table; if the user is unregistered, you could use the session ID.

Note

Although this data model represents a shopping cart, you could model an orders table using the same data structure.

Retrieving the Data

Because SQL Azure is a relational database, you could query across multiple tables by using JOIN clauses in your SQL statements, if required. For example, to return all items in the shopping cart for UserId 12345, including the shirt name, size, and material, you could issue the following SQL query:

SELECT sci.ShoppingCartItemId,
       s.ShirtName,
       s.Description,
       s.Price,
       sz.SizeId,
       sz.SizeName,
       m.MaterialId,
       m.MaterialName
FROM ShoppingCart sc
JOIN ShoppingCartItems sci ON sci.ShoppingCartId = sc.ShoppingCartId
JOIN Shirts s ON s.ShirtId = sci.ShirtId
JOIN SizeTypes sz ON sz.SizeId = sci.SizeId
JOIN Materials m ON m.MaterialId = sci.MaterialId
WHERE sc.UserId = '12345'

Because the shopping cart table will never hold a large amount of data, and the static data is held in an in-memory cache, there’s no need to return the static data as part of your SQL query. You can, therefore, use the following query instead:

SELECT sci.ShoppingCartItemId,
       s.ShirtName,
       s.Description,
       s.Price,
       sci.SizeId,
       sci.MaterialId,
FROM ShoppingCart sc
JOIN ShoppingCartItems sci
   ON sci.ShoppingCartId = sc.ShoppingCartId
JOIN Shirts s ON s.ShirtId = sci.ShirtId
WHERE sc.UserId = '12345'

Not returning the static data (SizeTypes and Materials) both improves the performance and reduces the complexity of the query by reducing the number of joins.

Once the data is returned from the database, you can combine it with the in-memory cached data using LINQ to Objects.

Note

This strategy is perfectly acceptable when dealing with static reference data because the cached data tends to be very small. This would not be an acceptable strategy when dealing with millions of rows of data.

2. Partitioning the SQL Azure shopping cart

The data model in figure 14.5 looks like it’s only usable for single-server databases, but this isn’t strictly the case. That model will easily scale out, as we’ll explain in a second.

One of the issues with using SQL Azure is that currently there’s no built-in method of partitioning data across multiple servers. To avoid bottlenecking your application on a single database, you need to partition (or shard) your data across multiple servers in your application layer.

Splitting the Data Model Across Multiple Servers

As we said, the data model in figure 14.5 will work across multiple servers—you just have to be a little smart about how you do this. Because the shopping cart tables have no dependencies on any other part of the application, you can separate those tables into their own separate database.

In the application layer, you could therefore separate the shopping cart’s data-access layer methods and connection strings into their own layer. By keeping the functionality logically separated, you can split your application across multiple databases if required. In the Hawaiian Shirt Shop example, you could easily maintain separate databases for the shopping cart, orders, customers, products, and static data, if required.

Partitioning Data Further

In your web application, you might get to the point where the shopping cart database is bottlenecking and you need to partition the data further. This isn’t as difficult as it sounds.

Because the shopping cart data is only used by a single user and you never query across multiple users, you can easily partition out the data by user. For example, you might have 100,000 registered users who maintain a shopping cart. If you currently have a single database and wish to split it into two or more databases, you could use a partitioning function in your application:

If (userId.ToString()[0] < 'N')
{
   // Use Connection String for Database 1
}
else
{
  // Use Connection String for Database 2
}

In the preceding example, if the first character of the userId begins with N or later, they would use the second shopping cart database; otherwise, they’d use the first database. You could break this down into further partitions if required.

As stated earlier, this type of partitioning can only be done in the application layer at present. It’s planned that in future releases of SQL Azure there’ll be some partitioning built-in on the server side.

Let’s transition over to how we would implement the same concept, but using the Azure Table service instead.

3. Representing the shopping cart’s static data in the Table service

In the previous section, we looked at how you could store the shopping cart in SQL Azure and how you could scale it out horizontally if necessary. Although it’s possible to scale out a SQL Azure database, it still requires you to add some manual partitioning code at the application layer. We’ll now look at how you could represent the shopping cart table in the Table service, and at how you can use the built-in partitioning model to scale out your tables.

Due to the architecture of the Table service, there’s no facility to perform a server-side join between a Shirts table and the ShoppingCart table. This effectively leaves you two options when you have data that resides on two different tables:

Duplicate the data
Join the data on the client side

For now, we’ll ignore the duplicate data option (we’ll cover that in the next section), and we’ll focus on client-side data joining. But before we look at joining the data on the client side, let’s take a peek at how you could represent the ShoppingCart table in the Table service.

Shoppingcart Entity

In the SQL Azure implementation of our shopping cart data model, there were two tables (ShoppingCart and ShoppingCartItems). Because the Table service isn’t a relational database and doesn’t support joining between tables, you can represent the shopping cart as a single table (ShoppingCart). Within the ShoppingCart table you can store the entity as follows:

public class ShoppingCartItem : TableServiceEntity
{
   public string Shirt {get;set;}
   public string Material { get; set; }
   public string Size { get; set; }
}

Consider the definition of the ShoppingCartItem entity. Because both the material and size data are cached in memory, you can simply store a reference to the data (the row key for the material and size) and then perform a client-side join between the shopping cart entity and the cached versions of the Material and SizeType entities. Because the cached data is a small set of static reference data, and it’s being joined to a small set of shopping cart data, this technique is appropriate.

To join these two sets of data, you can define a new entity that will represent a strong version of your shopping cart item, as follows:

public class StrongShoppingCartItem
{
   public Shirt SelectedShirt { get; set; }
   public SizeType Size { get; set; }
   public Material Material { get; set; }
}

As you can see, this code represents the shopping cart item with a reference to each entity rather than the using an ID reference.

Now that you have the stronger version of the entity, you need to populate it, like this:

var materials = (IEnumerable<Material>)Cache["materials"];
var sizeTypes = (IEnumerable<SizeType>)Cache["sizeTypes"];

var shoppingCartItemContext = new ShoppingCartItemContext ();
var shoppingCartItems =
   shoppingCartItemContext.ShoppingCartItem.ToList();

var q = from shoppingCartItem in shoppingCartItems
        join sizeType in sizeTypes
           on shoppingCartItem.Size.RowKey equals sizeType.RowKey
        join material in materials
           on shoppingCartItem.Material.RowKey equals material.RowKey
        select new StrongShoppingCartItem
        {
           SelectedShirt = new Shirt(),
           Material = material,
           Size = sizeType
        };

The cool thing about the preceding query is that because the size type and materials are cached, you don’t need to make any extra table service calls to get the reference data. This is all performed in-memory using LINQ to Objects.

Warning

This technique is super cool for small datasets. Make sure you check your performance on large datasets, such as when you have millions of rows, because it may not meet your needs.

The previous example improved performance and saved money by joining static data. Although this works well for static data, it doesn’t work so well for nonstatic data—dynamic data or infrequently changing data.

Other

SQL Server 2008 : Managing Query Performance - Adding Hints Through Plan Guides

SQL Server 2008 : Managing Query Performance - Forcing a Specific Execution Plan

Programming Microsoft SQL Server 2005 : An Overview of SQL CLR - CLR Triggers

Programming Microsoft SQL Server 2005 : An Overview of SQL CLR - CLR Functions

SQL Server 2008 : Performance Tuning - Using Dynamic Management Views

SQL Server 2008 : Performance Tuning - Working with Database Tuning Advisor

SQL Server 2008 : Performance Tuning - Tracing with SQL Profiler

SQL Server 2008 : Performance Tuning - Working with Query Execution Plans

Externalizing BLOB Storage in SharePoint 2010 (part 2) - Installing and Configuring RBS & Migrating and Moving BLOBs Between BLOB Stores

Externalizing BLOB Storage in SharePoint 2010 (part 1)