Azure Table Storage (part 2) - Accessing Table Storage, Choosing a PartitionKey, Exception handling

9/9/2012 9:33:43 PM

Accessing Table Storage

For security purposes, each request to Table Storage must be authenticated using the 256-bit shared keys created when we added the storage service. Table Storage can be directly accessed via REST, or queried using a subset of LINQ. The REST interface allows languages such as Java, PHP, and Ruby to consume Table Storage, while client libraries for ADO.NET Data Services are limited to the .NET languages.

Each request made via the REST API has its own set of required headers, and the body of each request is in Atom format. A query made via the REST API returns at most 1,000 records and executes for at most 5 seconds (with a total of 30 seconds from scheduling/processing to completion). If a query crosses either boundary, a partial result is returned along with a continuation token, which can be used in a subsequent request to fetch the remainder.
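The continuation-token loop described above can be sketched as follows. This is only an illustration: fetch_page is a hypothetical callable standing in for one REST query, and make_fake_service simulates a service that caps each response at 1,000 records. Neither is part of any Azure library.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a continuation token (or None for the first
# request) and returns (records, next_token). next_token is None when done.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def iterate_all(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every record, following continuation tokens until none is returned."""
    token = None
    while True:
        records, token = fetch_page(token)
        yield from records
        if token is None:
            break


def make_fake_service(total: int, page_size: int = 1000) -> FetchPage:
    """Simulated service that returns at most page_size records per request."""
    rows = [{"RowKey": str(i)} for i in range(total)]

    def fetch_page(token: Optional[str]) -> Tuple[List[dict], Optional[str]]:
        start = int(token) if token else 0
        end = min(start + page_size, total)
        next_token = str(end) if end < total else None
        return rows[start:end], next_token

    return fetch_page
```

Calling list(iterate_all(make_fake_service(2500))) issues three simulated requests and collects all 2,500 records.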

An important API header property is x-ms-version. Just as .NET allows multiple versions of the same library to coexist in the Global Assembly Cache (GAC), multiple versions of the Table Storage API also coexist. This header is optional, but if it is omitted, the request is handled by the most basic version of the API available. If we are targeting specific API features in our application, or want to ensure no part of our application will break when the API is updated, we need to include this header. The value is a date stamp, so the header for the April 2009 API would read x-ms-version: 2009-04-14.
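As a small sketch of pinning the API version, the helper below builds a header dictionary that could be passed to any HTTP client. The function name is hypothetical, and the x-ms-date header is an assumption added for illustration (it is not discussed in the text above). Authentication headers are deliberately left out.

```python
from email.utils import formatdate


def versioned_headers(api_version: str = "2009-04-14") -> dict:
    """Return request headers that pin the Table Storage API version."""
    return {
        # Date stamp of the API version we developed and tested against.
        "x-ms-version": api_version,
        # Assumption: request timestamp in RFC 1123 format, expressed in GMT.
        "x-ms-date": formatdate(usegmt=True),
    }
```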

Third-party products are being developed that allow us to work directly with tables in a more friendly way than coding. 

Working with tables

The client class for working with tables via .NET and the Azure Managed Library is Microsoft.WindowsAzure.StorageClient.CloudTableClient. The methods listed in the following table are methods of this class, unless specified otherwise.

The base URI for accessing tables via the REST API is http://<myaccount>.table.core.windows.net/Tables. The different HTTP verbs (POST, GET, DELETE) are used to determine the action, and parameters (such as the table name) are specified in the request body.

Table names must follow a naming convention:

  • Names can only be alphanumeric

  • Length must be between 3 and 63 characters

  • The name cannot begin with a number

  • Names are case insensitive
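The naming rules above can be checked locally before a request is sent. The following is a minimal sketch; the function names are our own and not part of any client library.

```python
import re

# A letter first, then 2 to 62 more letters or digits: 3 to 63 characters total.
_TABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{2,62}")


def is_valid_table_name(name: str) -> bool:
    """Return True if name satisfies the table naming rules listed above."""
    return _TABLE_NAME.fullmatch(name) is not None


def same_table(a: str, b: str) -> bool:
    """Table names are case insensitive, so compare them case-folded."""
    return a.lower() == b.lower()
```

For example, is_valid_table_name("Contacts") is True, while "1Contacts" (starts with a number), "ab" (too short), and "my-table" (not alphanumeric) are all rejected.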

Each operation below is described first for the REST API and then for the client library.

Creating tables

REST API: We use the POST method to the base URI (shown above) to create a new table. The table name is in the <TableName> element of the request body.

Client library: The CreateTable(<tablename>) method creates a blank table, but will fail if the table already exists. CreateTableIfNotExist(<tablename>) will create a blank table only if it does not exist. If we want our tables to be based on a class in our application, we can use the CreateTablesFromModel method. If we want or need to create tables asynchronously, we can use the BeginCreateTable or BeginCreateTableIfNotExist method. Each of these has a corresponding End method as well.

Querying a list of tables

REST API: Using the GET method, we can retrieve a list of tables in our storage account. There is no request body for this operation.

Client library: The ListTables method returns a list of the tables in our storage account. If we want to check for the existence of a particular table, we can use the DoesTableExist method. To list tables asynchronously, we can use the BeginListTablesSegmented method.

Deleting a table

REST API: The DELETE method is used to delete a single table. The table name is specified in the URI, such as http://<myaccount>.table.core.windows.net/Tables('<mytable>').

Client library: Not surprisingly, the DeleteTable or DeleteTableIfExist method is used to delete a table. To delete tables asynchronously, we use the BeginDeleteTable and BeginDeleteTableIfExist methods.

A note about table deletion: The actual table deletion is not immediate. The table is merely marked for deletion and becomes inaccessible right away; the actual deletion occurs during garbage collection at a later time. Depending on the size of the table, it can take at least 40 seconds to delete the table. If we try to access a table while it is being deleted, we'll receive a status code of 409 in the response, along with an error message indicating that the table is being deleted.

Working with entities

The base URI for working with entities via the REST API is http://<account>.table.core.windows.net/<tablename>. Note that the specific table name is specified as part of the URI, unlike when we were working with tables. Entity properties are specified in the request body, which is in Atom format.

It's not possible to work directly with individual properties. Instead, we must retrieve the entity containing the property, manipulate the property, and then update the entity back in the cloud.

As with tables, each entity operation is described first for the REST API and then for the client library.

Inserting entities

REST API: The POST method is used to insert a new entity into the table specified in the URI. Entity properties are sent as child elements of the <properties> element.

Client library: After we create a DataServiceContext for our table, we use the AddObject method to add the entity to the context, and then the SaveChanges method to save the entity to the table.

Querying entities

REST API: Querying entities from a table uses the GET method. The REST API has a simple query syntax, with either the keys or a filter string passed in the URI. Because values are passed in the query string, the following characters must be encoded before the filter string is assembled: /, ?, :, @, &, =, +, and $. There is no request body, as the entire request is contained in the URI.

If the PartitionKey and RowKey are known, a specific entity can be retrieved using the following URI:

http://<account>.table.core.windows.net/<table>(PartitionKey='<partitionkey>',RowKey='<rowkey>')

If we want to retrieve a filtered list of entities, we can use the following URI:


Recall that queries made via the REST API have boundaries, and exceeding the boundaries will result in a partial recordset and a continuation token being returned. 
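The URI construction for entity queries can be sketched as below. Several details here are assumptions rather than facts taken from the text: the helper names are hypothetical, the filtered form assumes the OData-style $filter query option on <table>(), and single quotes inside key values are assumed to be doubled before encoding. The reserved characters listed earlier are percent-encoded with the standard library.

```python
from urllib.parse import quote

BASE = "http://{account}.table.core.windows.net/{table}"


def _key_literal(value: str) -> str:
    # Assumption: a single quote inside a key is doubled, then the whole
    # value is percent-encoded so reserved characters cannot break the URI.
    return quote(value.replace("'", "''"), safe="")


def entity_uri(account: str, table: str, partition_key: str, row_key: str) -> str:
    """URI addressing one entity by its PartitionKey and RowKey."""
    base = BASE.format(account=account, table=table)
    return (f"{base}(PartitionKey='{_key_literal(partition_key)}',"
            f"RowKey='{_key_literal(row_key)}')")


def filter_uri(account: str, table: str, filter_expression: str) -> str:
    """URI for a filtered query (assumed $filter form; expression is encoded)."""
    base = BASE.format(account=account, table=table)
    return f"{base}()?$filter={quote(filter_expression, safe='')}"
```

For instance, entity_uri("myaccount", "Contacts", "BillGates", "1") yields http://myaccount.table.core.windows.net/Contacts(PartitionKey='BillGates',RowKey='1').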

The response body will include an opaque property called an ETag. The ETag is considered opaque because we cannot alter its value, nor should we use this value as an identifier in our code (as there is a date/time component, the ETag won't have a consistent value). The ETag is used by the back-end of the REST API for concurrency when making changes to entities.

Client library: When we query ADO.NET Data Services using LINQ via a client library, there are no query boundaries. The query will return the full recordset, and will process up to the configured global timeout values.
Updating entities

REST API: Azure uses optimistic concurrency, and assumes that updates will not affect each other, so resources are not locked. This is similar to optimistic concurrency in a database system. We use the PUT method against a specific entity, as defined by the PartitionKey and RowKey combination. The URI looks just the same as when querying a specific entity:

http://<account>.table.core.windows.net/<table>(PartitionKey='<PartitionKey>',RowKey='<RowKey>')

The entity is contained in the request body. The ETag returned as part of the initial query must be sent back with the update so that concurrency can be checked. If the ETag we send matches the one currently on the entity, the update is performed. If the ETag does not match, the entity was changed after we retrieved it, and a Precondition Failed (response code 412) is returned. Should this happen, we need to retrieve the latest version of the entity, perform the modifications again, and resubmit the update. An update can be forced by setting the If-Match request header to the wildcard character "*".

Client library: The DataServiceContext maintains the ETags for us, and handles the concurrency checking. After retrieving the entity or entities we want to modify, we make our changes and then update the data context using the UpdateObject method. The SaveChanges method then propagates the changes back to the table.
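To make the ETag check concrete, here is a small in-memory simulation of optimistic concurrency. FakeTable is purely illustrative and does not talk to Azure; PreconditionFailed stands in for an HTTP 412 response, and update_with_retry shows the fetch, modify, resubmit cycle described above.

```python
import itertools


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 (Precondition Failed) response."""


class FakeTable:
    """In-memory stand-in for a table that versions each entity with an ETag."""

    def __init__(self):
        self._rows = {}
        self._versions = itertools.count(1)

    def put(self, key, entity):
        """Unconditional write; returns the new ETag."""
        etag = f"v{next(self._versions)}"
        self._rows[key] = (dict(entity), etag)
        return etag

    def get(self, key):
        entity, etag = self._rows[key]
        return dict(entity), etag

    def update(self, key, entity, if_match):
        """Write only if if_match equals the stored ETag, or is the wildcard."""
        _, current = self._rows[key]
        if if_match != "*" and if_match != current:
            raise PreconditionFailed(key)
        return self.put(key, entity)


def update_with_retry(table, key, modify, max_attempts=3):
    """Re-read and re-apply modify() whenever the ETag check fails."""
    for _ in range(max_attempts):
        entity, etag = table.get(key)
        modify(entity)
        try:
            return table.update(key, entity, if_match=etag)
        except PreconditionFailed:
            continue  # someone else changed the entity; start over
    raise PreconditionFailed(key)
```

An update submitted with a stale ETag raises PreconditionFailed, while the same update with if_match="*" is forced through regardless of the stored version.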
Merging entities

REST API: The merge operation combines the properties sent in the request with the properties of a specific entity. A merge does not remove properties that are missing from the request: properties that are sent are added (or overwritten if they already exist), and all other properties are left untouched. We use the MERGE method against a specific entity, as defined by the PartitionKey and RowKey combination. The URI looks just the same as when querying a specific entity:

http://<account>.table.core.windows.net/<table>(PartitionKey='<PartitionKey>',RowKey='<RowKey>')

The request body should contain the properties to be merged with the entity referenced in the URI. Concurrency is handled the same as with updates: the ETag is verified before an entity is merged, and if the ETags match, the merge is performed.

Client library: As a merge is a form of an update, we use the same methods to merge as we do to update.
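The difference between an update and a merge can be illustrated with plain dictionaries. This is a sketch of the semantics only, under the assumption that an update replaces the stored property set while a merge overlays the supplied properties; the function names are our own.

```python
KEY_FIELDS = ("PartitionKey", "RowKey")


def replace_entity(stored: dict, incoming: dict) -> dict:
    """Update: the stored properties are replaced by the incoming ones."""
    keys = {name: stored[name] for name in KEY_FIELDS}
    return {**keys, **incoming}


def merge_entity(stored: dict, incoming: dict) -> dict:
    """Merge: incoming properties are overlaid; absent ones are kept."""
    return {**stored, **incoming}
```

Given a stored contact with Phone and Email properties, merging {"Phone": "555-0100"} keeps the Email property, whereas replacing with the same body drops it.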
Deleting entities

REST API: As with tables, an entity is first marked for deletion and made inaccessible, then deleted via garbage collection at a later time. We use the DELETE method to remove a specific entity, as defined by the PartitionKey and RowKey combination. The URI looks just the same as when querying a specific entity:

http://<account>.table.core.windows.net/<table>(PartitionKey='<PartitionKey>',RowKey='<RowKey>')

There is no request body, as the entity to be removed is identified by the URI. Concurrency is handled the same as with updates. The ETag is checked prior to deletion and, if the ETags match, the entity is marked for deletion.

Client library: In our DataServiceContext, we call the DeleteObject method to remove the entity from the context, and then the SaveChanges method to propagate the change to the cloud.

We must follow naming rules for naming properties. Property names can contain alphanumeric characters and underscore, but cannot contain any extended or special characters. Property values can be one of eight types, with limitations on value ranges or size as described in the following table:

Value type: Description

Binary: A byte array, up to 64 KB in size. For large binary objects, consider using Blob Storage and making a pointer to the blob a property value.

Boolean: Boolean true/false value type.

DateTime: UTC time. As this is a 64-bit value, the valid range of dates is 1/1/1601 to 12/31/9999.

Double: A 64-bit floating point value type. This is the only value type that can be used for decimal numbers.

GUID: A standard 128-bit globally unique identifier.

Int: A 32-bit signed integer. It has values ranging from -2,147,483,648 to 2,147,483,647.

Int64: A 64-bit signed integer. It has values ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

String: UTF-16 encoded. Strings can be of a maximum size of 64 KB.

If we are using LINQ to query Table Storage, the property values will be inferred when the data are returned. However, if we are utilizing the REST API, property values will be returned as string data, which we will need to convert to the proper type in our application.
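When consuming the REST API, the string-to-type conversion mentioned above might look like the following sketch. The Edm.* type names and the ISO 8601 timestamp format are assumptions about how the response labels and encodes each value, and convert is a hypothetical helper.

```python
import base64
import uuid
from datetime import datetime, timezone


def _parse_datetime(text: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2012-09-09T21:33:43Z."""
    text = text.rstrip("Z")
    if "." in text:
        head, fraction = text.split(".", 1)
        text = f"{head}.{fraction[:6]}"  # datetime keeps at most microseconds
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


# Assumed type labels mapped to converters for the eight value types.
CONVERTERS = {
    "Edm.Binary": base64.b64decode,
    "Edm.Boolean": lambda s: s.strip().lower() == "true",
    "Edm.DateTime": _parse_datetime,
    "Edm.Double": float,
    "Edm.Guid": uuid.UUID,
    "Edm.Int32": int,
    "Edm.Int64": int,
    "Edm.String": str,
}


def convert(raw: str, edm_type: str = "Edm.String"):
    """Convert a raw string property value to the matching Python type."""
    return CONVERTERS[edm_type](raw)
```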

Entity Group Transactions

The examples we have seen in earlier sections focus on operations against a single entity. But what if we want to update all entities having the same partition key? Using the client library, we can perform multi-entity transactions. In our data context, we can queue a number of create/update/delete commands before committing the changes with the SaveChanges method. There are a few rules and limitations regarding Entity Group Transactions:

  • Each command group can contain up to 100 commands.

  • Operations can be performed only on entities with the same partition key.

  • As the name implies, the commands are executed as an all-or-nothing transaction. If one command fails, the entire set is rolled back.

  • The entire group can be only 4 MB in size. This means insertions of a large number of entities may need to be split into several groups.

  • An entity can appear only once. We cannot insert an entity at the beginning of the group and then update it later.

  • Commands are executed in the order they were inserted into the group.

  • Concurrency is checked on the server. If an entity's ETags do not match, no change will be made and the entire command group fails.

Entity Group Transactions can be performed with either the REST API or the .NET Client Library. 
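Two of the rules above (one partition key per group, at most 100 commands per group) lend themselves to a simple planning step before any request is sent. The sketch below splits a list of entities into compliant groups; plan_batches is a hypothetical helper, and it does not check the 4 MB size limit or duplicate entities.

```python
from collections import defaultdict

MAX_COMMANDS = 100  # command limit per Entity Group Transaction


def plan_batches(entities, max_commands=MAX_COMMANDS):
    """Group entities by PartitionKey, then split each group into chunks."""
    by_partition = defaultdict(list)
    for entity in entities:
        by_partition[entity["PartitionKey"]].append(entity)

    batches = []
    for group in by_partition.values():
        # Preserve insertion order, since commands run in the order queued.
        for start in range(0, len(group), max_commands):
            batches.append(group[start:start + max_commands])
    return batches
```

With 250 entities in one partition and 3 in another, this produces four groups of 100, 100, 50, and 3 commands, each of which could be submitted as its own transaction.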

Choosing a PartitionKey

In order to store massive amounts of data and quickly return queries against this data, tables may be partitioned across thousands of nodes. This is where the partition key fits into the storage scheme: all entities with the same partition key will be kept together. Different entities from the same table may be served from different nodes, but every entity with the same partition key will be served from the same node. In the Contacts example we saw earlier, all of the BillGates records will be kept together, and all of the SteveJobs records will be kept together, possibly on a different node than the BillGates records.

The Azure Fabric constantly monitors traffic to our partitions, and replicates active partitions to multiple nodes in order to satisfy traffic demands. Selecting a partition key becomes an important balance between scalability and query performance. The smaller our partitions, the more nodes our table can be spread over. However, if we split apart entities that are frequently returned in the same resultset, we can degrade query performance.

Microsoft offers some advice on choosing a good PartitionKey:

  • Identify the properties that will most commonly be used in filters. This is our list of important properties.

  • Narrow the list of important properties down to a couple of the most important properties. These are our key candidates.

  • Rank the key candidates in order of importance. If there is only one key property, that's our PartitionKey. If there are two, they should become our PartitionKey and RowKey. If there are more than two, the key properties can be concatenated into single keys with composite values.

  • If the PartitionKey cannot be guaranteed to be unique, add a unique identifier to the key.

  • Finally, a reality check: is the chosen PartitionKey likely to result in partitions that are too large or too small?

Final confirmation of a good key choice will come by choosing a sample dataset, performing stress tests on our table, and then tweaking the PartitionKey if necessary.
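The advice about concatenating several key properties into a single composite key can be sketched as below. The separator choice and the function name are assumptions made for illustration; the separator is rejected inside parts so that the composite value stays unambiguous.

```python
def composite_key(*parts: str, separator: str = "_") -> str:
    """Join several property values into one key with composite values."""
    for part in parts:
        if separator in part:
            raise ValueError(f"{part!r} contains the separator {separator!r}")
    return separator.join(parts)
```

For example, composite_key("Gates", "Bill") returns "Gates_Bill", which could serve as a PartitionKey built from two frequently filtered properties.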

Exception handling

Designing a robust application means handling the exceptions and errors that will inevitably arise. The following sections cover some of the more common categories of exceptions that may be encountered. How these exceptions are dealt with depends on the design and purpose of the application. Exception handling is entirely in the hands of the application developer.

Retry on exceptions

If the data matters, our application should retry the operation when the response code indicates something other than success. For applications with an end user, it may be sufficient to guide the user through a series of steps to retry the operation. For unattended applications, local retry queues, event logs/notifications, and increasing times between attempts may need to be implemented.

Network issues and connections being closed can result in an operation failing to reach the server. And although these should be rare, timeout exceptions can occur while an entity is being updated or propagated. The time interval between attempts should be increased if these errors occur multiple times.

Not every exception should have a retry. If we're attempting to delete an entity, and we receive a response that the entity does not exist, there is no need to reattempt the deletion.
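A retry policy along these lines, with increasing delays and a short list of responses that are not worth retrying, might be sketched as follows. ServiceError, the NON_RETRYABLE set, and the default delays are illustrative assumptions rather than part of any Azure library.

```python
import time


class ServiceError(Exception):
    """Hypothetical error carrying the HTTP status code of a failed request."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


# Assumed policy: a 404 (for example, deleting a missing entity) is final.
NON_RETRYABLE = {404}


def call_with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run operation(), doubling the wait between attempts on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ServiceError as err:
            if err.status in NON_RETRYABLE or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
```

The sleep parameter is injectable so that an unattended application can substitute its own scheduling (or a test can record the delays without waiting).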

Exceptions on retry

It's very possible that a server-side operation may succeed, but a network or timeout error prevents proper notification of success. A retry will then result in an error message that indicates the first operation was successful. For instance, if we successfully insert an entity, and a network timeout results in our application retrying the operation, we'll receive an "entity already exists" error. It would not be a good idea to keep retrying the insertion in this circumstance, because we'd be in a never-ending loop. One way to handle this situation gracefully is to query the table before an insert is attempted, to make sure the entity does not already exist.
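One way to apply the check-before-retry idea is sketched below. Both insert and exists are hypothetical callables supplied by the application (one performs the insert, the other queries for the entity), and the built-in TimeoutError stands in for whatever timeout exception the HTTP client raises.

```python
def insert_once(insert, exists, max_attempts=3) -> bool:
    """Insert an entity, checking for it before retrying after a timeout.

    Returns True once the entity is known to be stored, False otherwise.
    """
    for _ in range(max_attempts):
        try:
            insert()
            return True
        except TimeoutError:
            # The write may have succeeded even though the reply was lost,
            # so look for the entity before attempting the insert again.
            if exists():
                return True
    return False
```

If the first attempt times out after the server has already stored the entity, the existence check succeeds and the insert is never repeated, which avoids the "entity already exists" error altogether.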

Concurrency conflicts

In update and delete operations, an ETag mismatch will result in a Precondition Failed response. In this situation, we need to either retrieve the updated entity, make our modification, and then attempt the update again, or cancel our update altogether.

Table errors and HTTP response codes

When using the REST API, exception information is contained in two places. Each table error is mapped to an HTTP status code in the header of the response. The HTTP status codes are standard codes, and are not as informative as the table error code in the response body. The header codes are useful for determining the result of an operation, but the <ExceptionDetails> in the response body should be surfaced to the user, or written to the application logs.

The client library receives the more detailed message as part of the thrown exception.
