Accessing Table Storage
For security
purposes, each request to Table Storage must be authenticated using the
256-bit shared keys created when we added the storage service. Table
Storage can be directly accessed via REST, or queried using a subset of
LINQ. The REST interface allows languages such as Java, PHP, and Ruby to
consume Table Storage, while client libraries for ADO.NET Data Services
are limited to the .NET languages.
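As a rough illustration of what authenticating a REST request involves, the following Python sketch signs a request with a shared key. The text above does not describe the signing scheme, so this assumes the SharedKeyLite form for the Table service (an HMAC-SHA256 over the request date and the canonicalized resource). The function name `sign_table_request` and its parameters are hypothetical.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def sign_table_request(account, key_b64, resource_path, date=None):
    """Return request headers carrying a SharedKeyLite signature (assumed scheme)."""
    # The date is an RFC 1123 timestamp in GMT and is part of the signed string.
    date = date or formatdate(usegmt=True)
    # Assumed string-to-sign: the date, a newline, then /<account><resource path>.
    string_to_sign = f"{date}\n/{account}{resource_path}"
    digest = hmac.new(
        base64.b64decode(key_b64),          # the shared key is stored base64-encoded
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return {
        "x-ms-date": date,
        "Authorization": f"SharedKeyLite {account}:{signature}",
    }
```

The returned headers would be attached to the HTTP request by whichever HTTP client the application uses.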
Each
request made via the REST API has a different set of required headers,
and the body of each request is in Atom format. Queries made via the REST
API return at most 1,000 records and run for at most 5 seconds (a total of
30 seconds from scheduling/processing to completion). If a query crosses
either of these boundaries, a continuation token will be returned, which
can be used in a subsequent request to fetch the next set of results.
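Following continuation tokens amounts to a simple loop. The Python sketch below assumes the token arrives in the `x-ms-continuation-NextPartitionKey` and `x-ms-continuation-NextRowKey` response headers and is sent back as `NextPartitionKey` and `NextRowKey` query parameters. `fetch_page` is a hypothetical callable standing in for one GET request.

```python
def query_all(fetch_page):
    """Collect every entity by following continuation tokens.

    fetch_page(token) is a hypothetical callable that performs one GET and
    returns (entities, response_headers); token is None on the first call.
    """
    entities, token = [], None
    while True:
        page, headers = fetch_page(token)
        entities.extend(page)
        next_pk = headers.get("x-ms-continuation-NextPartitionKey")
        next_rk = headers.get("x-ms-continuation-NextRowKey")
        if next_pk is None:
            # No continuation token: the result set is complete.
            return entities
        # The token is passed back as query parameters on the next request.
        token = {"NextPartitionKey": next_pk}
        if next_rk is not None:
            token["NextRowKey"] = next_rk
```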
An important API header property is x-ms-version. Just as .NET allows multiple versions of the same library to coexist in the Global Assembly Cache (GAC),
multiple versions of the Table Storage API will also coexist. This is
an optional property, but if it is omitted, the default
version will be the most basic version available. If we are targeting
specific API features in our application, or want to ensure no part of
our application will break when the API is updated, we need to include
this property. The value is a date stamp, so the header property for the
April 2009 API would read x-ms-version: 2009-04-14.
Third-party products are being
developed that allow us to work with tables directly, in a friendlier
way than writing code.
Working with tables
The client class for working with tables via .NET and the Azure Managed Library is Microsoft.WindowsAzure.StorageClient.CloudTableClient.
The methods listed in the following table are methods of this class,
unless specified otherwise.
The base URI for accessing tables via the REST API is http://<myaccount>.table.core.windows.net/Tables. The different HTTP verbs (POST, GET, DELETE) are used to determine the action, and parameters (such as the table name) are specified in the request body.
Table names must follow a naming convention:
Names can only be alphanumeric
Length must be between 3 and 63 characters
The name cannot begin with a number
Names are case insensitive
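The naming rules above can be checked before a request is ever sent. The following Python sketch is one way to express them; the helper names are our own.

```python
import re

# A letter first, then 2 to 62 more letters or digits: 3 to 63 characters in total.
_TABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{2,62}")


def is_valid_table_name(name):
    """Return True if name satisfies the table naming rules listed above."""
    return _TABLE_NAME.fullmatch(name) is not None


def same_table(name_a, name_b):
    """Table names are case insensitive, so compare them case-folded."""
    return name_a.lower() == name_b.lower()
```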
| Operation | REST API | Client Library |
|---|---|---|
| Creating tables | We use the POST method against the base URI (shown above) to create a new table. The table name is in the <TableName> element of the request body. | The CreateTable(<tablename>) method creates a blank table, but will fail if the table already exists. CreateTableIfNotExist(<tablename>) will create a blank table only if it does not exist. If we want our tables to be based on a class in our application, we can use the CreateTablesFromModel method. If we want or need to create tables asynchronously, we can use the BeginCreateTable or BeginCreateTableIfNotExist method. Each of these has a corresponding End method as well. |
| Querying a list of tables | Using the GET method, we can retrieve a list of tables in our storage account. There is no request body for this operation. | The ListTables method returns a list of the tables in our storage account. If we want to check for the existence of a particular table, we can use the DoesTableExist method. To list tables asynchronously, we can use the BeginListTablesSegmented method. |
| Deleting a table | The DELETE method is used to delete a single table. The table name is specified in the URI, for example http://<myaccount>.table.core.windows.net/Tables('<mytable>'). | Not surprisingly, the DeleteTable or DeleteTableIfExist method is used to delete a table. To delete tables asynchronously, we use the BeginDeleteTable and BeginDeleteTableIfExist methods. |
A note about table deletion:
The actual table deletion is not immediate. The table is merely marked
for deletion and becomes inaccessible at once; the actual deletion
occurs later, during garbage collection. Depending on the size of the
table, the deletion can take 40 seconds or more. If we try to access a
table while it is being deleted, we'll receive a status code of 409 in
the response, along with an error message indicating that the table is
being deleted.
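To make the REST side of the table operations concrete, here is a small Python sketch that builds the verb, URI, and body for each one. The Atom entry layout is an assumption based on the usual ADO.NET Data Services payload and should be checked against the service documentation. `table_request` is a hypothetical helper, and authentication headers are left out.

```python
from xml.sax.saxutils import escape

# Atom entry carrying the table name; the element layout is an assumption.
CREATE_BODY = (
    '<?xml version="1.0" encoding="utf-8" standalone="yes"?>'
    '<entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"'
    ' xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"'
    ' xmlns="http://www.w3.org/2005/Atom">'
    '<title /><updated>{updated}</updated><author><name /></author><id />'
    '<content type="application/xml"><m:properties>'
    '<d:TableName>{name}</d:TableName>'
    '</m:properties></content></entry>'
)


def table_request(account, operation, table=None, updated="2009-04-14T00:00:00Z"):
    """Return (http_verb, uri, body) for a create, list, or delete table call."""
    base = f"http://{account}.table.core.windows.net/Tables"
    if operation == "create":
        # The table name travels in the request body.
        return "POST", base, CREATE_BODY.format(updated=updated, name=escape(table))
    if operation == "list":
        # No request body is needed to list tables.
        return "GET", base, None
    if operation == "delete":
        # The table name travels in the URI.
        return "DELETE", f"{base}('{table}')", None
    raise ValueError(f"unknown operation: {operation}")
```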
Working with entities
The base URI for working with entities via the REST API is http://<account>.table.core.windows.net/<tablename>.
Note that the specific table name is specified as part of the URI,
unlike when we were working with tables. Entity properties are specified
in the request body, which is in Atom format.
It's not possible to work
directly with individual properties. Instead, we must retrieve the
entity containing the property, manipulate the property, and then update
the entity back in the cloud.
Operation
|
REST API
|
Client Library
|
---|
Inserting entities
|
The POST method is used to insert a new entity into the table specified in the URI. Entity properties are sent as child elements of the <properties> element.
|
After we create a DataServiceContext for our table, we use the AddObject method to add the entity to the context, and then the SaveChanges method to save the entity to the table.
|
Querying entities
|
Querying entities from a table uses the GET
method. The REST API has a simple query syntax, with either the keys or
a filter string passed in the URI. Because values are passed in the
query string, the following characters must be encoded before the filter
string is assembled: /, ?, :, @, &, =, +, the comma (,), and $. There is
no request body, as the entire request is contained in the URI.
If the PartitionKey and RowKey are known, a specific entity can be retrieved using the following URI:
http://<account>.table.core.windows.net/<table>(PartitionKey='<partitionkey>', RowKey='<rowkey>').
If we want to retrieve a filtered list of entities, we can use the following URI:
http://<account>.table.core.windows.net/<table>()?$filter=<query-expression>.
Recall that queries made via
the REST API have boundaries, and exceeding the boundaries will result
in a partial recordset and a continuation token being returned.
The response body will include an opaque property called an ETag. The
ETag is considered opaque because we cannot alter its value, nor should
we use this value as an identifier in our code (as there is a date/time
component, the ETag won't have a consistent value). The ETag is used by
the back-end of the REST API for concurrency when making changes to
entities.
|
When we query ADO.NET Data Services using LINQ via a client library,
there are no query boundaries. The query will return the full recordset,
and will process up to the configured global timeout values.
|
Updating entities: Azure uses optimistic concurrency, and assumes that
updates will not affect each other, so resources are not locked. This is
similar to optimistic concurrency in a database system.
|
We use the PUT method to a specific
entity, as defined by the PartitionKey and RowKey combination. The URI
looks just the same as when querying a specific entity:
http://<account>.table.core.windows.net/<table>(PartitionKey='<PartitionKey>', RowKey='<RowKey>')
The entity is contained in the request body. The ETag returned as part
of the initial query must be sent back with the request (in the If-Match
request header) for the concurrency check. If the ETag
sent matches the one on the entity, the update will be performed. If
the ETag does not match, this indicates that the entity was changed
since it was retrieved, and a Precondition Failed
(response code 412) will be returned. Should this happen, we need to
retrieve the latest version of the entity, perform the modifications
again, and resubmit the update. An update can be forced by setting the If-Match request header parameter to the wildcard character "*".
|
The DataServiceContext maintains the ETags
for us, and handles the concurrency checking. After retrieving the
entity or entities we want to modify, we make our changes and then
update the data context using the UpdateObject method. The SaveChanges method then propagates the changes back to the table.
|
Merging entities: The merge operation is used to combine the properties
sent in the request with the properties of a specific entity. Unlike an
update, a merge does not remove properties that are missing from the
request; it only adds or overwrites the properties that are sent.
|
We use the MERGE method to merge
properties into a specific entity, as defined by the PartitionKey and
RowKey combination. The
URI looks just the same as when querying a specific entity:
http://<account>.table.core.windows.net/<table>(PartitionKey='<PartitionKey>', RowKey='<RowKey>')
The request body should contain the properties to be merged with the
entity referenced in the URI. Concurrency is handled the same as with
updates: an ETag is verified before an entity is merged, and if the ETags
match, the merge is performed.
|
As a merge is a form of an update, we use the same methods to merge as we do to update.
|
Deleting entities: As with tables, an entity is first marked for deletion
and made inaccessible, then deleted via garbage collection at a later
time.
|
We use the DELETE method to
remove a specific entity, as defined by the PartitionKey and RowKey
combination. The URI looks just the same as when querying a specific
entity:
http://<account>.table.core.windows.net/<table>(PartitionKey='<PartitionKey>', RowKey='<RowKey>')
There is no request body, as the entity to be removed is identified
specifically in the URI. Concurrency is handled the same as with
updates. An ETag is checked prior to deletion and, if the ETags match,
the entity is marked for deletion.
|
In our DataServiceContext, we call the DeleteObject method to remove the entity from the context, and then the SaveChanges method to propagate the change to the cloud.
|
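The entity URIs described in the table above can be assembled with a couple of small helpers. This Python sketch percent-encodes key values and filter expressions with the standard library, and doubles any single quote inside a key value, which is the OData convention for string literals. The helper names are hypothetical.

```python
from urllib.parse import quote


def _literal(value):
    """Quote-double, then percent-encode, a key value for use in a URI."""
    return quote(value.replace("'", "''"), safe="")


def entity_uri(account, table, partition_key, row_key):
    """URI addressing one entity by its PartitionKey and RowKey."""
    return (
        f"http://{account}.table.core.windows.net/{table}"
        f"(PartitionKey='{_literal(partition_key)}',RowKey='{_literal(row_key)}')"
    )


def filter_uri(account, table, expression):
    """URI returning the entities that match a filter expression."""
    # safe="" makes quote() encode /, ?, :, @, &, =, +, the comma, and $ as well.
    return (
        f"http://{account}.table.core.windows.net/{table}()"
        f"?$filter={quote(expression, safe='')}"
    )
```

The same entity URI serves for GET, PUT, MERGE, and DELETE; only the HTTP verb, headers, and body differ.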
Property names must follow
naming rules. They can contain alphanumeric
characters and underscores, but cannot contain any extended or special
characters. Property values can be one of eight types, with limitations
on value ranges or size as described in the following table:
| Value type | Description |
|---|---|
| Binary | A byte array, up to 64 KB in size. For large binary objects, consider using Blob Storage and making a pointer to the blob a property value. |
| Boolean | Boolean true/false value type. |
| DateTime | UTC time. As this is a 64-bit value, the valid range of dates is 1/1/1601 to 12/31/9999. |
| Double | A floating point value type. This is the only value type that can be used for decimal numbers. This is a 64-bit type. |
| GUID | A standard 128-bit globally unique identifier. |
| Int | A 32-bit signed integer. It has values ranging from -2,147,483,648 to 2,147,483,647. |
| Int64 | A 64-bit signed integer. It has values ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. |
| String | Encoded UTF-16. Strings can be of a maximum size of 64 KB. |
If we are using LINQ to
query Table Storage, the property types will be inferred when the data
are returned. However, if we are utilizing the REST API, property values
will be returned as string data, which we will need to convert to the
proper type in our application.
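When consuming the REST API, that conversion can be driven by the type name attached to each property. The Python sketch below assumes the type names follow the Edm convention (Edm.Int32, Edm.DateTime, and so on) and that a property without a type name is a string. `convert_property` is a hypothetical helper.

```python
import base64
import uuid
from datetime import datetime

# One converter per value type; the Edm.* type names are an assumption.
_CONVERTERS = {
    "Edm.Binary": base64.b64decode,
    "Edm.Boolean": lambda text: text.lower() == "true",
    # Keeps whole seconds only; fractional seconds and the trailing Z are dropped.
    "Edm.DateTime": lambda text: datetime.strptime(text[:19], "%Y-%m-%dT%H:%M:%S"),
    "Edm.Double": float,
    "Edm.Guid": uuid.UUID,
    "Edm.Int32": int,
    "Edm.Int64": int,
    "Edm.String": str,
}


def convert_property(text, edm_type="Edm.String"):
    """Convert the string form of a property value to a native Python value."""
    return _CONVERTERS[edm_type](text)
```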
Entity Group Transactions
The examples we have seen in
earlier sections focus on operations against a single entity. But what
if we want to update all entities having the same partition key? Using
the client library, we can perform a multiple-entity transaction. In our
data context, we can queue a number of create/update/delete commands
before committing the changes with the SaveChanges method. There are a few rules and limitations regarding Entity Group Transactions:
Each command group can contain up to 100 commands.
Operations can be performed only on entities with the same partition key.
As the name implies, the commands are executed as an all-or-nothing transaction. If one command fails, the entire set is rolled back.
The entire group can be at most 4 MB in size. This means insertions of a large number of entities may need to be split into several groups.
An entity can appear only once. We cannot insert an entity at the beginning of the group and then update it later.
Commands are executed in the order they were inserted into the group.
Concurrency is checked on the server. If an entity's ETags do not match, no change will be made and the entire command group fails.
Entity
Group Transactions can be performed with either the REST API or the
.NET Client Library.
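Several of the rules above can be checked on the client before a group is submitted. The following Python sketch is illustrative only: the tuple layout for a command is our own invention, and summing the payload sizes only approximates the 4 MB limit, which applies to the whole request.

```python
MAX_COMMANDS = 100
MAX_BYTES = 4 * 1024 * 1024


def validate_group(commands):
    """Return a list of rule violations for a proposed command group.

    Each command is a hypothetical tuple:
    (action, partition_key, row_key, payload_bytes).
    """
    errors = []
    if len(commands) > MAX_COMMANDS:
        errors.append("more than 100 commands")
    if len({pk for _, pk, _, _ in commands}) > 1:
        errors.append("mixed partition keys")
    keys = [(pk, rk) for _, pk, rk, _ in commands]
    if len(set(keys)) != len(keys):
        errors.append("entity appears more than once")
    if sum(len(payload) for _, _, _, payload in commands) > MAX_BYTES:
        errors.append("group larger than 4 MB")
    return errors
```

An empty list means the group passes these client-side checks; the server still performs the concurrency check and the all-or-nothing commit.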
Choosing a PartitionKey
In order to store the massive
amount of data and quickly return queries against this data, tables may
be partitioned across thousands of nodes. This is where the partition
key fits into the storage scheme: all entities with the same partition
key will be kept together. Different entities from the same table may be
served from different nodes, but every entity with the same partition
key will be served from the same node. In the Contacts example we saw
earlier, all of the BillGates records will be kept together, and all
of the SteveJobs records will be kept together, possibly on a
different node than the BillGates records.
The Azure Fabric
constantly monitors traffic to our partitions, and replicates active
partitions to multiple nodes in order to satisfy traffic demands.
Selecting a partition key becomes an important balance between query
performance and response time. The smaller our partitions, the more
nodes our table can be spread over. However, if we split apart entities
that are frequently returned in the same resultset, we can degrade query
performance.
Microsoft offers some advice on choosing a good PartitionKey:
Identify the properties that will most commonly be used in filters. This is our list of important properties.
Narrow the list of important properties down to a couple of the most important properties. These are our key candidates.
Rank
the key candidates in order of importance. If there is only one key
property, that's our PartitionKey. If there are two, they should become
our PartitionKey and RowKey. If there are more than two, the key
properties can be concatenated into single keys with composite values.
If the PartitionKey and RowKey together cannot be guaranteed to be unique, add a unique identifier to the key.
Finally, a reality check: is the chosen PartitionKey likely to result in partitions that are too large or too small?
Final
confirmation of a good key choice will come by choosing a sample
dataset, performing stress tests on our table, and then tweaking the
PartitionKey if necessary.
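The advice on ranking and concatenating key candidates can be sketched as follows. This Python example assumes the top-ranked candidate becomes the PartitionKey and the remaining candidates are joined into a composite RowKey. That split, the underscore separator, and the `make_keys` name are our own choices rather than part of the guidance.

```python
def make_keys(entity, ranked_candidates, unique_id=None):
    """Derive (PartitionKey, RowKey) from key candidates ranked by importance."""
    first, *rest = ranked_candidates
    partition_key = str(entity[first])
    # Remaining candidates are concatenated into one composite RowKey.
    row_parts = [str(entity[name]) for name in rest]
    if unique_id is not None:
        # Append a unique identifier when the candidates alone may collide.
        row_parts.append(str(unique_id))
    return partition_key, "_".join(row_parts)
```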
Exception handling
Designing a robust
application means handling the exceptions and errors that will
inevitably arise. The following sections cover some of the more common
categories of exceptions that may be encountered. How these exceptions
are dealt with depends on the design and purpose of the application.
Exception handling is entirely in the hands of the application
developer.
Retry on exceptions
If the data matters,
our application should retry the operation when the response code
indicates something other than success. For applications with an end
user, it may be sufficient to guide the user through a series of steps
to retry the operation. For unattended applications, local retry queues,
event logs/notifications, and increasing times between attempts may
need to be implemented.
Network issues and
connections being closed can result in an operation failing to reach the
server. And although these should be rare, timeout exceptions can occur
while an entity is being updated or propagated. The time interval
between attempts should be increased if these errors occur multiple
times.
Not every exception should have a
retry. If we're attempting to delete an entity, and we receive a
response that the entity does not exist, there is no need to reattempt
the deletion.
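For an unattended application, the retry policy described above might look like the following Python sketch. It doubles the wait between attempts and gives up immediately on status codes where a retry cannot help. The set of non-retryable codes is our own choice, and `operation` is a hypothetical callable that returns an HTTP status code.

```python
import time

# Responses that a retry will not change (an illustrative choice).
NO_RETRY = {400, 403, 404, 409, 412}


def with_retries(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run operation() until it succeeds, cannot succeed, or attempts run out.

    Returns the last HTTP status seen, or None if no response ever arrived.
    """
    status = None
    for attempt in range(attempts):
        try:
            status = operation()
        except (ConnectionError, TimeoutError):
            status = None  # the request may never have reached the server
        if status is not None and (200 <= status < 300 or status in NO_RETRY):
            return status
        if attempt < attempts - 1:
            # Increase the interval between attempts: 0.5s, 1s, 2s, ...
            sleep(base_delay * 2 ** attempt)
    return status
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.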
Exceptions on retry
It's very possible that a
server-side operation may succeed, but a network or timeout error
prevents proper notification of success. A retry will then result in an
error message that indicates the first operation was successful. For
instance, if we successfully insert an entity, and a network timeout
results in our application retrying the operation, we'll receive an entity already exists
error. It would not be a good idea to retry the insertion in this
circumstance because we'll be in a never-ending loop. One way to handle
this situation gracefully is to query the table before an insert is
attempted, to make sure the entity does not already exist.
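The check-before-insert approach can be sketched in Python as follows. `insert` and `exists` are hypothetical callables: the first performs the POST and returns the HTTP status, and the second queries the table for the entity. The sketch assumes that a successful insert returns status 201.

```python
def insert_once(insert, exists, attempts=3):
    """Insert an entity without tripping over our own earlier success.

    Returns True if the entity is in the table when we finish.
    """
    for _ in range(attempts):
        if exists():
            # A previous attempt succeeded even though we never heard back.
            return True
        try:
            status = insert()
        except (ConnectionError, TimeoutError):
            continue  # outcome unknown: check for the entity before retrying
        return status == 201
    return exists()
```

The extra query costs one round trip per attempt, which is the price of avoiding the misleading "already exists" error.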
Concurrency conflicts
In update and delete operations, an ETag mismatch will result in a Precondition Failed
response. In this situation, we need to either retrieve the updated
entity, make our modification, and then attempt the update again, or
cancel our update altogether.
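The retrieve, modify, and resubmit cycle can be sketched in Python as follows. `get` and `put` are hypothetical callables wrapping the GET and PUT requests: `get` returns the entity with its ETag, and `put` sends the entity along with the ETag and returns the HTTP status code.

```python
def update_with_retry(get, put, modify, attempts=3):
    """Apply modify() to an entity, retrying when the ETag is stale.

    Returns the final HTTP status; 412 means every attempt lost the race.
    """
    for _ in range(attempts):
        entity, etag = get()                 # fetch the latest version and its ETag
        status = put(modify(dict(entity)), etag)
        if status != 412:                    # anything but Precondition Failed
            return status
    return 412
```

Capping the number of attempts leaves the decision to cancel the update with the caller.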
Table errors and HTTP response codes
When using the
REST API, exception information is contained in two places. Each table
error is mapped to an HTTP status code in the header of the response.
The HTTP status codes are standard codes, and are not as informative as
the table error code in the response body. The header codes are useful
for determining the result of an operation, but the <ExceptionDetails> in the response body should be surfaced to the user, or written to the application logs.
The client library receives the more detailed message as part of the thrown exception.
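When working with the REST API directly, we have to pull the table error code out of the response body ourselves. The Python sketch below assumes the body is an `<error>` element in the ADO.NET Data Services metadata namespace with `<code>` and `<message>` children. That layout is an assumption to verify against real responses, and `parse_table_error` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the error document.
NS = "{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}"


def parse_table_error(status, body):
    """Combine the HTTP status with the table error code and message."""
    root = ET.fromstring(body)
    return {
        "status": status,                       # standard HTTP status code
        "code": root.findtext(f"{NS}code"),     # more specific table error code
        "message": root.findtext(f"{NS}message"),
    }
```

The resulting dictionary can be written to the application log or turned into a message for the user.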