Table Storage versus database tables
For
developers used to working with a relational database, Table Storage
may seem like a step backwards. After further examination, we may begin
to see Table Storage as a very powerful and flexible technology. Table
Storage is not a relational mechanism, but simple relations can be
maintained in application code. For developers interested in an
object-based database, Table Storage may be closer to what is desired.
Tables in databases and Table
Storage both have table names. Tables in Table Storage are composed of
entities, which are similar to rows in a database table. Each entity
has:
A PartitionKey that is used to group entities onto the same partition
A RowKey that uniquely identifies a row within a partition
A system-maintained Timestamp
Properties that are similar to columns in a database table
Properties are stored and retrieved as<name,value>
pairs. The PartitionKey has a very important use. As the Azure Fabric
optimizes itself, tables will be moved from disk partition to partition,
and even split across partitions. The PartitionKey is used to make sure
all entities that belong together are kept together, thereby increasing
the performance of the table. The PartitionKey and RowKey together make
a primary key for the table. It's important to understand
PartitionKeys, and we are going to cover them in further detail in the
coming sections.
One of the biggest
differences between Table Storage and a database table is the lack of a
fixed schema in Table Storage, as seen in the next diagram. In a
database table, there is a fixed schema with all rows having the same
number of columns, and all values in the same column having the same
data type. By contrast, all entities have a PartitionKey and a RowKey,
but each can have a varying number of properties, and properties with
the same name can have values of different types.
Because Table Storage is not
relational, there are no foreign key constraints all relations are
"loose", and must be maintained by application logic. This adds some
additional development, along with introducing a great deal of
flexibility in the system.
At the most basic level, if a web
application has drop-down lists for countries or states, Table Storage
is a perfect mechanism for holding this information. Countries and
states are small examples Table Storage is optimized for thousands to
millions of entries. And tables are queryable via LINQ and ADO.NET Data
Services, giving them a database-like functionality. Another, larger
example wouldbe to use Table Storage as a repository for auto-complete
suggestions, such as the auto-complete in Google's search box.
A more complex use for Table
Storage would be to serve as an object database. Each table would
contain a number of different objects, identified by their partition
keys. The RowKey would be used to identify property groups or
sub-objects, and each entity would be a property of the object. Specific
objects can be retrieved by their PartitionKey, specific property sets
by the RowKey, and the object can be rehydrated by application code.
In the example shown in the preceding screenshot, the Company
property contains a pipe-delimited string as its value. Retrieving the
individual values is not a standard operation this string would need to
be split on the client side before the individual properties can be
accessed. But using this technique allows for a flexible data storage
schema. While using a strategy like this gives us a great deal of
flexibility, it can place a burden on application development and
maintenance, as we would essentially be building our own object-relational mapping (ORM) engine.
Some of the good stuff
One of the biggest features of
Table Storage may be its size. Table Storage is scalable, and tables can
be massive, occupying terabytes of space and containing billions of
entities. There are no set limits as to the number of tables or the size
of each table. Naturally, all of this data will not exist on a single
node. Tables will be spread out over numerous servers, and "hot
partitions" will be load balanced and located for efficient delivery.
Table Storage is persistent, so if we turn our Azure instance off, our
data will be restored when we turn our instance back on.
There are a couple data
access options for Table Storage too. We can access tables directly via a
REST API, or we can query a table via a subset of LINQ and a client
library for ADO.NET Data Services.
As with other forms of data
access, table queries can timeout. Each table query is limited to 1,000
results or 5 seconds of execution, whichever comes first. However,
instead of throwing an error when an execution maximum is reached, a
partial resultset is returned with a continuation token. Passing the
continuation token back in a subsequent request enables the query to
start again from that point.
Updates
and deletes are performed with optimistic concurrency. This means it's
up to us to determine if we should preserve any changes made prior to
our update, or to just overwrite these changes.
Limitations of Table Storage
There are some limitations to
using Table Storage. To start with, each entity can have a maximum size
of 1 MB. The PartitionKey and RowKey are limited to string data type,
and have a maximum size of 1 KB each; however, when accessing entities
via REST, there is a practical limit on the length of the PartitionKey
and RowKey. This stems from a limitation in HTTP.SYS, the "listener" for
web requests, and the HTTP/1.1 protocol, which limits URI length to 260
characters. This is not the entire URL, merely the parameter portion of
the URL. Given the following example URL, the portion in bold cannot be
longer than 260 characters:
http://<account>.table.core.windows.net/<tablename>(PartitionKey="keyvalue",RowKey="keyvalue")
There are a maximum
number of 255 properties per entity 252 user-defined properties and
three fixed properties (PartitionKey, RowKey, and Timestamp). Property
names can be repeated from entity to entity, but a property name must be
unique within an entity. The entity Timestamp is read only, and it is
recommended not to build applications that access this property as its
use may change in future versions of Azure Table Storage.
Because there are no keys to
link tables together, the ADO.NET Data Services methods that deal with
links are unavailable to use, including AddLink, DetachLink, and SetLink. All referential integrity must be handled in the application code.
There is also no way to retrieve
a specific property directly from the table. Instead, we must retrieve
the complete entity and parse what we want in our code. Likewise, we
cannot update or delete a single property. Instead, a complete entity is
retrieved and parsed, properties are changed or removed, and the
complete entity is rewritten to the table.
Unlike a database, our table
data cannot be sorted or grouped before being returned to us. In SQL
Server, it would be trivial to return the top 10 customers based on
total sales. In Table Storage, this is not possible natively. Our
application would need to retrieve all the customers, calculate the
total sales, and then sort the list.
Adding Table Storage to an Azure account
By default, there are no
services added to Azure accounts these must be created after the account
has been set up. When a simple storage service is created, it includes all three storage mechanisms.
After a storage service has
been added, the endpoints for all three services are displayed
together, as shown in the following screenshot: