ENTERPRISE

Azure Table Storage (part 1) - Table Storage versus database tables

9/9/2012 9:31:06 PM

Table Storage versus database tables

For developers used to working with a relational database, Table Storage may seem like a step backwards. After further examination, we may begin to see Table Storage as a very powerful and flexible technology. Table Storage is not a relational mechanism, but simple relations can be maintained in application code. For developers interested in an object-based database, Table Storage may be closer to what is desired.

Tables in databases and Table Storage both have table names. Tables in Table Storage are composed of entities, which are similar to rows in a database table. Each entity has:

  • A PartitionKey that is used to group entities onto the same partition

  • A RowKey that uniquely identifies a row within a partition

  • A system-maintained Timestamp

  • Properties that are similar to columns in a database table

Properties are stored and retrieved as<name,value> pairs. The PartitionKey has a very important use. As the Azure Fabric optimizes itself, tables will be moved from disk partition to partition, and even split across partitions. The PartitionKey is used to make sure all entities that belong together are kept together, thereby increasing the performance of the table. The PartitionKey and RowKey together make a primary key for the table. It's important to understand PartitionKeys, and we are going to cover them in further detail in the coming sections.

One of the biggest differences between Table Storage and a database table is the lack of a fixed schema in Table Storage, as seen in the next diagram. In a database table, there is a fixed schema with all rows having the same number of columns, and all values in the same column having the same data type. By contrast, all entities have a PartitionKey and a RowKey, but each can have a varying number of properties, and properties with the same name can have values of different types.

Because Table Storage is not relational, there are no foreign key constraints all relations are "loose", and must be maintained by application logic. This adds some additional development, along with introducing a great deal of flexibility in the system.

At the most basic level, if a web application has drop-down lists for countries or states, Table Storage is a perfect mechanism for holding this information. Countries and states are small examples Table Storage is optimized for thousands to millions of entries. And tables are queryable via LINQ and ADO.NET Data Services, giving them a database-like functionality. Another, larger example wouldbe to use Table Storage as a repository for auto-complete suggestions, such as the auto-complete in Google's search box.

A more complex use for Table Storage would be to serve as an object database. Each table would contain a number of different objects, identified by their partition keys. The RowKey would be used to identify property groups or sub-objects, and each entity would be a property of the object. Specific objects can be retrieved by their PartitionKey, specific property sets by the RowKey, and the object can be rehydrated by application code.

In the example shown in the preceding screenshot, the Company property contains a pipe-delimited string as its value. Retrieving the individual values is not a standard operation this string would need to be split on the client side before the individual properties can be accessed. But using this technique allows for a flexible data storage schema. While using a strategy like this gives us a great deal of flexibility, it can place a burden on application development and maintenance, as we would essentially be building our own object-relational mapping (ORM) engine.

Some of the good stuff

One of the biggest features of Table Storage may be its size. Table Storage is scalable, and tables can be massive, occupying terabytes of space and containing billions of entities. There are no set limits as to the number of tables or the size of each table. Naturally, all of this data will not exist on a single node. Tables will be spread out over numerous servers, and "hot partitions" will be load balanced and located for efficient delivery. Table Storage is persistent, so if we turn our Azure instance off, our data will be restored when we turn our instance back on.

There are a couple data access options for Table Storage too. We can access tables directly via a REST API, or we can query a table via a subset of LINQ and a client library for ADO.NET Data Services.

As with other forms of data access, table queries can timeout. Each table query is limited to 1,000 results or 5 seconds of execution, whichever comes first. However, instead of throwing an error when an execution maximum is reached, a partial resultset is returned with a continuation token. Passing the continuation token back in a subsequent request enables the query to start again from that point.

Updates and deletes are performed with optimistic concurrency. This means it's up to us to determine if we should preserve any changes made prior to our update, or to just overwrite these changes.

Limitations of Table Storage

There are some limitations to using Table Storage. To start with, each entity can have a maximum size of 1 MB. The PartitionKey and RowKey are limited to string data type, and have a maximum size of 1 KB each; however, when accessing entities via REST, there is a practical limit on the length of the PartitionKey and RowKey. This stems from a limitation in HTTP.SYS, the "listener" for web requests, and the HTTP/1.1 protocol, which limits URI length to 260 characters. This is not the entire URL, merely the parameter portion of the URL. Given the following example URL, the portion in bold cannot be longer than 260 characters:

http://<account>.table.core.windows.net/<tablename>(PartitionKey="keyvalue",RowKey="keyvalue")

There are a maximum number of 255 properties per entity 252 user-defined properties and three fixed properties (PartitionKey, RowKey, and Timestamp). Property names can be repeated from entity to entity, but a property name must be unique within an entity. The entity Timestamp is read only, and it is recommended not to build applications that access this property as its use may change in future versions of Azure Table Storage.

Because there are no keys to link tables together, the ADO.NET Data Services methods that deal with links are unavailable to use, including AddLink, DetachLink, and SetLink. All referential integrity must be handled in the application code.

There is also no way to retrieve a specific property directly from the table. Instead, we must retrieve the complete entity and parse what we want in our code. Likewise, we cannot update or delete a single property. Instead, a complete entity is retrieved and parsed, properties are changed or removed, and the complete entity is rewritten to the table.

Unlike a database, our table data cannot be sorted or grouped before being returned to us. In SQL Server, it would be trivial to return the top 10 customers based on total sales. In Table Storage, this is not possible natively. Our application would need to retrieve all the customers, calculate the total sales, and then sort the list.

Adding Table Storage to an Azure account

By default, there are no services added to Azure accounts these must be created after the account has been set up. When a simple storage service is created, it includes all three storage mechanisms.

After a storage service has been added, the endpoints for all three services are displayed together, as shown in the following screenshot:


Other  
  •  Cheetah3D 6 : Britain's next top modeler
  •  System Center Configuration Manager 2007 : Developing the Solution Architecture (part 5) - Site Design,Client Architecture,Multilanguage Scenarios
  •  System Center Configuration Manager 2007 : Developing the Solution Architecture (part 4) - Capacity Planning,Site Boundaries,Roaming
  •  System Center Configuration Manager 2007 : Developing the Solution Architecture (part 3) - Developing the Server Architecture
  •  System Center Configuration Manager 2007 : Developing the Solution Architecture (part 2) - Configuration Manager 2007 Roles
  •  System Center Configuration Manager 2007 : Developing the Solution Architecture (part 1) - Developing the Network Infrastructure
  •  System Center Configuration Manager 2007 : Operating System Deployment Planning, Out of Band Management Planning
  •  Visual Studio 2010 IDE : Customizing Visual Studio 2010
  •  Visual Studio 2010 IDE : Exporting Templates
  •  System Center Configuration Manager 2007 : Certificate Requirements Planning, Windows Server 2008 Planning
  •  System Center Configuration Manager 2007 : Planning for Internet-Based Clients
  •  Active Directory Domain Services 2008 : Automatically Populate a Migration Table from a Group Policy Object
  •  Active Directory Domain Services 2008 : Create a Migration Table
  •  Microsoft Content Management Server : Developing Custom Properties for the Web Part
  •  Microsoft Content Management Server : Building SharePoint Web Parts - Creating the Web Part, Defining Custom Properties for the Web Part
  •  Microsoft Content Management Server : Building SharePoint Web Parts - The SharePoint MCMS Navigation Control, Creating the Web Part Project
  •  Active Directory Domain Services 2008 : Search Group Policy Objects
  •  Active Directory Domain Services 2008 : Export a Starter GPO, Import a Starter GPO
  •  The Very Successful Hardware That Microsoft Has Ever Produced
  •  Xen Virtualization - Managing Xen : Virtual Machine Manager
  •  
    Most View
    Audio Technica ATH PRO5MK2 - Devil In The Details
    Windows Vista : Migrating User State Data - Planning User State Migration Using USMT
    Fujitsu ScanSnap iX500 - The Latest Desktop Sheetfeed Scanner
    Lumix LF1 Review - The Newest High-End Compact Camera Of Panasonic (Part 1)
    Innovate & Inspire - Delight, Surprise, Love, Connection (Part 3)
    Gadgets And Gizmos That Will Enrich Our Digital Lives Sooner Than You Think
    Nexus 10 - A Worthy Challenger To The iPad
    Zalman FX100-Cube Fanless Cooler
    Adobe Photoshop Touch - Power At Your Fingertips (Part 2)
    Wireless Network Optimization
    Top 10
    Windows Small Business Server 2011 : Working with Permissions (part 4) - Assigning NTFS Permissions
    Windows Small Business Server 2011 : Working with Permissions (part 3) - Understanding Effective Permissions
    Windows Small Business Server 2011 : Working with Permissions (part 2) - Using NTFS Permissions
    Windows Small Business Server 2011 : Working with Permissions (part 1) - Using Share Permissions
    Windows Small Business Server 2011 : Understanding Windows SBS Security Principles
    Wired2Fire Velocity VX - A Great Budget Monitor, Graphics Card And Processor Combination
    Sony NEX E 20mm f/2.8 Pancake Lens Review
    Sigma 18-35mm f/1.8 DC HSM A Lens Review
    Nikon 70-200MM F/4G ED VR - Afford This
    Fujifilm 35MM F/1.4 Fujion XF R - LCD Gem