Windows Azure : Understanding the Blob Service

10/12/2010 4:00:56 PM

Blobs by themselves are quite dumb and boring to anyone not interested in the cloud. A blob is a binary large object. It is any arbitrary piece of data, of any format or size, and with any content. It can be an image, a text file, a .zip file, a video, or just about any arbitrary sequence of ones and zeros.


Note: The name “blob” was coined by Jim Starkey at DEC. He resisted pressure from various quarters (especially marketing) to rename the concept to something “friendlier.” He claims the expansion “binary large object” was invented much later because marketing found “blob” unprofessional. Visit http://www.cvalde.net/misc/blob_true_history.htm for the full story.

Blobs become interesting when you look at how they're used. In the same way, the data that makes up YouTube's videos (the actual bits on disk) isn't interesting, but the funny videos it makes up can be valuable. A good blob/data storage mechanism must be dumb and basic, and leave all the smartness to the application.

The Windows Azure blob service allows you to store nearly unlimited amounts of data for indefinite periods of time. All data stored in the service is replicated multiple times (to protect against hardware failure), and the service is designed to be highly scalable and available. As of this writing, the internals of how the blob service works haven't been made public, but the principles it follows are similar to those of the distributed systems that came before it. The best part is that you pay as you go, and pay only for the data you have up in the cloud. Instead of having to sign a check for storage you may or may not use, you pay Microsoft only for what you used during the preceding billing cycle.

1. Using Blobs

Blobs are an important part of Windows Azure because they are just so useful. Unlike hosted services (where you have to write code) or tables and queues (which are usually part of a bigger application, so again some code is involved), blobs can be used for a lot of day-to-day computing tasks. Here's a short sample list:

  • Performing backup/archival/storage in the cloud

  • Hosting images or other website content

  • Hosting videos

  • Handling arbitrary storage needs

  • Hosting blogs or static websites

The list could go on, but you get the idea. So, when should you think of taking the blob storage plunge? Here are a few candidate scenarios to keep in mind.

1.1. Filesystem replacement

Any large data that doesn't have a schema and doesn't require querying is a great candidate for blob storage. In other words, any data for which you use a filesystem today (not a database) can be moved easily to blob storage. In fact, you'll find that several filesystem concepts map almost 1:1 to blobs, and there are several tools to help you move from one to the other.

Most organizations or services have an NFS/SMB share that stores some unstructured data. Typically, databases don’t play well with large unstructured data in them, so developers code up a scheme in which the database contains pointers to actual physical files lying on a share somewhere. Now, instead of having to maintain that filesystem, you can switch to using blob storage.
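As a rough illustration of that switch, the database's pointers can change from UNC paths to blob URLs. The sketch below assumes one hypothetical mapping: the share name becomes a container, and the rest of the path becomes the blob name, with "/" replacing backslashes. Blob names may legitimately contain "/", which is how a directory tree is usually mirrored in a flat container. The account name "myaccount" and the function name are illustrative only.

```python
def unc_to_blob_url(unc_path: str, account: str) -> str:
    r"""Map \\server\share\dir\file to http://<account>.blob.core.windows.net/share/dir/file.

    Hypothetical mapping: the share name becomes the container and the rest of
    the path becomes the blob name, with '/' in place of backslashes.
    """
    # Splitting on a backslash leaves empty strings for the leading "\\"; drop them.
    parts = [p for p in unc_path.split("\\") if p]
    if len(parts) < 3:
        raise ValueError(r"expected a path like \\server\share\file")
    _server, share, *rest = parts
    container = share.lower()   # container names must be lowercase
    blob_name = "/".join(rest)  # '/' acts as a virtual directory separator
    return f"http://{account}.blob.core.windows.net/{container}/{blob_name}"


print(unc_to_blob_url(r"\\fileserver\Share\reports\2010\q3.pdf", "myaccount"))
```

This does not percent-encode special characters or enforce the other container naming rules (length, allowed characters); a real migration tool would need to handle both.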

1.2. Heavily accessed data

Large data that is sent as-is to users without modification is another type of good content to stick in blob storage. Since Windows Azure blob storage is built to be highly scalable, you can be sure that a sudden influx of users won't affect access.

1.3. Backup server

Although you may be unable to leave your current filesystem/storage system, you can still find a way to get some good use out of blob storage. Everyone needs a backup service. For example, home users often burn DVDs, and corporations often ship tapes off to a remote site to facilitate some sort of backup. Having a cheap and effective way to store backups is surprisingly tricky. Blob storage fits nicely here. Instead of using tape backups, you could store a copy of your data in blob storage (or several copies, if you’re paranoid).

The fact that cloud storage is great for backups isn’t lost on the multiple product vendors making backup software. Several backup applications now use either Amazon S3 or Windows Azure. For example, Live Mesh from Microsoft uses blob storage internally to store and synchronize data from users’ machines.

1.4. File-share in the cloud

Employees at Microsoft often must share files, documents, or code with other employees. SharePoint is one alternative, and it is often used. But the easiest way to share some debug logs or some build binaries is to create a share on your local machine and open access to it. Since most Microsoft employees have reasonably powerful machines, you don't have to worry about several employees accessing the data at the same time and slowing down your machine. Since file shares can be accessed with a UNC name, you can email links around easily.

Blob storage is similar. Want to throw up something on the Web and give other people access to it? Don’t know how many users will wind up accessing it, but just want something that stays up? Blob storage is great for that. Windows Azure blob storage is a great means to throw up arbitrary data and give it a permanent home on the Internet, be it large public datasets, or just a funny video.

2. Pricing

In short, Windows Azure charges $0.15 per gigabyte stored per month, $0.01 for every 10,000 storage transactions, and $0.10 per gigabyte of ingress and $0.15 per gigabyte of egress bandwidth.
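To make the arithmetic concrete, here is a small estimator using exactly the rates quoted above. It is a sketch only: it ignores regional differences, free allowances, and any later price changes, and the function name is illustrative.

```python
# Rates as quoted in the text (2010); check current pricing before relying on them.
STORAGE_PER_GB_MONTH = 0.15
PER_10K_TRANSACTIONS = 0.01
INGRESS_PER_GB = 0.10
EGRESS_PER_GB = 0.15


def monthly_bill(stored_gb: float, transactions: int, ingress_gb: float, egress_gb: float) -> float:
    """Estimate one month's charge in US dollars, rounded to cents."""
    total = (
        stored_gb * STORAGE_PER_GB_MONTH
        + (transactions / 10_000) * PER_10K_TRANSACTIONS
        + ingress_gb * INGRESS_PER_GB
        + egress_gb * EGRESS_PER_GB
    )
    return round(total, 2)


# 100 GB stored, 1 million transactions, 10 GB uploaded, 50 GB downloaded
print(monthly_bill(100, 1_000_000, 10, 50))  # 24.5
```

The same function shows how quickly bandwidth dominates: a single 100 MB file downloaded 10,000 times (about 1,000 GB of egress plus 10,000 transactions) comes to `monthly_bill(0, 10_000, 0, 1_000)`, roughly $150.01, even though storing the file costs only pennies.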

Though this pricing model applies equally across all of Windows Azure’s storage services, blobs have some interesting properties. Most importantly, blobs are the only service for which anonymous requests over public HTTP are allowed (if you choose to make your container public). Both queues and tables (which you’ll learn more about in the next few chapters) must have requests authenticated at all times. Of course, without anonymous requests, a lot of the things that people use blobs for—hosting images, static websites, and arbitrary files, and exposing them to any HTTP client—won’t work anymore.

However, given that anonymous requests get billed (both for transaction cost and for bandwidth cost), a potential risk is a malicious user (or a sudden surge in visitors) that results in a huge storage bill for you. There is really no good way to protect against this, and this issue generally exists with almost all cloud services.

3. Data Model

The data model for the Windows Azure blob service is quite simple, and a lot of the flexibility stems from this simplicity. There are essentially three kinds of “things” in the system:

  • Blob

  • Container

  • Storage account

Figure 1 shows the relationship between the three.

Figure 1. Relationship between blobs, containers, and storage accounts
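Since the figure itself is not reproduced here, the following sketch models the same hierarchy in code: an account holds containers, a container holds blobs, and each blob is addressed by a URL of the form http://<account>.blob.core.windows.net/<container>/<blob>. The classes are hypothetical in-memory stand-ins, not a client library, and they also encode the rule that containers and blobs inherit the account's geolocation.

```python
from dataclasses import dataclass, field


@dataclass
class Blob:
    name: str  # the key the blob is referred to by
    data: bytes = b""
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Container:
    name: str
    public: bool = False  # sharing policy inherited by every blob inside
    blobs: dict[str, Blob] = field(default_factory=dict)


@dataclass
class StorageAccount:
    name: str
    location: str  # geolocation inherited by all containers and blobs
    containers: dict[str, Container] = field(default_factory=dict)

    def create_container(self, name: str, public: bool = False) -> Container:
        return self.containers.setdefault(name, Container(name, public))

    def put_blob(self, container: str, blob: str, data: bytes) -> str:
        self.containers[container].blobs[blob] = Blob(blob, data)
        return self.blob_url(container, blob)

    def blob_url(self, container: str, blob: str) -> str:
        return f"http://{self.name}.blob.core.windows.net/{container}/{blob}"

    def location_of(self, container: str, blob: str) -> str:
        # Containers and blobs have no location of their own; they inherit the account's.
        _ = self.containers[container].blobs[blob]  # KeyError if the blob doesn't exist
        return self.location


acct = StorageAccount("myaccount", "South Central US")
acct.create_container("photos", public=True)
print(acct.put_blob("photos", "cat.jpg", b"\xff\xd8"))
```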


3.1. Blob

In Windows Azure, a blob is any piece of data. Importantly, blobs have a key or a name with which they are referred to. You might think of blobs as files. Although there are places where that analogy breaks down, it is still a useful rule of thumb. Blobs can have metadata associated with them: <name,value> pairs totaling up to 8 KB.
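A client can check the 8 KB metadata limit before sending a request. The sketch below assumes the size is counted as the UTF-8 bytes of every name plus every value; the service's exact accounting may differ, so treat this as a rough pre-flight check rather than the service's rule.

```python
MAX_METADATA_BYTES = 8 * 1024  # the 8 KB limit stated in the text


def metadata_size(metadata: dict[str, str]) -> int:
    # Assumption: size = UTF-8 bytes of all names plus all values.
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in metadata.items())


def validate_metadata(metadata: dict[str, str]) -> None:
    size = metadata_size(metadata)
    if size > MAX_METADATA_BYTES:
        raise ValueError(f"metadata is {size} bytes; the limit is {MAX_METADATA_BYTES}")


validate_metadata({"author": "jim", "format": "jpeg"})  # well under the limit
```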

Blobs come in two flavors: block blobs and page blobs. Let’s look at block blobs first.

Block blobs can be split into chunks known as blocks, which can then be uploaded separately. The typical usage for blocks is for streaming and resuming uploads. Instead of having to restart that multigigabyte transfer from the beginning, you can just resume from the next block. You can also upload blocks in parallel, and have the server assemble a blob out of them. Block blobs are perfect for streaming upload scenarios.
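The upload pattern just described can be sketched against an in-memory stand-in for the service. `FakeBlockBlob` is hypothetical; its two methods are named after the REST operations Put Block (stage one block under an ID) and Put Block List (commit an ordered list of IDs as the blob's content). The 4 MB default block size is an assumption matching the limit at the time of writing. Blocks are sent in parallel, and any block the server already holds from an interrupted attempt is skipped, which is what makes resuming cheap.

```python
import base64
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MB per block


def block_id(index: int) -> str:
    # Block IDs are base64 strings; zero-padding keeps every ID the same length.
    return base64.b64encode(f"{index:08d}".encode("ascii")).decode("ascii")


class FakeBlockBlob:
    """Hypothetical in-memory stand-in: staged (uncommitted) blocks plus committed content."""

    def __init__(self) -> None:
        self.uncommitted: dict[str, bytes] = {}
        self.committed = b""

    def put_block(self, bid: str, data: bytes) -> None:
        self.uncommitted[bid] = data

    def put_block_list(self, ids: list[str]) -> None:
        # The server assembles the blob from the listed blocks, in list order.
        self.committed = b"".join(self.uncommitted[b] for b in ids)
        self.uncommitted.clear()


def upload(blob: FakeBlockBlob, data: bytes, block_size: int = BLOCK_SIZE, workers: int = 4) -> int:
    """Upload data as blocks in parallel, skipping blocks already staged; return blocks sent."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    ids = [block_id(i) for i in range(len(chunks))]
    pending = [(bid, chunk) for bid, chunk in zip(ids, chunks) if bid not in blob.uncommitted]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda item: blob.put_block(*item), pending))
    blob.put_block_list(ids)
    return len(pending)
```

For example, if an earlier attempt staged block 0 before failing, `upload` sends only the remaining blocks and then commits the full list.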

The second flavor is page blobs. Page blobs are split into an array of pages. Each page can be addressed individually, not unlike sectors on a hard disk. Page blobs are targeted at random read/write scenarios and provide the backing store for Windows Azure XDrive. You’ll see more about them later.
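Page addressing can be sketched the same way. The sketch assumes 512-byte pages and that every write must start and end on a page boundary, which is how page blobs behave; `FakePageBlob` itself is hypothetical. Like sectors on a disk, any page can be overwritten in place, and pages never written read back as zeros.

```python
PAGE_SIZE = 512  # bytes per page


class FakePageBlob:
    """Hypothetical in-memory page blob: a fixed-size array of 512-byte pages."""

    def __init__(self, size: int) -> None:
        if size % PAGE_SIZE:
            raise ValueError("page blob size must be a multiple of 512 bytes")
        self._data = bytearray(size)  # unwritten pages read back as zeros

    def write_pages(self, offset: int, data: bytes) -> None:
        # Random-access write: must start and end on a page boundary.
        if offset % PAGE_SIZE or len(data) % PAGE_SIZE:
            raise ValueError("writes must be aligned to 512-byte page boundaries")
        if offset + len(data) > len(self._data):
            raise ValueError("write extends past the end of the blob")
        self._data[offset:offset + len(data)] = data

    def read_page(self, index: int) -> bytes:
        start = index * PAGE_SIZE
        return bytes(self._data[start:start + PAGE_SIZE])


disk = FakePageBlob(4 * PAGE_SIZE)
disk.write_pages(2 * PAGE_SIZE, b"x" * PAGE_SIZE)  # overwrite page 2 only
```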

3.2. Container

Blobs are stored in things called containers, which you can think of as partitions or root directories. Containers exist only to store a collection of blobs, and can have only a little metadata (8 KB) associated with them.

Apart from containing blobs, containers serve one other important task. Containers control sharing policy. You can make containers either public or private, and all blobs underneath the container will inherit that setting. When a container is public, anyone can read data from that container over public HTTP. When a container is private, only authenticated API requests can read from the container. Regardless of whether a container is public or private, any creation, modification, or deletion operations must be authenticated.
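The sharing rules in the previous paragraph reduce to a small decision function. This is a simplified model with hypothetical names: it treats "authenticated" as a request signed with the account's key and ignores finer-grained options.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    CREATE = "create"
    MODIFY = "modify"
    DELETE = "delete"


def is_allowed(op: Op, container_is_public: bool, request_is_authenticated: bool) -> bool:
    """Simplified model of the container sharing rules described in the text."""
    if request_is_authenticated:
        # Authenticated requests may read, create, modify, or delete.
        return True
    # Anonymous requests may only read, and only from public containers.
    return op is Op.READ and container_is_public
```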

One authentication option that isn’t covered in this chapter is preauthorized URIs or “signed URLs.” This refers to the ability to create a special URL that allows users to perform a particular operation on a blob or a container for a specified period of time. This is useful in scenarios where you can’t put the storage access key in your code—for example, client-side web apps. See http://msdn.microsoft.com/en-us/library/ee395415.aspx for details on how to use this.
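The general idea behind a signed URL can be illustrated with an HMAC over the resource path, the permission, and the expiry time, computed with a secret key that never leaves the server. This is a loosely modeled sketch, not the actual Windows Azure format: the string-to-sign, the use of a Unix timestamp for expiry, and the function names are all assumptions, so consult the MSDN page above for the real scheme.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(path: str, permission: str, expiry: int, key: bytes) -> str:
    # Hypothetical string-to-sign; the real service defines its own canonical format.
    string_to_sign = f"{permission}\n{expiry}\n{path}"
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def sign_url(blob_url: str, permission: str, expiry: int, key: bytes) -> str:
    """Return blob_url with permission, expiry (Unix time), and signature appended."""
    sig = _signature(urlsplit(blob_url).path, permission, expiry, key)
    return f"{blob_url}?{urlencode({'sp': permission, 'se': expiry, 'sig': sig})}"


def verify_url(url: str, key: bytes, now: Optional[float] = None) -> bool:
    """Server side: reject expired or tampered URLs by recomputing the signature."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        permission, expiry, sig = query["sp"][0], int(query["se"][0]), query["sig"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expiry:
        return False
    expected = _signature(parts.path, permission, expiry, key)
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
```

Because the path and permission are part of the signed string, a client holding a read URL for one blob cannot turn it into a write URL or point it at a different blob.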


Note: For users familiar with Amazon’s S3, containers are not the same as S3’s buckets. Containers are not a global resource; they “belong” to a single account. This means you can create containers in your code’s critical path, and the storage system will be able to keep up. Deletion of a container is also instant. One “gotcha” to keep in mind is that re-creation of a container just deleted might be delayed. The reason is that the storage service must delete all the blobs that were in the container (or, more accurately, put the garbage-collection process in motion), and this takes a small amount of processing time.
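One way to live with the re-creation delay is to retry with exponential backoff. In this sketch, `create` is any callable that creates the container, and `ContainerBeingDeleted` is a hypothetical exception standing in for whatever error your client library raises while the old container is still being cleaned up. The `sleep` parameter is injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable


class ContainerBeingDeleted(Exception):
    """Hypothetical error raised while a just-deleted container is being cleaned up."""


def create_with_retry(
    create: Callable[[], None],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call create() until it succeeds, backing off exponentially; return the attempt count."""
    for attempt in range(1, attempts + 1):
        try:
            create()
            return attempt
        except ContainerBeingDeleted:
            if attempt == attempts:
                raise  # give up and surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 1 s, 2 s, 4 s, ...
    raise ValueError("attempts must be at least 1")
```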

3.3. Storage account

You can think of these as the “drives” where you place containers. A storage account can have any number of containers—as of this writing, there is no limit on the number of containers any storage account can have.

The containers also inherit the parent storage account’s geolocation setting. If you specify that the storage account should be in the South Central United States, all containers under the account will show up in the same location, and by the same transitive relationship, all blobs under those containers will show up in the South Central United States as well.
