Working with the REST API

10/10/2010 9:50:11 AM

To get a feel for how you would access your data in Windows Azure, try a variant of the following demo, substituting your own URL: http://sriramkbook.blob.core.windows.net/test/helloworld.txt. If you hit that URL in your browser, you see a nice little text file. Your browser goes all the way over to Microsoft’s data centers, converses in HTTP with the Windows Azure storage service, and pulls down a few bytes of awesomeness. The fact that the file is a blob stored in Windows Azure blob storage really doesn’t matter to the browser, since it is a plain HTTP URL that works as URLs should.
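What the browser does here can be reproduced in a few lines of code. The following is a minimal Python sketch, not a storage client: the function name fetch_public_blob and its opener parameter are made up for illustration, and the call only works for blobs that are publicly readable.

```python
from urllib.request import urlopen


def fetch_public_blob(url, opener=urlopen):
    """Send a plain, unauthenticated GET and return (status, body_bytes).

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib's urlopen.
    """
    # No storage authentication headers are sent, so this only succeeds
    # for publicly viewable data. urlopen raises urllib.error.HTTPError
    # for 4xx and 5xx responses instead of returning them.
    with opener(url) as response:
        return response.status, response.read()
```

Calling fetch_public_blob("http://sriramkbook.blob.core.windows.net/test/helloworld.txt") would issue the same GET the browser sends, assuming the blob is still there and still public.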

In fact, if you so choose, you can get at any piece of data you store in Windows Azure storage this way. (You’d have to make all the data publicly viewable first, of course.) But doing HTTP requests like this doesn’t work if you protect the data so that only you can get access to it. It also doesn’t work when you want to upload data, or if you want to access only specific subsets of your data. That’s where the APIs come in.

Don’t worry if you’re unfamiliar with REST APIs, or with what makes something RESTful. The following discussion won’t make you an expert on REST, but it will tell you all you need to know as far as REST and Windows Azure are concerned. The discussions in this book barely scratch the surface of what REST entails. For more on REST, see RESTful Web Services by Leonard Richardson and Sam Ruby (O’Reilly).

7.5.1. Understanding the RESTful API Resources

You deal with HTTP requests and responses every day. When you click a link, your web browser requests a page by sending an HTTP GET to a URL. If the page exists, the server typically sends down an HTTP 200 response code, followed by the page’s contents. If the page doesn’t exist, your browser gets the HTTP 404 code, and you see an error page indicating that the page could not be found.

Another way to look at this is that your browser is requesting a resource (the page) identified by the URL where it lives, and is asking the server to perform a method (GET) on it. REST takes this near-universal plumbing (every resource reachable via a URL) and uses it to build APIs. Apart from this being an elegant way to programmatically access resources over HTTP (it sticks to the nature of the Web), it is easy to code up since every language/tool set has support for HTTP. Contrast this with SOAP or CORBA where you must wield cumbersome toolkits or code generators.

To understand a RESTful API, you must first understand the resources that are being exposed. In the blob service, those resources are containers and blobs: each blob resides in a container (a top-level directory underneath your cloud storage endpoint).

A non-REST API would typically expose functions such as getBlobs or getContainers. The RESTful Windows Azure Blob API takes a different approach. It exposes the blobs and the containers as standard HTTP objects (known as resources in REST). Instead of using custom methods, you can use standard HTTP methods such as GET, POST, PUT, DELETE, and so on.

Each storage service with a REST API (blobs, tables, and queues) exposes different resources. For example, the blob service exposes containers and blobs, while the queue service exposes queues, and so on. But they all share similar patterns, so if you know how to code against one, you can easily figure out how to code against the others. The only outlier is the table service, which requires you to understand a bit of ADO.NET Data Services to do querying.

All the URLs typically fall into one of the following patterns:

To get the default representation of the object, you send a GET to the URL. URLs are of the form http://<account>.<service>.core.windows.net/<resource>. For example, you can retrieve a test blob by making a GET request to http://sriramk.blob.core.windows.net/test/hello.txt.
Operations on individual components are specified through a comp parameter. Component here is a slightly misleading term, since in practice this is used for operations such as getting lists, getting/setting access control lists (ACLs), getting metadata, and the like. For example, to get a list of blob containers, you can send an authenticated HTTP GET to http://<account>.blob.core.windows.net/?comp=list. If you try to do that with http://sriramk.blob.core.windows.net/?comp=list in your browser, you’ll get an error page back because your browser can’t add the correct storage authentication headers. (You’ll learn how authentication works a bit later.)
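Both URL patterns can be produced by one small helper. This is a sketch in Python; the function name resource_url and its parameters are illustrative and not part of any Windows Azure library.

```python
from urllib.parse import quote, urlencode


def resource_url(account, service, path="", comp=None):
    """Build http://<account>.<service>.core.windows.net/<path>[?comp=...]."""
    # quote() leaves "/" alone, so "container/blob" paths pass through intact.
    url = f"http://{account}.{service}.core.windows.net/{quote(path)}"
    if comp is not None:
        # Operations such as listing are selected with the comp parameter.
        url += "?" + urlencode({"comp": comp})
    return url


print(resource_url("sriramk", "blob", "test/hello.txt"))
print(resource_url("sriramk", "blob", comp="list"))
```

The first call yields the blob URL from the first pattern, the second the container-listing URL from the second pattern. The listing URL still needs an authenticated request to return anything useful.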

Every resource in the Windows Azure storage world acts the same way when it comes to HTTP operations. To get an object’s default representation, you send an HTTP GET request to that object’s URL. Similarly, to create an object, you send a PUT request to the URL where you want it to live. And finally, to delete the object, you send a DELETE request to the URL where it exists. This is in line with what the HTTP standards specify, and behaves exactly how good RESTful APIs should.
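To make the uniform GET/PUT/DELETE behavior concrete, here is a toy in-memory model written in Python. It is not the storage service and talks to nothing over the network; the class name ToyResourceStore is invented for this sketch. The 200, 201, and 404 codes follow the discussion in this chapter, while 202 for a delete and 405 for an unsupported method are assumptions of the toy.

```python
class ToyResourceStore:
    """In-memory stand-in for a storage endpoint (illustration only)."""

    def __init__(self):
        self._resources = {}  # path -> body bytes

    def handle(self, method, path, body=b""):
        """Apply an HTTP-style method to a path; return (status, body)."""
        if method == "PUT":
            # Create (or overwrite) the resource at the URL it should live at.
            self._resources[path] = body
            return 201, b""
        if method == "GET":
            # Return the default representation, or 404 if nothing is there.
            if path in self._resources:
                return 200, self._resources[path]
            return 404, b""
        if method == "DELETE":
            if path in self._resources:
                del self._resources[path]
                return 202, b""  # assumed status for an accepted delete
            return 404, b""
        return 405, b""  # assumed status for methods the toy does not model


store = ToyResourceStore()
print(store.handle("GET", "/test/hello.txt"))         # nothing there yet
print(store.handle("PUT", "/test/hello.txt", b"hi"))  # create
print(store.handle("GET", "/test/hello.txt"))         # read back
print(store.handle("DELETE", "/test/hello.txt"))      # remove
```

The point of the sketch is that the same three verbs apply to every path, with no resource-specific function names.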

Following is a sample interaction to create a new container. Note how simple the entire HTTP request is. You can ignore the headers related to authorization and versioning for now, since you’ll learn more about them later.

PUT /foo HTTP/1.1
x-ms-version: 2009-07-17
x-ms-date: Sat, 09 May 2009 19:36:08 GMT
Authorization: SharedKey sriramk:I/xbJkkoKIVKDgw2zPAdRvSM1HaXsIw...
Host: sriramk.blob.core.windows.net
Content-Length: 0
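A request like this one can be assembled by hand. The Python sketch below renders the same request text; the function name create_container_request is hypothetical, and the Authorization value is passed in as an opaque string because computing the signature is covered later.

```python
import calendar
from email.utils import formatdate


def create_container_request(account, container, timestamp, authorization):
    """Render the raw HTTP text of a 'create container' PUT request."""
    headers = [
        ("x-ms-version", "2009-07-17"),
        # RFC 1123 date in GMT, the format used by the sample request.
        ("x-ms-date", formatdate(timestamp, usegmt=True)),
        # Placeholder: the real value is a signature over the request.
        ("Authorization", authorization),
        ("Host", f"{account}.blob.core.windows.net"),
        # Creating a container sends no body.
        ("Content-Length", "0"),
    ]
    lines = [f"PUT /{container} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers]
    # HTTP separates lines with CRLF and ends the header block with a blank line.
    return "\r\n".join(lines) + "\r\n\r\n"


when = calendar.timegm((2009, 5, 9, 19, 36, 8, 0, 0, 0))
print(create_container_request("sriramk", "foo", when, "SharedKey sriramk:<signature>"))
```

With a real signature in place of the placeholder, this text could be written to a socket connected to the storage host.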

If everything went well, you would get back a success message, as shown here:

HTTP/1.1 201 Created
Last-Modified: Sat, 09 May 2009 19:35:00 GMT
ETag: 0x8CB9EF4605B9470
Server: Blob Service Version 1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 4f3b6fdb-066d-4fc5-b4e8-178c8cf571a6
Date: Sat, 09 May 2009 19:34:59 GMT
Content-Length: 0
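On the client side, a response like this is just a status line followed by headers. The following Python sketch pulls the pieces apart; parse_response_head is an illustrative helper, and in practice an HTTP library does this work for you.

```python
def parse_response_head(raw):
    """Split a raw HTTP response head into (status_code, reason, headers)."""
    lines = raw.strip().splitlines()
    # Status line looks like: HTTP/1.1 201 Created
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


SAMPLE = """\
HTTP/1.1 201 Created
Last-Modified: Sat, 09 May 2009 19:35:00 GMT
ETag: 0x8CB9EF4605B9470
Server: Blob Service Version 1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 4f3b6fdb-066d-4fc5-b4e8-178c8cf571a6
Date: Sat, 09 May 2009 19:34:59 GMT
Content-Length: 0
"""

status, reason, headers = parse_response_head(SAMPLE)
print(status, reason, headers["etag"], headers["x-ms-request-id"])
```

Header names are lowercased in the result because HTTP header names are case-insensitive.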

Note: Throughout this book, you’ll be seeing a lot of HTTP traffic. You can look at the HTTP traffic for your requests by using Fiddler, Netmon, Wireshark, or your network monitoring tool of choice. Another option is to construct the requests by using tools such as curl or wget. A quick web search should show you where to download them, as well as tutorials on using them.

2. HTTP Requests and Responses

You should now have a basic understanding of the standard HTTP operations supported by Windows Azure. Before exploring the guts of the API, here’s a little refresher on the standard HTTP terms, operations, and codes, and what they mean in the Windows Azure world. If you’re familiar with writing HTTP clients, you can quickly skim over this with the knowledge that Windows Azure works the way you expect it to when it comes to HTTP. This examination points out where Azure storage diverges from what other HTTP services typically do.

Let’s break down a typical HTTP request and response.

2.1. URL

The URL identifies the resource that you want to get. In Windows Azure storage, this typically includes the account name in the hostname, and the resource specified by the path.
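Going the other way, a storage URL can be taken apart into the account, the service, and the resource path. This is a small Python sketch; split_storage_url is a made-up name, and it assumes the hostname follows the <account>.<service>.core.windows.net shape used in this chapter.

```python
from urllib.parse import urlsplit


def split_storage_url(url):
    """Return (account, service, container, resource_name) for a storage URL."""
    parts = urlsplit(url)
    # Hostname is assumed to be <account>.<service>.core.windows.net.
    account, service = parts.hostname.split(".")[:2]
    # For the blob service, the first path segment names the container.
    container, _, name = parts.path.lstrip("/").partition("/")
    return account, service, container, name


print(split_storage_url("http://sriramk.blob.core.windows.net/test/hello.txt"))
```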

2.2. Headers

Every HTTP request and response has headers that provide information about the message. You use a request’s headers to generate an authentication header, which lets the server know that the request really came from you, and then the authentication header is added to the list of headers. You’ll also see some custom headers for some operations; these are examined with the relevant feature. All custom headers in Windows Azure storage are prefixed with x-ms-.
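Because of the shared prefix, the storage-specific headers are easy to pick out of a request or response, for example when logging traffic. A minimal Python sketch, with custom_headers as an illustrative name:

```python
def custom_headers(headers):
    """Return only the x-ms-* headers, with names lowercased."""
    return {
        name.lower(): value
        for name, value in headers.items()
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower().startswith("x-ms-")
    }


request_headers = {
    "x-ms-version": "2009-07-17",
    "x-ms-date": "Sat, 09 May 2009 19:36:08 GMT",
    "Host": "sriramk.blob.core.windows.net",
    "Content-Length": "0",
}
print(custom_headers(request_headers))
```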

2.3. HTTP method

The HTTP method specifies the exact operation to be carried out. Windows Azure uses only a few of the several HTTP methods available:

GET: This retrieves the default representation of a resource. For a blob, this is the blob’s contents. For a table entity, this is an XML version of the entity, and so on.
PUT: This creates or updates a resource. You will typically PUT to a resource’s URL with the body of the request containing the data you want to upload.
POST: This is used to update the data in an entity. This is similar to PUT, but different in that you typically expect the resource to already exist at the URL.
DELETE: This deletes the resource specified at the URL.

2.4. Status codes

The HTTP specification documents more than 40 different status codes. Thankfully, you have to worry about only a small subset of them. Status codes tell you the result of the operation: whether it succeeded or failed, and if it failed, a hint as to why. Here are the main classes of status codes:

2xx (“Everything’s OK”): Response codes in the 2xx range generally signify that the operation succeeded. When you successfully create a blob, you get back a 201, and when you send a GET to a blob, you get back a 200.
3xx: In the normal web world, codes in the 3xx range are used to control caching and redirection. In the case of Azure, there is only one code you can expect to get, and that is 304. When you get a resource, you get an ETag associated with it. You can think of an ETag as an identifier for the current version of the data that the server “knows” about. If the content changes, the ETag changes. If you send that same ETag back with a later request for the resource, you’ll get a 304 code if the server hasn’t seen any changes since it last sent you the resource. ETags are also used to implement optimistic concurrency.
4xx (“Bad request”): Codes in the 4xx range mean that the request failed for some reason. This could be because the headers are incorrect, the URL is incorrect, the account or resource doesn’t exist, or any number of other reasons. The body of the response will contain more information on why the request failed. Two codes to note are 403 (which you get if the authentication information was invalid) and 404 (which means the resource doesn’t exist).
5xx (“Something bad happened on the server”): Of all the error codes, these are the scariest to receive. A 5xx code typically means that an unknown error happened on the server, or that the server is too busy to handle requests (this almost never happens). You could see one from time to time if you request too large a dataset and the operation times out. More often, however, a 5xx code indicates a bug in Windows Azure itself. These errors are regularly logged and diagnosed by the Windows Azure team.
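Client code usually branches on these classes rather than on individual codes. The Python sketch below shows one way to do that, plus the request header that makes a 304 possible. The function names are illustrative. If-None-Match is the standard HTTP conditional header for sending an ETag back to the server; its use here is an assumption of the sketch, since the header has not been named in the discussion so far.

```python
def conditional_get_headers(etag):
    """Headers for a GET that asks for the body only if the ETag changed."""
    # Standard HTTP conditional request header; the server can answer 304.
    return {"If-None-Match": etag}


def interpret_status(status):
    """Map an HTTP status code to the broad outcome described above."""
    if 200 <= status < 300:
        return "success"
    if status == 304:
        return "not modified"           # cached copy is still current
    if status == 403:
        return "authentication failed"
    if status == 404:
        return "resource not found"
    if 400 <= status < 500:
        return "client error"           # response body explains why
    if 500 <= status < 600:
        return "server error"
    return "unexpected"


print(conditional_get_headers("0x8CB9EF4605B9470"))
for code in (200, 201, 304, 403, 404, 400, 500):
    print(code, interpret_status(code))
```

A caller that gets "not modified" can keep using the copy it already has, and one that gets "client error" should read the response body before deciding what to do next.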