To get a feel for how you would access your data in Windows
Azure, you could try a variant of the following demo, substituting a URL
of your own: http://sriramkbook.blob.core.windows.net/test/helloworld.txt.
If you hit that URL in your browser, you see a nice little text file. Your
browser will go all the way over to Microsoft’s data centers, converse in
HTTP with the Windows Azure storage service, and pull down a few bytes of
awesomeness. The fact that it is a blob stored in Windows Azure blob
storage really doesn’t matter to the browser, since it is a plain HTTP URL
that works as URLs should.
In fact, if you so choose, you can get at any piece of data you
store in Windows Azure storage this way. (You’d have to make all the data
publicly viewable first, of course.) But doing HTTP requests like this
doesn’t work if you protect the data so that only you can get access to
it. It also doesn’t work when you want to upload data, or if you want to
access only specific subsets of your data. That’s where the APIs come
in.
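The anonymous access described above needs nothing beyond an ordinary HTTP client. The following minimal sketch uses only Python's standard library. It assumes the container has been made publicly readable, and the helper names `blob_url` and `fetch_public_blob` are hypothetical, not part of any Windows Azure SDK.

```python
from urllib.parse import quote
from urllib.request import urlopen


def blob_url(account: str, container: str, blob: str) -> str:
    """Build the plain HTTP URL for a blob (hypothetical helper).

    The account name goes in the hostname; the container and blob name
    form the path. quote() leaves "/" alone, so blob names that contain
    slashes keep them.
    """
    return f"http://{account}.blob.core.windows.net/{quote(container)}/{quote(blob)}"


def fetch_public_blob(url: str, timeout: float = 10.0) -> bytes:
    """Send an anonymous HTTP GET, just as a browser would.

    This works only if the blob's container allows public reads;
    otherwise the service answers with an error status and urlopen
    raises urllib.error.HTTPError.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, `fetch_public_blob(blob_url("sriramkbook", "test", "helloworld.txt"))` would return the demo file's bytes, assuming that blob is still published.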
Don’t worry if you’re unfamiliar with REST
APIs, or with what makes something RESTful. The
following discussion won’t make you an expert on REST, but it will tell
you all you need to know as far as REST and Windows Azure are concerned.
The discussions in this book barely scratch the surface of what REST
entails. For more on REST, see RESTful Web
Services by Leonard Richardson and Sam Ruby
(O’Reilly).
7.5.1. Understanding the RESTful API Resources
You deal with HTTP requests and responses every day. When you
click a link, your web browser requests a page by sending an HTTP
GET to a URL. If the page exists, the
server typically sends down an HTTP 200 response code, followed by the page’s
contents. If the page doesn’t exist, your browser gets the HTTP 404 code, and you see an error page indicating
that the page could not be found.
Another way to look at this is that your browser is requesting a
resource (the page) identified by the URL where it lives, and is asking
the server to perform a method (GET)
on it. REST takes this near-universal plumbing (every resource reachable
via a URL) and uses it to build APIs. Besides being an elegant way to
access resources programmatically over HTTP (it sticks to the nature of
the Web), this approach is easy to code against, since every language and
toolset supports HTTP. Contrast this with SOAP or CORBA, where you must
wield cumbersome toolkits or code generators.
To understand a RESTful API, you must first understand the
resources that are being exposed. Take the blob service as an example: each
blob resides in a container (a top-level directory
underneath your cloud storage endpoint).
A non-REST API would typically expose functions such as getBlobs or getContainers. The RESTful Windows Azure Blob
API takes a different approach. It exposes the blobs and the containers
as standard HTTP objects (known as resources in REST). Instead of using custom
methods, you can use standard HTTP methods such as GET, POST,
PUT, DELETE, and so on.
Each storage service with a REST API (blobs, tables, and queues)
exposes different resources. For example, the blob service exposes
containers and blobs, while the queue service exposes queues, and so on.
But they all share similar patterns, so if you know how to code against
one, you can easily figure out how to code against the others. The only
outlier is the table service, which requires you to understand a bit of
ADO.NET Data Services to do querying.
All the URLs typically fall into one of the following
patterns:
To get the default representation of the object, you send a
GET to the URL. URLs are of the
form
http://<account>.<service>.core.windows.net/<resource>,
where <service> is blob, table, or queue.
For example, you can retrieve a test blob by making a GET request to http://sriramk.blob.core.windows.net/test/hello.txt.
Operations on individual components are specified through a comp parameter.
Component here is a slightly misleading term,
since in practice this is used for operations such as getting lists,
getting/setting access control lists (ACLs), getting metadata, and
the like. For example, to get a list of blob containers, you can
send an authenticated HTTP GET to
http://account.blob.core.windows.net/?comp=list. If
you try to do that with http://sriramk.blob.core.windows.net/?comp=list in
your browser, you’ll get an error page back because your browser
can’t add the correct storage authentication headers. (You’ll learn
how authentication works a bit later.)
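The two URL patterns can be captured in a small builder. This is a sketch under the assumption that the only services are blob, table, and queue; `storage_url` is a hypothetical helper name, and for anything that isn't public, the URL it returns still has to be requested with the authentication headers described later.

```python
from typing import Optional
from urllib.parse import urlencode

# Services assumed to follow the <account>.<service>.core.windows.net pattern.
SERVICES = {"blob", "table", "queue"}


def storage_url(account: str, service: str, resource: str = "",
                comp: Optional[str] = None) -> str:
    """Build http://<account>.<service>.core.windows.net/<resource>[?comp=...].

    Hypothetical helper: resource is the path (e.g. "test/hello.txt"),
    and comp selects an operation such as "list" through the query string.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown storage service: {service!r}")
    url = f"http://{account}.{service}.core.windows.net/{resource.lstrip('/')}"
    if comp is not None:
        # The comp parameter rides in the query string, e.g. ?comp=list.
        url += "?" + urlencode({"comp": comp})
    return url
```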
Every resource in the Windows Azure storage world acts the same
way when it comes to HTTP operations. To get an object’s default
representation, you send an HTTP GET
request to that object’s URL. Similarly, to create an object, you send a
PUT request to the URL where you want
it to live. And finally, to delete the object, you send a DELETE request to the URL where it exists.
This is in line with what the HTTP standards specify, and it is
exactly how good RESTful APIs should behave.
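In code, this uniformity means the HTTP method is the only thing that changes between reading, creating, and deleting an object. The sketch below uses Python's `urllib.request.Request`; the action names and the `build_request` helper are hypothetical, and the requests it builds are unsigned, so against protected data they would still need the Authorization header covered later.

```python
from urllib.request import Request

# Standard HTTP methods stand in for custom functions such as
# getBlobs or createContainer.
METHOD_FOR_ACTION = {
    "read": "GET",       # fetch the default representation
    "create": "PUT",     # create the object at this URL
    "delete": "DELETE",  # remove the object at this URL
}


def build_request(action: str, url: str, body: bytes = b"") -> Request:
    """Return an unsent, unsigned urllib Request for the action (hypothetical helper)."""
    method = METHOD_FOR_ACTION[action]
    # Only PUT carries a body (possibly empty); GET and DELETE send none.
    data = body if method == "PUT" else None
    return Request(url, data=data, method=method)
```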
Following is a sample interaction to create a new container. Note
how simple the entire HTTP request is. You can ignore the headers
related to authorization and versioning for now, since you’ll learn more
about them later.
PUT /foo HTTP/1.1
x-ms-version: 2009-07-17
x-ms-date: Sat, 09 May 2009 19:36:08 GMT
Authorization: SharedKey sriramk:I/xbJkkoKIVKDgw2zPAdRvSM1HaXsIw...
Host: sriramk.blob.core.windows.net
Content-Length: 0
If everything went well, you would get back a success message, as
shown here:
HTTP/1.1 201 Created
Last-Modified: Sat, 09 May 2009 19:35:00 GMT
ETag: 0x8CB9EF4605B9470
Server: Blob Service Version 1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 4f3b6fdb-066d-4fc5-b4e8-178c8cf571a6
Date: Sat, 09 May 2009 19:34:59 GMT
Content-Length: 0
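To make the shape of that request concrete, here is a sketch that renders the same PUT as raw HTTP text from Python. It reuses the `x-ms-version` value from the sample and formats `x-ms-date` as an RFC 1123 date in GMT. `render_create_container` is a hypothetical helper, and because the Authorization header is omitted, the service would reject this request as is.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional

API_VERSION = "2009-07-17"  # x-ms-version value from the sample request


def render_create_container(account: str, container: str,
                            now: Optional[datetime] = None) -> str:
    """Render the unsigned PUT that creates a container, as raw HTTP text.

    The Authorization header is deliberately omitted: its SharedKey
    signature is computed from these headers, which is covered later.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    lines = [
        f"PUT /{container} HTTP/1.1",
        f"x-ms-version: {API_VERSION}",
        # RFC 1123 date in GMT, e.g. "Sat, 09 May 2009 19:36:08 GMT".
        f"x-ms-date: {format_datetime(now, usegmt=True)}",
        f"Host: {account}.blob.core.windows.net",
        "Content-Length: 0",
    ]
    # HTTP separates lines with CRLF and ends the headers with a blank line.
    return "\r\n".join(lines) + "\r\n\r\n"
```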
Note: Throughout this book, you’ll be seeing a lot of HTTP traffic.
You can look at the HTTP traffic for your requests by using Fiddler,
Netmon, Wireshark, or your network monitoring tool of choice. Another
option is to construct the requests by using tools such as curl or
wget. A quick web search should show you where to download them, as
well as tutorials on using them.
2. HTTP Requests and Responses
You should now have a basic understanding of the standard
HTTP operations supported by Windows Azure. Before
exploring the guts of the API, here’s a little refresher on the standard
HTTP terms, operations, and codes, and what they mean in the Windows
Azure world. If you’re familiar with writing HTTP clients, you can
quickly skim over this with the knowledge that Windows Azure works the
way you expect it to when it comes to HTTP. This examination points out
where Azure storage diverges from what other HTTP services typically
do.
Let’s break down a typical HTTP request and response.
2.1. URL
The URL identifies the resource that you want to get. In Windows
Azure storage, this typically includes the account name in the
hostname, and the resource specified by the path.
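Going the other way, you can recover the account name, service, and resource from a storage URL. This sketch assumes the `<account>.<service>.core.windows.net` hostname form used throughout this section; `parse_storage_url` and `StorageAddress` are hypothetical names.

```python
from typing import NamedTuple
from urllib.parse import unquote, urlsplit


class StorageAddress(NamedTuple):
    account: str   # first hostname label
    service: str   # second hostname label: "blob", "table", or "queue"
    resource: str  # the path, without the leading slash


def parse_storage_url(url: str) -> StorageAddress:
    """Split a storage URL into account name, service, and resource path.

    Assumes the <account>.<service>.core.windows.net hostname form;
    any other hostname raises ValueError. The query string is ignored.
    """
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    if len(labels) != 5 or labels[2:] != ["core", "windows", "net"]:
        raise ValueError(f"not a Windows Azure storage URL: {url}")
    return StorageAddress(labels[0], labels[1], unquote(parts.path.lstrip("/")))
```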
2.2. Headers
Every HTTP request and response has headers that provide
information about it. When you send a request, you use its headers to
compute an authentication header that lets the server verify the request
really came from you, and then you add that authentication header to the
request. Some operations also use custom headers, which are examined along
with the relevant feature. All custom headers in Windows Azure storage are
prefixed with x-ms-.
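Because the `x-ms-` prefix marks the storage-specific headers, picking them out of a request is straightforward. The sketch below gathers them with lowercased names in sorted order, which resembles the first step of the header canonicalization used by the SharedKey signing scheme discussed later; treat it as an illustration rather than a complete signing routine. The `ms_headers` name is hypothetical.

```python
from typing import Dict, List, Tuple


def ms_headers(headers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the x-ms- custom headers as (name, value) pairs.

    Header names are case-insensitive in HTTP, so they are lowercased
    before filtering; the result is sorted by name. Nothing is signed here.
    """
    picked = {}
    for name, value in headers.items():
        key = name.strip().lower()
        if key.startswith("x-ms-"):
            picked[key] = value.strip()
    return sorted(picked.items())
```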
2.3. HTTP method
The HTTP method specifies the exact operation to be carried out.
Windows Azure uses only a few of the several HTTP methods
available:
GET
This retrieves the default representation of a
resource. For a blob, this is the blob’s contents. For a table
entity, this is an XML version of the entity, and so
on.
PUT
This creates or updates a resource. You will typically
PUT to a resource’s URL with
the body of the request containing the data you want to
upload.
POST
This is used to update the data in an entity. This is
similar to PUT, but different
in that you typically expect the resource to already exist at
the URL.
DELETE
This deletes the resource specified at the URL.
2.4. Status codes
The HTTP specification documents more than 40 different
status codes. Thankfully, you have to worry about only a small subset
of them. These are used to tell you the result of the
operation—whether it succeeded or failed, and if it failed, a hint as
to why it failed. Here are the main classes of status codes:
2xx (“Everything’s
OK”)
Response codes in the 2xx range generally signify that the
operation succeeded. When you successfully create a blob, you
get back a 201, and when you
send a GET to a blob, you get
back a 200.
3xx
In the normal web world, codes in the 3xx range are used to control caching
and redirection. In the case of Azure, there is only one code
you can expect to get, and that is 304 (Not Modified). When you get a
resource, you get an ETag associated with it. You can think of an ETag
as a unique identifier that specifies the current data that the server
“knows” about. If the content changes, the ETag changes. If you send
back the same ETag when requesting the resource again (in an
If-None-Match header), you’ll get a 304 code if the resource hasn’t
changed since the server last sent it to you. ETags are also used to
implement optimistic concurrency.
4xx (“Bad
request”)
Codes in the 4xx range
mean that the request failed for some reason. This could be
because the headers are incorrect, the URL is incorrect, the
account or resource doesn’t exist, or any number of other
reasons. The body of the response will contain more information
on why the request failed. Two codes to note are 403 (which you get if the
authentication information was invalid) and 404 (which means the resource doesn’t
exist).
5xx (“Something bad
happened on the server”)
Of all the error codes, these are the scariest to receive.
A 5xx code typically means that an unknown error happened on the
server, or that the server is too busy to handle requests (this
almost never happens). You could see one from time to time if
you request too large a dataset and the operation times out.
Otherwise, a 5xx code usually indicates a bug in Windows Azure
itself. These errors are regularly logged and diagnosed by the
Windows Azure team.
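The status-code families above can be folded into a small helper, together with a conditional GET that sends a previously seen ETag in an If-None-Match header so that an unchanged resource comes back as a 304. This is a sketch; `describe_status` and `conditional_get` are hypothetical names, and the category strings simply paraphrase this section.

```python
from typing import Optional
from urllib.request import Request


def describe_status(code: int) -> str:
    """Map a storage-service status code to the rough meaning described above."""
    if 200 <= code < 300:
        return "success"
    if code == 304:
        return "not modified"  # the copy you hold (same ETag) is still current
    if code == 403:
        return "authentication failed"
    if code == 404:
        return "not found"
    if 400 <= code < 500:
        return "bad request"   # the response body explains what was wrong
    if 500 <= code < 600:
        return "server error"  # a timeout or busy server may be worth retrying
    return "unexpected"


def conditional_get(url: str, etag: Optional[str]) -> Request:
    """Build a GET that yields 304 if the resource still has this ETag."""
    req = Request(url, method="GET")
    if etag:
        req.add_header("If-None-Match", etag)
    return req
```

Note that `urllib.request.urlopen` treats any non-2xx response, including 304, as an error: it raises `urllib.error.HTTPError`, whose `code` attribute you can pass to `describe_status`.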