Windows Azure : Messaging with the queue - Decoupling your system with messaging

2/16/2011 8:58:50 AM

There are many ways to decouple your system, but they usually center on messaging of some sort. One of the most common ways is to use queues in between the different parts of the system, or between completely different systems that must work together.

Queues have several major components, and we’ll walk through each of them in turn. We’re first going to look at how queues work in general—how they pass messages around. Then we’ll examine what messages are, the shape they have, and how they work. Finally we’ll look closely at how an Azure queue works—what its limits are and how to get the most out of it.

1. How messaging works

Queues have two ends. The producer end, where messages are put into the queue, is usually represented as the bottom. The other end is the consumer end, where a consumer will pull messages off of the top of the queue.

Performance and reliability are critical to every part of Azure, and queues are no exception. Each queue, like the rest of the Azure storage services, exists as three replicas, each placed in a different fault domain and update domain. This strategy keeps your queue from failing completely when a switch goes down or a patch is rolled out.

As the demand for a queue increases, the storage fabric will start serving the requests out of a memory cache. This dramatically increases the performance of the queue and reduces the latency of using the queue.

A queue is a FIFO structure: first in, first out. This contrasts with a stack, which is LIFO: last in, first out. A real-world example of a queue is the line for tickets at a movie theater, as illustrated in figure 1. When people arrive, they stand at the end of the line. As the consumer (the ticket booth) completes sales, it works with the next person at the head of the line, and as people buy their tickets, the line moves forward.

Figure 1. A queue forms for tickets on the opening night of a new blockbuster movie. Movie-goers enter (while wearing their fanboy outfits) at the bottom, or end of the line. As the ticket booth (consumer) processes the ticket requests, the movie-goers move forward in the queue until they’re at the head of the line.

At a busy movie theater, there may be many ticket booths consuming customers from the line. Management may open more ticket booths based on the length of the line or on how long people have to wait in it. As the processing capacity of the ticket counter increases, the theater can sell tickets to more customers each minute. If the line gets short at a particular time, the theater manager might close ticket booths until only one or two are left open.

Your system can use a similar concept with a queue. As the producer side of your system (the shopping cart checkout process, for example) produces messages, they’re placed in the queue. The consumer side of the system (the ERP system that processes the orders and charges credit cards) will pull messages off of the queue. In this way, the two systems are tightly integrated but loosely coupled, because of the queue in between.
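The decoupling described above can be sketched in a few lines. This is a minimal, in-process illustration that assumes Python's standard queue.Queue as a stand-in for an Azure queue; the function names (checkout, erp_worker, run) and the order:<id> message format are hypothetical. The producer and the consumers share nothing but the queue, and worker_count plays the role of the number of open ticket booths.

```python
import queue
import threading


def checkout(order_queue, order_ids):
    """Producer: the shopping cart places one message per completed checkout."""
    for order_id in order_ids:
        order_queue.put(f"order:{order_id}")


def erp_worker(order_queue, processed, lock):
    """Consumer: pulls messages off the queue until it sees the stop sentinel."""
    while True:
        message = order_queue.get()
        if message is None:  # sentinel: no more work for this worker
            break
        with lock:
            processed.append(message)  # stand-in for charging the card


def run(order_ids, worker_count=3):
    """Start the consumers, produce all orders, then shut the consumers down."""
    order_queue = queue.Queue()
    processed, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=erp_worker, args=(order_queue, processed, lock))
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    checkout(order_queue, order_ids)
    for _ in workers:
        order_queue.put(None)  # one sentinel per worker, queued after every order
    for worker in workers:
        worker.join()
    return processed
```

Because several workers consume concurrently, the processed list isn't guaranteed to come back in insertion order, which is one reason strict ordering can't be assumed.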

Queues are one-way in nature. A message goes in at the bottom, moves toward the top, and is eventually consumed, as you can see in figure 2. For the consumer to communicate back to the producer, a separate mechanism must be used. This could be a queue going in the other direction, but it’s usually something else, like a shared storage location.

Figure 2. Producers place messages into the queue, and consumers get them out. Each queue can have multiple producers and consumers.

There’s an inherent order to a queue, but you can’t usually rely on queues for strict ordered delivery. In some scenarios, this can be important. A consumer processing checkouts from an e-commerce website won’t need the messages in a precise order, but a set of doctor’s orders for a patient might. It won’t matter which checkout is processed first, as long as it’s in a reasonable order, but the order of what tests, drugs, and surgeries are performed on a patient is likely important.

2. What is a message?

Your Azure storage account can have many queues; at any time, a queue can have many messages. Messages are the lifeblood of a queue system and should represent the things that a producer is telling a consumer. You can think of a queue as having a name, some properties, and a collection of ordered messages.

In Azure, messages are limited to 8 KB in size. This low limit is designed for performance and scalability reasons. If a message could be up to 1 GB in size, writing to and reading from the queue would take a long time. This would also make it hard for the queue to respond quickly when there were many different consumers reading messages from the top of the queue.

Because of this limit, most Azure queue messages will follow a work ticket pattern. The message will usually not contain the data needed by the consumer itself. Instead, the message will contain a pointer of some sort to the real work that needs to be done.

For example, following along with figure 3, a queue that contains messages for video compression won’t include the actual video that needs to be compressed. The producer will store the video in a shared storage location, perhaps a BLOB container or a table. Once the video is stored, the producer will place a message in the queue with the name of the BLOB that needs to be compressed. There’ll likely be other jobs in the queue as well.

The consumer will then pick up the work ticket, fetch the proper video from BLOB storage, compress the video, and store the new video back in BLOB storage. Sometimes the process ends there, with the original producer being smart enough to look in BLOB storage for the compressed version of the video; alternatively, a flag in a database might be flipped to show that the processing has been completed.

Figure 3. Work tickets are used in queues to tell the consumer what work needs to be done. This keeps the messages small, and keeps the queue scalable and performant. The work ticket is usually a pointer to where the real work is.
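The work ticket pattern can be sketched with in-memory stand-ins. This is a minimal illustration, not Azure code: blob_store (a dict) and work_queue (a deque) are hypothetical substitutes for a BLOB container and an Azure queue, the JSON ticket layout is an assumption, and zlib compression stands in for real video compression. Only the small ticket travels through the queue; the payload stays in shared storage.

```python
import json
import zlib
from collections import deque

MAX_MESSAGE_BYTES = 8 * 1024  # per-message limit described above

# Hypothetical in-memory stand-ins for a BLOB container and an Azure queue.
blob_store: dict = {}
work_queue: deque = deque()


def submit_video(blob_name: str, video: bytes) -> None:
    """Producer: store the large payload, then enqueue a small pointer to it."""
    blob_store[blob_name] = video
    ticket = json.dumps({"task": "compress", "blob": blob_name})
    if len(ticket.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("work ticket exceeds the message size limit")
    work_queue.append(ticket)


def process_next_ticket() -> str:
    """Consumer: read a ticket, fetch the payload, do the work, store the result."""
    ticket = json.loads(work_queue.popleft())  # oldest ticket first (FIFO)
    source = ticket["blob"]
    result_name = source + ".compressed"
    # zlib stands in for real video compression in this sketch.
    blob_store[result_name] = zlib.compress(blob_store[source])
    return result_name
```

The producer can later look for the ".compressed" name in shared storage, which mirrors the "smart producer" option described in the text.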

The content of a queue message is always stored as a string. The string must be in a format that can be included in an XML message and be UTF-8 encoded. This is because a message is returned from the queue in an XML format, with your real message as part of that XML. It’s possible to store binary data, but you need to serialize and deserialize the data yourself. Keep in mind that when you’re deserializing, the content coming out of the message will be base64 encoded.
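Here is a minimal sketch of serializing binary data yourself, assuming base64 via Python's standard base64 module and assuming the 8 KB limit applies to the encoded message text. Because base64 turns every 3 bytes into 4 characters, an 8 KB (8,192-character) message can carry at most 6,144 bytes of raw binary data. The function names are hypothetical.

```python
import base64

# Assumption: the 8 KB limit is measured against the encoded message text.
MAX_MESSAGE_BYTES = 8 * 1024


def encode_binary(payload: bytes) -> str:
    """Serialize binary data into an ASCII (and therefore UTF-8-safe) string."""
    text = base64.b64encode(payload).decode("ascii")
    if len(text) > MAX_MESSAGE_BYTES:
        raise ValueError(
            f"encoded message is {len(text)} bytes; limit is {MAX_MESSAGE_BYTES}"
        )
    return text


def decode_binary(message_text: str) -> bytes:
    """Reverse encode_binary when the consumer reads the message."""
    return base64.b64decode(message_text)
```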

The content of the message isn’t the only part of the message that you may want to work with. Every message has several important properties, as you can see in the following listing.

Listing 1. A message in its native XML format

The ID property is assigned by the storage system, and it’s unique. This is the only way to reliably tell messages apart, because several messages could contain the same content.

A message also includes the date and time it was inserted into the queue. This makes it easy to see how long the message has been waiting to be processed; for example, you might use it to determine whether messages are becoming stale in the queue. The storage service also uses this timestamp to decide whether your message should be garbage collected: any message that’s about a week old, in any queue, will be collected and discarded.
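As a sketch of using the insertion time, the code below parses a message list and computes each message's age. The element names (QueueMessagesList, QueueMessage, MessageId, InsertionTime, ExpirationTime, MessageText) follow the Queue service's Get Messages response, but the real response carries more elements, and every value in SAMPLE is made up. The one-day STALE_AFTER threshold is a hypothetical policy, not a service setting.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical response body; all values are invented for illustration.
SAMPLE = """<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Fri, 07 Aug 2009 01:26:38 GMT</InsertionTime>
    <ExpirationTime>Fri, 14 Aug 2009 01:26:38 GMT</ExpirationTime>
    <MessageText>compress:clip-001.avi</MessageText>
  </QueueMessage>
</QueueMessagesList>"""

STALE_AFTER = timedelta(days=1)  # hypothetical "stale" threshold
MAX_AGE = timedelta(days=7)      # roughly when the service discards a message


def message_ages(xml_text, now):
    """Yield (message id, message text, age) for each message in the body."""
    root = ET.fromstring(xml_text)
    for msg in root.findall("QueueMessage"):
        inserted = parsedate_to_datetime(msg.findtext("InsertionTime"))
        if inserted.tzinfo is None:  # treat naive timestamps as UTC
            inserted = inserted.replace(tzinfo=timezone.utc)
        yield msg.findtext("MessageId"), msg.findtext("MessageText"), now - inserted
```

With now set to 9 August 2009 at 01:26:38 UTC, the sample message is two days old: stale by the hypothetical threshold, but still short of the one-week cutoff.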

Now that we’ve discussed what messages are, we’re ready to discuss what contains them—the queue itself.

3. What is a queue?

The queue is the mechanism that holds the messages, in a rough order, until they’re consumed. The queue is replicated in triplicate throughout the storage service, like tables and BLOBs, for redundancy and performance reasons.

Queues can be created in a static manner, perhaps as part of deploying your application. They can also be created and destroyed dynamically. This is handy when you need a way to organize and direct messages in different directions based on real-time data or user needs.
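Creating and destroying queues dynamically comes down to two REST verbs against the queue's URL: PUT creates the queue and DELETE removes it along with its messages. The sketch below only builds the verb and URL and sends nothing; a real call also needs the date and authorization headers covered later in this section. The per-customer naming scheme is a hypothetical example, and hsslog is the account name from Listing 2.

```python
def queue_url(account: str, queue_name: str) -> str:
    """Each queue is addressed by its name in the path of the queue endpoint."""
    return f"https://{account}.queue.core.windows.net/{queue_name}"


def create_queue(account: str, queue_name: str):
    # PUT on the queue URL creates the queue.
    return "PUT", queue_url(account, queue_name)


def delete_queue(account: str, queue_name: str):
    # DELETE on the same URL removes the queue and any messages in it.
    return "DELETE", queue_url(account, queue_name)


def queue_for_customer(customer_number: int) -> str:
    # Hypothetical naming scheme: one queue per customer, created on demand.
    return f"customer-{customer_number}"
```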

Each queue can have an unlimited number of messages. The only real limit is how fast you can process the messages, and whether you can do so before they’re garbage collected after one week’s time.

Because a queue’s name appears in the URI for the REST request, it needs to follow the constraints that DNS names have:

It must start with a letter or number, and can contain only letters, numbers, and the hyphen (-) character.
The first and last characters in the queue name must be alphanumeric, so the hyphen (-) may not be the first or last character.
Consecutive hyphens aren’t permitted.
All letters in a queue name must be lowercase. (This is the requirement that gets me every time.)
A queue name must be from 3 to 63 characters long.
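The naming rules above are easy to check before you send a request. This is a minimal sketch with a hypothetical function name; besides the rules listed, it also rejects consecutive hyphens, which the Azure queue naming documentation forbids.

```python
import re

# Only lowercase letters, digits, and hyphens are allowed anywhere in the name.
_ALLOWED = re.compile(r"[a-z0-9-]+")


def is_valid_queue_name(name: str) -> bool:
    """Check a queue name against the DNS-style naming rules."""
    if not 3 <= len(name) <= 63:
        return False
    if not _ALLOWED.fullmatch(name):  # also rejects uppercase letters
        return False
    if name[0] == "-" or name[-1] == "-":  # must start and end alphanumeric
        return False
    if "--" in name:  # no consecutive hyphens
        return False
    return True
```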

A queue also has a set of metadata associated with it. This metadata can be up to 8 KB in size and is a simple collection of name/value pairs. This metadata can help you track and manage your queues. Although the name of a queue can help you understand what the queue is used for, the metadata is useful in more dynamic situations. For example, the name of the queue might be the customer number that the queue is related to, but you could store the customer’s service level (tin, silver, molybdenum, and gold) as a piece of metadata. This metadata then lives with the queue and can be accessed by any producer or consumer of the queue.
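Over REST, metadata travels as x-ms-meta-<name> headers on a PUT to the queue URL with ?comp=metadata (the Set Queue Metadata operation). The sketch below only assembles that request and sends nothing; a real call also needs the date and authorization headers. The size check is an assumption about how the 8 KB limit is counted (summing the UTF-8 lengths of names and values), and the servicelevel name and customer-10042 queue are hypothetical.

```python
MAX_METADATA_BYTES = 8 * 1024  # limit quoted in the text


def set_metadata_request(account: str, queue_name: str, metadata: dict):
    """Return (verb, url, headers) for a Set Queue Metadata call; nothing is sent."""
    # Assumption: approximate the size limit by summing UTF-8 lengths.
    size = sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in metadata.items()
    )
    if size > MAX_METADATA_BYTES:
        raise ValueError(f"metadata is {size} bytes; limit is {MAX_METADATA_BYTES}")
    url = f"https://{account}.queue.core.windows.net/{queue_name}?comp=metadata"
    headers = {f"x-ms-meta-{name}": value for name, value in metadata.items()}
    return "PUT", url, headers
```

For example, set_metadata_request("hsslog", "customer-10042", {"servicelevel": "molybdenum"}) yields a single x-ms-meta-servicelevel header.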

Queues are both a reliable and a persistent way to store and move messages. They’re reliable in that, as you’ll see when we discuss the message lifecycle, you should never lose a message. They’re also persistent: if a server goes down, the messages aren’t lost; they remain in the queue. This differs from a purely memory-based system, in which all of the messages would be lost if the server were to fail.

4. StorageClient and the REST API

There are two basic ways to interact with a queue and its messages. The first is the StorageClient library that ships with the Azure SDK. The other mechanism for interacting with queues is to use the REST API directly. You can create and consume REST messages in any way you want. Although this is a little more work, it’s worth learning how the REST API works, so that you understand more fully how the storage system works.

The REST entry point will be your key way to access Azure storage when you don’t have a handy API lying around, like the StorageClient. Microsoft and several open source teams are working to build SDKs similar to the StorageClient library for other platforms, including Python and PHP. All of these libraries use the REST protocols under the hood.

Each call into the REST API carries request headers with some basic information: the version of the service you’re targeting (x-ms-version), the date and time of the request (x-ms-date), and the authorization header. You can see a sample header in the following listing, which omits the optional version header.

Listing 2. A sample REST request header

POST /queue21b3c6dfe8626450880b9e16c70e2425e/messages?timeout=30 HTTP/1.1
x-ms-date: Fri, 07 Aug 2009 01:26:38 GMT
Authorization: SharedKey hsslog:Iow8eYFGeodLGqXrgbEcwDuA+aNOR0emEC9uy3Vnggg=
Host: hsslog.queue.core.windows.net
Content-Length: 80
Expect: 100-continue

The service version header is useful for preventing an update to the queue service from disrupting your system. You can force your requests to be processed by a specific version of the storage service, allowing you to control when you support and leverage new features in a newer version of the service. If you omit the version header, your request will be routed to the default version of the service.
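A minimal sketch of assembling the basic headers, assuming "2009-09-19" as the pinned service version (pin whichever version you’ve tested against). The x-ms-date value uses the RFC 1123 format shown in Listing 2, produced here with Python's email.utils.formatdate. Passing version=None leaves out x-ms-version, which routes the request to the default version as described above. The function name is hypothetical.

```python
from email.utils import formatdate

SERVICE_VERSION = "2009-09-19"  # assumed pin; choose the version you target


def base_headers(account: str, version=SERVICE_VERSION, timestamp=None) -> dict:
    """Build the date, host, and (optional) version headers for a queue request."""
    headers = {
        # RFC 1123 date in GMT, e.g. "Fri, 07 Aug 2009 01:26:38 GMT"
        "x-ms-date": formatdate(timestamp, usegmt=True),
        "Host": f"{account}.queue.core.windows.net",
    }
    if version is not None:
        headers["x-ms-version"] = version  # omit to get the default version
    return headers
```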

A queue can’t be made public or anonymous. Every operation against a queue must be authenticated with the shared key method. Constructing the authorization header for queue requests is the same as for BLOBs and tables.
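The final step of shared key authentication is an HMAC-SHA256 signature over a canonical "string to sign", keyed with the base64-decoded account key and emitted as SharedKey <account>:<signature>, the same shape as the Authorization line in Listing 2. The exact layout of the string to sign (verb, selected headers, canonicalized headers and resource) varies by service version and isn't reproduced here, so this sketch treats it as an input you’ve already built. The function name is hypothetical.

```python
import base64
import hashlib
import hmac


def shared_key_header(account: str, base64_key: str, string_to_sign: str) -> str:
    """Sign a prepared string-to-sign and format the Authorization header value."""
    key = base64.b64decode(base64_key)  # account keys are distributed as base64
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"SharedKey {account}:{signature}"
```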

Now that you know how to build the header, or the envelope, for a request, let’s look at how to send commands to the queue.
