There are many ways to decouple your system, but they
usually center on messaging of some sort. One of the most common ways
is to use queues in between the different parts of the system, or
between completely different systems that must work together.
Queues have several
major components, and we’ll walk through each of them in turn. We’re
first going to look at how queues work in general—how they pass messages
around. Then we’ll examine what messages are, the shape they have, and
how they work. Finally, we’ll look closely at how an Azure queue
works—what its limits are and how to get the most out of it.
1. How messaging works
Queues have two ends. The
producer end, where messages are put into the queue, is usually
represented as the bottom. The other end is the consumer end, where a
consumer will pull messages off of the top of the queue.
Performance is critical to
every part of Azure, and queues are no exception. Each queue, like the
rest of the Azure storage services, exists as three instances, each of
which is protected by different fault and update domains. This strategy
protects your queue from completely failing when a switch goes down or a
patch is rolled out.
As
the demand for a queue increases, the storage fabric will start serving
the requests out of a memory cache. This dramatically increases the
performance of the queue and reduces the latency of using the queue.
A queue is a FIFO
structure: first in, first out. This contrasts with a stack, which is
LIFO: last in, first out. A real-world example of a queue is the line
for tickets at a movie theater, as illustrated in figure 1.
When people arrive, they stand at the end of the line. As the consumer
(the ticket booth) completes sales, it works with the next person at the
head of the line, and as people buy their tickets, the line moves
forward.
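To make the FIFO and LIFO contrast concrete, here is a minimal Python sketch. It is only an in-memory illustration of the ticket line, not Azure code, and the function names are made up for this example.

```python
from collections import deque


def serve_fifo(arrivals):
    """Serve people in arrival order, like the ticket line (a queue)."""
    line = deque(arrivals)
    served = []
    while line:
        served.append(line.popleft())  # the head of the line goes first
    return served


def serve_lifo(arrivals):
    """Serve the most recent arrival first (a stack), for contrast."""
    stack = list(arrivals)
    served = []
    while stack:
        served.append(stack.pop())  # the last one in is the first one out
    return served


if __name__ == "__main__":
    people = ["Ann", "Bob", "Cho"]
    print(serve_fifo(people))
    print(serve_lifo(people))
```

The only difference between the two functions is which end of the collection the next item is taken from.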
At a busy movie theater,
there may be many ticket booths consuming customers from the line.
Management may open more ticket booths, based on the length of the line
or based on how long people have to wait in the line. As the processing
capacity of the ticket counter is increased, the theater is able to sell
tickets to more customers each minute. If the line gets short at a
particular time, the theater manager might close down ticket booths
until only one or two are left open.
Your system can use a similar
concept with a queue. As the producer side of your system (the shopping
cart checkout process, for example) produces messages, they’re placed in
the queue. The consumer side of the system (the ERP system that
processes the orders and charges credit cards) will pull messages off of
the queue. In this way, the two systems are tightly integrated but
loosely coupled, because of the queue in between.
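The producer and consumer arrangement can be sketched with Python's standard library. This is a local, in-process stand-in that assumes a thread-safe queue.Queue in place of an Azure queue; run_checkout_demo and its parameters are hypothetical names chosen for illustration.

```python
import queue
import threading


def run_checkout_demo(order_ids, consumer_count=2):
    """Put orders on a queue and let several consumers drain it."""
    orders = queue.Queue()
    processed = []
    lock = threading.Lock()

    def consumer():
        while True:
            order_id = orders.get()
            if order_id is None:          # sentinel: no more work for this consumer
                orders.task_done()
                return
            with lock:                    # protect the shared result list
                processed.append(order_id)
            orders.task_done()

    workers = [threading.Thread(target=consumer) for _ in range(consumer_count)]
    for worker in workers:
        worker.start()

    for order_id in order_ids:            # producer side: the checkout process
        orders.put(order_id)
    for _ in workers:                     # one sentinel per consumer
        orders.put(None)

    orders.join()                         # wait until every item is handled
    for worker in workers:
        worker.join()
    return processed


if __name__ == "__main__":
    print(sorted(run_checkout_demo(range(5), consumer_count=3)))
```

The producer never calls a consumer directly, which is the loose coupling described above. With more than one consumer, the order in which orders finish is not guaranteed, so the usage example sorts the result before printing it.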
Queues are one-way in
nature. A message goes in at the bottom, moves towards the top, and is
eventually consumed, as you can see in figure 2.
In order for the consumer to communicate back to the producer, a
separate process must be used. This could be a queue going in the other
direction, but it’s usually some other mechanism, like a shared storage
location.
There’s an inherent order to a queue, but you usually can’t rely on queues for strictly ordered delivery. In some scenarios, this
can be important. A consumer processing checkouts from an e-commerce
website won’t need the messages in a precise order, but a set of
doctor’s orders for a patient might. It won’t matter which checkout is
processed first, as long as it’s in a reasonable order, but the order of
what tests, drugs, and surgeries are performed on a patient is likely
important.
2. What is a message?
Your Azure storage account can
have many queues; at any time, a queue can have many messages. Messages
are the lifeblood of a queue system and should represent the things
that a producer is telling a consumer. You can think of a queue as
having a name, some properties, and a collection of ordered messages.
In Azure, messages are limited
to 8 KB in size. This low limit is designed for performance and
scalability reasons. If a message could be up to 1 GB in size, writing
to and reading from the queue would take a long time. This would also
make it hard for the queue to respond quickly when there were many
different consumers reading messages from the top of the queue.
Because of this limit, most
Azure queue messages will follow a work ticket pattern. The message will
usually not contain the data needed by the consumer itself. Instead,
the message will contain a pointer of some sort to the real work that
needs to be done.
For example, following along with figure 3,
a queue that contains messages for video compression won’t include the
actual video that needs to be compressed. The producer will store the
video in a shared storage location,
perhaps a BLOB container or a table. Once the video is stored, the
producer will then place a message in the queue with the name of the
BLOB that needs to be compressed. There’ll likely be other jobs in the queue as well.
The consumer will then pick up the work ticket, fetch the proper video from BLOB storage, compress the video, and then store the new video back in BLOB storage.
Sometimes the process ends there, with the original producer being
smart enough to look in the BLOB storage for the compressed version of
the video, or perhaps a flag in a database is flipped to show that the
processing has been completed.
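The work ticket flow can be sketched in a few lines of Python. Everything here is a stand-in: a dictionary plays the part of BLOB storage, a deque plays the part of the queue, zlib stands in for real video compression, and the names submit_video and process_next are hypothetical.

```python
import zlib
from collections import deque

MAX_MESSAGE_BYTES = 8 * 1024   # the message size limit described in the text

blob_store = {}                # stand-in for BLOB storage: name -> bytes
work_queue = deque()           # stand-in for the queue of work tickets


def submit_video(name, video_bytes):
    """Producer: store the large payload, then enqueue only a pointer to it."""
    blob_store[name] = video_bytes
    ticket = name              # the ticket is just the BLOB name
    if len(ticket.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("work ticket exceeds the message size limit")
    work_queue.append(ticket)


def process_next():
    """Consumer: take a ticket, fetch the payload, store the result."""
    if not work_queue:
        return None
    name = work_queue.popleft()
    original = blob_store[name]
    result_name = name + ".compressed"
    blob_store[result_name] = zlib.compress(original)
    return result_name


if __name__ == "__main__":
    submit_video("holiday.avi", b"frame" * 1000)
    print(process_next())
```

Because the ticket carries only a name, it stays far below the size limit no matter how large the video is.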
The
content of a queue message is always stored as a string. The string
must be in a format that can be included in an XML message and be UTF-8
encoded. This is because a message is returned from the queue in an XML
format, with your real message as part of that XML. It’s possible to
store binary data, but you need to serialize and deserialize the data
yourself. Keep in mind that when you’re deserializing, the content
coming out of the message will be base64 encoded.
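As a rough illustration of serializing binary data yourself, the sketch below turns bytes into a base64 string that is safe to carry as message text, and back again. The helper names are hypothetical, and this shows only the encoding step, not a call to the queue service.

```python
import base64


def to_message_text(payload: bytes) -> str:
    """Encode arbitrary bytes as a base64 string suitable for message content."""
    return base64.b64encode(payload).decode("ascii")


def from_message_text(text: str) -> bytes:
    """Decode base64 message content back into the original bytes."""
    return base64.b64decode(text)


if __name__ == "__main__":
    text = to_message_text(b"\x00\xff")
    print(text)
    print(from_message_text(text))
```

Base64 output is roughly a third larger than its input, which is worth remembering given the small message size limit.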
The content of the message
isn’t the only part of the message that you may want to work with. Every
message has several important properties, as you can see in the
following listing.
Listing 1. A message in its native XML format
The ID property is assigned by the storage system, and it’s unique.
This is the only way to uniquely differentiate messages from each other
because several messages could contain the same content.
A message also includes the time and date the message was inserted into the queue.
It can be handy to see how long the message has been waiting to be
processed. For example, you might use this information to determine
whether the messages are becoming stale in the queue. This timestamp is
also used by the storage service to determine if your message should be
garbage collected or not. Any message that’s about a week old in any
queue will be collected and discarded.
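One way to use the insertion time is to flag messages that have waited too long. The sketch below parses a small XML sample and checks each message's age. The sample is illustrative only: its element names (QueueMessagesList, QueueMessage, MessageId, InsertionTime, MessageText) are assumed from the queue service's message response format, the ID and content are invented, and the one-day threshold is an arbitrary choice.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Illustrative sample only; the ID and the content are made up.
SAMPLE = """<QueueMessagesList>
  <QueueMessage>
    <MessageId>00000000-0000-0000-0000-000000000001</MessageId>
    <InsertionTime>Fri, 07 Aug 2009 01:26:38 GMT</InsertionTime>
    <MessageText>dmlkZW8tMDAxLmF2aQ==</MessageText>
  </QueueMessage>
</QueueMessagesList>"""


def parse_messages(xml_text):
    """Return a list of dicts holding each message's id, insertion time, and text."""
    root = ET.fromstring(xml_text)
    messages = []
    for node in root.findall("QueueMessage"):
        inserted = parsedate_to_datetime(node.findtext("InsertionTime"))
        if inserted.tzinfo is None:       # treat a naive timestamp as UTC
            inserted = inserted.replace(tzinfo=timezone.utc)
        messages.append({
            "id": node.findtext("MessageId"),
            "inserted": inserted,
            "text": node.findtext("MessageText"),
        })
    return messages


def is_stale(message, now, max_age=timedelta(days=1)):
    """True if the message has been waiting longer than max_age."""
    return now - message["inserted"] > max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for message in parse_messages(SAMPLE):
        print(message["id"], is_stale(message, now))
```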
Now that we’ve discussed what messages are, we’re ready to discuss what contains them—the queue itself.
3. What is a queue?
The queue is the
mechanism that holds the messages, in a rough order, until they’re
consumed. The queue is replicated in triplicate throughout the storage
service, like tables and BLOBs, for redundancy and performance reasons.
Queues can be created in a static
manner, perhaps as part of deploying your application. They can also be
created and destroyed dynamically. This is handy when you need a way to
organize and direct messages in different directions based on real-time
data or user needs.
Each
queue can have an unlimited number of messages. The only real limit is
how fast you can process the messages, and whether you can do so before
they’re garbage collected after one week’s time.
Because a queue’s name appears in the URI for the REST request, it needs to follow the constraints that DNS names have:
It must start with a letter or number, and can contain only letters, numbers, and the hyphen (-) character.
The first and last letters in the queue name must be alphanumeric. The hyphen (-) character may not be the first or last character.
All letters in a queue name must be lowercase. (This is the requirement that gets me every time.)
A queue name must be from 3 to 63 characters long.
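The naming rules above are easy to check before you ever send a request. Here is a small Python sketch that encodes only the rules listed in this section; the function name is hypothetical, and the real service may enforce additional constraints not covered here.

```python
import re

# Lowercase letters, digits, and hyphens; the first and last characters
# must be alphanumeric.
_QUEUE_NAME = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_queue_name(name: str) -> bool:
    """Check a queue name against the rules listed in the text."""
    if not 3 <= len(name) <= 63:
        return False
    return _QUEUE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["orders-2009", "Orders", "-orders", "ab"]:
        print(candidate, is_valid_queue_name(candidate))
```

Validating locally gives you a clear error for an uppercase letter or stray hyphen instead of a failed REST call.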
A queue also has a set of
metadata associated with it. This metadata can be up to 8 KB in size and
is a simple collection of name/value pairs. This metadata can help
you track and manage your queues. Although the name of a queue can help
you understand what the use of the queue is, the metadata can be useful
in a more dynamic situation. For example, the name of the queue might be
the customer number that the queue is related to, but you could store
the customer’s service level (tin, silver, molybdenum, and gold) as a
piece of metadata. This metadata then lives with the queue and can be
accessed by any producer or consumer of the queue.
Queues are both a reliable
and persistent way to store and move messages. They’re reliable in that
you should never lose a message, as you’ll see
when we discuss the message lifecycle. Queues are also strict in how
they persist your messages. If a server goes down, the messages aren’t
lost; they remain in the queue. This differs from a purely memory-based
system, in which all of the messages would be lost if the server were to
have a failure.
4. StorageClient and the REST API
There are two basic ways to
interact with a queue and its messages. The first is the StorageClient
library that ships with the Azure SDK. The other mechanism for
interacting with queues is to use the REST API directly. You can create
and consume REST messages in any way you want. Although this is a little
more work, it’s worth learning how the REST API works, so that you
understand more fully how the storage system works.
The REST entry point will be
your key way to access Azure storage when you don’t have a handy API
lying around, like the StorageClient. Microsoft and several open source
teams are working to build SDKs similar to the StorageClient library for
every platform, including Python and PHP. All of these libraries use
the REST protocols under the hood.
Each call into the REST
API has a request header that includes some basic information. The
header needs to include which version of the service you’re targeting,
the date and time of the request, and the authorization header. You can see a sample header in the following listing.
Listing 2. A sample REST request header
POST /queue21b3c6dfe8626450880b9e16c70e2425e/messages?timeout=30 HTTP/1.1
x-ms-date: Fri, 07 Aug 2009 01:26:38 GMT
Authorization: SharedKey hsslog:Iow8eYFGeodLGqXrgbEcwDuA+aNOR0emEC9uy3Vnggg=
Host: hsslog.queue.core.windows.net
Content-Length: 80
Expect: 100-continue
The service version header is
useful for preventing an update to the queue service from disrupting
your system. You can force your requests to be processed by a specific
version of the storage service, allowing you to control when you support
and leverage new features in a newer version of the service. If you
omit the version header, your request will be routed to the default
version of the service.
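As a sketch of the date and version headers, the Python below builds the two values as a dictionary. The function name is hypothetical, and the version string in the usage example is only a sample value; use whichever service version you have actually tested against.

```python
from email.utils import formatdate


def base_headers(service_version=None, timestamp=None):
    """Build the date header, plus the version header when one is requested.

    timestamp is seconds since the epoch; None means "now".
    """
    headers = {"x-ms-date": formatdate(timeval=timestamp, usegmt=True)}
    if service_version is not None:
        # Pin the request to one version of the storage service.
        headers["x-ms-version"] = service_version
    return headers


if __name__ == "__main__":
    print(base_headers(service_version="2009-09-19"))
```

Leaving service_version as None mirrors the behavior described above: no version header is sent, so the request goes to the default version of the service.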
A queue can’t be made public
or anonymous. Every operation against a queue must be authenticated with
the shared key method. Constructing the authorization header for queue
requests is the same as for BLOBs and tables.
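The final step of shared key authentication can be sketched as follows: sign a string with HMAC-SHA256 using the base64-decoded account key, then base64-encode the result. The exact layout of the string to sign is defined by the storage service and is not reproduced here, so this example treats it as an input. The function name, the account key, and the string to sign in the usage example are all placeholders.

```python
import base64
import hashlib
import hmac


def shared_key_header(account, account_key_b64, string_to_sign):
    """Return an Authorization header value of the form 'SharedKey account:signature'."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"SharedKey {account}:{signature}"


if __name__ == "__main__":
    fake_key = base64.b64encode(b"not-a-real-key").decode("ascii")
    # Placeholder only: the real string to sign follows the service's canonical format.
    print(shared_key_header("hsslog", fake_key, "placeholder-string-to-sign"))
```

The signature is the base64 form of a 32-byte digest, so it is always 44 characters long, which matches the shape of the Authorization value in the sample header.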
Now that you know how to build the header, or the envelope, for a message, let’s look at how to send commands to the queue.