A key aspect of a queue is how it manages its messages and their
visibility. This is how the queue implements the message durability
developers are looking for. The goal is to protect against a consumer
getting a message, and then failing to process and delete that message.
If that happened, the message would be lost, and this isn’t good news
for any processing system.
Visibility timeouts and
idempotency are the two best tools for making sure your messages are
never lost. Understanding how these concepts relate to the queue and
understanding the lifecycle of a message are important to the success of
your code.
1. About message visibility and invisibility
Every message has a visibility timeout property. When a message is
pulled from the queue, it isn’t really deleted; it’s just temporarily
marked as invisible. The consumer is also given a receipt (called the
pop receipt) that’s unique to that GetMessage()
operation. The duration of invisibility can be controlled by the
consumer, and it can be as long as 2 hours. If not explicitly set, it
will default to 30 seconds. While a message is invisible, it won’t be
given out in response to new GetMessage() operations.
As an example, let’s say a producer has placed four messages in a queue, as shown in figure 1, and we have two consumers that will be reading messages out of the queue.
Consumer 1 gets a message (msg 1), and that message is marked invisible.
Seconds later, consumer 2 performs a get operation as well. Because the
first message (msg 1) is invisible, the queue responds with the next
message (msg 2).
Not long thereafter, consumer 1
finishes processing msg 1 and performs a delete operation on the
message. As part of the delete operation, the queue checks the pop
receipt consumer 1 provides when it passes in the message. This is to
make sure consumer 1 is the most recent reader of the message in
question. The receipt matches in this case, and the message is deleted.
Consumer 1 then does an
immediate read, and gets msg 3. Consumer 1 fails to complete processing
within the invisibility time window and fails to delete msg 3 in time.
It becomes visible again.
Just at that time, consumer 2
deletes msg 2 and does a get. The queue responds with msg 3, because
it’s visible again. While consumer 2 is processing msg 3, consumer 1
finally finishes processing msg 3 and tries to delete it. This time,
the pop receipt that consumer 1 holds doesn’t match the most recently
issued pop receipt, which was given to consumer 2 when msg 3 was
handed out for a second time. Because the pop receipt doesn’t match, an
error is thrown, and the message isn’t deleted. You’ll likely see a 400 (Bad Request)
error when this happens. The inner exception details will explain that
there aren’t any messages with a matching pop receipt available in the
queue to be deleted.
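The walkthrough above can be replayed with a small in-memory model. This is only a sketch of the behavior described here, not the Azure client library: the ToyQueue class, the PopReceiptMismatch exception, the injected clock, and the specific timestamps are all hypothetical names and values chosen for illustration. Consumer 2 asks for a 60-second timeout so that msg 2 is still invisible when it gets deleted.

```python
import itertools


class PopReceiptMismatch(Exception):
    """Raised when a delete is attempted with a stale pop receipt."""


class ToyQueue:
    """In-memory model of visibility timeouts; not the Azure client API."""

    def __init__(self, clock):
        self._clock = clock                  # callable returning seconds
        self._messages = {}                  # insertion-ordered: id -> state
        self._receipts = itertools.count(1)  # source of unique pop receipts

    def put(self, msg_id, body):
        self._messages[msg_id] = {"body": body, "visible_at": 0.0, "receipt": None}

    def get_message(self, visibility_timeout=30):
        """Return (id, body, pop_receipt) for the first visible message, or None."""
        now = self._clock()
        for msg_id, state in self._messages.items():
            if state["visible_at"] <= now:
                # The message is not removed, only hidden until the timeout ends.
                state["visible_at"] = now + visibility_timeout
                state["receipt"] = next(self._receipts)
                return msg_id, state["body"], state["receipt"]
        return None

    def delete_message(self, msg_id, pop_receipt):
        """Delete only if the caller holds the most recently issued receipt."""
        state = self._messages.get(msg_id)
        if state is None or state["receipt"] != pop_receipt:
            raise PopReceiptMismatch(msg_id)
        del self._messages[msg_id]


now = [0.0]                                  # fake clock, advanced by hand
queue = ToyQueue(clock=lambda: now[0])
for n in range(1, 5):
    queue.put(f"msg {n}", f"payload {n}")

id1, _, receipt1 = queue.get_message()                       # consumer 1 gets msg 1
now[0] = 5
id2, _, receipt2 = queue.get_message(visibility_timeout=60)  # consumer 2 gets msg 2
now[0] = 10
queue.delete_message(id1, receipt1)                          # receipt matches
id3, _, stale_receipt = queue.get_message()                  # consumer 1 gets msg 3
now[0] = 45                                                  # msg 3 reappeared at t=40
queue.delete_message(id2, receipt2)                          # consumer 2 deletes msg 2
id3_again, _, fresh_receipt = queue.get_message()            # consumer 2 gets msg 3

try:
    queue.delete_message(id3, stale_receipt)                 # consumer 1, too late
    delete_failed = False
except PopReceiptMismatch:
    delete_failed = True
```

In this model the late delete from consumer 1 is rejected and msg 3 stays in the queue until consumer 2 deletes it with the fresh receipt.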
2. Setting visibility timeout
You can set the length of the
visibility timeout when you get the message. This lets you determine
the proper length of the timeout for each individual message.
When you specify the
visibility timeout, you want to balance the expected processing time and
how long it will take for the system to recover from an error in
processing. If the timeout is too long, it will take a long time for the
system to recover a lost message. If the timeout is too short, too many
of your messages will be repeatedly reprocessed.
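One way to reason about this balance is to pad the expected processing time and cap the result at the service limit. The helper below is a hypothetical sketch, not part of any Azure API: the function name and the safety factor of 2 are assumptions made for illustration, and the 2-hour ceiling is the limit quoted earlier in this section.

```python
import math

MAX_TIMEOUT_SECONDS = 2 * 60 * 60   # upper limit quoted earlier in this section


def choose_visibility_timeout(expected_processing_seconds, safety_factor=2.0):
    """Pad the expected processing time, then clamp to the allowed range.

    The safety factor is an assumption: larger values mean fewer duplicate
    deliveries but a slower recovery when a consumer really has failed.
    """
    if expected_processing_seconds <= 0:
        raise ValueError("expected processing time must be positive")
    padded = math.ceil(expected_processing_seconds * safety_factor)
    return min(max(padded, 1), MAX_TIMEOUT_SECONDS)


short_job = choose_visibility_timeout(10)      # 20 seconds
long_job = choose_visibility_timeout(5000)     # clamped to 7200 seconds
```

Whatever value you settle on, some messages will occasionally outlive their timeout and be handed out a second time.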
This leads us to an important aspect of queues in general, and of the Azure queue system in particular.
3. Planning on failure
The Queue service guarantee promises that every message will be processed at least once.
You can see this “at least once” business in the previous scenario.
Because consumer 1 failed to delete the message in time, the queue
handed it out to another consumer. The queue has to assume the original
consumer has failed in some way.
This is very useful because it
provides a way for your system to take a hit (a server going down) and
keep on going. Cloud architecture plans on failure and makes that
central to the structure of the system. Queues provide that capability
quite nicely.
The downside is that a consumer
might not crash at all but simply take longer to process
the message than intended. In this case, you need to make sure that your
processing code is either idempotent, or that it checks before
processing each message to make sure it isn’t a duplicate copy. Because
the message being reprocessed is actually the same message, its ID
property will be the same. This makes it easy to check a history table
or perhaps the status of a related order before processing starts.
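The duplicate check might look something like the following sketch. The processed_ids set stands in for a history table, and handle_once is a hypothetical helper name; a real system would keep this history in durable storage shared by all consumers.

```python
processed_ids = set()   # stand-in for a durable history table


def handle_once(message_id, body, process):
    """Run process(body) unless this message ID has already been handled.

    Returns True if the message was processed, False if it was a duplicate.
    """
    if message_id in processed_ids:
        return False
    process(body)
    # Record the ID only after processing succeeds, so a crash partway
    # through leaves the message eligible for another attempt.
    processed_ids.add(message_id)
    return True


shipped = []
first = handle_once("id-42", "ship order 7", shipped.append)
second = handle_once("id-42", "ship order 7", shipped.append)  # redelivery
```

Here the second delivery of the same message ID is skipped, so the order is shipped only once.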
This little bit of
complexity might make you think about deleting a message as soon as you
receive it, before you process it. Doing so is dangerous and unwise
because there will be failures along the way, and when one happens, the
message will be lost forever.
4. Use idempotent processing code
The goal of a messaging
system is to make sure you never lose a message. No matter how small or
large, you never want to lose an order, or a set of instructions, or
anything else you might be processing.
To avoid complexity, it’s best to make sure your processing code is idempotent. Idempotent means that the process can be executed several times and the system will end up
in the same state. Suppose you’re working with a piece of software that
tracks dog food delivery. When the food is delivered to the physical
address, the handheld computer sends a message to your queue in the
cloud. The software uploads the physical signature of the recipient to
BLOB storage and submits an order-delivered message to the queue. The
message contains the time of delivery and the order number, which
happens to also be the filename of the signature in BLOB storage.
When this message is
processed, the consumer copies the signature image to permanent storage,
with the proper filename, and marks the order as delivered in the
package-tracking database.
If this message were to be
processed several times, there would be no detriment to the system. The
signature file would be overwritten with the same file. The order status
is already set to delivered, so we’d just be setting its status to
delivered again. Using the same delivery time doesn’t change the overall
state of the system.
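A minimal sketch of that handler shows why repeating it is harmless. Everything here is an assumption made for illustration: the dictionaries stand in for BLOB storage and the package-tracking database, and the order number, timestamp, and function name are invented sample values.

```python
import copy

# Stand-ins for the uploaded signature blobs, permanent storage,
# and the package-tracking database.
uploaded_signatures = {"order-1001": b"signature-image-bytes"}
permanent_storage = {}
orders = {"order-1001": {"status": "out for delivery", "delivered_at": None}}


def process_delivery(message):
    """Handle an order-delivered message; safe to run more than once."""
    order_number = message["order_number"]
    # Copying overwrites any earlier copy with the same bytes.
    permanent_storage[order_number] = uploaded_signatures[order_number]
    # Setting fields to fixed values is repeatable, unlike incrementing a counter.
    orders[order_number]["status"] = "delivered"
    orders[order_number]["delivered_at"] = message["delivered_at"]


message = {"order_number": "order-1001", "delivered_at": "2010-06-01T14:32:00"}

process_delivery(message)
state_after_first = copy.deepcopy((permanent_storage, orders))

process_delivery(message)    # the same message, delivered a second time
state_after_second = copy.deepcopy((permanent_storage, orders))
```

Both snapshots are identical because every step assigns a value taken from the message itself rather than modifying what was already there.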
This is the best way to handle
the processing of queue messages, but it isn’t always possible. The next
section will discuss some common queue-processing patterns, and some of
them deal with working around this issue.