Worker
roles can receive the messages they need to process in either a push or
a pull way. Pushing a message to the worker instance is an active
approach, where you’re directly giving it work to do. The alternative is
to have the role instances call out to some shared source to gather
work to do, in a sense pulling in the messages they need. When pulling
messages in, remember that several instances may be pulling in work at
the same time. You’ll need a mechanism like the one the Azure Queue
service provides to keep the different worker role instances from
processing the same work.
Keep in mind the
difference between roles and role instances, which we covered earlier.
Although it’s sometimes convenient to think of a worker as a single
entity, a role never runs as a single unit; what runs is one or more
instances of that role. When you’re designing and developing your worker roles, keep
this duality in mind. Think of the role as a unit of deployment and
management, and the role instance as the unit of work assignment. This
will help reduce the number of problems in your architecture.
One advantage that worker
roles have over web roles is that they can have as many service
endpoints as they like, using almost any transport protocol and port.
Web roles are limited to HTTP/S and can have two endpoints at most.
We’ll use the worker role’s flexibility to provide several ways to send
it messages.
We’ll cover three approaches to sending messages to a worker role instance:
A pull model, where each worker role instance polls a queue for work to be completed
A push model, where a producer outside Azure sends messages to the worker role instance
A push model, where a producer inside the Azure application sends messages to the worker role instance
Let’s look first at the pull model.
1. Consuming messages from a queue
The most common way for a worker role to receive messages is through a queue.
The general model is to have a while
loop that never quits. This approach is so common that the standard
worker role template in Visual Studio provides one for you. The role
instance tries to get a new message from the queue it’s polling on each
iteration of the loop. If it gets a message, it’ll process the message.
If it doesn’t, it’ll wait a period of time (perhaps 5 seconds) and then
poll the queue again.
The core of the loop calls
the business code. Once the loop has a message, it passes the message
off to the code that does the work. Once that work is done, the message
is deleted from the queue, and the loop polls the queue again.
while (true)
{
    CloudQueueMessage msg = queue.GetMessage();
    if (msg != null)
    {
        DoWorkHere(msg);
        queue.DeleteMessage(msg);
    }
    else
    {
        Thread.Sleep(5000);
    }
}
You
might jump to the conclusion that you could easily poll an Azure Table
for work instead of polling a queue. Perhaps you have a property in your
table called Status that defaults to new. The worker role could poll the table, looking for all entities whose Status property equals new. Once a list is returned, the worker could process each entity and set its Status to complete. At first glance, this sounds like a simple approach.
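In code, this naive table-polling approach might look something like the following sketch, which uses the v1 StorageClient library. The WorkEntity class, its Status property, the table name, and ProcessEntity are all illustrative assumptions, not part of the original text:

```csharp
// An illustrative table entity with the Status property described above.
public class WorkEntity : TableServiceEntity
{
    public string Status { get; set; }
}

public void PollTableForWork(CloudTableClient tableClient)
{
    TableServiceContext context = tableClient.GetDataServiceContext();

    // Query for all entities still marked "new".
    var newEntities = context.CreateQuery<WorkEntity>("WorkItems")
        .Where(e => e.Status == "new")
        .ToList();

    foreach (WorkEntity entity in newEntities)
    {
        ProcessEntity(entity);          // hypothetical business code
        entity.Status = "complete";
        context.UpdateObject(entity);
    }

    context.SaveChanges();
}
```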
Unfortunately, this
approach is a red herring. It suffers from some severe drawbacks that
you might not find until you’re in testing or production because they
won’t show up until you have multiple instances of your role running.
The first problem is concurrency. If you have multiple instances of your worker role polling a
table, they could each retrieve the same entities in their queries.
This would result in those entities being processed multiple times,
possibly leading to status updates getting entangled. This is the exact
concurrency problem the Azure Queue service was designed to avoid.
The other, more important,
issue is one of recoverability and durability. You want your system to
be able to recover if there’s a problem processing a particular entity.
Perhaps you have each worker role set the status
property to the name of the instance to track that the entity is being
worked on by a particular instance. When the work is completed, the
instance would then set the status property to done.
On the surface, this approach seems to make sense. The flaw is that
when an instance fails during processing (which will happen), the entity
will never be recovered and processed. It’ll remain flagged with the
instance name of the worker processing the item, so it’ll never be
cleared and will never be picked up in the query of the table to be
processed. It will, in effect, be “hung.” The system administrator would
have to go in and manually reset the status property back to new. There isn’t a way for the entity to be recovered from a failure and be reassigned to another instance.
It would take a fair amount
of code to overcome the issues of polling a table with multiple
consumers, and you’d end up rebuilding the Azure Queue service. The
Queue service is designed to play this role, and it removes the need to
write all of this dirty plumbing code.
The Queue service provides a way for work to be distributed among
multiple worker instances, and to easily recover that work if the
instance fails. A key concept of cloud architecture is to design for
failure recoverability in an application. It’s to be expected that nodes
go down (for one reason or another) and will be restarted and
recovered, possibly on a completely different server.
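To make that recoverability concrete, here’s a sketch of how the queue’s visibility timeout handles a failed consumer. The 30-second timeout and the DoWorkHere method are illustrative assumptions:

```csharp
// Ask for a message and make it invisible to other consumers
// for 30 seconds while we work on it.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
if (msg != null)
{
    DoWorkHere(msg);

    // Deleting the message marks the work as complete. If this
    // instance crashes before the delete, the message becomes
    // visible again when the timeout expires, and another
    // instance will pick it up — no manual recovery needed.
    queue.DeleteMessage(msg);
}
```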
Queues are the easiest way to
get messages into a worker role. Next, though, we’ll look at exposing a
service endpoint, which lets a worker role receive messages from outside of Azure.
2. Exposing a service to the outside world
Web roles are built to
receive traffic from outside of Azure. Their whole point in life is to
receive messages from the internet (usually from a browser) and respond
with some message (usually HTML). The great thing is that when you have
multiple web role instances, they’re automatically enrolled in a load
balancer. This load balancer automatically distributes the load across
the different instances you have running.
Worker
roles can do much the same thing, but because you aren’t running in IIS
(which isn’t available on a worker role), you have to host the service
yourself. The only real option is to build the service as a WCF service.
Our goal is to convert our
little string-reversal method into a WCF service, and then expose that
externally so that customers can call the service. The first step is to
remove the loop that polls the queue and put in some service plumbing.
When you host a service in a worker role, regardless of whether it is
for external or internal use, you need to declare an endpoint. How you
configure this endpoint will determine whether it allows traffic from
sources internal or external to the application. The two types of
endpoints are shown in figure 1.
If it’s configured to run externally, it will use the Azure load
balancers and distribute service calls across all of the role instances
running the service, much as the web role does. We’ll look at
internal service endpoints in the next section.
The next step in the process is
to define the endpoint. You can do this the macho way in the
configuration of the role, or you can do it in the Visual Studio
Properties window. If you right-click on the Worker-Process String
worker role in the Azure project and choose Properties, you’ll see the
window in figure 2.
Name the service endpoint StringReverserService and set it to be an input endpoint, using TCP on port 2202. There’s no need to use any certificates or security at this time.
After you save these settings, you’ll find the equivalent settings in the ServiceDefinition.csdef file:
<Endpoints>
<InputEndpoint name="StringReverserService" protocol="tcp" port="2202" />
</Endpoints>
You might normally host your
service in IIS or WAS, but those aren’t available in a worker role. In
the future, you might be able to use Windows Server AppFabric, but that
isn’t available yet, so you’ll have to do this the old-fashioned way.
You’ll have to host the WCF service using ServiceHost,
which is exactly what it sounds like: a host that acts as a container to
run your service in. It contains the service, manages the endpoints and
configuration, and handles the incoming service requests.
Next you need to add a method called StartStringReversalService. This method will wire up the service to the ServiceHost and the endpoint you defined. The contents of this method are shown in the following listing.
Listing 1. The StartStringReversalService method wires up the service
Listing 1
is an abbreviated version of the real method, shortened so that it fits
into the book better. We didn’t take out anything that’s super
important. We took out a series of trace
commands that let us watch the startup and status of the service. We
also abbreviated some of the error handling, which you would
definitely want to beef up in a production environment.
Most of this code is normal for setting up a ServiceHost. You first have to tell the service host the type of the service that’s going to be hosted. In this case, it’s the ReverseStringTools type.
When you go to add the service
endpoint to the service host, you’re going to need three things, the
ABCs of WCF: address, binding, and contract. The contract is provided by
your code, IReverseString, and it’s a
class file that you can reference to share service contract information
(or use MEX like a normal web service). The binding is a normal TCP
binary binding, with all security turned off. (We would only run with
security off for debug and demo purposes!)
Then the address is needed. You
can set up the address by referencing the service endpoint from the
Azure project. You won’t know the real IP address the service will be
running under until runtime, so you’ll have to build it on the fly by
accessing the RoleEnvironment.CurrentRoleInstance.InstanceEndpoints collection.
The collection is a dictionary, so you can pull out the endpoint you
want to reference with the name you used when setting it up—in this
case, StringReverserService. Once you have a reference to the endpoint, you can access the IP address that you need to set up the service host.
After you have that wired up,
you can start the service host. This will plug in all the components,
fire them up, and start listening for incoming messages. This is done
with the Open method.
Once the service is up, you’ll
want the main execution thread to sleep forever so that the host stays
up and running. If you didn’t include the sleep loop, execution would
fall out of the method, and you’d lose your context, taking the service
host down with it. At this point, the worker role instance is sitting
there, sleeping, while the service host is running, listening for and
responding to messages.
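Based on the description above, the body of StartStringReversalService looks roughly like the following sketch. The address format and the use of NetTcpBinding are assumptions consistent with the text, not the verbatim listing:

```csharp
private void StartStringReversalService()
{
    // Tell the host which service implementation to run.
    var host = new ServiceHost(typeof(ReverseStringTools));

    // Look up the endpoint we declared, by name, to get the
    // IP address and port assigned at runtime.
    var externalEndpoint =
        RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["StringReverserService"];

    // A plain TCP binding with security turned off (debug/demo only).
    var binding = new NetTcpBinding(SecurityMode.None);

    // The ABCs of WCF: address, binding, and contract.
    host.AddServiceEndpoint(
        typeof(IReverseString),
        binding,
        String.Format("net.tcp://{0}/StringReverserService",
            externalEndpoint.IPEndpoint));

    // Plug everything in and start listening for messages.
    host.Open();

    // Sleep forever so the method never returns and the host stays up.
    while (true)
    {
        Thread.Sleep(TimeSpan.FromSeconds(30));
    }
}
```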
We wired up a simple WPF test client, as shown in figure 3, to see if our service is working. There are several ways you could write this test harness. If you’re using .NET 4, it’s
very common to use unit tests to test your services instead of an
interactive WPF client. Your other option would be to use
WCFTestClient.exe, which comes with Visual Studio.
Exposing public service
endpoints is useful, but there are times when you’ll want to expose
services for just your use, and you don’t want them made public. In this
case, you’ll want to use inter-role communication, which we’ll look at
next.
3. Inter-role communication
Exposing service input
endpoints, as we just discussed, can be useful. But many times, you just
need a way to communicate between your role instances. Usually you
could use a queue, but at times there might be a need for direct
communication, either for performance reasons or because the process is
synchronous in nature.
You can enable
communication directly from one role instance to another, but there are
some issues you should be aware of first. The biggest issue is that
you’ll have direct access to an individual role instance, which means
there’s no separation that can deal with load balancing. Similarly, if
you’re communicating with an instance and it goes down, your work is
lost. You’ll have to write code to handle this possibility on the client
side.
To set up inter-role communication, you need to add an internal endpoint
in the same way you add an input endpoint, but in this case you’ll set
the type to Internal (instead of Input), as shown in figure 4. The port will automatically be set to dynamic and will be managed for you under the covers by Azure.
Using an internal endpoint is a
lot like using an external endpoint, from the point of view of your
service. Either way, your service doesn’t know about any other instances
running the service in parallel. The load balancing is handled outside
of your code when you’re using an external endpoint, and internal
endpoints don’t have any available load balancing. This places the
choice of which service instance to consume on the shoulders of the
service consumer itself.
Most
of the work involved with internal endpoints is handled on the client
side, your service consumer. Because there can be a varying number of
instances of your service running at any time, you have to be prepared
to decide which instance to talk to, if not all of them. You also have
to be wily enough to not call yourself if calling the service from a
sibling worker role instance.
You can access the set of instances running, and their exposed internal endpoints, with the RoleEnvironment static class:
foreach (var instance in RoleEnvironment.CurrentRoleInstance.Role.Instances)
{
    // Compare instance IDs so we don't send a message to ourselves.
    if (instance.Id != RoleEnvironment.CurrentRoleInstance.Id)
    {
        SendMessage(instance.InstanceEndpoints["MyServiceEndpointName"]);
    }
}
The preceding sample code
loops through all of the available instances of the current role.
Because that collection includes the instance the code is running in,
the code checks each instance to see whether it is itself. If it
isn’t, the code sends that instance a message. If it is, the code
skips it, because sending a message to
oneself is usually not productive.
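The SendMessage helper used in that loop isn’t shown in the text. A minimal sketch using a WCF ChannelFactory might look like this; reusing the IReverseString contract and its ReverseString operation here is an assumption for illustration:

```csharp
private void SendMessage(RoleInstanceEndpoint endpoint)
{
    // Internal endpoints aren't load balanced, so we address
    // one specific instance directly by its runtime IP and port.
    var binding = new NetTcpBinding(SecurityMode.None);
    var address = new EndpointAddress(
        String.Format("net.tcp://{0}", endpoint.IPEndpoint));

    var factory = new ChannelFactory<IReverseString>(binding, address);
    IReverseString proxy = factory.CreateChannel();
    try
    {
        proxy.ReverseString("hello from a sibling instance");
        ((IClientChannel)proxy).Close();
    }
    catch (CommunicationException)
    {
        // The target instance may have gone down; with no load
        // balancer in the middle, the caller must handle this itself.
        ((IClientChannel)proxy).Abort();
    }
}
```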
All three ways of
communicating with a worker role have their advantages and
disadvantages, and each has a role to play in your architecture:
Use a queue for complete separation of your instances from the service consumers.
Use input endpoints to expose your service publicly and leverage the Azure load balancer.
Use internal endpoints for direct and synchronous communication with a specific instance of your service.
Now that we’ve covered how you
can communicate with a worker role, we should probably talk about what
you’re likely to want to do with a worker role.