For each clustered cache you define in your application, you have to make an important choice: which cache topology to use.
There are two base cache topologies in Coherence: replicated and partitioned.
They are implemented by two different clustered cache services, the
Replicated Cache service and the Partitioned Cache service,
respectively.
Distributed or partitioned?
You will notice that the partitioned cache is often referred to as a distributed cache as well, especially in the API documentation and configuration elements.
This is somewhat of a
misnomer, as both replicated and partitioned caches are distributed
across many nodes, but unfortunately the API and configuration element
names have to remain the way they are for compatibility reasons.
When a Coherence node
starts and joins the cluster using the default configuration, it
automatically starts both of these services, among others:
Services
  (
  TcpRing{...}
  ClusterService{...}
  InvocationService{Name=Management, ...}
  DistributedCache{Name=DistributedCache, ...}
  ReplicatedCache{Name=ReplicatedCache, ...}
  Optimistic{Name=OptimisticCache, ...}
  InvocationService{Name=InvocationService, ...}
  )
While both replicated and partitioned caches look the same to the client code (remember, you use the same Map-based
API to access both of them), they have very different performance,
scalability, and throughput characteristics. These characteristics
depend on many factors, such as the data set size, data access patterns,
the number of cache nodes, and so on. It is important to take all of
these factors into account when deciding the cache topology to use for a
particular cache.
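To make the point about the uniform API concrete, here is a minimal sketch using a hypothetical Products cache (the cache name, class name, and sample entry are purely illustrative). Notice that nothing in the code reveals whether the cache is replicated or partitioned; that decision lives entirely in the cache configuration:

import java.util.Map;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class TopologyAgnosticClient {
    public static void main(String[] args) {
        // The same call works whether "Products" is mapped to a replicated
        // or a partitioned scheme in the cache configuration file.
        NamedCache products = CacheFactory.getCache("Products");

        // NamedCache extends java.util.Map, so standard Map operations apply.
        Map cache = products;
        cache.put("COH", "Oracle Coherence");
        System.out.println(cache.get("COH"));

        CacheFactory.shutdown();
    }
}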
In the following sections, we
will cover both cache topologies in more detail and provide some
guidelines that will help you choose the appropriate one for your
caches.
Optimistic Cache service
You might also notice the
Optimistic Cache service in the previous output, which is another cache
service type that Coherence supports.
The Optimistic Cache service
is very similar to the Replicated Cache service, except that it doesn't
provide any concurrency control. It is rarely used in practice, so we
will not discuss it separately.
Replicated Cache service
The most important
characteristic of the replicated cache is that each cache item is
replicated to all the nodes in the grid. That means that every node in
the grid that is running the Replicated Cache service has the full
dataset within a backing map for that cache.
For example, if we configure a Countries
cache to use the Replicated Cache service and insert several objects
into it, the data within the grid would look like the following:
As you can see, the backing map for the Countries cache on each node has all the elements we have inserted.
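A short sketch of the code that would produce such a picture, assuming the Countries cache has been mapped to a replicated scheme in the cache configuration file (the country codes and names are just sample data):

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class PopulateCountries {
    public static void main(String[] args) {
        // Assumes the cache configuration maps "Countries" to a replicated scheme,
        // so every put is propagated to all nodes running the Replicated Cache service.
        NamedCache countries = CacheFactory.getCache("Countries");

        countries.put("SRB", "Serbia");
        countries.put("USA", "United States");
        countries.put("GBR", "United Kingdom");
        countries.put("JPN", "Japan");

        // The backing map for the Countries cache on every node now holds all four entries.
        System.out.println("Cache size: " + countries.size());

        CacheFactory.shutdown();
    }
}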
This has significant
implications for how and when you can use a replicated topology. In order
to understand these implications better, we will analyze the replicated
cache topology against four different criteria:
Read performance
Write performance
Data set size
Fault tolerance
Read performance
Replicated caches
have excellent, zero-latency read performance because all the data is
local to each node, which means that an application running on that node
can get data from the cache at in-memory speed.
This makes replicated
caches well suited for read-intensive applications that require minimal
latency, and it is the biggest reason you would consider using a
replicated cache.
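As a rough illustration (a sketch only, not a rigorous benchmark), repeated reads on a node running the Replicated Cache service are served from the local backing map, so a simple timing loop like the following never leaves the node once the entry has been replicated to it:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class ReplicatedReadTimer {
    public static void main(String[] args) {
        NamedCache countries = CacheFactory.getCache("Countries");
        countries.put("SRB", "Serbia");

        // Every get below is answered from the local backing map,
        // so there is no network hop on the read path.
        long start = System.nanoTime();
        for (int i = 0; i < 100000; i++) {
            countries.get("SRB");
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("Average read time: " + (elapsed / 100000) + " ns");

        CacheFactory.shutdown();
    }
}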
One important thing to
note is that the locality of the data does not imply that the objects
stored in a replicated cache are also in a ready-to-use, deserialized
form. A replicated cache deserializes objects on demand. When you put
the object into the cache, it will be serialized and sent to all the
other nodes running the Replicated Cache service. The receiving nodes,
however, will not deserialize the received object until it is requested,
which means that you might incur a slight performance penalty when
accessing an object in a replicated cache the first time. Once
deserialized on any given node, the object will remain that way until an
updated serialized version is received from another node.
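One practical consequence is that anything you put into a replicated cache must be serializable so that it can be shipped to the other nodes. Here is a minimal sketch of a value class for the Countries cache; the class itself is illustrative and uses plain java.io.Serializable, although Coherence also supports more efficient serialization mechanisms:

import java.io.Serializable;

// An illustrative value class for the Countries cache. Because a put against a
// replicated cache serializes the value and sends it to every node running the
// Replicated Cache service, the class must be serializable in some form;
// standard Java serialization is the simplest option.
public class Country implements Serializable {
    private final String code;
    private final String name;

    public Country(String code, String name) {
        this.code = code;
        this.name = name;
    }

    public String getCode() { return code; }
    public String getName() { return name; }
}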
Write performance
In order to perform write operations, such as put,
against the cache, the Replicated Cache service needs to distribute the
operation to all the nodes in the cluster and receive confirmation from
them that the operation was completed successfully. This increases both
the amount of network traffic and the latency of write operations
against the replicated cache. The write operation is illustrated in the
following diagram:
To make things even worse,
the performance of write operations against a replicated cache tends to
degrade as the size of the cluster grows, as there are more nodes to
synchronize.
All this makes a replicated cache poorly suited for write-intensive applications.
Data set size
The fact that each node
holds all the data implies that the total size of all replicated caches
is limited by the amount of memory available to a single node. Of
course, if the nodes within the cluster are not identical, this becomes
even more restrictive: the limit becomes the amount of memory
available to the smallest node in the cluster, which is why it is good practice to configure all the nodes in the cluster identically.
There are ways to increase the
capacity of a replicated cache by using one of the backing map
implementations that store all or some of the cache data outside of the
Java heap, but there is very little to be gained by doing so. As
soon as you move the replicated cache data out of RAM, you sacrifice one
of the biggest advantages it provides: zero-latency read access.
This might not seem like a
significant limitation considering that today's commodity servers can be
equipped with up to 256 GB of RAM, and the amount will continue to
increase in the future. (Come to think of it, this is 26 thousand times
more than the first hard drive I had, and an unbelievable 5.6 million
times more than the 48 KB of RAM my good old ZX Spectrum had back in the
eighties. It is definitely possible to store a lot of data in memory
these days.)
However, there is a caveat: just
because you can have that much memory in a single physical box doesn't
mean that you can configure a single Coherence node to use all of it.
There is obviously some space that will be occupied by the OS, but the
biggest limitation comes from today's JVMs, and more specifically from the way
memory is managed within the JVM.
Coherence node size on modern JVMs
At the time of writing
(mid 2009), there are hard limitations on how big your Coherence nodes
can be; these are imposed by the underlying Java Virtual Machine (JVM).
The biggest problem is
the pauses that effectively freeze the JVM for a period
of time during garbage collection. The length of these pauses is directly
proportional to the size of the JVM heap, so the bigger the heap, the
longer it will take to reclaim the unused memory and the longer the node
will seem frozen.
Once the heap grows over a
certain size (2 GB at the moment for most JVMs), the garbage collection
pause can become too long to be tolerated by users of an application,
and possibly long enough that Coherence will assume the node is
unavailable and kick it out of the cluster. There are ways to increase
the amount of time Coherence waits for a node to respond before it
kicks it out. However, it is usually not a good idea to do so, as it
might increase the actual cluster response time to the client
application in situations where a node really does fail and
should be removed from the cluster as soon as possible, with its
responsibilities transferred to another node.
Because of this, the
recommended heap size for Coherence nodes is typically in the range of 1
to 2 GB, with 1 GB usually being the optimal size that ensures that
garbage collection pauses are short enough to be unnoticeable. This
severely limits the maximum size of the data set in a replicated cache.
Keeping the Coherence node size
in a 1 to 2 GB range will also allow you to better utilize the
processing power of your servers. As I mentioned earlier, Coherence can
perform certain operations in parallel across the nodes. In order to
fully utilize modern servers, which typically have multiple CPUs with
two or four cores each, you will want to have multiple Coherence
nodes on each physical server. There are no hard and fast rules here:
you will have to test different configurations to determine which one
works best for your application. The bottom line is that in any
scenario you will likely split your total available RAM across multiple
nodes on a single physical box.
Fault tolerance
Replicated caches are
very resilient to failure as there are essentially as many copies of the
data as there are nodes in the cluster. When a single node fails, all
that a Replicated Cache service needs to do in order to recover from the
failure is to redirect read operations to other nodes and to simply
ignore write operations sent to the failed node, as those same
operations are simultaneously performed on the remaining cluster nodes.
When a failed node recovers
or a new node joins the cluster, failback is equally simple: the new node
simply copies all the data from any other node in the cluster.
When to use it?
A replicated
cache is a good choice only for small-to-medium sized, read-only or
read-mostly data sets. However, there are certain features of a
partitioned cache that make most of the advantages of a replicated cache
somewhat irrelevant, as you'll see shortly.