Partitioned Cache service
Unlike a replicated cache service, which simply replicates all the data to all cluster nodes, a partitioned cache service uses a divide-and-conquer approach: it partitions the data set across all the nodes in the cluster, as shown in the following diagram:
In this scenario, Coherence
truly is reminiscent of a distributed hash map. Each node in the cluster
becomes responsible for a subset of cache partitions (buckets), and the
Partitioned Cache service uses an entry key to determine which
partition (bucket) to store the cache entry in.
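To put this in concrete terms, here is a minimal sketch of what working with a partitioned cache looks like from application code. The cache name and the sample key and value are made up for this example, and the choice of topology is made in the cache configuration file rather than in code:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class PartitionedCacheExample {
    public static void main(String[] args) {
        // join the cluster and obtain a cache managed by the
        // Partitioned Cache service (the cache-name-to-scheme mapping
        // lives in the cache configuration file)
        NamedCache accounts = CacheFactory.getCache("accounts");

        // the key determines which partition, and therefore which
        // node, the entry is stored on
        accounts.put(Long.valueOf(1234), "Savings account");

        // the read is routed directly to the node that owns the key
        System.out.println(accounts.get(Long.valueOf(1234)));

        CacheFactory.shutdown();
    }
}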
Let's evaluate the Partitioned Cache service using the same four criteria we used for the Replicated Cache service.
Read performance
Because the data
set is partitioned across the nodes, it is very likely that the reads
coming from any single node will require an additional network call. As a matter of fact, it is easy to see that in the general case, for a cluster of N nodes, (N-1)/N of all read operations will require a network call to another node.
This is depicted in the
preceding diagram, where three out of the four requests were processed
by a node different from the one issuing the request.
However, it is important to note
that if the requested piece of data is not managed locally, it will
always take only one additional network call to get it, because the
Partitioned Cache service is able to determine which node owns a piece
of data based on the requested key. This allows Coherence to scale
extremely well in a switched network environment, as it utilizes direct
point-to-point communication between the nodes.
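To make the mechanism easier to visualize, the following is a simplified sketch of key-based ownership. It is not Coherence's actual implementation (the partition count, hashing, and assignment strategy shown here are illustrative assumptions), but it captures the two-step lookup that lets any node find a key's owner directly:

import java.util.Map;

// Simplified illustration only: hash the key to a partition, then look
// up which member owns that partition.
public class PartitionLookupSketch {
    private final int partitionCount;           // illustrative; configurable in Coherence
    private final Map<Integer, String> owners;  // partition id -> owning member

    public PartitionLookupSketch(int partitionCount, Map<Integer, String> owners) {
        this.partitionCount = partitionCount;
        this.owners = owners;
    }

    // every node can compute a key's partition locally...
    public int partitionFor(Object key) {
        return Math.abs(key.hashCode() % partitionCount);
    }

    // ...and therefore knows exactly which member to ask, so a read
    // that misses locally costs at most one extra network call
    public String ownerOf(Object key) {
        return owners.get(partitionFor(key));
    }
}

Because every node can perform this lookup on its own, there is no central directory to consult, which is what makes the single additional network hop possible.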
Another thing to consider
is that the objects in a partitioned cache are always stored in a
serialized binary form. This means that every read request will have to
deserialize the object, introducing additional latency.
The fact that there is always
at most one network call to retrieve the data ensures that reads from a
partitioned cache execute in constant time. However, because of that
additional network call and deserialization, this is still an order of
magnitude slower than a read from a replicated cache.
Write performance
In the simplest case,
partitioned cache write performance is pretty much the same as its read
performance. Write operations will also require network access in the
vast majority of cases, and they will use point-to-point communication
to accomplish the goal in a single network call, as shown in the following diagram:
However, this is not the whole story, which is why I said "in the simplest case".
One thing you are probably asking yourself by now is, "But what happens if a node fails?" Rest assured, partitioned caches can be fully fault tolerant, and we will get into the details of that in the section on fault tolerance. For now,
let's fix the preceding diagram to show what partitioned cache writes
usually look like.
As you can see from the
diagram, in addition to the backing map that holds the live cache data,
the Partitioned Cache service also manages another map on each node: the backup storage.
Backup storage is used to
store backup copies of cache entries, in order to ensure that no data
is lost in the case of a node failure. Coherence ensures that backup
copies of primary entries from a particular node are stored not only on a
different node, but also on a different physical server if possible.
This is necessary to ensure that data is safe even in the case of a hardware failure, when all the nodes on that physical machine would fail simultaneously.
You can configure the number
of backup copies Coherence should create. The default setting of one
backup copy, as the preceding diagram shows, is the most frequently used
configuration. However, if the objects can be easily restored from a
persistent data store, you might choose to set the number of backup
copies to zero, which will have a positive impact on the overall memory
usage and the number of objects that can be stored in the cluster.
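As a sketch of what this configuration looks like, the number of backups is specified through the backup-count element of the distributed cache scheme in the cache configuration file; the cache and scheme names below are made up for this example:

<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>accounts</cache-name>
      <scheme-name>example-distributed</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>

  <caching-schemes>
    <distributed-scheme>
      <scheme-name>example-distributed</scheme-name>
      <service-name>DistributedCache</service-name>
      <!-- one backup copy is the default; set this to 0 if the data
           can easily be restored from a persistent store -->
      <backup-count>1</backup-count>
      <backing-map-scheme>
        <local-scheme/>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>
  </caching-schemes>
</cache-config>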
You can also increase the
number of backup copies to more than one, but this is not recommended by
Oracle and is rarely done in practice. The reason for this is that additional backups only guard against a very unlikely scenario: two or more physical machines failing at exactly the same time.
The chances are that you will
either lose a single physical machine, in the case of hardware failure,
or a whole cluster within a data center, in the case of catastrophic
failure. A single backup copy, on a different physical box, is all you
need in the former case, while in the latter no number of backup copies will be enough: you will need a much broader disaster recovery solution, such as cluster-to-cluster replication across multiple data centers.
If you use backups, that
means that each partitioned cache write will actually require at least
two network calls: one to write the object into the backing map of the
primary storage node and one or more to write it into the backup storage
of each backup node.
This makes partitioned
cache writes somewhat slower than reads, but they still execute in
constant time and are significantly faster than replicated cache writes.
Data set size
Because each node stores
only a small (1/N) portion of the data set, the size of the data set is
limited only by the total amount of space that is available to all the
nodes in the cluster. This allows you to manage very large data sets in
memory, and to scale the cluster to handle growing data sets by simply
adding more nodes. Support for very large in-memory data sets (potentially terabytes in Coherence 3.5) is one of the biggest advantages of a partitioned cache over a replicated one, and is often the main reason to choose it.
That said, it is important to
realize that the actual amount of data you can store is significantly
lower than the total amount of RAM in the cluster. Some of the reasons
for this are obvious: if your data set is 1 GB and you have one backup
copy for each object, you need at least 2 GB of RAM. However, there is
more to it than that.
For one, your operating
system and Java runtime will use some memory. How much exactly varies
widely across operating systems and JVM implementations, but it won't be
zero in any case. Second, the cache indexes you create will need some
memory. Depending on the number of indexes and the size of the indexed properties and corresponding cache keys, this might add up to a significant amount of memory. Finally, you need to leave enough free space for the execution of both Coherence code and your own code within each JVM, or frequent full garbage collections will likely bring everything to a standstill; worse yet, you might run out of memory and bring the whole cluster down: when one node fails in a low-memory situation, it will likely have a domino effect on the other nodes as they try to accommodate more data than they can handle.
Because of this, it is
important that you size the cluster properly and use cache expiration
and eviction policies to control the amount of data in the cache.
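As one example, and assuming the cache's backing map is configured to support per-entry expiry, you can store entries with a time-to-live so that stale data is evicted automatically; the cache name, key, value, and TTL below are made up for this example:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class ExpiryExample {
    public static void main(String[] args) {
        NamedCache quotes = CacheFactory.getCache("quotes");

        // store the entry with a one-minute time-to-live; once it
        // expires it will be evicted, freeing memory in the cluster
        quotes.put("ORCL", "21.05", 60 * 1000L);

        CacheFactory.shutdown();
    }
}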
Fault tolerance
As we discussed in the
section on write performance, a Partitioned Cache service allows you to
keep one or more backups of cache data in order to prevent data loss in
the case of a node failure.
When a node fails,
the Partitioned Cache service will notify all other nodes to promote
backup copies of the data that the failed node had primary
responsibility for, and to create new backup copies on different nodes.
When the failed node
recovers, or a new node joins the cluster, the Partitioned Cache service
will fail back some of the data to it by repartitioning the cluster and
asking all of the existing members to move some of their data to the
new node.
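If you would like to observe this failover and failback activity in your own cluster, one option is to register a MemberListener with the cache service. The following is a rough sketch that simply logs membership changes; the cache name is an assumption for this example:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.MemberEvent;
import com.tangosol.net.MemberListener;
import com.tangosol.net.NamedCache;

public class MembershipMonitor {
    public static void main(String[] args) {
        NamedCache cache = CacheFactory.getCache("accounts");

        // departures trigger promotion of backup copies, while joins
        // trigger repartitioning and failback of data to the new node
        cache.getCacheService().addMemberListener(new MemberListener() {
            public void memberJoined(MemberEvent e) {
                System.out.println("Member joined: " + e.getMember());
            }
            public void memberLeaving(MemberEvent e) {
                System.out.println("Member leaving: " + e.getMember());
            }
            public void memberLeft(MemberEvent e) {
                System.out.println("Member left: " + e.getMember());
            }
        });
    }
}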
When to use it?
It should be obvious by now
that the partitioned cache should be your topology of choice for large,
growing data sets, and write-intensive applications.
However,
as I mentioned earlier, there are several Coherence features that are
built on top of partitioned cache that make it preferable for many
read-intensive applications as well. We will discuss one of these
features in detail next and briefly touch upon the second one.