Oracle Coherence 3.5 : Planning Your Caches - Backing maps

9/27/2012 2:05:03 AM
Now that we have covered various cache topologies, it is time to complete the puzzle by learning more about backing maps.

The backing map is where cache data within the cluster is actually stored, so it is a very important piece within the Coherence architecture. So far we have assumed that a backing map stores all the data in memory, which will be the case for many applications as it provides by far the best performance. However, there are situations where storing all the data in memory is either impossible, because the data set is simply too big, or impractical, because a large part of the data is rarely accessed and there is no need to have the fastest possible access to it at all times.

One of the nicest things about Coherence is that it is extremely flexible and configurable. You can combine different pieces that are available out of the box, such as caching services and backing maps, in many different ways to solve many difficult distributed data management problems. If nothing fits the bill, you can also write your own, custom implementations of various components, including backing maps.

However, there is rarely a need to do so, as Coherence ships with a number of useful backing map implementations that can be used to store data both on-heap as well as off-heap. We will discuss all of them in the following sections so you can make an informed decision when configuring backing maps for your caches.

Local cache

The local cache is commonly used both as the backing map for replicated and partitioned caches and as a front cache for near and continuous query caches. It stores all the data on the heap, which means that it provides by far the fastest access speed, both read and write, compared to other backing map implementations.

The local cache can be size-limited to a specific number of entries, and will automatically prune itself when the limit is reached, based on the specified eviction policy: LRU (Least Recently Used), LFU (Least Frequently Used), or HYBRID, which is the default and uses a combination of LRU and LFU to determine which items to evict. Of course, if none of these built-in eviction policies works for you, Coherence allows you to implement and use a custom one.

You can also configure the local cache to expire cache items based on their age, which is especially useful when it is used as a front cache for a near cache with invalidation strategy set to none.

Finally, the local cache implements full Map, CacheMap, and ObservableMap APIs, which means that you can use it within your application as a more powerful HashMap, which in addition to the standard Map operations supports item expiration and event notifications, while providing fully thread-safe, highly concurrent access to its contents.

External backing map

The external backing map allows you to store cache items off-heap, thus allowing far greater storage capacity, at the cost of somewhat-to-significantly worse performance.

There are several pluggable storage strategies that you can use with an external backing map, which allow you to configure where and how the data will be stored. These strategies are implemented as storage managers:

  • NIO Memory Manager: This uses an off-heap NIO buffer for storage. This means that it is not affected by the garbage collection times, which makes it a good choice for situations where you want to have fast access to data, but don't want to increase the size or the number of Coherence JVMs on the server.

  • NIO File Manager: This uses NIO memory-mapped files for data storage. This option is generally not recommended as its performance can vary widely depending on the OS and JVM used. If you plan to use it, make sure that you run some performance tests to make sure it will work well in your environment.

  • Berkeley DB Store Manager: This uses Berkeley DB Java Edition embedded database to implement on-disk storage of cache items.

In addition to these concrete storage manager implementations, Coherence ships with a wrapper storage manager that allows you to make write operations asynchronous for any of the store managers listed earlier. You can also create and use your own custom storage manager by creating a class that implements the interface.

Just like the local cache, the external backing map can be size limited and can be configured to expire cache items based on their age. However, keep in mind that the eviction of cache items from disk-based caches can be very expensive. If you need to use it, you should seriously consider using the paged external backing map instead.

Paged external backing map

The paged external backing map is very similar to the external backing map described previously. They both support the same set of storage managers, so your storage options are exactly the same. The big difference between the two is that a paged external backing map uses paging to optimize LRU eviction.

Basically, instead of storing cache items in a single large file, a paged backing map breaks it up into a series of pages. Each page is a separate store, created by the specified store manager. The page that was last created is considered current and all write operations are performed against that page until a new one is created.

You can configure both how many pages of data should be stored and the amount of time between page creations. The combination of these two parameters determines how long the data is going to be kept in the cache. For example, if you wanted to cache data for an hour, you could configure the paged backing map to use six pages and to create a new page every ten minutes, or to use four pages with new one being created every fifteen minutes.

Once the page count limit is reached, the items in the oldest page are considered expired and are evicted from the cache, one page at a time. This is significantly more efficient than the individual delete operations against the disk-based cache, as in the case of a regular external backing map.

Overflow backing map

The overflow backing map is a composite backing map with two tiers: a fast, size-limited, in-memory front tier, and a slower, but potentially much larger back tier on a disk. At first sight, this seems to be a perfect way to improve read performance for the most recently used data while allowing you to store much larger data sets than could possibly fit in memory.

However, using an overflow backing map in such a way is not recommended. The problem is that the access to a disk-based back tier is much slower than the access to in-memory data. While this might not be significant when accessing individual items, it can have a huge negative impact on operations that work with large chunks of data, such as cluster repartitioning.

A Coherence cluster can be sized dynamically, by simply adding nodes or removing nodes from it. The whole process is completely transparent and it does not require any changes in configuration-you simply start new nodes or shut down existing ones. Coherence will automatically rebalance the cluster to ensure that each node handles approximately the same amount of data.

When that happens, whole partitions need to be moved from one node to another, which can be quite slow when disk-based caches are used, as is almost always the case with the overflow backing map. During the repartitioning requests targeted to partitions that are being moved are blocked, and the whole cluster might seem a bit sluggish, or in the worst case scenario completely stalled, so it is important to keep repartitioning as short as possible.

Because of this, it is recommended that you use an overflow map only when you are certain that most of the data will always fit in memory, but need to guard against occasional situations where some data might need to be moved to disk because of a temporary memory shortage. Basically, the overflow backing map can be used as a substitute for an eviction policy in these situations.

If, on the other hand, you need to support much larger data sets than could possibly fit in memory, you should use a read-write backing map instead.

Read-write backing map

The read-write backing map is another composite backing map implementation. However, unlike the overflow backing map, which has a two-tiered cache structure, the read-write backing map has a single internal cache (usually a local cache) and either a cache loader, which allows it to load data from the external data source on cache misses, or a cache store, which also provides the ability to update data in the external data store on cache puts.

As such, the read-write backing map is a key enabler of the read-through/ write-through architecture that places Coherence as an intermediate layer between the application and the data store, and allows for a complete decoupling between the two. It is also a great solution for situations where not all the data fits in memory, as it does not have the same limitations as overflow backing map does.

The key to making a read-write backing map work is to use it in front of a shared data store that can be accessed from all the nodes. The most obvious and commonly used data source that fits that description is a relational database, so Coherence provides several cache loader and cache store implementations for relational database access out of the box, such as JPA, Oracle TopLink, and Hibernate.

However, a read-write backing map is not limited to a relational database as a backend by any means. It can also be used in front of web services, mainframe applications, a clustered file system, or any other shared data source that can be accessed from Java.

Partitioned backing map

By default, a single backing map is used to store the entries from all cache partitions on a single node. This imposes certain restrictions on how much data each node can store.

For example, if you choose to use local cache as a backing map, the size of the backing map on a single node will be limited by the node's heap size. In most cases, this configuration will allow you to store up to 300-500 MB per node, because of the heap size limitations discussed earlier.

However, even if you decide to use NIO buffer for off-heap storage, you will be limited by the maximum direct buffer size Java can allocate, which is 2 GB (as a 32-bit integer is used to specify the size of the buffer). While you can still scale the size of the cache by adding more nodes, there is a practical limit to how far you can go. The number of CPUs on each physical machine will determine the upper limit for the number on nodes you can run, so you will likely need 100 or more physical boxes and 500 Coherence nodes in order to store 1 TB of data in a single cache.

In order to solve this problem and support in-memory caches in the terabytes range without increasing the node count unnecessarily, the partitioned backing map was introduced in Coherence 3.5.

The partitioned backing map contains one backing map instance for each cache partition, which allows you to scale the cache size by simply increasing the number of partitions for a given cache. Even though the theoretical limit is now 2 GB per partition, you have to keep in mind that a partition is a unit of transfer during cache rebalancing, so you will want to keep it significantly smaller (the officially recommended size for a single partition is 50 MB).

That means that you need to divide the expected cache size by 50 MB to determine the number of partitions. For example, if you need to store 1 TB of data in the cache, you will need at least 20,972 partitions. However, because the number of partitions should always be a prime number, you should set it to the next higher prime number, which in this case is 20,981 (you can find the list of first 10,000 prime numbers, from 1 to 104,729, at

The important thing to keep in mind is that the number of partitions has nothing to do with the number of nodes in the cluster-you can spread these 20 thousand plus partitions across 10 or 100 nodes, and 5 or 50 physical machines. You can even put them all on a single box, which will likely be the case during testing.

By making cache size dependent solely on the number of partitions, you can fully utilize the available RAM on each physical box and reach 1 TB cache size with a significantly smaller number of physical machines and Coherence nodes. For example, 50 nodes, running on 10 physical machines with 128 GB of RAM each, could easily provide you with 1 TB of in-memory storage.

Partitioned backing map and garbage collection

I mentioned earlier that you need to keep heap size for each Coherence JVM in the 1-2 GB range, in order to avoid long GC pauses.

However, this is typically not an issue with a partitioned backing map, because it is usually used in combination with off-heap, NIO buffer-based storage.

  •  Booting on HP 9000 Servers (part 4) - HPUX Secondary System Loader (hpux)
  •  Booting on HP 9000 Servers (part 3) - PDC Commands, Initial System Loader
  •  Booting on HP 9000 Servers (part 2) - The setboot Command, Boot Console Handler (BCH) and Processor Dependent Code (PDC)
  •  Booting on HP 9000 Servers (part 1) - Boot Process Overview, The BCH Commands Including PathFlags on PA-RISC
  •  Developing the SAP Data Center : Rack Planning for Data Center Resources
  •  Engaging the SAP Solution Stack Vendors : Selecting Your SAP Solution Stack Partners, Executing the SAP Infrastructure Planning Sessions
  •  Visual Studio Team System 2008 : Command Line (part 2)
  •  Visual Studio Team System 2008 : Command Line (part 1)
  •  System Center Configuration Manager 2007 : Configuration Manager Network Communications (part 4) - Site-to-Site Communications - Tuning Status Message Replication
  •  System Center Configuration Manager 2007 : Configuration Manager Network Communications (part 3) - Site-to-Site Communications - Configuring Senders, Configuring Sender Addresses
    PS4 game trailer XBox One game trailer
    WiiU game trailer 3ds game trailer
    Top 10 Video Game
    -   Minecraft Mods - MAD PACK #10 'NETHER DOOM!' with Vikkstar & Pete (Minecraft Mod - Mad Pack 2)
    -   Minecraft Mods - MAD PACK #9 'KING SLIME!' with Vikkstar & Pete (Minecraft Mod - Mad Pack 2)
    -   Minecraft Mods - MAD PACK #2 'LAVA LOBBERS!' with Vikkstar & Pete (Minecraft Mod - Mad Pack 2)
    -   Minecraft Mods - MAD PACK #3 'OBSIDIAN LONGSWORD!' with Vikkstar & Pete (Minecraft Mod - Mad Pack 2)
    -   Total War: Warhammer [PC] Demigryph Trailer
    -   Minecraft | MINIONS MOVIE MOD! (Despicable Me, Minions Movie)
    -   Minecraft | Crazy Craft 3.0 - Ep 3! "TITANS ATTACK"
    -   Minecraft | Crazy Craft 3.0 - Ep 2! "THIEVING FROM THE CRAZIES"
    -   Minecraft | MORPH HIDE AND SEEK - Minions Despicable Me Mod
    -   Minecraft | Dream Craft - Star Wars Modded Survival Ep 92 "IS JOE DEAD?!"
    -   Minecraft | Dream Craft - Star Wars Modded Survival Ep 93 "JEDI STRIKE BACK"
    -   Minecraft | Dream Craft - Star Wars Modded Survival Ep 94 "TATOOINE PLANET DESTRUCTION"
    -   Minecraft | Dream Craft - Star Wars Modded Survival Ep 95 "TATOOINE CAPTIVES"
    -   Hitman [PS4/XOne/PC] Alpha Gameplay Trailer
    -   Satellite Reign [PC] Release Date Trailer
    Game of War | Kate Upton Commercial