What you will realize when
you start identifying entities within your application and planning
caches for them is that many of those entities have similar data access
patterns. Some will fall into the read-only or read-mostly category,
such as reference data, and you can decide to use either a replicated cache
or a partitioned cache fronted by a near or continuous query cache for
them.
Others will fall into a
transactional data category, which will tend to grow in size over time
and will have a mixture of reads and writes. These will typically be
some of the most important entities that the application manages, such
as invoices, orders, and customers. You will use partitioned caches to
manage those entities, possibly fronted with a near cache depending on
the ratio of reads to writes.
You might also have some
write-mostly data, such as audit trail or log entries, but the point is
that in the end, when you finish analyzing your application's data
model, you will likely end up with a handful of entity categories, even
though you might have many entity types.
If you had to configure a
cache for each of these entity types separately, you would end up with a
lot of repeated configuration, which would be very cumbersome to
maintain if you needed to change something for each entity that belongs
to a particular category.
This is why the Coherence cache configuration has two parts: caching schemes and cache mappings.
The former allows you to define a single configuration template for all
entities in a particular category, while the latter enables you to map
specific named caches to a particular caching scheme.
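To make the structure concrete, here is a bare-bones outline of a cache configuration file; the cache and scheme names are placeholders, and a complete working sample is shown at the end of this section:
<!-- Outline only: cache names are mapped to schemes in the first section,
     while the schemes themselves are defined in the second -->
<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>countries</cache-name>
      <scheme-name>reference-scheme</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <replicated-scheme>
      <scheme-name>reference-scheme</scheme-name>
      <backing-map-scheme>
        <local-scheme/>
      </backing-map-scheme>
      <autostart>true</autostart>
    </replicated-scheme>
  </caching-schemes>
</cache-config>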
Caching schemes
Caching schemes are used to
define cache topology, as well as other cache configuration parameters,
such as which backing map to use, how to limit cache size and expire
cache items, where to store backup copies of the data, and in the case
of a read-write backing map, even how to load data into the cache from
the persistent store and how to write it back into the store.
The Coherence Developer's Guide is your best resource on the options available for cache configuration using any of the available schemes, and you should now have enough background information to understand the configuration options available for each of them. I strongly encourage you to review the Appendix D: Cache Configuration Elements section in the Developer's Guide for more information about all the configuration parameters for the particular cache topology and backing map you are interested in.
Distributed cache scheme
Let's look at an example of a caching scheme definition under a microscope, to get a better understanding of what it is made of.
<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <service-name>DistributedCache</service-name>
  <backing-map-scheme>
    <local-scheme>
      <scheme-ref>example-binary-backing-map</scheme-ref>
    </local-scheme>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
The top-level element, distributed-scheme, tells us that any cache that uses this scheme will use a partitioned topology (this is one of those unfortunate instances where distributed is used instead of partitioned for backwards compatibility reasons).
The scheme-name
element allows us to specify a name for the caching scheme, which we can
later use within cache mappings and when referencing a caching scheme
from another caching scheme, as we'll do shortly.
The service-name
element is used to specify the name of the cache service that all caches
using this particular scheme will belong to. While the service type is
determined by the root element of the scheme definition, the service
name can be any name that is meaningful to you. However, there are two
things you should keep in mind when choosing a service name for a
scheme:
First, Coherence provides a way to ensure that related objects from different caches are stored on the same node, which can be very beneficial from a performance standpoint. For example, you might want to ensure that an account and all transactions for that account are collocated. However, this only works for caches that belong to the same cache service, so such caches need to share a service name.
Second, all caches that belong to the same cache service share a pool of threads. In order to avoid deadlocks, Coherence prohibits re-entrant calls from code executing on a cache service thread into the same cache service. One way to work around this is to use separate cache services. For example, you might want to use separate services for reference and transactional caches, which will allow you to access reference data without any restrictions from code executing on a service thread of a transactional cache.
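To illustrate the second point, the following sketch shows two schemes that are identical except for their service names, so that reference and transactional caches end up on separate cache services; the scheme and service names are made up for the example:
<!-- Illustrative only: schemes that differ only in service name,
     so reference and transactional caches run on separate services -->
<distributed-scheme>
  <scheme-name>example-reference</scheme-name>
  <service-name>ReferenceCacheService</service-name>
  <backing-map-scheme>
    <local-scheme/>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>

<distributed-scheme>
  <scheme-name>example-transactional</scheme-name>
  <service-name>TransactionalCacheService</service-name>
  <backing-map-scheme>
    <local-scheme/>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>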
Going back to the example-distributed definition, the next element, backing-map-scheme, defines the type of the backing map we want all caches mapped to this caching scheme to use. In this example, we are telling Coherence to use a local cache as the backing map. Note that while many named caches can be
mapped to this particular caching scheme, each of them will have its own
instance of the local cache as a backing map. The configuration simply
tells the associated cache service which scheme to use as a template when creating a backing map instance for the cache.
The scheme-ref element within the local-scheme tells us that the configuration for the backing map should be loaded from another caching scheme definition, example-binary-backing-map.
This is a very useful feature, as it allows you to compose new schemes
from the existing ones, without having to repeat yourself.
Finally, the autostart
element determines if the cache service for the scheme will be started
automatically when the node starts. If it is omitted or set to false,
the service will start the first time any cache that belongs to it is
accessed. Normally, you will want all the services on your cache servers
to start automatically.
Local cache scheme
The scheme definition shown earlier references the local cache scheme named example-binary-backing-map as its backing map. Let's see what the referenced definition looks like:
<local-scheme>
  <scheme-name>example-binary-backing-map</scheme-name>
  <eviction-policy>HYBRID</eviction-policy>
  <high-units>{back-size-limit 0}</high-units>
  <unit-calculator>BINARY</unit-calculator>
  <expiry-delay>{back-expiry 1h}</expiry-delay>
  <flush-delay>1m</flush-delay>
  <cachestore-scheme></cachestore-scheme>
</local-scheme>
You can see that the local-scheme
allows you to configure various options for the local cache, such as
eviction policy, the maximum number of units to keep within the cache,
as well as expiry and flush delay.
The high-units and unit-calculator elements are used together to limit the size of the cache, as the meaning of the former is defined by the value of the latter. Coherence uses the unit calculator to determine the "size" of cache entries. There are two built-in unit calculators: FIXED and BINARY.
The difference between the two
is that the first one simply treats each cache entry as a single unit,
allowing you to limit the number of objects in the cache, while the
second one uses the size of a cache entry (in bytes) to represent the
number of units it consumes. While the latter gives you much better
control over the memory consumption of each cache, its use is
constrained by the fact that the entries need to be in a serialized
binary format, which means that it can only be used to limit the size of
a partitioned cache.
In other cases you will
either have to use the fixed calculator to limit the number of objects
in the cache, or write your own implementation that can determine the
appropriate unit count for your objects (if you decide to write a
calculator that attempts to determine the size of deserialized objects
on heap, you might want to consider using com.tangosol.net.cache.SimpleMemoryCalculator as a starting point).
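As an illustration of the first option, a local scheme limited to a fixed number of objects might look something like the following; the scheme name and limit are made up for the example:
<!-- Illustrative: limit a local cache to 10,000 entries regardless of their size,
     by treating each entry as a single unit (FIXED calculator) -->
<local-scheme>
  <scheme-name>example-object-limited</scheme-name>
  <eviction-policy>LRU</eviction-policy>
  <high-units>10000</high-units>
  <unit-calculator>FIXED</unit-calculator>
</local-scheme>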
One important thing to note in the example-binary-backing-map definition shown earlier is the use of macro parameters to define the size limit and expiration for the cache, such as {back-size-limit 0} and {back-expiry 1h}.
The first value within the curly braces is the name of the macro
parameter to use, while the second value is the default value that
should be used if the parameter with the specified name is not defined.
You will see how macro parameters and their values are defined shortly,
when we discuss cache mappings.
Near cache scheme
A near cache is a composite cache and requires us to define separate schemes for the front and back tiers. We could reuse both scheme definitions we have seen so far to
create a definition for a near cache:
<near-scheme>
  <scheme-name>example-near</scheme-name>
  <front-scheme>
    <local-scheme>
      <scheme-ref>example-binary-backing-map</scheme-ref>
    </local-scheme>
  </front-scheme>
  <back-scheme>
    <distributed-scheme>
      <scheme-ref>example-distributed</scheme-ref>
    </distributed-scheme>
  </back-scheme>
  <invalidation-strategy>present</invalidation-strategy>
  <autostart>true</autostart>
</near-scheme>
Unfortunately, the example-binary-backing-map scheme won't quite work as a front cache in the preceding definition: it uses the binary unit calculator, which cannot be used in the front tier of a near cache. In order to solve the problem, we can override the offending settings from the referenced scheme:
<front-scheme>
  <local-scheme>
    <scheme-ref>example-binary-backing-map</scheme-ref>
    <high-units>{front-size-limit 0}</high-units>
    <unit-calculator>FIXED</unit-calculator>
  </local-scheme>
</front-scheme>
However, in this case it would probably make more sense not to use a scheme reference for the front scheme definition at all, or to create and reference a separate local scheme for the front cache.
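A sketch of that second approach might look like this; the example-near-front scheme name is made up for the example, and it uses the same front-size-limit macro parameter that we will define shortly under cache mappings:
<!-- Illustrative: a dedicated front-tier scheme, defined once under caching-schemes... -->
<local-scheme>
  <scheme-name>example-near-front</scheme-name>
  <eviction-policy>HYBRID</eviction-policy>
  <high-units>{front-size-limit 0}</high-units>
  <unit-calculator>FIXED</unit-calculator>
</local-scheme>

<!-- ...and referenced from the near scheme's front tier -->
<front-scheme>
  <local-scheme>
    <scheme-ref>example-near-front</scheme-ref>
  </local-scheme>
</front-scheme>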
Read-write backing map scheme
Using a local cache as the
backing map is very convenient during development and testing, but more
likely than not you will want your data to be persisted as well. If
that's the case, you can configure a read-write backing map as a backing
map for your distributed cache:
<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <service-name>DistributedCache</service-name>
  <backing-map-scheme>
    <read-write-backing-map-scheme>
      <internal-cache-scheme>
        <local-scheme/>
      </internal-cache-scheme>
      <cachestore-scheme>
        <class-scheme>
          <class-name>com.tangosol.coherence.jpa.JpaCacheStore</class-name>
          <init-params>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>{cache-name}</param-value>
            </init-param>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>{class-name}</param-value>
            </init-param>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>PersistenceUnit</param-value>
            </init-param>
          </init-params>
        </class-scheme>
      </cachestore-scheme>
    </read-write-backing-map-scheme>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
The read-write
backing map defined previously uses an unlimited local cache to store the
data, and a JPA-compliant cache store implementation that will be used
to persist the data on cache puts, and to retrieve it from the database
on cache misses.
Partitioned backing map
As we discussed earlier, the partitioned backing map is your best option for very large caches. The following example demonstrates how you could configure a partitioned backing map that will allow you to store 1 TB of data in a 50-node cluster:
<distributed-scheme>
  <scheme-name>large-scheme</scheme-name>
  <service-name>LargeCacheService</service-name>
  <partition-count>20981</partition-count>
  <backing-map-scheme>
    <partitioned>true</partitioned>
    <external-scheme>
      <high-units>20</high-units>
      <unit-calculator>BINARY</unit-calculator>
      <unit-factor>1073741824</unit-factor>
      <nio-memory-manager>
        <initial-size>1MB</initial-size>
        <maximum-size>50MB</maximum-size>
      </nio-memory-manager>
    </external-scheme>
  </backing-map-scheme>
  <backup-storage>
    <type>off-heap</type>
    <initial-size>1MB</initial-size>
    <maximum-size>50MB</maximum-size>
  </backup-storage>
  <autostart>true</autostart>
</distributed-scheme>
We have configured partition-count to 20,981: dividing 1 TB by the target partition size of 50 MB gives roughly 20,972 partitions, which we round up to the prime number 20,981 (a prime partition count is recommended). This will allow us to store 1 TB of data in the cache while keeping the partition size down to approximately 50 MB.
We have then used the partitioned
element within the backing map scheme definition to let Coherence know
that it should use the partitioned backing map implementation instead of
the default one.
The external-scheme
element is used to configure the maximum size of the backing map as a
whole, as well as the storage for each partition. Each partition uses an
NIO buffer with the initial size of 1 MB and a maximum size of 50 MB.
The backing map as a whole is limited to 20 GB using a combination of high-units, unit-calculator, and unit-factor
values. Because we are storing serialized objects off-heap, we can use
binary calculator to limit cache size in bytes. However, the high-units setting is internally represented by a 32-bit integer, so the highest value we could specify for it would be 2 GB.
In order to allow for larger cache sizes while preserving backwards compatibility, Coherence engineers decided not to widen high-units to 64 bits. Instead, they introduced the unit-factor setting, which is nothing more than a multiplier for the high-units value. In the preceding example, the unit-factor is set to 1 GB (1,073,741,824 bytes), which in combination with the high-units setting of 20 limits the cache size per node to 20 GB.
Finally, when using a
partitioned backing map to support very large caches off-heap, we cannot
use the default, on-heap backup storage. The backup storage is always
managed per partition, so we had to configure it to use off-heap buffers
of the same size as primary storage buffers.
Partitioned read-write backing map
Finally, we can use a
partitioned read-write backing map to support automatic persistence for
very large caches. The following example is really just a combination of
the previous two examples, so I will not discuss the details. It is
also a good illustration of the flexibility Coherence provides when it
comes to cache configuration.
<distributed-scheme>
  <scheme-name>large-persistent-scheme</scheme-name>
  <service-name>LargePersistentCacheService</service-name>
  <partition-count>20981</partition-count>
  <backing-map-scheme>
    <partitioned>true</partitioned>
    <read-write-backing-map-scheme>
      <internal-cache-scheme>
        <external-scheme>
          <high-units>20</high-units>
          <unit-calculator>BINARY</unit-calculator>
          <unit-factor>1073741824</unit-factor>
          <nio-memory-manager>
            <initial-size>1MB</initial-size>
            <maximum-size>50MB</maximum-size>
          </nio-memory-manager>
        </external-scheme>
      </internal-cache-scheme>
      <cachestore-scheme>
        <class-scheme>
          <class-name>com.tangosol.coherence.jpa.JpaCacheStore</class-name>
          <init-params>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>{cache-name}</param-value>
            </init-param>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>{class-name}</param-value>
            </init-param>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>PersistenceUnit</param-value>
            </init-param>
          </init-params>
        </class-scheme>
      </cachestore-scheme>
    </read-write-backing-map-scheme>
  </backing-map-scheme>
  <backup-storage>
    <type>off-heap</type>
    <initial-size>1MB</initial-size>
    <maximum-size>50MB</maximum-size>
  </backup-storage>
  <autostart>true</autostart>
</distributed-scheme>
This concludes our discussion of caching schemes; it is now time to see how we can map our named caches to them.
Cache mappings
Cache mappings allow you to map cache names to the appropriate caching schemes:
<cache-mapping>
  <cache-name>repl-*</cache-name>
  <scheme-name>example-replicated</scheme-name>
</cache-mapping>
You can map either a full cache
name, or a name pattern, as in the previous example. What this
definition is basically saying is that whenever you call the CacheFactory.getCache method with a cache name that starts with repl-, a caching scheme with the name example-replicated will be used to configure the cache.
Coherence will evaluate
cache mappings in order, and will map the cache name using the first
pattern that matches. That means that you need to specify cache mappings
from the most specific to the least specific.
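For example, with the following two mappings (the cache and scheme names are illustrative), a cache named repl-countries is configured by the first, more specific mapping, while any other cache whose name starts with repl- falls through to the wildcard mapping:
<!-- Illustrative: the more specific mapping must come before the wildcard one -->
<cache-mapping>
  <cache-name>repl-countries</cache-name>
  <scheme-name>example-replicated-small</scheme-name>
</cache-mapping>
<cache-mapping>
  <cache-name>repl-*</cache-name>
  <scheme-name>example-replicated</scheme-name>
</cache-mapping>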
Cache mappings can also be used to specify macro parameters used within a caching scheme definition:
<cache-mapping>
  <cache-name>near-accounts</cache-name>
  <scheme-name>example-near</scheme-name>
  <init-params>
    <init-param>
      <param-name>front-size-limit</param-name>
      <param-value>1000</param-value>
    </init-param>
  </init-params>
</cache-mapping>
<cache-mapping>
  <cache-name>dist-*</cache-name>
  <scheme-name>example-distributed</scheme-name>
  <init-params>
    <init-param>
      <param-name>back-size-limit</param-name>
      <param-value>8388608</param-value>
    </init-param>
  </init-params>
</cache-mapping>
In the preceding example, we are using the macro parameter front-size-limit to ensure that we never cache more than a thousand objects in the front tier of the near-accounts cache. In a similar fashion, we use back-size-limit to limit the size of the backing map of each partitioned cache whose name starts with dist- to 8 MB (8,388,608 bytes).
One thing to keep in mind when
setting size limits is that all the numbers apply to a single node: if
there are 10 nodes in the cluster, each partitioned cache would be able
to store up to 80 MB. This makes it easy to scale the cache size by
simply adding more nodes with the same configuration.
Sample cache configuration
As you can see in the previous
sections, Coherence cache configuration is very flexible. It allows you
both to reuse scheme definitions in order to avoid code duplication, and
to override some of the parameters from the referenced definition when
necessary.
Unfortunately, this
flexibility also introduces a level of complexity into cache
configuration that can overwhelm new Coherence users: there are so many
options that you don't know where to start. My advice to everyone
learning Coherence or starting a new project is to keep things simple in
the beginning.
Whenever I start a new Coherence project, I use the following cache configuration file as a starting point:
<?xml version="1.0"?>
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">
<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>*</cache-name>
      <scheme-name>default-partitioned</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <distributed-scheme>
      <scheme-name>default-partitioned</scheme-name>
      <service-name>DefaultPartitioned</service-name>
      <serializer>
        <class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
        <init-params>
          <init-param>
            <param-type>java.lang.String</param-type>
            <param-value>pof-config.xml</param-value>
          </init-param>
        </init-params>
      </serializer>
      <backing-map-scheme>
        <local-scheme/>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>
  </caching-schemes>
</cache-config>
While
this is far from what the cache configuration usually looks like by the
end of the project, it is a good start and will allow you to focus on
the business problem you are trying to solve. As the project progresses
and you gain a better understanding of the requirements, you will refine
the previous configuration to include listeners, persistence,
additional cache schemes, and so on.