What you will realize when
you start identifying entities within your application and planning
caches for them is that many of those entities have similar data access
patterns. Some will fall into the read-only or read-mostly category,
such as reference data, and you can decide to use either a replicated cache
or a partitioned cache fronted by a near or continuous query cache for
them.
Others will fall into a
transactional data category, which will tend to grow in size over time
and will have a mixture of reads and writes. These will typically be
some of the most important entities that the application manages, such
as invoices, orders, and customers. You will use partitioned caches to
manage those entities, possibly fronted with a near cache depending on
the ratio of reads to writes.
You might also have some
write-mostly data, such as audit trail or log entries, but the point is
that in the end, when you finish analyzing your application's data
model, you will likely end up with a handful of entity categories, even
though you might have many entity types.
If you had to configure a
cache for each of these entity types separately, you would end up with a
lot of repeated configuration, which would be very cumbersome to
maintain if you needed to change something for each entity that belongs
to a particular category.
This is why the Coherence cache configuration has two parts: caching schemes and cache mappings.
The former allows you to define a single configuration template for all
entities in a particular category, while the latter enables you to map
specific named caches to a particular caching scheme.
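To make the structure concrete, here is a bare-bones outline of a cache configuration file; the cache and scheme names are placeholders, and a complete working sample is shown at the end of this section:
<!-- Outline only: cache names are mapped to schemes in the first section,
     while the schemes themselves are defined in the second -->
<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>countries</cache-name>
      <scheme-name>reference-scheme</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <replicated-scheme>
      <scheme-name>reference-scheme</scheme-name>
      <backing-map-scheme>
        <local-scheme/>
      </backing-map-scheme>
      <autostart>true</autostart>
    </replicated-scheme>
  </caching-schemes>
</cache-config>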
Caching schemes
Caching schemes are used to
define cache topology, as well as other cache configuration parameters,
such as which backing map to use, how to limit cache size and expire
cache items, where to store backup copies of the data, and in the case
of a read-write backing map, even how to load data into the cache from
the persistent store and how to write it back into the store.
The Coherence Developer's Guide is your best resource on the options available for cache configuration using any of the available schemes, and you should now have enough background information to understand the configuration options available for each of them. I strongly encourage you to review the Appendix D: Cache Configuration Elements section in the Developer's Guide for more information about all the configuration parameters for the particular cache topology and backing map you are interested in.
Distributed cache scheme
Let's look at an example of a caching scheme definition under a microscope, to get a better understanding of what it is made of.
<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <service-name>DistributedCache</service-name>
  <backing-map-scheme>
    <local-scheme>
      <scheme-ref>example-binary-backing-map</scheme-ref>
    </local-scheme>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
The top-level element, distributed-scheme, tells us that any cache that uses this scheme will use a partitioned topology (this is one of those unfortunate instances where distributed is used instead of partitioned for backwards compatibility reasons).
The scheme-name
element allows us to specify a name for the caching scheme, which we can
later use within cache mappings and when referencing a caching scheme
from another caching scheme, as we'll do shortly.
The service-name
element is used to specify the name of the cache service that all caches
using this particular scheme will belong to. While the service type is
determined by the root element of the scheme definition, the service
name can be any name that is meaningful to you. However, there are two
things you should keep in mind when choosing a service name for a
scheme:
First, Coherence provides a way to ensure that related objects from different caches are stored on the same node, which can be very beneficial from a performance standpoint. For example, you might want to ensure that an account and all transactions for that account are collocated. However, this only works for caches that belong to the same cache service, so such caches need to share a service name.
Second, all caches that belong to the same cache service share a pool of threads. In order to avoid deadlocks, Coherence prohibits re-entrant calls from code executing on a cache service thread into the same cache service. One way to work around this is to use separate cache services. For example, you might want to use separate services for reference and transactional caches, which will allow you to access reference data without any restrictions from code executing on a service thread of a transactional cache.
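To illustrate the second point, the following sketch shows two schemes that are identical except for their service names, so that reference and transactional caches end up on separate cache services; the scheme and service names are made up for the example:
<!-- Illustrative only: schemes that differ only in service name,
     so reference and transactional caches run on separate services -->
<distributed-scheme>
  <scheme-name>example-reference</scheme-name>
  <service-name>ReferenceCacheService</service-name>
  <backing-map-scheme>
    <local-scheme/>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>

<distributed-scheme>
  <scheme-name>example-transactional</scheme-name>
  <service-name>TransactionalCacheService</service-name>
  <backing-map-scheme>
    <local-scheme/>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>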
Going back to the example-distributed definition, the next element, backing-map-scheme, defines the type of the backing map we want all caches mapped to this caching scheme to use. In this example, we are telling Coherence to use a local cache as the backing map. Note that while many named caches can be
mapped to this particular caching scheme, each of them will have its own
instance of the local cache as a backing map. The configuration simply
tells the associated cache service which scheme to use as a template when creating a backing map instance for the cache.
The scheme-ref element within the local-scheme tells us that the configuration for the backing map should be loaded from another caching scheme definition, example-binary-backing-map.
This is a very useful feature, as it allows you to compose new schemes
from the existing ones, without having to repeat yourself.
Finally, the autostart
element determines if the cache service for the scheme will be started
automatically when the node starts. If it is omitted or set to false,
the service will start the first time any cache that belongs to it is
accessed. Normally, you will want all the services on your cache servers
to start automatically.
Local cache scheme
The scheme definition shown earlier references the local cache scheme named example-binary-backing-map as its backing map. Let's see what the referenced definition looks like:
<local-scheme>
  <scheme-name>example-binary-backing-map</scheme-name>
  <eviction-policy>HYBRID</eviction-policy>
  <high-units>{back-size-limit 0}</high-units>
  <unit-calculator>BINARY</unit-calculator>
  <expiry-delay>{back-expiry 1h}</expiry-delay>
  <flush-delay>1m</flush-delay>
  <cachestore-scheme></cachestore-scheme>
</local-scheme>
You can see that the local-scheme
allows you to configure various options for the local cache, such as
eviction policy, the maximum number of units to keep within the cache,
as well as expiry and flush delay.
The high-units and unit-calculator elements are used together to limit the size of the cache, as the meaning of the former is defined by the value of the latter. Coherence uses the unit calculator to determine the "size" of cache entries. There are two built-in unit calculators: FIXED and BINARY.
The difference between the two
is that the first one simply treats each cache entry as a single unit,
allowing you to limit the number of objects in the cache, while the
second one uses the size of a cache entry (in bytes) to represent the
number of units it consumes. While the latter gives you much better
control over the memory consumption of each cache, its use is
constrained by the fact that the entries need to be in a serialized
binary format, which means that it can only be used to limit the size of
a partitioned cache.
In other cases you will
either have to use the fixed calculator to limit the number of objects
in the cache, or write your own implementation that can determine the
appropriate unit count for your objects (if you decide to write a
calculator that attempts to determine the size of deserialized objects
on heap, you might want to consider using com.tangosol.net.cache.SimpleMemoryCalculator as a starting point).
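As an illustration of the first option, a local scheme limited to a fixed number of objects might look something like the following; the scheme name and limit are made up for the example:
<!-- Illustrative: limit a local cache to 10,000 entries regardless of their size,
     by treating each entry as a single unit (FIXED calculator) -->
<local-scheme>
  <scheme-name>example-object-limited</scheme-name>
  <eviction-policy>LRU</eviction-policy>
  <high-units>10000</high-units>
  <unit-calculator>FIXED</unit-calculator>
</local-scheme>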
One important thing to note in the example-binary-backing-map definition shown earlier is the use of macro parameters to define the size limit and expiration for the cache, such as {back-size-limit 0} and {back-expiry 1h}.
The first value within the curly braces is the name of the macro
parameter to use, while the second value is the default value that
should be used if the parameter with the specified name is not defined.
You will see how macro parameters and their values are defined shortly,
when we discuss cache mappings.
Near cache scheme
A near cache is a composite cache and requires us to define separate schemes for the front and back tiers. We could reuse both scheme definitions we have seen so far to
create a definition for a near cache:
<near-scheme>
  <scheme-name>example-near</scheme-name>
  <front-scheme>
    <local-scheme>
      <scheme-ref>example-binary-backing-map</scheme-ref>
    </local-scheme>
  </front-scheme>
  <back-scheme>
    <distributed-scheme>
      <scheme-ref>example-distributed</scheme-ref>
    </distributed-scheme>
  </back-scheme>
  <invalidation-strategy>present</invalidation-strategy>
  <autostart>true</autostart>
</near-scheme>
Unfortunately, the example-binary-backing-map scheme won't quite work as a front cache in the preceding definition: it uses the binary unit calculator, which cannot be used in the front tier of a near cache. In order to solve the problem, we can override the offending settings from the referenced scheme:
<front-scheme>
  <local-scheme>
    <scheme-ref>example-binary-backing-map</scheme-ref>
    <high-units>{front-size-limit 0}</high-units>
    <unit-calculator>FIXED</unit-calculator>
  </local-scheme>
</front-scheme>
However, in this case it would probably make more sense not to use a scheme reference for the front scheme definition at all, or to create and reference a separate local scheme for the front cache.
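A sketch of that second approach might look like this; the example-near-front scheme name is made up for the example, and it uses the same front-size-limit macro parameter that we will define shortly under cache mappings:
<!-- Illustrative: a dedicated front-tier scheme, defined once under caching-schemes... -->
<local-scheme>
  <scheme-name>example-near-front</scheme-name>
  <eviction-policy>HYBRID</eviction-policy>
  <high-units>{front-size-limit 0}</high-units>
  <unit-calculator>FIXED</unit-calculator>
</local-scheme>

<!-- ...and referenced from the near scheme's front tier -->
<front-scheme>
  <local-scheme>
    <scheme-ref>example-near-front</scheme-ref>
  </local-scheme>
</front-scheme>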
Read-write backing map scheme
Using a local cache as the
backing map is very convenient during development and testing, but more
likely than not you will want your data to be persisted as well. If
that's the case, you can configure a read-write backing map as a backing
map for your distributed cache:
<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <service-name>DistributedCache</service-name>
  <backing-map-scheme>
    <read-write-backing-map-scheme>
      <internal-cache-scheme>
        <local-scheme/>
      </internal-cache-scheme>
      <cachestore-scheme>
        <class-scheme>
          <class-name>com.tangosol.coherence.jpa.JpaCacheStore</class-name>
          <init-params>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>{cache-name}</param-value>
            </init-param>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>{class-name}</param-value>
            </init-param>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>PersistenceUnit</param-value>
            </init-param>
          </init-params>
        </class-scheme>
      </cachestore-scheme>
    </read-write-backing-map-scheme>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
The read-write
backing map defined previously uses an unlimited local cache to store the
data, and a JPA-compliant cache store implementation that will be used
to persist the data on cache puts, and to retrieve it from the database
on cache misses.
Partitioned backing map
As we discussed earlier, the partitioned backing map is your best option for very large caches. The following example demonstrates how you could configure a partitioned backing map that will allow you to store 1 TB of data in a 50-node cluster:
<distributed-scheme>
  <scheme-name>large-scheme</scheme-name>
  <service-name>LargeCacheService</service-name>
  <partition-count>20981</partition-count>
  <backing-map-scheme>
    <partitioned>true</partitioned>
    <external-scheme>
      <high-units>20</high-units>
      <unit-calculator>BINARY</unit-calculator>
      <unit-factor>1073741824</unit-factor>
      <nio-memory-manager>
        <initial-size>1MB</initial-size>
        <maximum-size>50MB</maximum-size>
      </nio-memory-manager>
    </external-scheme>
  </backing-map-scheme>
  <backup-storage>
    <type>off-heap</type>
    <initial-size>1MB</initial-size>
    <maximum-size>50MB</maximum-size>
  </backup-storage>
  <autostart>true</autostart>
</distributed-scheme>
We have configured partition-count to 20,981: dividing 1 TB by the target partition size of 50 MB gives roughly 20,972 partitions, which we round up to the prime number 20,981 (a prime partition count is recommended). This will allow us to store 1 TB of data in the cache while keeping the partition size down to approximately 50 MB.
We have then used the partitioned
element within the backing map scheme definition to let Coherence know
that it should use the partitioned backing map implementation instead of
the default one.
The external-scheme
element is used to configure the maximum size of the backing map as a
whole, as well as the storage for each partition. Each partition uses an
NIO buffer with the initial size of 1 MB and a maximum size of 50 MB.
The backing map as a whole is limited to 20 GB using a combination of high-units, unit-calculator, and unit-factor
values. Because we are storing serialized objects off-heap, we can use
binary calculator to limit cache size in bytes. However, the high-units setting is internally represented by a 32-bit integer, so the highest value we could specify for it would be 2 GB.
In order to allow for larger cache sizes while preserving backwards compatibility, Coherence engineers decided not to widen high-units to 64 bits. Instead, they introduced the unit-factor setting, which is nothing more than a multiplier for the high-units value. In the preceding example, the unit-factor is set to 1 GB (1,073,741,824 bytes), which in combination with the high-units setting of 20 limits the cache size per node to 20 GB.
Finally, when using a
partitioned backing map to support very large caches off-heap, we cannot
use the default, on-heap backup storage. The backup storage is always
managed per partition, so we had to configure it to use off-heap buffers
of the same size as primary storage buffers.
Partitioned read-write backing map
Finally, we can use a
partitioned read-write backing map to support automatic persistence for
very large caches. The following example is really just a combination of
the previous two examples, so I will not discuss the details. It is
also a good illustration of the flexibility Coherence provides when it
comes to cache configuration.
<distributed-scheme>
  <scheme-name>large-persistent-scheme</scheme-name>
  <service-name>LargePersistentCacheService</service-name>
  <partition-count>20981</partition-count>
  <backing-map-scheme>
    <partitioned>true</partitioned>
    <read-write-backing-map-scheme>
      <internal-cache-scheme>
        <external-scheme>
          <high-units>20</high-units>
          <unit-calculator>BINARY</unit-calculator>
          <unit-factor>1073741824</unit-factor>
          <nio-memory-manager>
            <initial-size>1MB</initial-size>
            <maximum-size>50MB</maximum-size>
          </nio-memory-manager>
        </external-scheme>
      </internal-cache-scheme>
      <cachestore-scheme>
        <class-scheme>
          <class-name>com.tangosol.coherence.jpa.JpaCacheStore</class-name>
          <init-params>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>{cache-name}</param-value>
            </init-param>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>{class-name}</param-value>
            </init-param>
            <init-param>
              <param-type>java.lang.String</param-type>
              <param-value>PersistenceUnit</param-value>
            </init-param>
          </init-params>
        </class-scheme>
      </cachestore-scheme>
    </read-write-backing-map-scheme>
  </backing-map-scheme>
  <backup-storage>
    <type>off-heap</type>
    <initial-size>1MB</initial-size>
    <maximum-size>50MB</maximum-size>
  </backup-storage>
  <autostart>true</autostart>
</distributed-scheme>
This concludes our discussion of caching schemes; it is now time to see how we can map our named caches to them.
Cache mappings
Cache mappings allow you to map cache names to the appropriate caching schemes:
<cache-mapping>
  <cache-name>repl-*</cache-name>
  <scheme-name>example-replicated</scheme-name>
</cache-mapping>
You can map either a full cache
name, or a name pattern, as in the previous example. What this
definition is basically saying is that whenever you call the CacheFactory.getCache method with a cache name that starts with repl-, a caching scheme with the name example-replicated will be used to configure the cache.
Coherence will evaluate
cache mappings in order, and will map the cache name using the first
pattern that matches. That means that you need to specify cache mappings
from the most specific to the least specific.
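For example, with the following two mappings (the cache and scheme names are illustrative), a cache named repl-countries is configured by the first, more specific mapping, while any other cache whose name starts with repl- falls through to the wildcard mapping:
<!-- Illustrative: the more specific mapping must come before the wildcard one -->
<cache-mapping>
  <cache-name>repl-countries</cache-name>
  <scheme-name>example-replicated-small</scheme-name>
</cache-mapping>
<cache-mapping>
  <cache-name>repl-*</cache-name>
  <scheme-name>example-replicated</scheme-name>
</cache-mapping>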
Cache mappings can also be used to specify macro parameters used within a caching scheme definition:
<cache-mapping>
  <cache-name>near-accounts</cache-name>
  <scheme-name>example-near</scheme-name>
  <init-params>
    <init-param>
      <param-name>front-size-limit</param-name>
      <param-value>1000</param-value>
    </init-param>
  </init-params>
</cache-mapping>
<cache-mapping>
  <cache-name>dist-*</cache-name>
  <scheme-name>example-distributed</scheme-name>
  <init-params>
    <init-param>
      <param-name>back-size-limit</param-name>
      <param-value>8388608</param-value>
    </init-param>
  </init-params>
</cache-mapping>
In the preceding example, we are using the macro parameter front-size-limit to ensure that we never cache more than a thousand objects in the front tier of the near-accounts cache. In a similar fashion, we use back-size-limit to limit the size of the backing map of each partitioned cache whose name starts with dist- to 8 MB (8,388,608 bytes).
One thing to keep in mind when
setting size limits is that all the numbers apply to a single node: if
there are 10 nodes in the cluster, each partitioned cache would be able
to store up to 80 MB. This makes it easy to scale the cache size by
simply adding more nodes with the same configuration.
Sample cache configuration
As you can see in the previous
sections, Coherence cache configuration is very flexible. It allows you
both to reuse scheme definitions in order to avoid code duplication, and
to override some of the parameters from the referenced definition when
necessary.
Unfortunately, this
flexibility also introduces a level of complexity into cache
configuration that can overwhelm new Coherence users: there are so many
options that you don't know where to start. My advice to everyone
learning Coherence or starting a new project is to keep things simple in
the beginning.
Whenever I start a new Coherence project, I use the following cache configuration file as a starting point:
<?xml version="1.0"?>
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">
<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>*</cache-name>
      <scheme-name>default-partitioned</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <distributed-scheme>
      <scheme-name>default-partitioned</scheme-name>
      <service-name>DefaultPartitioned</service-name>
      <serializer>
        <class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
        <init-params>
          <init-param>
            <param-type>java.lang.String</param-type>
            <param-value>pof-config.xml</param-value>
          </init-param>
        </init-params>
      </serializer>
      <backing-map-scheme>
        <local-scheme/>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>
  </caching-schemes>
</cache-config>
While
this is far from what the cache configuration usually looks like by the
end of the project, it is a good start and will allow you to focus on
the business problem you are trying to solve. As the project progresses
and you gain a better understanding of the requirements, you will refine
the previous configuration to include listeners, persistence,
additional cache schemes, and so on.