
Oracle Coherence 3.5: Clustered cache topologies (part 3) - Near cache


Near cache

A near cache is a hybrid, two-tier caching topology that uses a combination of a local, size-limited cache in the front tier, and a partitioned cache in the back tier to achieve the best of both worlds: the zero-latency read access of a replicated cache and the linear scalability of a partitioned cache.

Basically, the near cache is a named cache implementation that caches a subset of the data locally and responds to read requests for the data within that subset directly, without asking the Partitioned Cache service to handle the request. This eliminates both network access and serialization overhead associated with the partitioned cache topology, and allows applications to obtain data objects at the same speed as with replicated caches.

On the other hand, the near cache simply delegates cache writes to the partitioned cache behind it, so the write performance is almost as good as with the partitioned cache (there is some extra overhead to invalidate entries in the front cache).
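Although near caches are normally defined in the cache configuration descriptor, the same structure can be assembled programmatically, which makes the two tiers easy to see. The following is a minimal sketch; the cache name "products" is hypothetical and is assumed to map to a distributed scheme in the configuration.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.net.cache.LocalCache;
import com.tangosol.net.cache.NearCache;

public class NearCacheSketch {
    public static void main(String[] args) {
        // back tier: a partitioned cache (the name "products" is a
        // placeholder and must map to a distributed scheme)
        NamedCache back = CacheFactory.getCache("products");

        // front tier: a local cache limited to 1,000 entries
        LocalCache front = new LocalCache(1000);

        // the third argument is the invalidation strategy,
        // which is discussed in the next section
        NearCache near = new NearCache(front, back, NearCache.LISTEN_PRESENT);

        near.put("SKU-1", "Widget"); // writes are delegated to the back tier
        near.get("SKU-1");           // reads are served from the front tier
                                     // once the entry is cached locally
    }
}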

Near cache invalidation strategies

One problem typically associated with locally cached data is that it could become stale as the master data in the backend store changes. For example, if a near cache on a node A has object X cached locally, and node B updates object X in the cluster, the application running on node A (and other nodes that have object X cached locally) might end up holding a stale, and potentially incorrect, version of object X...

Fortunately, Coherence provides several invalidation strategies that allow the front tier of a near cache to evict stale objects based on the changes to the master copy of those objects, in a back-tier partitioned cache.

None

This strategy, as its name says, doesn't actually do anything to evict stale data.

The reason is that for some applications it doesn't really matter if the data is somewhat out of date. For example, you might have an e-commerce website that displays the current quantity in stock for each product in a catalog. It is usually not critical that this number is always completely correct; a certain degree of staleness is acceptable.

However, you probably don't want to keep using the stale data forever. So, you need to configure the front-tier cache to use time-based expiration with this strategy, based on how current your data needs to be.

The main benefit of the None strategy is that it scales extremely well, as there is no extra overhead required to keep the data in sync. You should seriously consider it if your application can tolerate some degree of data staleness.
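If you go this route, the expiration can be set directly on the front-tier cache. Here is a minimal sketch, again assuming a hypothetical distributed cache named "products": the front tier holds at most 1,000 entries and expires each one 30 seconds after it is cached, so no value is served stale for longer than that.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.cache.LocalCache;
import com.tangosol.net.cache.NearCache;

public class NoneStrategySketch {
    public static void main(String[] args) {
        // front tier: size-limited and time-limited
        LocalCache front = new LocalCache(1000, 30 * 1000);

        // LISTEN_NONE: no listeners are registered, so there is no
        // invalidation overhead; staleness is bounded only by the expiry
        NearCache near = new NearCache(front,
                                       CacheFactory.getCache("products"),
                                       NearCache.LISTEN_NONE);
    }
}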

Present

The Present strategy uses event listeners to evict the front cache data automatically when the data in a back-tier cache changes.

It does that by registering a key-based listener for each entry that is present in its front cache (thus the name). As soon as one of those entries changes in a back cache or gets deleted, the near cache receives an event notification and evicts the entry from its front cache. This ensures that the next time an application requests that particular entry, the latest copy is retrieved from the back cache and cached locally until the next invalidation event is received.

This strategy has some overhead, as it registers as many event listeners with the partitioned cache as there are items in the front cache. This requires both processing cycles within the cluster to determine if an event should be sent, and network access for the listener registration and deregistration, as well as for the event notifications themselves.

One thing to keep in mind is that the invalidation events are sent asynchronously, which means that there is still a time window (albeit very small, typically in the low milliseconds range) during which the value in the front cache might be stale. This is usually acceptable, but if you need to ensure that the value read is absolutely current, you can achieve that by locking it explicitly before the read. That will require a network call to lock the entry in the back cache, but it will ensure that you read the latest version.
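For example, a read that must be absolutely current could be wrapped in an explicit lock, along these lines (a sketch; the cache and key are whatever your application uses):

import com.tangosol.net.NamedCache;

public class CurrentReadSketch {
    public static Object readLatest(NamedCache cache, Object key) {
        // acquire the lock on this entry, waiting as long as necessary
        // (-1); this is a network call to the back cache
        cache.lock(key, -1);
        try {
            // with the lock held, the value read is the latest version
            return cache.get(key);
        } finally {
            cache.unlock(key);
        }
    }
}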

All

Just like the Present strategy, the All strategy uses Coherence events to keep the front cache from becoming stale. However, unlike the Present strategy, which registers one listener for each cache entry, the All strategy registers only a single listener with the back cache, and that listener receives all of the events.

There is an obvious trade-off here. Registering many listeners with the back cache, as the Present strategy does, requires more CPU cycles in the cluster to determine whether an event notification should be sent to a particular near cache, but every event the near cache receives is useful. Using a single listener for all cache events, as the All strategy does, requires fewer cycles to make that determination, but results in many more notifications being sent to the near cache, usually over the network. The near cache must then evaluate each notification and decide whether to act on it, which costs more CPU cycles on the node running the near cache than the simple eviction triggered by a notification under the Present strategy.

The choice of Present or All depends very much on the data access pattern. If the data is mostly read and typically updated by the node that has it cached locally (as is the case with session state data when sticky load balancing is used, for example), the Present strategy tends to work better. However, if the updates are frequent and there are many near caches with a high degree of overlap in their front caches, you will likely get better results using the All strategy, as many of the notifications will indeed be applicable to all of them.

That said, the best way to choose the strategy is to run some realistic performance tests using both and to see how they stack up.

Auto

The final invalidation strategy is the Auto strategy, which according to the documentation "switches between Present and All based on the cache statistics". Unfortunately, while that might have been the goal, the current implementation simply defaults to All, and there are indications that it might change in the future to default to Present instead.

This wouldn't be too bad on its own, but the problem is that Auto is also the default strategy for near cache invalidation. That means that if its implementation does indeed change in the future, all of the near caches using the default invalidation strategy will be affected.

Because of this, you should always specify the invalidation strategy when configuring a near cache. You should choose between Present and All if event-based invalidation is required, or use None if it isn't.
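In the cache configuration descriptor, that means always including the invalidation-strategy element in your near-scheme definitions. The following is a sketch of a typical definition; the scheme names and size limits are hypothetical:

<near-scheme>
  <scheme-name>example-near</scheme-name>
  <front-scheme>
    <local-scheme>
      <high-units>1000</high-units>
    </local-scheme>
  </front-scheme>
  <back-scheme>
    <distributed-scheme>
      <service-name>DistributedCache</service-name>
      <backing-map-scheme>
        <local-scheme/>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>
  </back-scheme>
  <!-- set explicitly: none, present, or all; avoid relying on auto -->
  <invalidation-strategy>present</invalidation-strategy>
</near-scheme>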

When to use it?

The near cache allows you to achieve the read performance of a replicated cache as well as the write performance and scalability of a partitioned cache. This makes it the best topology choice for read-mostly or balanced read-write caches that need to support data sets of any size.

This description will likely fit many of the most important caches within your application, so you should expect to use near cache topology quite a bit.

Continuous Query Cache

The Continuous Query Cache (CQC) is conceptually very similar to a near cache. First, it too has a zero-latency front cache that holds a subset of the data, and a slower but larger back cache, typically a partitioned cache, that holds all the data. Second, just like the near cache, it registers a listener with the back cache and updates its front cache based on the event notifications it receives.

However, there are several major differences as well:

  • CQC populates its front cache based on a query as soon as it is created, unlike the near cache, which only caches items after they have been requested by the application.

  • CQC registers a query-based listener with the back cache, which means that its contents change dynamically as the data in the back cache changes. For example, if you create a CQC based on a Coherence query that returns all open trade orders, the orders will automatically disappear from the CQC as they are processed and their status changes. Similarly, any new orders that are inserted into the back cache with a status of Open will automatically appear in it. Basically, CQC gives you a live, dynamic view of a filtered subset of the data in a partitioned cache (see the sketch at the end of this section).

  • CQC can only be created programmatically, using the Coherence API. It cannot be configured within a cache configuration descriptor.

Because of the last point, we will postpone the detailed discussion on the CQC until we cover both cache events and queries in more detail, but keep in mind that it can be used to achieve a similar result to that which a near cache allows you to achieve via configuration: bringing a subset of the data closer to the application, in order to allow extremely fast, zero-latency read access to it without sacrificing write performance.

It is also an excellent replacement for a replicated cache: by simply specifying a query that returns all objects, you can bring the whole data set from the back cache into the application's process and access it at in-memory speed.
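As a taste of what is to come, here is a minimal sketch of both uses; the "orders" cache name and the getStatus accessor it queries are hypothetical:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.net.cache.ContinuousQueryCache;
import com.tangosol.util.filter.AlwaysFilter;
import com.tangosol.util.filter.EqualsFilter;

public class CqcSketch {
    public static void main(String[] args) {
        NamedCache orders = CacheFactory.getCache("orders");

        // a live view of open orders: entries appear and disappear
        // automatically as order statuses change in the back cache
        ContinuousQueryCache openOrders = new ContinuousQueryCache(
                orders, new EqualsFilter("getStatus", "Open"));

        // a replicated-cache replacement: a filter that matches all
        // objects brings the entire data set into this process
        ContinuousQueryCache allOrders = new ContinuousQueryCache(
                orders, new AlwaysFilter());
    }
}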
