ENTERPRISE

Oracle Coherence 3.5 : Accessing the data grid (part 5) - Using the Coherence API - Loader design, Implementing CsvSource

7/23/2012 6:03:11 PM
Coherence API in action: Implementing the cache loader

The need to load the data into the Coherence cache from an external data source is ever-present. Many Coherence applications warm up the caches by pre-loading data from a relational database or other external data sources.

In this section, we will focus on a somewhat simpler scenario and write a utility that allows us to load objects into the cache from a comma-separated (CSV) file. This type of utility is very useful during development and testing, as it allows us to easily load test data into Coherence.

Loader design

If we forget for a moment about the technologies we are using and think about moving data from one data store to another at a higher level of abstraction, the solution is quite simple, as the following pseudo-code demonstrates:

for each item in source
add item to target
end

That's really all there is to it-we need to be able to iterate over the source data store, retrieve items from it, and import them into the target data store.

One thing we need to decide is how the individual items are going to be represented. While in a general case an item can be any object, in order to simplify things a bit for this particular example we will use a Java Map to represent an item. This map will contain property values for an item, keyed by property name.

Based on the given information, we can define the interfaces for source and target:

public interface Source extends Iterable<Map<String, ?>> {
void beginExport();
void endExport();
}

The Target interface is just as simple:

public interface Target {
void beginImport();
void endImport();
void importItem(Map<String, ?> item);
}

One thing you will notice in the previous interfaces is that there are matching pairs of begin/end methods. These are lifecycle methods that are used to initialize source and target and to perform any necessary cleanup.

Now that we have Source and Target interfaces defined, we can use them in the implementation of our Loader class:

public class Loader {
private Source source;
private Target target;
public Loader(Source source, Target target) {
this.source = source;
this.target = target;
}
public void load() {
source.beginExport();
target.beginImport();
for (Map<String, ?> sourceItem : source) {
target.importItem(sourceItem);
}
source.endExport();
target.endImport();
}
}

As you can see, the actual Java implementation is almost as simple as the pseudo-code on the previous page, which is a good thing.

However, that does imply that all the complexity and the actual heavy lifting are pushed down into our Source and Target implementations, so let's look at those.

Implementing CsvSource

On the surface, implementing a class that reads a text file line by line, splits each line into fields and creates a property map based on the header row and corresponding field values couldn't be any simpler. However, as with any other problem, there are subtle nuances that complicate the task.

For example, even though comma is used to separate the fields in each row, it could also appear within the content of individual fields, in which case the field as a whole needs to be enclosed in quotation marks.

This complicates the parsing quite a bit, as we cannot simply use String.split to convert a single row from a file into an array of individual fields. While writing a parser by hand wouldn't be too difficult, writing code that someone else has already written is not one of my favorite pastimes.

Super CSV (http://supercsv.sourceforge.net), written by Kasper B. Graversen, is an open source library licensed under the Apache 2.0 license that does everything we need and much more, and I strongly suggest that you take a look at it before writing any custom code that reads or writes CSV files.

Among other things, Super CSV provides the CsvMapReader class, which does exactly what we need-it returns a map of header names to field values for each line read from the CSV file. That makes the implementation of CsvSource quite simple:

public class CsvSource implements Source {
private ICsvMapReader reader;
private String[] header;
public CsvSource(String name) {
this(new InputStreamReader(
CsvSource.class.getClassLoader().getResourceAsStream(name)));
}
public CsvSource(Reader reader) {
this.reader =
new CsvMapReader(reader, CsvPreference.STANDARD_PREFERENCE);
}
public void beginExport() {
try {
this.header = reader.getCSVHeader(false);
}
catch (IOException e) {
throw new RuntimeException(e);
}
}
public void endExport() {
try {
reader.close();
}
catch (IOException e) {
throw new RuntimeException(e);
}
}
public Iterator<Map<String, ?>> iterator() {
return new CsvIterator();
}
}


					  

As you can see CsvSource accepts a java.io.Reader instance as a constructor argument and wraps it with a CsvMapReader. There is also a convenience constructor that will create a CsvSource instance for any CSV file in a classpath, which is the most likely scenario for testing.

We use the beginExport lifecycle method to read the header row and initialize the header field, which will later be used by the CsvMapReader when reading individual data rows from the file and converting them to a map. In a similar fashion, we use the endExport method to close the reader properly and free the resources associated with it.

Finally, we implement the Iterable interface by returning an instance of the inner CsvIterator class from the iterator method. The CsvIterator inner class implements the necessary iteration logic for our source:

private class CsvIterator implements Iterator<Map<String, ?>> {
private Map<String, String> item;
public boolean hasNext() {
try {
item = reader.read(header);
}
catch (IOException e) {
throw new RuntimeException(e);
}
return item != null;
}
public Map<String, ?> next() {
return item;
}
public void remove() {
throw new UnsupportedOperationException(
"CsvIterator does not support remove operation");
}
}

Thanks to the CsvMapReader, the implementation is quite simple. We read the next line from the file whenever the hasNext method is called, and store the result in the item field. The next method simply returns the item read by the previous call to hasNext.

That completes the implementation of CsvSource, and allows us to shift our focus back to Coherence.

Other  
  •  Oracle Coherence 3.5 : Installing Coherence, Starting up the Coherence cluster
  •  The Go-To Reference Design Map For The Cloud?
  •  Separating BPM and SOA Processes : SOA-Oriented Disputes with BEA
  •  Science! – Spintronics
  •  Linux - The Operating System With A Pure Heart (Part 2)
  •  Linux - The Operating System With A Pure Heart (Part 1)
  •  What's Your Strategy For Today’s Server Room Challenges?
  •  Truly Transparent Trinkets
  •  Gamma – Ray Lens …Made Possible
  •  Tech Chance - Revamps …Worsts Ever!
  •  Organic Computing (Part 2)
  •  Organic Computing (Part 1)
  •  The Price of Computer Components Is Going Up? (Part 3)
  •  The Price of Computer Components Is Going Up? (Part 2)
  •  The Price of Computer Components Is Going Up? (Part 1)
  •  The Best Computers You're (Probably) Never Heard Of (Part 3) - Z88, Atari Falcon
  •  The Best Computers You're (Probably) Never Heard Of (Part 2) - Tatung Einstein, Camputers Lynx
  •  The Best Computers You're (Probably) Never Heard Of (Part 1) - Xerox Star, The Grundy NewBrain
  •  Embarrassing Bugs (Part 3)
  •  Embarrassing Bugs (Part 2)
  •  
    Top 10
    Nikon 1 J2 With Stylish Design And Dependable Image And Video Quality
    Canon Powershot D20 - Super-Durable Waterproof Camera
    Fujifilm Finepix F800EXR – Another Excellent EXR
    Sony NEX-6 – The Best Compact Camera
    Teufel Cubycon 2 – An Excellent All-In-One For Films
    Dell S2740L - A Beautifully Crafted 27-inch IPS Monitor
    Philips 55PFL6007T With Fantastic Picture Quality
    Philips Gioco 278G4 – An Excellent 27-inch Screen
    Sony VPL-HW50ES – Sony’s Best Home Cinema Projector
    Windows Vista : Installing and Running Applications - Launching Applications
    Most View
    Bamboo Splash - Powerful Specs And Friendly Interface
    Powered By Windows (Part 2) - Toshiba Satellite U840 Series, Philips E248C3 MODA Lightframe Monitor & HP Envy Spectre 14
    MSI X79A-GD65 8D - Power without the Cost
    Canon EOS M With Wonderful Touchscreen Interface (Part 1)
    Windows Server 2003 : Building an Active Directory Structure (part 1) - The First Domain
    Personalize Your iPhone Case
    Speed ​​up browsing with a faster DNS
    Using and Configuring Public Folder Sharing
    Extending the Real-Time Communications Functionality of Exchange Server 2007 : Installing OCS 2007 (part 1)
    Google, privacy & you (Part 1)
    iPhone Application Development : Making Multivalue Choices with Pickers - Understanding Pickers
    Microsoft Surface With Windows RT - Truly A Unique Tablet
    Network Configuration & Troubleshooting (Part 1)
    Panasonic Lumix GH3 – The Fastest Touchscreen-Camera (Part 2)
    Programming Microsoft SQL Server 2005 : FOR XML Commands (part 3) - OPENXML Enhancements in SQL Server 2005
    Exchange Server 2010 : Track Exchange Performance (part 2) - Test the Performance Limitations in a Lab
    Extra Network Hardware Round-Up (Part 2) - NAS Drives, Media Center Extenders & Games Consoles
    Windows Server 2003 : Planning a Host Name Resolution Strategy - Understanding Name Resolution Requirements
    Google’s Data Liberation Front (Part 2)
    Datacolor SpyderLensCal (Part 1)