Coherence API in action: Implementing the cache loader
The need to load data into a Coherence cache from an external data source is ever-present. Many Coherence applications warm up their caches by pre-loading data from a relational database or other external data sources.
In this section, we will focus on a somewhat simpler scenario and write a utility that allows us to load objects into the cache from a comma-separated values (CSV) file. This type of utility is very useful during development and testing, as it allows us to easily load test data into Coherence.
Loader design
If we forget for a moment about the technologies we are using and think about moving data from one data store to another at a higher level of abstraction, the solution is quite simple, as the following pseudo-code demonstrates:
for each item in source
    add item to target
end
That's really all there is to it: we need to be able to iterate over the source data store, retrieve items from it, and import them into the target data store.
One thing we need to decide is how the individual items are going to be represented. While in the general case an item can be any object, in order to simplify things a bit for this particular example we will use a Java Map to represent an item. This map will contain property values for an item, keyed by property name.
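For example, an item read from a file of country data might be represented by a map like the following (the property names and values here are purely illustrative):
Map<String, Object> item = new HashMap<String, Object>();
item.put("code", "FRA");       // property name -> property value
item.put("name", "France");
item.put("capital", "Paris");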
Based on the given information, we can define the interfaces for source and target:
public interface Source extends Iterable<Map<String, ?>> {
    void beginExport();
    void endExport();
}
The Target interface is just as simple:
public interface Target {
    void beginImport();
    void endImport();
    void importItem(Map<String, ?> item);
}
One thing you will notice in the previous interfaces is that there are matching pairs of begin/end methods. These are lifecycle methods that are used to initialize source and target and to perform any necessary cleanup.
Now that we have Source and Target interfaces defined, we can use them in the implementation of our Loader class:
public class Loader {
    private Source source;
    private Target target;

    public Loader(Source source, Target target) {
        this.source = source;
        this.target = target;
    }

    public void load() {
        source.beginExport();
        target.beginImport();
        for (Map<String, ?> sourceItem : source) {
            target.importItem(sourceItem);
        }
        source.endExport();
        target.endImport();
    }
}
As you can see, the actual Java implementation is almost as simple as the pseudo-code shown earlier, which is a good thing.
However, that does imply that all the complexity and the actual heavy lifting are pushed down into our Source and Target implementations, so let's look at those.
Implementing CsvSource
On the surface, implementing a class that reads a text file line by line, splits each line into fields, and creates a property map based on the header row and corresponding field values couldn't be any simpler. However, as with any other problem, there are subtle nuances that complicate the task.
For example, even though a comma is used to separate the fields in each row, it can also appear within the content of individual fields, in which case the field as a whole needs to be enclosed in quotation marks.
This complicates the parsing quite a bit, as we cannot simply use String.split to convert a single row from a file into an array of individual fields. While writing a parser by hand wouldn't be too difficult, writing code that someone else has already written is not one of my favorite pastimes.
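To see why a naive split falls short, consider a row containing a quoted field (a contrived example):
String line = "BEL,Belgium,\"Brussels, the capital\"";
String[] fields = line.split(",");
// fields contains 4 elements instead of the expected 3, because
// the comma inside the quoted field is treated as a separator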
Super CSV (http://supercsv.sourceforge.net), written by Kasper B. Graversen, is an open source library licensed under the Apache 2.0 license that does everything we need and much more, and I strongly suggest that you take a look at it before writing any custom code that reads or writes CSV files.
Among other things, Super CSV provides the CsvMapReader class, which does exactly what we need: it returns a map of header names to field values for each line read from the CSV file. That makes the implementation of CsvSource quite simple:
public class CsvSource implements Source {
    private ICsvMapReader reader;
    private String[] header;

    public CsvSource(String name) {
        this(new InputStreamReader(
                CsvSource.class.getClassLoader().getResourceAsStream(name)));
    }

    public CsvSource(Reader reader) {
        this.reader =
                new CsvMapReader(reader, CsvPreference.STANDARD_PREFERENCE);
    }

    public void beginExport() {
        try {
            this.header = reader.getCSVHeader(false);
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public void endExport() {
        try {
            reader.close();
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public Iterator<Map<String, ?>> iterator() {
        return new CsvIterator();
    }
}
As you can see, CsvSource accepts a java.io.Reader instance as a constructor argument and wraps it with a CsvMapReader. There is also a convenience constructor that creates a CsvSource instance for any CSV file on the classpath, which is the most likely scenario for testing.
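For example, assuming a countries.csv file is available on the classpath (a hypothetical file name), a source can be created in either of the following ways:
// convenience constructor: load the file from the classpath
Source source = new CsvSource("countries.csv");

// or wrap any java.io.Reader, which is handy for quick in-memory tests
Source inMemory = new CsvSource(new StringReader(
        "code,name\nFRA,France\nBEL,Belgium\n"));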
We use the beginExport lifecycle method to read the header row and initialize the header field, which will later be used by the CsvMapReader when reading individual data rows from the file and converting them to a map. In a similar fashion, we use the endExport method to close the reader properly and free the resources associated with it.
Finally, we implement the Iterable interface by returning an instance of the inner CsvIterator class from the iterator method. The CsvIterator inner class implements the necessary iteration logic for our source:
private class CsvIterator implements Iterator<Map<String, ?>> {
    private Map<String, String> item;

    public boolean hasNext() {
        try {
            item = reader.read(header);
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
        return item != null;
    }

    public Map<String, ?> next() {
        return item;
    }

    public void remove() {
        throw new UnsupportedOperationException(
                "CsvIterator does not support remove operation");
    }
}
Thanks to the CsvMapReader, the implementation is quite simple. We read the next line from the file whenever the hasNext method is called, and store the result in the item field. The next method simply returns the item read by the previous call to hasNext. Note that this implies hasNext should be called exactly once before each call to next, which is exactly how the for-each loop in our Loader class uses the iterator.
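To see how all the pieces fit together before we implement a Coherence-backed Target, here is a minimal sketch that wires a CsvSource into the Loader using a throwaway target that simply prints each item; both the PrintingTarget class and the countries.csv file name are hypothetical:
public class PrintingTarget implements Target {
    public void beginImport()  { }
    public void endImport()    { }
    public void importItem(Map<String, ?> item) {
        // each item is a map of property values keyed by property name,
        // for example {code=FRA, name=France}
        System.out.println(item);
    }
}

// wiring it all together, for example in a test:
Loader loader = new Loader(new CsvSource("countries.csv"),
                           new PrintingTarget());
loader.load();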
That completes the implementation of CsvSource, and allows us to shift our focus back to Coherence.