Coherence API in action: Implementing the cache loader
The need to load data into a Coherence cache from an external data source is ever-present. Many Coherence applications warm up their caches by pre-loading data from a relational database or other external data sources.
In this section, we will focus on a somewhat simpler scenario and write a utility that allows us to load objects into the cache from a comma-separated values (CSV) file. This type of utility is very useful during development and testing, as it allows us to easily load test data into Coherence.
Loader design
If we forget for a moment about the technologies we are using and think about moving data from one data store to another at a higher level of abstraction, the solution is quite simple, as the following pseudo-code demonstrates:
for each item in source
    add item to target
end
That's really all there is to it: we need to be able to iterate over the source data store, retrieve items from it, and import them into the target data store.
One thing we need to decide is how the individual items are going to be represented. While in the general case an item can be any object, in order to simplify things a bit for this particular example we will use a Java Map to represent an item. This map will contain property values for an item, keyed by property name.
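For example, an item read from a file of country data might be represented by a map like the following (the property names and values here are purely illustrative):
Map<String, Object> item = new HashMap<String, Object>();
item.put("code", "FRA");       // property name -> property value
item.put("name", "France");
item.put("capital", "Paris");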
Based on the given information, we can define the interfaces for source and target:
public interface Source extends Iterable<Map<String, ?>> {
    void beginExport();
    void endExport();
}
The Target interface is just as simple:
public interface Target {
    void beginImport();
    void endImport();
    void importItem(Map<String, ?> item);
}
One thing you will notice in the previous interfaces is that there are matching pairs of begin/end methods. These are lifecycle methods that are used to initialize source and target and to perform any necessary cleanup.
Now that we have Source and Target interfaces defined, we can use them in the implementation of our Loader class:
public class Loader {
    private Source source;
    private Target target;

    public Loader(Source source, Target target) {
        this.source = source;
        this.target = target;
    }

    public void load() {
        source.beginExport();
        target.beginImport();
        for (Map<String, ?> sourceItem : source) {
            target.importItem(sourceItem);
        }
        source.endExport();
        target.endImport();
    }
}
As you can see, the actual Java implementation is almost as simple as the pseudo-code shown earlier, which is a good thing.
However, that does imply that all the complexity and the actual heavy lifting are pushed down into our Source and Target implementations, so let's look at those.
Implementing CsvSource
On the surface, implementing a class that reads a text file line by line, splits each line into fields, and creates a property map based on the header row and corresponding field values couldn't be any simpler. However, as with any other problem, there are subtle nuances that complicate the task.
For example, even though a comma is used to separate the fields in each row, it can also appear within the content of individual fields, in which case the field as a whole needs to be enclosed in quotation marks.
This complicates the parsing quite a bit, as we cannot simply use String.split to convert a single row from a file into an array of individual fields. While writing a parser by hand wouldn't be too difficult, writing code that someone else has already written is not one of my favorite pastimes.
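To see why a naive split falls short, consider a row containing a quoted field (a contrived example):
String line = "BEL,Belgium,\"Brussels, the capital\"";
String[] fields = line.split(",");
// fields contains 4 elements instead of the expected 3, because
// the comma inside the quoted field is treated as a separator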
Super CSV (http://supercsv.sourceforge.net), written by Kasper B. Graversen, is an open source library licensed under the Apache 2.0 license that does everything we need and much more, and I strongly suggest that you take a look at it before writing any custom code that reads or writes CSV files.
Among other things, Super CSV provides the CsvMapReader class, which does exactly what we need: it returns a map of header names to field values for each line read from the CSV file. That makes the implementation of CsvSource quite simple:
public class CsvSource implements Source {
    private ICsvMapReader reader;
    private String[] header;

    public CsvSource(String name) {
        this(new InputStreamReader(
                CsvSource.class.getClassLoader().getResourceAsStream(name)));
    }

    public CsvSource(Reader reader) {
        this.reader =
                new CsvMapReader(reader, CsvPreference.STANDARD_PREFERENCE);
    }

    public void beginExport() {
        try {
            this.header = reader.getCSVHeader(false);
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public void endExport() {
        try {
            reader.close();
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public Iterator<Map<String, ?>> iterator() {
        return new CsvIterator();
    }
}
As you can see, CsvSource accepts a java.io.Reader instance as a constructor argument and wraps it with a CsvMapReader. There is also a convenience constructor that creates a CsvSource instance for any CSV file on the classpath, which is the most likely scenario for testing.
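For example, assuming a countries.csv file is available on the classpath (a hypothetical file name), a source can be created in either of the following ways:
// convenience constructor: load the file from the classpath
Source source = new CsvSource("countries.csv");

// or wrap any java.io.Reader, which is handy for quick in-memory tests
Source inMemory = new CsvSource(new StringReader(
        "code,name\nFRA,France\nBEL,Belgium\n"));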
We use the beginExport lifecycle method to read the header row and initialize the header field, which will later be used by the CsvMapReader when reading individual data rows from the file and converting them to a map. In a similar fashion, we use the endExport method to close the reader properly and free the resources associated with it.
Finally, we implement the Iterable interface by returning an instance of the inner CsvIterator class from the iterator method. The CsvIterator inner class implements the necessary iteration logic for our source:
private class CsvIterator implements Iterator<Map<String, ?>> {
    private Map<String, String> item;

    public boolean hasNext() {
        try {
            item = reader.read(header);
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
        return item != null;
    }

    public Map<String, ?> next() {
        return item;
    }

    public void remove() {
        throw new UnsupportedOperationException(
                "CsvIterator does not support remove operation");
    }
}
Thanks to the CsvMapReader, the implementation is quite simple. We read the next line from the file whenever the hasNext method is called, and store the result in the item field. The next method simply returns the item read by the previous call to hasNext. Note that this implies hasNext should be called exactly once before each call to next, which is exactly how the for-each loop in our Loader class uses the iterator.
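To see how all the pieces fit together before we implement a Coherence-backed Target, here is a minimal sketch that wires a CsvSource into the Loader using a throwaway target that simply prints each item; both the PrintingTarget class and the countries.csv file name are hypothetical:
public class PrintingTarget implements Target {
    public void beginImport()  { }
    public void endImport()    { }
    public void importItem(Map<String, ?> item) {
        // each item is a map of property values keyed by property name,
        // for example {code=FRA, name=France}
        System.out.println(item);
    }
}

// wiring it all together, for example in a test:
Loader loader = new Loader(new CsvSource("countries.csv"),
                           new PrintingTarget());
loader.load();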
That completes the implementation of CsvSource, and allows us to shift our focus back to Coherence.