Friday, September 4, 2009

The semantics of Flushing in Ehcache

When I was going through the Ehcache user manual, I was a little confused about the difference between a 'flush' and a 'shutdown'. Of course, the names of these methods suggest their purpose perfectly clearly, but I had never really paid attention to their inner workings.

I always thought that flushing each and every cache in my manager class before shutdown was equivalent to calling the shutdown method on the manager. I'm talking about caches with basic functionality (no bootstraps, no loaders, etc., so sending a shutdown hook to all these (empty) listeners doesn't matter). But I was proved wrong. Here's the reason...

The following code does a very simple task: it reads all the elements from one Ehcache, transfers them to a new cache, and tries to persist the items to the disk store before termination.


import net.sf.ehcache.Element
import net.sf.ehcache.CacheManager
import net.sf.ehcache.Cache

def cachemgr = new CacheManager("D:/mPortal/workspace_new_cvs_structure/EhCacheDemo/config/change_listener_cache.xml")
def deltacache = cachemgr.getCache("deltaCache")

// Create a disk-persistent clone cache and register it with the manager
def deltaclone = new Cache("deltaCacheClone", 10000, null, true, cachemgr.getDiskStorePath(), true, 120, 120, true, 120, null)
cachemgr.addCache(deltaclone)

println "Migration about to begin"
println "Size of the original cache : ${deltacache.getSize()}"
println "Size of the clone : ${deltaclone.getSize()}"

// Copy every element from the original cache into the clone
deltacache.getKeys().each {
    def ele = deltacache.get(it)
    deltaclone.put(new Element(ele.getKey(), ele.getValue()))
}

println "Size of the original cache after migration : ${deltacache.getSize()}"
println "Size of the clone after migration : ${deltaclone.getSize()}"

println "Migration successfully finished.."

deltacache.flush()
deltaclone.flush()


Note that I flush every cache I created or used at the end of the program.

To my surprise, whenever I ran this program again, the 'deltaclone' cache always initialized itself to zero. This puzzled me for quite some time and finally forced me to revisit the Ehcache source code to understand the behavior.

The Reason

What I found is that the 'flush' operation is not synchronous at all: it only signals the spool thread to flush when it next wakes up. In my case, the VM didn't wait for that thread to do its job; it killed it forcibly on exit.
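The asynchronous handoff can be pictured with a plain Java sketch (hypothetical names, not the actual DiskStore code): flush() merely raises a flag for a background spool thread, so if the JVM exits before that thread next wakes up, nothing ever reaches disk.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of an asynchronous "flush": the caller only raises a flag;
// the actual write happens later on a background thread.
public class AsyncSpool {
    private final AtomicBoolean flushRequested = new AtomicBoolean(false);
    private final AtomicInteger writesToDisk = new AtomicInteger(0);

    private final Thread spoolThread = new Thread(() -> {
        try {
            while (true) {
                Thread.sleep(100);                  // wakes up periodically
                if (flushRequested.getAndSet(false)) {
                    writesToDisk.incrementAndGet(); // the real write
                }
            }
        } catch (InterruptedException ignored) { }
    });

    public AsyncSpool() {
        spoolThread.setDaemon(true);  // dies with the JVM, finished or not
        spoolThread.start();
    }

    // Like Ehcache's flush(): returns immediately, nothing written yet.
    public void flush() { flushRequested.set(true); }

    public int writes() { return writesToDisk.get(); }

    public static void main(String[] args) {
        AsyncSpool spool = new AsyncSpool();
        spool.flush();
        // Immediately after flush() nothing has hit "disk" yet; if the
        // JVM exited here, the daemon thread would be killed before it
        // ever performed the write -- exactly the bug described above.
        System.out.println("writes right after flush: " + spool.writes());
    }
}
```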

How does shutdown solve this problem?

The shutdown method, however, is very responsible and gracefully waits until the thread finishes its execution. The following snippet explains this behavior...



//set the write index flag. Ignored if not persistent
flush();

//tell the spool thread to spool down. It will loop one last time if flush was called.
spoolAndExpiryThreadActive = false;

//interrupt the spoolAndExpiryThread if it is waiting to run again to get it to run now
//then wait for it to write
spoolAndExpiryThread.interrupt();
if (spoolAndExpiryThread != null) {
    spoolAndExpiryThread.join();
}


This snippet is from DiskStore's dispose method. Thanks to the inline comments, the code is pretty much self-explanatory.
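The same stop-flag / interrupt / join pattern can be reproduced in a few lines of plain Java (a sketch with invented names, not Ehcache's code): the caller flips a flag, interrupts the worker out of its sleep, and then blocks on join() until the final write completes.

```java
// Graceful worker shutdown: raise a stop flag, interrupt the worker out
// of its sleep, then join() so the caller blocks until the last write.
public class GracefulWorker {
    private volatile boolean active = true;
    private volatile boolean finalWriteDone = false;

    private final Thread worker = new Thread(() -> {
        while (active) {
            try {
                Thread.sleep(10_000);   // long wait between spool runs
            } catch (InterruptedException e) {
                // interrupted: loop around and notice active == false
            }
        }
        finalWriteDone = true;          // one last write before exiting
    });

    public void start() { worker.start(); }

    public void shutdown() throws InterruptedException {
        active = false;        // tell the worker to spool down
        worker.interrupt();    // wake it if it is sleeping
        worker.join();         // wait for it to finish writing
    }

    public boolean done() { return finalWriteDone; }

    public static void main(String[] args) throws InterruptedException {
        GracefulWorker w = new GracefulWorker();
        w.start();
        w.shutdown();
        System.out.println("final write done: " + w.done());  // prints true
    }
}
```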


The fix

So the proper fix for my program is very simple: add a shutdown() call at the end.


import net.sf.ehcache.Element
import net.sf.ehcache.CacheManager
import net.sf.ehcache.Cache

def cachemgr = new CacheManager("D:/mPortal/workspace_new_cvs_structure/EhCacheDemo/config/change_listener_cache.xml")
def deltacache = cachemgr.getCache("deltaCache")

// Create a disk-persistent clone cache and register it with the manager
def deltaclone = new Cache("deltaCacheClone", 10000, null, true, cachemgr.getDiskStorePath(), true, 120, 120, true, 120, null)
cachemgr.addCache(deltaclone)

println "Migration about to begin"
println "Size of the original cache : ${deltacache.getSize()}"
println "Size of the clone : ${deltaclone.getSize()}"

// Copy every element from the original cache into the clone
deltacache.getKeys().each {
    def ele = deltacache.get(it)
    deltaclone.put(new Element(ele.getKey(), ele.getValue()))
}

println "Size of the original cache after migration : ${deltacache.getSize()}"
println "Size of the clone after migration : ${deltaclone.getSize()}"

println "Migration successfully finished.."

deltacache.flush() // Redundant
deltaclone.flush() // Redundant

// Waits for the spool thread to finish persisting before the VM exits
cachemgr.shutdown()


The flush calls are now completely redundant; the program works even if you remove them.
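If you cannot guarantee that shutdown() runs on every exit path, one defensive option is a JVM shutdown hook (a plain Java sketch; the cleanup body here is a placeholder for whatever your application needs, e.g. a CacheManager shutdown):

```java
// Registering a JVM shutdown hook so cleanup runs even when main()
// returns without calling it, or the process exits via System.exit().
public class ShutdownHookDemo {
    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // In the Ehcache case this would be cachemgr.shutdown()
            System.out.println("shutdown hook: persisting caches");
        }));
        System.out.println("main finished");
        // On JVM exit the hook runs and prints its message.
    }
}
```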

Wednesday, September 2, 2009

How to download the response, even if you get an HTTP 500

When we connect to an HTTP URL from Java code, we normally check only the status code and proceed based on its value. We handle the HTTP success code 200, and for any other code we simply report a 'generic error'.

But there are times when you want to see exactly what the server returned, for diagnostic purposes. For example, a monitoring server may want not just the error code but also details about what exactly happened on the server side. I've written a simple Groovy script that uses the Apache Commons HttpClient and codec modules to do this task for me.



import org.apache.commons.httpclient.HttpClient
import org.apache.commons.httpclient.methods.GetMethod

def client = new HttpClient()
def method = new GetMethod("http://10.11.12.48:8004/monitor")
def statusCode = client.executeMethod(method)

println "Status code is : ${statusCode}"

// Read the response body; available even for error statuses such as 500
byte[] responseBody = method.getResponseBody()

// Use caution: this assumes the body is text in the default character
// encoding, not binary data
println new String(responseBody)

// Release the connection back to the client
method.releaseConnection()

You should keep the following files on the classpath to run this program successfully:
  1. commons-httpclient.jar
  2. commons-codec.jar
  3. commons-logging.jar

And of course, you can change the URL to whichever value you want to test out other error codes.
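If you would rather avoid the extra jars, the plain JDK can do the same thing: HttpURLConnection refuses to hand out getInputStream() for statuses of 400 and above, but getErrorStream() still returns whatever diagnostic page the server sent. A sketch (the class name and URL are my own placeholders):

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Reading the response body from a JDK HttpURLConnection even when the
// server answers with an error status such as HTTP 500.
public class ErrorBodyReader {
    public static String fetch(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        int status = conn.getResponseCode();
        // getInputStream() throws for >= 400; getErrorStream() still
        // returns the body the server sent with the error.
        InputStream body = (status >= 400) ? conn.getErrorStream()
                                           : conn.getInputStream();
        byte[] bytes = body.readAllBytes();
        body.close();
        conn.disconnect();
        return status + ": " + new String(bytes, "UTF-8");
    }
}
```

Point fetch() at your own monitor endpoint, e.g. `ErrorBodyReader.fetch("http://10.11.12.48:8004/monitor")`.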

Most wanted features in Ehcache

In our company, we finally decided to move forward and adopt Ehcache as our caching provider. The integration presented great challenges and forced me to learn quite a bit about Ehcache before we made the decision.

Ehcache, as an API, is well tested and perfectly capable of holding up under larger loads. But what troubles me now is monitoring the caching activity that goes on under the hood. In most cases we don't have to worry about what happens inside Ehcache, but there are times when we want to know very specific things, such as...

  • How far has the replication reached?
  • Have all the servers (copies of the Ehcache) in the cluster acknowledged the replication event successfully?
  • How do we identify replication failures?
Ehcache currently doesn't support event handling at this level of granularity. Thanks to its extensible architecture, we can still write our own implementation classes to introduce it. But, as you know, in the world of open source every piece of code you write is worthless if some super-smart guy has already done it before.

Another thing I got stuck on is maintaining coherency between the database and the cache.

The caching solution we're looking at requires the database to be the 'master' of all the caches: we must make sure the database always has the latest information, and at the same time the up-to-date data must be available to all the runtime components that take user hits in real time.

  • At any given time, how do I know whether the data cached in Ehcache is consistent with the database?
Has anyone encountered this kind of situation before? How were you able to handle it?
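One common pattern for the "database is master" requirement is write-through caching: every write goes to the database first and only then to the cache, so a cached value can never be newer than the database copy. A minimal single-JVM sketch (the in-memory maps stand in for a real database and cache; not an Ehcache API):

```java
import java.util.HashMap;
import java.util.Map;

// Write-through cache: the database is always written first, so cached
// values can never be newer than the database copy.
public class WriteThroughCache {
    private final Map<String, String> database = new HashMap<>(); // stand-in for the real DB
    private final Map<String, String> cache = new HashMap<>();

    public void put(String key, String value) {
        database.put(key, value);  // master copy first
        cache.put(key, value);     // then the cache
    }

    public String get(String key) {
        String v = cache.get(key);
        if (v == null) {               // cache miss: read through
            v = database.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    // Answers the bullet above for one key: compare cache vs. database.
    public boolean isConsistent(String key) {
        String c = cache.get(key);
        return c == null || c.equals(database.get(key));
    }
}
```

This only covers writers that go through the cache; writes made directly to the database still need invalidation or a refresh strategy.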