Would it be a good technique to accumulate data that is being written to the database in webserver memory, up to a certain limit, and send it as batch updates after every specified interval, or once the accumulated data grows beyond a threshold size?

This kind of data could be really small, e.g. just adding a relationship between two entities, which means adding only a couple of ids to the rows.

(Obviously, the data being deferred should be data that does not need to be immediately visible.)

What are the disadvantages of this approach?


Use case: building a web application on Cassandra, with Java and JSF.

The main disadvantage is that it requires another thread to implement the timeout (a small amount of added complexity), but the benefits can be much greater.

A simple way to implement this is with wait/notify (there doesn't appear to be a good solution using the concurrency library):

import java.util.ArrayList;
import java.util.List;

public class BatchBuffer<T> {
    private final List<T> buffered = new ArrayList<T>();
    private final int notifySize = ...; // wake the drainer once this many entries are buffered
    private final int timeoutMS = ...;  // maximum time the drainer waits before flushing

    // Called by any number of producer threads.
    public synchronized void add(T t) {
        buffered.add(t);
        if (buffered.size() >= notifySize)
            notifyAll(); // enough data buffered; wake the draining thread early
    }

    // Called by the draining thread; blocks until there is data to drain.
    public synchronized void drain(List<T> drained) throws InterruptedException {
        while (buffered.isEmpty())
            wait(timeoutMS); // returns on notifyAll, timeout, or spurious wakeup
        drained.addAll(buffered);
        buffered.clear();
    }
}

add and drain can be called by any number of threads; however, I imagine you would have only one thread draining, until it is interrupted.
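
For illustration, here is a minimal sketch of that single draining thread. The buffer variable and the writeBatch method are my assumptions for this sketch, not part of the original answer:

// buffer is a shared BatchBuffer<T> instance; writeBatch is a hypothetical
// method that performs one batch update against the database
List<T> drained = new ArrayList<T>();
try {
    while (true) {
        drained.clear();
        buffer.drain(drained); // blocks until data arrives or the timeout flushes it
        writeBatch(drained);   // hypothetical: write everything drained as one batch
    }
} catch (InterruptedException e) {
    // interrupted: shutting down; optionally flush any remaining entries first
}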

Short answer: this is a bad idea.

Cassandra's batch operations (e.g. http://pycassa.github.com/pycassa/api/pycassa/batch.html) exist to let you group updates into an idempotent unit. This allows you to retry the batch as a unit, so the purpose is roughly analogous to that of a transaction in a relational database.
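
Since the question mentions Java, a hedged sketch of such a unit using the Hector client (Hector, the keyspace variable, and the "Relations" column family are my assumptions, not from the original):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// keyspace obtained elsewhere; "Relations" is a hypothetical column family
Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
mutator.addInsertion("entityA", "Relations", HFactory.createStringColumn("entityB", ""));
mutator.addInsertion("entityB", "Relations", HFactory.createStringColumn("entityA", ""));
mutator.execute(); // sent as a single batch; idempotent, so it can be retried as a unit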

However, unlike with transactions, the effect on performance is minimal, and in fact making the load artificially "bursty" is generally detrimental.