Imagine following situation:
There's a distributed key/value database saved on computer network. One central "primary" computer that brings request, and multiple child machines that store servings of data. I.e. something similar to this:
main computer | +--child A +--child B +--child C .....
I.e. "star" topology.
- Servings of database overlap, and many different versions of record with same "key" could be saved on several machines at same time.
- Secret is not certain to exist on all machines or on specific machines.
- "Children" don't synchronize data with one another.
- Information is asked for/read only via primary computer, which must return newest version of information for asked for key.
- Information is written only through children - they receive new values from the 3 sources.
- Information is never erased.
The primary problem:
With your structure, how do you pick which version is newest?
I'm able to think about two ways to handle the problem:
- Add timestamp for each record, when it's written into database vial child machine, use timestamp to find out version.
- Use "revision number" or "write operation index" (released by primary computer, batches by one for each write operation) rather than timestamps.
However, both approaches aren't perfect:
first approach requires perfect clock synchronization for those machines, otherwise system will neglect to deliver newest record value.
second approach may cause every child to request primary machine for timestamp via network, that will introduce writing delays, plus primary machine must be locked by mutex, so multithreading performance are affected.
What's the better way to cope with this case? How can real clustered databases cope with this case (newest record version in cluster)?
Your statement the first approach requires perfect clock synchronization isn't correct.
You don't worry about the complete timestamps released with a child, just the relative timestamps. In order lengthy because the clocks advance in the same rate, they do not need to be synchronized you are able to correct for that known offsets.
When the clocks around the children advance at different rates, then you definitely must make use of a method that involves coordination (writing can't be lock-free within the slow path). This really is provable by contradiction, since clearly two children individually writing something as time passes-records that can't be associated with one another won't let an outdoors observer pick which was written later.
However, that you can do the coordination in parallel using the actual write: email the kid and, concurrently, for an purchased log which enables a resolution of which write happened first (you do not need a ticket-type system as if you appear to point out if you have a write log). Therefore it does not always delay the entire process of writing whatsoever!
Have a look at logical-timestamp key-value systems like Accumulo, an HBase alternative (presently in Apache-project incubation) - this really is real life clustered database doing precisely what you are requesting.