A common bit of programming logic I find myself implementing often is something like the following pseudo-code:
    Let X = some value
    Let Database = some external Database handle

    if !Database.contains(X):
        SomeCalculation()
        Database.insert(X)
However, in a multi-threaded program there is a race condition here. Thread A might check whether X is in Database, find that it isn't, and then proceed to call SomeCalculation(). Meanwhile, Thread B also checks whether X is in Database, finds that it isn't, and inserts a duplicate entry.
So obviously, this needs to be synchronized, like so:
    Let X = some value
    Let Database = some external Database handle

    LockMutex()
    if !Database.contains(X):
        SomeCalculation()
        Database.insert(X)
    UnlockMutex()
This is fine, except what if the application is a distributed application, running across multiple computers that all communicate with the same back-end database machine? In this case, a mutex is useless, because it only synchronizes a single instance of the application with other local threads. To make this work, we'd need some kind of "global" distributed synchronization technique. (Assume that simply disallowing duplicates in Database is not a feasible strategy.)
In general, what are some practical solutions to this problem?
I realize this is very generic, but I don't want to make this a language-specific question, because this is an issue that comes up across multiple languages and multiple database technologies.
I intentionally avoided specifying whether I'm talking about an RDBMS or SQL database versus something like a NoSQL database, because again, I'm looking for generalized solutions based on industry practices. For example, is this situation something that atomic stored procedures might solve? Or atomic transactions? Or is this something that requires something like a "distributed mutex"? Or more generally, is this problem usually addressed by the database system, or is it something the application itself should handle?
If it turns out this question is impossible to answer at all without more information, please tell me so I can modify it.
One sure way to guard against data stomping is to lock the data row. Many databases allow you to do this via transactions. Some don't support transactions.
However, this is overkill for most cases, where contention is generally low. You might want to read up on isolation levels to get more background on the subject.
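As a rough sketch of the lock-based approach, here is the check-then-insert from the question wrapped in a transaction that takes a write lock before the check. It uses Python's built-in sqlite3 module purely for illustration; the `items` table and the `insert_if_absent` helper are made-up names. Note that SQLite locks the whole database rather than a single row, so this only approximates what a row lock (for example `SELECT ... FOR UPDATE` in databases that support it) would give you.

```python
import sqlite3

def insert_if_absent(conn, key, compute):
    """Check-then-insert while holding a write lock (hypothetical helper)."""
    # BEGIN IMMEDIATE acquires SQLite's write lock up front, so no other
    # connection can write between our check and our insert.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT 1 FROM items WHERE key = ?", (key,)
        ).fetchone()
        inserted = row is None
        if inserted:
            conn.execute(
                "INSERT INTO items (key, value) VALUES (?, ?)",
                (key, compute(key)),
            )
        conn.execute("COMMIT")
        return inserted
    except Exception:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None leaves transaction control to the explicit
# BEGIN/COMMIT/ROLLBACK statements above.
conn = sqlite3.connect(":memory:", isolation_level=None)
# Deliberately no unique constraint, matching the question's assumption.
conn.execute("CREATE TABLE items (key TEXT, value TEXT)")

first = insert_if_absent(conn, "x", lambda k: k.upper())
second = insert_if_absent(conn, "x", lambda k: k.upper())
```

The lock is held by the database, not by the process, so it also serializes writers running on different machines, as long as they all go through the same database.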
A better general approach is often optimistic concurrency. The idea behind it is that each data row includes a signature; a timestamp works fine, but the signature need not be time-oriented. It could be a hash value, for example. This is a general concurrency management approach and is not limited to relational stores.
The app that changes data first reads the row, then performs whatever computations it needs, and then at some point writes the updated data back to the data store. With optimistic concurrency, the app writes the update with the stipulation (expressed in SQL if it is a SQL database) that the data row must be updated only if the signature has not changed in the interim. And each time a data row is updated, the signature must be updated as well.
The result is that updates don't get stomped on. For a more rigorous explanation of the concurrency issues, though, refer to the material on DB isolation levels.
All distributed updaters must follow the OCC convention (or something stronger, like transactional locking) in order for this to work.
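To make the optimistic scheme concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the integer `version` column used as the signature, and the `read_row` and `try_update` helpers are all assumptions made for the example; a timestamp or hash would serve equally well as the signature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")
conn.commit()

def read_row(conn, row_id):
    """Return (balance, version) as currently stored."""
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (row_id,)
    ).fetchone()

def try_update(conn, row_id, new_balance, seen_version):
    """Write only if the signature is unchanged; bump it in the same statement."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, row_id, seen_version),
    )
    conn.commit()
    # Zero affected rows means another writer got there first.
    return cur.rowcount == 1

# Two updaters read the same version of the row...
bal_a, ver_a = read_row(conn, 1)
bal_b, ver_b = read_row(conn, 1)

# ...but only the first write succeeds; the second sees a stale signature.
a_ok = try_update(conn, 1, bal_a + 10, ver_a)
b_ok = try_update(conn, 1, bal_b - 30, ver_b)
final = read_row(conn, 1)
```

The losing updater would typically re-read the row and retry its computation against the fresh data, rather than overwrite the other writer's change.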
You can obviously move the "sync" part to the DB layer itself, using an exclusive lock on a specific resource.
This is a bit extreme, though. In most cases, attempting the insert and handling the exception when you actually discover that someone already inserted the row would be more adequate, I think.
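A minimal sketch of the "just try the insert" approach, again with Python's built-in sqlite3 module. It assumes the table can carry a uniqueness constraint on the key, which the question explicitly sets aside, so it only applies where that assumption can be relaxed. The `items` table and `insert_or_skip` helper are hypothetical names.

```python
import sqlite3

def insert_or_skip(conn, key, value):
    """Attempt the insert; report False if the key was already present."""
    try:
        # The connection context manager commits on success and
        # rolls back if the block raises.
        with conn:
            conn.execute(
                "INSERT INTO items (key, value) VALUES (?, ?)", (key, value)
            )
        return True
    except sqlite3.IntegrityError:
        # Someone (possibly another machine) inserted this key first.
        return False

conn = sqlite3.connect(":memory:")
# PRIMARY KEY makes the database itself reject duplicate keys.
conn.execute("CREATE TABLE items (key TEXT PRIMARY KEY, value TEXT)")

first = insert_or_skip(conn, "x", "computed")
second = insert_or_skip(conn, "x", "other")
```

Here the database is the single arbiter of who inserted first, so no application-level lock is needed. The trade-off is that SomeCalculation() may occasionally run on a client whose insert is then rejected.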