I'm searching for pattern, framework or best practice to deal with a normal problem of application level data synchronisation.
Let us take a good example with only one table to really make it simpler.
I've an hard to rely on datasource of product catalog. Data can from time to time be not available or incomplete or sporadic. ( problem might originate from manual data entry error, ETL failure...)
I've got a live copy inside a Mysql table being used with a live system. Let us say an internet site.
I have to implement safety mecanism when upgrading the mysql table to "synchronize" with original databases. Listed here are the security criteria and also the solution I an recommending:
avoid removing records once they temporarily disappear from datasource => use "erased" boulean/date column or perhaps an archive/history table.
look for sporadic changes => configure rules per posts for example : should not change, should only increment,
look for integrity problem => (standard problem, no reason talking about approach)
capability to rollback last sync=> restore from history table ? make use of a version corporation/date column ?
Things I am searching for is better practice and pattern/tool to deal with such problem. Otherwise you aren't pointing towards the solution, I'd be grateful associated with a key phrases suggestion that will me limit which area of expert knowledge to understand more about.
We have a similar problem posting data from web statistics companies - they suffer exactly the same problems as the catalog. This is exactly what we did:
- Every import/sync is designated a distinctive id (auto_increment int64)
- Every table includes a history table that's just like the initial, but comes with an additional column "replaced_id" which will get the import-id from the import, that transformed the row (deletion is really a change) and also the primary secret is (row_id,replaced_id)
- Every UPDATE copies the row towards the history table before altering it
- Every Remove moves the row towards the history table
This will make rollback super easy:
- Discover the import_id from the bad import
REPLACE INTO main_table SELECT <everything but superseded_id> FROM history table WHERE superseded_id=<bad import id>
DELETE FROM history_table WHERE superseded_id>=<bad import id>
For databases, where performance is a concern, we all do this inside a secondary database on the different server, then copy the found-to-be-good primary table towards the production database right into a new table
main_table_$id with $id being the greatest import id and also have primary_table be considered a trivial view to
SELECT * FROM main_table_$someid. Now by changing the vista to
SELECT * FROM main_table_$newid we are able to atomically swicth the table.