I'm searching for pattern, framework or best practice to deal with a normal problem of application level data synchronisation.

Let us take a good example with only one table to really make it simpler.

  • I've an hard to rely on datasource of product catalog. Data can from time to time be not available or incomplete or sporadic. ( problem might originate from manual data entry error, ETL failure...)

  • I've got a live copy inside a Mysql table being used with a live system. Let us say an internet site.

I have to implement safety mecanism when upgrading the mysql table to "synchronize" with original databases. Listed here are the security criteria and also the solution I an recommending:

  • avoid removing records once they temporarily disappear from datasource => use "erased" boulean/date column or perhaps an archive/history table.

  • look for sporadic changes => configure rules per posts for example : should not change, should only increment,

  • look for integrity problem => (standard problem, no reason talking about approach)

  • capability to rollback last sync=> restore from history table ? make use of a version corporation/date column ?

Things I am searching for is better practice and pattern/tool to deal with such problem. Otherwise you aren't pointing towards the solution, I'd be grateful associated with a key phrases suggestion that will me limit which area of expert knowledge to understand more about.

We have a similar problem posting data from web statistics companies - they suffer exactly the same problems as the catalog. This is exactly what we did:

  • Every import/sync is designated a distinctive id (auto_increment int64)
  • Every table includes a history table that's just like the initial, but comes with an additional column "replaced_id" which will get the import-id from the import, that transformed the row (deletion is really a change) and also the primary secret is (row_id,replaced_id)
  • Every UPDATE copies the row towards the history table before altering it
  • Every Remove moves the row towards the history table

This will make rollback super easy:

  • Discover the import_id from the bad import
  • REPLACE INTO main_table SELECT <everything but superseded_id> FROM history table WHERE superseded_id=<bad import id>
  • DELETE FROM history_table WHERE superseded_id>=<bad import id>

For databases, where performance is a concern, we all do this inside a secondary database on the different server, then copy the found-to-be-good primary table towards the production database right into a new table main_table_$id with $id being the greatest import id and also have primary_table be considered a trivial view to SELECT * FROM main_table_$someid. Now by changing the vista to SELECT * FROM main_table_$newid we are able to atomically swicth the table.