I’m creating revision control for data within our database. It will have a way to keep revisions, rollback, and rollback rollbacks. The database tables I’m using that should be revisioned is below:
objects object_portions object_characteristics
Objects may be the primary object, portions are arranged up parts of the item, and characteristics are characteristics of information in the chunk. Characteristics stores the item ID together with the chunk ID this way It's very easy to choose all of the characteristics to have an object without needing to do another JOIN towards the portions table.
The only real factor which will ever really change is characteristics, however when a characteristic changes, the affected chunk(s) is going to be up-to-date, and each time a chunk is up-to-date, the item will get up-to-date also. Now I’ve considered two techniques used in fixing this issue.
- Create three new tables having a suffix of _rev, these tables would simply store older versions from the objects. The actual objects would also store a rev number. So allows say I transformed three different characteristics, these characteristics spanned across three portions, so three new rows in portions, three in characteristics, and something in object for revisions. As this is the very first change, the rev ID could be 1, within the real tables, their rev could be 2.
- I’d function the above mentioned, but rather than getting another table, I’d simply store it within the same table.
One factor to notice, there'll always be revisions, the quantity of portions can differ from 1 to 100+. Even though average is about 1-15. The characteristics can differ from to 100+. The typical is most likely around 30. EVERY attribute Can change. These objects get subjected to a “phase” where all characteristics should be completed by customers. Once they’re filled, the item is aged rather than modified again. All objects possess a corresponding file. So object will keep current hash (sha256) from the file also. This hash can be used for deduplication reasons.
Adding a revision id towards the primary key from the objects table is certainly what you want. You could have multiple active revisions without having to move data between tables. Choosing multiple tables, you'll find it hard to write rollback calculations moving data around while keeping integrity constraints--especially difficult when the product is underactive development.
If revisions are produced in human time, an easy time stamp may go like a revision id. Otherwise only use an integer because the revision number--I have implemented Resumes style dotted revision amounts and wanted I had not. You are able to track derivation history inside a separate table if people request for that feature later.
objects object_chunks revision object_attributes
Where revision is definitely an growing number, you can simply choose the objects with max(revision) grouping by object, object_portions later on.