I am searching for some design help here.

I am carrying out work for any client that needs me to keep data regarding their hundreds of 1000's of employees. The information has been provided to me in Stand out excel spreadsheets, one for every city/country by which they've offices.

I've got a database that consists of a excel spreadsheets table along with a data table. The information table includes a column spreadsheet_id which links it to the excel spreadsheets table to ensure that I understand which spreadsheet each data row originated from. I in addition have a simple spend script which uploads the information towards the database.

To date so great. However, there's some data missing in the original excel spreadsheets, and rather than giving me only the missing data, the customer is giving us a modified version from the original spreadsheet using the new data appended into it. I am unable to simply overwrite the initial data because the data had been used and you will find other tables that connect to it.

Now you ask , - how do you handle this? It appears in my experience which i possess the following options:

  1. Upload the whole modified spreadsheet, and mark the initial as 'inactive'.

    PROS: It is rather simple, straightforward, and simply automated.
    CONS: There's lots of redundant data being saved within the database unnecessarily, particularly if the spreadsheet changes numerous occasions.

  2. Perform a diff around the excel spreadsheets and just upload the rows that transformed.

    PROS: Less data will get loaded in to the database.
    CONS: It's a minimum of partly manual, and for that reason vulnerable to error. Additionally, it implies that the database will no more tell the whole story - e.g. if some information is missing at some future date, I won't have the ability to authoritatively state that I never got the information simply by querying the database. And can doing diffs continue working even when I must get it done multiple occasions?

  3. Write a procedure that compares each spreadsheet row with what's within the database, card inserts the rows which have transformed data, and sets the initial data row to inactive. (I must keep an eye on the initial data also, and so i can't overwrite it.)

    PROS: It's automated.
    CONS: It will require time for you to write and test this type of process, and it'll be very hard that i can justify time spent doing this.

I am wishing to develop a 4th and solution. Any ideas in regards to what that could be?

For those who have not a way to become one hundred percent certain you are able to avoid human error in option 2, do not do it.

Option 3: It shouldn't be too hard (or time intensive) to create a VBA script that does the comparison for you personally. VBA isn't fast, however, you can turn it on over evening. Shouldn't take several or two hrs to have it running error free.

Option 1: This is my preferred approach: Fast, simple, and that i can't think about something that may go wrong at this time. (Well, you need to first mark the initial as 'inactive', then upload the brand new data set IMO). Particularly if this could happen more frequently later on, getting a reliable and fast process to cope with it is necessary.

If you're really concerned about all of the inactive records, you may also remove them after your update (delete from spreadsheets where status='inactive' or somesuch). But to date, all databases I've observed in my work had plenty of individuals. I would not worry an excessive amount of about this.