This is related to my other question, when to move from a spreadsheet to an RDBMS.
Having made the decision to move from an Excel workbook to an RDBMS, here is what I propose to do.
The existing data is loosely structured across two sheets in a workbook. The first sheet contains the main record. The second sheet allows additional data.
My target DBMS is MySQL, but I am open to suggestions.
- Define RDBMS schema
- Define, say, web services to interface with the database, so that the same services can be used for both the UI and the migration.
- Define a migration script to
  - Read each group of related rows in the spreadsheet
  - Apply validation/constraints
  - Post to the RDBMS using the web service
- Define macros/functions/modules in the spreadsheet to enforce validation where possible. This will allow use of the existing system while the new one comes up. At the same time (I hope) it will reduce migration failures when the move is eventually made.
What approach would you take?
There are two aspects to this question.
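The migration-script steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the column names `id` and `detail`, the two validation rules, and the `post` callable (standing in for whatever web-service call is eventually defined) are all hypothetical.

```python
import csv
import io
from itertools import groupby


def validate(record):
    """Return a list of problems found in one grouped record (empty if clean)."""
    problems = []
    if not record["id"].strip():
        problems.append("missing id")
    if not record["details"]:
        problems.append("no detail rows")
    return problems


def migrate(rows, post):
    """Group rows by id, validate each group, and hand clean groups to post()."""
    failures = []
    # groupby only merges adjacent rows, so sort by the grouping key first.
    ordered = sorted(rows, key=lambda r: r["id"])
    for key, group in groupby(ordered, key=lambda r: r["id"]):
        record = {"id": key, "details": [r["detail"] for r in group if r["detail"]]}
        problems = validate(record)
        if problems:
            failures.append((key, problems))
        else:
            post(record)  # hypothetical stand-in for the web-service call
    return failures


# Hypothetical sample export; a real run would read the exported sheet instead.
sample = "id,detail\nA1,first visit\nA1,follow-up\n,orphan note\nB2,\n"
rows = list(csv.DictReader(io.StringIO(sample)))

sent = []
failures = migrate(rows, sent.append)
print(sent)
print(failures)
```

Passing `post` in as a parameter keeps the script independent of how the web service ends up being exposed, and the returned failure list is what would feed back into fixing the spreadsheet before the real move.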
The first thing is to "Define RDBMS schema", but how far are you going to go with it? Spreadsheets are notoriously un-normalized and so contain lots of duplication. You say in your other question that "Data is loosely structured, and there are no explicit constraints." If you want to transform that into a rigorously defined schema (at least 3NF) then you are going to have to do some cleansing. SQL is the best tool for data manipulation.
I suggest you build two staging tables, one for each worksheet. Define the columns as loosely as possible (big strings, basically) so that it is easy to load the spreadsheets' data. Once you have the data loaded into the staging tables you can run queries to assess the data quality:
- how many duplicate primary keys are there?
- how many different data formats are there?
- what are the look-up codes?
- do all the rows in the second worksheet have parent records in the first?
- how consistent are code formats, data types, etc.?
- and so on.
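A few of these checks might look like the sketch below. It uses Python's built-in `sqlite3` module purely as a convenient stand-in for the target RDBMS; the table names (`stg_main`, `stg_detail`), the columns, and the sample rows are invented for illustration. The `GLOB` pattern is SQLite-specific, so another database would need its own pattern-matching operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging tables: one per worksheet, every column a loose TEXT field.
conn.executescript("""
    CREATE TABLE stg_main   (record_id TEXT, name TEXT, joined TEXT);
    CREATE TABLE stg_detail (record_id TEXT, note TEXT);
""")
conn.executemany("INSERT INTO stg_main VALUES (?, ?, ?)", [
    ("1", "Alice", "2009-01-15"),
    ("2", "Bob", "15/01/2009"),
    ("2", "Bob", "2009-01-15"),
])
conn.executemany("INSERT INTO stg_detail VALUES (?, ?)", [
    ("1", "first note"),
    ("9", "note with no parent"),
])

# How many duplicate primary keys?
duplicate_keys = conn.execute("""
    SELECT record_id, COUNT(*) FROM stg_main
    GROUP BY record_id HAVING COUNT(*) > 1
""").fetchall()

# Do all rows in the second worksheet have a parent in the first?
orphans = conn.execute("""
    SELECT d.record_id FROM stg_detail d
    LEFT JOIN stg_main m ON m.record_id = d.record_id
    WHERE m.record_id IS NULL
""").fetchall()

# Which dates are not in YYYY-MM-DD form?
non_iso_dates = conn.execute("""
    SELECT record_id, joined FROM stg_main
    WHERE joined NOT GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'
""").fetchall()

print(duplicate_keys, orphans, non_iso_dates)
```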
These investigations will give you a good basis for writing the SQL with which you will populate your actual schema.
Or it might be that the data is so hopeless that you decide to stick with just the two tables. I think that is an unlikely outcome (most applications have some underlying structure, we just have to dig deep enough).
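As a rough sketch of what that populating SQL could look like, the example below copies cleaned, de-duplicated rows from a staging table into a small normalized schema with `INSERT ... SELECT`. SQLite (via Python's `sqlite3`) stands in for the real database, and the `region` and `person` tables, their columns, and the cleaning rules (trim whitespace, lower-case the region code) are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Loosely typed staging table holding the raw spreadsheet rows.
    CREATE TABLE stg_main (record_id TEXT, name TEXT, region TEXT);
    INSERT INTO stg_main VALUES
        ('1', ' Alice ', 'north'),
        ('2', 'Bob',     'North'),
        ('2', 'Bob',     'North'),
        ('3', 'Carol',   'south');

    -- Target schema: look-up codes get their own table.
    CREATE TABLE region (
        region_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        region_id INTEGER NOT NULL REFERENCES region(region_id)
    );

    -- Extract the distinct look-up codes, normalizing case and whitespace.
    INSERT INTO region (name)
    SELECT DISTINCT lower(trim(region)) FROM stg_main ORDER BY 1;

    -- Load the main records, dropping exact duplicates.
    INSERT INTO person (person_id, name, region_id)
    SELECT DISTINCT CAST(s.record_id AS INTEGER), trim(s.name), r.region_id
    FROM stg_main s
    JOIN region r ON r.name = lower(trim(s.region));
""")

people = conn.execute("""
    SELECT p.person_id, p.name, r.name
    FROM person p JOIN region r ON r.region_id = p.region_id
    ORDER BY p.person_id
""").fetchall()
print(people)
```

`SELECT DISTINCT` only removes rows that are identical after cleaning. Duplicate keys with conflicting values would still violate the primary key, which is why the quality checks on the staging tables come first.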
Your best bet is to export the spreadsheets to CSV format. Excel has a wizard to do this. Use it (rather than doing Save As...). If the spreadsheets contain any free text at all, the chances are you will have sentences that contain commas, so make sure you choose a really safe separator, for example
Most RDBMS tools have a facility to import data from CSV files. PostgreSQL and MySQL are the obvious choices for an NGO (I presume cost is a consideration), but both SQL Server and Oracle come in free (if restricted) Express editions. SQL Server obviously has the best integration with Excel. Oracle has a nifty feature called external tables, which lets us define a table where the data resides in a CSV file, removing the need for staging tables.
Another thing to consider is Google App Engine. This uses Bigtable rather than an RDBMS, but that might be more suitable for your loosely structured data. I suggest it because you mentioned Google Docs as an alternative solution. GAE is an attractive option because it is free (more or less, they start charging if usage exceeds some very generous thresholds) and it would solve the problem of sharing the application with those other NGOs. Obviously your organisation may have some qualms about Google hosting their data. It depends on what area they are operating in, and on the sensitivity of the information.
You may be doing more work than you need to. Excel spreadsheets can be saved as CSV or XML files, and many RDBMS clients support importing these files directly into tables.
This could allow you to skip writing web service wrappers and migration scripts. Your database constraints would still be enforced during any import. If your RDBMS data model or schema is very different from your Excel spreadsheets, however, then some translation would of course have to take place via scripts or XSLT.
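To illustrate the point about constraints being enforced during an import, here is a minimal sketch that loads CSV rows straight into a constrained table and collects the rows the database rejects. SQLite via Python's `sqlite3` is used only as a stand-in; the `member` table, its constraints, and the sample data are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE member (
        member_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL CHECK (length(trim(name)) > 0),
        age       INTEGER CHECK (age BETWEEN 0 AND 130)
    )
""")

# Hypothetical CSV export of a worksheet, with a few bad rows mixed in.
exported = "member_id,name,age\n1,Alice,34\n1,Alicia,35\n2,,40\n3,Carol,250\n4,Dan,\n"

rejected = []
for row in csv.DictReader(io.StringIO(exported)):
    age = int(row["age"]) if row["age"] else None  # blank cell becomes NULL
    try:
        conn.execute(
            "INSERT INTO member VALUES (?, ?, ?)",
            (int(row["member_id"]), row["name"], age),
        )
    except sqlite3.IntegrityError as exc:
        # Duplicate key or failed CHECK: the database refuses the row.
        rejected.append((int(row["member_id"]), str(exc)))

loaded = conn.execute(
    "SELECT member_id, name, age FROM member ORDER BY member_id"
).fetchall()
print(loaded)
print(rejected)
```

A bulk-import tool in a real RDBMS client would report failures in its own way, but the principle is the same: rows that break the schema's rules never reach the table.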