I have been assigned a task for a web application where I have to take genres of varying names (Horror, Comedy, and so forth) arriving from various sources (Facebook Open Graph, XML feeds, etc.) and reduce them to a single "master" table.

A good example: there is a genre called "Action", and our feeds include a genre named "Action/Adventure". I'd rather the "Action/Adventure" films be assigned to the "Action" master genre.

I thought of writing a hard-coded hash map. We do something similar with languages:

languages = { "en_US" => "English", "en_GB" => "English" }

Does anybody know of an easier method? Perhaps I should rely on a lookup table in the database? Cheers!

I think the key here is to make sure you preserve the original designation, together with the source identifier. You can then create a map between the source designation and the preferred target designation, and finally decide whether to convert once statically, or dynamically in queries/views.
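
A rough sketch of that idea in Ruby, in the same spirit as the languages hash from the question. The source names ("facebook", "xml_feed"), the sample genres, and the method names are made up for illustration; the same pairs could just as well live in a lookup table in the database.

```ruby
# Hypothetical mapping keyed by [source identifier, original genre name].
GENRE_MAP = {
  ["facebook", "Action/Adventure"] => "Action",
  ["xml_feed", "Action & Adventure"] => "Action",
  ["xml_feed", "Scary"] => "Horror"
}.freeze

# Returns the master genre, or nil when no mapping exists yet, so an
# unmapped name can be flagged for review instead of silently guessed.
def master_genre(source, original_name)
  GENRE_MAP[[source, original_name]]
end

# Keeps the original designation and source alongside the mapped value.
def normalize(record)
  record.merge(master_genre: master_genre(record[:source], record[:genre]))
end

row = normalize({ source: "facebook", genre: "Action/Adventure" })
```

Because the original genre and the source stay on the record, the mapping can be corrected later and re-applied without going back to the feeds.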

I think it depends on what you are using to pull in most of the data. Randy is right that you should probably preserve the original data in some way, although that doesn't have to be in your production database - it could be in some interim form, such as text files or a staging database.

At the risk of going slightly off-topic...

Usually when I am doing complex ETL from multiple sources I use a two-step process. The first step is to consolidate all the inputs into one format. This can be CSV or XML files, or it can be a staging database.

After that I have a second process that loads the data into the production or master database. The advantage here is that once you have your import (or "load") code complete, with all the potentially complicated business logic, etc., you rarely have to touch it (and potentially break it) again. If a new data source comes online you just need to write a new bit of code to get it into your universal format. Once it's in that format, you know your import process will handle it properly.
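
To make the two steps concrete, here is a minimal Ruby sketch. Everything in it is an assumption for illustration: the `UniversalRow` shape, the adapter names, the input field names, and the sample rows. An in-memory Array stands in for both the interim format and the master database.

```ruby
# Universal row shape shared by every source (an assumption of this sketch).
UniversalRow = Struct.new(:source, :title, :genre, keyword_init: true)

# Step 1: one small adapter per source. The input shapes are invented.
def from_facebook(items)
  items.map { |i| UniversalRow.new(source: "facebook", title: i["name"], genre: i["category"]) }
end

def from_xml_feed(entries)
  entries.map { |e| UniversalRow.new(source: "xml_feed", title: e[:title], genre: e[:genre]) }
end

# Step 2: a single loader that only understands UniversalRow. A real one
# would write to the master database; here an Array stands in for it.
def load_into_master(rows, master = [])
  rows.each { |r| master << r.to_h }
  master
end

staged = from_facebook([{ "name" => "Example Film A", "category" => "Action/Adventure" }]) +
         from_xml_feed([{ title: "Example Film B", genre: "Horror" }])
master = load_into_master(staged)
```

Adding a new source then means writing one more `from_...` adapter; `load_into_master` stays untouched.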

Again though, how you translate (or "transform") data will depend on what you are using for the heavy lifting in your ETL system. If you are using a staging database then it makes sense to put the mapping into a table. You typically do the transformation when going from the raw data to the universal format.