It appears that this is supported in
Hadoop (reference), but I don't understand how to use it.
I want to:
a.) Map - read a huge XML file, load the relevant data, and pass it on to reduce
b.) Reduce - write two .sql files for different tables
The reason I'm choosing map/reduce is that I have to do this for more than
100k (maybe many more) XML files residing on disk. Better suggestions are welcome.
Any resources/tutorials explaining how to use this would be appreciated.
I am working in Python and would like to learn how to accomplish this with it.
It may not be an elegant solution, but you could create two templates that convert the output of the reduce tasks into the required formats once the job is done. Much of this can be automated by writing a shell script that looks for the reduce outputs and applies the templates to them. With the shell script, the transformation happens sequentially and does not take advantage of the n machines in the cluster.
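As a rough illustration of the template idea, here is a minimal sketch written in Python rather than as a shell script. It assumes the reduce outputs are `part-*` files containing tab-separated `key<TAB>value` lines; the table names, column names, and the `apply_templates` helper are all hypothetical and only there to show the shape of the post-processing step.

```python
import glob
import os

# Hypothetical templates, one per target .sql file.
# Table and column names are assumptions made for illustration.
TEMPLATES = {
    "table_a.sql": "INSERT INTO table_a (id, value) VALUES ('{key}', '{value}');",
    "table_b.sql": "INSERT INTO table_b (ref_id) VALUES ('{key}');",
}


def sql_quote(text):
    """Escape single quotes so the text is safe inside a SQL string literal."""
    return text.replace("'", "''")


def apply_templates(result_dir, out_dir):
    """Read every part-* file in result_dir and write one .sql file per template."""
    part_files = sorted(glob.glob(os.path.join(result_dir, "part-*")))
    for out_name, template in TEMPLATES.items():
        with open(os.path.join(out_dir, out_name), "w") as out:
            for path in part_files:
                with open(path) as part:
                    for line in part:
                        # Assumed record layout: key, a tab, then the value.
                        key, _, value = line.rstrip("\n").partition("\t")
                        out.write(template.format(key=sql_quote(key),
                                                  value=sql_quote(value)) + "\n")
```

This runs on a single machine after the job finishes, so it has the same limitation mentioned above: the formatting work is sequential rather than spread over the cluster.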
Otherwise, within the reduce tasks you could write both output formats into a single file, separated by some delimiter, and split them later using that delimiter. In this approach, because the transformation happens in the reduce, the work is spread across all the nodes in the cluster.
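The delimiter approach might look roughly like the sketch below. It assumes the reducer receives tab-separated `key<TAB>value` lines (as a Python reducer reading standard input would) and uses a tab as the delimiter between a table tag and the SQL statement. The function names, table names, and columns are hypothetical.

```python
DELIM = "\t"  # assumed delimiter between the table tag and the SQL statement


def sql_quote(text):
    """Escape single quotes so the text is safe inside a SQL string literal."""
    return text.replace("'", "''")


def reduce_lines(lines):
    """For each input record, yield one tagged SQL statement per target table."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        key, value = sql_quote(key), sql_quote(value)
        yield "table_a" + DELIM + (
            "INSERT INTO table_a (id, value) VALUES ('%s', '%s');" % (key, value))
        yield "table_b" + DELIM + (
            "INSERT INTO table_b (ref_id) VALUES ('%s');" % key)


def split_by_tag(lines):
    """Group tagged lines back into one list of statements per table."""
    by_table = {}
    for line in lines:
        # Split on the first delimiter only; the statement may contain tabs.
        tag, _, statement = line.rstrip("\n").partition(DELIM)
        by_table.setdefault(tag, []).append(statement)
    return by_table
```

In a real reducer script you would feed `sys.stdin` to `reduce_lines` and print each result. After the job, `split_by_tag` can be run over the combined output and each list written to its own .sql file. The splitting step is cheap, while the statement formatting already happened in parallel in the reducers.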