Apologies if this question is basic (I'm a newcomer to NoSQL). Essentially, I have a large mathematical computation that I'm splitting up so that different servers each process a piece of it and send the end result to an HBase database. Each server doing the computation is also an HBase region server, with Thrift running on it.
I was thinking of having each server process its data and then update HBase locally (via Thrift). I'm not sure if this is a good approach, because I don't fully understand how the master node will handle the upload/splitting.

I'm wondering what the best practice is when uploading large amounts of data (in total I suspect it will be in the millions of rows). Is it okay to send it to the region servers, or should everything go through the master?
From this blog post:
The general flow is that a new client contacts the ZooKeeper quorum (a separate cluster of ZooKeeper nodes) first to find a particular row key. It does so by retrieving from ZooKeeper the server name (i.e. host name) that hosts the -ROOT- region. With that information it can query that server to get the server that hosts the .META. table. Both of these details are cached and only looked up once. Lastly it can query the .META. server and retrieve the server that has the row the client is looking for.
Once it has been told where the row resides, i.e. in which region, it caches this information as well and contacts the HRegionServer hosting that region directly. So over time the client has a pretty complete picture of where to get rows from without needing to query the .META. server again.
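The caching behavior described above can be modeled with a small toy class (hypothetical names, not HBase's real classes): each region covers a range of row keys, and the client remembers the region-to-server mapping after the first lookup, so later requests for rows in the same region skip the .META. round trip.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the client-side region location cache described in the
// quoted text. This is a simplified sketch for illustration only.
public class RegionLocator {
    // Simulated .META. table: sorted map from a region's start row key
    // to the host name of the region server that serves it.
    private final TreeMap<String, String> metaTable = new TreeMap<>();
    // Client-side cache, filled lazily as rows are looked up.
    private final Map<String, String> cache = new HashMap<>();
    private int metaLookups = 0;

    public void addRegion(String startKey, String host) {
        metaTable.put(startKey, host);
    }

    // Returns the host serving the region that contains rowKey.
    public String locate(String rowKey) {
        String startKey = metaTable.floorKey(rowKey);
        if (startKey == null) startKey = metaTable.firstKey();
        String cached = cache.get(startKey);
        if (cached != null) return cached;  // no .META. round trip needed
        metaLookups++;                      // simulated .META. query
        String host = metaTable.get(startKey);
        cache.put(startKey, host);
        return host;
    }

    public int metaLookupCount() { return metaLookups; }
}
```

After the first row in a region is located, every other row in that region resolves from the cache, which is why the client's picture of the cluster gets "pretty complete" over time.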
I'm assuming you use the Thrift interface directly. In that case, even if you send a mutation to a particular region server, that region server only acts as a client. It will contact the ZooKeeper quorum, contact the Master to find the regions to write the data to, and proceed in the same way as if the write had come from anywhere else.
Is it okay to send it to the region servers, or should everything go through the master?

Both are the same. There is no such thing as writing directly to a region server: the Master must be contacted to determine which region to write the output to.
If you are using a Hadoop map-reduce job, and using the Java API for that MapReduce job, then you can use the [cde] to write straight to HFiles without going through the HBase API. It's about ~10x faster than using the API.
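As a hedged sketch of that bulk-load path, assuming the placeholder above refers to HBase's `HFileOutputFormat2` (the class name is my assumption; the original placeholder is unreadable), a job driver might be configured like this. This is a configuration outline, not a tested program; it needs a running cluster, and `MyPutMapper` is a hypothetical mapper that emits `Put`s for each input line.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hfile-bulk-load");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(MyPutMapper.class); // hypothetical mapper emitting Puts
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            TableName table = TableName.valueOf("results"); // assumed table name
            // Configures partitioning and sorting so each reducer writes
            // HFiles aligned with the table's current region boundaries.
            HFileOutputFormat2.configureIncrementalLoad(
                job,
                conn.getTable(table),
                conn.getRegionLocator(table));
        }
        // After the job finishes, hand the HFiles to the region servers, e.g.:
        //   hbase completebulkload <output-path> results
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The speedup comes from skipping the normal write path (WAL and memstore) entirely: the job writes the data in HBase's on-disk format, pre-sorted per region, and the files are then adopted by the region servers in one step.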