Is there an efficient way to create a copy of a table's structure and data in HBase, within the same cluster? The destination table would of course have a different name. What I have found so far:

  1. The CopyTable job, which has been described as a tool for copying data between different HBase clusters. I think it would support intra-cluster operation, but I don't know whether it was designed to handle that scenario efficiently.

  2. Use the export+import jobs. That sounds like a hack, but since I am new to HBase, maybe it is a legitimate solution?

Some of you may be asking why I am trying to do this. My scenario is that I have a very large number of objects that I need access to in a "snapshot" state, if you will. There is a batch load process that runs daily and updates many of these objects. If any step in that batch process fails, I need to be able to "roll back" to the original state. Not only that, during the batch process I need to be able to serve requests against the original state.

So the current flow is that I duplicate the original table to a working copy and continue to serve requests from the original table while I update the working copy. If the batch process completes successfully, I tell my services to use the new table; otherwise I simply discard the new table.
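The switch-over step in this flow can be sketched roughly as follows. This is only an illustration of the "publish the new table on success, keep the old one on failure" idea; the class `ActiveTablePointer`, its methods, and the table names are hypothetical, and copying or dropping the actual tables is left to the caller.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: tracks which table name the services should read from.
final class ActiveTablePointer {
    private final AtomicReference<String> active;

    ActiveTablePointer(String initialTable) {
        this.active = new AtomicReference<>(initialTable);
    }

    // Table name that request handlers should use right now.
    String current() {
        return active.get();
    }

    // Runs the batch against the working copy. Readers are switched to the
    // working copy only if the batch finishes without throwing.
    boolean runBatchAndSwitch(String workingTable, Runnable batch) {
        try {
            batch.run();
        } catch (RuntimeException e) {
            return false; // pointer unchanged; caller discards the working copy
        }
        active.set(workingTable);
        return true;
    }

    public static void main(String[] args) {
        // "objects_v1" and "objects_v2" are made-up table names.
        ActiveTablePointer pointer = new ActiveTablePointer("objects_v1");
        boolean switched = pointer.runBatchAndSwitch("objects_v2", () -> {
            // update the working copy here
        });
        System.out.println(switched + " " + pointer.current());
    }
}
```

Readers call `current()` on every request, so they keep seeing the original table until the batch has finished.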

This has worked fine using BDB, but I am in a whole new world of really large data now, so I may be taking the wrong approach. If anyone has suggestions for patterns I should be using instead, they are more than welcome. :-)

All data in HBase has a timestamp. You can do reads (Gets and Scans) with a parameter indicating that you want the latest version of the data as of a given timestamp. One thing you could do is serve your requests with reads that use this parameter, pointing to a time before the batch process starts. Once the batch completes, bump your read timestamp up to the current state.
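To make the read semantics concrete, here is a toy in-memory model of a single cell. It is not the HBase client API (there, `Get` and `Scan` take a time range through `setTimeRange`, and the exact bound semantics should be checked against the documentation); `VersionedCell`, `readAsOf`, and the timestamps are invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one cell that keeps every written version, keyed by timestamp.
final class VersionedCell {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Newest value written at or before readTimestamp, or null if there is none.
    String readAsOf(long readTimestamp) {
        Map.Entry<Long, String> entry = versions.floorEntry(readTimestamp);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(100L, "original");

        long snapshotTs = 150L; // chosen before the batch starts
        cell.put(200L, "updated by batch");

        // Requests served during the batch still see the old value.
        System.out.println(cell.readAsOf(snapshotTs));
        // After the batch completes, bump the read timestamp forward.
        System.out.println(cell.readAsOf(Long.MAX_VALUE));
    }
}
```

Rolling back then amounts to not bumping the read timestamp, provided the older versions are still retained.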

A few things to be careful about with this approach:

  • HBase tables are configured to keep the most recent N versions of a given cell. If you overwrite the data in a cell with N newer values, you will lose the older value during the next compaction. (You can also configure tables with a TTL to expire cells, but that doesn't sound like it matches your case.)
  • Similarly, if you delete data as part of your process, you won't be able to read it after the next compaction.
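The first caveat can be illustrated with another toy model, assuming a cell limited to N = 2 versions. `BoundedVersionCell` and `compact()` are hypothetical stand-ins, not HBase classes, and real compactions are scheduled by HBase rather than called like this.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of a cell that retains at most maxVersions values after compaction.
final class BoundedVersionCell {
    private final int maxVersions;
    private final TreeMap<Long, String> versions = new TreeMap<>();

    BoundedVersionCell(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Models a compaction: drop the oldest versions beyond maxVersions.
    void compact() {
        while (versions.size() > maxVersions) {
            versions.pollFirstEntry();
        }
    }

    // Newest value written at or before readTimestamp, or null if there is none.
    String readAsOf(long readTimestamp) {
        Map.Entry<Long, String> entry = versions.floorEntry(readTimestamp);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        BoundedVersionCell cell = new BoundedVersionCell(2);
        cell.put(100L, "original");
        long snapshotTs = 150L;

        // The batch overwrites the same cell N = 2 times.
        cell.put(200L, "batch write 1");
        cell.put(300L, "batch write 2");

        System.out.println(cell.readAsOf(snapshotTs)); // still "original"
        cell.compact();
        System.out.println(cell.readAsOf(snapshotTs)); // null: the snapshot value is gone
    }
}
```

In this model, keeping the snapshot readable means writing each cell fewer than N times during the batch, or configuring a larger N.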

So, if you don't issue deletes as part of your batch process, and you don't write more versions of the same data than your table is configured to keep, you can keep serving old requests out of the same table that you are updating. This effectively gives you a snapshot.