I just read in an article called "Hands-on Cassandra" that Tokyo Cabinet handles large data poorly. Why? How many bytes must TC store before it starts to perform badly? Is there an easy way to estimate that value?
According to this article, there is confirmed performance degradation past 500 GB.
According to this broad comparison of NoSQL databases, the issues in TC start at more than 20 million rows.
One possible reason for the size dependency is that TC is implemented using hash tables, so sooner or later you run into hash key collisions, which obviously ruins performance. By default the key space isn't as large as it could be; you have to tune the "bnum" parameter (the number of elements in the bucket array) to improve performance.
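A toy model (in Python, not using TC itself) can illustrate why the bucket-array size matters: once the number of stored keys greatly exceeds the bucket count, collisions are unavoidable and lookups degrade from O(1) toward scanning long chains. The key names and sizes below are made up for illustration.

```python
import zlib

def collision_stats(num_keys, bnum):
    """Hash num_keys keys into bnum buckets; return (collisions, longest_chain)."""
    buckets = [0] * bnum
    for i in range(num_keys):
        # crc32 stands in for TC's internal hash function (an assumption,
        # chosen only because it is deterministic and built into the stdlib)
        buckets[zlib.crc32(f"key-{i}".encode()) % bnum] += 1
    collisions = sum(c - 1 for c in buckets if c > 1)
    longest_chain = max(buckets)
    return collisions, longest_chain

# Small bucket array: 10x more keys than buckets guarantees long chains.
small = collision_stats(num_keys=100_000, bnum=10_000)
# bnum sized ~2x the key count keeps chains short.
large = collision_stats(num_keys=100_000, bnum=200_000)
print("bnum=10k:", small, " bnum=200k:", large)
```

This is the same trade-off behind tuning bnum in TC: the bucket array is sized at database creation, so it should be provisioned for the expected number of records rather than left at the default.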
According to various evaluations, MongoDB appears to be the recommended option for large datasets.