I need a fast, reliable and memory-efficient key--value database for Linux. My keys are about 128 bytes, and the maximum value size can be 128K or 256K. The database subsystem should not use much more than about 1 MB of RAM. The whole database is 20 GB (!), but only a small random fraction of the data is accessed at any given time. If necessary, I can move some data blobs out of the database (to regular files), so the database size gets down to 2 GB maximum. The database must survive a system crash without losing data that was not modified recently. I'll have about 100 times more reads than writes. It is a plus if it can use a block device (without a filesystem) as storage. I don't need client-server functionality, just a library. I need Python bindings (but I can implement them myself if they are not available).
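To make the workload concrete, here is a minimal sketch of the access pattern I have in mind. The kvstore module, the open_store function and the sync method are placeholders for whatever library I end up with, not a real API:

    import binascii
    import os

    import kvstore  # placeholder module, not a real package

    # The store could live in a file or, ideally, directly on a block device.
    db = kvstore.open_store('/var/lib/myapp/data.db')

    key = binascii.hexlify(os.urandom(64))   # keys are about 128 bytes
    value = 'x' * (200 * 1024)               # values are up to 128K or 256K

    db[key] = value        # writes are rare: about 1 in 100 operations
    data = db[key]         # reads dominate: about 99 in 100 operations

    db.sync()              # after this returns, a system crash must not lose the write
    db.close()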

Which solutions should I consider, and which one do you recommend?

Candidates I know of that could work:

  • Tokyo Cabinet (Python bindings are pytc, see also the pytc example code; supports hashes and B+trees, transaction log files and more; the size of the bucket array is fixed at database creation time; the writer must close the file to give other processes a chance; lots of small writes with reopening the file for each of them are very slow; the Tyrant server can help with the many small writes; speed comparison between Tokyo Cabinet, Tokyo Tyrant and Berkeley DB)
  • VSDB (safe even on NFS, without locking; but what about barriers?; updates are very slow, though not as slow as in cdb; last release in 2003)
  • BerkeleyDB (provides crash recovery; provides transactions; the bsddb Python module provides bindings; see the sketch after this list)
  • Samba's TDB (with transactions and Python bindings; some users experienced corruption; sometimes mmap()s the whole file; the repack operation sometimes doubles the file size; produces mysterious failures if the database is larger than 2 GB (even on 64-bit systems); a cluster implementation (CTDB) is also available; the file grows too large after lots of modifications; the file becomes too slow after lots of hash contention; no built-in way to rebuild the file; very fast parallel updates by locking individual hash buckets)
  • aodbm (append-only, so it survives a system crash; with Python bindings)
  • hamsterdb (with Python bindings)
  • C-tree (mature, versatile commercial solution with high performance; has a free edition with reduced functionality)
  • the old TDB (from 2001)
  • bitcask (log-structured, written in Erlang)
  • various other DBM implementations (such as GDBM, NDBM, QDBM, Perl's SDBM or Ruby's; they probably don't have proper crash recovery)
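Of the candidates above, the bsddb bindings for BerkeleyDB are the closest to a ready-made dict-like interface. Below is a minimal sketch using the simple hash API from the Python 2 standard library; note that this simple API does not give crash recovery or transactions, which need the lower-level bsddb.db environment API instead:

    import bsddb

    # 'c' creates the database file if it does not exist yet.
    db = bsddb.hashopen('/tmp/example.db', 'c')

    db['some-128-byte-key'] = 'a value of up to 128K or 256K'
    value = db['some-128-byte-key']

    db.sync()    # flush dirty pages to disk
    db.close()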

I won't use these:

  • MemcacheDB (client-server, uses BerkeleyDB as a backend)
  • cdb (needs to regenerate the whole database upon each write)
  • http://www.wildsparx.com/apbcdb/ (ditto)
  • Redis (keeps the whole database in memory)
  • SQLite (it becomes very slow without periodic vacuuming, see the autocompletion in the location bar in Firefox 3; beware: small write transactions can be very slow; beware: if a busy process is doing many transactions, other processes starve and can never get the lock; see the sketch after this list)
  • MongoDB (too heavy-weight, treats values as objects with internal structure)
  • Firebird (SQL-based RDBMS, too heavy-weight)
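To illustrate the SQLite point above, here is a rough sketch with the standard sqlite3 module; the file name and table layout are made up, and real keys and values would be BLOBs rather than TEXT. Even used as a plain key-value table, the file needs periodic VACUUM (or auto_vacuum set before the schema exists) to stay compact, and each tiny write is its own synced transaction unless writes are batched:

    import sqlite3

    conn = sqlite3.connect('/tmp/kv.sqlite')
    # auto_vacuum only takes effect if set before the first table is created.
    conn.execute('PRAGMA auto_vacuum = FULL')
    conn.execute('CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)')

    # One implicit transaction per tiny write is slow; batching amortizes the sync cost.
    with conn:
        for i in range(100):
            conn.execute('INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)',
                         ('key-%d' % i, 'x' * 1024))

    row = conn.execute('SELECT v FROM kv WHERE k = ?', ('key-1',)).fetchone()

    conn.execute('VACUUM')   # periodic compaction; without it the file keeps growing
    conn.close()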

FYI, a recent article about key--value databases in the Linux magazine.

FYI, an older software list

FYI, a speed comparison of MemcacheDB, Redis and Tokyo Cabinet Tyrant

Related questions on StackOverflow: