Can you suggest alternative data storage tools and give good reasons to use them instead of good old relational databases? In my opinion, most applications rarely use the full power of SQL, and it would be interesting to see what building an SQL-free application looks like.

Plain text files in a filesystem

  • Very simple to create and edit
  • Easy for users to manipulate with simple tools (e.g. text editors, grep, etc.)
  • Efficient storage of binary documents
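
A minimal sketch of the idea in Python (the `notes/` directory and file name are just examples):

```python
import pathlib

# One record per file: the filename is the key, the file body is the value.
root = pathlib.Path("notes")
root.mkdir(exist_ok=True)

(root / "greeting.txt").write_text("hello from a flat file\n")

# Reading back is just as direct, and grep/text editors work on it too.
text = (root / "greeting.txt").read_text()
```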

XML or JSON files on disk

  • As above, but with a bit more ability to validate the structure.
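
For example, Python's standard `json` module round-trips a record and rejects malformed input, which is the extra structural check you get over plain text (the record and file name are illustrative):

```python
import json

record = {"id": 7, "name": "widget", "tags": ["cheap", "blue"]}

# Write the record to disk...
with open("record.json", "w") as f:
    json.dump(record, f, indent=2)

# ...and read it back; json.load raises an error on malformed input,
# giving a basic structural check that plain text lacks.
with open("record.json") as f:
    loaded = json.load(f)
```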

Spreadsheet / CSV file

  • Very easy model for business users to understand
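
A quick sketch with the standard `csv` module; the same file opens directly in a spreadsheet (column names are made up):

```python
import csv

rows = [
    {"sku": "A-1", "qty": "3"},
    {"sku": "B-2", "qty": "5"},
]

# Write a header row plus the data rows.
with open("stock.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["sku", "qty"])
    writer.writeheader()
    writer.writerows(rows)

# Read the rows back as dicts keyed by the header.
with open("stock.csv", newline="") as f:
    loaded = list(csv.DictReader(f))
```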

Subversion (or similar disk-based version control system)

  • Very good support for versioning of data

Berkeley DB (basically a disk-based hashtable)

  • Very simple conceptually (just untyped key/value)
  • Quite fast
  • No administration overhead
  • Supports transactions, I believe
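
Python's standard `dbm` module exposes the same disk-based hashtable idea without needing a third-party Berkeley DB binding, so as a rough stand-in (file name is arbitrary):

```python
import dbm

# Untyped bytes in, bytes out -- the same key/value model as Berkeley DB.
with dbm.open("cache.db", "c") as db:   # "c" creates the file if needed
    db[b"user:42"] = b"alice"

# Reopen (read-only by default) and look the value up again.
with dbm.open("cache.db") as db:
    value = db[b"user:42"]
```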

Amazon's Simple DB

  • Much like Berkeley DB, I believe, but hosted

Google's App Engine Datastore

  • Hosted and highly scalable
  • Per document key-value storage (i.e. flexible data model)

CouchDB

  • Document focus
  • Simple storage of semi-structured / document-based data

Native language collections (stored in memory or serialised to disk)

  • Very tight language integration
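
In Python, for instance, `pickle` serialises an ordinary in-memory structure straight to disk with no schema or mapping layer (the data here is illustrative):

```python
import pickle

# An ordinary native dict...
inventory = {"apples": 10, "pears": 4}

# ...serialised straight to disk...
with open("inventory.pkl", "wb") as f:
    pickle.dump(inventory, f)

# ...and loaded back as the same native structure.
with open("inventory.pkl", "rb") as f:
    restored = pickle.load(f)
```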

Custom (hand-written) storage engine

  • Potentially very high performance in the required use cases
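
To make the idea concrete, here is a toy append-only key/value log in Python. It is purely a sketch of the "roll your own" approach, not production code:

```python
import json

class AppendOnlyStore:
    """Toy hand-written engine: every write appends one JSON line."""

    def __init__(self, path):
        self.path = path

    def put(self, key, value):
        with open(self.path, "a") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")

    def get(self, key):
        value = None
        with open(self.path) as f:
            for line in f:
                record = json.loads(line)
                if record["k"] == key:
                    value = record["v"]  # last write wins
        return value

store = AppendOnlyStore("toy.log")
store.put("color", "red")
store.put("color", "blue")  # "overwrites" by appending a newer record
```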

I can't claim to know much about them, but you might also like to look into object database systems.

Matt Sheppard's answer is great (mod up), but I would take these factors into account when thinking about any storage mechanism:

  1. Structure: does the data break into pieces naturally, or are you making tradeoffs?
  2. Usage: how will the data be analyzed/retrieved/grokked?
  3. Lifetime: how long is the data useful?
  4. Size: how much data is there?

A particular advantage of CSV files over RDBMSes is that they are easy to compress and move around to practically any other machine. We do large data transfers, and everything is simple enough that we just use one big CSV file, which is easy to script with tools like rsync. To reduce repetition in large CSV files, you could use something like YAML. I'm not sure I'd store anything like JSON or XML, unless you had significant relationship requirements.
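
A sketch of the compress-and-ship step using the standard library (gzip stands in for whatever compressor you prefer; file names and data are illustrative):

```python
import csv
import gzip

rows = [["host", "seconds"], ["web-01", "0.31"], ["web-02", "0.27"]]

# gzip compresses the repetitive text of a big CSV well, and the single
# resulting file is trivial to ship around with rsync or scp.
with gzip.open("transfer.csv.gz", "wt", newline="") as f:
    csv.writer(f).writerows(rows)

# The receiving end just reverses the process.
with gzip.open("transfer.csv.gz", "rt", newline="") as f:
    loaded = list(csv.reader(f))
```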

As far as unmentioned options go, don't discount Hadoop, which is an open-source implementation of MapReduce. This should work well if you have a ton of loosely structured data that needs to be analyzed, and you want to be in a situation where you can just add ten more machines to handle the data processing.
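
A toy word count in the MapReduce shape may help. Hadoop's job is to run exactly these map and reduce steps across many machines, so this is only a single-process illustration of the model:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit (key, 1) for every word.
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: combine all counts for one key.
    return word, sum(counts)

lines = ["error warn error", "warn error"]

# Group intermediate (key, value) pairs by key, as the framework would.
grouped = defaultdict(list)
for line in lines:
    for word, count in mapper(line):
        grouped[word].append(count)

totals = dict(reducer(w, c) for w, c in grouped.items())
```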

For example, I started out trying to analyze performance data that was basically all timing numbers for various functions, logged across around 20 machines. After trying to stick everything in an RDBMS, I realized that I really don't need to query the data again once I have aggregated it, and it's only useful to me in its aggregated form. So I keep the log files around, compressed, and leave the aggregated data in a DB.
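
The aggregate-once pattern looks roughly like this (the timing samples are made up for illustration):

```python
from statistics import mean

# Hypothetical raw samples: (machine, function, seconds).
samples = [
    ("web-01", "render", 0.12),
    ("web-02", "render", 0.20),
    ("web-01", "login", 0.05),
]

# Aggregate once; only this summary needs to live in a database,
# while the raw logs can be archived compressed.
by_function = {}
for _, func, secs in samples:
    by_function.setdefault(func, []).append(secs)

summary = {func: mean(times) for func, times in by_function.items()}
```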

Note that I'm more used to thinking in terms of "big" sizes.

The filesystem is pretty handy for storing binary data, which never works wonderfully in relational databases.
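
For example, raw bytes go straight to a file with no escaping or BLOB columns involved (the bytes here are just sample data):

```python
# The first four bytes of a PNG header, as sample binary data.
payload = bytes([0x89, 0x50, 0x4E, 0x47])

# Write the bytes exactly as they are...
with open("image.bin", "wb") as f:
    f.write(payload)

# ...and read them back unchanged.
with open("image.bin", "rb") as f:
    data = f.read()
```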