Although I haven't used any of the new NoSQL databases, I have tried to keep myself informed by reading Wikipedia articles and blogs, and by peeking into the documentation of a few NoSQL DBs.

I have just (re)read the August 2009 edition of php|architect, specifically the article on non-relational databases, and a couple of questions popped up in my mind. I know the article is fairly light on them, but it was enough to get me confused...

CouchDB

My main question about CouchDB is: why so much hype? From what I understood, CouchDB provides a web service that lets you create databases and documents inside a database. The documents can have several JSON-encoded attributes, and there are special _id and _rev attributes for tracking revisions of the document.

I really don't get all the fuss about this. Some years ago, for a pet project, I coded a similar (?) system for storing documents, and the structure was something like this:

documents/
  document-name/
    (revision) timestamp/
      (contents) md5-hash.txt
        PHP Serialized Data

I am sure I am missing something very fundamental; otherwise (from the point of view of a PHP developer) this would have the same benefits as CouchDB and be faster, since there is no need to encode and decode JSON.


Amazon SimpleDB

Now this one really gets my head spinning... The author (Russell Cruz) gives the following example:

$sdb->putAttributes('phparch', 'may', array('title' => array('value' => 'May 2009'), 'have' => array('value' => false)));
$sdb->putAttributes('phparch', 'june', array('title' => array('value' => 'June 2009'), 'have' => array('value' => true)));
$sdb->putAttributes('phparch', 'july', array('title' => array('value' => 'July 2009'), 'have' => array('value' => true)));

Then he says that Amazon now supports an SQL-like interface and executes the following query:

$sdb->select('phparch', 'SELECT * FROM phparch WHERE have = "1"');

He doesn't give any similar example of how to run that query in CouchDB (he leaves some hints about Views and Map/Reduce, though), but I suppose it's also possible. So my real question is: how exactly do Amazon SimpleDB and CouchDB do it?

My first guess is that they open all the documents (possibly in a distributed environment) and then use a reduce operation to filter out the documents whose attributes don't match the search criteria. But wouldn't this be excessively expensive (CPU and disk I/O), even with parallel computing?


I know I am disregarding some important things like distribution, consistency and so on, but I am just trying to grasp the very basic inner workings of NoSQL stores.

PS: Also, can anybody explain to me why both CouchDB and Amazon SimpleDB are built with Erlang?

The fuss around NoSQL comes down to indexing, availability, and scalability. Indexing is what lets the document-oriented stores avoid opening every document when you want only the ones where have = 1. Availability and scalability allow these systems to scale out easily and stay robust in the face of unreliable hardware.
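To make the indexing point concrete, here is a minimal in-memory sketch in Python. It is not how CouchDB or SimpleDB are actually implemented (real stores keep their indexes on disk and spread them across machines); the names documents, full_scan, build_index and put are made up for illustration. It only contrasts the "open every document" approach guessed at in the question with a lookup in an index that is maintained at write time.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the 'phparch' items from the question.
documents = {
    "may":  {"title": "May 2009",  "have": False},
    "june": {"title": "June 2009", "have": True},
    "july": {"title": "July 2009", "have": True},
}

def full_scan(docs, attribute, value):
    """Open every document and test it: the cost grows with the number of documents."""
    return sorted(doc_id for doc_id, doc in docs.items() if doc.get(attribute) == value)

def build_index(docs, attribute):
    """Build the index once, mapping each attribute value to the ids of matching documents."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if attribute in doc:
            index[doc[attribute]].add(doc_id)
    return index

def put(docs, index, attribute, doc_id, doc):
    """Store a document and update the index at write time, so reads never rescan."""
    old = docs.get(doc_id)
    if old is not None and attribute in old:
        index[old[attribute]].discard(doc_id)  # drop the stale index entry
    docs[doc_id] = doc
    if attribute in doc:
        index[doc[attribute]].add(doc_id)

have_index = build_index(documents, "have")
put(documents, have_index, "have", "august", {"title": "August 2009", "have": True})

# Answering "have = true" is now a single index lookup instead of a scan.
matches = sorted(have_index[True])
```

The trade-off is the usual one: writes do a little extra work to keep the index current, and in exchange a query touches only the index entry for the value it asks about.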

Erlang was designed for multi-processor systems, and so it is a good fit for distributed systems too.