Although I have not used the new NoSQL databases, I have tried to keep myself informed by reading Wikipedia articles and blogs, and by peeking into the documentation of a few NoSQL DBs.
I have just (re)read the August 2009 edition of phparchitect, in particular the article on Non-Relational Databases, and a couple of questions popped up in my mind. I realize the article is fairly light on them, but it was enough to get me confused...
My main question regarding CouchDB is: why so much hype? From what I understood, CouchDB provides a Web Service that lets you create databases and documents within a database. The documents can have several JSON-encoded attributes, and there is a special _rev attribute for tracking revisions of the document.
I really don't get all the fuss about this. A few years ago, for a pet project, I coded a similar (?) system for storing documents, and the structure was something like this:
documents/
    document-name/
        (revision) timestamp/
            (contents) md5-hash.txt    PHP Serialized Data
I am sure I am missing something very fundamental; otherwise (from the point of view of a PHP developer) this would have the same benefits as CouchDB and be faster, since there is no need to encode and decode JSON.
Amazon SimpleDB
Now this one really gets my head spinning... The author (Russell Cruz) gives the following example:
$sdb->putAttributes('phparch', 'may', array('title' => array('value' => 'May 2009'), 'have' => array('value' => false)));
$sdb->putAttributes('phparch', 'june', array('title' => array('value' => 'June 2009'), 'have' => array('value' => true)));
$sdb->putAttributes('phparch', 'july', array('title' => array('value' => 'July 2009'), 'have' => array('value' => true)));
Then he states that Amazon now supports a SQL-like interface, and executes the following query:
$sdb->select('phparch', 'SELECT * FROM phparch WHERE have = "1"');
He does not give a similar example of how to run that query in CouchDB (he leaves some hints about Views and Map/Reduce, though), but I suppose it is also possible, so my real question is: how do Amazon SimpleDB (and CouchDB) do it?
My first guess is that they open all the documents (possibly in a distributed environment) and then use a reduce operation to filter out the documents whose attributes don't match the search criteria, but wouldn't this be excessively expensive (CPU and disk I/O), even with parallel computing?
I know I am ignoring some important things like distribution, consistency and so on, but I am just trying to grasp the very basic inner workings of NoSQL stores.
PS: Also, can anybody explain to me why both CouchDB and Amazon SimpleDB are built with Erlang?
The fuss around NoSQL comes down to indexing, availability, and scalability. Indexing is what lets document-oriented stores avoid opening every document when you want only the ones where have = 1. Availability and scalability let these systems scale out easily and stay robust in the face of unreliable hardware.
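To make the indexing point concrete, here is a minimal in-memory sketch in PHP. It is not how CouchDB or SimpleDB are actually implemented (their indexes live on disk and are maintained by the store itself); the function names scanForHave, buildIndex and lookup are made up for illustration, and the documents simply mirror the three items from the question. It only shows why a lookup through an index does not need to touch every document.

```php
<?php
// Toy document store mirroring the three items from the question.
$documents = [
    'may'  => ['title' => 'May 2009',  'have' => false],
    'june' => ['title' => 'June 2009', 'have' => true],
    'july' => ['title' => 'July 2009', 'have' => true],
];

// Naive approach (hypothetical helper): open every document and test it.
// The cost grows with the total number of documents.
function scanForHave(array $documents, bool $wanted): array
{
    $ids = [];
    foreach ($documents as $id => $doc) {
        if ($doc['have'] === $wanted) {
            $ids[] = $id;
        }
    }
    return $ids;
}

// Indexed approach (hypothetical helper): map each value of one attribute
// to the ids of the documents holding it. A real store would update this
// incrementally on every write; here it is built in one pass for brevity.
function buildIndex(array $documents, string $attribute): array
{
    $index = [];
    foreach ($documents as $id => $doc) {
        if (array_key_exists($attribute, $doc)) {
            // var_export() yields a stable string key ('true', 'false', ...).
            $index[var_export($doc[$attribute], true)][] = $id;
        }
    }
    return $index;
}

// Answering the query is now a single array lookup; no document is opened.
function lookup(array $index, $value): array
{
    return $index[var_export($value, true)] ?? [];
}

$haveIndex = buildIndex($documents, 'have');
print_r(lookup($haveIndex, true));
```

The trade-off is the usual one: writes get a little more expensive because the index has to be kept up to date, and in exchange a query like the SELECT in the question only reads the index entry for the requested value.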
Erlang is designed for multi-processor systems, and so it is a good fit for distributed systems too.