There has been a lot of talk about Cassandra lately.

Twitter, Digg, Facebook, etc. all use it.

When does it make sense to:

  • use Cassandra,
  • not use Cassandra, and
  • use an RDBMS instead of Cassandra?

The general idea of NoSQL is that you should use whichever data store is the best fit for your application. If you have a table of financial data, use SQL. If you have objects that would require complex/slow queries to map to a relational schema, use an object or key/value store.

Of course, just about any real-world problem you run into falls somewhere between those two extremes, and neither solution will be perfect. You need to consider the capabilities of each store and the consequences of using one over the other, which depends heavily on the problem you are trying to solve.

When evaluating distributed data systems, you have to consider the CAP theorem: you can pick two of the following three: consistency, availability, and partition tolerance.

Cassandra is an available, partition-tolerant system that supports eventual consistency. For more information, see my Visual Guide to NoSQL Systems.
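To make "eventual consistency" a bit more concrete, here is a minimal Python sketch. It is a toy model and not Cassandra's actual replication code: the `Replica` class, the `sync` function, and the timestamp-based last-write-wins rule are all assumptions made for illustration.

```python
class Replica:
    """Toy replica mapping key -> (timestamp, value). Hypothetical, for illustration only."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Keep the write only if it is newer than what this replica already holds.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so both replicas converge (last write wins)."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("user:1", "alice", timestamp=1)   # accepted by one replica only
r2.write("user:1", "alicia", timestamp=2)  # a later write lands on the other replica

stale = r1.read("user:1")  # r1 still answers (available), but with the older value
sync(r1, r2)               # once the replicas can talk again, they converge
```

Before `sync`, the two replicas disagree but both keep answering; afterwards both return the newer value. That trade (answer now, agree later) is what choosing availability and partition tolerance over strict consistency looks like in practice.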

Cassandra is the answer to a particular problem: What do you do when you have so much data that it does not fit on one server? How do you store all your data on many servers without breaking the bank or driving your developers insane? Facebook gets 4 terabytes of new compressed data every single day, and that number will most likely more than double within a year.
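The usual way to spread keys over many servers is some form of hash-based partitioning. The sketch below shows a simplified consistent-hashing ring in Python. It illustrates the general technique only; Cassandra's real partitioner is more involved, and the `Ring` class, the `token` helper, and the server names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is used only as a stable hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        # Each node sits on the ring at the position given by its own token.
        self._entries = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._entries]

    def node_for(self, key: str) -> str:
        # Pick the first node clockwise from the key's token, wrapping around at the end.
        index = bisect.bisect_left(self._tokens, token(key)) % len(self._entries)
        return self._entries[index][1]


ring = Ring(["server-a", "server-b", "server-c"])  # hypothetical server names
placement = {key: ring.node_for(key) for key in ("user:1", "user:2", "user:3")}
```

Every client that knows the list of servers computes the same placement for a key, so no central lookup table is needed, and adding a server only moves the keys that fall into its slice of the ring.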

If you don't have that much data, or if you have millions to pay for an enterprise Oracle/DB2 cluster installation and the specialists required to set it up and maintain it, then you are fine with an SQL database.

Another situation that makes the choice easier is when you want to use aggregate functions like sum, min, max, etc. and complex queries (as with the financial system mentioned above). In that case a relational database is probably more convenient than a NoSQL database, since both are hard to do on a NoSQL database unless you use a lot of inverted indexes. With NoSQL you have to compute the aggregates in code or store them separately in their own column family, but this makes everything quite complex and reduces the performance you gained by using NoSQL.
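To illustrate the two options just described, here is a small Python sketch that uses plain dictionaries as stand-ins for column families. The names `transactions`, `account_totals`, `record_transaction`, and `sum_by_scan` are invented for this example, and amounts are integers (say, cents) to keep the arithmetic exact. In SQL the same result would be a single `SELECT SUM(amount) ... GROUP BY account` query.

```python
from collections import defaultdict

# Hypothetical stand-ins for two column families: raw rows and precomputed totals.
transactions = {}                  # transaction id -> (account, amount)
account_totals = defaultdict(int)  # account -> running sum of amounts


def record_transaction(tx_id, account, amount):
    """Write the raw row and update the stored aggregate in the same step."""
    transactions[tx_id] = (account, amount)
    account_totals[account] += amount


def sum_by_scan(account):
    """The alternative: compute the aggregate in application code on every read."""
    return sum(amount for acct, amount in transactions.values() if acct == account)


record_transaction("t1", "acct-1", 100)
record_transaction("t2", "acct-1", 250)
record_transaction("t3", "acct-2", 40)
```

Reading `account_totals["acct-1"]` is a single lookup, while `sum_by_scan("acct-1")` touches every row. The price of the fast read is that every write path has to remember to keep the stored total in step with the raw rows, which is the extra complexity mentioned above.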

I talked with someone who is in the middle of deploying Cassandra, and it does not handle many-to-many relationships well. They are doing a hack job to get through their initial testing. I also spoke with a Cassandra consultant about this, and he said he wouldn't recommend it if you have this kind of problem set.
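One common workaround in key/value stores, and not necessarily what the people mentioned above did, is to store the relationship twice, once per direction. The Python sketch below uses dictionaries as stand-ins for the two lookups; the user/group example and all names in it are hypothetical.

```python
from collections import defaultdict

# Hypothetical key/value layout: one lookup per direction of the relationship.
groups_by_user = defaultdict(set)  # user -> groups the user belongs to
users_by_group = defaultdict(set)  # group -> users in the group


def add_membership(user, group):
    # Both entries must be written; forgetting one leaves the two views inconsistent.
    groups_by_user[user].add(group)
    users_by_group[group].add(user)


def remove_membership(user, group):
    # discard() does nothing if the membership is already absent.
    groups_by_user[user].discard(group)
    users_by_group[group].discard(user)


add_membership("alice", "admins")
add_membership("alice", "editors")
add_membership("bob", "editors")
remove_membership("alice", "admins")
```

A relational database expresses the same thing with one join table and lets the engine answer both directions. Here the application has to maintain two copies by hand, which is why this kind of data tends to feel like a hack in a key/value model.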

My understanding is that you would use NoSQL when you only have a single key-value pair. In other words, your RDBMS table would have just 2 columns (key, value).
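For comparison, this is roughly what such a two-column table looks like in a relational database. The sketch uses Python's built-in `sqlite3` module with an in-memory database; the table name `kv` and the `put`/`get` helpers are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")


def put(key, value):
    # INSERT OR REPLACE gives the overwrite-on-write behaviour of a key/value store.
    conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))


def get(key):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return None if row is None else row[0]


put("user:1", "alice")
put("user:1", "alicia")  # same key: the old value is replaced
```

If every table in a schema ends up looking like `kv`, the relational features (joins, constraints, ad hoc queries) are not buying much, and a key/value store is worth considering.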