I'm evaluating what may be the best migration option.
Presently, I'm on sharded MySQL (horizontal partitioning), with much of my data stored in JSON blobs. I have no complex SQL queries (I already migrated away from them when I partitioned my db).
At the moment, it looks like both MongoDB and Cassandra would be likely options. My situation:
- lots of reads in every query, less regular writes
- not concerned about "massive" scalability
- more worried about simple setup, maintenance and code
- minimize hardware/server cost
Lots of reads in every query, less regular writes
Both databases perform well on reads where the hot data set fits in memory. Both also emphasize join-less data models (and encourage denormalization instead), and both provide indexes on documents or rows, although Mongo's indexes are currently more flexible.
Cassandra's storage engine provides constant-time writes no matter how large your data set grows. Writes are more problematic in MongoDB, partly because of the b-tree based storage engine, but more because of the global write lock.
For analytics, MongoDB provides a custom map/reduce implementation; Cassandra provides native Hadoop support, including for Hive (a SQL data warehouse built on Hadoop map/reduce) and Pig (a Hadoop-specific analysis language that many think is a better fit for map/reduce workloads than SQL).
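As a rough illustration of the map/reduce style of analytics mentioned above, here is a minimal pure-Python sketch. It does not use MongoDB's or Hadoop's actual APIs; the function names (`map_customer`, `reduce_counts`, `map_reduce`) and the sample documents are made up for the example. It counts how many customers belong to each group.

```python
from collections import defaultdict

# Hypothetical sample documents, not taken from any real data set.
documents = [
    {"Email": "a@example.com", "Groups": ["Admin", "User", "SuperUser"]},
    {"Email": "b@example.com", "Groups": ["User"]},
    {"Email": "c@example.com", "Groups": ["Admin"]},
]

def map_customer(doc):
    """Map step: emit one (group, 1) pair per group the customer is in."""
    for group in doc["Groups"]:
        yield group, 1

def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def map_reduce(docs, map_fn, reduce_fn):
    """Run the map step over every document, group by key, then reduce."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

group_counts = map_reduce(documents, map_customer, reduce_counts)
```

In a real deployment the map and reduce steps would run inside the database or on a Hadoop cluster rather than in a single Python process; the shape of the two functions is the part that carries over.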
Not concerned about "massive" scalability
If you're looking at a single server, MongoDB is probably a better fit. For those more concerned about scaling, Cassandra's no-single-point-of-failure architecture will be easier to set up and more reliable. (MongoDB's global write lock tends to become more painful, too.) Cassandra also gives you a lot more control over how your replication works, including support for multiple data centers.
More worried about simple setup, maintenance and code
Both are trivial to set up, with reasonable out-of-the-box defaults for a single server. Cassandra is simpler to set up in a multi-server configuration, since there are no special-role nodes to worry about. There is a screencast showing a 4-node Cassandra cluster being set up in 2 minutes.
If you're presently using JSON blobs, MongoDB is an insanely good match for your use case, given that it uses BSON to store the data. You'll be able to have richer and more queryable data than you do in your present database. This would be the most significant win for Mongo.
I've used MongoDB extensively (over the last 6 months), building a hierarchical data management system, and I can vouch for both the ease of setup (install it, run it, use it!) and the speed. As long as you think about indexes carefully, it can absolutely scream along, speed-wise.
I gather that Cassandra, due to its use in large-scale projects like Twitter, has better scaling functionality, although the MongoDB team is working on parity there. I should point out that I've not used Cassandra beyond the trial-run stage, so I can't speak for the detail.
The real swinger for me, when we were assessing NoSQL databases, was the querying - Cassandra is basically just a giant key/value store, and querying is a bit fiddly (at least compared to MongoDB), so for performance you'd have to duplicate quite a lot of data as a sort of manual index. MongoDB, on the other hand, uses a "query by example" model.
For example, say you've got a Collection (MongoDB parlance for the equivalent of an RDBMS table) containing Customers. MongoDB stores records as Documents, which are basically binary JSON objects, e.g.:
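{ …: "…Cruz", Email: "email@example.com", Groups: ["Admin", "User", "SuperUser"] }
To find matching customers, you build a query document with the same shape as the data, e.g.:
{ …: "…Cruz", Groups: "Admin" }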
...and then run the query. That's it. There are added operators for comparisons, RegEx filtering etc, but it's all pretty simple, and the Wiki-based documentation is pretty good.
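To make the contrast concrete, here is a small pure-Python sketch of the two approaches described above. It is not MongoDB's or Cassandra's API: `matches`, `find` and `build_group_index` are hypothetical helpers, and the `LastName` field name and the sample customers are assumptions made for the example. The first part imitates query-by-example matching, including the behaviour where a single value in the query matches an array field that contains it. The second part shows the kind of manual index you would maintain yourself in a plain key/value store.

```python
# Hypothetical customer documents; the LastName field name is an assumption.
customers = [
    {"LastName": "Cruz", "Email": "email@example.com",
     "Groups": ["Admin", "User", "SuperUser"]},
    {"LastName": "Cruz", "Email": "other@example.com", "Groups": ["User"]},
    {"LastName": "Lee", "Email": "lee@example.com", "Groups": ["Admin"]},
]

def matches(document, example):
    """Return True if every field in `example` matches `document`.

    A scalar in the example matches a list field when the list contains it.
    This is a simplification: operators, nesting and nulls are ignored.
    """
    for field, wanted in example.items():
        if field not in document:
            return False
        actual = document[field]
        if isinstance(actual, list) and not isinstance(wanted, list):
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True

def find(collection, example):
    """Query by example: keep the documents that match the example."""
    return [doc for doc in collection if matches(doc, example)]

def build_group_index(collection):
    """Manual index for a key/value store: group name -> customer emails.

    The emails are duplicated here, and the index must be kept in sync
    by the application whenever a customer's groups change.
    """
    index = {}
    for doc in collection:
        for group in doc["Groups"]:
            index.setdefault(group, []).append(doc["Email"])
    return index

admins_named_cruz = find(customers, {"LastName": "Cruz", "Groups": "Admin"})
group_index = build_group_index(customers)
admin_emails = group_index["Admin"]
```

With query by example the database does the matching for any combination of fields. With the manual index, each new kind of lookup needs its own duplicated structure.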