I was thinking of using a database like MongoDB or RavenDB to store a lot of stock tick data and wanted to know whether this is viable compared to a standard relational database such as SQL Server.
The data would not really be relational and would consist of a couple of huge tables. I was also thinking that I could sum/min/max rows of data by minute/hour/day/week/month etc. for even faster calculations.
Example data: 500 symbols * 60 min * 60 sec * 300 days... (per record we store: date, open, high, low, close, volume, openint - all decimal/float)
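To make the roll-up idea concrete, here is a minimal sketch in plain Python of collapsing per-second records into per-minute bars. The `symbol` field, the `make_record` helper, and the sample values are assumptions for illustration; the question only lists date, open, high, low, close, volume and openint. The last line multiplies out the figures above exactly as written.

```python
from datetime import datetime

def make_record(symbol, date, open_, high, low, close, volume):
    # Hypothetical record shape; "symbol" is an assumed field.
    return {"symbol": symbol, "date": date, "open": open_, "high": high,
            "low": low, "close": close, "volume": volume}

def rollup_by_minute(records):
    """Collapse records into one bar per (symbol, minute)."""
    bars = {}
    # Sort by time so the first record sets "open" and the last sets "close".
    for rec in sorted(records, key=lambda r: (r["symbol"], r["date"])):
        key = (rec["symbol"], rec["date"].replace(second=0, microsecond=0))
        bar = bars.get(key)
        if bar is None:
            bars[key] = {"open": rec["open"], "high": rec["high"],
                         "low": rec["low"], "close": rec["close"],
                         "volume": rec["volume"]}
        else:
            bar["high"] = max(bar["high"], rec["high"])
            bar["low"] = min(bar["low"], rec["low"])
            bar["close"] = rec["close"]
            bar["volume"] += rec["volume"]
    return bars

# Made-up sample data.
records = [
    make_record("ABC", datetime(2010, 5, 6, 9, 30, 0), 10.0, 10.5, 10.0, 10.25, 100),
    make_record("ABC", datetime(2010, 5, 6, 9, 30, 1), 10.25, 10.25, 9.5, 9.75, 200),
    make_record("ABC", datetime(2010, 5, 6, 9, 31, 0), 9.75, 10.0, 9.75, 10.0, 50),
]
minute_bars = rollup_by_minute(records)

# Row count taking the figures in the question literally.
total_rows = 500 * 60 * 60 * 300
```

The same function shape works for hour/day/week buckets by changing how the key is truncated.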
What do you all think?
The answer here will depend on scope.
MongoDB is a great way to get the data "in", and it is fast at querying individual pieces. It is also nice because it is built to scale horizontally.
However, what you need to remember is that all of your significant "queries" are actually going to come from "batch job output".
For example, Gilt Groupe has built a system called Hummingbird that they use for real-time analytics on their site. Presentation here. They are essentially rendering pages dynamically based on performance data collected in tight intervals (fifteen minutes).
In their case, there is a simple cycle: post data to mongo -> run map-reduce -> push data to the web servers for real-time optimization -> rinse / repeat.
This is honestly pretty close to what you probably want to do. However, there are some limitations here:
- Map-reduce is new to a lot of people. If you are familiar with SQL, you will have to accept the learning curve of map-reduce.
- If you are pumping in lots of data, your map-reduces are going to be slower on those boxes. You'll probably want to look at replication (slaves / replica pairs) if response times are a big deal.
On the other hand, you'll run into different variants of these problems with SQL.
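For anyone coming from SQL, the shape of a map-reduce job can be sketched in plain Python. This is not MongoDB's API (its map and reduce functions are written in JavaScript); the field names, the ISO-string `date` format, and the sample data are assumptions made for illustration. The job below is roughly what `SELECT symbol, hour, MAX(high), MIN(low), SUM(volume) ... GROUP BY symbol, hour` would express.

```python
from collections import defaultdict
from functools import reduce

def map_tick(tick):
    """Map step: emit one (key, value) pair per record."""
    # Key is (symbol, "YYYY-MM-DDTHH"), assuming ISO-8601 date strings.
    key = (tick["symbol"], tick["date"][:13])
    value = {"high": tick["high"], "low": tick["low"], "volume": tick["volume"]}
    return key, value

def reduce_values(a, b):
    """Reduce step: merge two partial values that share a key."""
    return {"high": max(a["high"], b["high"]),
            "low": min(a["low"], b["low"]),
            "volume": a["volume"] + b["volume"]}

def map_reduce(ticks):
    grouped = defaultdict(list)
    for tick in ticks:
        key, value = map_tick(tick)
        grouped[key].append(value)
    # Fold each key's values down to a single result.
    return {key: reduce(reduce_values, values) for key, values in grouped.items()}

# Made-up sample data.
sample_ticks = [
    {"symbol": "ABC", "date": "2010-05-06T09:30:00", "high": 10.5, "low": 10.0, "volume": 100},
    {"symbol": "ABC", "date": "2010-05-06T09:45:00", "high": 10.25, "low": 9.5, "volume": 200},
    {"symbol": "ABC", "date": "2010-05-06T10:00:00", "high": 10.0, "low": 9.75, "volume": 50},
    {"symbol": "XYZ", "date": "2010-05-06T09:30:00", "high": 20.0, "low": 19.0, "volume": 10},
]
hourly = map_reduce(sample_ticks)
```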
Of course, there are some benefits here:
- Horizontal scalability. If you have lots of boxes, you can shard them and get somewhat linear performance increases on map-reduce jobs (that's how they work). Building this kind of "cluster" with SQL databases is a lot more expensive.
- Really fast speed and, as with point #1, you get the ability to add RAM horizontally to keep up the speed.
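The reason these jobs spread across boxes is that the reduce step can be applied to partial results in any grouping. A rough sketch of that property, assuming a made-up hash-by-symbol placement (MongoDB's real sharding uses shard keys and chunk ranges, which this does not model); all names and data here are hypothetical:

```python
import zlib

def shard_for(symbol, n_shards):
    # Stable hash so a symbol always lands on the same shard (assumed scheme).
    return zlib.crc32(symbol.encode("utf-8")) % n_shards

def merge(a, b):
    """Combine two partial results; works on raw values and on partials alike."""
    return {"high": max(a["high"], b["high"]),
            "low": min(a["low"], b["low"]),
            "volume": a["volume"] + b["volume"]}

def reduce_shard(records):
    """Per-shard job: one partial result per symbol."""
    out = {}
    for r in records:
        value = {"high": r["high"], "low": r["low"], "volume": r["volume"]}
        out[r["symbol"]] = merge(out[r["symbol"]], value) if r["symbol"] in out else value
    return out

def sharded_totals(records, n_shards):
    shards = [[] for _ in range(n_shards)]
    for r in records:
        shards[shard_for(r["symbol"], n_shards)].append(r)
    # Each of these calls could run on its own box.
    partials = [reduce_shard(s) for s in shards]
    combined = {}
    for partial in partials:
        for symbol, value in partial.items():
            combined[symbol] = merge(combined[symbol], value) if symbol in combined else value
    return combined

# Made-up sample data.
shard_records = [
    {"symbol": "ABC", "high": 10.5, "low": 10.0, "volume": 100},
    {"symbol": "ABC", "high": 10.25, "low": 9.5, "volume": 200},
    {"symbol": "XYZ", "high": 20.0, "low": 19.0, "volume": 10},
    {"symbol": "QRS", "high": 5.0, "low": 4.5, "volume": 40},
]
one_box = sharded_totals(shard_records, 1)
four_boxes = sharded_totals(shard_records, 4)
```

The result does not depend on the number of shards, which is what lets you add boxes without changing the job.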
As mentioned by others though, you are going to lose access to ETL and other common analysis tools. You'll definitely be on the hook to write a lot of your own analysis tools.
Here's my reservation with the idea, and I'm going to openly acknowledge that my working knowledge of document databases is weak. I'm assuming you want all this data stored so that you can perform some aggregation/trend-based analysis on it.
If you use a document-based db as your source, the loading and manipulation of each row of data (CRUD operations) is very simple. Very efficient, very straightforward, basically lovely.
What sucks is that there are very few (if any) options to extract this data and cram it into a structure more suitable for statistical analysis (columnar database, cube, etc). If you load it into a basic relational database, there are a number of tools (both commercial and free - http://www.pentaho.com/) that will accommodate the ETL and analysis very nicely.
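As a rough idea of what a hand-rolled extract step looks like, here is a sketch that flattens document-style records into a relational table and aggregates them with SQL. It uses Python's built-in `sqlite3` purely as a stand-in for a relational database; the table name, columns, and sample documents are assumptions, not anything prescribed by a particular ETL tool.

```python
import sqlite3

# Made-up documents as they might come out of a document store.
documents = [
    {"symbol": "ABC", "date": "2010-05-06T09:30:00", "open": 10.0, "high": 10.5,
     "low": 10.0, "close": 10.25, "volume": 100},
    {"symbol": "ABC", "date": "2010-05-06T15:59:00", "open": 10.25, "high": 10.25,
     "low": 9.5, "close": 9.75, "volume": 250},
    {"symbol": "ABC", "date": "2010-05-07T09:30:00", "open": 9.75, "high": 10.0,
     "low": 9.75, "close": 10.0, "volume": 50},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticks (symbol TEXT, date TEXT, open REAL, high REAL, "
    "low REAL, close REAL, volume INTEGER)"
)
# Named placeholders pull each column from the matching document key.
conn.executemany(
    "INSERT INTO ticks VALUES (:symbol, :date, :open, :high, :low, :close, :volume)",
    documents,
)

# Daily roll-up: the first 10 characters of an ISO date string are the day.
daily = conn.execute(
    "SELECT symbol, substr(date, 1, 10) AS day, MAX(high), MIN(low), SUM(volume) "
    "FROM ticks GROUP BY symbol, day ORDER BY symbol, day"
).fetchall()
conn.close()
```

Once the data sits in a table like this, off-the-shelf SQL-based tooling can take over from there.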
Ultimately though, what you want to keep in mind is that every financial firm in the world has a stock-analysis/auto-trader application; they just caused a major US stock market tumble, and they are not toys. :)