I'm thinking of monitoring a number of objects. I expect to get about 10,000 data points every 15 minutes. (Not at first, but that is the 'general ballpark'.) I would like to be able to get daily, weekly, monthly and yearly statistics. It is not important to keep the data at the highest resolution (15 minutes) for more than two months.
I'm considering various ways to store this data, and have been looking at a classic relational database or a schemaless database (such as SimpleDB).
My question is, what's the best way to go about doing this? I would very much prefer an open-source (and free) solution to a proprietary, costly one.
Small note: I'm writing this application in Python.
PyTables is designed for dealing with large data sets, and it is built on top of the HDF5 library, which is designed for this purpose. For example, PyTables has automatic compression and supports NumPy.
RRDTool by Tobi Oetiker, definitely! It's open-source, and it was designed for exactly such use cases.
To give a few highlights: RRDTool stores time-series data in a round-robin database. It keeps raw data for a given period of time, then condenses it in a configurable way, so you have fine-grained data for, say, a month, data averaged over a week going back 6 months, and data averaged over a month going back 24 months. As a side effect your database stays the same size all the time (so no worrying that your disk might run full). That was the storage side. On the retrieval side, RRDTool offers data queries that are immediately turned into graphs (e.g. PNG) that you can readily include in documents and web pages. It's a solid, proven solution that is a much-generalized form of its predecessor, MRTG (some might have heard of it). And once you have gotten into it, you will find yourself re-using it over and over again.
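To make the round-robin idea concrete, here is a minimal pure-Python sketch of fixed-size archives that average raw samples into coarser rows. It is only an illustration of the concept, not RRDTool's API or file format; the class name, the helper functions, and the retention numbers (15-minute rows for 60 days, daily averages for 2 years) are all assumptions made up for the example.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size archive: averages `steps` raw samples into one stored row."""

    def __init__(self, steps, rows):
        self.steps = steps
        # A deque with maxlen silently drops the oldest row when full,
        # so the archive never grows beyond `rows` entries.
        self.rows = deque(maxlen=rows)
        self._pending = []

    def update(self, value):
        self._pending.append(value)
        if len(self._pending) == self.steps:
            self.rows.append(sum(self._pending) / self.steps)
            self._pending = []


def build_archives():
    # Hypothetical retention policy, chosen only for illustration.
    samples_per_day = 4 * 24  # one sample every 15 minutes
    return [
        RoundRobinArchive(steps=1, rows=samples_per_day * 60),  # raw, 60 days
        RoundRobinArchive(steps=samples_per_day, rows=730),     # daily avg, 2 years
    ]


def feed(archives, values):
    # Every incoming sample is offered to every archive.
    for value in values:
        for archive in archives:
            archive.update(value)
```

Each monitored object would get its own set of archives; because every archive has a fixed number of rows, total storage is bounded no matter how long the monitoring runs.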
Plain text files? It isn't clear what your 10k data points per 15 minutes means in terms of bytes, but in any case text files are easier to keep, store, move, and modify, and you can inspect them directly, just by looking at them. Fairly easy to work with from Python, too.
This is pretty standard data-warehousing stuff.
Lots of "facts", organized by several dimensions, one of which is time. Lots of aggregation.
In many cases, simple flat files that you process with simple aggregation algorithms based on defaultdict will work wonders: fast and simple.
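As a rough sketch of what that can look like, the function below aggregates a flat file into per-day, per-object statistics with a defaultdict. The line layout (`timestamp,object_id,value` with ISO timestamps) and the function name are assumptions for the example, since the question does not say what the data looks like.

```python
import csv
from collections import defaultdict
from datetime import datetime


def daily_stats(lines):
    """Aggregate `timestamp,object_id,value` rows into per-day, per-object stats."""
    # (date, object_id) -> [count, total, minimum, maximum]
    acc = defaultdict(lambda: [0, 0.0, float("inf"), float("-inf")])
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        timestamp, object_id, raw_value = row
        day = datetime.fromisoformat(timestamp).date()
        value = float(raw_value)
        bucket = acc[(day, object_id)]
        bucket[0] += 1
        bucket[1] += value
        bucket[2] = min(bucket[2], value)
        bucket[3] = max(bucket[3], value)
    return {
        key: {"count": n, "mean": total / n, "min": low, "max": high}
        for key, (n, total, low, high) in acc.items()
    }
```

An open file can be passed directly, since csv.reader accepts any iterable of lines. Weekly, monthly, or yearly statistics only need a different key, for example (day.year, day.month, object_id) for monthly figures.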