I am looking at creating a system for managing and reporting stats on web site performance. I'll be collecting far more stats than are found in conventional log formats (approximately 20 metrics), but compared to most database applications the underlying data structure is really simple. The issue is that I'll be accumulating a lot of data: around 100,000 records (i.e. sets of metrics) per hour.

Obviously, resources are extremely limited!

So that it's possible to interact sensibly with the data, I'd need to consolidate each metric into one-minute bins, split by URL; then, for anything older than one day, consolidate into 10-minute bins; then, at 7 days, hourly bins.
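To make the first consolidation stage concrete, here's a minimal stdlib-only Python sketch that buckets raw records into one-minute bins per URL, keeping the mean of each metric (the record shape and the `consolidate` name are my own assumptions, not from any particular tool):

```python
# Bucket raw (timestamp, url, {metric: value}) records into fixed-width
# time bins per URL, averaging each metric within a bin.
from collections import defaultdict

def consolidate(records, bin_seconds=60):
    """Return {(bin_start, url): {metric: mean_value}}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for ts, url, metrics in records:
        key = (ts - ts % bin_seconds, url)  # floor timestamp to bin start
        counts[key] += 1
        for name, value in metrics.items():
            sums[key][name] += value
    return {
        key: {name: total / counts[key] for name, total in metric_sums.items()}
        for key, metric_sums in sums.items()
    }

raw = [
    (10, "/home", {"ttfb": 120.0, "bytes": 5000.0}),
    (50, "/home", {"ttfb": 80.0,  "bytes": 7000.0}),
    (70, "/home", {"ttfb": 100.0, "bytes": 6000.0}),
]
bins = consolidate(raw)
# The first two records fall in the [0, 60) bin and are averaged.
```

The same function can then be re-run over the one-minute bins with `bin_seconds=600` (and later 3600) to produce the coarser tiers.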

On the front end, I want to provide a view (preferably as plots) of the last hour of data, with the ability for users to drill up/down through defined hierarchies of URLs (which don't necessarily map directly to the hierarchy expressed in the URL's path) and to view different time frames.
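To make the "hierarchy that doesn't follow the path" point concrete, here's a tiny Python sketch of an explicitly defined URL hierarchy used for roll-ups (all names here are illustrative, not from any real site):

```python
# An explicit URL -> hierarchy mapping, independent of the URL's path
# structure; drilling up means grouping by a shorter hierarchy prefix.
HIERARCHY = {
    "/home": ("Site", "Landing"),
    "/products/widget": ("Site", "Catalogue"),
    "/checkout/pay": ("Site", "Checkout"),
}

def rollup_key(url, level):
    """Return the hierarchy prefix to group metrics by at a given depth."""
    return HIERARCHY.get(url, ("Site", "Other"))[:level]
```

Grouping bin keys by `rollup_key(url, 1)` gives the site-wide view; `level=2` drills down one step, regardless of how the paths are laid out.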

Rather than coding all of this myself on top of a relational database, I'm wondering whether there are tools available that can facilitate both the management of the data and the reporting.

I had a look at Mondrian, but I can't see from the documentation I've read whether it's possible to drop the more granular data while keeping the consolidated views of it.

RRDTool looks promising in terms of managing the data consolidation, but seems rather limited when it comes to querying the dataset as a multi-dimensional/relational database.
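For reference, RRDTool can express exactly this consolidation schedule in the archive definitions when a database is created. A sketch for a single URL and a single metric (the metric name, the heartbeat, and the 30-day hourly retention are my assumptions):

```shell
# One RRD per URL, one data source per metric (only "ttfb" shown),
# sampled every 60 s, with the consolidation tiers from the question.
rrdtool create home.rrd --step 60 \
  DS:ttfb:GAUGE:120:0:U \
  RRA:AVERAGE:0.5:1:1440 \
  RRA:AVERAGE:0.5:10:1008 \
  RRA:AVERAGE:0.5:60:720
# RRAs: 1-min averages for 1 day (1440 rows), 10-min averages for
# 7 days (1008 rows), hourly averages for 30 days (720 rows).
```

The catch is that each file is its own silo: with one RRD per URL there's no way to query across files the way you would across rows of a relational table, which is exactly the limitation above.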

What else should I be looking at?

In icCube it's pretty straightforward to get a time dimension with different granularities over time (for an example of a "range/banded" dimension, you can take a look here). Cubes can then be built from CSV files, and its XMLA interface allows you to use any XMLA-compliant reporting tool. Do you have an estimate of how large your typical datasets would be?

I'd just use an industry-standard database, like SQL Server, with Analysis Services on top (if you get into millions of rows).

Mondrian requires you to provide your own DBMS; Mondrian plus PostgreSQL may well be worth trying. With your own DBMS you can, of course, delete whatever you want.

Still failed to find anything suitable :(

While I can sink data at this rate into MySQL, it starts to get a little lumpy when I'm trying to consolidate it / remove the old low-level data. So I guess I'm going to have to look at building the aggregation layer on top of the DBMS, or switching to a NoSQL system, and writing it all myself :(
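For what it's worth, the aggregate-then-prune step I'm describing is only a couple of statements on top of the DBMS. A self-contained sketch using sqlite3 (table and column names are mine; the same SQL works, with minor dialect changes, on MySQL or PostgreSQL):

```python
# Roll raw rows older than a cutoff up into 10-minute bins, then delete
# the raw rows that were consolidated.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw (ts INTEGER, url TEXT, ttfb REAL);
    CREATE TABLE bins_10m (bin_start INTEGER, url TEXT, ttfb REAL, n INTEGER);
""")
conn.executemany(
    "INSERT INTO raw VALUES (?, ?, ?)",
    [(100, "/home", 120.0), (200, "/home", 80.0), (90000, "/home", 50.0)],
)

cutoff = 86400  # in practice: now minus one day

# Aggregate everything older than the cutoff into 10-minute bins...
conn.execute("""
    INSERT INTO bins_10m (bin_start, url, ttfb, n)
    SELECT ts - ts % 600, url, AVG(ttfb), COUNT(*)
    FROM raw WHERE ts < ?
    GROUP BY ts - ts % 600, url
""", (cutoff,))
# ...then drop the low-level rows that have just been consolidated.
conn.execute("DELETE FROM raw WHERE ts < ?", (cutoff,))
conn.commit()
```

The lumpiness I'm seeing comes from running exactly this kind of batch while inserts keep arriving, so the real question is whether a tool can schedule it better than I can.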

This Q&A is rather old, but I've recently found something that looks like it matches my needs: Graphite. I've still to get a setup up and running, but it looks very promising.