I'm not sure if this is the best place to ask a question like this, but here it goes:

I have an intranet-like Rails 3 application managing about 20k users that are stored in a nested set (preordered tree - http://en.wikipedia.org/wiki/Nested_set_model). Individual users enter stats (data, simply numeric values). The entered stats are assigned to a category (we call it a Pointer) and a week number.
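For context, here is a minimal plain-Ruby sketch of how the nested set encodes subtrees (the class and column names here are illustrative, not the actual schema):

```ruby
# Minimal nested-set illustration: each node carries lft/rgt bounds,
# and a node's subtree is every node whose bounds fall inside its own.
User = Struct.new(:id, :lft, :rgt)

# Example tree: user 1 is the root, users 2 and 3 are its children.
users = [
  User.new(1, 1, 6),
  User.new(2, 2, 3),
  User.new(3, 4, 5)
]

# Subtree of a node (including the node itself): lft..rgt containment check.
def subtree(users, node)
  users.select { |u| u.lft >= node.lft && u.rgt <= node.rgt }
end

root = users.first
subtree_ids = subtree(users, root).map(&:id)
```

This containment check is what lets a single range condition select a whole subtree, at the cost of having to renumber lft/rgt when the tree changes.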

This data is further processed and computed into Results. Some are calculated from a user's activity, some are derived from another category... etc. What a user enters is not always exactly what he sees in reports.

These computations can be quite tricky; some categories have very specific formulae.

But the rest is simply "take the sum of all entered values for this category, this user, and this week/month/year".

The problem is that these stats also need to be summed over the subtree of users under a selected user (so it essentially returns the sum of all values for all users underneath that user, including the user himself).

This application has been in production for two years and is doing its job pretty well... but with more and more users it is getting quite slow on the server-costly reports, like "give me a list of all users under myself and their statistics: one line summed over their sub-group and one line for their personal stats". Of course, users want (and need) their reports to be as current as possible; 5 minutes to reflect newly entered data is too much for them. And that particular report is their favorite :/ To stay real-time, we can't run the heavy SQL directly... that would kill the server. So I compute the results only once via a background process and the frontend just reads them. Those SQL queries are hard to optimize and I'm glad I moved away from that approach... (Caching is not an option. See below.)

The current application works like this:

  • frontend: when a user enters new data, it is saved to a simple MySQL table, like [user_id, pointer_id, date, value], and an entry is also pushed onto a queue.

  • backend: there is a calc_daemon process which, every 5 seconds, checks the queue for new "recompute requests". We pop the requests and figure out what else needs to be recomputed along with them (pointers have dependencies... the simplest case is: when you change week stats, we have to recompute the month and year stats too...). It does this recomputation the simplest way: we select the data with per-pointer custom SQL generated by their classes.

  • the computed results are then written back to MySQL, but to partitioned tables (one table per year). One row in such a table looks like [user_id, pointer_id, month_value, w1_value, w2_value, w3_value, w4_value]. This way the tables hold ~500k records (I've basically reduced the number of records 5x).
  • when the frontend needs those results, it runs simple sums over the partitioned data, with 2 joins (because of the nested-set conditions).
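To make that last step concrete, the subtree-sum queries look roughly like this (table and column names are my guesses at the schema described above, not the real ones):

```sql
-- Illustrative only: sum one pointer's monthly value over the whole
-- subtree of user 42, using the nested-set lft/rgt containment join.
SELECT SUM(r.month_value)
FROM results_2012 r
JOIN users u      ON u.id = r.user_id
JOIN users parent ON parent.id = 42              -- the selected user
WHERE u.lft BETWEEN parent.lft AND parent.rgt    -- whole subtree, incl. self
  AND r.pointer_id = 7;
```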

The problem is that those simple SQL queries with sums, GROUP BY, and the join on the subtree can take ~200ms each... for just a handful of records... and we have to run a lot of them... I believe they are optimized as well as they can be, according to EXPLAIN... but they are just too heavy.


Could I rewrite this to use Redis (or another fast key-value store) and see any benefit from it, given that I'm using Ruby and Rails? As I see it, if I rewrite it to use Redis, I will have to run many more queries against it than I do against MySQL, and then perform the sums in Ruby by hand... so performance could suffer substantially... I'm not really sure I could even express all the queries I have in Redis... Loading the users in Rails and then doing something like "Redis, give me the sum for users 1,2,3,4,5..." doesn't seem like the right idea... But maybe there is some feature in Redis that could make this simpler?) Also, the tree structure has to stay a nested set, i.e. I can't keep one entry in Redis with a list of all child ids per user (something like children_for_user_10: [1,2,3]) because the tree structure changes frequently... That's also the reason I can't store those subtree sums in the partitioned tables: when the tree changes, I would have to recompute everything. That's why I compute those sums in real time.)
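To make that worry concrete, here is roughly what the "many Redis reads + sum in Ruby" approach would look like; a plain Hash stands in for Redis, and the key scheme is invented for illustration:

```ruby
# A plain Hash stands in for Redis here; the key scheme
# "stats:<user_id>:<pointer_id>:<week>" is made up for the example.
# With a real Redis client this would be an MGET or pipeline per report
# line -- the per-user round trips and client-side summing are the worry.
fake_redis = {
  "stats:1:7:w1" => "10",
  "stats:2:7:w1" => "20",
  "stats:3:7:w1" => "30"
}

subtree_user_ids = [1, 2, 3]  # would come from the nested set in MySQL

# One key per user in the subtree, then the sum happens in Ruby.
keys  = subtree_user_ids.map { |id| "stats:#{id}:7:w1" }
total = keys.sum { |k| fake_redis.fetch(k, "0").to_i }
```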

Or would you suggest rewriting this application in a different language (Java?) and computing the results in memory instead? :) (I have tried doing it SOA-style, but it failed because one way or another I always end up with XXX MB of data in Ruby... especially when generating the reports... and the GC just kills it...) (And a side effect is that generating one report blocks the whole Rails application :/ )

Suggestions are welcome.