On sites like SO, I'm sure it's essential to store as much aggregated data as possible to avoid running all those complex queries/calculations on every page load. For example, storing a running tally of the vote count for each question/answer, or the number of answers for each question, or the number of times a question has been viewed, so that these queries don't have to be run as often.

But does doing this go against DB normalization, or other standards/best practices? And what is the best way to do it: should every table have another table for aggregated data, should the aggregate be stored in the same table it describes, and when should the aggregated data be updated?


The saying to keep in mind is "Normalize till it hurts, denormalize till it works."

This means: normalise all of your domain relationships (to at least Third Normal Form (3NF)). If you measure a lack of performance, then investigate (and measure) whether denormalisation will give you performance benefits.

So, Yes. Storing aggregated data 'goes against' normalisation.

There's no 'one best way' to denormalise; it depends on what you are doing with the data.

Denormalisation should be treated the same way as premature optimisation: don't do it unless you have measured a performance problem.
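As one illustration of what such a denormalisation can look like, here is a minimal sketch that keeps a vote tally in a column on the question row and updates it in the same transaction as the vote insert. The table names, column names and helper functions (`question`, `vote`, `vote_total`, `cast_vote`, and so on) are hypothetical, and SQLite is used only so the example is self-contained; it is not how any particular site does it.

```python
import sqlite3


def create_vote_schema(conn):
    # Hypothetical schema: `vote` holds the normalised facts,
    # `question.vote_total` is the redundant, denormalised tally.
    conn.executescript("""
        CREATE TABLE question (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            vote_total INTEGER NOT NULL DEFAULT 0  -- denormalised aggregate
        );
        CREATE TABLE vote (
            id INTEGER PRIMARY KEY,
            question_id INTEGER NOT NULL REFERENCES question(id),
            value INTEGER NOT NULL CHECK (value IN (-1, 1))
        );
    """)


def cast_vote(conn, question_id, value):
    # Insert the vote and adjust the stored tally in one transaction,
    # so the two cannot drift apart if either statement fails.
    with conn:
        conn.execute(
            "INSERT INTO vote (question_id, value) VALUES (?, ?)",
            (question_id, value),
        )
        conn.execute(
            "UPDATE question SET vote_total = vote_total + ? WHERE id = ?",
            (value, question_id),
        )


def stored_total(conn, question_id):
    # Cheap read: a single-row lookup, no aggregation.
    row = conn.execute(
        "SELECT vote_total FROM question WHERE id = ?", (question_id,)
    ).fetchone()
    return row[0]


def computed_total(conn, question_id):
    # Expensive read: recompute from the normalised data (useful as a check).
    row = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM vote WHERE question_id = ?",
        (question_id,),
    ).fetchone()
    return row[0]
```

The trade-off is that every write now touches two tables, in exchange for reads that no longer aggregate. Keeping `computed_total` around gives a way to verify or repair the stored value.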

Storing aggregated data is not in itself a violation of any Normal Form. Normalization is concerned only with redundancies due to functional dependencies, multi-valued dependencies and join dependencies. It doesn't deal with any other kinds of redundancy.

Too much normalization will hurt performance, so in the real world you have to find a balance.

I've handled a situation like this in two ways.

1) In DB2 I used an MQT (Materialized Query Table), which works like a view except that it is driven by a query and you can schedule how often you want it to refresh, e.g. every 5 minutes. That table then stored the count values.

2) In the application itself I stored information like this as a system variable. In Apache you can set a system-wide variable and refresh it every few minutes. The value is then only approximately accurate, but you are only running your "count(*)" query once every 5 minutes. You can have a daemon run it or have it triggered by page requests.
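The first approach can be imitated on databases without MQTs by keeping a plain summary table and rebuilding it on a schedule. The following is only a sketch of that idea, not DB2's MQT mechanism: the `answer` and `answer_count_summary` tables and the function names are hypothetical, and the scheduling itself (cron, a daemon, and so on) is left out.

```python
import sqlite3


def create_summary_schema(conn):
    # Hypothetical tables: `answer` is the source data,
    # `answer_count_summary` plays the role of the materialised result.
    conn.executescript("""
        CREATE TABLE answer (
            id INTEGER PRIMARY KEY,
            question_id INTEGER NOT NULL
        );
        CREATE TABLE answer_count_summary (
            question_id INTEGER PRIMARY KEY,
            answer_count INTEGER NOT NULL
        );
    """)


def refresh_summary(conn):
    # Rebuild the whole summary in one transaction; call this from
    # whatever scheduler you use, e.g. every 5 minutes.
    with conn:
        conn.execute("DELETE FROM answer_count_summary")
        conn.execute("""
            INSERT INTO answer_count_summary (question_id, answer_count)
            SELECT question_id, COUNT(*) FROM answer GROUP BY question_id
        """)


def answer_count(conn, question_id):
    # Reads hit only the summary table, so they may be stale by up to
    # one refresh interval.
    row = conn.execute(
        "SELECT answer_count FROM answer_count_summary WHERE question_id = ?",
        (question_id,),
    ).fetchone()
    return row[0] if row else 0
```

Between refreshes the summary lags behind the source table, which is the same accuracy trade-off described for both approaches above.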

I used a wrapper class to do it, so it's been a while, but I think in PHP it was as simple as: $_SERVER['report_page_count'] = array('timeout'=>1234569783, 'count'=>15)

Either way, however you store that single value, it saves you from running the query on every request.
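A wrapper of that kind might look roughly like the sketch below, written in Python rather than PHP. The `CachedValue` class and its parameters are hypothetical and are not the original wrapper; it stores a value together with an expiry time, mirroring the 'timeout'/'count' pair above, and recomputes only when a request arrives after the expiry.

```python
import time


class CachedValue:
    """Hold one expensive-to-compute value and recompute it at most once per TTL."""

    def __init__(self, compute, ttl_seconds, clock=time.monotonic):
        self._compute = compute      # e.g. a function running the count(*) query
        self._ttl = ttl_seconds      # how long a computed value stays fresh
        self._clock = clock          # injectable so the expiry logic can be tested
        self._expires_at = None      # None means nothing has been computed yet
        self._value = None

    def get(self):
        now = self._clock()
        # Recompute on first use or once the stored value has expired;
        # otherwise return the cached (possibly slightly stale) value.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value
```

This is the "driven by page requests" variant: the first request after expiry pays for the query. The sketch is per-process and not thread-safe, so a multi-process server would need a shared store for the cached value instead.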