Which is better for a web application's database: a lean, minimal database holding only the fundamental information, with the application "recalculating" all secondary information on demand from those fundamentals, OR a database full of secondary information that has already been calculated, but is possibly outdated?

Clearly there's a trade-off here, and I believe anybody would say the best answer is "it depends" or "a mix of the two". But I'm not comfortable or experienced enough to reason about this subject alone. Could someone share some thoughts?

Also, a different question: should a database be a "snapshot" of a particular instant, or should it accumulate all the information from previous times, permitting a retrace of what happened? For example, say I'm modeling a bank account. Should I only keep the user's current balance, or should I keep all of the user's transactions and infer the balance from those transactions?
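To make the second option concrete, here is a minimal sketch of the "store transactions, infer the balance" approach, using Python with an in-memory SQLite database purely as an illustration; the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        account_id INTEGER NOT NULL,
        amount     REAL    NOT NULL,  -- positive = deposit, negative = withdrawal
        created_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.executemany(
    "INSERT INTO transactions (account_id, amount) VALUES (?, ?)",
    [(1, 100.0), (1, -30.0), (1, 25.0)],
)

# The balance is never stored; it is always derived from the fundamental facts.
def balance(account_id):
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return row[0]

print(balance(1))  # 95.0
```

The "snapshot" alternative would instead keep a single `balance` column per account and overwrite it on every transaction, losing the history.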

Any pointers to material that goes somewhat deeper into this kind of database design?


My quick answer would be to store everything in the database. The cost of storage is far lower than the cost of processing when you're talking about very large-scale applications. On small-scale applications the data will be much smaller, so storage is still a suitable solution.

Most RDBMSes are very good at handling huge amounts of data, so even when there are millions or billions of records, the data can still be retrieved relatively quickly, which cannot be said about reprocessing the data from scratch every time.

If you opt to calculate data instead of storing it, processing time doesn't just grow with the size of the data, because more data usually also means more users. In the worst case, processing time scales with the product of the data's size and the number of users:

processing_time = data_size * num_users
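As a back-of-the-envelope sketch of that scaling (the unit costs below are made-up assumptions, purely to show the shape of the trade-off):

```python
# Hypothetical unit costs: scanning one row to recompute a derived value,
# versus reading one already-stored value via an indexed lookup.
ROW_SCAN_COST = 1e-6  # seconds per row scanned (assumed)
LOOKUP_COST = 1e-4    # seconds per indexed lookup (assumed)

def on_the_fly_cost(data_size, num_users):
    # Every user request rescans the data: cost grows with both factors.
    return data_size * num_users * ROW_SCAN_COST

def precomputed_cost(data_size, num_users):
    # Each request is a single lookup, independent of data size.
    return num_users * LOOKUP_COST

for rows in (1_000, 1_000_000):
    print(rows, on_the_fly_cost(rows, 100), precomputed_cost(rows, 100))
```

With these (invented) numbers, recomputing wins at small data sizes but loses badly once the table grows, which is exactly the cross-over the formula above describes.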

To answer your other question: it would be best practice to introduce a "snapshot" of a particular moment only when the amount of data is so large that processing it on demand would take significant time.

When calculating large aggregates, such as bank balances, it is good practice to store the result of any heavy computation, together with a date stamp, in the database. That way it won't need recalculating until it becomes outdated.
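A minimal sketch of that date-stamped cache idea, again in Python with an in-memory SQLite database; the table names and the staleness threshold are assumptions:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (account_id INTEGER, amount REAL)")
conn.execute("""
    CREATE TABLE balance_cache (
        account_id  INTEGER PRIMARY KEY,
        balance     REAL NOT NULL,
        computed_at TEXT NOT NULL  -- date stamp of the heavy calculation
    )
""")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(1, 100.0), (1, -40.0)])

MAX_AGE = timedelta(minutes=5)  # assumed staleness threshold

def cached_balance(account_id, now=None):
    now = now or datetime.now(timezone.utc)
    row = conn.execute(
        "SELECT balance, computed_at FROM balance_cache WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    if row and now - datetime.fromisoformat(row[1]) < MAX_AGE:
        return row[0]  # fresh enough: skip the heavy work
    # Outdated or missing: recalculate and re-stamp.
    total = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions WHERE account_id = ?",
        (account_id,),
    ).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO balance_cache VALUES (?, ?, ?)",
        (account_id, total, now.isoformat()),
    )
    return total

print(cached_balance(1))  # 60.0
```

The first call pays the full aggregation cost; subsequent calls within the threshold return the stored value without touching the transactions table.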

I'd say begin by storing just the data you need and performing the calculations on the fly, but throughout the design process, and well into testing and production, keep in mind that you will probably have to switch to storing pre-calculated values at some point. Design with the ability to move to that model if the need arises.

Adding pre-calculated values is one of those things that sounds good (because in many cases it is good) but might not be needed. Keep your design as simple as it needs to be. If performance becomes a problem when doing the calculations on the fly, you can add fields to the database to store the results and run a batch overnight to catch up and fill in the legacy data.

As for the banking metaphor: definitely store a complete record of transactions. Store data that's relevant. A database ought to be a store of data, past and present. Audit trails, etc. The "current state" can either be calculated on the fly, or it can be maintained in a flat table and re-calculated during writes to the other tables (triggers are great for that kind of thing) if performance demands it.
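A sketch of that trigger approach using SQLite (driven from Python here; all table and trigger names are hypothetical): inserts into the transactions table keep a flat accounts table's "current state" in sync automatically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance REAL NOT NULL DEFAULT 0
    );
    CREATE TABLE transactions (
        account_id INTEGER NOT NULL REFERENCES accounts(id),
        amount     REAL    NOT NULL
    );
    -- Re-calculate the flat "current state" during writes to transactions.
    CREATE TRIGGER keep_balance AFTER INSERT ON transactions
    BEGIN
        UPDATE accounts
        SET balance = balance + NEW.amount
        WHERE id = NEW.account_id;
    END;
""")
conn.execute("INSERT INTO accounts (id) VALUES (1)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(1, 100.0), (1, -25.0)])
print(conn.execute(
    "SELECT balance FROM accounts WHERE id = 1"
).fetchone()[0])  # 75.0
```

You get the full audit trail in `transactions` and a cheap read in `accounts`, at the cost of a little extra work on every write.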

It depends :) Persisting derived data in the database can be useful because it lets you implement constraints and other logic against it. It can also be indexed, or you might be able to put the data in a view. Either way, try to stick with Boyce-Codd / Fifth Normal Form as a guide for your database design. Contrary to what you may sometimes hear, normalization does not mean you can't store derived data; it just means data shouldn't be derived from nonkey attributes in the same table.
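For instance, a view can expose the derived balance without storing it at all, so the base table stays normalized (SQLite sketch, hypothetical names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (account_id INTEGER, amount REAL);
    -- Derived data lives in a view; nothing redundant is stored.
    CREATE VIEW account_balances AS
        SELECT account_id, SUM(amount) AS balance
        FROM transactions
        GROUP BY account_id;
""")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(1, 50.0), (1, 20.0), (2, 10.0)])
print(conn.execute(
    "SELECT balance FROM account_balances WHERE account_id = 1"
).fetchone()[0])  # 70.0
```

Queries against the view always reflect the latest transactions, since the aggregation runs at read time.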

Essentially, any database is a record of the known facts at a certain point in time. Most databases include some time component, and some history is maintained while some isn't; your requirements should dictate this.