I've got a client-server application that gets all the data out of a few tables, recalculates something, and stores it back.
Each Item has a 'Bill of Materials' (BOM): the list, with amounts, of which other Items it is made of. The cost of an Item is therefore the sum of the costs of the Items in its BOM, multiplied by their amounts. Finally, some "base" Items have no BOM and simply have their cost set directly (i.e. raw materials).
e.g.: A's BOM says it is made of 2xB and 3xC.
What I do now (and I don't remember why I do it like this) is: I get all the Items and all the BOMs from the DB, then pick the Items one at a time, calculating their cost recursively. Once I've calculated an Item, I flag it so I don't compute its cost again (this also guards against infinite recursion).
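A minimal sketch of that full-recalculation approach, assuming the Items and BOMs have already been loaded into plain dicts (`bom` and `base_cost` are illustrative names, not anything from my real schema):

```python
# bom:       item -> list of (component, amount) pairs; empty for base items
# base_cost: cost of the items that have no BOM (raw materials)

def recalculate_all(bom, base_cost):
    cost = dict(base_cost)  # doubles as the "already calculated" flag

    def item_cost(item):
        if item in cost:    # flagged: don't redo the cost
            return cost[item]
        cost[item] = 0      # pre-flag, so a cyclic BOM can't recurse forever
        cost[item] = sum(amount * item_cost(comp)
                         for comp, amount in bom[item])
        return cost[item]

    for item in bom:
        item_cost(item)
    return cost

bom = {"A": [("B", 2), ("C", 3)], "B": [], "C": []}
base_cost = {"B": 5, "C": 10}
print(recalculate_all(bom, base_cost)["A"])  # 2*5 + 3*10 = 40
```

The `cost` dict is both the result and the "done" flag, which is exactly why the whole dataset has to fit in memory.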
Thing is, this is kind of stupid: first, it's slow and recalculates Items that haven't changed, and worse, given a big enough DB, it will run out of memory.
Instead, I could recalculate Items on demand: when an Item's BOM changes, I recalculate that BOM, then SELECT all the BOMs that contain the updated Item and recalculate them too; repeat recursively until you reach the top, where no BOM in the DB depends on the changed Items.
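The on-demand idea, sketched with an in-memory "where-used" dict (`used_in` is a hypothetical name; in the real app each level of this walk would be one SELECT against the BOM table):

```python
# used_in: item -> list of items whose BOM directly contains it

def propagate(changed_item, bom, cost, used_in):
    """Recalculate every item that directly or indirectly uses changed_item."""
    pending = [changed_item]
    while pending:
        item = pending.pop()
        for parent in used_in.get(item, []):
            cost[parent] = sum(amount * cost[comp]
                               for comp, amount in bom[parent])
            pending.append(parent)  # continue one level up, until the top

bom = {"A": [("B", 2), ("C", 3)]}
used_in = {"B": ["A"], "C": ["A"]}
cost = {"A": 40, "B": 5, "C": 10}

cost["B"] = 6                      # user updates a raw material
propagate("B", bom, cost, used_in)
print(cost["A"])                   # 2*6 + 3*10 = 42
```

Note this naive version can recalculate the same parent more than once if two paths lead to it (diamond-shaped dependencies); processing items in topological order would avoid that.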
What this means in practice: say some of the Items are raw materials, whose cost may be updated frequently, and some Items are "end-user" stuff, whose BOM will rarely if ever change. When the user changes the cost of one of those materials, it might mean going through thousands of Items, recalculating them. Say a SELECT of one Item/BOM takes 15 ms (I'm on PostgreSQL); then just SELECTing 1000 Items/BOMs will take several seconds, and on top of that you need to UPDATE the recalculated cost back into the Item in the DB... oh dear, latency turns into minutes now.
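The back-of-the-envelope arithmetic, just to make the scale concrete (most of the time here is per-statement round-trip latency, which is why fetching many rows per query, e.g. with an `IN (...)` list or a recursive CTE, shrinks it dramatically):

```python
round_trip_ms = 15   # one SELECT of one Item/BOM
items = 1000

selects_only = items * round_trip_ms       # 15 000 ms
with_updates = items * round_trip_ms * 2   # plus one UPDATE per Item

print(selects_only / 1000, "s")   # 15.0 s
print(with_updates / 1000, "s")   # 30.0 s
```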
The ERP software the company I work for uses takes the first approach: batch-recalculate the whole DB at once. This literally takes hours, and it seems the problems with this approach have been accumulating over its 10+ years of use. The batch recalculation is done weekly.
Now that I've actually "written this out loud", I don't think it matters too much that it takes a few minutes. The problem is that I don't understand databases well, and I'm worried about concurrency: since updating Item A will take a long time, chances are someone will update another Item B while Item A is being updated.
Say Item D is made of the A and B above. User 1 updates A, so the server software starts grinding away at the DB for a few minutes, eventually updating D. But meanwhile, User 2 updates B, so the server will eventually update D again.
Will using PostgreSQL's transactions solve the problem? A transaction starts from the then-current state of the DB. Say A's old cost is A1 and B's old cost is B1, so Transaction 1 sees D = A1 + B1 and starts updating A from A1 to A2. Before it finishes and commits, Transaction 2 starts, also seeing A1 and B1, and updates B from B1 to B2. T1 recalculates and commits D = A2 + B1. But T2 had already started and doesn't see the new A, so it finally commits D = A1 + B2 to the DB, which is incorrect. It should be D = A2 + B2.
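This is the classic lost-update anomaly, and plain transactions at PostgreSQL's default READ COMMITTED level do not prevent it by themselves; SERIALIZABLE isolation would instead detect the conflict and abort one of the two transactions, which you then retry. A toy simulation of the snapshot behaviour (pure Python, no real database; the dicts just model each transaction's view):

```python
db = {"A": 1, "B": 1, "D": 2}   # A1, B1, D = A1 + B1

snap_t1 = dict(db)              # T1 starts: sees A1, B1
snap_t2 = dict(db)              # T2 starts before T1 commits: also A1, B1

# T1 updates A and recalculates D from *its* snapshot:
snap_t1["A"] = 2
db["A"] = snap_t1["A"]
db["D"] = snap_t1["A"] + snap_t1["B"]   # commits D = A2 + B1

# T2 updates B and recalculates D from its now-stale snapshot:
snap_t2["B"] = 2
db["B"] = snap_t2["B"]
db["D"] = snap_t2["A"] + snap_t2["B"]   # commits D = A1 + B2: last write wins

print(db["D"])   # 3, but the true cost is A2 + B2 = 4
```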
Also, some of the processing will overlap, wasting server time.
If I do T1 and T2 in sequence instead of in parallel, then hooray, the answer is correct, but User 2 will have to wait longer. On the other hand, if several transactions have no relation to one another (completely independent dependency trees, e.g.: A = X + Y and B = N + M), then parallel computation will give the correct answer AND will be faster for the user.
Important note: even if processing in sequence, I'd still use transactions, so the rest of the software can keep using that data in parallel; only the function that recalculates cost is serialized.
Now, this whole "process-in-sequence" thing wouldn't be so bad if... the DB latency weren't so awful. If, say, the whole dataset could be held in RAM, then going through 1000 objects would be a breeze. Ah, but even if I build a system to quickly move chunks of data between disk and RAM and do some caching (to replace the DB), that won't do, because I'd still need transactions so that the rest of the server functionality can work in parallel (the "important note" above). So I'd end up building another DB. Maybe a bit faster, but stupid and a waste of time.
The whole reason I "cache" the cost of the Items in the first place is so that I don't recalculate it every time I use it: not only would that waste limited resources, but the DB latency is just too large and the concurrency issues scale even worse.
No wonder "they" did it in large batches... this is making my head hurt.
Q1: How would you all solve this in an "optimal" way?
From my current understanding (that is, after facing the concurrency problem, which I had quietly ignored before), I'd make that recalculation function run its transactions in sequence, while the rest of the application can still use the data in parallel, which I believe is best for the user. That's the goal: best for the user, but guaranteed correctness for the system.
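The "serialize only the recalculation" idea can be sketched as a single worker thread that applies cost updates one at a time while other threads read freely. All names here are illustrative; in the real app the locked section would be one DB transaction:

```python
import queue
import threading

recalc_queue = queue.Queue()
cost = {"A": 40}
cost_lock = threading.Lock()

def worker():
    # One consumer: cost updates can never overlap each other.
    while True:
        item, new_cost = recalc_queue.get()
        if item is None:            # shutdown signal
            break
        with cost_lock:             # readers elsewhere take the same lock
            cost[item] = new_cost

t = threading.Thread(target=worker)
t.start()
recalc_queue.put(("A", 42))         # User 1's change
recalc_queue.put(("A", 43))         # User 2's change, applied strictly after
recalc_queue.put((None, None))
t.join()
print(cost["A"])                    # 43: last queued update wins, in order
```

Because updates are applied in arrival order, the T1/T2 race above can't lose a write; the price is that User 2's update waits behind User 1's.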
Maybe later I could throw hardware at it and use software black magic to reduce that latency, but I'm starting to fool myself now.
Also, in the past few months I've been completely unaware of several dead-obvious things (some not related to programming), so I'm expecting someone to point out something shamefully obvious that I managed to overlook...