The current and proposed table structures are:
Current: data_table -> impressions -> clicks -> ctr
Proposed: data_table_1 -> ctr
          data_table_2 -> impressions -> clicks
What queries are performed? There are about 500 updates per second for impressions, about 1 update per second for clicks, and about 500 updates per second for ctr.
My application sorts the data by ctr, which is calculated as ctr = clicks/impressions. I've now realized that unless there is a click, the ctr does not have to be updated: the impressions for all of the content keep increasing, which lowers every ctr in the same proportion, so the ctr only needs to be recalculated when a click comes in.
Currently the update query looks like "UPDATE data_table SET impressions = impressions + 1, ctr = clicks / impressions WHERE something = something".
This means that although 2 fields are updated at the same time, only one query is performed.
The bottleneck is that these 500 updates per second slow down the selects on the same table. There are about 20 selects per second, so I thought of splitting the table. With the new structure, the updates happen on one table and the selects happen on another. The table that holds the impressions is updated very often, so keeping the impression updates on a table of their own should take that load off the table being read. This means that the selects on data_table_2 will be faster too, and the ctr can be updated each time someone makes a click.
So I just wanted to know whether I should use the new table structure or not. What are your suggestions? What are the pros and cons of my plan?
First of all, I assume the table is well indexed, so that the
something = something predicate quickly yields the corresponding row, right?
Further assuming that the bottleneck is disk throughput due to the high update rate, how about not storing the ctr value at all, since it can easily be calculated on the fly? As you seem to be limited by your updates, updating only one field should roughly halve the amount of data written to disk. In such a scenario, where the CPU is probably relatively idle, calculating clicks/impressions for each result should be a non-issue. Your approach of splitting the tables (again assuming disk is the limiting factor, which I assume it is and which is easy to check by looking at CPU utilization) will give considerable benefits only if the tables are on two different disks.
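To illustrate the "don't store ctr" idea, here is a minimal sketch using Python's sqlite3 module as a stand-in for whatever database is actually in use. The id column, the sample rows, and the function names record_impression and rows_by_ctr are assumptions made for the example; only data_table, impressions, clicks, and ctr come from the question.

```python
import sqlite3

# Assumed schema: column names follow the question, the id key is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_table ("
    "id INTEGER PRIMARY KEY, "
    "impressions INTEGER NOT NULL DEFAULT 0, "
    "clicks INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO data_table (id, impressions, clicks) VALUES (?, ?, ?)",
    [(1, 1000, 10), (2, 500, 20), (3, 0, 0)],
)


def record_impression(conn, row_id):
    # Only one column is written per impression; no ctr column exists.
    conn.execute(
        "UPDATE data_table SET impressions = impressions + 1 WHERE id = ?",
        (row_id,),
    )


def rows_by_ctr(conn):
    # ctr is computed at read time. Multiplying by 1.0 forces float division,
    # and rows without impressions are given a ctr of 0 to avoid dividing by zero.
    return conn.execute(
        "SELECT id, "
        "CASE WHEN impressions > 0 THEN clicks * 1.0 / impressions ELSE 0.0 END AS ctr "
        "FROM data_table ORDER BY ctr DESC"
    ).fetchall()


record_impression(conn, 1)
ranked = rows_by_ctr(conn)
```

Sorting on a computed expression cannot use a plain index on a stored ctr column, so with only about 20 selects per second this trades some read-side CPU for roughly half the write volume. Whether that trade pays off depends on where the bottleneck really is.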
If the CPU turns out to be the limiting factor, then it is probably because the
something = something predicate is very expensive to evaluate, in which case simplifying it should be the primary concern, not splitting the tables.
Maybe this isn't a direct answer to your question, but I think it should be noted.
I think you should consider using a NoSQL database like Redis, MemcacheDB, MongoDB, or CouchDB. Relational DBMSs aren't well suited to this kind of use. For example, every time you update any column (
UPDATE data_table SET impressions = impressions + 1) the caches are invalidated, and the DB has to hit the disk.
Another thing you can consider is using Memcache and writing that data to disk in bulk after some period of time.
For example, if you can afford to lose some impressions (keep in mind that memcache doesn't persist data), you can do the impressions++ in memcache and update the data in the DB every few minutes. It would reduce your load considerably.
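The buffering idea can be sketched as follows. This is only an illustration under stated assumptions: an in-process Counter stands in for memcache, sqlite3 stands in for the real database, and the ImpressionBuffer class and the id column are hypothetical names introduced for the example. As with memcache, anything still in the buffer is lost if the process dies before a flush.

```python
import sqlite3
from collections import Counter


class ImpressionBuffer:
    """In-memory stand-in for the memcache counters (not persistent)."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = Counter()

    def record(self, row_id):
        # Equivalent of impressions++ in the cache: no database write here.
        self.pending[row_id] += 1

    def flush(self):
        # Swap the buffer out, then apply one UPDATE per row that was seen,
        # all inside a single transaction.
        batch, self.pending = self.pending, Counter()
        with self.conn:
            self.conn.executemany(
                "UPDATE data_table SET impressions = impressions + ? WHERE id = ?",
                [(count, row_id) for row_id, count in batch.items()],
            )
        return len(batch)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_table ("
    "id INTEGER PRIMARY KEY, "
    "impressions INTEGER NOT NULL DEFAULT 0, "
    "clicks INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO data_table (id) VALUES (?)", [(1,), (2,)])

buf = ImpressionBuffer(conn)
for _ in range(500):
    buf.record(1)
for _ in range(3):
    buf.record(2)

statements = buf.flush()  # number of UPDATE statements issued for 503 impressions
totals = dict(conn.execute("SELECT id, impressions FROM data_table"))
```

In a real deployment flush would be called from a timer or cron job every few minutes, and the counters would live in the shared cache rather than in one process.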
I hope this helps.
Storing the CTR is a good idea. It's called "denormalization", and it could work in your application if it's a frequently needed value.