We're building software that receives pre-calculated hourly data, around 100 data points per system, sent about once daily. There will be about 20 clients with 5-50 systems each, so the theoretical maximum is roughly 100 * 24 * 20 * 50 = 2,400,000 rows inserted daily.
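The worst-case arithmetic above, written out (the variable names are mine, not from the spec):

```python
# Worst-case daily insert volume; values taken from the estimate above.
series_per_system = 100   # pre-calculated data points per system
hours_per_day = 24
clients = 20
systems_per_client = 50   # upper bound of the 5-50 range

rows_per_day = series_per_system * hours_per_day * clients * systems_per_client
print(rows_per_day)  # 2400000
```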
It's very unlikely that we'll actually see that many inserts per day, but it's something we have to bear in mind.
Would there be a performance gain if we split the database structure so that each client has its own database, as in the last picture? In the shared database there would be the customers and their associations to those databases.
Data will be stored for around 2-3 years, after which the system will automatically remove the old data. Customers don't remove "anything"; in this context "anything" means the data sent in from the customer systems.
In the images there's a cloud around the server and database. To be more specific: that cloud is the Microsoft Azure implementation of cloud computing.
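A minimal sketch of the shared "routing" database the question describes, which maps each customer to its own database. Table, column, and connection-string names here are illustrative, not from the question (sqlite3 stands in for the real server):

```python
import sqlite3

# Shared database holding customers and their association to a database.
# All names and DSNs below are hypothetical.
shared = sqlite3.connect(":memory:")
shared.execute("""
    CREATE TABLE customer (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        database_dsn TEXT NOT NULL  -- connection string of the customer's own db
    )
""")
shared.execute(
    "INSERT INTO customer (name, database_dsn) VALUES (?, ?)",
    ("Customer A", "mysql://db-host-1/customer_a"),
)

def dsn_for(customer_name):
    """Look up which database holds this customer's data."""
    row = shared.execute(
        "SELECT database_dsn FROM customer WHERE name = ?", (customer_name,)
    ).fetchone()
    return row[0] if row else None

print(dsn_for("Customer A"))  # mysql://db-host-1/customer_a
```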
If each client works only with their own data and doesn't need to access other clients' data, I believe some performance would be gained, because table locks would only affect the data of one customer. For instance, when customer A runs a cascading delete on a table, the other clients can still read and modify data in the same table in their respective databases. Without such a split, table locks affect everyone.
That being said, splitting the database makes administration (taking backups, modifying the database structure, updating database addresses, etc.) more difficult and error-prone.
You might start with one database holding all the data. Then, if you find that clients frequently wait for other clients' operations to finish, you can split the database; if you have correctly abstracted database access, no large changes in code should be needed.
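One way to abstract database access as suggested above: route every query through a single resolver, so that moving from one shared database to per-client databases only changes the resolver, not the calling code. A minimal sketch with made-up names, using sqlite3 as a stand-in for the real driver:

```python
import sqlite3

# Single-database phase: every client resolves to the same connection.
# If the data is later split per client, only connection_for() changes.
_shared_db = sqlite3.connect(":memory:")

def connection_for(client_id):
    """Return the connection that holds this client's data."""
    return _shared_db  # later: look up a per-client DSN instead

conn = connection_for(client_id=42)
conn.execute("CREATE TABLE IF NOT EXISTS measurement (client_id INTEGER, value REAL)")
conn.execute("INSERT INTO measurement VALUES (42, 1.5)")
```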
Remember, premature optimisation is the root of all evil!
There would be a performance gain in both reading and writing if the databases are on different physical disks. If they're on the same disk/server, the gain will be too small to bother with. However, if you use multiple servers, the key question is: can you query them in parallel? If you can't, you probably won't benefit from the performance gain as much as you could.
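Querying several per-client databases in parallel can be sketched with a thread pool; `query_one` below is a stand-in for a real driver call, and the server names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

servers = ["db-1", "db-2", "db-3"]  # hypothetical per-client database servers

def query_one(server):
    # A real implementation would open a connection to `server` and run
    # the SQL there; here it just returns a fixed per-server row count.
    return (server, 100)

# Fan the same query out to all servers at once, then merge the results.
with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    results = dict(pool.map(query_one, servers))

total_rows = sum(results.values())
print(total_rows)  # 300
```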
Doing many inserts is an I/O-bound operation, so you have to optimize disk access. Splitting the load across different disks is the best thing you can do, but even if you can't, you can still improve performance:
- Make sure writes are append-only. In MySQL/InnoDB data is stored in primary-key order, so use an auto-increment key to avoid random writes. In other RDBMSs you can choose your cluster key, so choose wisely
- If you can, keep the data on one disk and the binary logs on another disk; that way you effectively split the load across two disks
- If you can, split reading and writing (master/slave replication), so the master is busy only with writes
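The first bullet can be demonstrated directly: with a monotonically increasing primary key, new rows always land at the end of the clustered index regardless of arrival order. SQLite syntax below for runnability; in MySQL/InnoDB the key column would be e.g. `id BIGINT AUTO_INCREMENT PRIMARY KEY`. Table and column names are made up:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE hourly_value (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- append-only insert order
        system_id INTEGER NOT NULL,
        hour      TEXT    NOT NULL,
        value     REAL    NOT NULL
    )
""")

# Rows arrive in no particular system order...
for system_id in (3, 1, 2):
    db.execute(
        "INSERT INTO hourly_value (system_id, hour, value) VALUES (?, ?, ?)",
        (system_id, "2024-01-01T00", 0.0),
    )

# ...but the generated keys, and hence the on-disk order, are sequential.
ids = [row[0] for row in db.execute("SELECT id FROM hourly_value ORDER BY id")]
print(ids)  # [1, 2, 3]
```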
A better, more general option is to run a master db and several slave dbs (read-only, automatically kept in sync with the master). Updates are sent to the master, but selects are distributed across all the dbs (since a select gets the same result wherever the query is run).
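The routing rule above, as a sketch. The connection objects here are plain strings standing in for real driver connections, and all names are illustrative:

```python
import random

class ReadWriteRouter:
    """Send writes to the master; spread reads across the replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def for_write(self):
        return self.master

    def for_read(self):
        # Any replica returns the same result, so pick one at random;
        # fall back to the master if no replicas are configured.
        return random.choice(self.replicas) if self.replicas else self.master

router = ReadWriteRouter(master="master-db", replicas=["replica-1", "replica-2"])
print(router.for_write())  # master-db
```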
There are many products that do this "as is", both free and commercial.