I have a table where each row, once created, is active for 24 hours, with a few writes and a lot of reads. After 24 hours it becomes inactive and gets no more writes and only a few reads, if any. Is it better to keep these rows in the table, or to move them to a separate table once they become inactive (either immediately or via batch jobs)? I'm thinking in terms of performance.
This depends largely on how big your table will get, but if it grows forever and gains a significant number of rows per day, then there's a good chance that moving old data to another table is a good idea. There are a few different ways you could do this, and which is best depends on your application and data access patterns.
1. Basically as you said: whenever a row becomes "old", INSERT it into the archive table and DELETE it from the current table.
2. Create a new table every day (or perhaps every week or every month, depending on how big your dataset is), and never worry about moving old rows. You'll have to query the old tables when accessing old data, but for the current day you only ever access the current table.
3. Have a "today" table and a "history" table. Duplicate the "today" rows in both tables, keeping them in sync with triggers or some other mechanism. When a row becomes old, simply delete it from the "today" table, leaving the "history" row intact.
One advantage of the second option (a new table per period) that may not be immediately obvious: I believe MySQL indexes can be optimized for read-only tables. So by having old tables that are never written to, you can take advantage of this extra optimization.
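The first option (INSERT into the archive, then DELETE from the current table) could look roughly like the sketch below. It uses Python's built-in sqlite3 module only so that it runs anywhere; the same two statements should carry over to MySQL with minor changes. The table names (events, events_archive), the columns (created_at, payload) and the use of integer epoch seconds are all made up for illustration.

```python
import sqlite3

DAY_SECONDS = 24 * 60 * 60

def archive_old_rows(conn, cutoff):
    """Move rows with created_at < cutoff from events to events_archive.

    Both statements run in one transaction, so a failure between the
    INSERT and the DELETE cannot lose or duplicate rows.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO events_archive (id, created_at, payload) "
            "SELECT id, created_at, payload FROM events WHERE created_at < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM events WHERE created_at < ?", (cutoff,))
    return cur.rowcount  # number of rows removed from the current table

# Hypothetical schema: two tables with identical columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL, payload TEXT);
CREATE TABLE events_archive (
    id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL, payload TEXT);
""")

now = 200_000  # fixed "current time" so the example is deterministic
with conn:
    conn.executemany(
        "INSERT INTO events (created_at, payload) VALUES (?, ?)",
        [(10_000, "old"), (150_000, "recent"), (199_000, "newest")],
    )

moved = archive_old_rows(conn, now - DAY_SECONDS)
current_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
archived_count = conn.execute("SELECT COUNT(*) FROM events_archive").fetchone()[0]
```

A daily batch job would call archive_old_rows with the current time minus 24 hours. Using the same cutoff value for both statements matters: computing "now" twice could delete a row that was never copied.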
Generally, moving rows between tables in a proper RDBMS should not be necessary.
I'm not familiar with MySQL specifics, but you should do fine with the following:
- Make sure your timestamp column is indexed
- Additionally, you can add an active BOOLEAN DEFAULT TRUE column
- Run a batch job every day to mark rows older than 24 hours as inactive
- Use a partial index on the timestamp column so that only rows marked active are indexed
- Make sure to include both the timestamp and active = TRUE in your WHERE conditions so the queries hit the index. Use EXPLAIN a lot.
That depends on the balance between ease of programming and performance. Performance-wise, yes, it will definitely be faster. But whether the speed increase is worth the effort is hard to say.
I have worked on systems that run perfectly fine with millions of rows. However, if the data is continuously growing, it will eventually become a problem.
I have worked on a database storing transaction logs for automated equipment. It generates hundreds of thousands of events per day. After a year, the queries just wouldn't run at acceptable speeds any more. We now keep the last month's worth of logs in the main table (still millions of rows) and move older data to archive tables.
None of the application's functionality ever looks in the archive table (if you query the transaction log from the application, archived entries are simply not returned). The archive is really only kept for emergency use, and is only queried with a stand-alone database query tool. Since the archive has well over 100 million rows, and the emergency queries are by nature unplannable (and therefore mostly unindexed), they can take a very long time to run.