This is really more of a conceptual question. It's inspired by working with a very large table where even a simple query takes a very long time (correctly indexed). I'm wondering whether there's a much better structure than just letting the table grow, constantly.
By large I am talking about 10,000,000+ records, growing every single day by something like 10,000/day. A table like this would gain 10,000,000 additional records every 2.7 years. Let's say that newer records are accessed the most, but the older ones have to remain available. I have two conceptual ideas to speed it up.
1) Maintain a master table that holds all of the data, indexed in reverse date order. Create a separate view for each year that holds only the data for that year. When querying, and let's say the query is likely to pull only a few records from a three-year span, I could use a union to combine the three views and select from those.
2) The other option would be to create a separate table for each year, then again use a union to combine them when querying.
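To make idea 2 concrete, here's a minimal sketch of per-year tables stitched together by a `UNION ALL` view. All table and column names (`status_updates_2010`, `created_at`, etc.) are made up for illustration:

```sql
-- One table per year (hypothetical schema).
CREATE TABLE status_updates_2010 (
    id         BIGINT NOT NULL,
    created_at DATE   NOT NULL,
    body       TEXT,
    PRIMARY KEY (id)
);
CREATE TABLE status_updates_2011 LIKE status_updates_2010;

-- UNION ALL (rather than UNION) avoids a pointless
-- duplicate-elimination pass, since the tables are disjoint by year.
CREATE VIEW status_updates_all AS
    SELECT * FROM status_updates_2010
    UNION ALL
    SELECT * FROM status_updates_2011;
```

Queries against `status_updates_all` then see all years, while code that only needs recent data can hit the current year's table directly.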
Does anyone have other ideas or concepts? I know this is a problem Facebook has faced, so how do you think they handled it? I doubt they have a single table (status_updates) that contains 100,000,000,000 records.
The major RDBMS vendors all have similar concepts in terms of partitioned tables and partitioned views (as well as combinations of the two).
There's one immediate benefit: since the data is now split across multiple conceptual tables, any query that includes the partition key can automatically ignore any partition the key could not be in.
From an RDBMS management perspective, having the data split into separate partitions allows operations to be carried out at the partition level: backup / restore / indexing etc. This helps reduce downtime as well as allowing far faster archiving, simply by removing an entire partition at a time.
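As a sketch of what partition-level maintenance looks like (MySQL syntax, with hypothetical table and partition names):

```sql
-- Assuming status_updates is range-partitioned by year into
-- partitions named p2009, p2010, ...

-- Archiving a whole year is a quick metadata operation,
-- far cheaper than DELETE FROM ... WHERE year = 2009:
ALTER TABLE status_updates DROP PARTITION p2009;

-- Maintenance can also target a single partition instead of
-- the whole table:
ALTER TABLE status_updates REBUILD PARTITION p2010;
```

The exact statements vary by vendor, but the principle is the same: the partition, not the table, becomes the unit of administration.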
There are also non-relational storage systems such as NoSQL, MapReduce, etc., but ultimately how the data is used, loaded, and aged becomes a driving factor in the decision of which structure to use.
Ten million rows isn't that large on the scale of big systems; partitioned systems can hold billions of rows.
Often the best plan is to have one table and then use database partitioning.
Or archive data and create a view over the archived and combined data, keeping only the active data in the table most operations reference. You'll need a very good archiving strategy though (one that is automated), or you'll lose data or fail to move it efficiently. This is typically harder to maintain.
Your second idea sounds like partitioning.
I don't know how well it works, but there is support for partitioning in MySQL -- see, in the manual: Chapter 17. Partitioning
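For reference, a minimal MySQL range-partitioning example (hypothetical schema; note that MySQL requires the partitioning column to be part of every unique key, which is why `created_at` appears in the primary key):

```sql
CREATE TABLE status_updates (
    id         BIGINT NOT NULL AUTO_INCREMENT,
    created_at DATE   NOT NULL,
    body       TEXT,
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2009 VALUES LESS THAN (2010),
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

New partitions can be split off from `pmax` each year as the table grows.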
What you are talking about is horizontal partitioning or sharding.
There is a good scalability approach for these tables. Union is one workable way, but there is a better one.
If your database engine supports "semantic partitioning", then you can split one table into partitions. Each partition covers a subrange (say one partition per year). It won't affect anything in SQL syntax except DDL, and the engine will transparently run the hidden union logic and partitioned index scans with all the parallel hardware it has (CPU, I/O, storage).
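That transparency is the key point: once the table is partitioned, queries look exactly like queries against an ordinary table, and the engine prunes partitions from the `WHERE` clause on its own. A sketch, assuming a table partitioned by year on a hypothetical `created_at` column:

```sql
-- No UNION anywhere; the engine scans only the 2011 partition.
SELECT *
FROM   status_updates
WHERE  created_at >= '2011-01-01'
  AND  created_at <  '2011-04-01';

-- In MySQL, EXPLAIN PARTITIONS can confirm which partitions
-- a query actually touches:
EXPLAIN PARTITIONS
SELECT * FROM status_updates
WHERE  created_at = '2011-02-14';
```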
For example, Sybase allows up to 255 partitions, since that is its limit on unions. But you will never need the keyword "union" in your queries.