I am working with a database schema that is running into scalability issues. One of the tables in the schema is growing toward ten million rows, and I'm exploring sharding and partitioning options that would let this schema scale to much larger datasets (say, 1 billion to 100 billion rows). Our application must also be deployable on several database products, including but not limited to Oracle, MS SQL Server, and MySQL.
This is a large problem in general, and I'd like to read up on what options are available. What resources are available (books, whitepapers, web sites) on database sharding and partitioning strategies?
I agree with the other answers that you should look at your schema and indexes before resorting to sharding. Ten million rows is well within the capabilities of any of the major database engines.
However, if you would like some resources for learning about sharding, try these:
Ten million rows is really not large in DBMS terms, and I'd look first at my indexing and query plans before starting to plan a physical distribution of data with shards or partitions, which shouldn't be necessary until your table has grown by a couple of orders of magnitude.
All IMHO, obviously.
I agree with Mike Woodhouse's observation that the current size should not be an issue - and the questioner agrees.
Most commercial DBMSs support fragmented tables in some form or other, under one name or another. One of the key questions is whether there is a sensible way to split the data into fragments. One common approach is to split by date, so that all the values for, say, November 2008 go into one fragment, those for October 2008 into another, and so on. This has advantages when the time comes to remove old data: you can probably drop the fragment containing data from October 2001 (seven years' data retention) without affecting the other fragments. This kind of fragmentation can also help with 'fragment elimination': if the query clearly cannot need to read the data from a given fragment, that fragment is left unread, which can give you a substantial performance benefit. (For example, if the optimizer knows that the query is for a date in October 2008, it will ignore all fragments except the one that contains the data from October 2008.)
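To make date-based fragmentation and fragment elimination concrete, here is a minimal Python sketch. It is a toy in-memory model, not how any particular DBMS implements it; the class `MonthFragmentedTable` and its methods are hypothetical names made up for illustration.

```python
from collections import defaultdict
from datetime import date


class MonthFragmentedTable:
    """Toy model of a table fragmented by calendar month (illustrative only)."""

    def __init__(self):
        # Maps (year, month) -> list of (day, payload) rows in that fragment.
        self.fragments = defaultdict(list)
        # Counts how many fragments queries have actually scanned.
        self.fragments_read = 0

    def insert(self, day, payload):
        self.fragments[(day.year, day.month)].append((day, payload))

    def query_range(self, start, end):
        """Return rows with start <= day <= end, skipping fragments that cannot match."""
        first = (start.year, start.month)
        last = (end.year, end.month)
        results = []
        for key, rows in self.fragments.items():
            # Fragment elimination: a month outside the range is never read.
            if key < first or key > last:
                continue
            self.fragments_read += 1
            results.extend(row for row in rows if start <= row[0] <= end)
        return results

    def drop_month(self, year, month):
        """Drop one whole fragment; return how many rows went with it."""
        return len(self.fragments.pop((year, month), []))


table = MonthFragmentedTable()
table.insert(date(2001, 10, 5), "old")
table.insert(date(2008, 10, 14), "a")
table.insert(date(2008, 10, 30), "b")
table.insert(date(2008, 11, 2), "c")

# Only the October 2008 fragment is scanned for this query.
rows = table.query_range(date(2008, 10, 1), date(2008, 10, 31))

# Retiring old data touches one fragment and nothing else.
removed = table.drop_month(2001, 10)
```

In a real engine the fragments would be separate storage segments and the optimizer would do the skipping, but the shape of the benefit is the same: the work is proportional to the fragments that can match, not to the whole table.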
There are other fragmentation techniques - round robin distributes the load across multiple disks, but means you cannot benefit from fragment elimination.
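The round-robin trade-off can be sketched the same way. This is again a toy model with hypothetical names (`RoundRobinTable`, `find`), assuming each list stands in for a fragment on its own disk: rows spread evenly, but because placement says nothing about a row's contents, every lookup has to visit every fragment.

```python
class RoundRobinTable:
    """Toy model of round-robin fragmentation (illustrative only)."""

    def __init__(self, num_fragments):
        self.fragments = [[] for _ in range(num_fragments)]
        self._next = 0  # index of the fragment that receives the next row

    def insert(self, row):
        self.fragments[self._next].append(row)
        self._next = (self._next + 1) % len(self.fragments)

    def find(self, predicate):
        """Return (matching rows, number of fragments scanned)."""
        scanned = 0
        matches = []
        for fragment in self.fragments:
            # No fragment can be ruled out, so all of them are read.
            scanned += 1
            matches.extend(row for row in fragment if predicate(row))
        return matches, scanned


rr = RoundRobinTable(num_fragments=4)
for value in range(10):
    rr.insert(value)

sizes = [len(fragment) for fragment in rr.fragments]  # evenly spread load
matches, scanned = rr.find(lambda row: row == 7)      # scans all 4 fragments
```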
In my experience, large tables always hit you on the I/O side. The cheapest solution is to add enough multi-column indexes that all of your queries can get their data from the index, without having to load the main data pages. This makes your inserts and updates more I/O intensive, but that may be OK. The next easy option is to max out the RAM in your server. There is no reason to have less than 32GB if your database is big. But in the end you will still find yourself I/O bound, and you'll be looking at buying a lot of hard drives and maintaining a complex partitioning scheme, which costs a fortune between hardware and labor. I hope there is a better alternative these days - move the database from spinning hard drives to SLC solid state drives - this should make your random reads and writes a hundred times faster than top-of-the-line SAS drives, and remove the I/O bottleneck. SSDs start at $10 per gigabyte, so you're going to spend a few grand, but it's still much cheaper than SANs, etc.
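As a rough illustration of the multi-column index idea, the sketch below uses Python's built-in `sqlite3` module. SQLite is only a convenient stand-in here, not one of the engines named in the question, and the `orders` table, its columns, and the index name are made up for the example. It asks the planner whether a query can be answered from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, total REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, order_date, total, notes) VALUES (?, ?, ?, ?)",
    [(i % 100, f"2008-10-{(i % 28) + 1:02d}", i * 1.5, "x") for i in range(1000)],
)

# Multi-column index holding every column the first query below needs.
conn.execute(
    "CREATE INDEX idx_orders_cust_date_total "
    "ON orders (customer_id, order_date, total)"
)


def plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# All selected columns are in the index: no table pages need to be read.
covered = plan(
    "SELECT order_date, total FROM orders WHERE customer_id = ?", (42,)
)

# 'notes' is not in the index, so the table rows must be fetched as well.
not_covered = plan(
    "SELECT order_date, notes FROM orders WHERE customer_id = ?", (42,)
)

print(covered)
print(not_covered)
```

The exact plan text varies between SQLite versions, but the first plan should mention a covering index and the second should not. Oracle, SQL Server, and MySQL each have their own way of showing the same thing in their execution plans.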