I've got a table of maybe 10k to 100k rows, and I need to fetch different subsets of it: as many as one or two thousand rows, but often far fewer. I want these queries to be as fast as possible, and I'd like to know which approach is generally wiser:
- Always query for exactly the rows I want, with a WHERE clause that changes every time.
- Load the entire table into an in-memory cache in my application and search there, syncing the cache periodically.
- Always query the entire table (no WHERE clause), let the SQL server handle the caching (it's always the same query, so it can cache the result), and filter the output as needed.
Let's stay agnostic about the specific DB engine for now.
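For concreteness, option 1 with a parameterized query might look like the sketch below (SQLite standing in for the real server; the items table and its columns are made up for illustration). Only the bound parameter changes between calls, so the statement itself stays stable:

```python
import sqlite3

# In-memory database standing in for the real server; table and column
# names (items, category, price) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO items (category, price) VALUES (?, ?)",
    [("book", 10.0), ("book", 25.0), ("toy", 5.0)],
)

def fetch_items(category):
    # Option 1: ask only for the rows you need. Only the parameter varies,
    # so the server can reuse the statement and any index on `category`.
    return conn.execute(
        "SELECT id, price FROM items WHERE category = ?", (category,)
    ).fetchall()

print(len(fetch_items("book")))  # 2
```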
With 10K to 100K rows, #1 is the obvious winner to me. If it were <1K I would say keep it cached in the application, but with this many rows, let the DB do what it was designed to do. With the proper indexes, #1 is the best choice.
If you were pulling the same set of data over and over, then caching the results might be a better bet, but since you'll have a different WHERE clause every time, it's better to let the DB take care of it.
Like I said though, just make sure you index well on all the appropriate fields.
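As a sketch of what "index the appropriate fields" means in practice (SQLite here, with a hypothetical orders table): create an index on each column your WHERE clauses filter by, and the planner can satisfy the query with an index search instead of a full table scan:

```python
import sqlite3

# Hypothetical schema; the point is to index the fields your WHERE
# clauses actually filter on.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")

# The plan's detail column shows the query being served by the index:
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (42,)
).fetchall()[0][-1]
print(detail)  # mentions idx_orders_customer rather than a table scan
```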
Seems to me that a system that was designed for rapid searching, slicing, and dicing of data is going to be much faster at it than the average developer's code. However, one factor you don't mention is the location (or potential location) of the database server relative to the application - returning large data sets over slow networks would certainly tip the scales in favor of the "grab it all and search locally" option. I think that, in the general case, I'd recommend querying for just what you want, but that in special circumstances the other options may be better.
I firmly believe option 1 should be preferred initially. When you run into performance problems, you can look at how to optimize with caching. (Premature optimization is the root of all evil, as Knuth said.)
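If profiling later does show a hot query, a minimal cache can be bolted on without changing the overall approach. This is an illustrative sketch only - QueryCache and load_rows are made-up names, and load_rows stands in for the real database call:

```python
import time

# Minimal time-to-live cache, added only after measurement shows a need.
class QueryCache:
    def __init__(self, loader, ttl_seconds):
        self.loader = loader
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, rows)

    def get(self, key):
        hit = self._store.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]                        # still fresh: skip the DB
        rows = self.loader(key)                  # miss or stale: hit the DB
        self._store[key] = (time.monotonic() + self.ttl, rows)
        return rows

calls = []
def load_rows(key):
    # Stand-in for the real query; records how often the DB is actually hit.
    calls.append(key)
    return [key.upper()]

cache = QueryCache(load_rows, ttl_seconds=60)
cache.get("a")
cache.get("a")
print(len(calls))  # the loader ran only once
```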
Also, keep in mind that with option 3 you'll be sending the entire table's contents over the network as well. That has an impact on performance.
In my opinion it's best to query for what you want and let the database figure out the best way to get it. You can also examine the query plan to see whether you have any bottlenecks that could be helped by indexes.
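Checking the plan can be as simple as the sketch below (SQLite's EXPLAIN QUERY PLAN; the events table is hypothetical). A full table scan shows up as SCAN, and adding the right index turns it into an index SEARCH:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER)")

def plan_for(sql, args=()):
    # The last column of each plan row is the human-readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    return rows[0][-1]

query = "SELECT * FROM events WHERE user_id = ?"
before = plan_for(query, (7,))   # "SCAN events": full table scan, a bottleneck
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = plan_for(query, (7,))    # "SEARCH ... USING INDEX idx_events_user"
print(before)
print(after)
```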
First of all, let's dismiss #2. Searching tables is a database server's reason for existence, and it will probably do a better job of it than any ad hoc search you cook up.
For #3, you just say 'filter the output when needed' without saying where that filtering is done. If it's in the application code, as with #2, then you have the same problem as #2.
Databases were built precisely for this exact problem. They are very good at it. Let them do it.
The only reason to use anything other than option 1 is if the WHERE clause is huge (i.e. if your WHERE clause identifies each row individually, e.g. WHERE id = 3 OR id = 4 OR id = 32 OR ...).
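Even in that case, a long OR chain can usually be rewritten as an IN list with one placeholder per id, which keeps the query parameterized. A sketch (SQLite, with a made-up table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO widgets VALUES (?, ?)", [(i, str(i)) for i in range(100)]
)

def fetch_by_ids(ids):
    # Build "id IN (?, ?, ...)" with one placeholder per id instead of
    # a long chain of "id = 3 OR id = 4 OR ...".
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, val FROM widgets WHERE id IN ({placeholders})"
    return conn.execute(sql, list(ids)).fetchall()

print(len(fetch_by_ids([3, 4, 32])))  # 3
```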
Is anything else changing your data? The point about letting the SQL engine optimally slice and dice is a good one, but it would be surprising if you were using a database without the possibility of "someone else" changing the data. If changes can be made elsewhere, you'll want to re-query frequently.