I've got a single large table which I need to optimize. I'm using MS-SQL 2005 server. I'll attempt to describe how it's used and if anyone has any suggestions I'd be greatly thankful.

The table is about 400GB, has 100 million rows, and 1 million rows are inserted each day. The table has 8 columns: 1 data column and 7 columns used for lookups/ordering.
k1 k2 k3 k4 k5 k6 k7 d1
k1: varchar(3), primary key - clustered index, 10 possible values
k2: bigint, primary key - clustered index, total rows/10 possible values
k3: int, 10 possible values
k4: money, 100 possible values
k5: bool
k6: bool
k7: DateTime
Only one select query is ever run, and it looks like this:
SELECT TOP(g) d1 FROM table WITH(NOLOCK) WHERE k1 = a AND k3 = c AND k4 = d AND k5 = e AND k6 = f ORDER BY k7
where g = circa 1 million. This query is run about 10 times per day (often while inserts are happening) and takes about 5-30 minutes.
So I currently have only a clustered index on the two primary key columns. My question is: what indexes should I add to improve this query's performance?

Would separate indexes on every column be a good choice? I think a single index would occupy about 5-8GB. The DB server has 8GB RAM total.

Please don't say that the best thing is to experiment. That's the same as saying 'I have no idea, work it out yourself' :)
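For concreteness, the kind of composite index I have in mind would look something like this (a sketch only, not tested against the real data; the index name is made up, and the column order simply mirrors the query's WHERE clause and ORDER BY):

```sql
-- Hypothetical covering index: equality columns first, the ORDER BY
-- column last, and d1 carried as an included column so the query can
-- be answered from the index alone (INCLUDE is available in SQL 2005).
CREATE NONCLUSTERED INDEX IX_table_query
ON dbo.[table] (k1, k3, k4, k5, k6, k7)
INCLUDE (d1);
```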
Any tips much appreciated!
EDIT by doofledorfer--
You've triggered an outbreak of premature optimisation here, if not outright suggestions that "the best thing is to experiment". You need to clarify a number of issues if you want useful help.
EDIT: Comments on answers to date are now posted below, together with the query plan - Mr. Flibble
As I suggested in a comment, I've done this with a single Oracle table approaching 8 TB, composed of well over two billion rows, growing at the rate of forty million rows per day. However, in my case, the users were 2 million (and growing) customers accessing this data over the web, 24x7, and literally ANY of the rows was subject to being accessed. Oh, and new rows had to be added within two minutes of real-time.
You're most likely I/O bound, not CPU or memory bound, so optimizing the disk access is critical. Your RAM is fine--more than sufficient. Using multiple cores would be helpful, but of limited value if the I/O isn't parallelized.
Several people have recommended splitting the data, and that should be taken seriously since it is far better and far more effective than any other solution (nothing is faster than not touching the data at all).

You say you can't split the data because all of the data is used: IMPOSSIBLE! There's no way that your users are paging through one million rows per day, or one hundred million rows total. So, get to know how your users are REALLY using the data--look at every query in this case.
More importantly, we're not saying that you should DELETE the data, we're telling you to SPLIT the data. Clone the table structure into multiple, similarly-named tables, probably based on time (one month per table, perhaps). Copy the data into the relevant tables and delete the original table. Create a view that performs a union over the new tables, with the same name as the original table. Change your insert processing to target the newest table (assuming that it's appropriate), and your queries should still work against the new view.
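A minimal sketch of that scheme, assuming one month per table and using made-up names (the actual type of d1 is unknown, so varchar(max) is a placeholder):

```sql
-- One monthly clone of the original structure (repeat per month).
CREATE TABLE dbo.data_2009_01 (
    k1 varchar(3)   NOT NULL,
    k2 bigint       NOT NULL,
    k3 int          NOT NULL,
    k4 money        NOT NULL,
    k5 bit          NOT NULL,
    k6 bit          NOT NULL,
    k7 datetime     NOT NULL,
    d1 varchar(max) NULL,  -- placeholder type for the data column
    CONSTRAINT PK_data_2009_01 PRIMARY KEY CLUSTERED (k1, k2)
);

-- A view with the name the application already queries,
-- unioning the monthly tables.
CREATE VIEW dbo.data AS
    SELECT k1, k2, k3, k4, k5, k6, k7, d1 FROM dbo.data_2009_01
    UNION ALL
    SELECT k1, k2, k3, k4, k5, k6, k7, d1 FROM dbo.data_2009_02;
```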
Your savvy users can now begin to issue their queries against a subset of the tables, perhaps the newest one only. Your unsavvy users can continue to use the view over all the tables.
You now have a data management strategy, in the form of archiving the oldest table and then deleting it (updating the view definition, of course). Likewise, you will need to create a new table periodically and update the view definition for that end of the data as well.
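Rolling the window forward might look like the following sketch, assuming hypothetical monthly tables named dbo.data_2009_01 through dbo.data_2009_03 and a view named dbo.data:

```sql
-- Redefine the view to include the new month and drop the oldest one.
ALTER VIEW dbo.data AS
    SELECT k1, k2, k3, k4, k5, k6, k7, d1 FROM dbo.data_2009_02
    UNION ALL
    SELECT k1, k2, k3, k4, k5, k6, k7, d1 FROM dbo.data_2009_03;

-- Only after the oldest month's contents have been archived elsewhere:
DROP TABLE dbo.data_2009_01;
```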
Be prepared to not be able to use unique indexes: they don't scale beyond about one to two million rows. You may also need to modify various other tactics/advice as well. At one hundred million rows and 400 GB, you've entered another realm of processing.
Beyond that, use the other suggestions--analyze the actual performance using the many tools already available in SQL Server and the OS. Apply the many well-known tuning techniques that are readily available on the web or in books.
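For example, SQL Server's built-in statistics output will show exactly how much I/O the problem query generates (a sketch; the literal filter values are placeholders for the real parameters):

```sql
-- Report logical/physical reads and CPU/elapsed time for the query.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT TOP (1000000) d1
FROM dbo.[table] WITH (NOLOCK)
WHERE k1 = 'abc' AND k3 = 1 AND k4 = 10.00 AND k5 = 1 AND k6 = 0
ORDER BY k7;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The "logical reads" figure in the Messages pane is the number to watch: if it drops dramatically after an index or splitting change, the change is working.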
However, don't experiment! With this much data, you don't have time for experiments and the risk is too great. Study carefully the available techniques and your actual performance details, then choose one step at a time and give each one a few hours to days to reveal its impact.