I'm looking at storing some JMX data from JVMs on many servers for around 3 months. This data would be statistics like heap size and thread count. This means that some of the tables may have around 388 million records.

From this data I'm building some graphs so you can compare the stats retrieved from the MBeans. This means I'll be fetching the data at intervals using timestamps.

So my question is: is there any way to optimize the table or query so that these queries can be performed in a reasonable amount of time?



There are several things you can do:

  1. Construct your indexes to match the queries you're running. Run EXPLAIN to see the kinds of queries being executed and make sure they all use an index where possible.

  2. Partition your table. Partitioning is a technique for splitting a large table into several smaller ones by a particular (aggregate) key. MySQL supports this natively from version 5.1.

  3. If necessary, build summary tables that cache the more expensive parts of your queries, and run your queries against the summary tables. Similarly, temporary in-memory tables can be used to store a simplified view of your table as a pre-processing stage.
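As a rough sketch of the first two steps, assuming a hypothetical `jmx_stats` table (the column names and dates here are illustrative, not from the question):

```sql
-- Hypothetical raw-data table, range-partitioned by month (MySQL 5.1+)
-- so that old partitions can be dropped cheaply once the 3-month
-- retention window passes. Note the partitioning column must be part
-- of every unique key, hence its inclusion in the primary key.
CREATE TABLE jmx_stats (
    server_id    INT UNSIGNED NOT NULL,
    mbean_name   VARCHAR(255) NOT NULL,
    captured_at  TIMESTAMP    NOT NULL,
    heap_used    BIGINT UNSIGNED,
    thread_count INT UNSIGNED,
    PRIMARY KEY (server_id, mbean_name, captured_at)
)
PARTITION BY RANGE (UNIX_TIMESTAMP(captured_at)) (
    PARTITION p2010_01 VALUES LESS THAN (UNIX_TIMESTAMP('2010-02-01')),
    PARTITION p2010_02 VALUES LESS THAN (UNIX_TIMESTAMP('2010-03-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);

-- Check that a typical graphing query uses the index and only touches
-- the relevant partitions:
EXPLAIN PARTITIONS
SELECT captured_at, heap_used
FROM jmx_stats
WHERE server_id = 42
  AND mbean_name = 'java.lang:type=Memory'
  AND captured_at BETWEEN '2010-01-01' AND '2010-01-08';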

Well, first of all, I'd recommend you use "offline" processing to produce 'graph ready' data (for most of the common cases) instead of trying to query the raw data on demand.

3 suggestions:

  1. index
  2. index
  3. index
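The "offline" pre-aggregation mentioned above can be as simple as a periodic roll-up into a summary table, something like the following (table and column names are assumptions for illustration):

```sql
-- Hypothetical summary table: one row per server per 5-minute bucket.
CREATE TABLE jmx_stats_5min (
    server_id        INT UNSIGNED NOT NULL,
    bucket_start     DATETIME     NOT NULL,
    avg_heap_used    BIGINT UNSIGNED,
    max_thread_count INT UNSIGNED,
    PRIMARY KEY (server_id, bucket_start)
);

-- Run this from cron every few minutes; the graphs then read this
-- small table instead of scanning hundreds of millions of raw rows.
INSERT INTO jmx_stats_5min
    (server_id, bucket_start, avg_heap_used, max_thread_count)
SELECT server_id,
       -- round the timestamp down to a 5-minute (300 s) boundary
       FROM_UNIXTIME(UNIX_TIMESTAMP(captured_at) DIV 300 * 300),
       AVG(heap_used),
       MAX(thread_count)
FROM jmx_stats
WHERE captured_at >= NOW() - INTERVAL 1 HOUR
GROUP BY server_id,
         FROM_UNIXTIME(UNIX_TIMESTAMP(captured_at) DIV 300 * 300);
```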

p.s. for timestamps you might run into performance issues -- depending on how MySQL handles DATETIME and TIMESTAMP internally, it may be better to store timestamps as integers (# of seconds since 1970 or whatever).
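A sketch of the integer-timestamp idea: store epoch seconds in an `INT UNSIGNED` column and convert only at the edges (names here are made up for the example):

```sql
-- Store the timestamp as seconds since the epoch instead of DATETIME.
CREATE TABLE jmx_stats_int (
    server_id   INT UNSIGNED NOT NULL,
    captured_at INT UNSIGNED NOT NULL,  -- UNIX_TIMESTAMP() value
    heap_used   BIGINT UNSIGNED,
    PRIMARY KEY (server_id, captured_at)
);

-- Convert on the way in...
INSERT INTO jmx_stats_int
VALUES (42, UNIX_TIMESTAMP('2010-01-15 12:00:00'), 123456789);

-- ...and on the way out; range comparisons stay plain integer math.
SELECT FROM_UNIXTIME(captured_at) AS captured_at, heap_used
FROM jmx_stats_int
WHERE server_id = 42
  AND captured_at BETWEEN UNIX_TIMESTAMP('2010-01-15')
                      AND UNIX_TIMESTAMP('2010-01-16');
```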

If you're using MySQL 5.1 you can use the brand-new features, but be warned that they contain a lot of bugs.

First you should use indexes. If that isn't enough, you can try to split the tables by using partitioning.

If that still doesn't work, you can also try load balancing.