I have a database used to store items and properties of these items. The number of properties is extensible, so there is a join table to store each property value associated with an item.

CREATE TABLE `item_property` (
    `property_id` int(11) NOT NULL,
    `item_id` int(11) NOT NULL,
    `value` double NOT NULL,
    PRIMARY KEY  (`property_id`,`item_id`),
    KEY `item_id` (`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

This database has two goals: storing (which has first priority and needs to be extremely fast, I have to perform many inserts (hundreds) in a few seconds), and retrieving data (selects using item_id and property_id) (this is a second priority, it may be slower, but only a little, as that would ruin my use of the DB).
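To give an idea, a typical lookup on my side looks roughly like this (the ids are just placeholders):

SELECT `value`
FROM `item_property`
WHERE `item_id` = 42        -- placeholder item id
    AND `property_id` = 7;  -- placeholder property id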

Currently this table holds 1.6 billion records and a simple count can take up to 2 minutes... Inserting isn't fast enough to be usable.

I'm using Zend_Db to access my data and would really appreciate it if you didn't suggest that I develop any PHP-side component.

Any advice is welcome!

If you can't go for solutions using different database management systems or partitioning over a cluster for some reason, there are still three main things you can do to radically improve your performance (and they work in combination with clusters too, of course):

  • Set up the MyISAM storage engine
  • Use "LOAD DATA INFILE filename INTO TABLE tablename"
  • Split your data over several tables

That's it. Read the rest only if you're interested in the details :)

Still reading? OK then, here goes: MyISAM is the cornerstone, since it's the fastest engine by far. Instead of inserting data rows using regular SQL statements, you should batch them up in a file and insert that file at regular intervals (as often as you need to, but as rarely as your application allows would be best). That way you can insert on the order of a million rows per minute.
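As a rough sketch of such a batched load (the file path, delimiters and column order here are assumptions, adjust them to whatever your application writes out):

-- Batch file written by the application, one tab-separated row per line
LOAD DATA INFILE '/tmp/item_property_batch.tsv'
INTO TABLE item_property
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(property_id, item_id, value);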

The next thing that will limit you is your keys/indexes. When those can't fit in your memory (because they're simply too big) you'll see a huge slowdown in both inserts and queries. That's why you split the data over several tables, all with the same schema. Every table should be as big as possible without filling up your memory when loaded on its own. The exact size depends on your machine and indexes of course, but should be somewhere between 5 and 50 million rows per table. You'll find it if you simply measure the time taken to insert one huge batch of rows after another, looking for the moment it slows down significantly. Once you know the limit, create a new table on the fly every time your last table gets close to that limit.
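Something along these lines, with a made-up naming scheme for the chunk tables:

-- New chunk with the same schema (and engine) as the previous one
CREATE TABLE item_property_0002 LIKE item_property_0001;
-- From here on, point the application's batch loads at item_property_0002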

The consequence of the multi-table solution is that you'll have to query all of your tables instead of just a single one when you need some data, which will slow your queries down a little (but not by much if you "only" have a billion or so rows). Obviously there are optimizations to make there as well. If there is something fundamental you could use to separate the data (like date, client or something similar) you could split it into different tables using a structured pattern that lets you know where certain kinds of data are without even querying the tables. Use that knowledge to query only the tables that can contain the requested data, and so on.
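Querying across the chunks can then be done with a UNION ALL over only the tables that may contain the requested data (table names and ids below are made up):

SELECT `value` FROM item_property_0001 WHERE item_id = 42 AND property_id = 7
UNION ALL
SELECT `value` FROM item_property_0002 WHERE item_id = 42 AND property_id = 7;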

If you need even more tuning, go for partitioning, as suggested by Eineki and oedo.
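For completeness, a minimal partitioning sketch: hash partitioning on item_id is allowed here because item_id is part of the primary key, and the partition count below is arbitrary:

ALTER TABLE item_property
    PARTITION BY HASH(item_id)
    PARTITIONS 16;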

Also, so you know all of this isn't wild speculation: I'm doing some scalability tests like this on our own data right now and this approach is doing wonders for us. We're managing to insert hundreds of millions of rows every day and queries take ~100 ms.

First of all, don't use InnoDB, since you don't seem to need its main features over MyISAM (locking, transactions etc.). So use MyISAM, it will already make a difference. Then, if that's still not fast enough, look into your indexing, but you should already see a radical difference.
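Switching the engine is a one-liner, though on a table this size it rebuilds everything and will take a while, so try it on a copy first:

ALTER TABLE item_property ENGINE=MyISAM;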