So I have a large table with a little more than 2 billion rows and 5 multi-column keys.

There are two methods I can use for inserting data:

Method 1

load data infile ...;

Method 2

alter table disable keys;
load data infile ...;
alter table enable keys;

If I am starting from an empty table, for 2 billion rows, method 1 takes about 60 hours (estimated, could be more), while method 2 takes 12 hours to insert the data and 3 hours to re-create the keys. So far so good.

However, if I already have my 2 billion rows and try to insert another 5 million, method 1 takes about 3 hours, while method 2 takes half an hour to insert the data and a whopping 7 hours to re-create the keys. I confirmed that during the entire key regeneration it used Repair by sorting, so it is not as if it fell back to Repair with keycache.
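For reference, a minimal sketch of how that check can be made from a second connection while the keys are being rebuilt. It assumes a MyISAM table; the two variables shown are the settings that normally influence whether MySQL sorts or falls back to the key cache.

```sql
-- Run from a separate session while "alter table ... enable keys" is in progress.
-- The State column of the ALTER TABLE thread shows either
-- "Repair by sorting" or "Repair with keycache".
SHOW PROCESSLIST;

-- Settings that influence which repair strategy MyISAM picks.
SHOW VARIABLES LIKE 'myisam_sort_buffer_size';
SHOW VARIABLES LIKE 'myisam_max_sort_file_size';
```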

I wonder why this is. MySQL claims that disabling keys is great for inserting bulk data, but this clearly depends on the context. If it is going to regenerate all keys from scratch, why doesn't it take about 3 hours, as it did when I started with an empty table? Or if it inserts keys one by one, why doesn't it take about 3 hours, which is what method 1 took?

Comments are welcome.

If you are working with billions of rows and using MySQL 5.1 or later, you may find that partitioning helps performance. When you use indexes on a partitioned table, the indexes are partitioned as well, and since each index is built against only one partition (a subset of the total data), the sorting cost of a repair should be considerably lower.
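A rough sketch of what such a table could look like. The table name, columns, index, and partition boundaries are all hypothetical (the question does not give the schema), and ENGINE=MyISAM is an assumption inferred from the "Repair by sorting" state mentioned in the question.

```sql
-- Hypothetical schema: names, columns and boundaries are illustrative only.
CREATE TABLE big_table_partitioned (
  id BIGINT UNSIGNED NOT NULL,
  a  INT NOT NULL,
  b  INT NOT NULL,
  KEY idx_a_b (a, b)          -- one index file per partition
) ENGINE=MyISAM
PARTITION BY RANGE (id) (
  PARTITION p0   VALUES LESS THAN (500000000),
  PARTITION p1   VALUES LESS THAN (1000000000),
  PARTITION p2   VALUES LESS THAN (1500000000),
  PARTITION p3   VALUES LESS THAN (2000000000),
  PARTITION pmax VALUES LESS THAN MAXVALUE   -- catches newly appended rows
);
```

With a layout like this, a batch of new rows with high id values would land in the last partition, so only that partition's index would need heavy work. Whether that actually helps depends on how the new data is distributed across the partitioning column.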

"a lot slower than promised" - uh, you have 5,000,000 rows; obviously it will take a bit longer than inserting 20 rows.

  • With the first method, it updates the indexes a little on every row insert, so they are always consistent with the data.
  • With the second method, it rebuilds the indexes by sorting the whole table (2,005,000,000 rows), which means moving a lot of existing index data around (disk speed is likely to become the bottleneck here). The time depends on 1) the amount of existing data and 2) the amount of new data.
  • You could use method 3: drop the keys before the second insert (this might take a while, too) and recreate them afterwards. I suspect the time will be similar to re-creating the keys after the initial insert.
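Method 3 might look roughly like the following. This is only a sketch: the table name, index names, column lists, and file path are hypothetical, since the question does not show them, and only two of the five keys are spelled out.

```sql
-- Hypothetical names throughout; substitute the real table, keys and file.

-- 1. Drop the secondary keys in a single ALTER so the table is rewritten once.
ALTER TABLE big_table
  DROP INDEX idx_a_b,
  DROP INDEX idx_c_d;

-- 2. Load the new rows with no secondary indexes to maintain.
LOAD DATA INFILE '/path/to/new_rows.txt' INTO TABLE big_table;

-- 3. Recreate the keys, again in a single ALTER.
ALTER TABLE big_table
  ADD INDEX idx_a_b (a, b),
  ADD INDEX idx_c_d (c, d);
```

Note that on MyISAM, disable keys only affects non-unique indexes, so dropping and recreating is the only one of these approaches that also skips unique-key maintenance during the load.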

The speeds you're describing are quite reasonable IMHO; just use the fastest method.