I'm looking for some outside help from someone with MySQL expertise. I don't need an exact solution, just some ideas and pointers on where to look for optimization.

A bit about the problem:

  • I have to insert a lot of rows into InnoDB tables.
  • Each table has only one index (which is also the primary key).
  • Each row contains about 1 KB of data.
  • I'm using LOAD DATA INFILE queries of around 5000 rows at a time (a rough sketch follows this list).
  • I'm using 8 threads for the writes (each writing separate data).
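
For reference, one batch looks roughly like this (the table name, file path, and column names are made up for illustration; only the chunk file changes between calls):

    -- Hypothetical example of a single ~5000-row batch.
    LOAD DATA INFILE '/data/chunks/chunk_00042.csv'
    INTO TABLE my_innodb_table
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    (id, payload);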

OK, so with all of these properties, I get a throughput of close to a million rows per hour written to the DB. That's about 1 GB of data, or ~300 KB/sec, going by the upper-end estimate of how much data is in a row.

However, when I look at my machine statistics, I see that the I/O graph for disk writes flatlines at about 20 MB/sec, which suggests that I'm I/O bound. (The CPU graph also shoots to 100%, but about 90% of that is iowait.) So, my real question is: why would MySQL be writing 20 MB/sec of data to disk, when the amount of data being sent through queries is only about 300 KB/sec?
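
One way I could break down where those writes go (these are the standard InnoDB status counters; sample them before and after one batch and diff the values):

    -- Redo log bytes vs. data file bytes vs. doublewrite buffer activity.
    SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
    SHOW GLOBAL STATUS LIKE 'Innodb_data_written';
    SHOW GLOBAL STATUS LIKE 'Innodb_dblwr_writes';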

I'm speculating that the discrepancy is due to log files, temp tables and transaction double-writing - but I'm wondering why this ratio is near 100:1, and how I can shrink it to something more reasonable. What internal variables are causing MySQL to write so much data to disk rather than keeping it in memory? For example, I've already set innodb_buffer_pool_size = 12G, max_heap_table_size = 8G and tmp_table_size = 6G in an attempt to make MySQL use more memory instead of disk - but I still see the same result.
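
For what it's worth, these are other variables I've come across that supposedly trade durability for fewer writes, though I'm not sure which (if any) apply here - the values below are only illustrative:

    # Illustrative my.cnf fragment; values are guesses, not a recommendation.
    innodb_flush_log_at_trx_commit = 2   # flush the redo log about once per second instead of per commit
    innodb_log_file_size = 1G            # bigger redo logs mean less frequent checkpoint flushing
    innodb_doublewrite = 0               # skip the doublewrite buffer (fewer data-file writes, less crash safety)
    innodb_flush_method = O_DIRECT       # avoid double-buffering writes through the OS page cache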

I appreciate any insight you can provide!

Eight threads for writes may be way too many or way too few, depending on what your storage actually looks like.

If you have a single spinning-platter drive in your machine, this is far too many -- your drive will spend all its time seeking in order to perform the writes. Use one thread.

If you have spread your database tables over eight or more SSDs, this may be fine, and perhaps even more threads would let you take full advantage of the low "seek" latency. ("Seek" doesn't really apply to modern SSDs, but I'm using the term by analogy with older drive technology.)

My best guess is that 90%+ of that time is spent on disk seeks.

If you update an index and a transaction log with every row, and those things are physically far apart on disk, it can lead to 2-3 seeks per write. With a seek time of about 10 ms, a disk can do roughly 100 seeks per second, which limits writes to an unimpressive ~33-50 rows per second. This shouldn't be the case with 'load data', since it eliminates per-row transactions, but it still appears to update the index. If the tablespace is fragmented, the results may be much worse. Several concurrent threads make the problem worse still.

Try disabling the index during loading. Use fewer threads, possibly just one.
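
I'm not sure how much this helps when the only index is the primary key itself, but the usual session settings for InnoDB bulk loads look something like this (file and table names are placeholders):

    -- Commonly suggested per-session settings for bulk loading into InnoDB.
    SET autocommit = 0;
    SET unique_checks = 0;        -- skip uniqueness checks while loading (input must already be clean)
    SET foreign_key_checks = 0;   -- skip FK checks if any foreign keys exist
    LOAD DATA INFILE '/data/chunks/chunk_00042.csv' INTO TABLE my_innodb_table;
    COMMIT;                       -- one commit per chunk, not per row
    SET unique_checks = 1;
    SET foreign_key_checks = 1;

Sorting each input file by primary key before loading should also help, since InnoDB stores rows in primary key order and sequential inserts avoid most of the seeking.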

Disclaimer: I'm not sure exactly how 'load data' works; the docs at mysql.com don't mention transactions at all.