First, a few details to describe the setup in general:

  • MySQL (5.1.50) database on a very beefy (32 CPU cores, 64GB RAM) FreeBSD 8.1-RELEASE machine that also runs Apache 2.2.
  • Apache typically gets about 50 hits per second. The vast majority of these hits are API calls for a purchase platform.
  • The API calls usually take about half a second or less to generate a result, but can take up to 30 seconds depending on external organizations.
  • Each API call stores a row in a database. The data stored there is important, but only for about 15 minutes, after which it must expire.
  • In the table that stores API call information (the schema for this table is below), InnoDB row-level locking is used to synchronize between threads (Apache connections, really) requesting the same information at the same time, which happens often. This means that several threads may be waiting for a lock on a row for up to 30 seconds, since API calls can take that long (but rarely do).
  • Above all, the most important thing to note is that everything works perfectly under normal conditions.
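
For illustration, the row-level synchronization described above could look roughly like the following. This is only a sketch under assumptions: the exact statements the application issues are not shown in this post, the literal values are placeholders, and the slow API call happens in application code between the two statements.

```sql
-- Hypothetical sketch, not the application's actual statements.
START TRANSACTION;

-- Claim the row for this request. A concurrent connection inserting the
-- same unique-key values blocks here until this transaction ends, then
-- receives a duplicate-key error and can read the stored result instead.
INSERT INTO `sales` (`start_time`, `identifier`, `zip_code`, `income`)
VALUES (UNIX_TIMESTAMP(), '123456789', '90210', 50000);

-- ... the API call (half a second to 30 seconds) runs here ...

-- Record completion for the row inserted on this connection, then release the lock.
UPDATE `sales`
SET `end_time` = UNIX_TIMESTAMP()
WHERE `sale_id` = LAST_INSERT_ID();

COMMIT;
```

With a pattern like this, every waiting request holds a MySQL connection open for as long as the row lock is held, which is why the number of concurrent connections matters so much here.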

That said, this is the very heavily used table (around fifty INSERTs per second, many SELECTs, row-level locking is required) that I am running the DELETE query on:

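CREATE TABLE `sales` (
  `sale_id` int(32) unsigned NOT NULL auto_increment,
  `start_time` int(20) unsigned NOT NULL,
  `end_time` int(20) unsigned default NULL,
  `identifier` char(9) NOT NULL,
  `zip_code` char(5) NOT NULL,
  `income` mediumint(6) unsigned NOT NULL,
  -- Assumed: an auto_increment column must be indexed, so a primary key is declared here.
  PRIMARY KEY (`sale_id`),
  -- The key as posted named `ssn`, which is not a column above; `identifier` is assumed.
  UNIQUE KEY `SALE_DATA` (`identifier`,`zip_code`,`income`),
  KEY `SALE_START` USING BTREE (`start_time`)
) ENGINE=InnoDB;
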
The DELETE query looks like this, and is run every 5 minutes from cron (I would prefer to run it once per minute):

DELETE FROM `sales` WHERE
    `start_time` < UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE);

I have used INT for the time field because it appeared to me that MySQL has trouble using indexes with DATETIME fields.
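
Whether the index on `start_time` is actually used for this cleanup can be checked with EXPLAIN. The following is a minimal sketch, assuming the cutoff is 30 minutes; MySQL 5.1 only supports EXPLAIN on SELECT statements, so the DELETE is rewritten as an equivalent SELECT with the same WHERE clause.

```sql
-- Show the access plan for the rows the cleanup would remove.
-- The `key` column of the output names the index chosen, if any.
EXPLAIN
SELECT `sale_id`
FROM `sales`
WHERE `start_time` < UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE);

-- Count how many rows a single cleanup run would delete right now.
SELECT COUNT(*)
FROM `sales`
WHERE `start_time` < UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE);
```

The row count gives a rough idea of how much work each run does, which may help when comparing the runs that cause trouble with the ones that do not.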

So this is the problem: the DELETE query seems to run fine most of the time (maybe 7 out of 10 times). At other times, the query finishes quickly, but MySQL seems to get choked up for a while afterwards. I can't exactly prove that MySQL is causing the problems, but the times the symptoms occur definitely coincide with the times this query is run. Here are the symptoms while everything is choked up:

  • Logging into MySQL and using SHOW FULL PROCESSLIST, there are only a few INSERT INTO `sales` ... queries running, where normally there are more than a hundred. What is abnormal here is actually the lack of tasks in the process list, rather than there being too many. It seems MySQL stops accepting connections entirely.
  • Checking Apache server-status, Apache has reached MaxClients. All threads are in "Sending reply" status.
  • Apache starts using a lot of system time CPU. Load averages shoot way up; I have seen 1-minute load averages as high as 100. The normal load average for this machine is about 15. I can see that it is using system CPU (as opposed to user CPU) because I use GKrellM to monitor it.
  • In top, there are many Apache processes using a lot of CPU.
  • The website and API (served by Apache, of course) are unreachable most of the time. Some requests go through, but take around 3 or 4 minutes. Other requests respond after a while with a "Can't connect to MySQL server through /tmp/mysql.sock" error. This is the same error I get when MySQL is over capacity and has too many connections (only it doesn't actually say too many connections).
  • MySQL accepts a maximum of 1024 connections. A tuning report shows "[!!] Highest connection usage: 100% (1025/1024)", meaning it has taken on more than it could handle at some point. Under normal conditions, there are only a few hundred concurrent MySQL connections at most. The report shows no other issues; I would be happy to paste the output if anyone wants it.
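
For reference, the connection and locking state described in the list above can be captured from the mysql client with standard statements such as the following. This is a sketch of what could be run while the problem is occurring, not output from this server.

```sql
-- What every connection is doing right now.
SHOW FULL PROCESSLIST;

-- Current and peak connection counts, compared with the configured limit.
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL VARIABLES LIKE 'max_connections';

-- InnoDB internals: lock waits, open transactions and the purge backlog
-- (reported as "History list length").
SHOW ENGINE INNODB STATUS;
```

Running these once during normal operation and once during an incident gives two snapshots to compare.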

Eventually, after a few minutes, things recover on their own without any intervention. CPU usage goes back to normal, and Apache and MySQL resume normal operation.

So, what can I do? :) How can I even begin to investigate why this is happening? I need that DELETE query to run for various reasons, so why do things go haywire when it is run (but not all the time)?