I am really struggling to get the query time down; it currently has to query 2.5 million rows and takes over 20 seconds.

Here is the query:

SELECT play_date AS date, COUNT(DISTINCT(email)) AS count
  FROM log
 WHERE play_date BETWEEN '2009-02-23' AND '2020-01-01'
   AND type = 'play'
 GROUP BY play_date
 ORDER BY play_date DESC

This is the table:

CREATE TABLE `log` (
  `id` int(11) NOT NULL auto_increment,
  `instance` varchar(255) NOT NULL,
  `email` varchar(255) NOT NULL,
  `type` enum('play','claim','friend','email') NOT NULL,
  `result` enum('win','win-small','lose','none') NOT NULL,
  `timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `play_date` date NOT NULL,
  `email_refer` varchar(255) NOT NULL,
  `remote_addr` varchar(15) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `email` (`email`),
  KEY `result` (`result`),
  KEY `timestamp` (`timestamp`),
  KEY `email_refer` (`email_refer`),
  KEY `type_2` (`type`,`timestamp`),
  KEY `type_4` (`type`,`play_date`),
  KEY `type_result` (`type`,`play_date`,`result`)
);

This is the EXPLAIN output:

id  select_type  table  type  possible_keys              key     key_len  ref    rows    Extra
1   SIMPLE       log    ref   type_2,type_4,type_result  type_4  1        const  270404  Using where

The query is currently using the type_4 index.

Does anybody know how I could speed this query up?

Thanks, Tom

That's relatively good already. The performance sink is that the query has to compare 270404 varchars for equality for the COUNT(DISTINCT(email)), which means 270404 rows have to be read.

You might be able to make the count an index-only operation (i.e. the actual rows would not need to be read, just the index) by changing the index as follows:

KEY `type_4` (`type`,`play_date`, `email`)

I'd be surprised if that didn't speed things up a great deal.
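
As a concrete sketch, that change could be applied like this (one possible form, dropping and re-adding the index under the same name):

ALTER TABLE log
  DROP INDEX type_4,
  ADD INDEX type_4 (`type`, `play_date`, `email`);

With all three columns in the index, MySQL can answer both the WHERE clause and the DISTINCT count from the index alone (EXPLAIN should then show "Using index") without touching the table rows.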

Your indexing is probably as good as you can get it. You have a compound index on the two columns in your WHERE clause, and the EXPLAIN you posted indicates that it is being used. Unfortunately, there are 270,404 rows that match the criteria in your WHERE clause, and they all need to be considered. Also, you are not returning unnecessary rows in your SELECT list.

My advice would be to aggregate the data daily (or hourly, or whatever makes sense) and cache the results. That way you can access slightly stale data instantly. Hopefully this is acceptable for your purposes.
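
A minimal sketch of that approach, assuming a hypothetical summary table named daily_play_counts that is rebuilt periodically (from cron, or MySQL's event scheduler):

CREATE TABLE daily_play_counts (
  play_date date NOT NULL,
  distinct_emails int NOT NULL,
  PRIMARY KEY (play_date)
);

-- Rebuild the summary; REPLACE keeps reruns idempotent.
REPLACE INTO daily_play_counts (play_date, distinct_emails)
SELECT play_date, COUNT(DISTINCT email)
  FROM log
 WHERE type = 'play'
 GROUP BY play_date;

The reporting query then becomes a trivial range scan over one row per day.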

Try an index on (play_date, type) (the same columns as type_4, just reversed) and see if that helps.

There are 4 possible types, and I assume hundreds of possible dates. If the query uses the (type, play_date) index, it essentially (not 100% accurate, but the general idea) says:

(A) Find all the 'play' records (about 25% of the file)

(B) Now, within that subset, find all of the requested dates

By reversing the index, the approach is:

(A) Find all the dates within range (maybe 1-2% of the file)

(B) Now find all the 'play' types within that smaller portion of the file

Hope this helps.
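
If you want to test the reversed ordering, a one-line sketch (the index name date_type is just illustrative):

ALTER TABLE log ADD INDEX date_type (`play_date`, `type`);

Compare the EXPLAIN output before and after; the optimizer will only pick the new index if the date range really is the more selective filter.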

Extracting email into a separate table should be a good performance boost, since counting distinct varchar fields takes a while. Other than that, the correct index is being used and the query itself is as optimized as it can be (aside from the email, of course).
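
A sketch of that normalization, using a hypothetical emails lookup table; comparing 4-byte integers is much cheaper than comparing varchar(255) values:

CREATE TABLE emails (
  `id` int(11) NOT NULL auto_increment,
  `email` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `email` (`email`)
);

-- log.email would be replaced by `email_id` int(11) NOT NULL,
-- and the query would count the integer column instead:
--   SELECT play_date, COUNT(DISTINCT email_id) ...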

The COUNT(DISTINCT(email)) part is the bit that's killing you. If you only truly need the first 2000 results out of 270,404, perhaps it would help to do the email count only for those results instead of for the whole set:

SELECT date, COUNT(DISTINCT(email)) AS count
  FROM log,
(
    SELECT id, play_date AS date   -- id is needed for the join below
      FROM log
     WHERE play_date BETWEEN '2009-02-23' AND '2020-01-01'
       AND type = 'play'
     ORDER BY play_date DESC
     LIMIT 2000
) AS candidate
 WHERE candidate.id = log.id
 GROUP BY date

Try creating an index only on play_date.

Long-term, I would recommend building a summary table with a primary key of play_date and a count of distinct emails.

Depending on how up-to-date you need it to be, either have it updated daily (by play_date) or live via a trigger on the log table.
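
A sketch of the trigger variant, with illustrative names; keeping one row per (play_date, email) pair makes the distinct count a plain index count:

CREATE TABLE play_date_emails (
  play_date date NOT NULL,
  email varchar(255) NOT NULL,
  PRIMARY KEY (play_date, email)
);

DELIMITER //
CREATE TRIGGER log_play_ai AFTER INSERT ON log
FOR EACH ROW
BEGIN
  IF NEW.type = 'play' THEN
    -- INSERT IGNORE makes repeat plays by the same email a no-op
    INSERT IGNORE INTO play_date_emails (play_date, email)
    VALUES (NEW.play_date, NEW.email);
  END IF;
END//
DELIMITER ;

The report then reduces to:

SELECT play_date, COUNT(*) AS count
  FROM play_date_emails
 GROUP BY play_date
 ORDER BY play_date DESC;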