I am really struggling to get the query time down. It currently has to scan 2.5 million rows and takes over 20 seconds.
Here is the query:
```sql
SELECT play_date AS date, COUNT(DISTINCT email) AS count
FROM log
WHERE play_date BETWEEN '2009-02-23' AND '2020-01-01'
  AND type = 'play'
GROUP BY play_date
ORDER BY play_date DESC
```

The table looks like this:

```sql
`id` int(11) NOT NULL auto_increment,
`instance` varchar(255) NOT NULL,
`email` varchar(255) NOT NULL,
`type` enum('play','claim','friend','email') NOT NULL,
`result` enum('win','win-small','lose','none') NOT NULL,
`timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP,
`play_date` date NOT NULL,
`email_refer` varchar(255) NOT NULL,
`remote_addr` varchar(15) NOT NULL,
PRIMARY KEY  (`id`),
KEY `email` (`email`),
KEY `result` (`result`),
KEY `timestamp` (`timestamp`),
KEY `email_refer` (`email_refer`),
KEY `type_2` (`type`,`timestamp`),
KEY `type_4` (`type`,`play_date`),
KEY `type_result` (`type`,`play_date`,`result`)
```

And here is the EXPLAIN output:

```
id  select_type  table  type  possible_keys              key     key_len  ref    rows    Extra
1   SIMPLE       log    ref   type_2,type_4,type_result  type_4  1        const  270404  Using where
```
The query is using the type_4 index.
Does anybody know how I could speed this query up?
That's already relatively good. The performance sink is that the query has to compare 270,404 varchars for equality for the
COUNT(DISTINCT(email)), which means that 270,404 rows have to be read.
You could make the count an index-only operation (i.e. the actual rows don't need to be read, only the index) by changing the index as follows:
KEY `type_4` (`type`,`play_date`, `email`)
I'd be amazed if that didn't speed things up a great deal.
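As a concrete sketch, the index change could be applied like this (one `ALTER TABLE` statement; rebuilding the index will take a while on 2.5 million rows):

```sql
-- Replace the (type, play_date) index with a covering
-- (type, play_date, email) index, so the DISTINCT count can be
-- answered from the index alone without touching the table rows.
ALTER TABLE log
    DROP INDEX type_4,
    ADD INDEX type_4 (type, play_date, email);
```

After this, EXPLAIN should show "Using index" in the Extra column for the original query.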
Your indexing is probably as good as you can get it. You have a compound index on the two columns in your
where clause, and the
explain you posted indicates that it is being used. Unfortunately, there are 270,404 rows that match the criteria in your
where clause, and they all have to be considered. Also, you're not returning unnecessary rows in your select list.
My advice would be to aggregate the data daily (or hourly, or whatever makes sense) and cache the results. That way you can access slightly stale data instantly. Hopefully this is acceptable for your purposes.
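A minimal sketch of that daily aggregation, assuming a hypothetical `play_summary` cache table:

```sql
-- Hypothetical cache table: one row per day.
CREATE TABLE play_summary (
    play_date date NOT NULL,
    distinct_emails int unsigned NOT NULL,
    PRIMARY KEY (play_date)
);

-- Refresh it from the log on a daily cron job.
REPLACE INTO play_summary (play_date, distinct_emails)
SELECT play_date, COUNT(DISTINCT email)
FROM log
WHERE type = 'play'
GROUP BY play_date;
```

Reporting queries then read `play_summary` (a few hundred rows) instead of scanning the log.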
Try an index on play_date, type (same as type_4, just reversed fields) and see if that helps.
There are 4 possible types, and I assume hundreds of possible dates. If the query uses the type, play_date index, it essentially (not 100% accurate, but the general idea) says:
(A) Find all the Play records (about 25% of the file). (B) Now, within that subset, find all of the requested dates.
By reversing the index, the approach is:
(A) Find all the dates within range (maybe 1-2% of the file). (B) Now find all the PLAY types within that smaller portion of the file.
Hope this helps.
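A sketch of that reversed index (same two columns as type_4, just date first; the index name `date_type` is made up here):

```sql
-- Date-first compound index: the BETWEEN range on play_date narrows
-- the scan first, and the type = 'play' condition filters within it.
ALTER TABLE log ADD INDEX date_type (play_date, type);
```

Whether this wins depends on the data distribution: it helps when the date range is far more selective than the type filter, as described above.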
Moving email out to a separate table should be a good performance boost, since counting distinct varchar fields takes some time. Other than that, the correct index is being used and the query itself is as optimised as it can be (apart from the email, of course).
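One way to do that split (table and column names here are hypothetical): store each address once, and keep only a numeric key in the log:

```sql
-- Each address stored once; log would then hold a 4-byte email_id
-- instead of a varchar(255), so COUNT(DISTINCT email_id) compares
-- small integers rather than long strings.
CREATE TABLE email_lookup (
    email_id int unsigned NOT NULL auto_increment,
    email varchar(255) NOT NULL,
    PRIMARY KEY (email_id),
    UNIQUE KEY (email)
);
```

This does require rewriting the log table and the insert path, so it is a longer-term change than the index tweaks above.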
The COUNT(DISTINCT(email)) part is the bit that's killing you. If you only truly need the first 2000 results out of the 270,404, perhaps it would help to do the email count only for those results instead of for the whole set:
```sql
SELECT date, COUNT(DISTINCT email) AS count
FROM log,
    ( SELECT id, play_date AS date
      FROM log
      WHERE play_date BETWEEN '2009-02-23' AND '2020-01-01'
        AND type = 'play'
      ORDER BY play_date DESC
      LIMIT 2000
    ) AS candidate
WHERE candidate.id = log.id
GROUP BY date
```
Try creating an index on play_date only.
Long-term, I would suggest building a summary table with a primary key of play_date and a count of distinct emails.
Depending on how current you need it to be, either update it daily (by play_date) or keep it live via a trigger on the log table.
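A sketch of the live variant, assuming a hypothetical helper table whose primary key enforces the DISTINCT for free:

```sql
-- One row per (day, email) pair; the primary key deduplicates,
-- so no distinct counting is needed at query time.
CREATE TABLE daily_play_email (
    play_date date NOT NULL,
    email varchar(255) NOT NULL,
    PRIMARY KEY (play_date, email)
);

DELIMITER //
CREATE TRIGGER log_after_insert AFTER INSERT ON log
FOR EACH ROW
BEGIN
    IF NEW.type = 'play' THEN
        INSERT IGNORE INTO daily_play_email (play_date, email)
        VALUES (NEW.play_date, NEW.email);
    END IF;
END//
DELIMITER ;
```

The per-day distinct count then becomes `SELECT play_date, COUNT(*) FROM daily_play_email GROUP BY play_date`, which is a cheap index scan.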