I've two queries to obtain related tags from the mysql database, one works, you don't, my real question is: "why?"
Problem: When performing the very first query, the mysql server will get 100% cpu usage, and needs to be restarted to operate again.
Query 1 (doesn't work):
SELECT tags.*, COUNT(ct.company_id) AS count FROM company2tag ct, tags WHERE ct.company_id IN ( SELECT ct.company_id FROM company2tag ct WHERE ct.tag_id = 18 GROUP BY ct.company_id HAVING COUNT(ct.company_id) = 1 ) AND tags.id != 18 AND tags.id = ct.tag_id GROUP BY ct.tag_id ORDER BY count DESC LIMIT 5;
Query 2 (works):
SELECT tags.*, COUNT(ct.company_id) AS count FROM company2tag ct, tags WHERE ct.company_id IN (5864, 5870, 6140, 6221, 6268) AND tags.id != 18 AND tags.id = ct.tag_id GROUP BY ct.tag_id ORDER BY count DESC LIMIT 5;
To my comprehending the two queries above do completely exactly the same, the only real difference would be that the first query retrieves its "company_id's" using a subselect.
Just how can this happen?
To begin with, you might be going through problems in the first query since you have two tables aliased to ct... one out of the outer query, one out of the sub-query.
Next, you are able to rewrite the IN like a JOIN:
SELECT tags.*, COUNT(ct.company_id) AS count FROM company2tag ct INNER JOIN tags ON tags.id = ct.tag_id INNER JOIN ( SELECT company_id FROM company2tag WHERE tag_id = 18 GROUP BY company_id HAVING COUNT(company_id) = 1 ) ctf ON ct.company_id = ctf.company_id WHERE tags.id != 18 GROUP BY ct.tag_id ORDER BY count DESC LIMIT 5;
Observe that I've not really examined this.
MySQL is not so good in optimizing
The problem inside your first query can't be easily rewritten as
EXISTS, this is exactly why
MySQL inspections the outcomes for every row.
If you wish to choose
company_id's which are pointed out more often than once in
tag 18, it's easier to rewrite this question as a result:
SELECT tags.*, COUNT(company_id) AS count FROM company2tag ct JOIN tags ON tags.id = ct.tag_id WHERE ct.tag_id <> 18 AND NOT EXISTS ( SELECT NULL FROM company2tag cti WHERE cti.tag_id = 18 AND cti.company_id = ct.company_id LIMIT 1, 1 ) GROUP BY ct.tag_id ORDER BY count DESC
The primary idea is the fact that you don't have to
COUNT(*): it's enough simply to make sure that a minimum of two values exist.
Check this out article during my blog for a similar problem:
Getting the next index:
CREATE INDEX ix_company2tag_tag_company_id ON company2tag (tag_id, company_id)
will greatly improve this question.