I have a query that selects from a table of nodes, then joins a table of titles onto it. This is done by first joining an intermediate table of node IDs and title IDs (node_titles) that provides a many-to-many relationship between the first two tables. Both joins are inner joins, so that only nodes with a correctly set up, existing title are selected. I believe all of this to be neat and efficient; the problem is below:
There is also a fourth table, node_parents, that provides a simple hierarchy for nodes. Each row has two fields: a node ID and the ID of the node that acts as that node's parent (node_id and parent_id). Some nodes have no children in this database (i.e. the node itself is not marked as a parent in any row of the node_parents table); these are the nodes I am trying to select.
The extra criterion for these childless nodes is that they have a specific title set up, hence the subquery first selecting from node_titles and then inner joining node_parents. The subquery also has a GROUP BY because some nodes are parents of multiple nodes, so their node_id would otherwise appear multiple times in the results. I should also mention that, because of this, the primary key of node_parents is a combination of node_id and parent_id.
SELECT `nodes`.`node_id`, `titles`.`title`
FROM `nodes`
INNER JOIN `node_titles` ON `nodes`.`node_id` = `node_titles`.`node_id`
INNER JOIN `titles` ON `node_titles`.`title_id` = `titles`.`title_id`
WHERE `nodes`.`node_id` NOT IN (
    SELECT `node_titles`.`node_id`
    FROM `node_titles`
    INNER JOIN `node_parents` ON `node_titles`.`node_id` = `node_parents`.`parent_id`
    WHERE `node_titles`.`title_id` = 1
    GROUP BY `node_titles`.`node_id`
)
AND `titles`.`title_id` = 1
Table sizes (rows):
nodes = ~32,000
node_titles = ~49,000
titles = 3
node_parents = ~55,000
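For reference, the four tables described above could be declared roughly as follows. This is only a sketch reconstructed from the description: the column types, the title column length, and the primary keys on nodes, titles and node_titles are assumptions; only the composite key on node_parents is stated in the question.

```sql
-- Hypothetical DDL; column types and most keys are assumptions.
CREATE TABLE nodes (
    node_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (node_id)
);

CREATE TABLE titles (
    title_id INT UNSIGNED NOT NULL,
    title    VARCHAR(255) NOT NULL,
    PRIMARY KEY (title_id)
);

-- Junction table for the many-to-many relationship between nodes and titles.
CREATE TABLE node_titles (
    node_id  INT UNSIGNED NOT NULL,
    title_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (node_id, title_id)
);

-- One row per (child, parent) pair; this composite key is the one
-- mentioned in the question.
CREATE TABLE node_parents (
    node_id   INT UNSIGNED NOT NULL,
    parent_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (node_id, parent_id)
);
```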
The query takes around 16 minutes to complete. Can anybody offer any pointers? I've tried profiling the query; there are no long hangs, but it repeats this cycle for what looks like every selected row:
| executing            | 0.000005 |
| Copying to tmp table | 0.515815 |
| Sorting result       | 0.000053 |
| Sending data         | 0.000028 |
I've also tried dropping the subquery and using a LEFT JOIN with a WHERE foo IS NULL, but this still takes a very long time to run; the profiler reports ~180 seconds for 'Copying to tmp table'.
Ultimately I suspect this may be an indexing problem, but either way I'd appreciate answers that don't question the design of the query unless they are pursuing a possible cause of the slowdown (e.g. yes, the titles and nodes do need to be in a many-to-many relationship). Thanks all, and more info on request!
Remove the GROUP BY from the subquery:
SELECT n.node_id, t.title
FROM nodes n
INNER JOIN node_titles nt ON nt.node_id = n.node_id
INNER JOIN titles t ON t.title_id = nt.title_id
WHERE n.node_id NOT IN (
    -- nodes that have title 1 and are the parent of at least one node
    SELECT nti.node_id
    FROM node_titles nti
    INNER JOIN node_parents npi ON npi.parent_id = nti.node_id
    WHERE nti.title_id = 1
)
AND t.title_id = 1
Create the following indexes:
node_titles (node_id, title_id)
titles (title_id)
node_parents (parent_id)
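As a rough sketch, the indexes listed above could be created with statements like these. The index names are made up for illustration, and any index that duplicates an existing primary key (for example on titles.title_id) can be skipped.

```sql
-- Index names are hypothetical; adjust them to your own naming scheme.
CREATE INDEX idx_node_titles_node_title ON node_titles (node_id, title_id);
CREATE INDEX idx_titles_title_id        ON titles (title_id);
CREATE INDEX idx_node_parents_parent    ON node_parents (parent_id);
```

The index on node_parents (parent_id) is probably the one that matters most here: the composite primary key starts with node_id, so it cannot be used for lookups by parent_id alone.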
You can also simplify the query, since the outer join already restricts the nodes to title_id = 1:
SELECT n.node_id, t.title
FROM nodes n
INNER JOIN node_titles nt ON nt.node_id = n.node_id AND nt.title_id = 1
INNER JOIN titles t ON t.title_id = nt.title_id
WHERE n.node_id NOT IN (
    SELECT parent_id
    FROM node_parents
)
In my experience, MySQL tends to struggle with subqueries. Try this:
SELECT n.node_id, t.title
FROM nodes n
INNER JOIN node_titles nt ON nt.node_id = n.node_id AND nt.title_id = 1
INNER JOIN titles t ON t.title_id = nt.title_id
LEFT OUTER JOIN (
    SELECT nti.node_id
    FROM node_titles nti
    INNER JOIN node_parents npi ON npi.parent_id = nti.node_id
    WHERE nti.title_id = 1
) ThisTable ON n.node_id = ThisTable.node_id
WHERE ThisTable.node_id IS NULL
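To see whether a rewrite actually uses an index, it may help to prefix it with EXPLAIN. The sketch below is a variant of the anti-join that joins node_parents directly rather than through a derived table; it assumes an index on node_parents (parent_id) exists and is not taken from the original post.

```sql
-- Variant for illustration only: the LEFT JOIN goes straight to node_parents.
EXPLAIN
SELECT n.node_id, t.title
FROM nodes n
INNER JOIN node_titles nt ON nt.node_id = n.node_id AND nt.title_id = 1
INNER JOIN titles t ON t.title_id = nt.title_id
LEFT JOIN node_parents np ON np.parent_id = n.node_id
WHERE np.parent_id IS NULL;  -- keep only nodes with no child rows
```

In the output, a NULL in the key column for node_parents means no index is being used for that table, and the rows column gives the estimated number of rows examined per lookup.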