I refactored a slow section of a credit card application we inherited from another company to use an inner join instead of a subquery like
where id in (select id from ... )
The refactored query runs about 100x faster (~50 seconds down to ~0.3 seconds). I was expecting a noticeable difference, but can anybody explain why it was so drastic? The columns used in the where clause were all indexed. Does SQL execute the query in the where clause once per row or something like that?
Update - Explain results:
The main difference is in the second part of the "where id in ()" query -
2 DEPENDENT SUBQUERY submission_tags ref st_tag_id st_tag_id 4 const 2966 Using where
versus 1 indexed row using the join:
SIMPLE s eq_ref PRIMARY PRIMARY 4 newsladder_production.st.submission_id 1 Using index
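For readers who want to see the two query shapes side by side, here is a minimal sketch. It is not the original application's query: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the submissions table, its title column, and the sample data are made up for illustration (only submission_tags, submission_id, st_tag_id and the s/st aliases appear in the EXPLAIN output above). It shows only that the two forms return the same rows; the timings in the question depend on MySQL's optimizer and will not reproduce here.

```python
import sqlite3


def build_db():
    """Create a small in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE submissions (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE submission_tags (submission_id INTEGER, tag_id INTEGER);
        CREATE INDEX st_tag_id ON submission_tags (tag_id);
    """)
    conn.executemany(
        "INSERT INTO submissions VALUES (?, ?)",
        [(i, f"post {i}") for i in range(1, 101)],
    )
    # Each submission gets exactly one tag: its id modulo 5.
    conn.executemany(
        "INSERT INTO submission_tags VALUES (?, ?)",
        [(i, i % 5) for i in range(1, 101)],
    )
    return conn


# The "where id in (select ...)" shape from the question.
IN_QUERY = """
    SELECT s.id
    FROM submissions s
    WHERE s.id IN (
        SELECT st.submission_id FROM submission_tags st WHERE st.tag_id = ?
    )
    ORDER BY s.id
"""

# The refactored inner-join shape.
JOIN_QUERY = """
    SELECT s.id
    FROM submission_tags st
    INNER JOIN submissions s ON s.id = st.submission_id
    WHERE st.tag_id = ?
    ORDER BY s.id
"""


def ids(conn, sql, tag_id):
    """Run a query with one bound tag id and return the matching ids."""
    return [row[0] for row in conn.execute(sql, (tag_id,))]
```

Calling ids(build_db(), IN_QUERY, 3) and ids(build_db(), JOIN_QUERY, 3) should give the same list. One caveat: if a submission could carry the same tag twice, the join form would return that id twice while the IN form would not, so the rewrite is only a drop-in replacement when the (submission_id, tag_id) pairs are unique or a DISTINCT is added.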
A "correlated subquery" (i.e., one in which the where condition depends on values obtained from the rows of the containing query) will execute once for each row. A non-correlated subquery (one in which the where condition is independent of the containing query) will execute once at the beginning. The SQL engine makes this distinction automatically.
But, yeah, explain-plan will give you the dirty details.
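To make the correlated versus non-correlated distinction concrete, here is a rough Python sketch that counts how many times the "inner query" runs in each case. It is a toy model, not how any real engine is implemented; the function names and the (submission_id, tag_id) tuple layout are assumptions made for the example.

```python
def run_correlated(outer_ids, tag_rows, tag_id):
    """Inner lookup depends on the current outer row, so it runs per row."""
    executions = 0
    kept = []
    for outer_id in outer_ids:
        executions += 1  # the inner query is evaluated again for this row
        inner = [
            sid for sid, tid in tag_rows
            if tid == tag_id and sid == outer_id
        ]
        if inner:
            kept.append(outer_id)
    return kept, executions


def run_uncorrelated(outer_ids, tag_rows, tag_id):
    """Inner lookup is independent of the outer row, so it runs once."""
    executions = 1  # evaluated a single time, before the outer loop
    inner = {sid for sid, tid in tag_rows if tid == tag_id}
    kept = [outer_id for outer_id in outer_ids if outer_id in inner]
    return kept, executions
```

Both functions keep the same ids; the difference is that the first reports one inner execution per outer row, while the second always reports one.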
You're running the subquery once for each row whereas the join happens on indexes.
Run the explain-plan on each version; it will tell you why.
Before the queries are run against the dataset, they are put through a query optimizer. The optimizer tries to organize the query so that it can remove as many tuples (rows) from the result set as quickly as it can. Often when you use subqueries (especially bad ones), the tuples can't be pruned out of the result set until the outer query starts to run.
Without seeing the query it's hard to say what was so bad about the original, but my guess would be that it was something the optimizer just couldn't make much better. Running 'explain' will show you the optimizer's method for retrieving the data.
Here's a good example of how subqueries are evaluated in MySQL 6..
The new optimizer will convert this kind of subquery into a join.
The where subquery has to run 1 query for each returned row. The inner join just has to run 1 query.
It isn't so much the subquery as the IN clause, although joins are at the foundation of at least Oracle's SQL engine and run extremely quickly.
The subquery was probably performing a "full table scan". In other words, it was not using the index and was returning far too many rows, which the Where of the main query then had to filter out.
Just a guess without details, of course, but that's the common situation.
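As a rough illustration of why a full scan hurts, the sketch below counts how many rows have to be examined to find one tag's submissions with and without an index. It is a toy in-memory model with made-up function names, not MySQL's actual storage engine; the "index" here is just a Python dict keyed by tag id.

```python
from collections import defaultdict


def full_scan(tag_rows, tag_id):
    """Check every (submission_id, tag_id) row, like a full table scan."""
    examined = 0
    hits = []
    for sid, tid in tag_rows:
        examined += 1
        if tid == tag_id:
            hits.append(sid)
    return hits, examined


def build_index(tag_rows):
    """Group submission ids by tag id once, standing in for an index."""
    index = defaultdict(list)
    for sid, tid in tag_rows:
        index[tid].append(sid)
    return index


def index_lookup(index, tag_id):
    """Touch only the rows stored under the requested tag id."""
    hits = list(index.get(tag_id, []))
    return hits, len(hits)
```

With 1,000 rows spread evenly over 10 tags, full_scan examines all 1,000 rows while index_lookup touches only the 100 that match. If a scan like that is repeated for every outer row, the cost multiplies accordingly.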
Usually it's the result of the optimizer not being able to figure out that the subquery can be executed as a join, in which case it executes the subquery for each record in the table rather than joining the table in the subquery against the table you are querying. Some of the more "enterprisey" databases are better at this, but they still miss it sometimes.
The optimizer didn't do a very good job. Usually the two forms can be transformed into each other without changing the result, and the optimizer can do this itself.
With a subquery, you have to re-execute the 2nd SELECT for each result, and each execution typically returns 1 row.
With a join, the 2nd SELECT returns a lot more rows, but you only have to execute it once. The advantage is that you can now join on the results, and joining relations is what a database is supposed to be good at. For example, maybe the optimizer can now spot how to take better advantage of an index.
This question is somewhat general, so here's a general answer:
Basically, queries take longer when MySQL has tons of rows to sort through.
Run an EXPLAIN on each of the queries (the JOIN'ed one, then the subqueried one), and post the results here.
I think seeing the difference in MySQL's interpretation of those queries would be a learning experience for everyone.
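If you want to practice comparing plans without a MySQL server at hand, something like the following works with Python's built-in sqlite3 module. Note the assumptions: SQLite's command is EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, its optimizer and output format are different from MySQL's, and the submissions table and the literal tag id are hypothetical. The plan text also varies between SQLite versions, so no expected output is shown.

```python
import sqlite3


def make_conn():
    """Create empty hypothetical tables; a plan can be shown without data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE submissions (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE submission_tags (submission_id INTEGER, tag_id INTEGER);
        CREATE INDEX st_tag_id ON submission_tags (tag_id);
    """)
    return conn


def explain(conn, sql):
    """Return the human-readable detail strings of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the description text.
    return [row[-1] for row in rows]


SUBQUERY_SQL = (
    "SELECT s.id FROM submissions s WHERE s.id IN "
    "(SELECT st.submission_id FROM submission_tags st WHERE st.tag_id = 1)"
)

JOIN_SQL = (
    "SELECT s.id FROM submission_tags st "
    "INNER JOIN submissions s ON s.id = st.submission_id "
    "WHERE st.tag_id = 1"
)
```

Printing explain(conn, SUBQUERY_SQL) next to explain(conn, JOIN_SQL) gives one line per plan step, which is the same kind of side-by-side comparison the EXPLAIN rows in the question provide for MySQL.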
Sorry to be a MySQL basher and not be able to offer any constructive advice, but the query optimizer sounds like total s*** if it can't handle simple joins/subqueries. Honestly, Oracle had similar problems in 8i, but that was fixed almost ten years ago.