I have a requirement to produce a list of possible duplicates before a user saves an entity to the database, and to warn them about those possible duplicates.
There are 7 criteria on which we should check for duplicates, and if at least 3 match we should flag this to the user. The criteria all match on ID, so no fuzzy string matching is needed, but my problem comes from the fact that there are many possible ways (99 if I have done my sums correctly) for at least 3 items to match out of the list of 7.
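As a quick sanity check on that count, here is a small sketch that sums the number of ways to choose 3, 4, 5, 6 or 7 matching criteria out of 7. The function name `ways_at_least` is just an illustrative helper, not part of any library.

```python
from math import comb


def ways_at_least(k: int, n: int) -> int:
    """Number of distinct subsets of n criteria that contain at least k of them."""
    return sum(comb(n, r) for r in range(k, n + 1))


# C(7,3) + C(7,4) + C(7,5) + C(7,6) + C(7,7) = 35 + 35 + 21 + 7 + 1
print(ways_at_least(3, 7))  # prints 99
```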
I don't want to run 99 separate DB queries to find my results, nor do I want to bring the whole lot back from the DB and filter on the client side. We are probably only talking about a few hundred thousand records at present, but this will grow into the millions as the system matures.
Does anyone have any thoughts on a nice, efficient way of doing this? I was thinking of a simple OR query to fetch the records where at least one field matches, and then doing some processing on the client to filter further, but a few of the fields have very low cardinality and won't really reduce the numbers by much.
CASE summing works, but it is quite inefficient, since it does not use indexes. You need to use UNION for the indexes to be usable.
If a user enters a name, phone, email and address into the database, and you want to check all records that match at least 3 of these fields, you issue:
    SELECT  i.*
    FROM    (
            SELECT  id, COUNT(*)
            FROM    (
                    SELECT  id
                    FROM    t_info t
                    WHERE   name = 'Eve Chianese'
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   phone = '+15558000042'
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   email = 'email@example.com'
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   address = '42 North Lane'
                    ) q
            GROUP BY
                    id
            HAVING  COUNT(*) >= 3
            ) dq
    JOIN    t_info i
    ON      i.id = dq.id
This will use the indexes on these fields, and the query will be fast.
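To make the query above easy to try out, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard `sqlite3` module. The sample rows, the index names and the `find_candidates` helper are assumptions made for illustration. Only the `t_info` table and its four columns come from the query above, and a different DBMS may plan the query differently.

```python
import sqlite3


def find_candidates(conn, name, phone, email, address, min_matches=3):
    """Return (id, hits) for rows sharing at least min_matches of the four values."""
    sql = """
        SELECT i.id, dq.hits
        FROM (
            SELECT id, COUNT(*) AS hits
            FROM (
                SELECT id FROM t_info WHERE name = ?
                UNION ALL
                SELECT id FROM t_info WHERE phone = ?
                UNION ALL
                SELECT id FROM t_info WHERE email = ?
                UNION ALL
                SELECT id FROM t_info WHERE address = ?
            ) q
            GROUP BY id
            HAVING COUNT(*) >= ?
        ) dq
        JOIN t_info i ON i.id = dq.id
        ORDER BY i.id
    """
    return conn.execute(sql, (name, phone, email, address, min_matches)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_info (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT, address TEXT)"
)
# One single-column index per searched field, so each UNION ALL branch can use its own index.
for col in ("name", "phone", "email", "address"):
    conn.execute(f"CREATE INDEX ix_t_info_{col} ON t_info ({col})")

# Made-up sample data: row 1 shares 4 values with the input, row 2 shares 3,
# row 3 shares 1 and row 4 shares 2.
conn.executemany(
    "INSERT INTO t_info VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Eve Chianese", "+15558000042", "email@example.com", "42 North Lane"),
        (2, "Eve Chianese", "+15558000042", "other@example.com", "42 North Lane"),
        (3, "Eve Chianese", "+15550000000", "other@example.com", "1 South Road"),
        (4, "Someone Else", "+15558000042", "email@example.com", "9 West Street"),
    ],
)

result = find_candidates(
    conn, "Eve Chianese", "+15558000042", "email@example.com", "42 North Lane"
)
print(result)  # [(1, 4), (2, 3)]
```

Passing the values as bound parameters rather than string literals keeps the same query shape while avoiding SQL injection when the values come from user input.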
See this article in my blog for details:
- Matching 3 of 4: how to match a record which matches at least 3 of 4 conditions
Also see the question the article is based on.
If you want a list of DISTINCT values within the existing data, you just wrap this query in a subquery:
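    SELECT  i1.*
    FROM    t_info i1
    WHERE   EXISTS
            (
            SELECT  1
            FROM    (
                    -- t.id <> i1.id excludes the row itself, which would
                    -- otherwise match on every field
                    SELECT  id
                    FROM    t_info t
                    WHERE   name = i1.name AND t.id <> i1.id
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   phone = i1.phone AND t.id <> i1.id
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   email = i1.email AND t.id <> i1.id
                    UNION ALL
                    SELECT  id
                    FROM    t_info t
                    WHERE   address = i1.address AND t.id <> i1.id
                    ) q
            GROUP BY
                    id
            HAVING  COUNT(*) >= 3
            )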
Note that this DISTINCT isn't transitive: if A matches B and B matches C, it doesn't mean that A matches C.
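The lack of transitivity is easy to see on three made-up rows. The sketch below, again using Python's standard `sqlite3` module, runs the "at least 3 of 4" UNION ALL check once per row instead of as a single correlated query. The sample data and the helper name `matches_for` are assumptions for illustration only.

```python
import sqlite3

FIELDS = ("name", "phone", "email", "address")


def matches_for(conn, row_id, min_matches=3):
    """Ids of the other rows sharing at least min_matches field values with row_id."""
    row = conn.execute(
        "SELECT name, phone, email, address FROM t_info WHERE id = ?", (row_id,)
    ).fetchone()
    # One indexed lookup per field, excluding the row itself.
    branches = " UNION ALL ".join(
        f"SELECT id FROM t_info WHERE {field} = ? AND id <> ?" for field in FIELDS
    )
    sql = f"SELECT id FROM ({branches}) q GROUP BY id HAVING COUNT(*) >= ? ORDER BY id"
    params = []
    for value in row:
        params += [value, row_id]
    params.append(min_matches)
    return [r[0] for r in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_info (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO t_info VALUES (?, ?, ?, ?, ?)",
    [
        # A (id 1) and B (id 2) share name, phone and email.
        (1, "Ann Smith", "+15550000001", "ann@example.com", "1 First Street"),
        (2, "Ann Smith", "+15550000001", "ann@example.com", "2 Second Street"),
        # C (id 3) shares phone, email and address with B, but only phone and email with A.
        (3, "Ann Jones", "+15550000001", "ann@example.com", "2 Second Street"),
    ],
)

print(matches_for(conn, 1))  # [2]     A matches B only
print(matches_for(conn, 2))  # [1, 3]  B matches both A and C
print(matches_for(conn, 3))  # [2]     C matches B only, not A
```

In practice this means matched rows cannot simply be merged into groups: B belongs with both A and C, while A and C do not belong with each other.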
You may want something like the following:
    SELECT id
    FROM (
        SELECT id,
               CASE fld1 WHEN input1 THEN 1 ELSE 0 END AS rule1,
               CASE fld2 WHEN input2 THEN 1 ELSE 0 END AS rule2,
               ...,
               CASE fld7 WHEN input7 THEN 1 ELSE 0 END AS rule7
        FROM table
    ) t
    WHERE rule1 + rule2 + rule3 + ... + rule7 >= 3
This is not tested, but it shows one way to tackle this.
Which DBMS are you using? Some support constraints like this through server-side code.
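For anyone who wants to try the idea, here is a small runnable sketch of the CASE-summing approach against an in-memory SQLite database via Python's standard `sqlite3` module. The table name `entity`, the sample rows and the helper `case_sum_matches` are hypothetical. Only the `fld1` to `fld7` naming follows the pseudo-query above. Because every row has to be scored, expect a full table scan with this approach.

```python
import sqlite3

N_FIELDS = 7


def case_sum_matches(conn, inputs, min_matches=3):
    """Return (id, score) for rows where at least min_matches of the 7 fields equal the inputs."""
    # Builds: CASE fld1 WHEN ? THEN 1 ELSE 0 END + ... + CASE fld7 WHEN ? THEN 1 ELSE 0 END
    score = " + ".join(
        f"CASE fld{i} WHEN ? THEN 1 ELSE 0 END" for i in range(1, N_FIELDS + 1)
    )
    sql = (
        f"SELECT id, score FROM (SELECT id, {score} AS score FROM entity) t "
        "WHERE score >= ? ORDER BY id"
    )
    return conn.execute(sql, (*inputs, min_matches)).fetchall()


conn = sqlite3.connect(":memory:")
columns = ", ".join(f"fld{i} INTEGER" for i in range(1, N_FIELDS + 1))
conn.execute(f"CREATE TABLE entity (id INTEGER PRIMARY KEY, {columns})")
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 2, 3, 4, 5, 6, 7),  # first three fields match the input below
        (2, 1, 2, 0, 0, 0, 0, 0),  # only two fields match
        (3, 1, 2, 3, 9, 9, 9, 9),  # all seven fields match
        (4, 0, 0, 0, 0, 0, 0, 0),  # nothing matches
    ],
)

result = case_sum_matches(conn, (1, 2, 3, 9, 9, 9, 9))
print(result)  # [(1, 3), (3, 7)]
```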