Details
-
Type: Bug
-
Status: Done/Fixed
-
Priority: Trivial
-
Resolution: Fixed/Completed
-
Affects Version/s: 4.7.6
-
Fix Version/s: 4.7.8
-
Component/s: None
-
Labels:
-
Documentation Required?:None
-
Sprint:MergeSprint1
-
Funding Source:Contributed Code
Description
The queries that retrieve the records to dedupe are extremely slow due to poor construction.
The use of the OR here makes it an unindexed query
{{SELECT t1.contact_id id1, t2.contact_id id2, 50 weight
-> FROM civicrm_email t1
-> JOIN civicrm_email t2 USING (email)
-> WHERE t1.contact_id < t2.contact_id
-> AND t1.email IS NOT NULL
-> AND t1.email <> ''
-> AND (t1.contact_id IN (4279,68563,88888)
-> OR t2.contact_id IN (4279,68563,88888))
-> GROUP BY id1, id2 }}
Although the query does not return any results it is extremely slow on a large dataset. I only left it for 15 minutes or so before killing it but by comparison the equivalent query with a unit took 0.01 seconds
SELECT t1.contact_id id1, t2.contact_id id2, 50 weight
-> FROM civicrm_email t1
-> JOIN civicrm_email t2 USING (email)
-> WHERE t1.contact_id < t2.contact_id
-> AND t1.email = '%'
-> AND (t1.contact_id IN (4279,68563,88888) )
-> #OR t2.contact_id IN (4279,68563,88888))
-> GROUP BY id1, id2
->
-> UNION
->
-> SELECT t1.contact_id id1, t2.contact_id id2, 50 weight
-> FROM civicrm_email t1
-> JOIN civicrm_email t2 USING (email)
-> WHERE t1.contact_id < t2.contact_id
-> AND t1.email = '%'
-> AND (t2.contact_id IN (4279,68563,88888) )
-> #OR t2.contact_id IN (4279,68563,88888))
-> GROUP BY id1, id2
-> ;