CRM-8822 Adding new individual while dedupe check is running causes Database Lock Error

    Details

    • Type: Bug
    • Status: Done/Fixed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 3.4.5
    • Fix Version/s: 4.4.0
    • Component/s: Core CiviCRM
    • Labels:
      None

      Description

      To reproduce:

      1. Get a database with enough records for a dedupe check to take a few minutes.
      2. Start a dedupe check on a group using the Strict Individual rule.
      3. While the dedupe check is running, open a new tab. Add a new individual, with first name, last name, and email address.
      4. Instead of a new contact, I get an error:

      Database Error Code: Lock wait timeout exceeded; try restarting transaction, 1205
      Additional Details:
      Array
      (
      [callback] => Array
      (
      [0] => CRM_Core_Error
      [1] => handle
      )

      [code] => a515ac9c2796ca0e23adbe92c68fc9fc
      [message] => DB Error: a515ac9c2796ca0e23adbe92c68fc9fc
      [mode] => 16
      [debug_info] => INSERT INTO civicrm_email (contact_id , location_type_id , email , is_primary , is_billing , on_hold , is_bulkmail , signature_text , signature_html ) VALUES ( 115200 , 1 , 'me@example.com' , 1 , 0 , 0 , 0 , NULL , NULL ) [nativecode=1205 ** Lock wait timeout exceeded; try restarting transaction]
      [type] => DB_Error
      [user_info] => INSERT INTO civicrm_email (contact_id , location_type_id , email , is_primary , is_billing , on_hold , is_bulkmail , signature_text , signature_html ) VALUES ( 115200 , 1 , 'me@example.com' , 1 , 0 , 0 , 0 , NULL , NULL ) [nativecode=1205 ** Lock wait timeout exceeded; try restarting transaction]
      [to_string] => [db_error: message="DB Error: a515ac9c2796ca0e23adbe92c68fc9fc" code=0 mode=callback callback=CRM_Core_Error::handle prefix="" info="INSERT INTO civicrm_email (contact_id , location_type_id , email , is_primary , is_billing , on_hold , is_bulkmail , signature_text , signature_html ) VALUES ( 115200 , 1 , 'me@example.com' , 1 , 0 , 0 , 0 , NULL , NULL ) [nativecode=1205 ** Lock wait timeout exceeded; try restarting transaction]"]
      )

        Attachments

          Activity

          [CRM-8822] Adding new individual while dedupe check is running causes Database Lock Error
          Peter Olson added a comment -

          Here's the output of MySQL slow query logging for the dedupe:

          1. Time: 110907 11:44:47
          2. User@Host: drupal[drupal] @ localhost []
          3. Query_time: 2129.885597 Lock_time: 0.000952 Rows_sent: 0 Rows_examined: 303661784
            INSERT INTO dedupe (id1, id2, weight) SELECT t1.contact_id id1, t2.contact_id id2, 15 weight FROM civicrm_email t1 JOIN civicrm_email t2 USING (email) WHERE t1.contact_id < t2.contact_id AND t1.email IS NOT NULL AND (t1.contact_id IN (2,4,9,73,77,118,122,127,140,164,176,205,209,227,230,248,276,277,282,288,294,307,329,330,397,411,448,475,485,511,533,553,576,592,617,639,640,641,647,713,752,839,857,863,874,939,1323,1359,1674,1680,1756,1818,2033,2195,2243,2246,2311,2383,2482,2614,2704,2739,2809,2902,2936,2946,2951,2959,3090,3201,3252 ... many values omitted .... 115160, 115161) OR t2.contact_id IN (2,4,9,73,77,118,122,127 ... many values omitted ... 115160,115161)) GROUP BY id1, id2 ON DUPLICATE KEY UPDATE weight = weight + VALUES(weight);
          Peter Olson added a comment -

          The two subordinate SELECT clauses produce 849 contact_id values out of 110708 rows in the civicrm_email table.
          It looks to me that 187 million rows will be produced, but the query says it looked at 303 million. Maybe I am
          misinterpreting what rows examined means? Perhaps this is from 110808*110708 == 12 billion rows
          in the main JOIN prior to the selection logic?

          In any case this statement appears to be hopelessly inefficient.

          Donald A. Lobo added a comment -

          Quick notes after an IRC meeting with Ward (cure on IRC), kurund and lobo:

          1. The restricting to group clause in dedupe, basically inlines ALL the contact IDs in the query. This does not scale. We should do an inner join

          2. joining the email table to itself will explode for big email tables. We should implement the dedupe hook and see if we can do an optimized query. One idea was
          to use a "group by email" and then restrict the contact ids to only the ones that had a count( ) > 1. This seems like it will be a lot more efficient

          3. The foll queries are called:

          14468 Query SELECT id, is_deleted
          FROM civicrm_contact

          WHERE ( civicrm_contact.id = 414 )
          14468 Init DB civicrm
          14468 Query SELECT id
          FROM civicrm_relationship
          WHERE (( contact_id_a = 4 AND contact_id_b = 414 AND is_permission_a_b = 1 ) OR
          ( contact_id_a = 414 AND contact_id_b = 4 AND is_permission_b_a = 1 )) AND
          (contact_id_a NOT IN (SELECT id FROM civicrm_contact WHERE is_deleted = 1)) AND
          (contact_id_b NOT IN (SELECT id FROM civicrm_contact WHERE is_deleted = 1))
          AND ( civicrm_relationship.is_active = 1 )
          14468 Init DB civicrm
          14468 Query SELECT count(DISTINCT contact_a.id)
          FROM civicrm_contact contact_a
          WHERE contact_a.id = 414 AND (contact_a.is_deleted = 0)

          for every contact in the merge set. This if we return 12K merges, we get 36K queries which is actually the real bottleneck. We should
          reduce / merge / do a late load of some of these queries, i.e. investigate what and why

          Coleman Watts added a comment -

          If you alter the query builder, please remember to document how this change will affect hook_civicrm_dupeQuery()

          Donald A. Lobo added a comment -

          For item 3 above:

          So the 3 queries per merge row basically check permission to see if the "user" has permission to merge those records. This same check is
          repeated if the merge link is clicked. One easy alternative (for now) is to skip this check during the listing and block it during the merge process.

          I'm not sure why we'll even consider contacts that that user does not have permission on. I think the acl's basically restrict the set of contacts that
          to only ones the user has rights to

          For item 2 above:

          A good workaround for now is to use the dedupe hook which NYSS has been using very effectively. This will have a major impact on dedupe speed.
          More details on lcdweb's blog post: http://civicrm.org/blogs/lcdweb/matching-contacts-dedupe-hook

          For item 1 (and also special casing the shipped dedupe hooks), we'll need to do this via an MIH. Optimization is always quite time consuming. I'd spec
          the MIH to be approx 50 hours or so. John S, do u think FSF could help sponsor this work or put some developer resources on it

          Donald A. Lobo added a comment -

          I've committed one optimization to avoid the multiple queries per inner loop. Wth that fix and a hook implementation a dedupe can be made to run fairly quickly.

          For more optimizations, we'll need to start an MIH and get funding for the required hours.

          Donald A. Lobo added a comment -


          Closing issues in batch on 4/16/13 since these issues have not been worked on in a long long time

            People

            • Assignee:
              Donald A. Lobo
              Reporter:
              John Sullivan

              Dates

              • Created:
                Updated:
                Resolved: