Uploaded image for project: 'CiviCRM'
  1. CiviCRM
  2. CRM-11231

Dedupe based on initial 5 characters of street address runs incredibly slowly on large datasets

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 4.2.0
    • Fix Version/s: Unscheduled
    • Component/s: None
    • Labels:
      None
    • Versioning Impact:
      Patch (backwards-compatible bug fixes)

      Description

      Running a dedupe on first 5 characters of street address gets bogged down for hours while copying data into tmp table.

      INSERT INTO dedupe (id1, id2, weight) SELECT t1.contact_id id1, t2.contact_id id2, 10 weight FROM civicrm_address t1 JOIN civicrm_address t2 ON (SUBSTR(t1.street_address, 1, 5) = SUBSTR(t2.street_address, 1, 5) AND t1.location_type_id = t2.location_type_id) WHERE t1.contact_id < t2.contact_id AND t1.street_address IS NOT NULL AND (t1.contact_id IN (364571,365335,379322,425418,1274856,1274857,1274859,1274862,1274863,1274865,1274867,1275281,1275283,1275287,1275288,1275526,1275528,1300217) OR t2.contact_id IN (364571,365335,379322,425418,1274856,1274857,1274859,1274862,1274863,1274865,1274867,1275281,1275283,1275287,1275288,1275526,1275528,1300217)) GROUP BY id1, id2 ON DUPLICATE KEY UPDATE weight = weight + VALUES(weight)

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              mfb Mark Burdett
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated: