Uploaded image for project: 'CiviCRM'
  1. CiviCRM
  2. CRM-17454

New speed bump (since last LTS) on deduping


    • Type: Bug
    • Status: Done/Fixed
    • Priority: Minor
    • Resolution: Fixed/Completed
    • Affects Version/s: 4.6.9
    • Fix Version/s: 4.7
    • Component/s: None
    • Documentation Required?:
    • Funding Source:
      Contributed Code


      On upgrading to 4.6 I discovered a new query that runs slowly in some circumstances.

      The query is

      FROM information_schema.key_column_usage
      referenced_table_schema = database() AND
      referenced_table_name = 'civicrm_contact' AND
      referenced_column_name = 'id';

      and it was introduced in https://issues.civicrm.org/jira/browse/CRM-14500 in order to make the selection of which tables to merge dynamic.

      The fastest this query ran for me was locally at 150ms (locally, 202 contacts, file_per_table = OFF) and the slowest was on a dev server with many databases & file_per_table= ON at 1 min 17 sec. Those extremes were running the query directly. I experience of up to 17 seconds within the context of CiviCRM.

      It seems that running this query will open the db file for as many database tables as the mysql user has permission to. The slowest results were thus obtained running under the user with access to the most databases on the dev server. However, that was not the full picture as it was still slower against some databases than others within that server and this didn't seem to match directly the database size. Perhaps file fragmentation is a factor?

      Some informations about why it opens each table file is available at this link http://dev.mysql.com/doc/refman/5.5/en/information-schema-optimization.html.
      Limiting the query to the database in use did not alter the speed - which is consistent with that link.

      I played around with setting innodb_stats_on_metadata = 0; and it seemed to shave some time off - locally from 250 ms to 150ms and at the upper extreme from 1.17 min to 1.05 min. I was 'fairly' consistent.


      My local was the only one I tested with file_per_table OFF as setting that to on is standard for us so I don't know if that affects the speed.

      The data derived from the query only needs to be updated as frequently as a system_flush is done - ie. upgrades, extension installs, module installs & custom group creation so some sort of caching makes sense




            • Assignee:
              eileen Eileen McNaughton
              eileen Eileen McNaughton
            • Votes:
              0 Vote for this issue
              2 Start watching this issue


              • Created: