CRM-19385 Look at REMOVING id column from cache tables

    Details

    • Versioning Impact:
      Patch (backwards-compatible bug fixes)
    • Documentation Required?:
      None
    • Funding Source:
      Paid Issue Queue

      Description

      After some digging into cache table deadlocks I have concluded that REMOVING the auto increment field may improve the frequency of deadlocks. The issue is that when it goes to INSERT into a table it locks the number of ids it will need for the insert. If the query is at all slow then it takes time to calculate this lock.

      So, 2 separate processes cannot insert to the same table at the same time because it has to finish one to calculate the next auto-increment.

      However, the id field adds no value to these cache tables:
      civicrm_group_contact_cache
      civicrm_acl_contact_cache
      civicrm_cache

      With some hesitation I would also add to that list
      civicrm_prevnext_cache

      I had a go at removing the id from civicrm_group_contact_cache and I will add a PR for the changes I had to make to get past any errors. So far it seems to have led to a reduction in auto-inc errors

      I'm also playing with adjusting the indexes on group_contact_cache to

      UNIQUE INDEX `UI_contact_group` (`group_id`, `contact_id`),
      INDEX `index_contact` (`contact_id`)

      The logic being the gap lock will lock less rows for the unique index with group first since it is usually only a single group, but a wide range of contacts

      See this one for more on gap locks & also for my next approach
      {{SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
      session2> BEGIN;
      session2> INSERT INTO actor (first_name,last_name) VALUES ('Bob','Davis');Query OK, 1 row affected (0.02 sec)

      session1> COMMIT}};
      http://blog.9minutesnooze.com/diagnosing-mysql-autoinc-gap-locks-ideeli-tech-blog/

        Attachments

          Activity

          [CRM-19385] Look at REMOVING id column from cache tables
          Eileen McNaughton added a comment -

          OK - just a diff at the moment...

          diff --git a/sites/all/modules/civicrm/CRM/Contact/BAO/Group.php b/sites/all/modules/civicrm/CRM/Contact/BAO/Group.php
          index caf8147..b007755 100644
          — a/sites/all/modules/civicrm/CRM/Contact/BAO/Group.php
          +++ b/sites/all/modules/civicrm/CRM/Contact/BAO/Group.php
          @@ -993,7 +993,7 @@ class CRM_Contact_BAO_Group extends CRM_Contact_DAO_Group {
          }
          // Exclude deleted contacts
          $where .= " and c.id = g.contact_id AND c.is_deleted = 0";

          • $dao = CRM_Core_DAO::executeQuery("SELECT g.group_id, COUNT(g.id) as `count` FROM $table g, civicrm_contact c WHERE $where GROUP BY g.group_id");
            + $dao = CRM_Core_DAO::executeQuery("SELECT g.group_id, COUNT as `count` FROM $table g, civicrm_contact c WHERE $where GROUP BY g.group_id");
            while ($dao->fetch()) { $values[$dao->group_id]['count'] = $dao->count; }

            diff --git a/sites/all/modules/civicrm/CRM/Group/Page/Group.php b/sites/all/modules/civicrm/CRM/Group/Page/Group.php
            index d0f6efd..036f8ea 100644

              • a/sites/all/modules/civicrm/CRM/Group/Page/Group.php
                +++ b/sites/all/modules/civicrm/CRM/Group/Page/Group.php
                @@ -143,7 +143,7 @@ class CRM_Group_Page_Group extends CRM_Core_Page_Basic {
                if (!empty($_GET['update_smart_groups'])) { CRM_Contact_BAO_GroupContactCache::loadAll(); }
          • elseif (!CRM_Core_DAO::singleValueQuery("SELECT id FROM civicrm_group_contact_cache LIMIT 1")) {
            + elseif (!CRM_Core_DAO::singleValueQuery("SELECT contact_id FROM civicrm_group_contact_cache LIMIT 1")) { CRM_Core_Session::setStatus(ts('Count data for smart groups is not currently calculated. You may click Update Smart Groups to generate it. Be aware this can cau }
          Alan Dixon added a comment -

          This seems like a useful thing. In general, CiviCRM having those ids as primary auto-increment fields is awesome, and a brilliantly forward looking design. So I wonder if not having those ids will have unexpected side effects.

          I also wonder if using a different table type, or even an overall abstracting of the cache tables to make using, say, redis, is a timely thing to add. Yeah, I think I've had this conversation with Tim before and it sounds like a project to put at the end of a too-long-todo-list.

          Eileen McNaughton added a comment -

          I think the queued patch is good - but it is stalled pending someone doing prod testing (works in staging & dev). There are some other tickets & initiatives that go beyond this - but we should keep the scope of this limited

          Andrew West added a comment - - edited

          We've tested this and added some comments to PR-9801 regarding a couple of missed references to the id column in prevnext_cache.

          That said, we've applied the sections that refer to group_contact_cache, and removed its id column. Hard to say if there's any difference, unfortunately - we're still getting a lot of deadlocks on that table. I'm currently working on getting a reproducible error - it's maddeningly intermittent atm.

          Eileen McNaughton added a comment -

          Thanks for the testing. Just a quick check re the deadlocks - you haven't given out the 'view my contact' or 'edit my contact' permission have you? There are specific issues with those permissions

          Andrew West added a comment -

          No, we haven't used those at all.

          Dave Jenkins added a comment -

          Eileen McNaughton Did you get anywhere with READ COMMITTED?
          Also does this issue want to join the CRM-18142 epic party?

          Eileen McNaughton added a comment -

          So where this is at at the moment is that it is now possible to remove the id column from the following tables

          • civicrm_cache
          • civicrm_acl_cache
          • civicrm_group_contact_cache
          • civicrm_acl_contact_cache

          but not civicrm_prevnext_cache

           

          In theory, but not yet proven, dropping the id column would improve some types of deadlocks. This issue is massively complicated by the fact there are different types of deadlocks & DELETE deadlocks are not expected to be affected. The ones that would be likely affected at the ones that come into play when an insert operation is happening, blocking other insert operations because it locks the possibly autoincrement values on the id column.

          Potentially it would make sense to put a different primary key on - Jaap's work assumed that a primary key of multiple fields makes more sense on the acl_cache table.

           

          Anyway - you can test different keys in production & CiviCRM should keep working fine as all queries that touch those id fields should be gone now

          Dave Jenkins added a comment -

          Thanks, yes saw those PRs. Sadly the site where we're currently seeing deadlocks on civicrm_group_contact_cache is still on 4.6 - pushing to upgrade to 4.7 soon but that can't happen until after the current busy & critical period. Would be interested to know if there are any low-risk enhancements around civicrm_group_contact_cache performance that have been tried successfully in 4.6 - maybe that's a conversation for Mattermost though. Apart from the pesky time zone difference!

          Nicolas Ganivet added a comment -

          Dave Jenkins Thanks for the note on CRM-18146 leading me here. As Eileen stated above you could delete the id columns on some cache tables, this should lead to an improvement regardless of your CiviCRM version.

          John K. added a comment -

          Another reason to remove the id - we keep running out of numbers!

          The ID is int(10). We have some smart groups with ~300,000 members.

          After a few months of refreshing smart group caches we hit errors because the auto_increment runs out of numbers and the cache falls over, which has some nasty consequences.

          Eileen McNaughton added a comment -

          John K. I encourage you to test removing the column. It should be optional & you should be able to alter the primary key to group_id+contact_id & ensure there is a key for contact_id alone

           

          Would be great to hear if there is any improvement from doing so

          Andrew McNaughton added a comment - - edited

          I'm currently looking at a range of performance issues relating to `civicrm_group_contact_cache`.  I haven't looked at the other cache tables addressed here.  Also, I'm focussed on performance of some specific queries rather than locking issues.

          It seems odd to me that an id would have be present here.  What use case exists for it?

          In particular, having the `id` column as a primary index means that for either searching for contacts by group, or for groups by contact, the relevant fields will not be contiguous on disk, increasing the required disk activity.

          The cases I'm currently looking at involve looking up whether a contact is a member of a group.  For that, I'd like to see the `index_contact` replaced with a spanning index like so:

              UNIQUE INDEX `UI_group_contact` (`contact_id`, `group_id`),

          Much of what I'm looking at involves the use of temporary tables.  I suspect there's indexes wanted on those also.

          Eileen McNaughton added a comment -

          pretty sure the id is now not required - but we have not had testing on whether removing it helps so have not rolled out.

           

          Current keys on the table are

          PRIMARY KEY (`id`),
          UNIQUE KEY `UI_contact_group` (`contact_id`,`group_id`),
          KEY `FK_civicrm_group_contact_cache_group_id` (`group_id`),
          CONSTRAINT `FK_civicrm_group_contact_cache_contact_id` FOREIGN KEY (`contact_id`) REFERENCES `civicrm_contact` (`id`) ON DELETE CASCADE,
          CONSTRAINT `FK_civicrm_group_contact_cache_group_id` FOREIGN KEY (`group_id`) REFERENCES `civicrm_group` (`id`) ON DELETE CASCADE

          John K. added a comment -

          We implemented a patch in production two weeks ago that removed the id column from the group_contact_cache table.

          The main reason for this was because of the issues we were having with running out of integers in the ID column. Obviously this has been resolved now.

          For that reason alone I think it would be good to remove the ID column from the group_contact_cache table. We just wanted to run it a bit longer before putting in a PR when we get the time to do so.

          We haven't seen any noticeable performance improvements from removing the ID though.

          Tangential: we are currently looking at some other performance issues that are related, whereby the smart group caches are being refreshed due to user actions, even when 'deterministic' cache rebuilds are enabled. We'll post back the results when we have them.

            People

            • Assignee:
              Jitendra Purohit
              Reporter:
              Eileen McNaughton

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 1 day
                1d