Uploaded image for project: 'CiviCRM'
  1. CiviCRM
  2. CRM-17996

CRM_Contact_BAO_Query::selectClause() returns unnecessary fields

    Details

    • Type: Improvement
    • Status: Done/Fixed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 4.4.6
    • Fix Version/s: None
    • Component/s: Core CiviCRM
    • Labels:
      None
    • Documentation Required?:
      None
    • Funding Source:
      Contributed Code

      Description

      Ref: https://issues.civicrm.org/jira/browse/CRM-17748
      Ref: https://github.com/routinet/civicrm-core/commit/33092c897f267588ef7401f860908b033d594a34

      In addressing the memory issues we discovered in CRM-17748, we implemented a change in CRM_Contact_Selector. Instead of passing a NULL value for $fields, we passed array('contact_id'). In initial testing based on our specific problem search, this limited the number of return fields. However, testing other queries exhibited issues. For example, searching by a custom field worked fine, but searching by postal_code would show incorrect results. By passing an explicit list of fields in the $fields parameter, any field not in that list would be ignored.

      Locally, we reverted that change, which still left us with the problem of extraneous fields being returned in the query for CRM_Contact_Selector::contactIDQuery(). After tracing some execution, we found the fields were being added in CRM_Contact_BAO_Query::selectClause().

      Ref: https://github.com/civicrm/civicrm-core/blob/master/CRM/Contact/BAO/Query.php#L673

      The if statement there checks to see if a particular field is included in the $returnProperties list, the query's $params list, or is an "exception". If any of those conditions is true, the field is added to the list of select fields for the query. In turn, this means that any query coming through advanced search will have additional fields added to the return if they are used as predicate fields, i.e., in the $params list. For example:

      EXAMPLE 1
      Original query: search by postal_code = '12308'
      Desired fields: contact_id int
      Actual fields: contact_id int, contact_type varchar(64), address_id int, postal_code varchar(12)
      Memory: 4 bytes/row desired, 84 bytes/row actual

      EXAMPLE 2
      Original query: email like '%nysenate%'
      Desired fields: contact_id int
      Actual fields: contact_id int, contact_type varchar(64), email_id int, email varchar(254)
      Memory: 4 bytes/row desired, 326 bytes/row actual

      The more fields used in the search, the longer each row will become, even if those fields are only filters. As some of our basic searches can yield resultsets larger than 300k rows, we have a need to reduce this footprint as much as possible.

      Ref: https://gist.github.com/routinet/2050ca733b11147c6902

      We have tested removing the check against $this->_paramLookup, with promising results. Is there a purpose behind including filter/predicate fields in a query's SELECT list? How would this proposed change impact other areas of Civi?

        Attachments

          Activity

            People

            • Assignee:
              timotten Tim Otten
              Reporter:
              s.bink Steve Binkowski
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: