Details
-
Type: Improvement
-
Status: Done/Fixed
-
Priority: Trivial
-
Resolution: Won't Fix
-
Affects Version/s: 3.1.3
-
Fix Version/s: 4.3.0
-
Component/s: None
-
Labels:None
Description
Recently on irc it was reported (forgot by who) that searching with no criteria on 450k contact creates n GB temp table and, of course, takes minutes.
This prompted me to check my log tables (though we have much smaller dataset) and indeed I found that this query gets logged.
The query in question uses
SELECT DISTINCT(contact_a.id) as contact_id, contact_a.contact_type as `contact_type`,.... and so on (60 columns)
FROM civicrm_contact contact_a
LEFT JOIN civicrm_address ON ( contact_a.id = civicrm_address.contact_id AND civicrm_address.is_primary = 1 )
LEFT JOIN civicrm_state_province ON civicrm_address.state_province_id = civicrm_state_province.id
LEFT JOIN civicrm_country ON civicrm_address.country_id = civicrm_country.id
LEFT JOIN civicrm_email ON (contact_a.id = civicrm_email.contact_id AND civicrm_email.is_primary = 1)
LEFT JOIN civicrm_phone ON (contact_a.id = civicrm_phone.contact_id AND civicrm_phone.is_primary = 1)
LEFT JOIN civicrm_im ON (contact_a.id = civicrm_im.contact_id AND civicrm_im.is_primary = 1)
LEFT JOIN civicrm_worldregion ON civicrm_country.region_id = civicrm_worldregion.id
LEFT JOIN civicrm_option_group option_group_gender ON (option_group_gender.name = 'gender')
LEFT JOIN civicrm_option_value gender ON (contact_a.gender_id = gender.value AND option_group_gender.id = gender.option_group_id) WHERE ( contact_a.display_name LIKE '%goran%' ) ORDER BY contact_a.sort_name asc LIMIT 0, 50;
Which on my dataset returns
50 rows in set (1.62 sec)
This is a bit too long for a query with LIMIT 0, 50
It seems that this query is interpreted as SELECT DISTINCT contact_a.id as contact_id, contact_a.contact_type as `contact_type`,... and so on and I was wondering if DISTINCT is necessary?
The joins are mostly one to many, and IF we can assume that address, phone and e-mail have only one primary record per contact then distinct is redundant and running the query without it will give the same results with main difference in running time (and memory requirements):
SELECT contact_a.id as contact_id, contact_a.contact_type as `contact_type`,...
50 rows in set (0.01 sec)
According to http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html using distinct with order by may lead to creating temporary tables (also IMHO if LIMIT needs to be strictly respected, too, but not sure about that).
Especially over 60 columns and 450k records (it basically builds a 60 column index if I understand this correctly)
I'll be happy to provide patches if someone tells me where is this query created. I think this might be impacting a lot of searches.