CiviCRM

Performance: Indexes on varchar fields should be limited in length

Details

  • Type: Bug Bug
  • Status: Open Open
  • Priority: Trivial Trivial
  • Resolution: Unresolved
  • Affects Version/s: 3.1.1
  • Fix Version/s: Future Version
  • Component/s: Core CiviCRM
  • Labels:
    None

Description

CiviCRM generally has too many indexes (which slows down inserts) and the existing indexes are not very well defined (which slows down searches). This stems from research I've been doing in [CRM-5556]. One area for improvement is indexes on varchar fields.

Take for example the email field. emails could be anywhere from ~10 characters to 64 characters and beyond (I'm not quite sure what the spec says the absolute limit is). So if you have one email in the table that is 64 characters then that means the index is ~ 192 bytes wide (UTF characters X 3 = # of bytes (approx)). This means that searching by email address is going to be a lot slower than it could be. The better way to go is to have your indexes have a cardinality approximately 95% of the number of rows in the table.

SELECT COUNT(*) FROM civicrm_email WHERE LENGTH(email) > 26

gives you approx. 5% of the rows in the table. So I think we should limit the length of the index on the email field to 26 characters (26 characters ~ 78 bytes).

ALTER TABLE civicrm_email
DROP INDEX UI_email,
ADD INDEX UI_email (email (26))ADD INDEX UI_email (email (26))

This applies to any indexes on varchar fields. Most importantly email, first_name, last_name, organization_name, household_name, street_name, city, sort_name.

For more info see:
http://www.mysql.com/news-and-events/newsletter/2002-10/a0000000075.html
(This one applies to InnoDB, the other recommendations in here are MyISAM only).
  1. benchmark_combined
    16/Mar/10 2:29 PM
    6 kB
    dave hansen-lange
  2. benchmark_email_index_length
    16/Mar/10 2:29 PM
    6 kB
    dave hansen-lange
  3. benchmark_email.php
    16/Mar/10 2:29 PM
    3 kB
    dave hansen-lange

Activity

Hide
Donald A. Lobo added a comment -

1. suspect might be good to actually test this on a reasonable sized realistic DB to see if it does make a difference

2. if it does, please file a patch that improves this. please also patch the upgrade process

lobo
Show
Donald A. Lobo added a comment - 1. suspect might be good to actually test this on a reasonable sized realistic DB to see if it does make a difference 2. if it does, please file a patch that improves this. please also patch the upgrade process lobo
Hide
dave hansen-lange added a comment -
Here's some benchmarks to prove that I'm not totally off my rocker. This is done in a DB with ~610k contacts and ~620k emails. The comparison is between civicrm_email (which has my proposed modifications) and civicrm_email_original which is an exact copy, just with CiviCRM's default indexes.

The combined benefit of both the changes here and in [CRM-5793] are:
inserts: 2.11% improvement
searches: 1.28% - 2.67% improvement
deletes: 2.43% improvement
Show
dave hansen-lange added a comment - Here's some benchmarks to prove that I'm not totally off my rocker. This is done in a DB with ~610k contacts and ~620k emails. The comparison is between civicrm_email (which has my proposed modifications) and civicrm_email_original which is an exact copy, just with CiviCRM's default indexes. The combined benefit of both the changes here and in [CRM-5793] are: inserts: 2.11% improvement searches: 1.28% - 2.67% improvement deletes: 2.43% improvement

People

Vote (0)
Watch (0)

Dates

  • Created:
    Updated: