CiviCRM

Street Address Parsing (Walk-list / Canvassing Support)

Details

  • Type: New Feature
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed/Completed
  • Affects Version/s: 3.1
  • Fix Version/s: 3.1.6
  • Component/s: Core CiviCRM
  • Labels:
    None

Description

1. Parse Street Address method
Use the parseAddress() method developed by Dharmatech as the basis for a new BAO method which takes a street address as input and returns a parsed address array with the following elements (which map to columns in the civicrm_address table):

street_number
street_number_suffix
street_name
street_unit

You can grab the existing parseAddress() code from here
 https://svn.dharmatech.org/svn/civicrm/2.2/branches/ptp/CRM/Report/Form/PTPList.php

You'll need to modify the code to separate any non-numeric portion of the street number and assign it to street_number_suffix. This eliminates the need for the 's_street_number' and 'odd' elements in DT's method.

i_street_number INT => maps to => street_number INT
apt_number VARCHAR(255) => maps to => street_unit VARCHAR(16)

... and we use street_number_suffix VARCHAR(8) to deal with any non-integer portion of a street number.

Example: "54A Excelsior Ave. Apt 1C"
street_number = 54
street_number_suffix = 'A'
street_name = 'Excelsior Ave.'
street_unit = 'Apt 1C'

ALSO, the core schema includes some elements that are described in the USPS pub 28 doc but are not handled in the address parser so far (they are left in the street_name field):
- street_type (e.g. St., Rd., etc.)
- street_number_predirectional (e.g. the 'S' in 'S. Main St.')
- street_number_postdirectional (e.g. the 'SW' in '150 Main St. SW')

For this implementation, we'll leave those in the schema but NOT parse them out (they are retained in the street_name element as shown in the example above).
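
To make the target behavior concrete, here is a minimal sketch of the split described above. It is NOT the Dharmatech parseAddress() code and not the final BAO method; the function name sketchParseStreetAddress and the list of unit markers (Apartment, Apt, Suite, Ste, Unit, #) are assumptions made for illustration only. Street type and directionals stay in street_name, as described.

```php
<?php
/**
 * Hypothetical sketch: split a street address into the four elements.
 * Returns null when the address does not begin with a street number.
 */
function sketchParseStreetAddress(string $streetAddress): ?array
{
    $address = trim($streetAddress);

    // Leading integer, then an optional suffix that runs up to the first space,
    // then the remainder of the address.
    if (!preg_match('/^(\d+)(\S*)\s+(.+)$/', $address, $m)) {
        return null; // no leading street number: treated as unparseable
    }

    $parsed = [
        'street_number'        => (int) $m[1],
        'street_number_suffix' => $m[2],
        'street_name'          => $m[3],
        'street_unit'          => '',
    ];

    // Assumed unit markers. preg_match returns the leftmost match, so only the
    // FIRST marker starts the unit; later markers stay inside street_unit.
    $unitPattern = '/\s+((?:(?:apartment|apt|suite|ste|unit)\b|#).*)$/i';
    if (preg_match($unitPattern, $parsed['street_name'], $u, PREG_OFFSET_CAPTURE)) {
        $parsed['street_unit'] = $u[1][0];
        // $u[0][1] is the offset where the whitespace before the marker begins.
        $parsed['street_name'] = substr($parsed['street_name'], 0, $u[0][1]);
    }

    return $parsed;
}
```

With this sketch, "54A Excelsior Ave. Apt 1C" yields the four values shown in the example above, and an address with no leading number returns null so the caller can decide how to report it.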

2. Configuration option to enable address parsing
Address parsing is "disabled" by default. It is enabled by checking a new box under Address Editing portion of Global Settings >> Address Settings (civicrm/admin/setting/preferences/address&reset=1).
- Add another option value to address_options group, option label = 'Street Address Parsing'
- When this preference is checked, address parsing functions are enabled for the site (items 3 and 4 below)

3. Parsing street address elements in Contact Edit form (Address pane)
When 'Street Address Parsing' is enabled:
- When ADDING a new address, postProcess calls the parseStreetAddress BAO to parse the string in the street_address field (for each address) into the 4 elements, and saves them to the address table columns.

- When EDITING an existing address, IF the address elements are populated, then civicrm_address.street_address is considered to be a 'cached' value. This ensures that the "parsing" decision is transparent when viewing the street_address. So the setDefaults function should populate the street_address form value by concatenating the 4 address elements:

$street_address = $street_number . ' ' . $street_number_suffix . ' ' . $street_name . ' ' . $street_unit;
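
One thing to watch with the plain concatenation: if an element is empty (no suffix, no unit), the result contains doubled or trailing spaces. A possible variant, sketched below, joins only the non-empty elements with a single space. The function name sketchBuildStreetAddress is hypothetical, and the element order and space separator follow the line above.

```php
<?php
/**
 * Hypothetical sketch: rebuild street_address from the four elements,
 * skipping empty ones so no doubled or trailing spaces appear.
 */
function sketchBuildStreetAddress(array $elements): string
{
    $order = ['street_number', 'street_number_suffix', 'street_name', 'street_unit'];

    $pieces = [];
    foreach ($order as $key) {
        // Missing keys are treated the same as empty values.
        $value = trim((string) ($elements[$key] ?? ''));
        if ($value !== '') {
            $pieces[] = $value;
        }
    }

    return implode(' ', $pieces);
}
```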

4. Editing street address elements
Users need a way to modify parsing "decisions" - so we need to provide a way to edit the street address elements.
- If address parsing is enabled, add a link to the right of the "normal" Street Address field (i.e. address[$i][street_address]): "Edit Address Elements"
- Clicking this link REPLACES the Street Address field with the 4 fields corresponding to the 4 elements and sets defaults by retrieving the column values (if we're updating an existing address).
- When address elements are displayed, include a link to the right of "Apt No. or Unit" field: "Edit Street Address" - which swaps the fields again (remove address element fields and insert street_address)

NOTE: The goal here is that the form POST params should include EITHER street_address field OR the 4 address element fields, but not BOTH. This allows the postProcess function to use a conditional to control the update method:

  IF street_address is present in POST params
- save street_address form value to street_address column in DB
- parse street_address into the address elements
- if errors in parsing, add "warning" to status message and do NOT save address elements to DB
(so we allow the user to save the street_address, but don't save the parsed elements since we're not able to split things up)
- if no errors in parsing, also save the address elements to corresponding DB columns

  IF street_number, street_name, street_unit are present in POST params
- parse street_number field value into INT segment (street_number) plus substring (street_number_suffix).
- if errors in parsing street_number (e.g. street_number field value is 'A 142 C'), throw a formRule error (see below)
- else save the address elements to corresponding DB columns AND concatenate with space character to form the street_address string and save that to the DB street_address column
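
The two branches above can be sketched as a single decision function. This is only an illustration of the control flow, not the actual postProcess code: the function name sketchPlanAddressSave, the shape of the returned array, and the two callables (a street address parser returning an array or null, and a street number splitter returning [number, suffix] or null) are all assumptions.

```php
<?php
/**
 * Hypothetical sketch: decide what to save, based on which fields were POSTed.
 * $parseStreetAddress: fn(string): ?array  (four elements, or null on failure)
 * $splitStreetNumber:  fn(string): ?array  ([int number, string suffix], or null)
 */
function sketchPlanAddressSave(array $params, callable $parseStreetAddress, callable $splitStreetNumber): array
{
    $plan = ['save' => [], 'warning' => false, 'error' => false];

    if (array_key_exists('street_address', $params)) {
        // Branch 1: always save the full string, then try to parse it.
        $plan['save']['street_address'] = $params['street_address'];
        $elements = $parseStreetAddress($params['street_address']);
        if ($elements === null) {
            $plan['warning'] = true;      // status-message warning, elements not saved
        } else {
            $plan['save'] += $elements;   // also save the parsed elements
        }
        return $plan;
    }

    // Branch 2: the element fields were posted instead.
    $split = $splitStreetNumber((string) ($params['street_number'] ?? ''));
    if ($split === null) {
        $plan['error'] = true;            // formRule error, nothing saved
        return $plan;
    }

    $elements = [
        'street_number'        => $split[0],
        'street_number_suffix' => $split[1],
        'street_name'          => $params['street_name'] ?? '',
        'street_unit'          => $params['street_unit'] ?? '',
    ];

    // Concatenate the non-empty elements with a space to form street_address.
    $pieces = [];
    foreach ($elements as $value) {
        $value = trim((string) $value);
        if ($value !== '') {
            $pieces[] = $value;
        }
    }

    $plan['save'] = $elements;
    $plan['save']['street_address'] = implode(' ', $pieces);
    return $plan;
}
```

Passing the parsers in as callables keeps the branching logic separate from the parsing rules, so either can change without touching the other.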

Warning for "unparseable street_address":
ts('The complete street address has been saved. However, we were unable to parse this address into address elements due to an unrecognized address format. You can set the address elements manually by clicking "Edit Address Elements" next to the Street Address field while in edit mode.');

FormRule error for "unparseable street_number":
ts('The street number you entered is not in an expected format. Street numbers may include numeric digit(s) followed by other characters. You can still enter the complete street address (unparsed) by clicking "Edit Street Address".');
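
For the street number field itself, the rule "numeric digit(s) followed by other characters" could be checked along these lines. This is a sketch only: the function name sketchSplitStreetNumber is hypothetical, and keeping any leading space in the suffix exactly as entered is an assumption (trimming it would be an equally plausible choice).

```php
<?php
/**
 * Hypothetical sketch: split the street number field into an integer part
 * and a suffix. Returns null when the value does not start with digits,
 * which is the case that should raise the formRule error.
 */
function sketchSplitStreetNumber(string $value): ?array
{
    // Leading digits are the number; everything after them is the suffix.
    if (!preg_match('/^\s*(\d+)(.*)$/s', $value, $m)) {
        return null; // e.g. 'A 142 C'
    }

    return [
        'street_number'        => (int) $m[1],
        'street_number_suffix' => rtrim($m[2]), // leading space kept as entered
    ];
}
```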

NOTE: I'm not sure about the best approach in the form markup to achieve the above goal. I think it makes sense to actually remove the street_address form field from the DOM and inject the address element fields. We can use the jQuery remove() and append() methods for this (or there might be better ways).

5. Command line script to parse existing addresses
Extend the existing bin/UpdateAddress.php to handle parsing addresses and saving the address elements. I think we should add parameters to explicitly control whether the script does geocoding AND / OR parsing (maybe use geocode/g=true, parse/ap=true for options).

We COULD configure the defaults for these options based on site configuration:
- if site has mapping provider, then geocoding option is true by default unless over-ridden by caller
- if site has Address Parsing enabled, then parse option is true by default unless over-ridden by caller
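
As an illustration of how those defaults might be resolved, here is a small sketch. It does not reflect the actual bin/UpdateAddress.php code: the function name sketchResolveUpdateOptions, the 'geocode' and 'parse' parameter names, and the two boolean inputs standing in for the site configuration are assumptions.

```php
<?php
/**
 * Hypothetical sketch: an explicit caller value wins; otherwise fall back
 * to the site configuration.
 */
function sketchResolveUpdateOptions(array $request, bool $hasMappingProvider, bool $parsingEnabled): array
{
    // Accepts 'true'/'false', '1'/'0', 'yes'/'no', 'on'/'off'; anything else is false.
    $toBool = static function ($value): bool {
        return filter_var($value, FILTER_VALIDATE_BOOLEAN);
    };

    return [
        'geocode' => array_key_exists('geocode', $request)
            ? $toBool($request['geocode'])
            : $hasMappingProvider,
        'parse' => array_key_exists('parse', $request)
            ? $toBool($request['parse'])
            : $parsingEnabled,
    ];
}
```

For example, a site with a mapping provider but with parsing disabled would geocode only, unless the caller passes parse=true.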

Activity

David Greenberg added a comment -
Overall this looks REALLY good. Nice job! A few small issues:

1. Bugs in parsing complete street address:

22A Upper Terrace Apartment #3 -> '22A' + 'Upper Terrace Apartment' + '#3'
(We should recognize 'Apartment', 'Apt', 'Apt.' as all being start of street unit substring)

22A Upper Terrace Street Apt #3 -> '22A' + 'First Street #3' + 'Apt'
(Looks like the '#' character is causing this problem. We need to ignore any street_unit 'starting substrings' AFTER we find the first one.)

---
2. Bug when entering Street Number in separate address elements field (i.e. in address[$n][street_number] field):

If I enter '22 A' in that field, it is saved as '22' (the space + A chars are dropped). Instead, we should save '22' to the street_number column and ' A' to street_number_suffix.

NOTE: '22A' (no space between number and 'A') works properly.

--
3. When entering a New Individual, if you OMIT a required field (i.e. either a required custom field, or just enter first name), AND also enter an unparseable street address (e.g. 'W 23 First Street'), you get this JavaScript error:

"Request to hide() function failed. Element id undefined = addressElements_1"
David Greenberg added a comment -
One other parsing fix we should make:

'2500-10 Alabama Rd Apt 2a' -> '2500' + '-10 Alabama Rd' + 'Apt 2a'

Should have parsed '-10' as street_number_suffix. Basically, any characters after the leading integer of the street number AND before the first space should be considered street_number_suffix.
David Greenberg added a comment -
All bugs noted in comments above are fixed!
David Greenberg added a comment -
If Address Setting -> Parse Street Addresses is TRUE, we should parse the address data during Contact Import.
Neha Kulkarni added a comment -
checked in r25786
David Greenberg added a comment -
Address parsing fails for addresses with a single digit street number. Something is not quite right in CRM_Core_BAO_Address::parseStreetAddress function. Examples:

1 Main Street
2 Roxbury Court
1 Miller Place

... all throw the "unable to parse error" in current 3.2 rev.

Not sure if this is a regression bug - or has been there all along ??
Neha Kulkarni added a comment -
checked in r 28247
