Details
-
Type: Epic
-
Status: Open
-
Priority: Minor
-
Resolution: Unresolved
-
Affects Version/s: 4.7.16
-
Fix Version/s: None
-
Component/s: Dedupe
-
Labels:
-
Epic Name:Resolve dedupe code issues
-
Versioning Impact:Patch (backwards-compatible bug fixes)
-
Documentation Required?:Developer Doc
-
Funding Source:Needs Funding
-
Verified?:No
Description
The dedupe code loads a quickform oriented array of data & uses that for post processing. The result is the code is convoluted and data is hard to alter by hook. Conversely it is the hooks that have locked in the difficult code. My expectation at this stage is we need to break-hook-contract to get past this - which I believe means putting the code in an extension & incrementing the version.
I think the path will look something like
- extract to org.civicrm.dedupe (new civicrm github repo) & resolve ensuring it is enabled on install / upgrade
- ensure unit tests are running - will they run on the extension PRs too? if not the extension could hurt core tests ...
- consider if there any issues on managing DAO / schema when 'something changes'??
- increment the version of the extension, resolving the hook issues
- figure out if we can have unit tests run against both versions!!
- figure out how to have new version installed in install but an optional upgrade
I had a quick go and moving to an extension is fairly seamless. There is a question as to whether we
a) move to an extension and enable for all and THEN increment the version & make changes in the version, leaving people to update the version when they are ready (& put the new one on for new installs)
b) add an extension, leave existing sites on the core version and enable the extension by default for new sites.
The latter approach seems easier from the point of view of making the tests work - ie. if we ship a version with core we only need to ship one version and the tests would run on that, and separately on the core code without the tests enabled. If also seems to offer a slightly softer transition.
One challenge is that once we move the code out to an extension we want to lock the classes in core that are now deprecated to the extension. Perhaps we could do that using a github hook? From that point of view the extension versioning approach seems cleaner. It also saves us from having the question later on about how to decide when we can remove the deprecated code.
Notes on issues with current hook pattern....
The code currently hits hooks as follows:
1) CRM_Utils_Hook::merge('flip', $dupes, $dupes['dstID'], $dupes['srcID']);
$dupes is the only alterable variable here
Array
(
[dstID] => 4
[dstName] => Mr. Anthony Anderson II
[srcID] => 5
[srcName] => Mr. Anthony Anderson II
[weight] => 10
[canMerge] => 1
)
The intent of the hook appears to be to allow the reversal of the 2 contacts. Not sure if it works.
2) CRM_Utils_Hook::merge('batch', $migrationData, $mainId, $otherId);
described as
// A hook to implement other algorithms for choosing which contact to bias to when
// there's a conflict (to handle "gotchas"). fields_in_conflict could be modified here
// merge happens with new values filled in here. For a particular field / row not to be merged
// field should be unset from fields_in_conflict.
$migrationData can be altered - to resolve a conflict and choose a field it is necessary to alter the following 3 places (is that it?)
unset($migrationData[''fields_in_conflict'][$fieldname]
$migrationData['migration_info']['move_$fieldname]
$migrationData['migration_info'][$rows]['move_$fieldname]
- see https://issues.civicrm.org/jira/secure/attachment/57777/array.txt for what $migrationData might hold
CRM_Utils_Hook::alterLocationMergeData($blocksDAO, $mainId, $otherId, $migrationInfo);
$ blocksDAO passes out all the $dao actions to be done with a chance to change them. This was a recent addition reflecting the other hooks not meeting this need & no other clear place to add it
Other hooks in the flow:
CRM_Utils_Hook::merge('cidRefs', $contactReferences);
CRM_Utils_Hook::merge('eidRefs', $eidRefs);
In this case the goal is to get a list of tables that refer to the contact table. (the original list is derived from metadata & includes tables in the extension with correct metadata - e.g CiviDiscount does not implement this hook but does get merges happening
Other completely different versions of the hook:
CRM_Utils_Hook::merge('relTables', $relTables);
CRM_Utils_Hook::merge('locTables', $locTables);
CRM_Utils_Hook::merge('form', $migrationData, $this->_cid, $this->_oid);
CRM_Utils_Hook::merge('sqls', $sqls, $tagAId, $tagBId, $tables);
There is also a hook
CRM_Utils_Hook::post_case_merge($mainContactId, $mainCaseId, $otherContactId, $otherCaseId, $changeClient);
but that one passes clear IDs