CiviCRM's Core Duplicate finding code contains a flaw in the way it builds queries to find duplicates.
In our case if someone has 4 address records in the same zip code and we are using a duplicate criteria looks at postal code with a weight of 10 for the postal code criteria and the threshold of 40 for the complete criteria. Anyone what come in with that zip code will match to the record. I know 4 addresses in the same zip is an extreme example, but it illustrates the point well.
In this example a new entry on a Contribution page named Jane Doe in zip code 60640 will match with a John Smith who has 4 address records in the 60640 postal code, despite the fact that they don't match on first name, last name or e-mail. Generally all the the UNIONed queries (see below) should have a DISTINCT after the SELECT, but specifically in this case the third SELECT ( SELECT contact_id id, 10 weight FROM civicrm_address WHERE postal_code = '60640 ') in the query returns one record for each address that is in the 60640 zip code. Which for John Smith has four address, it's counting each one of those 4 records as 10 "points" towards the threshold of 40 points to consider something a dupe.
Here's the query that CiviCRM submits to the DB when an anonymous user submits Jane Doe's record via a Contribution Page.
SELECT id id, 10 weight FROM civicrm_contact WHERE SUBSTR(first_name, 1, 4) = SUBSTR('Jane', 1, 4) AND first_name IS NOT NULL
SELECT id id, 10 weight FROM civicrm_contact WHERE last_name = 'Doe'
SELECT contact_id id, 10 weight FROM civicrm_address WHERE postal_code = '60640'
SELECT contact_id id, 10 weight FROM civicrm_email WHERE email = 'firstname.lastname@example.org';
This is the result of the function in the CRM_Dedupe_BAO_RuleGroup::tableQuery() called by CRM_Dedupe_Finder::dupesByParams called by CRM_Contribute_Form_Contribution_Confirm::postProcess() when the Confirm page is submitted.
Attached is the sql for the civicrm_dupe_rules for our rules.