Details
- Type: Bug
- Status: Done/Fixed
- Priority: Major
- Resolution: Fixed/Completed
- Affects Version/s: 4.7.16
- Fix Version/s: 5.4
- Component/s: None
- Labels:
- Versioning Impact: Patch (backwards-compatible bug fixes)
- Documentation Required?: None
- Funding Source: Needs Funding
- Verified?: No
Description
Here's an 'I'm not sure what the best way to solve this problem is' problem.
There's a bug in PEAR Mail/SMTP:
http://pear.php.net/bugs/bug.php?id=20513
Here's a really full and boring explanation of what we're seeing.
CiviCRM tries to ->send via SMTP. The Mail\smtp library kicks in and creates a new Net\SMTP instance. If the 'send' succeeds and we're not using a persistent connection, the SMTP connection is closed. Rinse, repeat.
The problem is, we sometimes hit an error code: 421 - Timeout waiting for client. This means that our SMTP provider has closed the connection.
But the library doesn't realise it has been disconnected. It just tries to 'reset' the connection rather than disconnecting (https://github.com/pear/Mail/blob/master/Mail/smtp.php#L283) and then it bombs out, so the 'disconnect' at the end of ->send never happens and nothing tries to fix the problem.
Then on the next email, the SMTPObject is still instantiated (https://github.com/pear/Mail/blob/master/Mail/smtp.php#L246), so it doesn't create a fresh connection. It keeps trying to use the dead one.
So the mailing job quickly hits 6 errors in a row and cancels itself.
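To make the failure mode concrete, here is a rough simulation of the sequence described above. It is written in Python purely as an illustration and is not the PEAR or CiviCRM code: `FakeSmtpConnection`, `Mailer`, `make_connector` and `run_job` are all hypothetical names, and the 'six errors in a row' limit is taken from the description. The `drop_dead_connection` flag stands in for the idea of discarding the connection object when the server has already hung up.

```python
class FakeSmtpConnection:
    """Hypothetical stand-in for an SMTP connection (not Net\\SMTP itself)."""

    def __init__(self, timed_out):
        self.timed_out = timed_out

    def send(self, recipient):
        if self.timed_out:
            # The server already closed the connection on us.
            raise ConnectionError("421 Timeout waiting for client")


def make_connector(dead_index):
    """Return a connect() function whose dead_index-th connection is timed out."""
    opened = 0

    def connect():
        nonlocal opened
        conn = FakeSmtpConnection(timed_out=(opened == dead_index))
        opened += 1
        return conn

    return connect


class Mailer:
    """Reuses the connection object for as long as it exists."""

    def __init__(self, connect, drop_dead_connection):
        self.connect = connect
        self.drop_dead_connection = drop_dead_connection
        self.conn = None

    def send(self, recipient):
        if self.conn is None:
            self.conn = self.connect()
        try:
            self.conn.send(recipient)
        except ConnectionError:
            if self.drop_dead_connection:
                self.conn = None  # force a fresh connection next time
            raise  # otherwise the dead connection stays cached
        self.conn = None  # non-persistent mode: disconnect after success


def run_job(mailer, recipients, max_errors_in_a_row=6):
    """Send to each recipient; cancel after too many consecutive errors."""
    delivered, failed, streak = [], [], 0
    for recipient in recipients:
        try:
            mailer.send(recipient)
            delivered.append(recipient)
            streak = 0
        except ConnectionError:
            failed.append(recipient)
            streak += 1
            if streak >= max_errors_in_a_row:
                break  # the job cancels itself
    return delivered, failed


recipients = [f"r{i}" for i in range(10)]
buggy = run_job(Mailer(make_connector(2), drop_dead_connection=False), recipients)
fixed = run_job(Mailer(make_connector(2), drop_dead_connection=True), recipients)
```

In this toy model the buggy mailer delivers two emails, fails six in a row on the same dead connection and stops, while the variant that drops the connection fails only the one email that hit the timeout.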
A beneficial side effect
Since the mailing job quits at this point, the next time the job is picked up it effectively 'retries' from the point of the first error. So all six are sent out again later, correctly.
But
If I 'fix' the library to handle the timeout, and create a new SMTP connection, then CiviCRM will no longer 'retry' sending the emails that failed - because the job will continue to send and then be marked as completed. So emails will be marked as 'delivered' even though they failed.
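One possible shape for avoiding that, offered only as a sketch and not as how CiviCRM actually tracks delivery: keep sending after a failure, but record each failed recipient and only treat the batch as complete once nothing is left to retry. Everything here (`run_batch`, `make_flaky_send`, the status strings) is hypothetical and written in Python for illustration.

```python
def run_batch(recipients, send):
    """Send to everyone, but remember who failed instead of losing them."""
    delivered, needs_retry = [], []
    for recipient in recipients:
        try:
            send(recipient)
            delivered.append(recipient)
        except ConnectionError:
            # Do not count this as delivered; queue it for a later pass.
            needs_retry.append(recipient)
    status = "complete" if not needs_retry else "retry_pending"
    return delivered, needs_retry, status


def make_flaky_send(fail_for):
    """Hypothetical sender that fails once for each recipient in fail_for."""
    remaining = set(fail_for)

    def send(recipient):
        if recipient in remaining:
            remaining.discard(recipient)
            raise ConnectionError("421 Timeout waiting for client")

    return send


send = make_flaky_send({"b"})
first_pass = run_batch(["a", "b", "c"], send)
second_pass = run_batch(first_pass[1], send)
```

Here the first pass delivers to "a" and "c" and leaves "b" pending; the second pass retries only "b" and then reports the batch as complete.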
How to resolve?
Your thoughts: appreciated.