CRM-2574 VERP email addresses are often too long (> 64 characters) violating RFC and thus being rejected

    Details

    • Type: Bug
    • Status: Done/Fixed
    • Priority: Major
    • Resolution: Fixed/Completed
    • Affects Version/s: 1.9
    • Fix Version/s: 2.1
    • Component/s: CiviMail
    • Labels:
      None

      Description

      According to RFC2821 <http://tools.ietf.org/html/rfc2821>:

      4.5.3 Sizes and Timeouts

      4.5.3.1 Size limits and minimums

      ...

      local-part
      The maximum total length of a user name or other local-part is 64
      characters.

      We are experiencing a problem with an important client that uses Earthwave <http://www.earthwave.com.au/> to look after their mail. Their mail gateway blocks our CiviMail mails because of this RFC violation (specifically the Return-Path bounce VERP address used in the MAIL FROM). Presumably there are other mail providers out there that do the same.

      An example VERP email address CiviMail generates would be:
      bounce.1.9.27.243c148ff9a0f4b0b4327c5aa4e21091833760b6-mike=example.com@example.com
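
As a quick check of the example above, the local part can be measured directly. This is only an illustrative sketch: the helper name `local_part` and the constant `MAX_LOCAL_PART` are made up for this example and are not part of CiviMail.

```python
# Limit quoted above from RFC 2821, section 4.5.3.1.
MAX_LOCAL_PART = 64


def local_part(address: str) -> str:
    """Return everything before the last '@' of an address."""
    return address.rsplit("@", 1)[0]


verp = (
    "bounce.1.9.27.243c148ff9a0f4b0b4327c5aa4e21091833760b6"
    "-mike=example.com@example.com"
)

length = len(local_part(verp))
# Prints the measured length and whether it fits within the limit.
print(length, length <= MAX_LOCAL_PART)
```

For this address the local part is 71 characters, so it exceeds the limit even with a very short recipient address.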

      It seems Moodle uses VERP and complies with the RFC (keeping the local part <= 64 characters):
      http://docs.moodle.org/en/Email_processing

      Ideally the VERP emails would be formatted in a way to ensure they complied with this RFC policy.

          Activity

          Donald A. Lobo added a comment -


          This is a big change, so pushing to 2.1. Also check: http://docs.moodle.org/en/Email_processing

          Matt Chapman added a comment -

          Confirming that this is also an issue for EARTHLINK, a very large provider here in the USA. Recommending that this issue stay high priority.

          Chris Burgess added a comment -

          This may also affect GMail to some extent. I noticed today that GMail will use the From: address rather than the Reply-To:, and this may be because the email is not RFC-compliant.

          Mathieu Lutfy added a comment -

          We have also run into this issue with a university that is using a Barracuda anti-spam filter.

          What makes things even worse here is that students or workers in large corporations often have e-mail addresses containing their complete name, and they often have two family names, such as: "rosalie tremblay-lapierre" = rosalie.tremblay+2Dlapierre@umontreal.ca (fictitious name). That's already 26 chars out of 64 for the local part.

          Donald A. Lobo added a comment -

          a few ideas on this:

          1. we currently use a sha1 hash (a 40 digit hex number). we could switch to md5 (a 32 digit hex number)
          2. we can store the md5 hash in base 36 (using base_convert). this reduces the size to 26 characters

          so we've saved 14 characters off the string

          finally we can also replace the below with one character saving between 4 - 10 characters

          bounce - b
          unsubscribe - u
          reply - r
          optout - o
          resubscribe - s

          that gives us a saving of 18 - 24 characters

          We should also base_convert the domain_id / job_id / event_queue_id to base 36, which will save a few more characters. There is also a proposal to dump domain_id altogether from civicrm, which will give even more savings

          i suspect with some/all of these changes, we should come under the 64 character limit for 99% or higher of all email addresses
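
A rough sketch of the ideas above, written in Python rather than CiviMail's PHP. The payload string and the function names are assumptions made for illustration only. PHP's documentation warns that base_convert() can lose precision on large numbers, so this sketch converts with exact integer arithmetic instead.

```python
import hashlib

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

# Single-letter action keys as listed in the comment above.
ACTION_KEYS = {
    "bounce": "b",
    "unsubscribe": "u",
    "reply": "r",
    "optout": "o",
    "resubscribe": "s",
}


def to_base36(value: int) -> str:
    """Convert a non-negative integer to base 36 without losing precision."""
    if value == 0:
        return "0"
    out = []
    while value:
        value, rem = divmod(value, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def short_hash(payload: str) -> str:
    """MD5 of the payload, re-encoded from 32 hex digits to base 36."""
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return to_base36(int(digest, 16))


# Hypothetical payload; what CiviMail actually hashes is not shown in this thread.
h = short_hash("1.9.27.mike@example.com")
print(ACTION_KEYS["bounce"], len(h))
```

A 128-bit value needs at most 25 base-36 digits, so the saving is in line with the estimate above.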

          Chris Burgess added a comment -

          That approach should minimise the variables we have under our control, but I wonder how we can also deal with the part that isn't, i.e. the address, which we encode into the VERP string.

          Any VERP implementation which includes the full address string must be at the mercy of long addresses, but swapping the address for some key should prevent that.

          eg reply.1.98.21741.ce0251bb86c695ced35c125aced1f9b7e4d6f626-localpart=domain.com@civicrm.org

          {action}.{did}.{mid}.{qid}.{hash}.{address} - is that right?
          did = domain id
          mid = mailing id
          qid = ? not sure

          If we have all mailed addresses in civicrm_email (even when we are using the civimail forward functionality) then we can swap the encoded address for the ID there too.

          I think this is true (if we don't have a record of the address in the system, the VERP string may be of little use anyway).

          Would that work? As long as the hash includes some concealed value as well, it shouldn't be vulnerable to exploitation like that.

          Unless there's some other reason to include the plain address?
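
One way the proposal above could look, as a sketch only. It assumes the recipient is identified by a civicrm_email ID, that the "concealed value" is a site secret fed into an HMAC, and that the tag is truncated to 16 hex digits to keep the local part short. `SECRET`, `sign`, `build_local_part` and `verify` are hypothetical names, not CiviMail functions.

```python
import hashlib
import hmac

# Hypothetical site secret standing in for the "concealed value".
SECRET = b"site-specific-secret"


def sign(action, did, mid, qid, email_id, length=16):
    """Keyed hash over the visible fields, truncated (an assumption of this sketch)."""
    payload = f"{action}.{did}.{mid}.{qid}.{email_id}".encode("utf-8")
    return hmac.new(SECRET, payload, hashlib.sha1).hexdigest()[:length]


def build_local_part(action, did, mid, qid, email_id):
    """Encode the email ID in place of the full recipient address."""
    tag = sign(action, did, mid, qid, email_id)
    return f"{action}.{did}.{mid}.{qid}.{email_id}.{tag}"


def verify(local_part):
    """Recompute the tag from the visible fields and compare in constant time."""
    parts = local_part.split(".")
    if len(parts) != 6:
        return False
    action, did, mid, qid, email_id, tag = parts
    expected = sign(action, did, mid, qid, email_id)
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))


lp = build_local_part("reply", 1, 98, 21741, 4242)
print(lp, len(lp), verify(lp))
```

Because the secret never appears in the address, changing any visible field (for example the email ID) makes verification fail unless the attacker can also guess the matching tag.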

          Piotr Szotkowski added a comment -

          IIRC, the consensus was that we can drop the VERP address altogether (and derive it from the MD5 hash if needed).

          Matt Chapman added a comment -

          I don't know enough to comment on the VERP issue, but I would like to give input on the domain ID removal proposal. Is there another discussion on that elsewhere?

          Donald A. Lobo added a comment -

          we could keep it simple and drop the VERP address completely. this definitely puts us under the 64 character limit without any other changes. we currently just ignore the verp'ified email address (in all cases), so from our perspective it does not matter

          i was a bit reluctant to remove it since VERP seems to be a well defined protocol, but maybe that is a much simpler and easier solution to implement

          Donald A. Lobo added a comment -

          i just checked moodle, and basically it takes the first 16 characters of the md5. if we adopt the same approach, we get a saving of 24 characters which basically simplifies the issue a lot

          i'm tempted to go down this path, since the hash is used as a checksum against the other params, 16 characters is still a fair bit, and the consequences of someone guessing the right hash are not awful (since it's limited to the specific event queue id)

          Donald A. Lobo added a comment -

          i've reduced the hash to 16 characters (first 16 from sha1) and used single digit keys for all verp entities. please review the code if possible. i'll resolve the issue and we'll do a test run after we upgrade amavis (there is an open issue for this)

          I suspect this should fix most of the length issues. we'll still be in trouble with long email addresses, but i suspect that's an edge case

          Chris Burgess added a comment -

          There may be a new issue with the shortened code for users using imap2soap.pl.

          Most imap2soap installations will have configured a filter to deliver reply.*@domain to the mailbox handling VERP replies. This address will become r.*@domain. However, a filter like that will also match a large number of legitimate mailboxes, e.g. of the form r.simmons@jazzercise.com

          Therefore, perhaps with the shortened hash, we don't need to abbreviate the reply., bounce., etc component so much, and can still filter the mail without conflicting with real email addresses.
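
The collision described above can be shown with simple wildcard matching. This is a sketch, not how imap2soap.pl filters mail. The structural pattern at the end is an additional assumption (three numeric IDs followed by a 16-digit hex hash) and is not something proposed in this thread.

```python
import re
from fnmatch import fnmatchcase

SHORT_PREFIX = "r.*"      # filter on the abbreviated prefix
LONG_PREFIX = "reply.*"   # filter on the original prefix

# Assumed structure: key, three numeric IDs, 16 hex digits, optional "-address".
STRUCTURAL = re.compile(r"^r\.\d+\.\d+\.\d+\.[0-9a-f]{16}(-.+)?$")


def matches(pattern: str, local_part: str) -> bool:
    """Shell-style wildcard match on the local part only."""
    return fnmatchcase(local_part, pattern)


def looks_like_verp(local_part: str) -> bool:
    """Stricter check against the assumed field layout."""
    return STRUCTURAL.match(local_part) is not None


real_mailbox = "r.simmons"
verp_reply = "r.1.98.21741.ce0251bb86c695ce-localpart=domain.com"

print(matches(SHORT_PREFIX, real_mailbox))  # the short prefix also catches a real mailbox
print(matches(LONG_PREFIX, real_mailbox))
print(looks_like_verp(real_mailbox), looks_like_verp(verp_reply))
```

Either keeping a longer prefix or matching on the full structure avoids catching a mailbox such as r.simmons.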

          xavier dutoit added a comment -

          Would be great to be able to define the return prefix. It would allow XYZ+*@ (that's a standard/common alias to XYZ@ mailbox)

          X+

          Chris Burgess added a comment -

          +1 for Xavier's suggestion that a customisable prefix be optionally available.

          Chris Burgess added a comment -

          Ah - this issue is already resolved.

          Xavier, you may need to post a new issue for your suggestion.

            People

            • Assignee:
              Deepak Srivastava
              Reporter:
              Michael Knight
