CRM-14773 UTF-8 and export data from report to CSV

    Details

    • Type: Bug
    • Status: Done/Fixed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 4.4.5
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Documentation Required?:
      None

      Description

      When exporting data from a report to CSV (for Excel), UTF-8 characters are not displayed correctly once the file is opened in Excel. This can be fixed by writing a BOM (byte order mark) at the start of the CSV. That way Excel will open the file correctly.

      See also: http://stackoverflow.com/questions/4348802/how-can-i-output-a-utf-8-csv-in-php-that-excel-will-read-properly
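
      The proposed fix can be sketched as follows. This is a minimal illustration in Python rather than the actual CiviCRM (PHP) patch; the function name `rows_to_csv_bytes` and the `with_bom` flag are hypothetical and only show the idea of prefixing the UTF-8 payload with the three BOM bytes.

```python
import codecs
import csv
import io


def rows_to_csv_bytes(rows, with_bom=True):
    """Serialise rows to UTF-8 CSV bytes, optionally prefixed with a BOM.

    Hypothetical helper for illustration; not part of CiviCRM.
    """
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    payload = buffer.getvalue().encode("utf-8")
    # codecs.BOM_UTF8 is the byte sequence EF BB BF.
    return codecs.BOM_UTF8 + payload if with_bom else payload


data = rows_to_csv_bytes([["Name", "City"], ["Zoë D'Hondt", "Liège"]])
```

      Keeping the BOM behind a flag leaves the plain UTF-8 output available for consumers that do not expect the mark.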

          Activity

          Jaap Jansma added a comment -

          See also the Pull Request at GitHub: https://github.com/civicrm/civicrm-core/pull/3404

          Evi Vanoost added a comment -

          Please be aware that breaking the CSV export in order to support a broken program may cause issues when in use by others.

          I wouldn't go around adding undocumented hacks to make MS-centric systems work. At least make it optional (e.g. another button just for Excel exports, which could later be expanded to export a 'real' document) or apply it only in your environment.

          Per the StackOverflow and Microsoft forums, Excel for Mac does not currently support UTF-8 regardless of a BOM. I'd recommend asking Microsoft to fix it.

          David Greenberg added a comment - - edited

          Jaap - a few comments / questions:

          1. I applied this patch to my 4.5 (master) checkout and exported sample data from the Contribution Details report. The attached screenshot shows extra characters added to the first cell as viewed in MS Excel on Mac OS X. This doesn't seem ideal.

          2. Can you address the concerns raised by Evi regarding this change 'breaking' the CSV file for other applications?

          3. Wondering whether this same UTF-8 export problem exists when exporting data from searches (e.g. Find Contributions > Export)? If so, then whatever fix we come up with should probably address that as well. If not, then what's the difference?

          4. I don't think this is a critical bug, so it probably should not be fixed in 4.4.x. Was there some discussion / reason for moving it from 4.5 to 4.4.6 (Lobo or Jaap)?

          Jaap Jansma added a comment -

          Hey Dave and Evi,

          Answers to your questions, in a different order.

          2) I do agree with Evi that we shouldn't fix the Excel issue in CiviCRM, and that it is better to ask MS to change this behaviour in Excel. However, as we export the data in UTF-8 format, we should at least comply with the UTF-8 standard ourselves. That means we should include a byte order mark; at least, as far as I know, that is part of the UTF-8 standard.

          1) As Evi mentions, Excel still has an issue with UTF-8 CSV files. There is probably another workaround for that.

          3) I'm not sure whether the same problem exists when exporting from e.g. Find Contributions. But it makes sense to at least check it and fix it if it exists.

          4) I haven't set the next fix version number, so you should ask Lobo. However, I am not sure whether it is critical or not. I came across this bug at a Belgian organisation where they are going to export a lot of reports to CSV and they have a lot of names with apostrophes. So from this user's perspective it could be marked as important.

          Evi Vanoost added a comment -

          Please read this regarding the UTF-8 BOM:

          http://www.w3.org/International/questions/qa-byte-order-mark#problems

          As said above, it results in weird characters on the first line for some applications that do not understand UTF-8. I would make it optional (checkbox).

          The Unicode Standard neither requires nor recommends the use of the BOM for UTF-8.
          http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf page 30, near the bottom
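
          The "weird characters" can be reproduced with a short sketch. It assumes three hypothetical consumers of the same file: one that strips a UTF-8 BOM, one that decodes plain UTF-8 without stripping it, and one that assumes Windows-1252. The variable names are illustrative only.

```python
import codecs

line = "Name,City\r\n"
with_bom = codecs.BOM_UTF8 + line.encode("utf-8")

# A BOM-aware reader drops the mark and sees the original text.
aware = with_bom.decode("utf-8-sig")

# A plain UTF-8 reader keeps U+FEFF attached to the first cell.
naive = with_bom.decode("utf-8")

# A reader assuming Windows-1252 renders the three BOM bytes
# (EF BB BF) as three visible characters in front of the first cell.
legacy = with_bom.decode("cp1252")
```

          In the last two cases the first header no longer compares equal to "Name", which is the kind of breakage a non-Excel consumer could hit.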

          Jaap Jansma added a comment -

          Evi,

          You are right. The BOM is not required and not needed for UTF-8. I always thought it was needed, but it is needed for UCS-2 (or UTF-16). And I think Excel interprets the UTF-8 files as UTF-16. At least that is what I came across the other day: an export from Excel to CSV is, or was, UTF-16 by default.

          Anyway, I think we should discard this solution and think of a better one. The question is then: is UTF-8 good enough for CSV exports? Or should we make the CSV exports UTF-16 (so that Excel always opens the file correctly)? I cannot answer the question; from a user perspective, opening in Excel is probably what most users do.
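
          For comparison, a UTF-16 export might look like the sketch below. It is an assumption, not something tested in this issue, that Excel opens such a file correctly; the tab delimiter and the function name `rows_to_utf16_tsv` are likewise hypothetical choices made for the illustration.

```python
import codecs
import csv
import io


def rows_to_utf16_tsv(rows):
    """Serialise rows as tab-delimited UTF-16LE bytes with a BOM.

    Hypothetical helper; delimiter and byte order are assumptions.
    """
    buffer = io.StringIO(newline="")
    csv.writer(buffer, delimiter="\t").writerows(rows)
    # Unlike UTF-8, UTF-16 uses the BOM to signal byte order (FF FE = LE).
    return codecs.BOM_UTF16_LE + buffer.getvalue().encode("utf-16-le")


blob = rows_to_utf16_tsv([["Name", "City"], ["Zoë", "Liège"]])
```

          The trade-off is that the file roughly doubles in size for mostly-ASCII data and is no longer readable by tools that expect UTF-8.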

          Jaap Jansma added a comment -

          I think Matthieu is probably discussing a solution for the problem. See the forum: http://forum.civicrm.org/index.php/topic,32954.0/topicseen.html

          David Greenberg added a comment -

          Jaap - I'm pushing this out a bit to 4.6 to give us time to figure out a complete solution. If you / Mathieu come up with a patch that everyone is comfortable with quite soon, then we can include it in 4.5.

          Jaap Jansma added a comment -

          The solution is in an extension: https://civicrm.org/extensions/export-native-excel which exports directly to Excel files (xlsx), which is even more useful. So I think this issue could be closed in favor of the extension.

            People

            • Assignee:
              Jaap Jansma
              Reporter:
              Jaap Jansma
