[CRM-93] string truncation for display happening mid-character - CiviCRM Issue Tracker

Details

Type: Bug
Status: Done/Fixed
Priority: Minor
Resolution: Fixed/Completed
Affects Version/s: None
Fix Version/s: 1.0
Component/s: Internationalisation
Labels:
None

Description

In various places CiviCRM truncates strings for display purposes (for example, street and email addresses in search results). The truncation seems to be byte-based; i.e., the string is chopped after X bytes, not after X characters.

This didn't matter when the data contained only ASCII characters, as one character always equaled one byte. Now that we're supporting UTF-8-encoded non-ASCII characters, we should make sure strings are truncated on character boundaries.

For example, the string Łódź consists of four characters encoded in seven bytes (the letters 'Ł', 'ó' and 'ź' are two bytes each in UTF-8). When truncating, the cut must happen after the second, fourth, or fifth byte.
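The byte arithmetic in this example can be checked with a short sketch. It is written in Python purely for illustration (CiviCRM itself is PHP), and it assumes the sample string is Łódź, which matches the four characters and seven bytes described above. The helper name `cuts_cleanly` is hypothetical.

```python
# Assumption: the sample string is "Łódź" (4 characters, 7 bytes in UTF-8).
text = "Łódź"
encoded = text.encode("utf-8")

# Collect the byte offsets at which a cut leaves only whole characters.
boundaries = []
offset = 0
for ch in text:
    offset += len(ch.encode("utf-8"))  # 2 bytes for Ł, ó, ź; 1 byte for d
    boundaries.append(offset)

print(len(text), len(encoded))  # 4 7
print(boundaries)               # [2, 4, 5, 7]


def cuts_cleanly(data: bytes, n: int) -> bool:
    """Return True if the first n bytes of data decode as valid UTF-8."""
    try:
        data[:n].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A cut after byte 1, 3, or 6 lands inside a two-byte character and leaves an invalid sequence at the end of the displayed string; offset 7 is simply the untruncated string.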

This issue can be resolved in two ways: either by changing all of the string-manipulating functions to their multibyte counterparts, or by setting a PHP variable that would overload the "old" function calls with the mb_* function calls. I'll run some tests and try to anticipate the ramifications of either method.
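Whichever method is chosen, the intended change in behaviour is the same: count characters instead of bytes. The sketch below shows that difference in Python rather than PHP, so it is not the CiviCRM code; in PHP the analogous pair would be `substr` and `mb_substr`. All three function names are hypothetical, and the sample string Łódź is an assumption.

```python
def truncate_bytes(text: str, limit: int) -> bytes:
    """Byte-based truncation: may split a multibyte character."""
    return text.encode("utf-8")[:limit]


def truncate_chars(text: str, limit: int) -> str:
    """Character-based truncation: str slicing counts code points."""
    return text[:limit]


def truncate_to_byte_budget(text: str, limit: int) -> str:
    """Keep at most `limit` bytes, dropping a trailing partial character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


sample = "Łódź"  # assumed sample string
print(truncate_bytes(sample, 3))           # b'\xc5\x81\xc3' (ends mid-character)
print(truncate_chars(sample, 2))           # Łó
print(truncate_to_byte_budget(sample, 3))  # Ł
```

The third helper covers the case where the limit really is a byte budget (for instance a fixed-width storage field) but the result must still be valid UTF-8.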

People

Assignee:

Piotr Szotkowski

Reporter:

Piotr Szotkowski

Dates

Created:

2005-05-23 12:57

Updated:

2008-12-08 12:42

Resolved:

2005-05-30 10:34