In various places CiviCRM truncates strings for display purposes (for example, street and email addresses in search results). The truncation seems to be byte-based; i.e., the string is chopped after X bytes, not after X characters.
This didn't matter when the data contained only ASCII characters, as one character always equaled one byte. Now that we support UTF-8-encoded non-ASCII characters, we should make sure strings are truncated on character boundaries.
For example, the string Łódź consists of four characters encoded in seven bytes (the letters 'Ł', 'ó' and 'ź' take two bytes each in UTF-8). When truncating, the cut must happen after the second, fourth or fifth byte.
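The difference between byte-based and character-based truncation can be seen in a short PHP sketch. It uses the city name Łódź as sample input and assumes the mbstring extension is loaded and the source file is saved as UTF-8; the variable names are made up for illustration and do not come from the CiviCRM code base.

```php
<?php
// Sample input: four characters, seven bytes in UTF-8.
// Assumes this file is saved as UTF-8 and ext/mbstring is available.
$city = "Łódź";

$byteLength = strlen($city);              // counts bytes
$charLength = mb_strlen($city, 'UTF-8');  // counts characters

// Byte-based cut after three bytes: lands in the middle of the two-byte 'ó'.
$broken = substr($city, 0, 3);
$brokenIsValid = mb_check_encoding($broken, 'UTF-8');

// Character-based cut after two characters: always on a character boundary.
$safe = mb_substr($city, 0, 2, 'UTF-8');

echo "bytes: $byteLength, characters: $charLength\n";
echo "byte cut is valid UTF-8: " . ($brokenIsValid ? 'yes' : 'no') . "\n";
echo "character cut: $safe (" . strlen($safe) . " bytes)\n";
```

Here the byte-based cut leaves a dangling lead byte, which is what typically shows up as a broken character in the search results, while the character-based cut ends after the fourth byte.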
This issue can be resolved in two ways: either by changing all of the string-manipulating function calls to their multibyte counterparts, or by setting a PHP variable that would overload the 'old' function calls with the mb_* function calls. I'll run some tests and try to anticipate the ramifications of either method.
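As a rough sketch of the first approach, display truncation could be routed through one multibyte-aware helper. The function name truncateForDisplay, its parameters and the sample address are hypothetical and not taken from CiviCRM; the sketch assumes PHP 7 or later with the mbstring extension and UTF-8 input. The second approach presumably refers to the mbstring.func_overload ini setting, which was deprecated in PHP 7.2 and removed in PHP 8.0.

```php
<?php
// Hypothetical helper (not part of CiviCRM): truncate a UTF-8 string to at
// most $maxChars characters, counting characters rather than bytes.
function truncateForDisplay(string $text, int $maxChars, string $suffix = '...'): string
{
    // Short enough already: return unchanged.
    if (mb_strlen($text, 'UTF-8') <= $maxChars) {
        return $text;
    }

    // No room for the suffix: return a plain character-based cut.
    $suffixLength = mb_strlen($suffix, 'UTF-8');
    if ($maxChars <= $suffixLength) {
        return mb_substr($text, 0, max(0, $maxChars), 'UTF-8');
    }

    // Keep as many characters as fit, then append the suffix.
    return mb_substr($text, 0, $maxChars - $suffixLength, 'UTF-8') . $suffix;
}

// Made-up sample address, for illustration only.
echo truncateForDisplay("Łódź, ul. Piotrkowska 104", 10), "\n";
```

Keeping the logic in one place would mean only call sites need to change, at the cost of having to find every byte-based call that feeds display output.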