garbled characters in templates

Posted: Tue Nov 21, 2006 7:26 pm
by deminy
This bug has existed for more than a year and has never been fixed.

I didn't report this bug before because it exists only in templates, not in the main source code. It also exists in many different themes, so fixing it might be a "huge" amount of work for developers/contributors.

In the last several months, some Chinese users asked me how to fix it, and I replied to them in my guestbook. After reading Garvin's post "Serendipity 1.1 release cycle", I finally decided to report this bug.

Well, the bug is simple:

In the template file "templates/..../entries.tpl", the following line generates garbled characters (for East Asian languages) on web pages:

Code:

alert('{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:htmlall}');
To fix it, just change it to:

Code:

alert('{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:html}');
The difference between the two outputs is shown below:

Image

Image

Posted: Tue Nov 21, 2006 7:41 pm
by carl_galloway
I will happily update all of my templates to include your solution if one of the developers can confirm this won't cause problems elsewhere. Thanks for letting us know this is a problem for East Asian languages.

Posted: Tue Nov 21, 2006 10:36 pm
by judebert
I can't claim it won't cause *any* problems, but it looks pretty good. The only difference is whether ALL HTML entities get escaped (htmlall), or only the characters & " ' < > (html).

Looks like one of the other HTML entities (like what?) is getting used as part of a Chinese UTF-8 character that Smarty doesn't recognize.

deminy, could you try instead:

Code:

alert('{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:htmlall:UTF-8}');
Of course, substitute the encoding your blog actually uses; you can find a list of supported encodings at http://us3.php.net/htmlentities if you're interested.

And naturally, if someone can actually find how to substitute the blog encoding there, it would just foolproof the whole thing.
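
To illustrate why the charset argument matters, here is a minimal Python sketch of what is probably going on. It assumes that "htmlall" maps to PHP's htmlentities and that, when no charset is passed, the bytes are interpreted as ISO-8859-1. The function `escape_htmlall` is a hypothetical stand-in written for this illustration, not Smarty's or PHP's actual code.

```python
from html.entities import codepoint2name


def escape_htmlall(raw: bytes, charset: str = "iso-8859-1") -> bytes:
    """Rough model of htmlentities(): decode the bytes with `charset`, then
    replace every character that has a named HTML entity."""
    pieces = []
    for ch in raw.decode(charset):
        if ch == "'":
            pieces.append("&#039;")  # single quote, as with ENT_QUOTES
        elif ord(ch) in codepoint2name:
            pieces.append("&" + codepoint2name[ord(ch)] + ";")
        else:
            pieces.append(ch)
    return "".join(pieces).encode(charset)


# One Chinese character, stored as three UTF-8 bytes: E4 B8 AD.
utf8_bytes = "中".encode("utf-8")

# Wrong charset assumed: each byte is treated as a separate Latin-1 character.
garbled = escape_htmlall(utf8_bytes)

# Correct charset passed: the character has no named entity and is left alone.
intact = escape_htmlall(utf8_bytes, "utf-8")

print(garbled)
print(intact.decode("utf-8"))
```

Under these assumptions the three bytes of the character are turned into three unrelated entities (&auml;&cedil;&shy;), which is the kind of garbling reported above, while passing the real charset leaves the character untouched.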

Posted: Tue Nov 21, 2006 11:05 pm
by deminy
Hi, Garvin,

Your suggestion is great. The following Smarty variable/modifier will print proper (Chinese) characters on the web page:
{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"htmlall":"UTF-8"}
But in this template file (entries.tpl), the "escape" Smarty modifier is used inside an "a href" HTML tag, so we cannot use quotation marks there again. Without the quotation marks, the "escape" modifier doesn't work properly (it triggers a fatal PHP error).

Posted: Wed Nov 22, 2006 10:35 am
by garvinhicking
Hi Guys!

(Deminy, note that the UTF-8 hint was from the awesome Judebert, not me *g*)

I'm a bit confused as to which solution is better. I've never used the second "UTF-8" parameter to escaping. Might that cause trouble with "native" charsets, when people use ISO or even koi8r or other charsets?

So currently I'm thinking that replacing 'htmlall' with 'html' could do the job better for all users? As long as it escapes quotation marks, we should be fine.

Deminy, I sadly don't understand what you mean by this:
But in this template file (entries.tpl), the "escape" Smarty modifier is used inside an "a href" HTML tag, so we cannot use quotation marks there again. Without the quotation marks, the "escape" modifier doesn't work properly (it triggers a fatal PHP error).
The goal is that the JS call will not use ", but &quot; instead...?

Best regards,
Garvin

Posted: Wed Nov 22, 2006 6:39 pm
by deminy
I think there are several solutions for this problem, but none of them can completely solve the problem.

1. For multi-language support, officially use UTF-8 format only: All language packages should be written in UTF-8 format. As we can see, most PHP functions could handle UTF-8 strings correctly.

People can also use other character sets (without too many problems), but this is not officially recommended.

2. When using function "htmlentities" to translate a string, convert the string to UTF-8 first, and then call function "htmlentities" to translate the UTF-8 string.

The problem with this solution is that we might not be able to translate the UTF-8 string back to the original character encoding.

A similar idea was mentioned by Cameron on php.net. He provided a solution showing how to simulate function "htmlentities" for multi-byte strings without causing any problems.


3. Use

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"html"}

instead of

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"htmlall"}

(without specifying the 2nd parameter for "escape")

Here, parameter "html" is better than parameter "htmlall" because "html" uses function htmlspecialchars, while "htmlall" uses function htmlentities. As far as I can see, function "htmlspecialchars" is much safer when handling multi-byte strings.

This solution is not perfect, but it might be the best solution currently available.

4. A more proper way to use the modifier "escape" would be

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"html":$_charset}

rather than

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"html"}

However, passing the charset could cause some serious problems, since some character sets are not supported by the modifier "escape" (which uses function "htmlspecialchars" here).
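
The claim in option 3 that "html" is safer for multi-byte strings can be checked with a small Python sketch. It assumes the escape works byte by byte on the five ASCII characters & " ' < > without knowing the charset, and compares that with escaping the decoded text. The helper names and the sample string are made up for this illustration.

```python
# The five characters the "html" escape type touches; "&" must come first.
SPECIALS = (("&", "&amp;"), ('"', "&quot;"), ("'", "&#039;"),
            ("<", "&lt;"), (">", "&gt;"))


def escape_bytes(raw: bytes) -> bytes:
    """Charset-unaware escape: replaces the ASCII bytes of the specials only."""
    for ch, ent in SPECIALS:
        raw = raw.replace(ch.encode("ascii"), ent.encode("ascii"))
    return raw


def escape_text(text: str) -> str:
    """Charset-aware reference: escapes the already decoded characters."""
    for ch, ent in SPECIALS:
        text = text.replace(ch, ent)
    return text


def byte_escape_is_safe(text: str, charset: str) -> bool:
    """True if escaping the raw bytes gives the same result as escaping the text."""
    return escape_bytes(text.encode(charset)).decode(charset) == escape_text(text)


sample = '<引用通告 & "test">'
charsets = ["utf-8", "big5", "gb2312", "shift_jis", "euc_jp"]
results = {cs: byte_escape_is_safe(sample, cs) for cs in charsets}
print(results)
```

For this sample the byte-wise escape matches the charset-aware one in every encoding tried, because none of the multi-byte sequences contain the bytes of the five special characters. This is only a spot check on one string, not a proof for every charset.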

Posted: Wed Nov 22, 2006 6:42 pm
by garvinhicking
Hi!
1. For multi-language support, officially use UTF-8 format only: All language packages should be written in UTF-8 format. As we can see, most PHP functions could handle UTF-8 strings correctly.
This is not an option for us. We definitely want to preserve the ISO functionality, because existing blogs are not easily updatable to UTF-8.
2. When using function "htmlentities" to translate a string, convert the string to UTF-8 first, and then call function "htmlentities" to translate the UTF-8 string.

The problem with this solution is that we might not be able to translate the UTF-8 string back to the original character encoding.
Exactly, once the string is UTF-8 it would be hard to reconvert it. Plus, the performance impact would be huge!
3. Use

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"html"}

instead of

{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:"htmlall"}

(without specifying the 2nd parameter for "escape")

Here, parameter "html" is better than parameter "htmlall" because "html" uses function htmlspecialchars, while "htmlall" uses function htmlentities. As far as I can see, function "htmlspecialchars" is much safer when handling multi-byte strings.

This solution is not perfect, but it might be the best solution currently available.
Okay, I think I now understand. I'm also much in favor of this option.

If no one disagrees, I'd patch all internal templates to use the 'html' modifier then?

Regards,
Garvin

Posted: Wed Nov 22, 2006 6:52 pm
by carl_galloway
So should all templates now start to use

Code:

alert('{$CONST.TRACKBACK_SPECIFIC_ON_CLICK|@escape:html}');
If so, Garvin, do you want template designers to update their zipfiles?

Posted: Wed Nov 22, 2006 7:22 pm
by garvinhicking
Hi Carl!

That's okay, I will just search+replace all occurrences in both the bundled and all Spartacus files. Only where people like you also offer a download on their own servers would it be nice to get those in sync :)
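
For designers who host their own template files, a search+replace like this could be scripted. The following Python sketch is only one possible way to do it and is not the change that was actually committed. The names `patch_bytes` and `patch_templates` are hypothetical, and it assumes the templates use the .tpl extension. It works on raw bytes so that files in any charset pass through unchanged apart from the modifier name.

```python
import re
from pathlib import Path

# Matches "|escape:htmlall" and "|@escape:htmlall", with or without a quote
# before the type name. Only the word "htmlall" is replaced.
PATTERN = re.compile(rb"""(\|@?escape:["']?)htmlall\b""")


def patch_bytes(data: bytes) -> bytes:
    """Return the template source with escape:htmlall changed to escape:html."""
    return PATTERN.sub(rb"\g<1>html", data)


def patch_templates(root: str) -> list:
    """Rewrite every *.tpl file under `root` in place; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*.tpl")):
        original = path.read_bytes()
        patched = patch_bytes(original)
        if patched != original:
            path.write_bytes(patched)
            changed.append(path)
    return changed
```

Calling `patch_templates("templates")` would rewrite the files in place, so it is best run on a copy or under version control first.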

Regards,
Garvin

Posted: Wed Nov 22, 2006 11:34 pm
by carl_galloway
OK, I'll go and update my template zipfiles over the next few days. I wonder if it would be worthwhile making an announcement in the themes forum in case other designers aren't following this thread?

Carl

Posted: Thu Nov 23, 2006 12:27 pm
by garvinhicking
Hi Carl!

Just committed the changes. If you want to make an announcement, please go ahead!

Best regards,
Garvin