Diacritical marks in RSS break?
Posted: Thu Nov 10, 2005 1:48 pm
by zoran
Hi all,
I have a name with diacritical marks: Zoran Kovačević
In the RSS it shows up as:
Code:
<author>nospam@example.com (Zoran Kova&# 269;evi&# 263;)</author>
(spaces added to prevent rendering in HTML)
Any idea whether this is correct RSS and my clients (RSSOwl, Thunderbird) are buggy? Or is it the UTF-8 locale of the feed? Or ...?
Regards,
Zoran
Re: Diacritical marks in RSS break?
Posted: Thu Nov 10, 2005 2:01 pm
by garvinhicking
Actually, this is a bit of a harder problem. I'll try to describe it.
The special characters you use do not exist in the ISO-8859-1 character set. Thus, when you entered them, your browser translated them to numeric HTML entities (&#...).
The problem now is that the XML feeds run htmlspecialchars() over most fields, such as author names. That means HTML is not expected in author names, so &# gets double-encoded to &amp;#, which of course renders wrong in all browsers.
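To make the double encoding concrete, here is a minimal sketch in Python. It is not the blog software's actual code: html.escape() is used as a rough stand-in for PHP's htmlspecialchars(), and the variable names are made up for illustration.

```python
from html import escape, unescape

name = "Zoran Kovačević"

# Step 1: emulate a form submitted in ISO-8859-1. Characters that the
# charset cannot represent are replaced by numeric character references.
stored = name.encode("iso-8859-1", errors="xmlcharrefreplace").decode("iso-8859-1")
print(stored)   # Zoran Kova&#269;evi&#263;

# Step 2: emulate the feed generator escaping the stored value.
# The "&" of every reference is itself escaped to "&amp;".
in_feed = escape(stored, quote=True)
print(in_feed)  # Zoran Kova&amp;#269;evi&amp;#263;

# Step 3: a feed reader undoes one level of escaping and is left with
# the literal text "&#269;" instead of the character it stood for.
shown = unescape(in_feed)
print(shown)    # Zoran Kova&#269;evi&#263;
```

The reader only decodes once, so the reference survives as visible text, which matches what shows up in the author element.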
Now there are two ways to get this fixed. One way is to allow HTML entities in Author names, but that can then easily break XML feeds if people use bad characters.
The other solution, and this is the proper one, is for you to convert your blog to the UTF-8 character set. Then you can use your special characters natively, and they will be put correctly into the feed. However, converting an existing blog to UTF-8 is a bit hard, because you need to transcode all your existing entries and change all special characters from their &# code to the real character...
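The transcoding step could look roughly like the following Python sketch. It assumes the old entries are stored as ISO-8859-1 bytes and only contain decimal references such as &#269;. The function names refs_to_chars and transcode_entry are hypothetical, and how you read and write the entries in your database is left out.

```python
import re

# Decimal numeric character references, e.g. "&#269;".
NUMERIC_REF = re.compile(r"&#(\d+);")

# References to markup-significant characters (" & ' < >) are kept,
# because turning them into real characters could change the HTML.
KEEP = {34, 38, 39, 60, 62}

def refs_to_chars(text: str) -> str:
    """Replace decimal character references with the real characters."""
    def replace(match: re.Match) -> str:
        code = int(match.group(1))
        if code in KEEP or code > 0x10FFFF:
            return match.group(0)  # leave the reference untouched
        return chr(code)
    return NUMERIC_REF.sub(replace, text)

def transcode_entry(raw: bytes) -> bytes:
    """Decode an ISO-8859-1 entry, resolve references, re-encode as UTF-8."""
    text = raw.decode("iso-8859-1")
    return refs_to_chars(text).encode("utf-8")

print(refs_to_chars("Zoran Kova&#269;evi&#263;"))  # Zoran Kovačević
```

Try it on a copy of the data first. Hexadecimal references (&#x...;) and named entities are not handled by this sketch.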
Best regards,
Garvin
Re: Diacritical marks in RSS break?
Posted: Thu Nov 10, 2005 11:22 pm
by zoran
I changed my blog to UTF-8. Changed my name in the personal settings the way you described (and one or two blog entries). Works perfectly.
Thanks. Again!
Zz.
Re: Diacritical marks in RSS break?
Posted: Fri Nov 11, 2005 2:22 am
by garvinhicking
That's great to hear! I guess it was a bit of work to convert everything to UTF-8?
Regards,
Garvin
Re: Diacritical marks in RSS break?
Posted: Fri Nov 11, 2005 11:07 am
by zoran
garvinhicking wrote: That's great to hear! I guess it was a bit of work to convert everything to UTF-8?
No, not too much work. My entries are mostly English and Dutch, and I usually don't bother with the diacriticals, except for my name.
