Diacritical marks in RSS break?

zoran
Regular
Posts: 71
Joined: Sun Jan 16, 2005 9:13 pm
Location: Amsterdam

Diacritical marks in RSS break?

Post by zoran »

Hi all,

I have a name with diacritical marks: Zoran Kovačević
In the RSS it shows up as:

Code:

<author>nospam@example.com (Zoran Kova&# 269;evi&# 263;)</author>
(spaces added to prevent rendering in HTML)

Any idea whether this is correct RSS and my clients (RSSOwl, Thunderbird) are buggy? Or is it the UTF-8 locale of the feed? Or ...?

Regards,
Zoran
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Re: Diacritical marks in RSS break?

Post by garvinhicking »

Actually, this is a somewhat tricky problem. I'll try to explain it.

The special characters you use are not part of the ISO-8859-1 character set. So when you entered them, your browser translated them into numeric HTML entities (&#...).
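To illustrate what the browser did, here is a small Python sketch. It is an analogy, not what any browser literally runs: Python's `xmlcharrefreplace` error handler performs the same substitution when a string is encoded to ISO-8859-1, turning every character that charset cannot represent into a decimal &#...; reference. The helper name `to_latin1_with_refs` is hypothetical.

```python
def to_latin1_with_refs(text: str) -> bytes:
    """Hypothetical helper: encode text as ISO-8859-1, replacing characters
    outside that charset with decimal character references such as &#269;."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

name = "Zoran Kovačević"
print(to_latin1_with_refs(name).decode("iso-8859-1"))
# prints: Zoran Kova&#269;evi&#263;
```

Characters that do exist in ISO-8859-1 (such as é) are stored as plain bytes; only č (U+010D) and ć (U+0107) fall back to references.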

The problem now is that the XML feeds run htmlspecialchars() on author names and similar fields. HTML is not meant to be allowed there, so the & of every &# entity gets double-encoded to &amp;#, which of course renders wrong in all browsers.
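The double encoding can be reproduced in Python, using `html.escape()` as a stand-in for PHP's `htmlspecialchars()`. Both escape `&`, `<`, `>` and quotes, though their exact quote handling differs slightly, so treat this as an approximation of what the feed code does. A feed reader decodes one level of escaping and is left with the literal entity text:

```python
import html

stored = "Zoran Kova&#269;evi&#263;"   # what the ISO-8859-1 blog saved
in_feed = html.escape(stored)         # the '&' of each entity is escaped again
print(in_feed)                        # Zoran Kova&amp;#269;evi&amp;#263;
print(html.unescape(in_feed))         # Zoran Kova&#269;evi&#263; (shown literally)

# With a UTF-8 blog the real characters are stored, and escaping leaves them alone.
native = "Zoran Kovačević"
print(html.escape(native))            # Zoran Kovačević
```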

There are two ways to fix this. One is to allow HTML entities in author names, but that can easily break XML feeds if people enter invalid characters.
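A quick way to see why passing entities through unescaped is fragile: XML predefines only five named entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) plus numeric references, so an HTML-only name like `&eacute;` or a bare `&` makes the whole feed unparseable. The sketch below uses Python's standard `xml.etree.ElementTree` parser to check a few made-up `<author>` values; the helper `author_parses` is hypothetical.

```python
import xml.etree.ElementTree as ET

def author_parses(raw: str) -> bool:
    """Return True if raw, inserted unescaped into <author>, is well-formed XML."""
    try:
        ET.fromstring(f"<author>{raw}</author>")
        return True
    except ET.ParseError:
        return False

for raw in ["Kova&#269;evi&#263;",   # numeric reference: fine in XML
            "Caf&eacute;",           # HTML-only named entity: undefined in XML
            "Tom & Jerry"]:          # bare ampersand: not well-formed
    print(f"{raw!r}: {author_parses(raw)}")
```

One bad author name is enough to make a strict reader reject the entire feed, which is why escaping everything is the safer default.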

The other, proper solution is to convert your blog to the UTF-8 character set. Then you can use your special characters natively, and they will be written correctly into the feed. However, converting an existing blog to UTF-8 is a bit of work, because you need to transcode all your existing entries and replace every special character's &# code with the real character...
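The transcoding step might be sketched roughly as follows. This is a hypothetical helper, not Serendipity's own conversion routine: it assumes each stored entry is ISO-8859-1 bytes, decodes it, replaces numeric references for non-ASCII characters with the real characters, and re-encodes the result as UTF-8. References below 128 (for example `&#38;` or `&#60;`) are deliberately left alone, because turning them into a raw `&` or `<` could break the HTML markup of the entry.

```python
import re

# Matches decimal (&#269;) and hexadecimal (&#x10D;) character references.
_CHAR_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def entities_to_utf8(latin1_bytes: bytes) -> bytes:
    """Hypothetical helper: convert an ISO-8859-1 entry to UTF-8 with real characters."""
    text = latin1_bytes.decode("iso-8859-1")

    def replace(m):
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        # Keep ASCII references (e.g. &#38; for '&'), surrogates and
        # out-of-range values exactly as they were written.
        if code < 128 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return m.group(0)
        return chr(code)

    return _CHAR_REF.sub(replace, text).encode("utf-8")

entry = b"Caf\xe9 by Kova&#269;evi&#263; &#38; friends"
print(entities_to_utf8(entry).decode("utf-8"))
# Café by Kovačević &#38; friends
```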

Best regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
zoran

Re: Diacritical marks in RSS break?

Post by zoran »

I converted my blog to UTF-8 and changed my name in the personal settings the way you described (plus one or two blog entries ;)). Works perfectly.

Thanks. Again!

Zz.
garvinhicking

Re: Diacritical marks in RSS break?

Post by garvinhicking »

That's great to hear! I guess it was a bit of work to convert everything to UTF-8?

Regards,
Garvin
zoran

Re: Diacritical marks in RSS break?

Post by zoran »

garvinhicking wrote: That's great to hear! I guess it was a bit of work to convert everything to UTF-8?
No, not too much work. My entries are mostly English and Dutch, and I usually don't bother with diacritics, except for my name ;)