BUG: XHTML Phrase Elements in entry title mis-encoded

marktaff
Posts: 3
Joined: Sat Jul 08, 2006 11:53 pm
Location: Bellevue, WA

Post by marktaff

In version 1.0, XHTML phrase elements such as <code>, <cite>, <abbr>, etc., are mis-encoded before rendering, e.g.:

Code:

s/<code>/&lt;code&gt;/
The offending code:

Code:

            $entry['title']     = htmlspecialchars($entry['title']);
is in include/function_entries.inc.php on approx line 927 (my working file isn't clean anymore).

Calling htmlspecialchars() escapes the tags, so the XHTML is displayed as literal text instead of being rendered as XHTML.
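To make the report concrete, here is a minimal reproduction in plain PHP (no Serendipity code involved; the sample title is just the rsync example from later in this thread). htmlspecialchars() escapes every "<", ">" and "&", so the phrasal markup comes out as visible text:

```php
<?php
// Reproduces the reported behaviour: htmlspecialchars() converts the
// angle brackets of <code> into entities, so the browser shows the tag
// as text instead of rendering it.
$title = 'Howto use <code>rsync</code> to Backup your Data';
$escaped = htmlspecialchars($title);

echo $escaped, "\n";
// Howto use &lt;code&gt;rsync&lt;/code&gt; to Backup your Data
```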

I am sure people would want some characters translated into *entities*, such as "<" and "&", but I can't think of any reason to translate an XHTML tag. So we probably need code that replaces "<", ">", "&", plus other common ones like (c) and (tm), but leaves XHTML elements, especially phrasal elements, alone.

Probably the best approach is to use preg_match() to match those parts of the string that don't match a tag pattern, then pass those parts through htmlentities().
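A rough sketch of that approach, assuming a hand-picked whitelist of phrasal tags (with an optional simple title attribute, which <abbr> usually needs). It swaps in preg_split() with PREG_SPLIT_DELIM_CAPTURE for preg_match(), since preg_match() stops at the first match, and htmlspecialchars() for htmlentities(), so UTF-8 characters pass through unchanged. The function name escapeTitleKeepingPhrasal() and the tag list are hypothetical, not part of Serendipity:

```php
<?php
// Hypothetical whitelist of inline (phrasal) elements allowed in titles.
const PHRASAL_TAGS = ['abbr', 'acronym', 'cite', 'code', 'dfn', 'em', 'kbd', 'q', 'samp', 'strong', 'var'];

// Escapes a title for XHTML output while keeping whitelisted phrasal tags.
function escapeTitleKeepingPhrasal(string $title): string
{
    $names = implode('|', PHRASAL_TAGS);
    // One capturing group: an opening tag (optionally with a simple title="..."
    // attribute) or a closing tag. Lowercase only, as XHTML requires.
    $tagPattern = '~(<(?:' . $names . ')(?:\s+title="[^"<>&]*")?>|</(?:' . $names . ')>)~';

    // With PREG_SPLIT_DELIM_CAPTURE the result alternates text/tag/text/...,
    // so even indexes are plain text and odd indexes are matched tags.
    $parts = preg_split($tagPattern, $title, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        $out .= ($i % 2 === 1)
            ? $part                                                // keep the whitelisted tag
            : htmlspecialchars($part, ENT_QUOTES, 'UTF-8', false); // escape text, keep existing entities
    }
    return $out;
}

echo escapeTitleKeepingPhrasal('Taff Reviews <cite>Lord of the Rings</cite> & <b>more</b>'), "\n";
// Taff Reviews <cite>Lord of the Rings</cite> &amp; &lt;b&gt;more&lt;/b&gt;
```

Passing false as the fourth argument (double_encode) leaves entities the author already typed alone. Note that the sketch does not check that tags are balanced, so a lone "<cite>" would still slip through.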

Regards,

Mark
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Post by garvinhicking

Hi!

HTML code in entry titles is not supported by Serendipity. This is intentional, to prevent RSS feed errors or strange rendering of titles.

Suddenly allowing HTML in titles could introduce all sorts of problems, including a BC (backward compatibility) break.

Best regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
marktaff
Posts: 3
Joined: Sat Jul 08, 2006 11:53 pm
Location: Bellevue, WA

Post by marktaff

Thanks for the reply, but I'm not buying. ;-)

1. English grammar requires a title within a title to be emphasized; in XHTML we use <cite> for this, e.g. "Taff Reviews <cite>Lord of the Rings</cite>".

2. Accessibility standards require that the first use of an acronym or abbreviation be tagged with the appropriate <abbr> or <acronym> tag, and that first use will often be in the title. Even the XHTML standard recommends this.

3. You may also need code markup in the title to set off Linux commands, e.g. "Howto use <code>rsync</code> to Backup your Data".

BC is no excuse to perpetuate a design flaw. Worst case, it requires a config option to "Allow phrasal XHTML elements in entry titles".

There is no reason you can't have two versions of the title string:

Code:

$xhtmlTitle   // title with phrasal xhtml elements
$textTitle    // title with unescaped* xhtml tags stripped
You could use the text version for feeds, and the xhtml version for display on the web page.

* I say unescaped because you need a mechanism to allow a title that contains a literal XHTML or XML tag, e.g. "Proper Usage of the <abbr> element in XHTML 1.1". Conventional backslash escaping, such as "\<abbr\>", should suffice.
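One possible way to derive the plain-text version from the XHTML one, assuming the XHTML title has already been filtered. buildTitles() is a hypothetical helper, and it ignores the backslash escapes from the footnote, which would need to be resolved first:

```php
<?php
// Hypothetical helper: derive both titles from one (already filtered) XHTML title.
function buildTitles(string $xhtmlTitle): array
{
    // Plain-text version for feeds, RDF, <title> and links:
    // drop the tags, then decode entities such as &amp; back to characters.
    $textTitle = html_entity_decode(strip_tags($xhtmlTitle), ENT_QUOTES, 'UTF-8');

    return ['xhtml' => $xhtmlTitle, 'text' => $textTitle];
}

$titles = buildTitles('Taff Reviews <cite>Lord of the Rings</cite> &amp; More');
echo $titles['text'], "\n"; // Taff Reviews Lord of the Rings & More
```

The text version still has to be escaped again (e.g. with htmlspecialchars()) wherever it is written into XML or an HTML attribute.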

As for strange rendering of titles, phrasal elements are inline and shouldn't cause any damage. If someone starts using block-level elements, that could certainly cause issues, but this is at best a weak reason not to allow phrasal elements, since we can filter out non-phrasal elements.
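Filtering out non-phrasal elements can be sketched with PHP's built-in strip_tags(), whose second argument lists the tags to keep. The allow-list below is an assumption, not an agreed set. One caveat: strip_tags() keeps attributes on allowed tags (onclick included), so on its own it is not a complete sanitizer:

```php
<?php
// Hypothetical allow-list of phrasal (inline) elements permitted in titles.
const TITLE_ALLOWED_TAGS = '<abbr><acronym><cite><code><dfn><em><kbd><q><samp><strong><var>';

// Removes every tag that is not on the allow-list (e.g. <div>, <p>, <h3>),
// keeping the text inside it. Attributes on allowed tags are NOT removed.
function filterTitleTags(string $title): string
{
    return strip_tags($title, TITLE_ALLOWED_TAGS);
}

echo filterTitleTags('Taff <div>Reviews</div> <cite>Lord of the Rings</cite>'), "\n";
// Taff Reviews <cite>Lord of the Rings</cite>
```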

I hope you can see that Serendipity's current handling of titles is broken, as noted above.

Obviously I shall have to fix this in my copy. Are you still set against fixing the SVN version? I was hoping s9y would be a standards-compliant blog, as I really don't want to be bothered with applying a multitude of personal patches each release, or with writing my own blog.

Hoping you reconsider,

Mark
carl_galloway
Regular
Posts: 1331
Joined: Sun Dec 04, 2005 5:43 pm
Location: Andalucia, Spain

Post by carl_galloway

Mark has a point about elements such as cite being inline. However, in many of my templates I have converted code elements to block level so that they stand out within entries, which means we would need to convert them back to inline just for the entry title. So if you are going to start making these changes, could I suggest you restrict the allowed HTML tags to just a handful and fully document which ones are available, so that template designers can update all old templates? This would be a lot of work, by the way, and might cause a lot of broken installations if users update Serendipity but fail to update their template.

Personally I'm in favour of introducing Mark's changes, but I would need good lead time to update my templates. I suspect a few of the other template designers wouldn't even know what we're talking about, so we might expect a lot of queries in the forums for a long time after the change has been made. Is there a better way of implementing this?
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Post by garvinhicking

Hi!
1. English grammar requires a title within a title to be emphasized; in XHTML we use <cite> for this, e.g. "Taff Reviews <cite>Lord of the Rings</cite>".
That might be true, but many templates use "<h3>" or other tags for titles, and HTML there is forbidden. Also in RSS readers, HTML tags in titles/subjects are not supported.

You did already patch up your own code to support that, right? Did you already check how things are working with that?

I have now committed a patch to Serendipity 1.1 (SVN trunk) that creates a new variable $entry['html_title'], which holds the un-htmlspecialchar'd title of an entry. You can use it via {$entry.html_title} in your custom template. Then you don't need to hack the s9y code anymore; you just need to create your custom template.
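In code terms this boils down to keeping a raw copy of the title before it is escaped. A minimal sketch of the idea, assuming the escaping happens in one place as in the line Mark quoted; prepareEntryTitles() is a hypothetical name, and this is not the actual committed patch:

```php
<?php
// Hypothetical sketch: keep an unescaped copy of the title before escaping it.
function prepareEntryTitles(array $entry): array
{
    // Raw title, only for templates that explicitly opt in via {$entry.html_title}.
    $entry['html_title'] = $entry['title'];
    // Escaped title, same as the existing default behaviour.
    $entry['title'] = htmlspecialchars($entry['title']);
    return $entry;
}

$entry = prepareEntryTitles(['title' => 'Taff Reviews <cite>Lord of the Rings</cite>']);
echo $entry['html_title'], "\n"; // Taff Reviews <cite>Lord of the Rings</cite>
echo $entry['title'], "\n";      // Taff Reviews &lt;cite&gt;Lord of the Rings&lt;/cite&gt;
```

A template would then print {$entry.html_title} where it wants the markup and keep {$entry.title} everywhere else. Since html_title is unfiltered, it is only safe for authors who trust their own titles.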

For the general user, I am still very convinced that allowing this functionality would create confusion and BC breaks, leaving people wondering. For specific users like you, I think the way above should be in order. Or maybe you have a better idea/option?

The "config option" might be workable, but because of the <h3> tag issues above, I think it should more be a template issue than a global config issue. I'd also like to keep configuration value cluttering as low as possible.

BC is very vital to Serendipity, so I hope you can understand my reasoning. But I still agree that users like you want an accommodation for the problem. So I hope we can figure out something that works for both of us. :)
* I say unescaped because you need a mechanism to allow a title that contains a literal XHTML or XML tag, e.g. "Proper Usage of the <abbr> element in XHTML 1.1". Conventional backslash escaping, such as "\<abbr\>", should suffice.
Are you a bit skilled with PHP, and could you flesh out a small algorithm for the replacement rules you'd like to have? If you have no PHP knowledge, maybe a verbose English description of what needs to be replaced, and how, could help me. I find \< escaping quite strange and have never seen it used anywhere for <> characters... I don't know if people would know that this was possible.

Best regards,
Garvin
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL

Post by judebert

Thanks, Garvin. I've been avoiding HTML in titles since I made my "<strike>Victim</strike>Donor" entry for my EV blog. At least now I can modify the template to include the strikeout.
Judebert
marktaff
Posts: 3
Joined: Sat Jul 08, 2006 11:53 pm
Location: Bellevue, WA

Post by marktaff

Phrasal elements are allowed *anywhere* text is valid. You can check this with the W3C validator by putting a <code> inside an <h3>.
Also in RSS readers, HTML tags in titles/subjects are not supported.
Yes, hence the need for a plain-text version of the title, used for links, RDF, and anywhere else that won't render XHTML, plus an XHTML "pretty title" for places that will render it, such as the title displayed on the page.
You did already patch up your own code to support that, right? Did you already check how things are working with that?
So far I've just commented out that offending call. I did get started coding a permanent solution, but work interrupted. I will supply a patch for review when it's done. Things seem to work OK, but the URLs for reading the story contain the tag name minus the <> characters; that will have to be repaired.
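For the URL problem, one option is to build the URL slug from the plain-text title, i.e. strip the tags before anything else. This is a hypothetical helper (makeTitleSlug() is not Serendipity's own function), and it simply drops non-ASCII characters, which a real implementation would want to transliterate instead:

```php
<?php
// Hypothetical helper: build a URL slug from an XHTML title.
function makeTitleSlug(string $xhtmlTitle): string
{
    $text = strip_tags($xhtmlTitle);                          // drop <code>, </code>, etc.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');   // &amp; -> &
    $text = strtolower($text);
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);         // any other run of chars -> one hyphen
    return trim($text, '-');
}

echo makeTitleSlug('Howto use <code>rsync</code> to Backup your Data'), "\n";
// howto-use-rsync-to-backup-your-data
```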
I have now committed a patch to Serendipity 1.1 (SVN trunk) that creates a new variable $entry['html_title'], which holds the un-htmlspecialchar'd title of an entry. You can use it via {$entry.html_title} in your custom template. Then you don't need to hack the s9y code anymore; you just need to create your custom template.
html_title will work for advanced users: we just have to take care of our own tags and entities. But I don't see any reason we can't allow XHTML phrasal elements to be rendered and entities to be substituted.
For the general user, I am still very convinced that allowing this functionality would create confusion and BC breaks, leaving people wondering. For specific users like you, I think the way above should be in order. Or maybe you have a better idea/option?

The "config option" might be workable, but because of the <h3> tag issues above, I think it should more be a template issue than a global config issue. I'd also like to keep configuration value cluttering as low as possible.
I'm all for BC, let's just not cripple the present and the future at the same time. We should be able to do both. ;-)
BC is very vital to Serendipity, so I hope you can understand my reasoning. But I still agree that users like you want an accommodation for the problem. So I hope we can figure out something that works for both of us. :)
Are you a bit skilled with PHP, and could you flesh out a small algorithm for the replacement rules you'd like to have? If you have no PHP knowledge, maybe a verbose English description of what needs to be replaced, and how, could help me. I find \< escaping quite strange and have never seen it used anywhere for <> characters... I don't know if people would know that this was possible.
\ is the standard Perl escape character, also used by many other systems, including PHP's preg_* functions. While you don't *need* to escape < and > in preg_* patterns, doing so will nonetheless work. Truthfully, the escape character could be anything, but I think we should stick with one that a) some people already know, and b) people learning it for the first time can apply to other things. We will have to tell users somewhere that if they want a title like:

"Proper use of the <code> tag in XHTML 1.1", they will have to write the tag as \<code\>.
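A sketch of how the backslash rule could be combined with a phrasal whitelist, assuming only \< and \> are treated as escapes (any other backslash is left alone). The function name renderEscapedTitle() and the short tag list are illustrative only:

```php
<?php
// Hypothetical helper: turns \< and \> into literal &lt; / &gt;, keeps a small
// whitelist of attribute-free phrasal tags, and escapes everything else.
function renderEscapedTitle(string $title): string
{
    // Capturing group: either an escape sequence (\< or \>) or an allowed tag.
    $pattern = '~(\\\\[<>]|</?(?:abbr|acronym|cite|code|em|strong)>)~';
    $parts = preg_split($pattern, $title, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even indexes: ordinary text between tokens.
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        } elseif ($part[0] === '\\') {
            // Escape sequence: emit the bracket itself, escaped.
            $out .= htmlspecialchars($part[1], ENT_QUOTES, 'UTF-8');
        } else {
            // Whitelisted phrasal tag: pass through as markup.
            $out .= $part;
        }
    }
    return $out;
}

echo renderEscapedTitle('Proper use of the \<code\> tag in XHTML 1.1'), "\n";
// Proper use of the &lt;code&gt; tag in XHTML 1.1
```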

In reply to Carl, I submit that making <code> (an inline element) display as block globally in a style sheet is a bad idea. It would fail with a sentence like "This brief howto will cover my use of <code>rsync</code> to backup the critical data on my home network.", where <code> is the proper semantic markup.

I submit that anyone smart enough to put in a <code> tag is (or ought to be) smart enough to make it a <code style="display: block;"> or perhaps <code class="block">.

I'll keep hacking on a patch.

Regards,

Mark
carl_galloway
Regular
Posts: 1331
Joined: Sun Dec 04, 2005 5:43 pm
Location: Andalucia, Spain

Post by carl_galloway

@Mark, you're right in a sense, but blogging is the easiest way for non-HTML-literate people to create a web presence, and I take the approach that my templates need to match their needs. Most would never know about using the code element. If they don't want to use a WYSIWYG editor, the only way to add some extra styling to an entry is to find a beginner's guide to HTML, and most of those won't mention the distinction between block and inline. Styling the element as block level gives those people a way to add some extra styling without having to learn HTML and CSS.

That said, my concerns don't appear to be relevant to this discussion now that Garvin has patched the files and provided the extra hook that you need so I'll refrain from commenting further.

Cheers, Carl