Audioscrobbler Plugin: UTF8 vs. ISO8859-1

Posted: Sun Dec 17, 2006 9:59 pm
by basquiat
My installation of s9y still uses ISO-8859-1 as its charset, since I'm a bit too short on time to fight with converting the MySQL database from one charset to the other. Hence, special characters like those in "Esbjörn Svensson Trio" or "Trentemøller" are garbled when they appear in my sidebar. I managed to fix this by changing

Code:

xml_parser_set_option ($xml_parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
to

Code:

xml_parser_set_option ($xml_parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');
Maybe it would be an idea to make this configurable via the plugin preferences?

Posted: Mon Dec 18, 2006 5:23 pm
by judebert
Your wish is my command. I committed the change. But I can't help but wonder if this should use the blog's encoding automatically. Someone who knows more about PHP XML parsing should take a look and tell me if it's using that value to parse or format the XML.
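
To make the "use the blog's encoding automatically" idea concrete, here is a minimal sketch. It assumes the blog charset is exposed through Serendipity's LANG_CHARSET constant (the same one used in the link_list code further down); the helper name scrobbler_target_encoding() is made up for illustration and is not part of the plugin. Since the PHP XML parser only supports ISO-8859-1, UTF-8 and US-ASCII, anything else falls back to UTF-8 here.

```php
<?php
// Hypothetical helper, not part of the plugin: pick a target encoding
// that PHP's XML parser supports, based on the blog charset.
function scrobbler_target_encoding($blogCharset)
{
    $supported = array('ISO-8859-1', 'UTF-8', 'US-ASCII');
    $charset   = strtoupper(trim($blogCharset));

    // Unsupported charsets (e.g. ISO-8859-15) fall back to UTF-8.
    return in_array($charset, $supported, true) ? $charset : 'UTF-8';
}

// Assumption: inside Serendipity the blog charset is available as
// LANG_CHARSET; outside of it we simply default to UTF-8.
$blogCharset = defined('LANG_CHARSET') ? LANG_CHARSET : 'UTF-8';

$xml_parser = xml_parser_create();
xml_parser_set_option(
    $xml_parser,
    XML_OPTION_TARGET_ENCODING,
    scrobbler_target_encoding($blogCharset)
);
```

With this, nobody has to know or configure their charset, at the cost of silently falling back to UTF-8 on blogs that use an encoding the parser cannot produce.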

Posted: Mon Dec 18, 2006 6:38 pm
by mgroeninger
Ok, from http://us3.php.net/manual/en/function.x ... create.php:
The optional encoding specifies the character encoding for the input/output in PHP 4. Starting from PHP 5, the input encoding is automatically detected, so that the encoding parameter specifies only the output encoding. In PHP 4, the default output encoding is the same as the input charset. If empty string is passed, the parser attempts to identify which encoding the document is encoded in by looking at the heading 3 or 4 bytes. In PHP 5.0.0 and 5.0.1, the default output charset is ISO-8859-1, while in PHP 5.0.2 and upper is UTF-8. The supported encodings are ISO-8859-1, UTF-8 and US-ASCII.
So I think the short answer is that there is no great short answer...
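
For PHP 5 and later, the part of the quote that matters here is that the encoding argument only controls the output. A small sketch to illustrate this, assuming a UTF-8 feed with a single made-up artist element (the function name parse_artist() is hypothetical):

```php
<?php
// UTF-8 input: "ö" is the two bytes C3 B6.
$feed = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      . "<artist>R\xC3\xB6yksopp</artist>";

// Hypothetical helper: parse the feed and return the text of the
// first element in the requested output encoding.
function parse_artist($feed, $targetEncoding)
{
    $parser = xml_parser_create($targetEncoding);
    xml_parse_into_struct($parser, $feed, $values, $index);

    return $values[0]['value'];
}

$utf8   = parse_artist($feed, 'UTF-8');      // "ö" stays C3 B6
$latin1 = parse_artist($feed, 'ISO-8859-1'); // "ö" becomes the single byte F6

echo bin2hex($utf8), "\n";
echo bin2hex($latin1), "\n";
```

The input is identical in both calls; only the bytes handed back to the caller differ.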

That being said, Garvin put a great little workaround together for the link_list plugin a while ago. Here is what he did:

He first created the XML parser as UTF-8 regardless of what the blog's character set is, encoded the data into UTF-8 before passing it to the parser, parsed the data, and then freed the parser:

Code:

        /* XML definition */
        $xml = xml_parser_create('UTF-8');
        $linkxml = serendipity_utf8_encode($this->get_config('links'));
        xml_parse_into_struct($xml, '<list>' . $linkxml . '</list>', $struct, $index);
        xml_parser_free($xml);
Then, he has a separate function defined called decode:

Code:

    function decode($string) {
        if (LANG_CHARSET != 'UTF-8') {
            return utf8_decode($string);
        }

        return $string;
    }
So then it is just a matter of calling decode before you output any data from $struct. For example, later on we go:

Code:

                        $str .= 'd.add('.$j.','.$level[count($level)-1].',"'.$this->decode($struct[$i]['attributes']['NAME']).'");'."\n";
I don't know if this is easier than just adding an option the way you did, Judebert, since I haven't looked at the plugin, but it would be one way of doing this without the user having to know what charset they are using.
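
Put together, the pattern above looks roughly like the sketch below. It is not the link_list code itself: the charset is passed in as a parameter standing in for LANG_CHARSET so the snippet runs outside Serendipity, the sample XML is invented, and mb_convert_encoding() is swapped in for utf8_decode() because the latter is deprecated as of PHP 8.2.

```php
<?php
// Assumption: $blogCharset stands in for Serendipity's LANG_CHARSET.
function decode_for_blog($string, $blogCharset)
{
    if (strtoupper($blogCharset) === 'UTF-8') {
        return $string; // nothing to do on a UTF-8 blog
    }

    // Swapped in for utf8_decode(); requires the mbstring extension.
    return mb_convert_encoding($string, $blogCharset, 'UTF-8');
}

// Always parse as UTF-8, no matter what the blog uses.
$parser = xml_parser_create('UTF-8');
$xml    = '<list><dir name="Esbj' . "\xC3\xB6" . 'rn Svensson Trio" /></list>';
xml_parse_into_struct($parser, $xml, $struct, $index);

// Decode only at the moment of output.
foreach ($struct as $node) {
    if (isset($node['attributes']['NAME'])) {
        echo decode_for_blog($node['attributes']['NAME'], 'ISO-8859-1'), "\n";
    }
}
```

The nice property is that everything between parsing and output is UTF-8, so there is exactly one place where the charset matters.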

Posted: Thu Dec 28, 2006 8:26 pm
by judebert
If it's good enough for Garvin, it's good enough for me.

I just made this change to the audioscrobbler plugin. Oddly enough, it already used utf8_decode() on the title and artist, so this may not have the desired effect.

Somebody please check it out.
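
For whoever checks this: the risk with a decode call that was already there is that the same string gets decoded twice, which typically turns "ö" into a question mark. One possible guard, sketched below, is to decode only while the string is still valid UTF-8. This is a heuristic and an assumption on my part, not what the plugin does; decode_once() is a made-up name and $blogCharset again stands in for LANG_CHARSET.

```php
<?php
// Hypothetical guard against decoding the same string twice.
function decode_once($string, $blogCharset)
{
    if (strtoupper($blogCharset) === 'UTF-8') {
        return $string;
    }

    // A string holding a lone Latin-1 "ö" (byte F6) is not valid UTF-8,
    // so we treat it as already decoded and leave it alone.
    if (!mb_check_encoding($string, 'UTF-8')) {
        return $string;
    }

    return mb_convert_encoding($string, $blogCharset, 'UTF-8');
}

$once  = decode_once("R\xC3\xB6yksopp", 'ISO-8859-1'); // UTF-8 in, Latin-1 out
$twice = decode_once($once, 'ISO-8859-1');             // second call is a no-op

echo bin2hex($once), "\n";
echo bin2hex($twice), "\n";
```

Pure ASCII strings are valid UTF-8 as well, but converting them changes nothing, so the check is harmless there.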

Still encoding problems in audioscrobbler plugin

Posted: Sun Feb 25, 2007 7:24 pm
by freggy
I seem to have the same (or rather: the inverse?) problem.
The URL is http://artipc10.vub.ac.be/serendipity/
The artist name Röyksopp has a question mark as the second character here. Looking in the source of the generated HTML file, there is
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The letter ö, on the contrary, seems to be encoded in ISO-8859-15.

Anyway, shouldn't special characters like these be replaced by their HTML entities (&ouml; in this case)?
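
A rough sketch of what the entity approach could look like, assuming the names come out of the feed as UTF-8 (entity_encode_name() is a hypothetical helper, not something the plugin has):

```php
<?php
// Assumption: the artist name arrives from the feed as UTF-8.
function entity_encode_name($utf8Name)
{
    // Named entities such as &ouml; are plain ASCII, so they display
    // the same whatever charset the page is served in.
    return htmlentities($utf8Name, ENT_QUOTES, 'UTF-8');
}

echo entity_encode_name("R\xC3\xB6yksopp"), "\n";     // R&ouml;yksopp
echo entity_encode_name("Trentem\xC3\xB8ller"), "\n"; // Trentem&oslash;ller
```

One caveat: htmlentities() only replaces characters that have a named entity, so names in scripts without named entities would still pass through as raw UTF-8 bytes.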

Re: Still encoding problems in audioscrobbler plugin

Posted: Mon Feb 26, 2007 9:46 am
by garvinhicking
Hi!

It might be a problem with the source RSS feed, which, contrary to its header, sends different characters. Sadly, calling http://ws.audioscrobbler.com/1.0/user/f ... tracks.xml currently does not yield any output... :(

Best regards,
Garvin

Posted: Mon Feb 26, 2007 7:50 pm
by freggy
I just replayed a Röyksopp song, and I have the impression that the XML file is correct. It says it is encoded in UTF-8, which seems to be the case.
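
To check that impression without eyeballing the file, something like the following sketch could be run against a saved copy of the feed. It only handles the UTF-8 case, both function names are invented for illustration, and it needs the mbstring extension.

```php
<?php
// Hypothetical helper: read the encoding named in the XML declaration.
// XML defaults to UTF-8 when no encoding is declared.
function declared_encoding($xml)
{
    if (preg_match('/^<\?xml[^>]*\bencoding\s*=\s*["\']([^"\']+)["\']/i', $xml, $m)) {
        return strtoupper($m[1]);
    }

    return 'UTF-8';
}

// Returns true when the feed claims UTF-8 and its bytes really are
// valid UTF-8. Other declared encodings are not checked in this sketch.
function feed_is_valid_utf8($xml)
{
    return declared_encoding($xml) === 'UTF-8'
        && mb_check_encoding($xml, 'UTF-8');
}

$head = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
$good = $head . "<artist>R\xC3\xB6yksopp</artist>"; // real UTF-8
$bad  = $head . "<artist>R\xF6yksopp</artist>";     // Latin-1 byte, UTF-8 label

var_dump(feed_is_valid_utf8($good)); // bool(true)
var_dump(feed_is_valid_utf8($bad));  // bool(false)
```

If the saved feed passes this check, the garbling most likely happens after parsing rather than in the feed itself.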

Posted: Tue Feb 27, 2007 11:09 am
by garvinhicking
Hi!

Could you save the output of the XML file as well as the HTTP headers that were sent with it? It's empty again now :)

Best regards,
Garvin
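
A minimal sketch of how such a capture could be taken from PHP itself, so the body is stored byte for byte next to the headers. The helper name and the output file names are made up, and the feed URL has to be passed in by the caller. It relies on PHP's HTTP stream wrapper filling in $http_response_header, which requires allow_url_fopen.

```php
<?php
// Hypothetical helper: fetch a feed and store the response headers
// and the raw body in two files sharing the same prefix.
function save_feed_capture($url, $prefix)
{
    $body = @file_get_contents($url);
    if ($body === false) {
        return false; // fetch failed, nothing saved
    }

    // $http_response_header is set in this scope by the HTTP stream
    // wrapper; for non-HTTP sources it stays unset.
    $headers = isset($http_response_header) ? $http_response_header : array();

    file_put_contents($prefix . '.headers.txt', implode("\n", $headers) . "\n");
    file_put_contents($prefix . '.body.xml', $body);

    return true;
}
```

Calling save_feed_capture() with the feed URL and a prefix such as 'capture' right after playing a track would leave capture.headers.txt and capture.body.xml behind, untouched by any editor or browser re-encoding.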

Posted: Thu Mar 01, 2007 10:01 pm
by freggy