Hi!
I have been using Serendipity for six months now to write a blog in French and Japanese. I have had problems with Japanese characters since the beginning, and the only way to make them appear correctly in titles, comments, or the browser's title bar was to delete some htmlspecialchars calls here and there.
The thing is that the Japanese part of my texts displays correctly in the browser, but the encoding itself is definitely not correct: if you check the page source, the Japanese doesn't appear and you get this kind of thing:
De l'origine du monde...& #12288;& #25163;& #37196;
(I put a space between & and # so that it doesn't appear as the correct Japanese character in this entry...)
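For reference, the sequence quoted above is a run of HTML numeric character references, each one holding a Unicode code point in decimal. Here is a minimal PHP sketch (assuming PHP 7.4 or later with the mbstring extension; it is an illustration, not Serendipity code) that decodes the sample to show which characters it stands for:

```php
<?php
// Numeric character references from the page source, written without the extra spaces.
$source = '&#12288;&#25163;&#37196;';

// html_entity_decode() turns each reference into the character it names,
// here encoded as UTF-8.
$decoded = html_entity_decode($source, ENT_QUOTES, 'UTF-8');

// Print each character with its code point in hexadecimal.
foreach (mb_str_split($decoded, 1, 'UTF-8') as $char) {
    printf("U+%04X %s\n", mb_ord($char, 'UTF-8'), $char);
}
```

The three references correspond to U+3000 (an ideographic space), U+624B and U+914C. A browser performs the same decoding when it renders the page, which is why the text looks right on screen while the source contains only ASCII.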
Note: on Japanese websites, Japanese characters usually appear correctly in the page source as well!
This is quite an important issue for me, as I think it is connected with the fact that no Japanese search engine manages to index my home page (French ones do)!
I tried to change the encoding to UTF-8, but then the French accents no longer appeared...
I also guess this leads to the same kind of problem with the RSS feed: I get a parse error. I have temporarily added a syndication link so that you can check it yourself if you want. I also get the "& #12288;& #25163;& #37196;" and "??????????" when I link my blog to the diary of a Japanese "social network" website.
The address of my blog is:
http://leilujapan.free.fr
If anyone could help me with this issue, I would be really happy to hear his/her comments!
Thanks in advance!
Handling french/japanese text
-
garvinhicking
- Core Developer
- Posts: 30022
- Joined: Tue Sep 16, 2003 9:45 pm
- Location: Cologne, Germany
- Contact:
Re: Handling french/japanese text
You can only mix Japanese and French if you fully use UTF-8.
You'll need to edit the French language file, set the charset to "UTF-8" and then (very important) convert the file itself to UTF-8.
Then you will need to re-edit all entries in your database, or use a MySQL 4.1 function to convert all your tables to UTF-8.
As soon as everything is in UTF-8 you will be able to post and read in French, but your browser needs to support UTF-8 as well.
Best regards,
Garvin
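As a rough illustration of the "convert the file to UTF-8" step, here is a minimal PHP sketch. The helper names are made up for this example and the mbstring extension is assumed; a text editor that can re-save a file in another encoding achieves the same result.

```php
<?php
// Hypothetical helper: returns the text as UTF-8, converting from ISO-8859-1
// only when the input is not already valid UTF-8 (this avoids converting twice).
function to_utf8_from_latin1(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical helper: rewrites a file in place as UTF-8.
function convert_file_to_utf8(string $path): void
{
    $original = file_get_contents($path);
    if ($original === false) {
        throw new RuntimeException("Cannot read $path");
    }
    file_put_contents($path, to_utf8_from_latin1($original));
}

// "é" is the single byte 0xE9 in ISO-8859-1 and the two bytes 0xC3 0xA9 in UTF-8.
$converted = to_utf8_from_latin1("Cat\xE9gories");
echo bin2hex($converted), "\n";
```

The validity check is only a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so it is worth keeping a backup of the file before converting it.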
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
-
leilujapan
I am actually using the English language template...
I am not sure I am following you, but I guess I should change the "LANG_CHARSET" field in /lang/serendipity_lang_en.inc to UTF-8. And then I should save this file as UTF-8? (This is the part that makes me think I am not getting the point...)
As for the conversion part, when you say re-editing the entries, do you mean re-editing them one by one in the Serendipity admin page? The MySQL on the server I use is 4.0.15, so I guess I will have to do it the long way...
I found this web page: http://tokyoahead.com/main/article.php/ ... 3074103415 about converting databases to UTF-8. Is there any chance this will work here?
Sorry for these too naive questions, and thank you very much for your very quick answer and the time you spend on resolving problems!
Sincerely Yours,
Jonathan.
-
garvinhicking
If you use English as the language template, you can save the file in ISO-8859-1 as well (this is the standard format). English does not contain any high-ASCII bytes, so it does not matter which format the file is saved in.
For French, however, the file contains characters that are high-ASCII bytes, and when using UTF-8 those language strings also need to be converted to UTF-8.
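The point about high-ASCII bytes can be checked directly. Here is a small PHP sketch (the helper name and the sample strings are invented for the illustration, and mbstring is assumed) showing that ASCII-only text is already valid UTF-8 while a lone ISO-8859-1 accent byte is not:

```php
<?php
// Hypothetical helper: true when every byte is plain 7-bit ASCII.
function is_pure_ascii(string $bytes): bool
{
    return preg_match('/^[\x00-\x7F]*\z/', $bytes) === 1;
}

$english = 'Comments';       // a typical English language string
$french  = "Cat\xE9gories";  // "Catégories" saved as ISO-8859-1

// ASCII-only text is byte-for-byte identical in ISO-8859-1 and UTF-8.
var_dump(is_pure_ascii($english));
var_dump(mb_check_encoding($english, 'UTF-8'));

// A high byte such as 0xE9 on its own is not a valid UTF-8 sequence.
var_dump(is_pure_ascii($french));
var_dump(mb_check_encoding($french, 'UTF-8'));
```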
As for the re-editing, you definitely need to recode all your entries to UTF-8; since you don't have MySQL 4.1, you will need to edit the entries manually to fix the affected characters. You could also write a PHP script to convert those fields:
Code:
    // Uses the old ext/mysql functions (removed in PHP 7); assumes an open connection.
    $sql = mysql_query("SELECT * FROM serendipity_entries");
    // mysql_fetch_assoc() returns column names only, without the numeric duplicates.
    while ($row = mysql_fetch_assoc($sql)) {
        $new_vals = array();
        foreach ($row as $key => $val) {
            // Leave NULLs and numeric columns (id, timestamps, ...) untouched.
            if ($val === null || is_numeric($val)) {
                continue;
            }
            // utf8_encode() converts ISO-8859-1 to UTF-8.
            $new_vals[] = $key . ' = "' . mysql_real_escape_string(utf8_encode($val)) . '"';
        }
        if (empty($new_vals)) {
            continue;
        }
        $new  = 'UPDATE serendipity_entries SET ' . implode(', ', $new_vals);
        $new .= ' WHERE id = ' . (int)$row['id'];
        mysql_query($new);
    }
You may need to modify this code for categories, comments and author names as well, depending on whether you used foreign characters there or not.
The web page you posted would also be a convenient way to do that, yes. But instead of using a Geeklog backup, you would just need to create an SQL dump of your entries (using phpMyAdmin), then edit this file, save it, and upload the SQL dump again.
It may even be easier for you, depending on your content, to start from scratch and import your entries via RSS from the old blog into the new one.
Best regards and good luck,
Garvin
-
leilujapan
Thanks a lot for your answers. A friend and I spent the weekend trying to fix this and ran into several problems. We fixed some but encountered new ones, so I am posting this in case it is useful to someone, or in case anyone has a suggestion.
First, I don't have sufficient rights on the hosting server to change the encoding of the database to UTF-8, and we got an error from the database when we tried to feed UTF-8 entries into it. So we thought: OK, let's not convert the database, but convert the text each time a page is generated (not too kind to the hosting server, but anyway).
The encoding problem was the following: the database is encoded in ISO-8859-1, so the French accents are OK, but the Japanese text had been converted into HTML entities. So wherever it was necessary (entries, comments...), we first converted the text to UTF-8 (utf8_encode) and then used some code found on the net that parses the & #12345; entities and converts them into the correct Unicode characters. It is called replace_num_entity and you can find the code here:
http://jp2.php.net/html_entity_decode
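The two-step approach described above can be sketched roughly as follows. This is not the replace_num_entity function from the php.net comments; the helper name is invented, and mb_convert_encoding and mb_chr (PHP 7.2 or later, mbstring) stand in for utf8_encode and the hand-written decoder.

```php
<?php
// Hypothetical helper illustrating the two steps; not the original code.
function latin1_with_entities_to_utf8(string $stored): string
{
    // Step 1: ISO-8859-1 bytes (the French accents) become UTF-8.
    $utf8 = mb_convert_encoding($stored, 'UTF-8', 'ISO-8859-1');

    // Step 2: decimal references above the ASCII range become real characters.
    // Named entities and references such as &#38; or &#60; are left alone so
    // that the surrounding markup keeps its meaning.
    return preg_replace_callback(
        '/&#(\d+);/',
        function (array $m): string {
            $codepoint = (int) $m[1];
            if ($codepoint < 128 || $codepoint > 0x10FFFF) {
                return $m[0];
            }
            $char = mb_chr($codepoint, 'UTF-8');
            return $char === false ? $m[0] : $char;
        },
        $utf8
    );
}

// Sample value as it might sit in an ISO-8859-1 column (invented for the example).
$stored = "Caf\xE9 &amp; sak\xE9 &#25163;&#37196;";
echo latin1_with_entities_to_utf8($stored), "\n";
```

Doing the byte conversion before the entity decoding matters: the decoded Japanese characters are multi-byte UTF-8, and running the ISO-8859-1 conversion over them afterwards would mangle them.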
So far, everything was fine: the entries' titles, bodies, etc. were in Unicode. But Serendipity's file naming uses the entry's title to produce the URL, and it didn't like the Japanese characters too much. So we thought we could just modify the file naming to use only the entry's id, as that seems to be the only part that is parsed. Surprisingly, this worked fine for entries without comments, but for the ones with comments, we got a "the file is empty" answer...
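To make the "use only the id" idea concrete, here is a small PHP sketch. Both helpers are hypothetical and the "id-title.html" shape is an assumption made for the example; this is not Serendipity's own permalink code.

```php
<?php
// Hypothetical helper: builds an ASCII-only file name from an entry id and title.
function entry_filename(int $id, string $title): string
{
    // Keep ASCII letters and digits, collapse everything else into single dashes.
    $slug = trim(preg_replace('/[^A-Za-z0-9]+/', '-', $title), '-');
    return $slug === '' ? "$id.html" : "$id-$slug.html";
}

// Hypothetical helper: recovers the id and ignores whatever follows it.
function entry_id_from_filename(string $filename): ?int
{
    return preg_match('/^(\d+)(?:-.*)?\.html\z/', $filename, $m) === 1
        ? (int) $m[1]
        : null;
}

echo entry_filename(7, "De l'origine du monde..."), "\n";
echo entry_filename(12, "\u{624B}\u{914C}"), "\n";
```

Because the id alone identifies the entry, a title made up entirely of Japanese characters simply falls back to a file name containing nothing but the id.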
I am aware that this question is not very general, but just in case it reminds you of some problem you have had before, any hint would be a great help!
Thanks again,
Best Regards,
Jonathan
-
leilujapan
-
garvinhicking
About the captchas: if that happens, it seems your file paths are wrong. Try downloading the captcha image URL with wget and check whether there are error messages at the top of the file.
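The check suggested above can also be done in code once the image has been saved. Here is a minimal PHP sketch with an invented helper name; it only looks at the first bytes of the data and does not validate the image itself.

```php
<?php
// Hypothetical helper: reports whether the data begins with a PNG, JPEG or GIF
// signature. Any PHP notice or warning printed before the image data would
// push the signature away from the start and make this return false.
function starts_like_image(string $data): bool
{
    $signatures = ["\x89PNG\r\n\x1a\n", "\xFF\xD8\xFF", 'GIF87a', 'GIF89a'];
    foreach ($signatures as $signature) {
        if (strncmp($data, $signature, strlen($signature)) === 0) {
            return true;
        }
    }
    return false;
}
```

With the download stored in a file, `starts_like_image(file_get_contents($path))` returning false is a hint to open the file in a text editor and read whatever message sits in front of the image bytes.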
Regarding your other problems, do you have a URL I could have a look at?
Regards,
Garvin
-
leilujapan
Thanks for the hint on the captchas! It made me realize that the link was serendipityindex.php without the slash, and that this actually happened every time $serendipity['baseURL'] was used.
$serendipity['serendipityHTTPPath'] is ok, with a correct "/" at the end...
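The symptom described above (a path segment and a file name running together) is what a base URL without a trailing slash produces when strings are simply concatenated. Here is a small PHP sketch with a hypothetical helper and placeholder values, not taken from a real configuration:

```php
<?php
// Hypothetical helper: guarantees exactly one trailing slash on a base URL.
function with_trailing_slash(string $url): string
{
    return rtrim($url, '/') . '/';
}

// Placeholder value for illustration only.
$baseURL = 'http://localhost/serendipity';

echo $baseURL . 'index.php', "\n";                      // the two parts run together
echo with_trailing_slash($baseURL) . 'index.php', "\n"; // joined with a slash
```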
Well, I haven't been able to figure out where this variable was modified yet, but I'm working on it.
Unfortunately, I am doing all the tests on my local installation so I can't give you any URL so far...
One more question: is it usual to have an SQL database in UTF-8? What do people usually do?
Thank you very much,
Sincerely Yours,
Jonathan
-
garvinhicking
Recent MySQL and PostgreSQL versions deal fine with different charsets. Older MySQL versions could store Unicode, but didn't know that they were storing UTF-8, so it was your application's job to do any necessary conversions...
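When the database does not track the charset, the application has to remember what the stored bytes mean. Here is a short PHP sketch (assuming mbstring; the variable names are made up) of what goes wrong when bytes that are already UTF-8 are treated as ISO-8859-1 and converted a second time:

```php
<?php
$utf8 = "\u{E9}";  // "é" as UTF-8: bytes C3 A9

// If a layer wrongly assumes these bytes are ISO-8859-1 and converts them again,
// each byte is expanded separately and the text ends up double-encoded.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";
echo bin2hex($double), "\n";

// Reversing the mistaken step restores the original bytes.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
```

Double-encoded text shows up in the browser as pairs of accented Latin letters and symbols where a single accented character was expected.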
Regards,
Garvin
-
leilujapan
OK, the "/" was nothing, just a mistake in the configuration...
I thought it would fix the problem with the links to the entries, but it didn't.
Are the entries' pages cached somewhere once they have been loaded? Is there a way to flush the cache and start again from scratch?
It's just a guess, but the entries without comments (whose pages have probably never been accessed separately...) can be seen without problems, whereas the ones with comments link to an empty page.
Best Regards,
Jonathan
-
garvinhicking
If you are using the entryproperties plugin then yes, entries can be cached. In that case a "Cache all articles" link will appear in the admin menu.
But I still don't really understand your problem; I just looked at your webpage and can see all entries fine?
I used those links:
http://leilujapan.free.fr/index.php?/ar ... 2395;.html
http://leilujapan.free.fr/index.php?/ar ... 5311;.html
Regards,
Garvin
-
leilujapan