
Posted: Fri Feb 17, 2006 10:48 am
by garvinhicking
Hi!

Do you know if there is any difference between the Cyrillic characters that show up and those that don't? Do you happen to know whether they are stored as 1-byte or 2-byte characters?
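As a rough illustration of the 1-byte vs. 2-byte question: in UTF-8, plain Latin letters take one byte, while umlauts and Cyrillic letters take two. The sketch below is only an example; the function name utf8_byte_lengths() is made up for this purpose, and the file has to be saved as UTF-8 for the literal characters to be encoded as expected.

```php
<?php
// Hypothetical helper: returns the number of bytes each character occupies.
// strlen() counts bytes, not characters, which is what we want here.
function utf8_byte_lengths($text) {
    $lengths = array();
    // The "u" modifier makes PCRE treat the input as UTF-8,
    // so the empty pattern splits the string into whole characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($chars as $char) {
        $lengths[$char] = strlen($char);
    }
    return $lengths;
}

// One Latin letter, one umlaut, one Cyrillic letter.
print_r(utf8_byte_lengths('aäя'));
```

If the file really is stored as UTF-8, "a" should be reported with 1 byte and both "ä" and "я" with 2 bytes.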

The "ru_RU.utf-8" strings are only used for the system locale, which in turn only affects Russian date formatting; they are unrelated to any database input or output.

On my server I am using PHP 5.1.2, UTF-8 and MySQL 4.1.7, where it works as expected with all umlauts and special characters I know (plus Chinese).

As for other Russian users, you might want to contact Nightly, the translator of the Russian language file, whose mail address is inside the file. He's an experienced developer and Russian himself, so he might have a clue.

Meanwhile, I would like you to test a little script: save it as "utf_test.php" inside your Serendipity folder and call it in your browser. It inserts some test strings into the database and reads them back. Please make sure that you save the file in the UTF-8 character set and that you add Cyrillic characters to the $teststring variable.

Code:

<?php
$teststring = 'äöüß';
include 'serendipity_config.inc.php';
header('Content-Type: text/html; charset=utf-8');
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
           "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html>
<head><title>Cyrillic Test</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<?php
echo "SET UTF-8...<br />\n";
$x = serendipity_db_query("SET NAMES utf8");
echo $x . "<br />\n";

echo "CREATING Table...<br />\n";
serendipity_db_query("DROP TABLE IF EXISTS `utf_test`");
$x = serendipity_db_query("
CREATE TABLE `utf_test` (
    `key` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
    `value` VARCHAR( 255 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
) CHARACTER SET utf8 COLLATE utf8_general_ci;");

echo $x . "<br />\n";

echo "INSERTING '$teststring'...<br />\n";
$x = serendipity_db_query("INSERT INTO utf_test (value) VALUES ('$teststring')");
echo $x . "<br />\n";

echo "SELECTING text...<br />\n";
$x = serendipity_db_query("SELECT * FROM utf_test");
echo "<pre>" . print_r($x, true) . "</pre><br />\n";
?>

</body>
</html>
Looking forward to the results,
Garvin

Posted: Fri Feb 17, 2006 6:51 pm
by RavenH
Hi Garvin,

I'm quite willing to go to the bottom of this ;-)

However, I'm no coder and my PHP knowledge is close to nil. :lol: So you will have to take me through this one step at a time. Please tell me where precisely I need to put those Cyrillic strings within this piece of code, how I should save it (filename, format etc.), where I should put it and what I should use to call it. Sorry if I sound obtuse, but when it comes to such stuff, I prefer being safe rather than sorry and having to repeat things.

Greetings

Raven

Posted: Sat Feb 18, 2006 5:23 pm
by garvinhicking
Hi Raven!

Great! I'm sorry I did not include the necessary details. From your description I figured you were quite familiar with this now.

1. Copy + paste the code snippet from my posting into your editor.
2. Make sure your editor is saving the file in UTF-8 / Unicode format.
3. Go to the beginning of the file, and where you currently see "$teststring = 'äöüß';" add your Cyrillic characters before or after the characters "äöüß". If I were able to type Cyrillic characters, I would have inserted some as a test. :-)
4. Now save that file as "utf_test.php" inside your serendipity directory, via FTP or SCP or whatever you use to upload files.
5. Now call utf_test.php via your browser: http://host/serendipity/utf_test.php

The output of that script is what I need. :)
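If you are unsure whether your editor really saved the file as UTF-8, a small check like the following might help. It is only a sketch: the function name inspect_utf8_bytes() is made up, and it assumes the check is run from the same directory as utf_test.php. It also reports a byte order mark (BOM), because a BOM at the start of the file is sent to the browser before the header() call in the test script can run.

```php
<?php
// Hypothetical helper: inspects the raw bytes of a file's contents.
function inspect_utf8_bytes($bytes) {
    return array(
        // With the "u" modifier, preg_match() fails on invalid UTF-8 input.
        'valid_utf8' => preg_match('//u', $bytes) === 1,
        // A BOM is the byte sequence EF BB BF at the very start of the file.
        'has_bom'    => substr($bytes, 0, 3) === "\xEF\xBB\xBF",
    );
}

$file = 'utf_test.php'; // assumed to be in the current directory
if (is_readable($file)) {
    print_r(inspect_utf8_bytes(file_get_contents($file)));
} else {
    echo "Cannot read $file\n";
}
```

Ideally this reports valid_utf8 as true and has_bom as false.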

Best regards,
Garvin

Posted: Mon Feb 20, 2006 8:12 am
by RavenH
Hi Garvin,

ok, here's the output:

Image

And the Cyrillic is clean as a whistle and works with all letters.

Greetings

Raven

Posted: Mon Feb 20, 2006 11:05 am
by garvinhicking
Now, that's strange.

But it indicates that we can solve the problem. :)

Please use phpMyAdmin or the MySQL tool of your choice to compare the 'utf_test' table with your entries table. Please check if the collation and the character set of the tables and columns are the same.
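If you prefer plain SQL over phpMyAdmin, the same comparison can be sketched roughly as follows. The helper name collation_check_queries() is made up, and I am assuming the entries table is called "serendipity_entries" (it depends on your table prefix). The script only prints the statements; you can paste them into any MySQL client or pass them to serendipity_db_query().

```php
<?php
// Hypothetical helper: builds the statements that show a table's collations.
function collation_check_queries($table) {
    // Table names cannot be escaped like values, so only allow safe characters.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        return array();
    }
    return array(
        // Table default collation ("Collation" column of the result).
        // Note: "_" acts as a single-character wildcard in LIKE, which is harmless here.
        "SHOW TABLE STATUS LIKE '$table'",
        // Per-column collation ("Collation" column of the result).
        "SHOW FULL COLUMNS FROM `$table`",
    );
}

foreach (array('utf_test', 'serendipity_entries') as $table) {
    foreach (collation_check_queries($table) as $sql) {
        echo $sql . ";\n";
    }
}
```

The "Collation" values of both tables and of their text columns are what should be compared.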

Then we will proceed and eliminate one possible problem at a time. The next would be testing the script from above by entering characters via a browser form.

Regards,
Garvin

Posted: Mon Feb 20, 2006 11:27 am
by RavenH
Hi Garvin,

hopefully this is what you need as info...

This is the utf_test table:

Image

Image

This is the entries table:

Image

Image

Greetings

Raven

Posted: Mon Feb 20, 2006 11:29 am
by garvinhicking
You might already have spotted the difference between "utf8_general_ci" and "utf8_unicode_ci". Can you change the collation of your serendipity_entries table and its columns from "utf8_unicode_ci" to "utf8_general_ci", please?
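In case it helps, the change can also be made with a single SQL statement instead of clicking through phpMyAdmin. The following is only a sketch: the helper name collation_convert_query() is made up, and the script merely prints the statement so you can review it first. Since CONVERT TO rewrites the table and all of its text columns, please make a backup before running it.

```php
<?php
// Hypothetical helper: builds an ALTER TABLE statement that converts a table
// and all of its text columns to the given utf8 collation.
function collation_convert_query($table, $collation) {
    $allowed = array('utf8_general_ci', 'utf8_unicode_ci');
    // Reject anything that is not a plain table name or a known collation.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table) || !in_array($collation, $allowed, true)) {
        return false;
    }
    return "ALTER TABLE `$table` CONVERT TO CHARACTER SET utf8 COLLATE $collation";
}

echo collation_convert_query('serendipity_entries', 'utf8_general_ci') . ";\n";
```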

This might make a difference when creating a new entry...

If not, report back as usual :-D

Best regards,
Garvin

Posted: Mon Feb 20, 2006 11:46 am
by RavenH
Hi again,

changed charset collation and, err, no effect, I'm sorry to say.

Greetings

Raven

Posted: Mon Feb 20, 2006 11:57 am
by garvinhicking
Okay, so please now replace your utf_test.php with the code below. You will get a page where you can enter your Cyrillic characters into an input field. Please also try changing the collation via the radio buttons of utf_test.php.

Please test it again and show me the output as you did before. Could you also create a file somewhere, or post here on the forums, the Cyrillic characters you use, so that I can copy and paste them to try it myself?

Code:

<?php
$teststring = 'äöüß';
include 'serendipity_config.inc.php';
header('Content-Type: text/html; charset=utf-8');
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
           "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html>
<head><title>Cyrillic Test</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>

<form action="utf_test.php" method="post">
    <input type="text" name="teststring" value="<?php echo $teststring; ?>" /><br />
    <input type="radio" name="collation" value="utf8_general_ci" /> utf8_general<br />
    <input type="radio" name="collation" value="utf8_unicode_ci" checked="checked" /> utf8_unicode<br />
    <input type="submit" name="submit" value="Go!" />
</form>

<?php
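// Only accept the two collations offered by the form, because the value
// is placed directly into the CREATE TABLE statement below.
$allowed_collations = array('utf8_general_ci', 'utf8_unicode_ci');
$collation = isset($_REQUEST['collation']) ? $_REQUEST['collation'] : '';
if (!in_array($collation, $allowed_collations, true)) {
    $collation = 'utf8_unicode_ci';
}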

echo "SET UTF-8...<br />\n";
$x = serendipity_db_query("SET NAMES utf8");
echo $x . "<br />\n";

echo "CREATING Table...<br />\n";
serendipity_db_query("DROP TABLE IF EXISTS `utf_test`");
$x = serendipity_db_query("
CREATE TABLE `utf_test` (
    `key` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
    `value` VARCHAR( 255 ) CHARACTER SET utf8 COLLATE $collation NOT NULL
) CHARACTER SET utf8 COLLATE $collation;");

echo $x . "<br />\n";

echo "INSERTING '$teststring'...<br />\n";
$x = serendipity_db_query("INSERT INTO utf_test (value) VALUES ('$teststring')");
echo $x . "<br />\n";

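if (!empty($_REQUEST['teststring'])) {
    echo "INSERTING POST '" . $_REQUEST['teststring'] . "'...<br />\n";
    // First insert: the submitted string passed through Serendipity's escaping.
    $x = serendipity_db_query("INSERT INTO utf_test (value) VALUES ('escaped: " . serendipity_db_escape_string($_REQUEST['teststring']) . "')");
    echo $x . "<br />\n";
    // Second insert: the raw string, deliberately unescaped for comparison.
    // This is unsafe, so delete utf_test.php from the server after testing.
    $x = serendipity_db_query("INSERT INTO utf_test (value) VALUES ('real: " . $_REQUEST['teststring'] . "')");
    echo $x . "<br />\n";
}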

echo "SELECTING text...<br />\n";
$x = serendipity_db_query("SELECT * FROM utf_test");
echo "<pre>" . print_r($x, true) . "</pre><br />\n";
?>

</body>
</html>
Thanks,
Garvin

Posted: Mon Feb 20, 2006 12:53 pm
by RavenH
Hi Garvin,

will do ... and report back :)

As to Cyrillic script, here are resources with which it is very easy to set up Cyrillic on your PC for testing purposes:

http://aatseel.org/fonts/wincyrillic.html

http://ourworld.compuserve.com/homepages/PaulGor/

Essentially all you need are Cyrillic-capable fonts (Windows usually has full Unicode fonts built in), a keyboard driver to switch between Cyrillic and Latin, and support for the Russian language and driver set enabled in Windows itself. It rarely takes more than a moment to set up.

Greetings

Raven

Posted: Mon Feb 20, 2006 1:12 pm
by RavenH
Hi again,

ok - here are the results. It didn't matter whether utf8_unicode_ci or utf8_general_ci was used; again, both produced nice Cyrillic.

Image

Greetings

Raven

Posted: Mon Feb 20, 2006 1:20 pm
by garvinhicking
Could you please help me with the second thing I asked about, a copy+pastable version of your Cyrillic strings? I can't fetch them out of your image :-D

Please also tell me which Event Plugins you have installed in the blog where the strange characters appear. It might be that one of them is interfering.

It seems we are getting close. The script I posted uses the Serendipity framework for fetching and inserting data, so it should basically work in the blog as well. There are only a few things left to rule out, which I'd like to try on my setup.

Regards,
Garvin

Posted: Mon Feb 20, 2006 1:29 pm
by RavenH
Hi Garvin,

I use a basic installation (I haven't gotten farther so far, the charset problem hit right away LOL), so there are no additional plugins installed, I haven't even worked with the template/theme either. It's as one sets it up from the install.php.

Could you please PM me your email address? Then I can mail you a Notepad++ text file with the complete alphabet in UTF-8.

Greetings

Raven

Posted: Mon Feb 20, 2006 1:33 pm
by garvinhicking
Hi Raven!

My mail address is contained in my forum signature (the one with the ICQ and URL links).

Please also attach a file with the exact characters you used for testing in your image.

I'll then get back to you via email, if that is okay with you.

Best regards,
Garvin

Posted: Thu Sep 21, 2006 12:24 pm
by ihra
The problem with calendar days / months was solved in my case (Fedora Core 4 and Trustix Secure Linux 3) by editing /etc/locale.conf, uncommenting fi_FI (both ISO and UTF-8), and restarting Apache and MySQL.

Works fine now.
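For anyone hitting the same calendar problem: a quick way to see whether PHP can actually use a locale is to look at the return value of setlocale(), which is false when none of the requested locales is installed. This is just a sketch; the function name first_available_locale() is made up, and the exact locale names (such as "fi_FI.UTF-8") differ between systems, so treat them as assumptions.

```php
<?php
// Hypothetical helper: returns the first locale from $candidates that the
// system accepts for date formatting, or false if none of them is installed.
function first_available_locale($candidates) {
    // Passing "0" only reads the current setting without changing it.
    $previous = setlocale(LC_TIME, '0');
    // With an array, setlocale() tries each name until one succeeds.
    $found = setlocale(LC_TIME, $candidates);
    // Restore the previous setting so the check has no side effects.
    setlocale(LC_TIME, $previous);
    return $found;
}

var_dump(first_available_locale(array('fi_FI.UTF-8', 'fi_FI')));
var_dump(first_available_locale(array('ru_RU.utf-8', 'ru_RU.UTF-8', 'ru_RU')));
```

If a call prints bool(false), the locale is most likely not enabled on the server yet.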