wondering if there is any plugin for searching chinese

Posted: Fri Apr 10, 2009 12:18 pm
by tianyi
I am a user from China, and I need a plugin for searching Chinese text in my blog. Could anyone develop a plugin like this?

Re: wondering if there is any plugin for searching chinese

Posted: Fri Apr 10, 2009 5:55 pm
by judebert
It was my impression that the Quicksearch plugin would search your blog regardless of language, as long as your database and blog use the same character encoding. Are you actually having problems with searching?

Re: wondering if there is any plugin for searching chinese

Posted: Mon Apr 13, 2009 4:22 am
by tianyi
judebert wrote:It was my impression that the Quicksearch plugin would search your blog regardless of language, as long as your database and blog use the same character encoding. Are you actually having problems with searching?
Yes, I do have difficulty searching. xu-kaidotcom is my own blog, and the Quicksearch plugin is in its right-hand column. However, it cannot find any Chinese characters. This has been a problem for a long time. I guess nobody came here to report it, and people just modified the code instead.

/include/functions_entries.inc.php

Code: Select all

if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
    $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
} else {
    $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
}
change to

Code: Select all

// Terms made up only of bytes 0x80-0xFF (multibyte characters such as Chinese)
// are matched with LIKE instead of FULLTEXT.
if (preg_match("/^[\x80-\xff]+$/", $term))
{
    $cond['find_part'] = "((e.title LIKE ('%" . addslashes($term) . "%')) or (e.body LIKE ('%" . addslashes($term) . "%')) or (e.extended LIKE ('%" . addslashes($term) . "%')))";
}
else
{
    // All other terms keep the original FULLTEXT search.
    if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
    } else {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
    }
}
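
For anyone who wants to try the branching logic outside of Serendipity, here is a minimal sketch of the same idea as a standalone function. The name build_find_part is hypothetical and not part of Serendipity, and the sketch assumes the caller has already escaped $term for SQL. It only shows which WHERE fragment is chosen for which kind of term.

```php
<?php
// Hypothetical helper (not part of Serendipity): choose a WHERE fragment for a search term.
// Assumption: $term has already been escaped for SQL by the caller.
function build_find_part(string $term): string
{
    // Terms consisting only of bytes 0x80-0xFF (for example UTF-8 encoded Chinese)
    // fall back to a substring match with LIKE.
    if (preg_match('/^[\x80-\xff]+$/', $term)) {
        $like = "LIKE '%" . $term . "%'";
        return "(e.title $like OR e.body $like OR e.extended $like)";
    }
    // Boolean operators in the term switch FULLTEXT to boolean mode.
    if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
        return "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
    }
    // Everything else uses the default FULLTEXT search.
    return "MATCH(title,body,extended) AGAINST('$term')";
}

echo build_find_part('博客'), "\n";
echo build_find_part('+mysql -oracle'), "\n";
echo build_find_part('serendipity'), "\n";
```

Note that a mixed term such as "博客 mysql" contains ASCII bytes, so it does not take the LIKE branch and still goes through FULLTEXT.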

Re: wondering if there is any plugin for searching chinese

Posted: Mon Apr 13, 2009 4:30 am
by tianyi
This method works, but it is not a good solution.
Modifying the code is not convenient, and the results are not exactly what I want.
I hope there is a plugin that solves this problem.

Re: wondering if there is any plugin for searching chinese

Posted: Mon Apr 13, 2009 1:42 pm
by garvinhicking
Hi!

Are you using the most recent MySQL version? It should have charset support, and if your database is in UTF-8 format, your Chinese characters should be stored there and be handled no differently than German umlauts.

Regards,
Garvin

Re: wondering if there is any plugin for searching chinese

Posted: Tue Apr 14, 2009 2:48 am
by tianyi
garvinhicking wrote:Hi!

Are you using the most recent MySQL version? It should have charset support, and if your database is in UTF-8 format, your Chinese characters should be stored there and be handled no differently than German umlauts.

Regards,
Garvin
MySQL version: 5.0.51a-community
I think it is fairly new, though not the latest one. :shock:

Re: wondering if there is any plugin for searching chinese

Posted: Wed Apr 15, 2009 7:19 am
by tianyi
Hello, can anyone answer my question? :mrgreen:

Re: wondering if there is any plugin for searching chinese

Posted: Wed Apr 15, 2009 10:00 am
by garvinhicking
Hi!
tianyi wrote: Hello, can anyone answer my question? :mrgreen:
Sadly not; I can only say that Quicksearch should work with MATCH ... AGAINST. Maybe MySQL-specific forums could help here.

Sadly, I do not know any Chinese or Japanese, so I have no way of testing this...

Regards,
Garvin

Re: wondering if there is any plugin for searching chinese

Posted: Wed Apr 15, 2009 6:17 pm
by judebert
Perhaps the obvious question: Are your blog and your database both using UTF-8?

Re: wondering if there is any plugin for searching chinese

Posted: Fri Apr 17, 2009 2:28 am
by tianyi
judebert wrote:Perhaps the obvious question: Are your blog and your database both using UTF-8?
[Attachment: mysqlutf8.png (1.59 KiB)]

Re: wondering if there is any plugin for searching chinese

Posted: Wed Feb 03, 2010 2:18 pm
by ayamico
I have been using Serendipity for 5 years, and searching in Chinese has never worked. I decided to look into why.

It looks like the cause is that MySQL full-text search does not support searching Chinese or Japanese text.

What I have found is "For FULLTEXT searches, we need to know where words begin and end. With Western languages, this is rarely a problem because most (if not all) of these use an easy-to-identify word boundary — the space character. However, this is not usually the case with Asian writing. We could use arbitrary halfway measures, like assuming that all Han characters represent words, or (for Japanese) depending on changes from Katakana to Hiragana due to grammatical endings. However, the only sure solution requires a comprehensive word list, which means that we would have to include a dictionary in the server for each Asian language supported."

http://blogs.sun.com/soapbox/entry/full ... uages_with
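
To make the word-boundary point concrete, here is a small sketch. The function split_words is a hypothetical, rough stand-in for a space-delimited word splitter. It is not MySQL's actual full-text parser and only illustrates the idea from the quote: without spaces, a whole Chinese sentence ends up as a single "word", so a shorter search term never equals an indexed word, while a substring test still finds it.

```php
<?php
// Rough stand-in for a space-delimited word splitter.
// Assumption: this only imitates the idea; it is NOT MySQL's real parser.
function split_words(string $text): array
{
    return preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

$sentence = '全文搜索不支持中文';

$english = split_words('full text search works');
$chinese = split_words($sentence);

echo count($english), "\n"; // 4 words, split at the spaces
echo count($chinese), "\n"; // 1 "word": the whole sentence

// A word-based lookup for a shorter term finds nothing...
var_dump(in_array('搜索', $chinese, true));
// ...while a substring test (what LIKE '%...%' does) succeeds.
var_dump(strpos($sentence, '搜索') !== false);
```

This is also why the LIKE-based workaround finds Chinese terms where MATCH ... AGAINST does not.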

I now understand why the following code modification was made:

Code: Select all

if (preg_match("/^[\x80-\xff]+$/", $term))
{
    $cond['find_part'] = "((e.title LIKE ('%" . addslashes($term) . "%')) or (e.body LIKE ('%" . addslashes($term) . "%')) or (e.extended LIKE ('%" . addslashes($term) . "%')))";
}
else
{
    if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
    } else {
        $cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
    }
}
Looking at the current Serendipity code, the above change cannot be made in a plugin.

Any suggestions?