Page 1 of 1

QuickSearch and 8bit terms

Posted: Sun Sep 11, 2011 5:35 pm
by LazyBadger
Maybe I miss some settings (please direct me to RTFM), but
UTF8-database with utf8 general-ci + utf8 (russian) texts give me bad and unexpected results - nothing found, nothing filtered - when I try to find any text (which exist in texts) in russian
More exactly - search-result window will show almost all (pure english content excluded) records instead of requested
Example (just one) - tried to search word =search&serendipity[searchTerm]=%D0%B2%D0%BE%D0%B7%D1%80%D0%B0%D1%81%D1%82&serendipity[searchButton]=%3E]возраст from this entry
English term
=search&serendipity[searchTerm]=idiot&serendipity[searchButton]=%3E]idiot found and filtered correctly

Re: QuickSearch and 8bit terms

Posted: Mon Sep 12, 2011 4:46 pm
by garvinhicking
Hi!

Which MySQL version are you using? If you have phpmyadmin, try to execute this query:

Code: Select all

SELECT * FROM serendipity_entries WHERE MATCH(title,body,extended) AGAINST('Russioan characters here')
and see if that returns proper entries? Maybe the UTF-8 characters are not properly en/decoded, you could edit the include/functions_entries.inc.php file and in the serendipity_searchEntries() function replace this:

Code: Select all

$search =& serendipity_db_query($querystring);
with:

Code: Select all

$search =& serendipity_db_query($querystring);
echo "QUERY: " . $querystring;
to see what the actuall sql query is...

HTH,
Garvin

Re: QuickSearch and 8bit terms

Posted: Mon Sep 12, 2011 7:00 pm
by LazyBadger
garvinhicking wrote: Which MySQL version are you using?
Server version: 5.1.49
If you have phpmyadmin, try to execute this query
0 strings
Maybe the UTF-8 characters are not properly en/decoded
At least it seems so: on UTF8 PHPAdmin page all russian content are shown in unreadable form
echo "QUERY: " . $querystring;
Query 1 - "разница"

Code: Select all

QUERY: SELECT ep_sticky.value AS orderkey, ep_cache_extended.value AS ep_cache_extended, ep_cache_body.value AS ep_cache_body, e.id, e.authorid, a.realname AS author, e.allow_comments, e.moderate_comments, a.email, e.timestamp, e.comments, e.title, e.body, e.extended, e.trackbacks, e.exflag, e.isdraft, e.last_modified, a.username AS loginname FROM s_entries e LEFT JOIN s_authors a ON e.authorid = a.authorid LEFT JOIN s_entrycat ec ON e.id = ec.entryid LEFT OUTER JOIN s_entryproperties ep_cache_extended ON (e.id = ep_cache_extended.entryid AND ep_cache_extended.property = 'ep_cache_extended') LEFT OUTER JOIN s_entryproperties ep_cache_body ON (e.id = ep_cache_body.entryid AND ep_cache_body.property = 'ep_cache_body') LEFT OUTER JOIN s_entryproperties ep_access ON (e.id = ep_access.entryid AND ep_access.property = 'ep_access') LEFT JOIN s_entryproperties ep_sticky ON (e.id = ep_sticky.entryid AND ep_sticky.property = 'ep_is_sticky') WHERE (MATCH(title,body,extended) AGAINST('разница')) AND isdraft = 'false' AND timestamp <= 1315845900 AND (ep_access.property IS NULL OR ep_access.value = 'public') GROUP BY e.id ORDER BY timestamp DESC LIMIT 15QUERY: SELECT ep_sticky.value AS orderkey, ep_cache_extended.value AS ep_cache_extended, ep_cache_body.value AS ep_cache_body, e.id, e.authorid, a.realname AS author, e.allow_comments, e.moderate_comments, a.email, e.timestamp, e.comments, e.title, e.body, e.extended, e.trackbacks, e.exflag, e.isdraft, e.last_modified, a.username AS loginname FROM s_entries e LEFT JOIN s_authors a ON e.authorid = a.authorid LEFT JOIN s_entrycat ec ON e.id = ec.entryid LEFT OUTER JOIN s_entryproperties ep_cache_extended ON (e.id = ep_cache_extended.entryid AND ep_cache_extended.property = 'ep_cache_extended') LEFT OUTER JOIN s_entryproperties ep_cache_body ON (e.id = ep_cache_body.entryid AND ep_cache_body.property = 'ep_cache_body') LEFT OUTER JOIN s_entryproperties ep_access ON (e.id = ep_access.entryid AND ep_access.property = 'ep_access') LEFT JOIN s_entryproperties ep_sticky ON (e.id = ep_sticky.entryid AND ep_sticky.property = 'ep_is_sticky') WHERE (MATCH(title,body,extended) AGAINST('разница*' IN BOOLEAN MODE)) AND isdraft = 'false' AND timestamp <= 1315845900 AND (ep_access.property IS NULL OR ep_access.value = 'public') GROUP BY e.id ORDER BY timestamp DESC LIMIT 15
and
- nothing found for разница (must be at least 1 hit)
- empty search-result window (now)

Query 2 - "возраст"

Code: Select all

QUERY: SELECT ep_sticky.value AS orderkey, ep_cache_extended.value AS ep_cache_extended, ep_cache_body.value AS ep_cache_body, e.id, e.authorid, a.realname AS author, e.allow_comments, e.moderate_comments, a.email, e.timestamp, e.comments, e.title, e.body, e.extended, e.trackbacks, e.exflag, e.isdraft, e.last_modified, a.username AS loginname FROM s_entries e LEFT JOIN s_authors a ON e.authorid = a.authorid LEFT JOIN s_entrycat ec ON e.id = ec.entryid LEFT OUTER JOIN s_entryproperties ep_cache_extended ON (e.id = ep_cache_extended.entryid AND ep_cache_extended.property = 'ep_cache_extended') LEFT OUTER JOIN s_entryproperties ep_cache_body ON (e.id = ep_cache_body.entryid AND ep_cache_body.property = 'ep_cache_body') LEFT OUTER JOIN s_entryproperties ep_access ON (e.id = ep_access.entryid AND ep_access.property = 'ep_access') LEFT JOIN s_entryproperties ep_sticky ON (e.id = ep_sticky.entryid AND ep_sticky.property = 'ep_is_sticky') WHERE (MATCH(title,body,extended) AGAINST('возраст')) AND isdraft = 'false' AND timestamp <= 1315845900 AND (ep_access.property IS NULL OR ep_access.value = 'public') GROUP BY e.id ORDER BY timestamp DESC LIMIT 15

- total records: 72 (with unwanted entries \no string inside\ and without expected)

Re: QuickSearch and 8bit terms

Posted: Mon Sep 12, 2011 7:10 pm
by LazyBadger
Notes
1. Table
ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
2. Serendipity used without "SET NAMES"

Re: QuickSearch and 8bit terms

Posted: Tue Sep 13, 2011 4:25 pm
by LazyBadger
Well, Garvin, we can close this issue
Second blog with (almost) the same settings works with 8bit russian text, single difference is SET NAMES ON for good blog.
I suppose, this value have to be documented in user's doc as mandatory for non-English UTF8 blogs

Re: QuickSearch and 8bit terms

Posted: Wed Sep 14, 2011 1:33 pm
by garvinhicking
Hi!

Yes, definitely SET NAMES is the default and should always be used to properly utilize UTF-8.

Users should only change that if they have charset problems (usually fmor an upgraded blog or oder MySQL version) - we AFAIK also state that this setting should only be changed with care...

regards
Garvin