Page 1 of 1
Two questions - Search and Top Level Content Display (day)
Posted: Sun Aug 24, 2008 6:12 pm
by genesis
Two questions...
First, can Serepdipity be told to display one days worth of postings on the top page instead of a number of posts? It appears no, from the config screen....
Second, the "basic search" doesn't appear to use Tsearch2 on Postgresql. Is there support for that somewhere?
Thanks!
Re: Two questions - Search and Top Level Content Display (da
Posted: Mon Aug 25, 2008 12:19 pm
by garvinhicking
Hi!
That is true, there is no option for this. The only way to make this work would be to use the smarty serendipity_fetchPrintEntries function and pass it the SQL to it that would be needed to filter by a single date.
Second, the "basic search" doesn't appear to use Tsearch2 on Postgresql. Is there support for that somewhere?
Sadly not, we currently have no active pgsql developers that work on s9y core code, so pgsql support is currently limited to making basic SQL work, not any customized enhancmenets.
Best regards,
Garvin
Posted: Mon Aug 25, 2008 3:08 pm
by genesis
OK.... if I can muddle my way through how all the pieces work implementing Tsearch functionality should be trivial - that I can do, if I can figure out the components necessary to hook into the right places.
I assume the various documentation on the software and its internals will get me where I need to go, assuming I can understand them?

Posted: Mon Aug 25, 2008 3:55 pm
by garvinhicking
Hi!
The current search functionality can be found in s9y's include/functions_entries.inc.php file, function serendipity_searchEntries(). There already is a PGSQL switch, so if you tell me the required SQL/Code, I could insert it in the official distro
HTH,
Garvin
Posted: Mon Aug 25, 2008 5:43 pm
by genesis
To do searches on Full Text in Postgresql, you must have the "tsearch2" module loaded (this assumes 8.x or later)
You also must have an index on the table in question structured as:
"idx_message" gin (to_tsvector('english'::text, message))
"idx_subject" gin (to_tsvector('english'::text, subject))
This gives you both the message and subject fields
To query, you do:
select * from post where to_tsvector('english', message) @@ to_tsquery('bite & me');
This tells the engine to parse the message (you can also parse the subject if you want) and the query terms are "bite" and "me", both of which must be present. Booleans are acceptable (& and |)
This snippet is from my forum code which uses tsearch2 for full-text search capability. It is EXTREMELY fast.
Note that if you want support for other than English, you have to have a lexeme loaded for that language into the tsearch2 module in the database or it won't work.
A more complex example is:
select * from post where to_tsvector('english', subject) @@ to_tsquery('hellfire') and to_tsvector('english', message) @@ to_tsquery('bite & me')
This searches for both the word "hellfire" in the subject line AND both the words "bite" and "me" in the message body. All must be present to return a match.
Does this help?
Posted: Mon Aug 25, 2008 6:17 pm
by garvinhicking
Hi!
That helps a lot. In fact, inside the file I mentioned you currently see this:
Code: Select all
if ($serendipity['dbType'] == 'postgres' ||
$serendipity['dbType'] == 'pdo-postgres') {
$cond['group'] = '';
$cond['distinct'] = 'DISTINCT';
$cond['find_part'] = "(title ILIKE '%$term%' OR body ILIKE '%$term%' OR extended ILIKE '%$term%')";
if you modify this to:
Code: Select all
if ($serendipity['dbType'] == 'postgres' ||
$serendipity['dbType'] == 'pdo-postgres') {
$cond['group'] = '';
$cond['distinct'] = 'DISTINCT';
$cond['find_part'] = "(
to_tsvector('english', title) @@ to_tsquery('$term') OR
to_tsvector('english', body) @@ to_tsquery('$term') OR
to_tsvector('english', extended) @@ to_tsquery('$term')
)");
That this could work?
Regards,
Garvin
Posted: Mon Aug 25, 2008 8:00 pm
by genesis
That appears to work for single terms, but not for booleans.... How do I get a log somewhere of the exact query that is being submitted? The search return shows the correct boolean string, but I get nothing back.
If I submit the query by hand off the command line to the database (with complex booleans) I get the correct result.
Posted: Tue Aug 26, 2008 11:42 am
by garvinhicking
Hi!
I believe booleans are converted from/to HTML entities.
You could patch that mentioned PHP code:
if ($serendipity['dbType'] == 'postgres' ||
$serendipity['dbType'] == 'pdo-postgres') {
$cond['group'] = '';
$cond['distinct'] = 'DISTINCT';
$cond['find_part'] = "(
to_tsvector('english', title) @@ to_tsquery('$term') OR
to_tsvector('english', body) @@ to_tsquery('$term') OR
to_tsvector('english', extended) @@ to_tsquery('$term')
)");
$fp = fopen('/tmp/debug.log', 'a');
fwrite($fp, $cond['find_part'] . "\n");
fclose($fp);
[/code]
This creates a /tmp/debug.log file. You can change the path to match your filesystem.
Regards,
Garvin
Posted: Tue Aug 26, 2008 1:51 pm
by genesis
That's what's happening....
(
to_tsvector('english', title) @@ to_tsquery('bite & me') OR
to_tsvector('english', body) @@ to_tsquery('bite & me') OR
to_tsvector('english', extended) @@ to_tsquery('bite & me')
)
That sucks.... can you think of a reasonable way to prevent that? Without it, booleans won't work....
(Php's a new one for me.... I'm a "C" programmer

)
Posted: Tue Aug 26, 2008 2:14 pm
by garvinhicking
Hi!
You could use:
Code: Select all
if ($serendipity['dbType'] == 'postgres' ||
$serendipity['dbType'] == 'pdo-postgres') {
$term = str_replace('&', '&', $term);
$cond['group'] = '';
$cond['distinct'] = 'DISTINCT';
$cond['find_part'] = "(
to_tsvector('english', title) @@ to_tsquery('$term') OR
to_tsvector('english', body) @@ to_tsquery('$term') OR
to_tsvector('english', extended) @@ to_tsquery('$term')
)");
$fp = fopen('/tmp/debug.log', 'a');
fwrite($fp, $cond['find_part'] . "\n");
fclose($fp);
Is there a SQL statement for postgresql that one can use to check if this vector function is available? This would help s9y to check if it can use that query or not.
Regards,
Garvin
Posted: Tue Aug 26, 2008 2:38 pm
by genesis
Code: Select all
marketticker=> select routine_name from information_schema.routines where routine_name like 'to_tsvector' and specific_catalog='marketticker';
routine_name
--------------
to_tsvector
to_tsvector
to_tsvector
to_tsvector
to_tsvector
(5 rows)
"specific_catalog" is the database you're looking in (someone COULD HAVE loaded tsearch after creating the database; its loaded into the template, so unless intentionally loaded into your schema afterward, having it in the template doesn't help you if the database was created first) so you have to check your database catalog.
Note that the function is overloaded so it will show up in this query more than once; there are lots of other fields available, but I assume all you're looking for is the presence. If so you could use:
Code: Select all
marketticker=> select count(routine_name) > 0 from information_schema.routines where routine_name like 'to_tsvector' and specific_catalog='marketticker';
?column?
-------
t
(1 row)
Returns "t" (true) if the functions are present, and "f" (false) if not.
information_schema showed up around 7.4ish, so anyone with anything reasonably current in the context of Postgresql should have it.
Posted: Tue Aug 26, 2008 3:20 pm
by garvinhicking
Hi!
I just committed this to the official s9y repository:
http://svn.berlios.de/viewcvs/serendipi ... 1&view=rev
Can you check if that works fine for you as well?
Regards,
Garvin
Posted: Tue Aug 26, 2008 3:38 pm
by genesis
One error:
Code: Select all
to_tsvector('english', title) @@to_tsquery('$term') OR
to_tsvector('english', body) @@to_tsquery('$term') OR
to_tsvector('english', extended) @@to_tsquery('$term')
)";
Note that you have an extra closing parenthesis just before the semicolon in what you committed; this is "as corrected" and works.
Posted: Tue Aug 26, 2008 3:49 pm
by garvinhicking
Hi!
Good catch, sorry for that error. Great to hear that it works
Regards,
Garvin
Posted: Tue Aug 26, 2008 3:54 pm
by genesis
Cool - this is a fairly substantial improvement in the search functionality, especially for very large blogs (its smoking fast) and the boolean capability is something that people like as well.