Page 1 of 1

tag plugin - question

Posted: Tue Apr 03, 2007 1:23 pm
by robi-bobi
I just started to use tag plugin and it looks great

But, related entries shown by it do not look ok. It looks like the plugin shows the most recent entry which has at least one same tag as current entry.

I think that it should show entries which share the most tags with current one (and order them the same way: time desc)

Is it just my installation that works this way (I could not find setting for this), or this is default behavior.

Re: tag plugin - question

Posted: Tue Apr 03, 2007 2:01 pm
by garvinhicking
Hi!

You are right, entries are just sorted by timestamp:

Code: Select all

       $q = "SELECT DISTINCT e1.entryid,
                     e2.title
                FROM {$serendipity['dbPrefix']}entrytags AS e1
           LEFT JOIN {$serendipity['dbPrefix']}entries   AS e2
                  ON e1.entryid = e2.id
               WHERE e1.tag IN ('" . implode("', '", $tags) . "')
                 AND e1.entryid != " . (int)$postID . "
                 AND e2.isdraft = 'false'
                     " . (!serendipity_db_bool($serendipity['showFutureEntries'
            ORDER BY  e2.timestamp DESC
               LIMIT " . $this->get_config('show_related_count', 10);
(from function getRelatedEntries() within serendipity_event_freetag

Sadly I don't really know how to group by most common intersecting tags, I don't know if this is even possible with PostgreSQL or SQLite...

Best regards,
Garvin

Posted: Tue Apr 03, 2007 2:33 pm
by robi-bobi
I understand

maybe by using http://php.net/manual/en/function.similar-text.php we can achieve similar effect?

Retrieving all tag records in DB for all entries and using similar_text() on them with current tags?

Sure, this should be cached - with probably long period of cache :)
Maybe the first cache should be made after saving the entry, so users won't feel any delay

I don't know how this would behave on blogs with LOTS of entries, but I suppose that with <1000 entries this should work fine.

This can be added as option to the current plugin configuration with warning that it may slow down the site.

Posted: Tue Apr 03, 2007 2:35 pm
by garvinhicking
Hi!
maybe by using http://php.net/manual/en/function.similar-text.php we can achieve similar effect?
I'm afraid that won't really work. It would mean to fetch ALL entries body texts for EACH entry to compare them next to each other. That's quite performance intensive and would drag everything down. Building a cache would mean to rebuild the cache for All entries for each new entry that is made. If you have a couple of entries, that would mean you might need to wait minutes before you can publish a new entry.

The way to go would be to try to find a suitable SQL Query that does what you need, and then see if it works across MySQL, psotgresql and SQLite.

Best regards,
Garvin

Posted: Tue Apr 03, 2007 2:37 pm
by robi-bobi
hm, sorry, it looks like both similar_text and levenshtein() are for single words only

Posted: Tue Apr 03, 2007 2:39 pm
by robi-bobi
garvinhicking wrote: The way to go would be to try to find a suitable SQL Query that does what you need
Garvin
I agree with that, just was thinking loud :)

edit: another thought: http://dev.mysql.com/doc/refman/5.0/en/ ... earch.html (which might come close but adds other problems ...)

Posted: Tue Apr 17, 2007 1:48 am
by robi-bobi

Code: Select all

CREATE TABLE `entry_tags` (
  `entry_id` int(11) NOT NULL,
  `tag_id` int(11) NOT NULL,
  PRIMARY KEY  (`entry_id`,`tag_id`)
);

SELECT b.entry_id, COUNT(*) as score FROM entry_tags a
JOIN entry_tags b ON
  a.tag_id = b.tag_id AND
  a.entry_id = OUR_ENTRY_ID
GROUP BY b.entry_id
HAVING b.entry_id != OUR_ENTRY_ID
ORDER BY COUNT(*) DESC;
not my sql, so no credits go to me

even if the tables are not connected with such 'link' table, the query looks promising

Posted: Tue Apr 17, 2007 10:44 am
by garvinhicking
Hi!

Can you check if that works with SQLite and postgresql?

The query looks like this with actual s9y tables:

Code: Select all

SELECT b.entryid, COUNT(*) as score FROM serendipity_entrytags a
JOIN serendipity_entrytags b ON
  a.tag = b.tag AND
  a.entryid = OUR_ENTRY_ID
GROUP BY b.entryid
HAVING b.entryid != OUR_ENTRY_ID
ORDER BY COUNT(*) DESC
Regards,
Garvin