RSS Aggregator Problem

jerwarren
Regular
Posts: 42
Joined: Fri Feb 10, 2006 8:55 pm

Post by jerwarren »

Unfortunately, that didn't do the trick :(

I didn't apply the patch by hand, but I just now let SPARTACUS update it for me. Spartacus said version 0.10, which is what it says in the patch, so I'm pretty sure your changes are there.

My hypothesis still seems to be correct, though: it only happens on entries that have been modified once after initially being added by the aggregator. Once an entry has changed, it remains 'new' on every subsequent run of the aggregator.

Here's the debug line for the entry that wasn't previously showing up as new, but repeatedly does after I modified it:

DEBUG: parseDate(Thu, 27 Jul 2006 15:11:26 -0700) as 2006-07-27 15:11:26 (strtotime)
DEBUG: pubDate Thu, 27 Jul 2006 15:11:26 -0700 = 1154038286
DEBUG: lookup cache_entries[Ok Sean, Hows This?][2][1156298679] finds 143.
DEBUG: lookup cache_md5[2e099179e65768bae9560588a3c91963] finds nothing. Save 'Ok Sean, Hows This?' as 143.

Post by jerwarren »

I'm fairly useless with PHP, but I've been looking through the plugin, trying to figure out what's doing what where.

Upon cross-referencing the database and the code, I've found that the most recent entry (the one that's repeatedly showing as new) doesn't actually have an entry in the serendipity_aggregator_md5 table.

So it is detected as new because cache_md5[$md5hash] returns false. It simply doesn't have an entry.

Then I tried updating an earlier entry to see what happens. I changed the title slightly, and the aggregator now sees it as an entirely new post and has re-added it. That one shows up in the md5 table just fine.

Is this normal behavior?
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Post by garvinhicking »

Hi!

Maybe that missing MD5 entry comes from the old versions of the plugin, and it should work for new entries?

If you change the title AND the body of an entry, the aggregator might not match that entry at all, because it also uses the title as part of the key that identifies an item.

Best regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/

Post by jerwarren »

"Maybe that missing MD5 entry comes from the old versions of the plugin, and it should work for new entries?"

Well, that was my first thought as well. But when I thought about it more, it seemed that if it were now working properly, it would save the 'new' updated post's MD5 to the table, and the post shouldn't come back as new on the next run of the aggregator. This didn't happen; the MD5 just never gets saved.

It should be saving the MD5 every time it updates, regardless of whether the post was initially aggregated by an old version, no?

Post by garvinhicking »

Hi!

Sadly, I currently do not have my development sandbox around, so I can't really check the code.

An md5 will only be updated when an entry gets saved; if the md5 is the same, no update will be made to the entry. However, the code is currently a bit complex, so I have to wait until I get my new development machine before I can help you...

Best regards,
Garvin
judebert
Regular
Posts: 2478
Joined: Sat Oct 15, 2005 6:57 am
Location: Orlando, FL

Post by judebert »

Garvin, I'm still getting used to the RSS stuff. I mean, I just visit the darn site and see if anything's changed. Sheesh. Durn kids, want everything handed to 'em; when I was young... :lol:

This thread is a bit long, and I'm having a bit of trouble following along. If you can give me a synopsis and an idea of where to look, I'll give it a look.

My understanding so far: when retrieving an RSS feed, we get redirected to some chunk of code that checks an MD5 hash. If the hash is different from a previously stored hash, it decides the entry is new and 1) gets the entry's date-time groups, then 2) recalculates and updates the hash. This isn't working for jerwarren, and it seems that hashes aren't being stored at all.

Correct my understanding, and I'll try to improve.
Judebert
---
Website | Wishlist | PayPal

Post by garvinhicking »

Hi judebert!

Thanks for your involvement. :)

But I think you got it a bit wrong (and yeah, this thread is long). The problem is this:

The RSS aggregator stores several RSS locations that are fetched, parsed, and stored in an s9y DB. Each time the aggregator runs, it fetches all feeds and then checks the DB to see whether each item of an RSS feed is already present. If the item is not present, it gets added. If it is present, the aggregator checks whether the item needs to be updated.

Now the problem is this "is the item present?" check. s9y creates a lookup array of FeedID-Title-Timestamp to associate RSS items with the database. If a lookup fails, a new entry will be stored.

If the lookup has a hit, then the next phase happens, and the content of the RSS item is inspected. For that, an md5 hash of the title and the body is taken from the database and compared with the one calculated from the current RSS item. If those match, the item is not regarded as new. If they do not match, the item needs to be updated (not inserted, because this phase only happens when an entry is in the DB lookup array).

Now this is what seems to fail in jerwarren's situation: the MD5 hash lookup does not regard the hash from the DB and the one from the item as the same. In fact, sometimes the MD5 hash is missing from the database.

The RSS aggregator has an "insertProperties" method, or something like that, which should insert/create those refs...
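
The two-phase check described above can be sketched roughly as follows. This is only an illustration in Python, not the plugin's PHP code: the function names, the dictionary layout, and the way title and body are combined before hashing are all assumptions. Only the names cache_entries and cache_md5 come from the debug output earlier in the thread.

```python
import hashlib


def md5_of(title, body):
    # Assumption: the hash covers title and body concatenated as UTF-8 text.
    return hashlib.md5((title + body).encode("utf-8")).hexdigest()


def classify(item, cache_entries, cache_md5):
    """Decide what to do with one feed item.

    cache_entries maps (feed_id, title, timestamp) -> entry id.
    cache_md5 maps md5 hash -> entry id.
    Both layouts are hypothetical stand-ins for the plugin's lookup arrays.
    """
    key = (item["feed_id"], item["title"], item["timestamp"])
    entry_id = cache_entries.get(key)
    if entry_id is None:
        # Phase 1 miss: the item is unknown and gets inserted.
        return ("insert", None)
    digest = md5_of(item["title"], item["body"])
    if digest in cache_md5:
        # Phase 2 hit: the stored hash matches, nothing to do.
        return ("unchanged", entry_id)
    # Phase 2 miss: the item exists but its content hash is not stored.
    return ("update", entry_id)
```

Under these assumptions, an entry whose hash is missing from the md5 lookup is classified as "update" on every run, which would match the symptom reported in this thread.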

HTH and best regards,
Garvin

Post by jerwarren »

After some more debugging, I've found that when it detects an updated post, it's actually deleting the MD5 hash from the serendipity_aggregator_md5 table. From that point onward, since there is no entry in the table, it is assumed to need updating.

I posted a new post, ran the aggregator, checked the database and saw an entry for it. Then I changed the post, reran the aggregator, checked the database and found it gone.

Post by garvinhicking »

Hi!

Hm, in the code I see no way the MD5 could be set to empty.

The insertProperties() method is the only one that changes the md5hash, and it takes the hash calculated in the current run cycle as an argument. It only does that with the patch I committed some days ago, but you said you have that one, right?

Code:

 # Always update the MD5 hash, to catch updates of an entry properly. Patch by jerwarren!
 $sql = "UPDATE {$serendipity['dbPrefix']}aggregator_md5   SET timestamp = '$t', md5='$md5hash' WHERE entryid = " . (int)$entryid;
Best regards,
Garvin

Post by jerwarren »

Hmm. My reply seems to have disappeared...

Anyway, yeah, I have that code you added. I couldn't find anywhere where it was deleting either, but it's doing it, and only when aggregating a post that has changed since the initial aggregation. Also, it does it for any of the Serendipity feeds I've got it aggregating. (I haven't tried it with non-s9y feeds.) I only noticed it on certain ones at first because those were the ones I edited most frequently.

Have you tried duplicating this now that we know more about what's happening? (like, post something, aggregate it, edit the post and reaggregate it?)

Post by garvinhicking »

Hi!

I can't duplicate that, as I currently don't have my development environment to try.

I hope to be able to do that in 1-2 weeks, when I buy the new PC.

Best regards,
Garvin

Post by judebert »

In that case, perhaps the problem is that $md5hash gets reset or something before it's used.

If you add "print('DEBUG: storing {' . $md5hash . '}');" just before that SQL statement (or better yet, add it to the debug log), you should see exactly what's getting stored. If it's blank, then we've just got to figure out HOW it gets blanked, probably somewhere in updertEntries().
Judebert

Post by jerwarren »

That's the first thing I did in my own debugging process, and the value is valid at save time. It's got to be getting cleared out somewhere else along the way.

Perhaps when the aggregator runs, the entry is being deleted before the check is done?

Post by jerwarren »

I was just playing around with it, and it seems your 'UPDATE' call is malformed or something. If I replace it with an INSERT (which I copied directly from a few lines up), it works as expected.

If I leave the entry in aggregator_md5, your UPDATE call removes it. In any case, this seems to solve my problem.
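
For anyone following along, here is a rough sketch of why an UPDATE on its own cannot bring back a row that is already missing, and of one way to cover both cases. It is written in Python with an in-memory SQLite table rather than the plugin's PHP and database layer. The column names entryid, md5 and timestamp are taken from the UPDATE statement quoted earlier; the table definition and the function names are made up for illustration. It does not explain how the row gets removed in the first place.

```python
import sqlite3


def make_db():
    # Hypothetical stand-in for the aggregator_md5 table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE aggregator_md5 "
        "(entryid INTEGER, md5 TEXT, timestamp INTEGER)"
    )
    return conn


def store_hash(conn, entry_id, md5hash, timestamp):
    # Try to update an existing row first.
    cur = conn.execute(
        "UPDATE aggregator_md5 SET timestamp = ?, md5 = ? WHERE entryid = ?",
        (timestamp, md5hash, entry_id),
    )
    if cur.rowcount == 0:
        # No row matched, so the UPDATE changed nothing; insert instead.
        conn.execute(
            "INSERT INTO aggregator_md5 (entryid, md5, timestamp) "
            "VALUES (?, ?, ?)",
            (entry_id, md5hash, timestamp),
        )
    conn.commit()
```

Calling store_hash() for an entry that has no row yet creates one, and calling it again for the same entry overwrites the stored hash instead of adding a second row.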

Post by judebert »

If you'll do a diff or reply with the new line (is it a verbatim copy from one of the INSERT lines or did you have to modify it?) I'll commit it to the baseline.
Judebert