
Some ideas for improving the Karma plugin

Posted: Mon Mar 23, 2009 12:57 am
by marcusfriedman
Here are some ideas for improving the Karma plugin (and more specifically, the way it tracks visits):
  • Avoid counting spammers as visitors. Every time a spammer accesses an article, it is counted as a visit. I think that http:BL could be used to filter those hits so that they don't get counted. From what I understand, http:BL support is already being used in other plugins, so I guess it wouldn't be very difficult to add it to Karma.
  • Avoid counting bots as visitors. This case is similar to the previous one: every time a search engine indexes a blog, its visits are counted as hits, even though they shouldn't be. And for every new crawl, more visits get added. I understand that it isn't easy to check for every possible search engine. However, maybe this could also be checked through http:BL, which can tell whether a given IP address matches one of the known search bots. Another approach could be using a transparent image to track visits (I believe bots don't download images, so they wouldn't add hits).
  • Don't count internal trackbacks as visits. As Garvin already told me, this one is negligible, and I agree. The way I see it, spammers and bots are the two factors that really skew the hit counters. However, I guess that filtering internal trackbacks could be implemented by adding an if clause.
Maybe these options have been considered before and discarded because of performance issues or other reasons. If performance is a concern, these additional checks could be added as options, so that users who don't need them can leave them disabled.
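For what it's worth, an http:BL lookup is just a DNS query, so the check itself should be cheap. A rough Python sketch of the idea (the access key is a placeholder you'd get from Project Honey Pot, and the response layout follows their published format):

```python
import socket

def reverse_ip(ip):
    """1.2.3.4 -> 4.3.2.1, as DNS-based blocklists require."""
    return ".".join(reversed(ip.split(".")))

def parse_httpbl_answer(answer):
    """An http:BL answer looks like 127.<days>.<threat>.<type>.
    The type octet is a bitmask: 1=suspicious, 2=harvester,
    4=comment spammer; type 0 marks a known search engine."""
    _, days, threat, visitor_type = (int(o) for o in answer.split("."))
    return days, threat, visitor_type

def httpbl_lookup(ip, access_key):
    """Returns (days, threat, type), or None if the IP is not listed
    (i.e. most likely an ordinary visitor)."""
    query = "%s.%s.dnsbl.httpbl.org" % (access_key, reverse_ip(ip))
    try:
        return parse_httpbl_answer(socket.gethostbyname(query))
    except socket.gaierror:
        return None  # NXDOMAIN: IP not listed
```

A Karma-style check could then skip the hit when the type bitmask includes the comment-spammer bit, and optionally when type is 0 (search engine).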

Karma is an extremely useful and powerful plugin, and it would be really nice if its visitor tracking capabilities could be improved.

Re: Some ideas for improving the Karma plugin

Posted: Mon Mar 23, 2009 12:44 pm
by garvinhicking
Hi!

Hm, the way I read Project Honey Pot's terms, they do not allow usage in "projects"; people using it must be active supporters, so I don't think s9y would qualify as a valid usage scenario?!

The other s9y plugins use DNSBL access, but I find those lists very much skewed. DNSBL lists almost always include dial-up IPs. That's okay for the usual e-mail anti-spam scenario, where mail is not supposed to be sent directly from dial-up IPs.

But for reading visitors, exactly those IPs are the ones of people that you DO want to count!

So, I wouldn't really know how to exclude bots in a reliable way. The karma plugin already filters the user agents of google, linkwalker, zermelo and niumblecrawler - so if you can supply a list of HTTP User-Agent strings to block, those could be included.
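A filter along those lines boils down to a case-insensitive substring match. A minimal sketch (the marker strings below are examples, not the karma plugin's actual list):

```python
# Hypothetical User-Agent blocklist check; extend BLOCKED_AGENT_MARKERS
# with whatever crawler names you want to filter.
BLOCKED_AGENT_MARKERS = ["googlebot", "linkwalker", "zermelo", "slurp"]

def is_blocked_agent(user_agent):
    """True if the User-Agent string contains any blocklisted marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BLOCKED_AGENT_MARKERS)
```

The downside Garvin points out still applies: this only catches bots that identify themselves honestly in the User-Agent header.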

Regards,
Garvin

Re: Some ideas for improving the Karma plugin

Posted: Wed Mar 25, 2009 12:14 am
by marcusfriedman
I think that in order to get an http:BL key you must be an active supporter of the Honeypot Project. However, this has already been done in the Spam Protector (RBL) plugin (serendipity_event_spamblock_rbl), which allows you to provide an http:BL key if you have one.

Unfortunately, once a blog has been found by spammers, it will receive several POST requests each day, and each request will increase the hit counters (even when the spam itself doesn't make it through the spam filters, which is fine). You can spot those fake POSTs in your web server logs, check the IPs through the Honey Pot Project or similar services, and ban those IPs via htaccess, but that requires a lot of time and doesn't seem like a good solution, considering that there are hundreds of addresses that would need to be blocked.

From what I've seen so far, most of these spammers don't even use GET requests; they just send POSTs to specific articles that they have previously indexed. What about not counting POSTs as visits? (I guess human visitors would be counted anyway because of a previous GET.)
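The check itself would be trivial. A sketch of the idea, assuming the request method is available where the hit is counted:

```python
# Sketch: count a hit only for GET requests, so the POST floods sent
# by comment spammers never reach the counter.
def is_countable(method):
    """Only GET requests count as visits."""
    return method.upper() == "GET"

# A simulated request log: one human visit (GET), spam POSTs, one crawl GET.
requests = ["GET", "POST", "POST", "GET", "POST"]
hits = sum(1 for m in requests if is_countable(m))
```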

With regard to search engines, the Honey Pot Project can identify the IPs of the most popular ones. But if those hits have to be filtered by User-Agent string, there are sites like user-agents.org that provide such information (even in XML format).

Re: Some ideas for improving the Karma plugin

Posted: Wed Mar 25, 2009 11:42 am
by garvinhicking
Hi!

marcusfriedman wrote:
"What about not counting POSTs as visits? (I guess that human visitors would be counted anyway because of a previous GET)."
Now that's a very easy and proper way that should suit your needs perfectly. Indeed, all valid clicks should be preceded by a GET, and POSTs really should not count as new visits. I've committed your idea:

http://svn.berlios.de/viewcvs/serendipi ... ision=2493

Best regards,
Garvin

Re: Some ideas for improving the Karma plugin

Posted: Wed Mar 25, 2009 6:12 pm
by Don Chambers
Sounds like a great improvement!

And, just as a reminder for those using Karma with voting/rating images, I created a bunch of new images for the plugin approximately a year ago. They can be found here.

Re: Some ideas for improving the Karma plugin

Posted: Wed Apr 01, 2009 2:35 am
by marcusfriedman
Hi Garvin! I'm really glad to see that the idea of counting only GET requests as visits can be useful, and that it has been committed to the SVN repository.

During the last few days, I've been thinking about other possible solutions to the problem of counters being inflated by bots and spammers. I guess another approach would be to include some JavaScript code that notifies Karma.

Since, in theory, the JavaScript code would only be parsed and run by real browsers, any automated process without a proper JS parser and VM wouldn't add new hits.
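To make the idea concrete: the article page would embed a small script that requests a separate counting endpoint, and only that endpoint would touch the counter. A server-side sketch in Python (the paths, snippet, and handler are hypothetical, not Karma's actual API):

```python
# Hypothetical beacon-based counting: serving a page never increments
# the counter; only the beacon request (fired by the embedded script,
# which bots typically never execute) does.
BEACON_SNIPPET = ('<script>new Image().src='
                  '"/karma/beacon?article=42";</script>')

visits = {}

def handle_request(method, path, article_id):
    """Increment the counter only for a GET to the beacon path."""
    if method == "GET" and path == "/karma/beacon":
        visits[article_id] = visits.get(article_id, 0) + 1

handle_request("GET", "/archives/42-article.html", 42)  # bot or browser fetches the page: no count
handle_request("POST", "/comment.php", 42)              # comment spammer: no count
handle_request("GET", "/karma/beacon", 42)              # a real browser ran the script: count
```

The trade-off is that visitors with JavaScript disabled would no longer be counted either.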


Best regards,
Marcus