Some ideas for improving the Karma plugin

marcusfriedman
Regular
Posts: 11
Joined: Fri Feb 13, 2009 4:34 am

Post by marcusfriedman »

Here are some ideas for improving the Karma plugin (and more specifically, the way it tracks visits):
  • Avoid counting spammers as visitors. Every time a spammer accesses an article, it is counted as a visit. I think that http:BL could be used to filter those hits so that they don't get counted. From what I understand, http:BL support is already being used in other plugins, so I guess it wouldn't be very difficult to add it to Karma.
  • Avoid counting bots as visitors. This case is similar to the previous one: every time a search engine indexes a blog, its visits are counted as hits, even though they shouldn't be. And with every new crawl, more visits get added. I understand that it's not easy to check for every possible search engine. However, maybe this could also be checked through http:BL, which can tell whether a given IP address matches one of the known search bots. Another approach could be using a transparent image to track visits (I believe that bots don't download images, so they wouldn't add hits).
  • Don't count internal trackbacks as visits. As Garvin already told me, this one is negligible, and I agree. The way I see it, spammers and bots are the two factors that really skew the hit counters. However, I guess that filtering internal trackbacks could be implemented by adding an if clause.
Maybe these options have been considered before and discarded because of performance issues or for some other reason. If performance is a concern, then these additional checks could be added as options, so that users who don't need them can leave them disabled.
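
To make the http:BL idea more concrete, here is a rough sketch in Python (Karma itself is a PHP plugin, so this only illustrates the logic). It assumes my reading of the http:BL interface is right: the lookup name is the access key, then the reversed IPv4 octets, then the zone `dnsbl.httpbl.org`, and the answer has the form `127.<days>.<threat>.<type>`, where type 0 means a search engine and bits 1, 2 and 4 mean suspicious, harvester and comment spammer. The function names and the decision rule are my own and hypothetical; the DNS lookup itself is left out.

```python
import ipaddress
from typing import Optional

HTTPBL_ZONE = "dnsbl.httpbl.org"  # zone name as I understand it; please verify


def httpbl_query_name(access_key: str, ip: str) -> str:
    """Build the DNS name to look up: key + reversed IPv4 octets + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join([access_key, *reversed(octets), HTTPBL_ZONE])


def parse_httpbl_answer(answer: str) -> dict:
    """Decode an answer of the assumed form 127.<days>.<threat>.<type>."""
    first, days, threat, visitor_type = (int(part) for part in answer.split("."))
    if first != 127:
        raise ValueError("unexpected http:BL answer: " + answer)
    return {
        # For search engines (type 0) the middle octets carry other data.
        "days_since_last_activity": days,
        "threat_score": threat,
        "is_search_engine": visitor_type == 0,
        "is_suspicious": bool(visitor_type & 1),
        "is_harvester": bool(visitor_type & 2),
        "is_comment_spammer": bool(visitor_type & 4),
    }


def should_count_visit(answer: Optional[str]) -> bool:
    """Hypothetical rule: skip search engines, harvesters and comment spammers.

    `answer` is None when the IP is not listed, which counts as a normal visit.
    """
    if answer is None:
        return True
    info = parse_httpbl_answer(answer)
    return not (
        info["is_search_engine"]
        or info["is_harvester"]
        or info["is_comment_spammer"]
    )
```

An IP that is only flagged as "suspicious" is still counted here, so that ordinary visitors are not dropped on weak evidence.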

Karma is an extremely useful and powerful plugin, and it would be really nice if its visitor tracking capabilities could be improved.
Marcus Friedman | ellipsys (website) | @marcusfriedman (Twitter)
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Re: Some ideas for improving the Karma plugin

Post by garvinhicking »

Hi!

Hm, the way I read project honeypot is that they do not allow usage in "projects"; people using it must be active supporters, so I don't think s9y would qualify as a valid usage scenario?!

The other s9y plugins use DNSBL access, but I find those lists very much skewed. They almost always include dial-up IPs. That's okay in the usual anti-spam scenario for e-mail, because e-mails are not supposed to be sent from dial-up IPs.

But for reading visitors, exactly those IPs are the ones of people that you DO want to count!

So, I wouldn't really know how to exclude bots in a reliable way. The karma plugin already filters user agents from google, linkwalker, zermelo, niumblecrawler - so if you can supply a list of HTTP User-Agent strings to block, those could be included.
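
A minimal sketch of that kind of user agent filter, in Python for illustration only (Karma is PHP). The four names are the ones listed above; how Karma actually matches them is not stated here, so the case-insensitive substring test and the function name are assumptions.

```python
# Agent fragments as named in this thread; extend the tuple to block more bots.
BOT_AGENT_FRAGMENTS = ("google", "linkwalker", "zermelo", "niumblecrawler")


def is_known_bot(user_agent, fragments=BOT_AGENT_FRAGMENTS):
    """Return True if the User-Agent header contains a known bot fragment.

    Assumption: a case-insensitive substring match is good enough.
    A missing header (None) is treated as "not a known bot".
    """
    agent = (user_agent or "").lower()
    return any(fragment in agent for fragment in fragments)
```

For example, `is_known_bot("Mozilla/5.0 (compatible; Googlebot/2.1)")` matches the `google` fragment, while a plain Firefox user agent does not match any of them.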

Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
marcusfriedman
Regular
Posts: 11
Joined: Fri Feb 13, 2009 4:34 am

Re: Some ideas for improving the Karma plugin

Post by marcusfriedman »

I think that in order to get an http:BL key you must be an active supporter of the Honeypot Project. However, this has already been done in the Spam Protector (RBL) plugin (serendipity_event_spamblock_rbl), which allows you to provide an http:BL key if you have one.

Unfortunately, once a blog has been found by spammers, it will receive several POST requests each day, and each request will increase the hit counters (even when the spam itself doesn't make it through the spam filters, which is fine). You can spot those fake POSTs in your web server logs, check the IPs through the Honeypot Project or similar services, and ban those IPs via .htaccess, but that takes a lot of time and doesn't seem like a good solution, considering that there are hundreds of addresses that should be blocked.

From what I've seen so far, most of these spammers don't even use GET requests; they just send POSTs to specific articles that they have previously indexed. What about not counting POSTs as visits? (I guess that human visitors would be counted anyway because of a previous GET).
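
Just to illustrate what I mean, here is a small Python sketch (not Karma's code, which is PHP; the class and function names are made up for this example). Only GET requests increase the per-entry counter, and everything else is ignored.

```python
from collections import Counter


def counts_as_visit(request_method: str) -> bool:
    """Only GET requests are treated as visits; POSTs and others are ignored."""
    return request_method.upper() == "GET"


class VisitCounter:
    """Hypothetical per-entry hit counter, kept in memory for the example."""

    def __init__(self) -> None:
        self.visits = Counter()

    def record(self, entry_id: int, request_method: str) -> None:
        """Register a request and count it only if it qualifies as a visit."""
        if counts_as_visit(request_method):
            self.visits[entry_id] += 1
```

A human reader who posts a comment is still counted once, because the page was loaded with a GET before the POST was sent.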

With regard to search engines, the Honeypot Project can identify the IPs of the most popular ones. But if those hits have to be filtered using the user agent string, there are sites like user-agents.org that provide such information (even in XML format).
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany

Re: Some ideas for improving the Karma plugin

Post by garvinhicking »

Hi!

"What about not counting POSTs as visits? (I guess that human visitors would be counted anyway because of a previous GET)."
Now that's a very easy and proper approach that should suit your needs perfectly. Indeed, every valid click should be preceded by a GET, and POSTs really should not count as new visits. I've committed your idea:

http://svn.berlios.de/viewcvs/serendipi ... ision=2493

Best regards,
Garvin
Don Chambers
Regular
Posts: 3657
Joined: Mon Feb 13, 2006 2:40 am
Location: Chicago, IL, USA

Re: Some ideas for improving the Karma plugin

Post by Don Chambers »

Sounds like a great improvement!

And, just as a reminder for those using Karma with voting/rating images, I created a bunch of new images for the plugin approximately a year ago. They can be found here.
=Don=
marcusfriedman
Regular
Posts: 11
Joined: Fri Feb 13, 2009 4:34 am

Re: Some ideas for improving the Karma plugin

Post by marcusfriedman »

Hi Garvin! I'm really glad to see that the idea of counting only GET requests as visits is useful, and that it has been committed to the svn repository.

During the last few days, I've been thinking about other possible solutions to the problem of counters getting increased by bots and spammers. I guess that another way to do it would be to include some JavaScript code that notifies Karma.

Since, in theory, the JavaScript code would only be parsed and run by real browsers, any automated process without a proper JS parser and VM wouldn't add new hits.
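
Here is a rough sketch of how that could work, written in Python just to show the logic (Karma is PHP). Everything in it is hypothetical: the `/karma-beacon` path, the `entry` parameter and the function names are invented for the example. The page embeds a tiny script that requests the beacon URL, and only beacon requests increase the counter, so a client that fetches the page without running the script adds nothing.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

BEACON_PATH = "/karma-beacon"  # hypothetical endpoint, not part of Karma


def beacon_snippet(entry_id: int) -> str:
    """HTML fragment to embed in the entry page; it pings the beacon URL."""
    return f'<script>new Image().src = "{BEACON_PATH}?entry={entry_id}";</script>'


def handle_request(url: str, hits: Counter) -> None:
    """Count a visit only when the beacon URL is requested with a valid entry id.

    Plain page requests, from browsers or bots alike, leave the counter alone.
    """
    parts = urlsplit(url)
    if parts.path != BEACON_PATH:
        return
    values = parse_qs(parts.query).get("entry", [])
    if values and values[0].isascii() and values[0].isdigit():
        hits[int(values[0])] += 1
```

Visitors who have JavaScript disabled would not be counted with this approach, so it would probably make sense as an option rather than the default.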


Best regards,
Marcus