Excluding bots from statistics?
Posted: Tue Dec 20, 2005 10:18 pm
by ghoti
Hi,
Now that my statistics plugin works, I'd like to get rid of all the bots. I put a few of the bots' "hostnames" in the exclusion list, but that didn't seem to have any effect. I then cut the list back down to just "msnbot.msn.com" (the default after installing the plugin), and since then it doesn't even exclude that one.
What am I doing wrong? Is that a regular expression? If so, should I escape the period? This is a bit hard to test since I don't know when the next bot will come along ...
Thanks,
Robert
Re: Excluding bots from statistics?
Posted: Tue Dec 20, 2005 10:25 pm
by garvinhicking
Actually, the "hostname" of the bot is not the hostname, but the string of HTTP_USER_AGENT. It is applied as a full string match, not a regular expression.
See line 103 of the serendipity_event_statistics.php plugin, if you care.
Remember that setting this option only affects the tracking of new bots. Old bots that have already been tracked will not be removed from the tracking list.
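To illustrate why a hostname like the default never matches, here is a minimal Python sketch of a full-string user-agent check. It only models the behaviour described above; the function name and keeping the exclusions as a plain list of strings are assumptions, not the plugin's actual PHP code.

```python
from typing import Collection


def is_excluded_exact(user_agent: str, excluded_agents: Collection[str]) -> bool:
    # Hypothetical helper: a full string comparison,
    # with no substring search and no regular expression.
    return user_agent in excluded_agents


msn_ua = "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

# The hostname-style default never equals the real user agent string...
print(is_excluded_exact(msn_ua, ["msnbot.msn.com"]))  # False
# ...and neither does a partial string...
print(is_excluded_exact(msn_ua, ["msnbot/1.0"]))      # False
# ...but the exact user agent string does.
print(is_excluded_exact(msn_ua, [msn_ua]))            # True
```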
HTH,
Garvin
Posted: Tue Dec 20, 2005 10:40 pm
by ghoti
Ah, thanks! I guess the default value threw me off. Looks good so far ... ;)
Posted: Thu Dec 29, 2005 3:06 am
by Josh
Has anyone made a list of the main bots that I could insert there?
Posted: Thu Dec 29, 2005 3:34 am
by judebert
Posted: Sat Jan 28, 2006 1:04 pm
by Michael Harrison
Has anyone been able to get this feature to work? No matter what string I use, the bots are still being recorded and displayed on the stats page.
I'm using s9y 0.91 with stats plugin 1.23.
I'm not PHP-savvy, so I haven't accomplished anything by trawling through the code.
I've tried the Wikipedia strings:
Baiduspider (http://www.baidu.com/search/spider.htm)|Googlebot/2.1 (+http://www.google.com/bot.html)|Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)|Googlebot-Image/1.0| msnbot/1.0 (+http://search.msn.com/msnbot.htm)|Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)|Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)
as well as the strings the stats plugin spits out:
msnbot/1.0 (+http://search.msn.com/msnbot.htm)|Baiduspider+(+http://www.baidu.com/search/spider.htm)|Mozilla/5.0 (compatible;Googlebot/2.1;+http://www.google.com/bot.html)|Technoratibot/0.7| Googlebot/2.1 (+http://www.google.com/bot.html)|Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)|Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
No good either way.
Posted: Sat Jan 28, 2006 1:17 pm
by garvinhicking
The strings you put in the referrer blocks are interpreted as regular expressions. You need to escape every character in your input string that has a special meaning in regular expressions. These include, but are not limited to:
( ) . +
So your string should look something like:
Baiduspider \(http://www.baidu.com/search/spider.htm\)|Googlebot/2.1 \(http://www.google.com/bot.html\)|Mozilla/5.0 \(compatible; googlebot/2.1;|Googlebot-Image/1.0|msnbot/1.0 \(http://search.msn.com/msnbot.htm\)
Read more about regular expressions by searching Wikipedia, php.net, or Google.
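For any field that does treat its input as a regular expression, escaping by hand is error-prone. A minimal Python sketch of letting the regex library do it: each user agent is escaped with Python's `re.escape()` and the results are joined into one alternation. This uses Python's `re` module rather than the plugin's PHP; PHP's `preg_quote()` serves the same purpose there. The agent list is only an example.

```python
import re

bot_agents = [
    "Baiduspider (+http://www.baidu.com/search/spider.htm)",
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
]

# re.escape() backslash-escapes metacharacters such as ( ) . + so each
# user agent is matched literally; "|" then joins them as alternatives.
pattern = re.compile("|".join(re.escape(agent) for agent in bot_agents))


def is_bot(user_agent: str) -> bool:
    # fullmatch() requires the whole string to match one alternative.
    return pattern.fullmatch(user_agent) is not None


print(is_bot("Googlebot/2.1 (+http://www.google.com/bot.html)"))  # True
# The escaped "." no longer matches an arbitrary character:
print(is_bot("Googlebot/2X1 (+http://www.google.com/bot.html)"))  # False
```

Without escaping, the unescaped `(+` alone would make the pattern invalid, since `+` has nothing to repeat.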
Regards,
Garvin
Posted: Sat Jan 28, 2006 1:25 pm
by Michael Harrison
Wow, that was fast.
Thanks for the info. That certainly isn't clear through the admin interface.
BTW, regex I get (I'm a long-time programmer); I just don't do PHP, although I'm thinking more and more about changing that.
Posted: Sat Jan 28, 2006 1:32 pm
by garvinhicking
Uh, hang on. I totally confused this with something different. What I mentioned earlier on is what really applies:
Actually, the "hostname" of the bot is not the hostname, but the string of HTTP_USER_AGENT. It is applied as a full string match, not a regular expression.
See line 103 of the serendipity_event_statistics.php plugin, if you care.
That means you must specify exactly the string that the bot submits as its HTTP user agent. You might want to grep your access log files to check whether the bots really submit their HTTP user agents exactly as you entered them.
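The log check can also be scripted. A minimal Python sketch, assuming Apache's "combined" log format, in which the user agent is the last double-quoted field on each line; the sample line is made up for illustration. The strings it prints are the exact values an exact-match exclusion list would need.

```python
import re
from collections import Counter

# Assumption: combined log format, where the user agent is the final "..." field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Count how often each exact user agent string appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample line for illustration.
sample = [
    '66.249.66.1 - - [28/Jan/2006:13:00:00 +0100] "GET / HTTP/1.1" 200 5120 '
    '"-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```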
I'm really sorry for mixing that up.
Regards,
Garvin
Posted: Sat Jan 28, 2006 10:11 pm
by judebert
If you use Firefox, you can test this by installing the UserAgent extension and setting your user agent to one of the bot agents. Then visit your site and see whether it excludes you from the statistics.
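Without a browser extension, a short script can send a request that claims to be a bot. A minimal Python sketch using the standard library's `urllib.request`; the URL is a placeholder for your own blog, and the actual network call is left commented out.

```python
import urllib.request

BOT_UA = "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"


def make_bot_request(url: str, user_agent: str = BOT_UA) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL: replace with your own blog's address.
request = make_bot_request("http://example.com/")

# Uncomment to send it, then check whether the visit shows up in the statistics:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```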
Posted: Thu Mar 09, 2006 12:11 am
by SHRIKEE
Maybe a silly idea, but would it be possible to use preg_match() to filter out bots? Since bots seem to change hostnames every week nowadays, it's an endless task to keep adding each new one. Why not filter on search.google.com and not worry about the rest? Possibly 'some' normal users wouldn't get through because they have "search" or "bot" in their referrer, but bad luck for them...
Perhaps as an option? I'm working on the statistics plugin anyway, so if it's possible I could include it right away.
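A rough sketch of the pattern-based approach described above, written in Python (`re.search()` with `re.IGNORECASE` plays the role that `preg_match()` with the `i` modifier would play in PHP). The keyword list is an assumption, and any real visitor whose user agent happens to contain one of these words would be excluded too.

```python
import re

# Hypothetical keyword list; matching is case-insensitive.
BOT_KEYWORDS = re.compile(r"bot|spider|crawl|slurp", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    # search() finds a keyword anywhere in the string, unlike an exact match.
    return BOT_KEYWORDS.search(user_agent) is not None


print(looks_like_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(looks_like_bot("Baiduspider+(+http://www.baidu.com/search/spider.htm)"))  # True
print(looks_like_bot("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5"))  # False
```

The trade-off is the one already mentioned: far less list maintenance, at the cost of occasional false positives.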
Posted: Sat Nov 18, 2006 5:00 pm
by yati
Hi,
OK... I don't know much about coding or PHP or anything, so I was wondering if anyone could help me with this issue. I've tried putting the bot's address (as it appears on the stats page) into the exclusion setting of the statistics plugin, but it isn't working.
Could someone please explain to me in plain English, without too much technical stuff, how I could stop the bots from being counted in my stats?
Thanks
Should have been born a blonde,
yati
Posted: Tue Nov 21, 2006 6:48 pm
by judebert
Go to the plugin configuration; there you'll find an entry area for referrer blocking.
In that area, enter the exact user agent string of the bot you want to exclude. (Bot user agent strings can be found on Wikipedia.)
Hit save, and you're done.