Robot Exclusion?

Posted: Wed Apr 20, 2005 4:43 pm
by boone
Is there any easy way to tell web spiders like Googlebot to not index stuff like the captcha graphics?

Re: Robot Exclusion?

Posted: Wed Apr 20, 2005 6:40 pm
by garvinhicking
Using a robots.txt like this:

Code:

User-agent: *
Disallow: /serendipity/plugin/
should help. I don't know whether wildcards are allowed in Disallow (they are a nonstandard extension to the original robots.txt standard, though Googlebot supports them); if so, you could do:

Code:

User-agent: *
Disallow: /serendipity/plugin/spamblock_captcha*
to filter only those files. And of course, you can also block specific paths, referrers, or bots via mod_rewrite rules.
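Such a mod_rewrite block might look like this (a sketch only; "BadBot" is a placeholder user-agent string, and it assumes Apache with mod_rewrite enabled):

Code:

# Deny any request whose User-Agent header contains "BadBot" (case-insensitive)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule .* - [F]

This returns a 403 Forbidden to the matching crawler instead of merely asking it to stay away, which works even for bots that ignore robots.txt.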

Sadly, rel=nofollow does not work for <img> tags.

Regards,
garvin

robots

Posted: Thu May 26, 2005 7:10 pm
by kjoker
I have robots.txt in the main serendipity folder with this config:

User-agent: *
Disallow: /

But hm... robots still spidered my page. The most annoying one is askjeeves.com; it spidered the categories, too.

???
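As a quick sanity check of what a robots.txt actually blocks, you could test it with Python's standard-library urllib.robotparser (a sketch; example.com is a placeholder host):

```python
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt above
rules = """User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A well-behaved crawler must not fetch anything under these rules
print(rp.can_fetch("Googlebot", "http://example.com/serendipity/"))  # False
print(rp.can_fetch("Teoma", "http://example.com/archives/"))         # False
```

If this prints False and a bot still crawls the page, the bot is simply ignoring robots.txt, since the file itself is correct.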

Re: robots

Posted: Fri May 27, 2005 12:50 pm
by garvinhicking
Then you should contact askjeeves and tell them to honour your robots.txt file.

Of course you could create a block at the .htaccess/VirtualHost level, but that is hard to maintain.

Regards,
Garvin

thanks :)

Posted: Wed Jun 01, 2005 3:15 pm
by kjoker
Got it to work. Actually, I added the meta tags noindex,nofollow... and I also added another robots.txt in the root folder :)
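For anyone else trying this, the meta tag approach presumably looks something like the following, placed in the <head> of the templates (a sketch; it assumes you can edit the theme's header template):

Code:

<head>
  <!-- Ask compliant crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex,nofollow">
</head>

Unlike robots.txt, this is a per-page instruction, so it only covers pages that actually include the tag.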

Posted: Tue Aug 30, 2005 2:02 pm
by MySchizoBuddy
Garvin, what else do you think should be disallowed for security reasons? The Google sitemap file is one of them. What else?

Posted: Wed Aug 31, 2005 1:27 pm
by garvinhicking
I don't think anything needs to be disallowed for security reasons. It's only about files accessible via HTTP, and we have already secured those files as well as possible.

Blocking files only matters if people want to avoid the extra traffic caused by robots... I have not investigated which files the robots do not need to index.

Regards,
Garvin