Karma causing 404's with search engines?
Posted: Sun Sep 17, 2006 1:48 pm
by Michael Harrison
I apologize if this is a dupe, but I couldn't find an instance of this problem in the forum, although I can't quite believe I'm the only one who has seen it.
Google and Yahoo have reported a number of odd 404 errors in connection with my 1.0 s9y install. As near as I can tell, they are caused by the Karma voting links. I don't have concrete proof of this, but I'm wondering if anyone else has seen those engines report that a link returns a 404 when you can in fact browse to the link in question.
How do I go about debugging these problems? The error log never has enough information for me to tell where the errors are really occurring.
Re: Karma causing 404's with search engines?
Posted: Sun Sep 17, 2006 4:46 pm
by garvinhicking
Hi!
Usually 404 errors can be found in the ACCESS log of your webserver, not in the error log.
There you should be able to see which pages the bots do not find.
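As a rough sketch of that check, the snippet below counts which paths were answered with a 404 in an access log. It assumes the log uses the common Apache "combined" format; the helper name `count_by_status` and the sample lines are made up for illustration and are not taken from any real log.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def count_by_status(lines, status="404"):
    """Count requested paths that were answered with the given status code."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == status:
            hits[match.group("path")] += 1
    return hits

# Hypothetical sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [31/Aug/2006:07:50:00 -0700] "GET /ok.html HTTP/1.1" 200 8164 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [31/Aug/2006:07:53:00 -0700] "GET /blog/missing.html HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
]

for path, count in count_by_status(sample).most_common():
    print(count, path)
```

To run it against a real log, pass an open file object instead of `sample`; the file name and location depend on your host.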
Also, what's your URL? Then I could check whether your install is in order.
HTH,
Garvin
Posted: Mon Sep 18, 2006 4:13 am
by Michael Harrison
My blog is at
http://www.dragonseye.com/blog
Unfortunately I don't have direct access to my error logs but I'll have to see if I can get them from my host.
Posted: Mon Sep 18, 2006 5:15 am
by judebert
Looks up and functional. Your error logs won't have the 404s, only the access log. Webstats, AWStats, and other tracking programs use the access log to show you who visited the site.
What kind of redirection are you using in the Serendipity config? Is it the mod_rewrite or the mod_error? The mod_error redirection checks for file existence first, so it might signal 404 errors.
Posted: Mon Sep 18, 2006 12:00 pm
by Michael Harrison
judebert wrote:Looks up and functional. Your error logs won't have the 404s, only the access log. Webstats, AWStats, and other tracking programs use the access log to show you who visited the site.
Sorry, I misread your post above (and I was sleepy).
I've found one example of an odd error, perhaps you can advise...
66.249.65.134 - - [31/Aug/2006:07:50:35 -0700] "GET /forum/viewtopic.php?t=4069&view=next&sid=c7a9d6760d2ef4cd32133ffaed007eef HTTP/1.1" 200 8164 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.134 - - [31/Aug/2006:07:51:18 -0700] "GET /pcg/PCGGI/index.html HTTP/1.1" 302 238 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.134 - - [31/Aug/2006:07:52:47 -0700] "GET /blog/categories/8-Holograms/pages/gallery/v/family/MichaelH/Holograms/pages/contactform/gallery/v/family/MichaelH/Holograms/pages/gallery/v/family/MichaelH/Holograms/SandDollarLeft_amp_Right/P1.html HTTP/1.1" 200 12450 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.134 - - [31/Aug/2006:07:52:48 -0700] "GET /blog/categories/8-Holograms/pages/gallery/v/family/MichaelH/Holograms/SandDollarLeft_amp_Right/pages/contactform/gallery/v/family/MichaelH/Holograms/SandDollarLeft_amp_Right/pages/gallery/v/family/MichaelH/Holograms/SandDollarLeft_amp_Right.jpg.html HTTP/1.1" 200 12465 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
The bot goes from a mundane index page to a URL that doesn't exist anywhere on my site. I'm not sure how that concatenation of s9y and Gallery URLs came about.
judebert wrote:What kind of redirection are you using in the Serendipity config? Is it the mod_rewrite or the mod_error? The mod_error redirection checks for file existence first, so it might signal 404 errors.
I'm using mod_rewrite.
Posted: Mon Sep 18, 2006 12:19 pm
by garvinhicking
Hi!
This concatenation can happen if you use relative URLs somewhere to link to your gallery/static pages.
Like if a link is
pages/gallery/v/family/MichaelH/Holograms.html
instead of
/pages/gallery/v/family/MichaelH/Holograms.html
You might end up with unlimited recursion: the relative paths keep adding up with every link a bot follows, and each resulting URL still works because s9y just matches by ID and not by the full URL string.
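To illustrate how the paths pile up, here is a small sketch that resolves the same link repeatedly the way a crawler would, using Python's standard `urljoin`. The host `example.com`, the starting page and the helper name `follow` are assumptions for the demo; only the two link paths come from the example above.

```python
from urllib.parse import urljoin

def follow(base, href, times):
    """Resolve href against the current page URL, `times` times in a row."""
    url = base
    for _ in range(times):
        url = urljoin(url, href)
    return url

# Hypothetical starting page on a placeholder host.
start = "http://example.com/blog/categories/8-Holograms/index.html"

relative = "pages/gallery/v/family/MichaelH/Holograms.html"
absolute = "/pages/gallery/v/family/MichaelH/Holograms.html"

# The relative link grows by one copy of its directory part per hop.
for hops in (1, 2, 3):
    print(follow(start, relative, hops))

# The link with a leading slash always resolves to the same URL.
print(follow(start, absolute, 3))
```

The relative form is resolved against the directory of whatever page the bot is currently on, so every hop nests the path one level deeper, while the form with the leading slash is resolved against the site root each time.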
Thus you'll need to check your blog thoroughly for any relative links that might cause this.
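One way to do that check is to run the HTML of a page (saved, or fetched however you like) through a small scanner. This is only a sketch using Python's standard `html.parser`; the class name `RelativeLinkFinder`, the helper `is_path_relative` and the sample markup are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

def is_path_relative(url):
    """True for links like 'pages/x.html' that lack both a host and a leading slash."""
    parts = urlsplit(url)
    if parts.scheme or parts.netloc:
        return False  # full URL (http://..., mailto:...) or //host/... form
    if not parts.path:
        return False  # pure '#fragment' or '?query' link
    return not parts.path.startswith("/")

class RelativeLinkFinder(HTMLParser):
    """Collect (tag, value) pairs for href/src attributes with relative paths."""

    def __init__(self):
        super().__init__()
        self.relative = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and is_path_relative(value):
                self.relative.append((tag, value))

# Hypothetical markup, for illustration only.
page = (
    '<a href="pages/gallery/a.html">a</a>'
    '<a href="/pages/b.html">b</a>'
    '<a href="http://example.com/c">c</a>'
    '<img src="img/d.png">'
    '<a href="#top">top</a>'
)

finder = RelativeLinkFinder()
finder.feed(page)
for tag, value in finder.relative:
    print(tag, value)
```

Anything it reports is a candidate for the recursion described above; links starting with `/` or with `http://` are left alone.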
HTH,
Garvin
Posted: Mon Sep 18, 2006 12:54 pm
by Michael Harrison
Thanks. I've fixed a number of references that didn't start with http:// and should have. I'll see how things go from here.
My thanks to both of you for your help.