
Bug in next page code?

Posted: Sun Nov 06, 2005 11:11 am
by Guest
Hi,

It looks like the 'next page' link is generated relative to the path being viewed. This means that if an invalid link causes the blog to be served via the ErrorDocument handler, the 'next page' link does not work.

I noticed this while running a Google sitemap crawler: it went into an endless loop after somehow finding its way to a URL like http://www.exampleblog.com/serendipity/pages

In this example, the next page link shows:
http://www.exampleblog.com/serendipity/pages/P2.html

then it finds:
http://www.exampleblog.com/serendipity/pages/P2/P2.html
http://www.exampleblog.com/serendipity/ ... P2/P2.html
and so on.
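For what it's worth, the compounding falls straight out of standard relative-URL resolution (RFC 3986): each fetch resolves the relative href against the previous, already-wrong URL. A minimal Python sketch - the href 'P2/P2.html' here is a hypothetical stand-in for whatever the template actually emits:

```python
from urllib.parse import urljoin

# Each "next page" fetch resolves the relative href against the
# previous (already wrong) URL, so the path keeps getting deeper.
url = "http://www.exampleblog.com/serendipity/pages/"
for _ in range(3):
    url = urljoin(url, "P2/P2.html")  # hypothetical relative href
    print(url)
```

Each iteration appends another P2/ segment, so the crawler never runs out of "new" URLs to follow.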

This might cause issues for search engine bots too?

Cheers
Rob

Re: Bug in next page code?

Posted: Sun Nov 06, 2005 2:12 pm
by garvinhicking
This sounds more like a misconfiguration of your Serendipity URL paths. What did you set your base URL, relative HTTP path, etc. to?

And what's your URL?

Best regards,
Garvin

Posted: Sun Nov 06, 2005 9:41 pm
by Guest
Hi Garvin,

URL is http://www.hypnotherapyregister.com/register/

Settings are:
Full path - /home/XXX/XXXXtherapyregister-www/register/
Uploads path - uploads/
Relative path - /register/
Relative template path - templates/
Relative upload path - uploads/
URL to blog - http://www.hypnotherapyregister.com/register/
Autodetect used HTTP-Host - NO

You can see the behaviour if you type an incorrect URL, e.g.

http://www.hypnotherapyregister.com/register/pages

(somehow our sitemap crawler is picking up this URL - but I can't see an incorrect URL anywhere in the site)

and then the next page link is incorrect.

Cheers
Rob

Posted: Sun Nov 06, 2005 9:50 pm
by garvinhicking
Okay, the problem lies more in the URL you start from, which is invalid (it produces a 404, which is caught by s9y).

Serendipity's path parsing always takes the current page into account, so sadly the only way to prevent this is to avoid "invalid" URLs like your /register/pages in the first place...
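Roughly speaking (this is an illustrative sketch, not the actual s9y code), the link is built from whatever path was requested, so an invalid 404 path carries straight over into the next link:

```python
# Illustrative sketch only -- not the actual Serendipity implementation.
def next_page_link(request_path: str, page: int) -> str:
    # The link is derived from the *current* request path, so a
    # bogus path propagates into the generated pagination link.
    return request_path.rstrip("/") + f"/P{page}.html"

print(next_page_link("/serendipity/pages", 2))     # /serendipity/pages/P2.html
print(next_page_link("/serendipity/pages/P2", 2))  # /serendipity/pages/P2/P2.html
```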

Best regards,
Garvin

Posted: Sun Nov 06, 2005 10:01 pm
by Guest
Hi Garvin,

Thanks for the lightning fast reply!

This is what I mean - if a page that generates a 404 response is requested, the next page function compounds the problem by using the incorrect URL as its starting point. This could be due to a mistyped URL, or an incoming external link that is wrong.

Is it necessary for the next page code to behave this way? It would be better if it used the correct path regardless, so that the potential for such loops is minimised.
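Something along these lines, i.e. anchoring pagination at the configured blog URL instead of the request path (a hypothetical sketch, not s9y code - the URL scheme here is just for illustration):

```python
# Hypothetical sketch: anchor pagination links at the configured
# base URL so a bogus request path cannot leak into them.
BASE_URL = "http://www.hypnotherapyregister.com/register/"

def next_page_link(page: int) -> str:
    return f"{BASE_URL}P{page}.html"

print(next_page_link(2))  # http://www.hypnotherapyregister.com/register/P2.html
```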

On a separate point, could you please remove the full path line from my previous post, as it contains my user name?

Many thanks
Rob

Posted: Sun Nov 06, 2005 10:03 pm
by garvinhicking
The problem is that many plugins operate on the "maybe wrong" URL. We can't actually tell whether a URL is invalid or not, and thus can't easily rewrite one URL into another without breaking plugins like staticpages.

So the only way is to avoid putting links to 404 pages anywhere - you'd need to find where your spider picks up the "/pages" URL, and then we should focus on fixing that...?!

(I'll fix your previous posting!)

Regards,
Garvin

Posted: Sun Nov 06, 2005 10:11 pm
by Guest
garvinhicking wrote: So the only way is to avoid putting links to 404 pages anywhere - you'd need to find where your spider picks up the "/pages" URL, and then we should focus on fixing that...?!
Absolutely - that's what I'm currently trying to work out. But I thought it best to raise the issue I spotted, in case it was a bug.

Thanks again for your help,
Rob

Posted: Sun Nov 06, 2005 10:31 pm
by garvinhicking
If you can isolate the place where it happened, please tell us! :)

Regards,
Garvin

Posted: Sun Nov 06, 2005 10:58 pm
by Guest
Hi Garvin,

Yes, I finally found it - static pages have a hash link in the header, i.e.

Code:

<h4 class="serendipity_title"><a href="#">Static Page title</a></h4>

The sitemap crawler I am using reads the href as '#' and jumps to the conclusion that it is a page in its own right, rather than an anchor within the current page - so the URL ends up pointing at the pages directory.
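Per RFC 3986, an href of "#" resolves back to the current document, which a well-behaved crawler should recognise. A quick Python check (the page URL here is just illustrative):

```python
from urllib.parse import urljoin, urldefrag

page = "http://www.hypnotherapyregister.com/register/pages/example.html"
# A correct resolver maps href="#" back to the page itself,
# not to some new URL under the pages/ directory.
target, fragment = urldefrag(urljoin(page, "#"))
print(target)
```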

So it looks like a bug in the sitemap generator's crawler, not s9y.

Cheers
Rob

ps - anyone else using phpSitemapNG might want to watch out for this. The fix is simple: just exclude 'pages/P2' from the scan.

Posted: Sun Nov 06, 2005 11:09 pm
by garvinhicking
Great detective work! :)

I do think it's a bug in the sitemap crawler; thanks for letting us know :)

Regards,
Garvin