Two funky situations with incorrect urls

Posted: Fri Mar 11, 2005 10:26 pm
by petree2
By default, the .htaccess configuration is set up so that URLs that would result in a 404 are just served the index page.
From my .htaccess (Serendipity lives at /blog/):

ErrorDocument 404 /blog/index.php
This might seem like a good idea, but for search engines it's not. Search engines could potentially link to an invalid page because it showed some interesting content from your front page. Yuck.

I'm not sure what the best course of action is to fix this, but if you use a fully qualified URL, it'll redirect instead of just serving the page. For example:

ErrorDocument 404 http://brainsoup.net/blog/index.php
But there's another issue...

Normally archive URLs are archives/###-Title-with-dashes-instead-of-spaces.html, where ### is the entry id.

But Serendipity only checks the number and then returns the page. This is bad for two reasons. First, anyone can change the URLs of your site to be something that you probably wouldn't want (e.g. ###-I-secretly-hate-all-black-people.html), and that's a completely valid URL.

Secondly, it can dilute your rank in search engines: if people mistype URLs and then link to them, you won't get PageRank credit for that link; the new (potentially inflammatory) URL will get it.
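
To make the second issue concrete, here is a minimal Python sketch of a stricter lookup. It is not Serendipity's actual code (which is PHP); the `slugify` scheme, the `resolve` helper and the `entries` dictionary standing in for the database are all assumptions made for illustration. The idea is to keep looking entries up by id, but to answer a mismatched title with a permanent redirect to the canonical URL instead of serving the page under the made-up one.

```python
import re

# Assumed permalink pattern: archives/<id>-<slug>.html
ARCHIVE_RE = re.compile(r"^archives/(\d+)-(.*)\.html$")


def slugify(title):
    """Turn an entry title into a dash-separated slug (assumed scheme)."""
    return re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-")


def resolve(path, entries):
    """Return (status, location) for a requested archive path.

    entries maps entry id -> title and stands in for the database.
    """
    match = ARCHIVE_RE.match(path)
    if match is None:
        return 404, None
    entry_id, _requested_slug = int(match.group(1)), match.group(2)
    if entry_id not in entries:
        return 404, None
    canonical = "archives/%d-%s.html" % (entry_id, slugify(entries[entry_id]))
    if path == canonical:
        return 200, canonical
    # Same id but different title text: redirect permanently rather than
    # serving the entry under the altered URL.
    return 301, canonical
```

With a 301, a mistyped or deliberately altered link would still reach the entry, but search engines would be pointed at the one canonical URL.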

Posted: Sat Mar 12, 2005 5:43 am
by Tys
I have to agree that I'm not impressed with a lot of the rewrite rules. I would want a nonexistent page to go to a real 404, a self-created subdirectory to actually go to that subdirectory instead of being redirected to the main page, etc.

To me the rewrite rules should only be used in specific cases, and leave the rest like normal. /shrug, maybe I just have different ideas on what the point of them is.

Posted: Sat Mar 12, 2005 5:22 pm
by Wolfgang
I have a little photo gallery (simple php-gallery) which has its own URL rewriting with a .htaccess in its subdirectory to produce search-engine-friendly HTML URLs. For some reason that I'm not able to figure out, this doesn't work. I have to deactivate the gallery's URL rewriting because otherwise the internal links in its overview (to the photos) will always kick you to Serendipity's index :(

So I'm not too happy with the current solution either, especially since an external counter (integrated in an iframe within an HTML nugget) only works on the index. On all other pages I see the upper left part of the Serendipity index within the iframe :)

Those two examples are due to the rewrite rules, but since I'm not advanced enough to fiddle with the .htaccess myself, I have to live with it or disable Serendipity's rewriting (which I don't want in any case). So for now, no counter and no short URLs in the gallery.

Posted: Sun Mar 13, 2005 1:52 pm
by garvinhicking
First off, if a real 404 situation is found, Serendipity emits a 404 header. So no problem here.

We need the 404 magic to have custom permalinks and wildcard names, so there's no way around it.

Second, about changing the titles: We could easily also check whether the URL name matches. Even a plugin could do that. But I must admit that I personally really like the way we have wildcard entry names; I use it quite often to access my articles or those of friends. If we were to change it, it would create a bit of a hash-lookup performance hit, and multilingual entries cannot easily be checked (because they have different titles but the same entry id). Also, it allowed us in the past to be flexible with adapting the permalink structure. Of course I see your point about people hurting your PageRank by creating "bad link names". But I think those people should be dealt with separately. It's a thing just like spam: if we give in and adapt to it, we lose functionality, just because a few people abuse the system.

Wolfgang, about your problem: Your gallery should just override the ErrorDocument directive in its own directory, then you'd have no problem inside the subdirectory.

Regards,
Garvin

Posted: Mon Mar 14, 2005 8:27 am
by Wolfgang
"First off, if a real 404 situation is found, Serendipity emits a 404 header. So no problem here."
Hm, seemingly I haven't found such a situation yet. I can enter whatever address I want into my browser. Since I've been running Serendipity, I haven't seen a 404 anymore.

I installed it for a friend who runs Apache 2.x and PHP 5.x on his webspace, and indeed on his site there's a 404 if it's a completely crazy address, but on my installation the 404 no longer exists.

Posted: Mon Mar 14, 2005 4:25 pm
by garvinhicking
Wolfgang - you need to look at your HTTP headers. Even though the s9y page looks like the index page, in fact it is the 404 error page with a 404 error header which Google and others will index as a 404 not found page with no content. :-)

You only get your custom 404 from the hoster if you set URL Rewriting to "None".

Regards,
Garvin

Posted: Mon Mar 14, 2005 9:40 pm
by Wolfgang
Hi garvin,

I'm not sure where to look for the HTTP headers, but I think they should be visible in the source of that page (somewhere in the head section).

Now if you follow this link, which doesn't exist, you'll land on the index, and I can't see any 404:

http://www.wolfgang-stommel.de/notthere/test.php

Posted: Mon Mar 14, 2005 9:46 pm
by garvinhicking
Hi Wolfgang!

Your page actually shows me a 404 HTTP Header perfectly alright. So there's nothing wrong on your side.

I am using Mozilla and the LiveHTTP Headers extension, if you want to try it. :-)

Regards,
Garvin

Posted: Mon Mar 14, 2005 11:06 pm
by Wolfgang
Hm, OK, I'll trust you on that one, you're the boss :D

Is there any way to verify that without this magic extension?

Posted: Tue Mar 15, 2005 12:27 pm
by garvinhicking
Yes, there is:

You can create a telnet connection to your host on port 80:

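telnet www.wolfgang-stommel.de 80
GET /notthere/test.php HTTP/1.0
Host: www.wolfgang-stommel.de

--> Press Enter twice after the Host line (without the HTTP version on the GET line, the server may send back only the page body with no headers). Then you will receive the response from your server, with the HTTP status line and headers on top of the response.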

Or you can try http://www.rexswain.com/httpview.html :-)
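
If telnet isn't at hand, the same check can be scripted. The following is a minimal Python sketch rather than anything that ships with Serendipity; `parse_status_line` and `fetch_status` are names made up for this example. It sends a plain HTTP/1.0 GET with a Host header and reads the status line, which is where the 404 shows up even when the body looks like the index page.

```python
import socket


def parse_status_line(raw):
    """Extract (version, code, reason) from the start of a raw HTTP response."""
    first_line = raw.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = first_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError("not an HTTP status line: %r" % first_line)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason


def fetch_status(host, path, port=80, timeout=10):
    """Send a bare GET request and return the parsed status line."""
    request = ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        raw = b""
        # Read until the first line of the response is complete.
        while b"\r\n" not in raw:
            chunk = sock.recv(4096)
            if not chunk:
                break
            raw += chunk
    return parse_status_line(raw)


# Example (needs network access):
# print(fetch_status("www.wolfgang-stommel.de", "/notthere/test.php"))
```

The script does not follow redirects, so a fully qualified ErrorDocument URL would show up as a redirect status rather than as a 404.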