Migration from TXP: URI rewrite, Redirect, UTF8, Import

Posted: Mon Apr 24, 2006 12:36 pm
by Kossatsch
I am thinking about migrating from TXP and I have the following questions:

URI rewrite
Currently, with TXP, I am using the URI scheme http://www.domain.com/section/id/titleofthearticle. Making an s9y test install, I saw I could rewrite the URI to /id/titleofthearticle. Very fine. But s9y changed this to http://www.domain.com/index.php?/id/titleofthearticle. Did I make a mistake? I would like to have it like this: http://www.domain.com/id/titleofthearticle. Any chance to do this?

Redirect
In addition, I would like to change the domain. Any idea how to write the redirect, following the URIs above?

UTF8
I imported my articles into another s9y install at a commercial provider to test it. It worked fine, really, including the comments. But it seems to have problems with umlauts: my TXP install is completely in UTF-8, and no matter whether I choose UTF-8 or ISO-8859-1, they are broken. Any idea about this?

Import
Another import problem is the following: I am using the TXP custom field to tag my articles. Any chance to migrate this to the s9y tagging plug-in?

So many problems, I know. Thanks in advance for your help and comments.

Re: Migration from TXP: URI rewrite, Redirect, UTF8, Import

Posted: Mon Apr 24, 2006 12:47 pm
by garvinhicking
Hi!

Great that you're evaluating Serendipity! We hope you like what you see, and it's good that you came here to ask about these issues. :-)

About URI rewrite: If s9y builds those URLs with the index.php?/ prefix, it means that in your s9y configuration "URL Rewriting" is not set to either mod_rewrite or Apache Errorhandling. Please choose one of them. mod_rewrite requires the Apache module of that name, and Apache Errorhandling requires that the server's "AllowOverride" setting permits overrides from an .htaccess file.

However, there is one more problem with your scheme: Usually Serendipity requires a prefix to identify that you are viewing an entry. By default this is "/archives". So "/archives/id/title" would work, but plain "/id/title" will most probably cause problems, as Serendipity might not be able to discover that "/" is the base URL for archives. If you set your permalink string to that, you MUST set your archives path setting to "/" to try it out.

About the redirection, I don't fully understand. Do you want Serendipity to listen on two domain names, like www.example.com and www2.example.com? This simply works if you enable "HTTP Host auto-detection" in serendipity's configuration...

About UTF8: Did you also set your serendipity installation to use the UTF-8 charset? The importer actually only reads the MySQL DB and then puts the same contents into the s9y database. So if both DB tables are in UTF-8 collations, it shouldn't differ...

Importing custom field tags would not be hard, but it would require someone with TXP and PHP knowledge. The freetag plugin of Serendipity has a very simple table schema, so it should be easy to fetch the tags from the TXP DB and just write them to the freetag DB table.
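To illustrate the "fetch the tags and write them to the freetag table" step, here is a minimal sketch in Python rather than PHP. It assumes the TXP custom field holds a comma-separated list of tags and that the target table wants one (entry id, tag) row per tag; both are assumptions, not something confirmed in this thread. The function names are hypothetical, and reading from and writing to the databases is left out.

```python
from typing import Iterable, List, Optional, Tuple


def split_tags(raw: str) -> List[str]:
    """Split a comma-separated tag string into trimmed tags.

    Duplicates are dropped case-insensitively; the first spelling wins.
    """
    seen = set()
    tags = []
    for part in raw.split(","):
        tag = part.strip()
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags


def tag_rows(entries: Iterable[Tuple[int, Optional[str]]]) -> List[Tuple[int, str]]:
    """Turn (entry_id, raw_tag_field) pairs into one (entry_id, tag) row per tag."""
    rows = []
    for entry_id, raw in entries:
        for tag in split_tags(raw or ""):
            rows.append((entry_id, tag))
    return rows


if __name__ == "__main__":
    # Made-up sample data: entry ids with the content of their custom field.
    sample = [(985, "web, blogs , Web"), (1035, ""), (1040, "music")]
    for row in tag_rows(sample):
        print(row)
```

Each resulting row could then be inserted into the tagging table with an ordinary INSERT, provided the entry ids survive the import unchanged.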

I hope you'll get some of these things sorted out, and that at least a part of my answer could help you. :-)

Best regards,
Garvin

Posted: Mon Apr 24, 2006 1:02 pm
by Kossatsch
Thanks for your quick answer. One more comment about the redirection problem. Currently, I have the following URI scheme

http://www.olddomain.com/blogname/id/titleofarticle

This old address will be removed after installing s9y. The old URI should be redirected to

http://www.newdomain/archives/id/titleofarticle or to
http://www.newdomain/archives/titleofarticle

(based on your remarks about the problem with using only the id). Concerning the second possibility, I would like to know whether s9y notices that there may be two articles with the same title and changes it automatically or tells the user (as TXP does).

Posted: Mon Apr 24, 2006 1:16 pm
by garvinhicking
Hi!

Yes, such a rewrite would work! Do you have mod_rewrite available? Then I can give you a rewrite pattern; otherwise I can give you an Apache-compatible index.php file for redirecting.
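The mapping such a redirect has to perform can be sketched as follows. This is only an illustration in Python of the pattern matching involved, not the rewrite rule or index.php file offered above. The path prefix "/blogname" and the target base are taken from the example URLs in this thread, and the sketch assumes the article ids stay the same after the import, which is not confirmed here.

```python
import re
from typing import Optional

# Values copied from the example URLs in this thread; adjust to the real site.
OLD_PATH = re.compile(r"^/blogname/(\d+)/([^/?#]+)/?$")
NEW_BASE = "http://www.newdomain/archives"


def redirect_target(old_path: str) -> Optional[str]:
    """Return the new URL for an old TXP path, or None if the path does not match."""
    match = OLD_PATH.match(old_path)
    if match is None:
        return None
    entry_id, title = match.groups()
    return f"{NEW_BASE}/{entry_id}/{title}"


if __name__ == "__main__":
    print(redirect_target("/blogname/985/titleofarticle"))
    print(redirect_target("/something/else"))
```

Keeping the id in the target URL avoids the ambiguity of two articles sharing one title.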

If you omit the "id" in the article, serendipity will deliver the first article with that title. Currently there is no interface to redirect the user to multiple matching articles - it's a good feature suggestion!

Best regards,
Garvin

Posted: Mon Apr 24, 2006 1:20 pm
by Kossatsch
garvinhicking wrote: Hi!

If you omit the "id" in the article, serendipity will deliver the first article with that title. Currently there is no interface to redirect the user to multiple matching articles - it's a good feature suggestion!
I'm not sure, but I would suggest forcing s9y to add "-i" or "-ii" and so on to the article URI.

Posted: Mon Apr 24, 2006 1:24 pm
by garvinhicking
Hi!

Actually I'd think it would be better to just introduce a redirector page into s9y. Like "There are multiple matches for this URL. Please choose your destination: [exact links with posting date and body preview]".

Introducing "-1, -2..." patterns in the title is IMHO not so intuitive...

Regards,
Garvin

UTF-8 problem remains

Posted: Mon May 15, 2006 9:31 am
by Kossatsch
I finally did the import. But now I have several major problems:
  • The UTF-8 import did not work properly. Titles have been converted, entries not. Now I could export the database and convert the umlauts. Should I convert them into HTML entities or anything else? This would be a clean way, but will the blog remain searchable afterwards (people may be searching for über, but not for &uuml;ber)?
  • s9y put all the content into the extended body field. I would prefer to have it in the entry body field. There should be an easy SQL command to do so (I simply do not remember it).
Thanks in advance for your replies.

Re: UTF-8 problem remains

Posted: Mon May 15, 2006 9:40 am
by garvinhicking
Hi!

Hm, did you configure your s9y to use UTF-8? And are both your TXP and the s9y entry tables in SQL UTF-8 collations?

Because s9y pipes all the raw input data as it gets them from MySQL into the S9Y database.

I wouldn't suggest converting umlauts to HTML entities. Instead I'd suggest making an SQL dump of the s9y table after the import, then using an editor to fix up the entities properly, and then re-importing the SQL dump.

About URL rewriting: s9y builds URLs with the serendipity_makeFilename() function found in include/functions_permalinks.inc.php, which you might want to patch. About the difference in URLs, that is true. You might need to fix that by creating a mod_rewrite rule that will replace a /856/ with "/856/notitle.htm".

To put everything from extended into body, you could use this SQL:

UPDATE serendipity_entries SET `body` = concat(`body`, `extended`);
UPDATE serendipity_entries SET `extended` = '';

HTH,
Garvin

Posted: Mon May 15, 2006 9:43 am
by Kossatsch
Something very annoying I have to add. The comments were imported in a completely arbitrary order, which makes it quite difficult to follow the imported conversations (okay, not if you look at the date, but you won't do that with about 20 comments...).

Here is an example: http://roxomatic.de/archives/985/Social ... 5#comments

Posted: Mon May 15, 2006 9:53 am
by Kossatsch
use an editor to fixup the entities properly
Um, but with what? Can I simply replace the wrong entity with ä, ö, ü? (Question from a complete Unicode ignoramus.)

Lastly, something very nice to add: s9y searches and finds words with HTML entities (I still have some remnants of my very early MT blog with HTML entities in the TXP database).
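If those leftover entities should ever be turned into real characters in an SQL dump, something like the following Python sketch could do it. It is an assumption that a script is wanted at all; an editor works too. The function name is hypothetical. It only decodes named entities that stand for non-ASCII characters, so markup-related entities such as &amp;lt; or &amp;amp; stay as they are.

```python
import html
import re

# Matches named entities such as &uuml; (numeric ones are left alone here).
ENTITY = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")


def decode_letter_entities(text: str) -> str:
    """Replace named entities for non-ASCII characters with the characters themselves."""

    def replace(match: "re.Match[str]") -> str:
        decoded = html.unescape(match.group(0))
        # Keep entities that decode to ASCII (e.g. &lt;, &amp;) or are unknown.
        if len(decoded) == 1 and ord(decoded) > 127:
            return decoded
        return match.group(0)

    return ENTITY.sub(replace, text)


if __name__ == "__main__":
    print(decode_letter_entities("&uuml;ber &amp; &lt;b&gt;"))
```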

Posted: Mon May 15, 2006 10:33 am
by garvinhicking
Hi!

The comments are imported in the order of their insertion into the TXP DB, and their timestamps should be converted. At the time I wrote the importer, TXP seemed to me to have no means to indicate a "thread order" of the comments, so they are in fact only ordered chronologically.

Is that different now, is a DB column available to check how a comment relates to another?

About your UTF question: Yes, just load the SQL file into your editor, replace all the broken umlauts with ä, ö, ü, and then make sure to save the file in UTF-8 format before re-importing.
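As an alternative to search-and-replace in an editor, the repair can be sketched in a few lines of Python. This assumes the damage is the common case where UTF-8 bytes were read as ISO-8859-1 exactly once, so that "ü" shows up as "Ã¼"; whether that is what happened here is not certain. The function name is hypothetical.

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of UTF-8 bytes having been decoded as ISO-8859-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage, or already correct: leave the text untouched.
        return text


if __name__ == "__main__":
    for sample in ("Ã¼ber", "über", "plain text"):
        print(repr(sample), "->", repr(fix_mojibake(sample)))
```

Because a string that mixes broken and correct characters is returned unchanged, it is safer to apply this per field or per line than to the whole dump at once.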
Lastly, something very nice to add: s9y searches and finds words with HTML entities (I still have some remnants of my very early MT blog with HTML entities in the TXP database).
Oh, that must be a feature of MySQL then :-D

Best regards,
Garvin

Posted: Mon May 15, 2006 3:02 pm
by Kossatsch
I wouldn't suggest converting umlauts to HTML entities. Instead I'd suggest making an SQL dump of the s9y table after the import, then using an editor to fix up the entities properly, and then re-importing the SQL dump.
HTML-entities in the titles are displayed as such. This doesn't work.

But after fixing up all entities (saving it as UTF-8 with PSPad), I re-imported the SQL dump and it is the same thing as before, even worse, because now all the entities in the titles (properly imported before) are messed up.

Posted: Mon May 15, 2006 3:10 pm
by garvinhicking
Hi!

Hm, how did you upload the SQL dump? Did you make sure that you used a MySQL client that was in UTF-8 mode?

Alternately you might want to save the file in ISO-8859-1 format and upload it that way; it seems your DB is running in ISO mode?!

Regards,
Garvin

Posted: Tue May 16, 2006 10:09 am
by Kossatsch
It was me, of course, who messed the Unicode thing up.
The comments are imported in the order of their insertion into the TXP DB, and their timestamps should be converted. At the time I wrote the importer, TXP seemed to me to have no means to indicate a "thread order" of the comments, so they are in fact only ordered chronologically.
The comments are messed up, indeed, but this is not so important to me. If you have a look at http://roxomatic.de/archives/1035/Vor-7 ... k#comments (yes, some Unicode stuff is remaining), you see that s9y does not care about the date. Any chance to change this? I may redo the comments by hand, but with about 500 comments that is not really fun...

Another important thing: I copied the images into a subdirectory of uploads (about 200 images), then I tried to rebuild the thumbnails ("Vorschauen erneuern" in the German interface). After producing some thumbnails the server returned an error, and now it cannot read the file .empty any more. Any chance to do this another way?

Last thing: sometimes I am forced to work with s9y and IE. Then it is sometimes possible for me to load plugins from Spartacus, sometimes not. Any idea why?

Posted: Tue May 16, 2006 11:30 am
by garvinhicking
Hi!

I think the problem is that TXP stores the date of a comment in human-readable format, not as a timestamp. Thus, s9y does a "strtotime()" conversion, which seems to fail on your install.

What is the format of the date of your comments in the originating TXP Table?
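Depending on the answer, the conversion itself is simple. The following Python sketch shows the idea of turning a human-readable date into a Unix timestamp and returning nothing when the format does not match, which is roughly the failure suspected above. The format string is an assumption (the usual MySQL DATETIME layout), the dates are treated as UTC for simplicity, and the function name is hypothetical; this is not the importer's code.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed layout of the TXP comment date column; adjust to the real data.
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_timestamp(posted: str, fmt: str = ASSUMED_FORMAT) -> Optional[int]:
    """Convert a date string to a Unix timestamp, or None if it does not parse."""
    try:
        parsed = datetime.strptime(posted.strip(), fmt)
    except ValueError:
        return None
    # Treat the stored value as UTC (an assumption made for this sketch).
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


if __name__ == "__main__":
    comments = ["2006-05-16 10:09:00", "2006-04-24 12:36:00", "16.05.2006"]
    for posted in comments:
        print(posted, "->", to_timestamp(posted))
```

Comments whose date fails to parse would need a fallback value, otherwise they all end up with the same timestamp and lose their order.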

About the "rebuild thumbnails": Sadly s9y chokes on directories prefixed with a ".", like the ones SVN creates. This has been addressed in Serendipity 1.1, but still causes some woes on 1.0, and the fix is hard to backport because of severe changes to the media database in 1.1...

By just removing those directories, you should be fine?
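To check which files would be left once dot-prefixed entries are ignored, a directory walk along these lines can help. It is a Python sketch for inspecting the upload directory, not how Serendipity itself scans it, and the function name is hypothetical.

```python
import os
from typing import List


def visible_files(root: str) -> List[str]:
    """List files below root, skipping anything whose name starts with a dot."""
    found = []
    for current, dirs, files in os.walk(root):
        # Prune hidden directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            if not name.startswith("."):
                path = os.path.join(current, name)
                found.append(os.path.relpath(path, root))
    return sorted(found)


if __name__ == "__main__":
    # Lists the current directory as a harmless demonstration.
    for path in visible_files("."):
        print(path)
```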

HTH,
Garvin