
Advanced archiving

Posted: Sun Jan 16, 2005 5:55 pm
by Rade
I really like this blogging software. It's clean and simple and has all the essential tools.

One thing I've been thinking about: wouldn't it be a nice feature to add entire web pages, with all their inline elements, to the media library? Not that it would make much sense in day-to-day blogging (why mirror the internet?).
But weblogs are a kind of diary, or even a research tool, and as such they could become valuable archives for their authors in the future. There is no way all the links contained in a blog will still work in a few years, and that makes the whole weblog a lot less valuable.

Perhaps the script could even fetch all web pages referenced in an entry when it is posted, saving their contents in the background.
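As a rough illustration of the first step, here is a minimal Python sketch of pulling the external links out of an entry's HTML before fetching them. It assumes entries are stored as HTML and uses only the standard library's `html.parser`; the names `LinkCollector` and `referenced_urls` are hypothetical and not part of Serendipity (which is written in PHP).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def referenced_urls(entry_html):
    """Return the unique external http(s) URLs linked from an entry, in order."""
    parser = LinkCollector()
    parser.feed(entry_html)
    parser.close()
    seen = set()
    result = []
    for link in parser.links:
        # Skip relative links (they point back into the blog) and mailto: etc.
        if urlparse(link).scheme in ("http", "https") and link not in seen:
            seen.add(link)
            result.append(link)
    return result
```

Fetching and storing each returned URL (for example with `urllib.request.urlopen`) would come next; that part is left out here.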

One could develop this idea much further, but I think it's clear what I mean.

Speaking of parsing the content of new entries: how does your system work? Are media library items statically linked to the blog entry, or is it possible to move an image within the directory structure or rename it without breaking the post? Does the media library keep track of all blog-internal references? Just curious :)

Re: Advanced archiving

Posted: Mon Jan 17, 2005 9:36 am
by garvinhicking
Though I generally like your idea, this is a very hard thing to do.

If you reference a page, it isn't enough to just download the HTML code. You need to download and parse all JavaScript files, images, Flash content and so on. And you need to download follow-up pages, which are hard to identify: if a page links to other subpages that are not relevant to your link, how should Serendipity know?

This would turn Serendipity into an application of its own, with even more logic than Google. That is impossible.

You can, however, spider a web page manually, pack the download into a .zip file and upload that to your media repository.

About your media question: currently, media items are not linked statically into the entry, so if you move an image, the entry loses it. This is very sad, but none of our developers has had time to implement static linking yet, although it has been on our to-do list for quite some time...

External references from an entry are kept in separate DB tables (serendipity_exits, serendipity_references).

Regards,
Garvin

Re

Posted: Mon Jan 17, 2005 8:00 pm
by Rade
Oh, I didn't expect you to reinvent Google... What I was thinking of was fetching a single HTML file with its images and stylesheets, not the entire site and its virtual surroundings: just a kind of advanced download that creates a snapshot of the referenced page.
Of course, it's still not that easy to create a working snapshot, as all relative paths may need some readjustment and one would have to find a suitable way to store it, but it's possible and perhaps not too hard to implement.
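The readjustment mentioned here mostly comes down to resolving every reference against the page's own URL. A minimal Python sketch, assuming only `<img>`, `<script>` and `<link rel="stylesheet">` tags matter; `plan_snapshot` and the hash-prefixed local file names are illustrative choices, not an existing Serendipity feature:

```python
import hashlib
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collects references to images, scripts and stylesheets in one page."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values may be None for bare attributes
        if tag in ("img", "script") and a.get("src"):
            self.refs.append(a["src"])
        elif tag == "link" and a.get("href"):
            rel = (a.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.refs.append(a["href"])


def plan_snapshot(page_url, html):
    """Map each original reference to (absolute URL, local file name)."""
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    plan = {}
    for ref in collector.refs:
        # Relative paths such as "../css/site.css" are resolved against the page URL.
        absolute = urljoin(page_url, ref)
        basename = posixpath.basename(urlparse(absolute).path) or "index"
        # A short hash of the absolute URL keeps same-named files apart.
        digest = hashlib.sha1(absolute.encode("utf-8")).hexdigest()[:8]
        plan[ref] = (absolute, f"{digest}-{basename}")
    return plan
```

When the snapshot is saved, each original reference could then be replaced with its local file name. The hash prefix keeps `a/logo.png` and `b/logo.png` from overwriting each other.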

And now I'll go and download version 0.8.

Re: Re

Posted: Mon Jan 17, 2005 8:10 pm
by garvinhicking
Even that is much too hard. There are about a dozen ways to specify CSS files and images in HTML that I can think of instantly, and probably many more if I think about it longer ;)
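The objection is easy to see in practice: tag attributes are only part of the story, because stylesheets and `style` attributes can pull in further files through `url(...)` and `@import`. A minimal Python sketch of scanning CSS text for those, assuming regular expressions are good enough for illustration (a real CSS parser would also skip comments and handle escaped characters); `css_references` is a hypothetical name:

```python
import re

# url(...) with optional single or double quotes around the target.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
# @import "file.css"; the @import url(...) form is already caught by CSS_URL.
CSS_IMPORT = re.compile(r"""@import\s+(['"])(.*?)\1""", re.IGNORECASE)


def css_references(css_text):
    """Return resource references found in a chunk of CSS, in source order."""
    found = []
    for match in CSS_URL.finditer(css_text):
        found.append((match.start(), match.group(2)))
    for match in CSS_IMPORT.finditer(css_text):
        found.append((match.start(), match.group(2)))
    # Sort by position so the result follows the order in the stylesheet.
    return [ref for _, ref in sorted(found)]
```

Even this misses forms such as the legacy `background` attribute or `<object>`/`<embed>` sources, and each referenced stylesheet would have to be scanned again in turn.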

So you'd better not hold your breath for this functionality; it may never come because of its complexity...

Regards,
Garvin

Posted: Mon Jan 17, 2005 10:05 pm
by Rade
Well, too bad :D