Advanced archiving

Rade

Advanced archiving

Post by Rade »

I really like this blogging software. It's clean and simple and offers the essential tools.

One thing I wondered about is whether adding entire web pages, with all their inline elements, to the media library would be a nice feature. Not that it would make much sense in day-to-day blogging (why mirror the internet?).
But weblogs are a kind of diary, or even a research tool, and as such they could become valuable personal archives in the future. There is no way all the links contained in a weblog will still work in a few years, though, which makes the whole weblog a lot less valuable.

Perhaps the script could even fetch all web pages referenced in an entry when it is posted, saving their contents in the background.
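
As a rough illustration of the first half of that idea, here is a minimal Python sketch that pulls the external links out of an entry's HTML so that they could be fetched afterwards. It is not Serendipity code; the names `LinkCollector` and `referenced_pages` are made up for this example, and it only looks at plain `<a href="...">` links.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects absolute http(s) links found in <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip relative links: they point into the blog itself.
            if name == "href" and value and urlparse(value).scheme in ("http", "https"):
                if value not in self.links:
                    self.links.append(value)


def referenced_pages(entry_html):
    """Return the external URLs referenced by an entry, in order, without duplicates."""
    collector = LinkCollector()
    collector.feed(entry_html)
    collector.close()
    return collector.links
```

Each returned URL could then be downloaded in the background, for example with `urllib.request.urlopen`. Storing the result in a usable form is the harder part and is not covered here.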

This idea could be developed much further, but I think it's clear what I mean.

Speaking of parsing the content of new entries, how does your system work? Are media library items statically linked to the blog entry, or is it possible to move an image within the directory structure or rename it without breaking the post? Does the media library keep track of all blog-internal references? Just curious :)
garvinhicking
Core Developer
Posts: 30022
Joined: Tue Sep 16, 2003 9:45 pm
Location: Cologne, Germany
Contact:

Re: Advanced archiving

Post by garvinhicking »

Though I generally like your idea, this is a very hard thing to do.

If you reference a page, it isn't enough to just download the HTML code. You need to download and parse all JavaScript files, images, Flash content and so on. And you need to download follow-up pages, which are hard to identify: if a page links to other subpages that are not relevant to your link, how should Serendipity know?

This would turn Serendipity into a full application with even more logic than Google. That is impossible.

You can, however, spider a webpage manually, pack the downloaded copy into a .zip file and upload that to your media repository.

About your media question: currently, media items are not statically linked to the entry. So if you move an image, its reference inside the entry breaks. This is very sad, but none of our developers has had time to implement static linking yet, though it has been on our todo list for quite some time...

External references from an entry are kept in separate DB tables (serendipity_exits, serendipity_references).

Regards,
Garvin
# Garvin Hicking (s9y Developer)
# Did I help you? Consider making me happy: http://wishes.garv.in/
# or use my PayPal account "paypal {at} supergarv (dot) de"
# My "other" hobby: http://flickr.garv.in/
Rade

Re

Post by Rade »

Oh, I did not expect you to reinvent Google... What I was thinking of was fetching a single HTML file with its images and stylesheets, not the entire site and its virtual surroundings. Just a kind of advanced download that creates a snapshot of the referenced page.
Of course, it's still not that easy to create a working snapshot, as all relative paths may need some readjustment and one would have to find a suitable way to store it, but it's possible and perhaps not so hard to implement.
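
To make the "single page plus its images and stylesheets" idea concrete, here is a minimal Python sketch of the path-readjustment step. It assumes the page HTML has already been downloaded, and it only handles the two simplest cases, `<img src>` and `<link rel="stylesheet" href>`. The names `AssetCollector` and `snapshot_assets` are hypothetical and not part of Serendipity.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collects image and stylesheet URLs and resolves them against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag == "img":
            url = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            url = attrs.get("href")
        if url:
            # urljoin turns relative paths into absolute URLs and
            # leaves URLs that are already absolute untouched.
            absolute = urljoin(self.base_url, url)
            if absolute not in self.assets:
                self.assets.append(absolute)


def snapshot_assets(page_url, page_html):
    """Return the absolute URLs of the images and stylesheets a page references."""
    collector = AssetCollector(page_url)
    collector.feed(page_html)
    collector.close()
    return collector.assets
```

A real snapshot would also have to download each asset and rewrite the references in the saved HTML to point at the local copies; this sketch stops at listing what would need to be fetched.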

And now I will go and download version 0.8.
garvinhicking

Re: Re

Post by garvinhicking »

Even that is much too hard. There are about a dozen ways to specify CSS files and images via HTML that I can think of instantly, and probably many more if I think about it longer ;)
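
To illustrate the point, stylesheets can pull in further files themselves, so scanning HTML tags alone is not enough. The Python sketch below finds two such forms inside CSS text, `url(...)` and `@import "..."`, using regular expressions. It is an illustration only: the name `css_references` is hypothetical, and a regex approach like this will miss or misread edge cases that a real CSS parser would handle.

```python
import re

# url(foo.png), url("foo.png"), url( 'foo.png' ), also inside "@import url(...)"
CSS_URL = re.compile(r"""url\(\s*['"]?([^'")\s]+)['"]?\s*\)""", re.IGNORECASE)
# @import "foo.css" or @import 'foo.css' (the form without url())
CSS_IMPORT = re.compile(r"""@import\s+['"]([^'"]+)['"]""", re.IGNORECASE)


def css_references(css_text):
    """Return the files referenced from a piece of CSS, without duplicates."""
    found = []
    for pattern in (CSS_URL, CSS_IMPORT):
        for match in pattern.finditer(css_text):
            if match.group(1) not in found:
                found.append(match.group(1))
    return found
```

The same scan would have to run over external stylesheets, `<style>` blocks and inline `style="..."` attributes, and every path found is relative to the stylesheet it appears in rather than to the page.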

So you'd better not hold your breath for this functionality; it may never come because of its complexity...

Regards,
Garvin
Rade

Post by Rade »

Well, too bad :D