Updating 'Media Gallery' to reference local image directory

Posted: Tue Jan 10, 2006 7:18 pm
by Marty_
Hi there,

I've recently 'seen the light' and have moved to Serendipity. I also use 'album' (http://marginalhacks.com/Hacks/album/) for generating my photo albums. What I'd like to do is have the 'Media Gallery' part of Serendipity reference the existing photo albums I already have on disk, rather than having them duplicated into another directory or database.

Is there an easy way to do this using the plugin mechanism (or, ideally, an existing plugin ;-))? Or should I edit serendipity_admin_image_selector.php or functions_images.inc.php?

Any hints gratefully appreciated,

Thanks,

Martin

Re: Updating 'Media Gallery' to reference local image direct

Posted: Wed Jan 11, 2006 1:36 pm
by garvinhicking
Hi Marty!

As far as I know there's no such plugin yet, but yes, it should be doable via a plugin approach.

You might want to have a look at the Amazon Media Selector plugin, which does something like that for fetching Amazon images. With some PHP skill you should be able to bend it in the right direction :)

In either case, you should be able to do all of this within plugins, without the need to edit core files. Just report back here if you have specific questions about our plugin API or something like that. I suggest you fetch a recent 1.0 snapshot of Serendipity, because of the phpDoc documentation added there. :)

Regards,
Garvin

Posted: Mon Jan 16, 2006 2:45 pm
by Marty
Hi Garv et al,

I have it all working as I want it to now, and will attach a patch of my changes to this post. I appreciate that my changes may be considered a bit more reliant on a chainsaw than a scalpel, so I will need some help on how to best clean it up. To get it all to work, I made some changes to both functions_images.inc.php and serendipity_admin_image_selector.php. The most significant of these changes was to remove the usage of the 'serendipity_images' database table.

I figured that this table is superfluous for locally stored images and introduces nastiness, such as having to keep the database contents synchronised with what's actually in the directories. Instead of using this table, I used serendipity_traversePath to retrieve a full list of files in the uploads (or equivalent) directory. When information is needed on any file, I query the file at that time rather than retrieving it from the database.
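
A minimal sketch of the "list first, query later" idea described above. It is written in Python purely for illustration (Serendipity itself is PHP), and LazyFile and traverse are hypothetical names, not s9y functions:

```python
import os
from functools import cached_property


class LazyFile:
    """File entry whose metadata is read from disk only on first access."""

    def __init__(self, path):
        self.path = path

    @cached_property
    def size(self):
        # Hits the filesystem once; the result is cached on the instance.
        return os.path.getsize(self.path)

    @cached_property
    def mtime(self):
        # Same pattern for the modification time.
        return os.path.getmtime(self.path)


def traverse(root):
    """Return a LazyFile for every file under root without reading size or mtime."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            entries.append(LazyFile(os.path.join(dirpath, name)))
    return entries
```

Listing a directory this way costs one traversal; the per-file lookups are only paid for the files whose size or date is actually displayed.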

I realise that this breaks hotlinked files and possibly other things I don't know about, but I hope that we can come up with a solution for that problem which builds on the modifications I've made.

Once these changes were implemented, I just needed a very simple plugin for using 'album', which basically just rewrote the path to the thumbnail, the link that the thumbnail references and which directories to filter when doing the directory traversal.

Anyway, the patch can be seen here:
http://www.skynet.ie/~martin/pages/s9y. ... tory.patch

And the plugin code can be seen here:
http://www.skynet.ie/~martin/pages/sere ... or.php.txt

One other question: Why are there two separate variables for the path to the 'uploads' directory: $serendipity['uploadHTTPPath'] and $serendipity['uploadPath']?

All feedback and comments welcome!

Thanks,

Martin

Posted: Mon Jan 16, 2006 3:14 pm
by garvinhicking
Hi Martin!

Well, dropping the serendipity_images table is a thing we will not be considering for the Serendipity Core repository.

The reason is performance and integration, mostly. Always going via file system calls over all files is a pain in the butt, so a database really makes sense. It is also much better and easier for future use cases, when image references are made to a DB ID instead of a filename.

This then enables you to rename an image without breaking references in postings. This wouldn't be possible without a database.

Also, storing comments to an image wouldn't work without a database. And you already mentioned hotlinks, which also won't work without a DB.

One other question: Why are there two separate variables for the path to the 'uploads' directory: $serendipity['uploadHTTPPath'] and $serendipity['uploadPath']?
The HTTP path is the path to this repository via HTTP. The uploadPath is a relative upload path within the filesystem. They can be different, for example if multiple installations within virtual hosts point to the same HTTP directory but different file system directories, or vice versa. Supersized.org requires such a distinction. :)

So maybe you could look out for a solution that implements what you need, without breaking or getting rid of serendipity_images plus the function calls depending on it? :-)

Best regards,
Garvin

Posted: Mon Jan 16, 2006 4:24 pm
by Marty
Hi Martin!

Well, dropping the serendipity_images table is a thing we will not be considering for the Serendipity Core repository.
Yes, I wasn't suggesting that, I was suggesting it wouldn't always strictly be necessary for locally stored files. For additional features such as image comments and hotlinking, yes it is required. It could be re-implemented in such a way that comments and hotlinking are the only things stored in the database. That is, if you have no comments and no hotlinks, then you would have no need for the serendipity_images table.
The reason is performance and integration, mostly. Always going via file system calls over all files is a pain in the butt, so a database really makes sense. It is also much better and easier for future use cases, when image references are made to a DB ID instead of a filename.

This then enables you to rename an image without breaking references in postings. This wouldn't be possible without a database.
How does a database allow this? In the post, is it not stored as a reference to the file location, i.e. 'uploads/pictures/this.jpg'? Traversing the file system is already done to set up the '$paths' variable, and it would be trivial to implement it in such a way that functions such as 'serendipity_getimagesize' and 'filesize' are not called until they are needed, i.e. lazy initialisation.
Also, storing comments to an image wouldn't work without a database. And you already mentioned hotlinks, which also won't work without a DB.
Yes, but the comments are useless if I then delete the image or move it. The serendipity_images table could use the path to the image as the database key, rather than a numeric key.
So maybe you could look out for a solution that implements what you need, without breaking or getting rid of serendipity_images plus the function calls depending on it? :-)
It doesn't need to be removed, but maybe it's not strictly necessary unless features like comments and hotlinking are used? It does replicate a lot of information that is accessible via the file system, and replication of information is always a maintenance nightmare.

Cheers!

Martin

Posted: Mon Jan 16, 2006 4:30 pm
by garvinhicking
Hi!
How does a database allow this? In the post, is it not stored as a reference to the file location, i.e. 'uploads/pictures/this.jpg'?
There once was a plugin (that was lost) which did NOT insert the image file reference into the post. Instead it returned "<!--image:4711-->" or some similar code. This was later transformed by querying the database (once, then it was cached). So if you renamed a file later on, the ID would stay the same and the code could get reparsed.
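
The placeholder approach described here could look roughly like the following sketch. It is written in Python for illustration only (Serendipity itself is PHP); IMAGES, PLACEHOLDER and render are hypothetical names and this is not the lost plugin's code:

```python
import re

# Hypothetical stand-in for the serendipity_images table: id -> current path.
IMAGES = {4711: "uploads/pictures/this.jpg"}

# Matches markers of the form <!--image:4711--> and captures the numeric ID.
PLACEHOLDER = re.compile(r"<!--image:(\d+)-->")


def render(body, images):
    """Replace each <!--image:ID--> marker with an <img> tag for the current path."""
    def repl(match):
        path = images.get(int(match.group(1)))
        # Leave unknown IDs untouched rather than emitting a broken tag.
        return match.group(0) if path is None else '<img src="%s" />' % path
    return PLACEHOLDER.sub(repl, body)
```

Because the stored entry only contains the ID, renaming the file means changing one row in the lookup; the entry text itself never has to be edited.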
It doesn't need to be removed, but maybe it's not strictly necessary unless features like comments and hotlinking are used? It does replicate a lot of information that is accessible via the file system, and replication of information is always a maintenance nightmare.
I would really recommend that the default Serendipity functions use the serendipity_images table. It is much safer for forward compatibility and enhancements, and it costs less performance because mime types, author permissions and much other metadata can be stored there. So this is not really replication; it is a metadata cache and storage facility.

Plus, the functionality you want to achieve should perfectly work without all these core modifications. It's perfectly fine if your plugin does not require the serendipity_images table - but then you'd need to wrap your plugin hook calls around the existing framework, instead of generalizing your plugin's needs into the core. :-)

I'd still be very open to adjusting the core with any plugin hooks, of course. :-)

Best regards,
Garvin

Posted: Mon Jan 16, 2006 4:50 pm
by Marty
There once was a plugin (that was lost) which did NOT insert the image file reference into the post. Instead it returned "<!--image:4711-->" or some similar code. This was later transformed by querying the database (once, then it was cached). So if you renamed a file later on, the ID would stay the same and the code could get reparsed.
Okay, but this functionality doesn't currently exist in s9y, right? And if you renamed the file in the file system, you'd still have to update it in the database table. And if you were going to update it in the database table using some s9y front end, you could have that update the direct <a href>s in the entries anyway.
I would really recommend that the default Serendipity functions use the serendipity_images table. It is much safer for forward compatibility and enhancements, and it costs less performance because mime types, author permissions and much other metadata can be stored there. So this is not really replication; it is a metadata cache and storage facility.
Okay, the serendipity_images table contains the following columns:

  `id` int(11) NOT NULL auto_increment,
  `name` varchar(255) collate utf8_unicode_ci NOT NULL default '',
  `extension` varchar(5) collate utf8_unicode_ci NOT NULL default '',
  `mime` varchar(255) collate utf8_unicode_ci NOT NULL default '',
  `size` int(11) NOT NULL default '0',
  `dimensions_width` int(11) NOT NULL default '0',
  `dimensions_height` int(11) NOT NULL default '0',
  `date` int(11) NOT NULL default '0',
  `thumbnail_name` varchar(255) collate utf8_unicode_ci NOT NULL default '',
  `authorid` int(11) default '0',
  `path` text collate utf8_unicode_ci,
  `hotlink` int(1) default NULL,
Of those entries, all of the information is on the filesystem except the authorid and the hotlink.
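
To illustrate which columns can be derived from the file itself, here is a small Python sketch (for illustration only; Serendipity is PHP, and describe is a hypothetical helper). It covers only what the standard library can read: image dimensions would need an image library, thumbnail_name depends on a naming convention, and authorid and hotlink have no filesystem equivalent:

```python
import mimetypes
import os


def describe(path):
    """Build the filesystem-derivable columns for one file (hypothetical helper)."""
    directory, filename = os.path.split(path)
    name, extension = os.path.splitext(filename)
    stat = os.stat(path)
    return {
        "name": name,
        "extension": extension.lstrip("."),
        # Guessed from the file extension only; the content is not inspected.
        "mime": mimetypes.guess_type(path)[0],
        "size": stat.st_size,
        # Modification time as a Unix timestamp, assumed to match the `date` column.
        "date": int(stat.st_mtime),
        "path": directory,
    }
```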
Plus, the functionality you want to achieve should perfectly work without all these core modifications. It's perfectly fine if your plugin does not require the serendipity_images table - but then you'd need to wrap your plugin hook calls around the existing framework, instead of generalizing your plugin's needs into the core. :-)
Yes, I can implement this stuff as a plugin by only adding hooks for a couple of new events, but I think that by removing the data replication, it becomes easier to manage the s9y media gallery.

Cheers!

Martin

Posted: Mon Jan 16, 2006 4:55 pm
by garvinhicking
Hi!
Okay, but this functionality doesn't currently exist in s9y right?
No more, but I intend to recode it in the near future.
And if you renamed the file in the file system, you'd still have to update it in the database table. And if you were going to update it in the database table using some s9y front end, you could have that update the direct <a href>'s in the entries anyway.
The rename functionality currently already replaces both the filename and the database column for the filename.

Updating <a href>s would be a very intensive job of preg-matching. Having a specific format for referencing DB image files would be the only working approach.
Of those entries, all of the information is on the filesystem except the authorid and the hotlink.
That's right. And that's two pieces of information you can't easily store within the file. Plus, the DB holds all that as cached information. No need to get the mime type time and again, that would be very redundant.
Yes, I can implement this stuff as a plugin by only adding hooks for a couple of new events, but I think that by removing the data replication, it becomes easier to manage the s9y media gallery.
Again, it's not replication - it is caching plus metadata-storage. You'll block your way for future enhancements by going to the filesystem. And you'll decrease your performance with a filesystem meta-data storage.

Plus, some plugins require this table - so why break it? It would be a step backwards...

(I don't mean you any harm, so I hope you don't mind my arguments :-)

Regards,
Garvin

Posted: Mon Jan 16, 2006 5:23 pm
by Marty
And if you renamed the file in the file system, you'd still have to update it in the database table. And if you were going to update it in the database table using some s9y front end, you could have that update the direct <a href>'s in the entries anyway.

The rename functionality currently already replaces both the filename and the database column for the filename.
And what if I rename the file outside of s9y? Or delete it? Or move it?
That's right. And that's two pieces of information you can't easily store within the file. Plus, the DB holds all that as cached information. No need to get the mime type time and again, that would be very redundant.
Yes, we've agreed that we need a table to store the authorid and hotlink, although I suspect that you could store the authorid by directory rather than by file and make it a lot easier to work with.

With the mime-type, you can use lazy initialisation to determine it as and when you need it. And you can be sure that the mime-type you end up with is correct and up to date: if someone had replaced 'foo.jpg' with a text file, your database would be incorrect. Further, the cost of checking the mime type of a file has to be significantly less than that of a database query to determine the same (but possibly stale) information.
Again, it's not replication - it is caching plus metadata-storage. You'll block your way for future enhancements by going to the filesystem. And you'll decrease your performance with a filesystem meta-data storage.
I disagree. You must admit that 9 of the 11 fields of the table are duplicated and can contain stale information. Future enhancements can use whatever solution is most appropriate for handling hotlinks + authorids. The performance of a database query for the data on a single image is going to be vastly slower than querying the filesystem for the same information.
Plus, some plugins require this table - so why break it? It would be a step backwards...
Don't they all access it via the functions_images.inc.php? They should notice no difference in functionality unless they directly access the serendipity_images table.
(I don't mean you any harm, so I hope you don't mind my arguments :-)
Not at all! A bit of healthy debate never hurt anyone :-)

Martin

Posted: Mon Jan 16, 2006 5:47 pm
by garvinhicking
Hi!
And what if I rename the file outside of s9y? Or delete it ? Or move it?
This is not an intended operation. You cannot expect Serendipity to fix your links to renamed files if you don't use s9y to rename your files.
Yes, we've agreed that we need a table to store the authorid and hotlink, although I suspect that you could store the authorid by directory rather than by file and make it a lot easier to work with.
But that's not how people seem to use it; they do want to have per-author images.
With the mime-type, you can use lazy initialisation to determine it as and when you need it.
You say that so easily, when in fact it isn't. You don't know when you need to "lazily" initialize it...
And you can be sure that the mime-type you end up with is correct and up to date: if someone had replaced 'foo.jpg' with a text file, your database would be incorrect.
Why would someone replace the internal s9y media files with other files, without the use of s9y? The s9y media database is where people are operating, so they need to use s9y or a plugin to modify content there. They can use FTP upload and synchronizing of course. This is like asking "why does Windows not boot up if I copy a PNG image into its swapfile?" :-)
Further, the cost of checking the mime type of a file has to be significantly less than that of a database query to determine the same (but possibly stale) information.
No, it's not, because the DB handle returns all the information at once. So its performance is always related to a single DB query call.

Getting 100 Mimetypes is only a single DB call vs. 100 filesystem calls.
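
The "one query for many images" point can be sketched as follows. This is a Python illustration using SQLite, not Serendipity's PHP database layer; the table name images and the function mimes_for are assumptions made for the example:

```python
import sqlite3


def mimes_for(conn: sqlite3.Connection, names):
    """Fetch the stored mime type of many images with a single query."""
    names = list(names)
    if not names:
        return {}
    # One placeholder per name, so the whole batch goes out in one statement.
    marks = ",".join("?" * len(names))
    rows = conn.execute(
        "SELECT name, mime FROM images WHERE name IN (%s)" % marks, names
    )
    return dict(rows.fetchall())
```

Whether this single round trip actually beats the equivalent filesystem calls depends on the setup, which is what the measurements later in the thread try to establish.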
Again, it's not replication - it is caching plus metadata-storage. You'll block your way for future enhancements by going to the filesystem. And you'll decrease your performance with a filesystem meta-data storage.
I disagree. You must admit that 9 of the 11 fields of the table are duplicated and can contain stale information.
They can't contain stale information, because s9y API calls ensure the validity of the database. All filesystem calls need to use the s9y API, that's what it's there for.

It is duplication, though, but only one-way, for caching.
Future enhancements can use whatever solution is most appropriate for handling hotlinks + authorids. The performance of a database query for the data on a single image is going to be vastly slower than querying the filesystem for the same information.
Opening the media database mostly queries more than one image. When a blog page renders with 15 images, a plugin that uses the database could stack up those requests into one single DB call, which would be faster than 15 filesystem calls.
Don't they all access it via the functions_images.inc.php? They should notice no difference in functionality unless they directly access the serendipity_images table.
I think there are plugins that directly query and join serendipity_images tables.
Not at all! A bit of healthy debate never hurt anyone :-)
;) I only wish some other people would tell their thoughts here. :-)

Regards,
Garvin

Posted: Tue Jan 17, 2006 2:34 pm
by Marty
And what if I rename the file outside of s9y? Or delete it ? Or move it?

This is not intended operation. You cannot expect serendipity to fix your links to renamed files if you don't use s9y to rename your files.
It still might happen. The concept of allowing easy insertion of images into blog entries is an excellent feature. Forcing the user to only manipulate those images through s9y is not.

To support this feature, a better solution is a plugin which has something like 'update link', which would allow the user to change the contents of an arbitrary <a href="thing"> to <a href="newthing">. That would support the updating of any kind of link, including images.
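
A rough sketch of such a one-off 'update link' operation, in Python for illustration only (Serendipity is PHP; update_link is a hypothetical name). It rewrites only attributes whose value equals the old path exactly, so a link that merely contains the same path on another server is left alone:

```python
import re


def update_link(body, old, new):
    """Rewrite href/src attributes whose value equals `old` exactly to `new`."""
    # Group 1 is the attribute name, group 2 the opening quote; \2 requires
    # the same quote character to close the value.
    pattern = re.compile(r'\b(href|src)=(["\'])%s\2' % re.escape(old))

    def repl(match):
        quote = match.group(2)
        return "%s=%s%s%s" % (match.group(1), quote, new, quote)

    return pattern.sub(repl, body)
```

This is a plain text substitution, not an HTML parser, so unquoted attributes or unusual whitespace around the equals sign would not be matched.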
You say that so easily, when in fact it isn't. You don't know when you need to "lazily" initialize it...
As far as I can see the only time that you need to know this information is if you are calling 'serendipity_fetchImagesFromDatabase' or 'serendipity_fetchImageFromDatabase'. Even in these cases, you probably don't need the mime-type each time, and can use the filename to query for it should you actually need it.
Why would someone replace the internal s9y media files with other files, without the use of s9y? The s9y media database is where people are operating, so they need to use s9y or a plugin to modify content there. They can use FTP upload and synchronizing of course. This is like asking "why does Windows not boot up if I copy a PNG image into its swapfile?" :-)
But why force people to keep an 'internal s9y' copy of their media? Why not generalise it so that all of the s9y functionality works equally with any local image store?
Further, the performance of checking the mime type of a file has to be significantly less than a database query to determine the same (but possibly stale) information.

No, it's not, because the DB handle returns all the information at once. So its performance is always related to a single DB query call.

Getting 100 Mimetypes is only a single DB call vs. 100 filesystem calls.

Again, it's not replication - it is caching plus metadata-storage. You'll block your way for future enhancements by going to the filesystem. And you'll decrease your performance with a filesystem meta-data storage.
Getting 100 mimetypes is the worst case scenario. I timed how long it takes to do it via file system calls and via a DB call, and this is the result:

Using database:
0.206089973
0.172338009
0.184152126
0.210631132
0.203181028
Avg: 0.195278454

Using filesystem:
0.517910957
0.134535074
0.153919935
0.125296831
0.189307928
Avg: 0.224194145

This suggests that the performance is similar even with the non-optimised filesystem method. In any case, times on the order of 0.2 seconds are insignificant compared to the network latency of loading the page.
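
For anyone wanting to repeat a comparison like this, a minimal timing helper might look like the sketch below. It is Python rather than PHP and is not the code used for the figures above; average_seconds is a hypothetical name:

```python
import time


def average_seconds(func, runs=5):
    """Call func `runs` times and return the mean wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)
```

With only five runs, a single slow run (for example the first one, before the operating system has cached anything) moves the average noticeably, so looking at the individual timings as well as the mean is worthwhile.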
When opening the media database, it mostly queries more than one image. When a blog page renders with 15 images, a plugin that uses the database could stack up those request in one single DB call, which would be faster than 15 filesystem calls.
The blog will render by using <a href=".."> to make the HTTP client download the images; the database is not used in this case.
Don't they all access it via the functions_images.inc.php? They should notice no difference in functionality unless they directly access the serendipity_images table.

I think there are plugins that directly query and join serendipity_images tables.
We can look at those when and if the time comes then.

I will re-write the filesystem method to optimise the number of calls to the file system, and post some new performance data when I have it.

Cheers,

Martin

Posted: Tue Jan 17, 2006 3:08 pm
by garvinhicking
Hi!
It still might happen. The concept of allowing easy insertion of images into blog entries is an excellent feature. Forcing the user to only manipulate those images through s9y is not.
We don't force the user, because we provide API functions.

If you maintain images via a Windows application and then use Word to edit the database file, you also don't expect the Windows application to work properly after that ;)
To support this feature, a better solution is a plugin which has something like 'update link', which would allow the user to change the contents of an arbitrary <a href="thing"> to <a href="newthing">. That would support the updating of any kind of link, including images.
But that might not work properly, because a regular expression will hit all kinds of cases, such as an image someone inserted from another server that he doesn't want to change.

Parsing this raw HTML is much harder than using a DB-reference markup.
You say that so easily, when in fact it isn't. You don't know when you need to "lazily" initialize it...
As far as I can see the only time that you need to know this information is if you are calling 'serendipity_fetchImagesFromDatabase' or 'serendipity_fetchImageFromDatabase'. Even in these cases, you probably don't need the mime-type each time, and can use the filename to query for it should you actually need it.
That's right, but those calls are always required when browsing the media gallery.
But why force people to keep an 'internal s9y' copy of their media?
It's not a copy, it's a cached representation. That is much different, because it can be updated in a single- or even dual-way fashion.

We/You can add functionality to make s9y be able to import other directories into that storage; but the internal storage should use the Database for storage of Meta-Data and fast access.
Getting 100 mimetypes is the worst case scenario. I timed how long it takes to do it via file systems calls and in a DB call and this is the result:
Well, 100 is a bit too much, but usually we display 24 images or so per page. That would mean 24 FS calls.
When opening the media database, it mostly queries more than one image. When a blog page renders with 15 images, a plugin that uses the database could stack up those request in one single DB call, which would be faster than 15 filesystem calls.
The blog will render by using <a href=".."> to make the HTTP client download the images, the database is not used in this case.
I was talking about abstraction when we fetch meta-data of images for specific display reasons, like dynamic width, or the use of <object> tags for other file types.
I will re-write the filesystem method to optimise the number of calls to the file system, and post some new performance data when I have it.
I still refuse to drop serendipity_images as the preferred storage mechanism because of the reasons already mentioned. Using the FS to always fetch file data is a dead end and neither extensible nor wise, when we already have an established database storage.

Best regards,
Garvin

Posted: Tue Jan 17, 2006 4:09 pm
by Marty
Hi!
We don't force the user, because we provide API functions.

If you maintain images via a Windows application and then use Word to edit the database file, you also don't expect the Windows application to work properly after that ;)
That is not an appropriate analogy. If you use two different applications to maintain the same images on disk, you don't expect to have to 'tell' each of them when you modify those images or add more. Each application should pick up this information automatically.
But that might not properly work because there are all kind of Regular Expressions where one inserted an image on another server that he doesn't want to change.

Parsing this raw HTML is much harder than using a DB-reference markup.
Who cares if it's harder, the computer will do it. In any case the difference is doing a search for '<!-- image id=12345>' versus searching for '<a href="uploads/picture12345.jpg">' - not a huge difference. It also represents a one-off change, rather than a translation that has to take place each time a blog entry is read from the database. Further, since the blog entries are stored in the database in their final, HTML form, rather than a version which needs to be processed before display, it makes it easier to handle those entries on the database level. For example, if you were importing the blog-entry database table into another blog system.

If you were to change the serendipity core so that in the future, all attempts to insert an image using the media gallery instead inserted a comment of the form you suggest <!-- image id="12345" -->, then you are breaking backwards compatibility. All images inserted after that core change would be in the new format, and all images inserted before would use <a href>s. Thus if the user attempted to rename an image using s9y, all the blog entries after the core upgrade would automatically handle it, but all of the old entries would not.

A much simpler solution is to do one-off link renaming. I will even write this feature for you ;-).
That's right, but those calls are always required when browsing the media gallery.
Ok, then they can't be optimised away.

It's not a copy, it's a cached representation. That is much different, because it can be updated in a single- or even dual-way fashion.

We/You can add functionality to make s9y be able to import other directories into that storage; but the internal storage should use the Database for storage of Meta-Data and fast access.
I agree that a database should be used for meta-data. But it should not duplicate meta-data which already exists in the filesystem.
Well, 100 is a bit too much, but usually we display 24 images or so per page. That would mean 24 FS calls.
Okay, performance data for 24 images is:

24 images with database:
0.05708003
0.031708002
0.036072016
0.134638071
0.028676033
Average: 0.05763483

24 images with filesystem:
0.03143096
0.027669907
0.029160023
0.036036015
0.033604145
Average: 0.03158021
When opening the media database, it mostly queries more than one image. When a blog page renders with 15 images, a plugin that uses the database could stack up those request in one single DB call, which would be faster than 15 filesystem calls.
It is not faster. See data above.
I still refuse to drop serendipity_images as the preferred storage mechanism because of the reasons already mentioned. Using the FS to always fetch file data is a dead end and neither extensible nor wise, when we already have an established database storage.
The reasons you've mentioned are:

1. Performance. Testing proves that this is not an issue.
2. Hotlinks and authorid. These can still be stored in the database, but can use the path to the file for the key, rather than an image id.
3. You hope to write a plugin that will allow image renaming without creating broken links. I will write a plugin for you that does this by parsing the <a href> tag.

Advantages of using the filesystem:

1. Always have up to date information on the directory contents and the file meta data.
2. New images can be added simply by copying them to the appropriate directory.
3. Performance is the same or better.

Martin

Posted: Tue Jan 17, 2006 4:20 pm
by garvinhicking
Hi!
1. Always have up to date information on the directory contents and the file meta data.
2. New images can be added simply by copying them to the appropriate directory.
3. Performance is the same or better.
So how would you synchronize the meta-data for files added, removed or renamed outside of Serendipity?

What you are mentioning seems to lead up to a "live" synchronizing of the media database with the filesystem on each media database call. That would be fine for me, if it can be optionally disabled for people who don't like that performance hit. People who don't manipulate the filesystem shouldn't take a performance penalty for operations on every file time and again.

Regards,
Garvin

Posted: Tue Jan 17, 2006 4:42 pm
by garvinhicking
Another question: How would you support "Order by filesize/author/image size/upload date/mimetype" image ordering and pagination? You'd always need to query/fetch ALL files in the directory to support that.

Having 1000 files in a directory would then mean querying each file for its image dimensions and mimetype, whereas in the DB you could instantly order by that attribute.
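
The difference can be sketched like this, in Python with SQLite for illustration only (Serendipity is PHP; the images table layout and both function names are assumptions). The database version sorts and slices in the query, while the filesystem version has to stat every file before it can return even the first page:

```python
import os
import sqlite3


def page_from_db(conn: sqlite3.Connection, per_page, page):
    """Let the database sort by size and return one page of names."""
    rows = conn.execute(
        "SELECT name FROM images ORDER BY size, name LIMIT ? OFFSET ?",
        (per_page, page * per_page),
    )
    return [name for (name,) in rows]


def page_from_fs(directory, per_page, page):
    """Stat every file in the directory, sort in memory, then slice one page."""
    sized = []
    for entry in os.scandir(directory):
        if entry.is_file():
            sized.append((entry.stat().st_size, entry.name))
    sized.sort()
    start = page * per_page
    return [name for _size, name in sized[start:start + per_page]]
```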

So if you already need to synchronize the filenames within the DB to be able to correlate metadata, you won't be able to do without the DB information. In that case you can store all the metadata in the DB for easier retrieval, and need not bother with first getting files from the DB and then doing other lookups on the FS for metadata.

Thus I think one would need to focus on well-done synchronizing methods. You could build a hash of all existing files plus their size and only call the synchronize mechanism if this hash changes...
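
The hash idea could be sketched as below, in Python for illustration only (Serendipity is PHP; directory_fingerprint is a hypothetical name). The caller would store the last fingerprint and run the synchronization only when a fresh one differs:

```python
import hashlib
import os


def directory_fingerprint(root):
    """Hash the relative path and size of every file under root."""
    digest = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            # A NUL byte separates path and size so the two cannot run together.
            record = "%s\0%d\n" % (rel, os.path.getsize(full))
            digest.update(record.encode("utf-8"))
    return digest.hexdigest()
```

Since only names and sizes go into the hash, a file replaced by another of exactly the same size would go unnoticed; adding the modification time to each record would be one way to narrow that gap, at the cost of more false positives.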

But either way, it really seems the best idea to use the DB as primary data storage and retrieval object, and only focus on implementing a good update mechanism to update the DB according to changes made in the FS.

Regards,
Garvin