Language file recommendation
Posted: Thu Nov 30, 2006 10:50 am
by pronto
I've been using Serendipity for more than a year now and so far the experience has been most satisfying. Lately I decided to contribute a bit more to the community by translating it into my native language -- Estonian. However, I ran into several issues which, I'm sorry to say, are rather typical (I've contributed translations to several open-source projects).
I've been reading the forum and found that my problems are not unique: several other translation projects have experienced similar drawbacks -- the language structure is too Indo-European specific. Here is an example:
The keyword "Comments" is used on several occasions, each time in a slightly different context. In English, and in most Indo-European languages, the word always stays the same, so the same keyword can be reused. Estonian is very different, and the word itself changes. Sometimes that is because of the grammar, sometimes because of the context itself. You can't just replace words.
Anyway .. I strongly recommend changing the keywords so that each is used on a single page only (even though there are rather obvious exceptions). This makes translation MUCH easier: even though you have to sift through a much bigger pile of keywords, you no longer have to worry that the keyword you are translating will resurface on a different page and in a different context, rendering the whole effort silly at best.
Re: Language file recommendation
Posted: Thu Nov 30, 2006 11:22 am
by garvinhicking
Hi!
Hm, I understand that. However, as you already found out, this is because we usually strive for low redundancy. "Comments", for example, is used very often - if we made it unique for each insertion, we'd have 10 or 15 strings all defined as "Comments".
This is a problem in 2 ways:
1. Performance. The more strings need to be loaded, the more memory and compilation time they consume.
2. Ease of change. If you wanted to customize your theme, you'd have to be very specific about which "Comments" you need to use. Template designers might get very confused about why there are so many "comments" strings.
So I think we should focus on the few strings where this occurrence is problematic. Maybe you'd like to compile a list of keywords where abstraction is absolutely necessary?
Best regards,
Garvin
Re: Language file recommendation
Posted: Thu Nov 30, 2006 12:40 pm
by pronto
garvinhicking wrote:Hm, I understand that. However, as you already found out, this is because we usually strive for low redundancy. "Comments", for example, is used very often - if we made it unique for each insertion, we'd have 10 or 15 strings all defined as "Comments".
I agree. However, I don't think the overhead is going to be more than 10%. Anyway, I think the rule of thumb should be to spell out the "full meaning" of these keywords. If the meanings differ, a new line should be created. For example, "Comments" in the comment editor actually means "Number of comments per editor page", while "Comments" in a blog entry actually means "Total number of comments submitted to this entry". As you can see, even in English they mean different things -- one is a paging parameter, the other a total.
I can give another example that I think has to be changed at some point to make translating easier: sorting. Actually, I think this should be changed for the sake of the English version as well. Right now we have "Sort order", and Serendipity uses it both for the sort block header AND for the sort direction (ascending / descending) label. In Estonian it is theoretically possible to find a surrogate word that fits both criteria, but it's very awkward and in every sense a kludge.
These two should be enough for starters, I guess. I'll keep on translating, and when I run into similar problems I'll post them here.
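To make the two examples concrete, here is a minimal sketch of what one-constant-per-meaning could look like in a language file. The constant names are made up for illustration and are not existing Serendipity constants; the English wording is only a guess at what each label is meant to say.

```php
<?php
// Sketch only: all constant names here are hypothetical.
// One constant per meaning instead of one shared string.

// "Comments" as a paging parameter in the comment editor.
define('COMMENTS_PER_PAGE', 'Comments per page');
// "Comments" as the total shown under a blog entry.
define('ENTRY_COMMENT_TOTAL', 'Comments');

// "Sort order" as the heading of the sorting block ...
define('SORT_BLOCK_HEADER', 'Sort order');
// ... and as the label of the ascending / descending selector.
define('SORT_DIRECTION', 'Sort direction');

// Each call site picks the constant that matches its meaning.
printf("%s: %d\n", COMMENTS_PER_PAGE, 20);
printf("%s: %d\n", ENTRY_COMMENT_TOTAL, 3);
```

A translator can then give each constant its own wording, even where the English strings happen to be identical.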
Re: Language file recommendation
Posted: Thu Nov 30, 2006 12:48 pm
by garvinhicking
Hi!
Maybe you could compile a list of problematic keywords, then we try to take care of them one by one and add new constants?
For now, we have "COMMENTS" and "SORT_ORDER" that should be taken care of. We will need to grep all plugins and source code files to see where they occur and where they need to be distinguished. I doubt I will find the time for this before the upcoming 1.1 release in mid-December, so I'd like to make those constant changes for the 1.2 branch after that...
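As a rough idea of what that search could look like when scripted instead of run through grep, here is a small sketch. The function name and the choice of file extensions (.php and .tpl) are assumptions for the example, not anything that exists in Serendipity.

```php
<?php
// Hypothetical helper, not part of Serendipity: list every line under
// $root that mentions $constant as a whole word.
function find_constant_usages(string $root, string $constant): array {
    $hits = [];
    // \b keeps COMMENTS from matching inside e.g. COMMENTS_PER_PAGE.
    $pattern = '/\b' . preg_quote($constant, '/') . '\b/';
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Only PHP sources and templates (the extensions are an assumption).
        if (!$file->isFile() || !preg_match('/\.(php|tpl)$/', $file->getFilename())) {
            continue;
        }
        foreach (file($file->getPathname(), FILE_IGNORE_NEW_LINES) as $i => $line) {
            if (preg_match($pattern, $line)) {
                // Report as path:line: source text.
                $hits[] = $file->getPathname() . ':' . ($i + 1) . ': ' . trim($line);
            }
        }
    }
    sort($hits);
    return $hits;
}
```

Calling find_constant_usages('/path/to/checkout', 'SORT_ORDER') returns one entry per matching line, which gives a checklist of places where the meaning has to be looked at by hand.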
Thanks for participating, your help is much needed!
Best regards,
Garvin
Posted: Thu Nov 30, 2006 3:14 pm
by pronto
BTW .. I suspect this idea has been discussed at least a billion times, but if the language file size is an issue, why is it still in a single chunk? What I mean is: isn't it cheaper (resource-wise) not to load the admin keywords every time somebody visits the blog? Again, it would make both maintaining and translating much easier.
Posted: Thu Nov 30, 2006 3:37 pm
by garvinhicking
Hi!
Yes, it has come up, but no one has yet taken on the task of separating Admin keywords from Frontend keywords. It is a huge amount of work, and I would be most thankful to anyone doing it.
Technically it's also a challenge because some functions are shared between frontend and backend, and it might not always be clear that some backend functionality can be embedded into the frontend (simple example: the media manager is usually a backend part, but some plugins make it available to the frontend).
Best regards,
Garvin
Posted: Thu Nov 30, 2006 5:27 pm
by pronto
Ok ... cutting the big boy down into smaller packages is a problem. However, loading a smaller chunk is easy. Because a language pack is really just another PHP include file, all you have to do is add a global identifier at the beginning of it, which allows the system to figure out whether or not the language pack for this module has been loaded (such as define('MODULE_MEDIA_LANG_LOADED', true)). Eventually every module boils down to a handful of API functions, and all you have to do is check whether the identifier has been defined and, if not, load the pack.
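Roughly, the check could look like the sketch below. Everything apart from MODULE_MEDIA_LANG_LOADED is invented for the example: the function names and the two strings are hypothetical, and the loader defines its strings inline where a real one would include the module's language file.

```php
<?php
// Sketch only: function names and strings are hypothetical.

// Stand-in for including the media module's language chunk.
// The marker constant makes repeated calls harmless.
function load_media_lang(): void {
    if (defined('MODULE_MEDIA_LANG_LOADED')) {
        return; // already loaded during this request
    }
    define('MODULE_MEDIA_LANG_LOADED', true);
    define('MEDIA_UPLOAD', 'Upload file');
    define('MEDIA_DELETE', 'Delete file');
}

// A module API function makes sure its strings exist before using them.
function media_upload_label(): string {
    load_media_lang();
    return MEDIA_UPLOAD;
}

echo media_upload_label(), "\n";
echo media_upload_label(), "\n"; // second call skips the load
```

The check sits in the module's entry points rather than at every place a string is used, so the frontend never pays for the media strings unless something actually calls into the media module.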
Posted: Thu Nov 30, 2006 5:33 pm
by garvinhicking
Hi!
What I didn't think of earlier was one more performance problem.
The more include files we have, the higher the inclusion cost. The s9y framework already loads a lot of plugin lang files, plugins, template files, the Smarty framework etc., so we already have a huge include overhead. Adding 5-10 more language include files to that might have a noticeable impact.
The other problem is that it will increase the number of files: say, 7 language parts for 10 languages in 2 charsets (native + UTF-8) would be 140 files. That would also have a noticeable effect on the time an FTP upload of an s9y release takes...
Sigh.
pronto wrote:... as define('MODULE_MEDIA_LANG_LOADED', true)). Eventually every module boils down to a handful of API functions, and all you have to do is check whether the identifier has been defined and, if not, load the pack.
That would mean one "if" check per constant usage, which would also hugely impact performance.
Plus, bytecode caches currently work somewhat well with Serendipity, but conditional includes keep bytecode caches from doing their job.
Best regards,
Garvin