gettext best practices?

by Michael Jervis-2

When using gettext, is there any advantage to breaking your
translation down into modules?

I was working on the basis of having a core.en.mo file for my "base"
system and then adding translations for each given controller (e.g.
frontpage.controller.class.php would add frontpage.en.mo to the
translation object, I'm not using Zend_Controller by the way).
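
To make the layering described above concrete, here is a rough sketch in Python (not PHP, and not Zend_Translate's API). The Catalogue class, the sample strings and the German translations are all made up for illustration; the dictionaries only stand in for what a core file and a per-controller file might contain.

```python
class Catalogue:
    """Hypothetical in-memory translation table; not a Zend_Translate class."""

    def __init__(self):
        self._messages = {}

    def add(self, messages):
        """Merge one module's {msgid: msgstr} table; later tables win on conflict."""
        self._messages.update(messages)

    def translate(self, msgid):
        """Return the translation, or the msgid itself when none is loaded."""
        return self._messages.get(msgid, msgid)


# Made-up sample data standing in for a core file and a front page file.
core_de = {"Ok": "OK", "Cancel": "Abbrechen"}
frontpage_de = {"Latest news": "Neueste Nachrichten"}

catalogue = Catalogue()
catalogue.add(core_de)       # loaded on every request
catalogue.add(frontpage_de)  # loaded only by the front page controller
```

With both tables added, catalogue.translate("Cancel") comes from the core table and catalogue.translate("Latest news") from the front page table; a string that is in neither comes back unchanged.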

However, the recommended translation management tool seems to insist
on scanning entire folders of source code to generate translations, so
I end up with a single catalogue containing everything in my project.

Is there any advantage to breaking it up? Or is that a disadvantage?
Does it load and parse the entire mo file? Or just use it as a
database to look up translations?

Cheers,

--
Michael Jervis
mjervis@...

Re: gettext best practices?

by Marko Korhonen

I'm also interested in hearing some professional opinions on the matter.

In the Drupal community CMS the whole translation seems to be in one package.
The .mo file content is stored inside the database in one field, at least if I have understood it correctly.
And I don't know if Drupal does some kind of caching with gettext data.

Re: gettext best practices?

by Michael Jervis-2

On 19/06/07, Michael Jervis <mjervis@...> wrote:
> When using gettext, is there any advantage to breaking your
> translation down into modules?


I've done a bit of investigation in light of the lack of other information.

It would appear that the gettext translation module parses the entire
.mo file into itself and uses that information to perform the
translations.

It's actually compiled into a PHP array, so:

1) The best practice would be to break your translation into a large
number of modules (if performance is your main non-functional
requirement but not if it's getting easy translations done!)
2) gettext is a more expensive version of using the arrays backend.

So I'm going to be changing my approach and scrapping the gettext api
in favour of PHP arrays and writing a tool for translators to use to
ensure I get valid PHP arrays back from them.

Details of my investigation (which was simple) can be found here:
http://www.inanger.com/languages/php/internationalisation/

Cheers,

Mike

Re: gettext best practices?

by thomasW

Hi Michael,

> When using gettext, is there any advantage to breaking your
> translation down into modules?

The effect, whether you count it as an advantage or a disadvantage, is that
you have to load every module source on request.
A possible advantage would be having the source split up, but this also
means more time spent managing the sources.

In terms of speed there is no advantage. The gettext adapter is the fastest
of the supported sources.
For a normal web application with about 500-1000 sentences there would be no
big advantage in splitting the source; the difference is a few milliseconds.
When your web application has more than about 5000 sentences, I would
strongly advise splitting the sources.

> I was working on the basis of having a core.en.mo file for my "base"
> system and then adding translations for each given controller (e.g.
> frontpage.controller.class.php would add frontpage.en.mo to the
> translation object, I'm not using Zend_Controller by the way).

Splitting the source into single files instead of modules adds more overhead,
which also slows down the whole process.

> However, the recommended translation management tool seems to insist
> on scanning entire folders of source code to generate translations, so
> I end up with a single catalogue containing everything in my project.

We do not recommend any translation management tool. If you are speaking of
poEdit, which is one example of a freeware gettext tool, it is able to scan
folders, and modules should be split into folders. See the Zend_Controller
part of the manual.

> Is there any advantage to breaking it up? Or is that a disadvantage?
> Does it load and parse the entire mo file? Or just use it as a
> database to look up translations?

The more files, the more overhead. In my opinion there is no advantage.

It always loads the complete mo file, because the overhead of parsing the mo
file until it finds the requested string would be much bigger than just
having it already parsed. Reading each translation from the mo file directly
would decrease the speed, because you cannot jump to a defined position
within a mo file; you have to read it from the beginning up to the requested
string.
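
As a rough illustration of what "loading the complete mo file" amounts to, here is a sketch in Python. It is not Zend_Translate's code. It assumes the little-endian layout described in the GNU gettext manual: a 28-byte header holding the magic number, the revision, the string count and the offsets of two tables of (length, offset) pairs, followed by NUL-terminated strings. The helper names build_mo and parse_mo are made up for the example, and build_mo exists only so the sketch has some bytes to read.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number of a GNU .mo file (little-endian here)


def build_mo(messages):
    """Serialise {msgid: msgstr} into minimal .mo bytes (no hash table)."""
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items())
    n = len(items)
    orig_table = 28                    # the 7-word header is 28 bytes long
    trans_table = orig_table + 8 * n   # each table entry is two 32-bit words
    data_start = trans_table + 8 * n
    header = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table, 0, data_start)
    blob = b""
    entries = []
    for msgid, _ in items:             # originals first, each NUL-terminated
        entries.append((len(msgid), data_start + len(blob)))
        blob += msgid + b"\0"
    for _, msgstr in items:            # then the translations, in the same order
        entries.append((len(msgstr), data_start + len(blob)))
        blob += msgstr + b"\0"
    tables = b"".join(struct.pack("<2I", length, pos) for length, pos in entries)
    return header + tables + blob


def parse_mo(data):
    """Read every entry of a little-endian .mo file into one dictionary."""
    magic, _revision, n, orig_off, trans_off = struct.unpack_from("<5I", data, 0)
    if magic != MO_MAGIC:
        raise ValueError("not a little-endian .mo file")
    catalogue = {}
    for i in range(n):
        o_len, o_pos = struct.unpack_from("<2I", data, orig_off + 8 * i)
        t_len, t_pos = struct.unpack_from("<2I", data, trans_off + 8 * i)
        msgid = data[o_pos:o_pos + o_len].decode("utf-8")
        catalogue[msgid] = data[t_pos:t_pos + t_len].decode("utf-8")
    return catalogue
```

Calling parse_mo(build_mo({"Ok": "OK"})) gives back the same dictionary, so after one pass over the file every lookup is a plain in-memory access.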

For now there is no support for backends, and it only makes sense for
applications which are really big. The point is that reading the mo file is
as fast as reading it from the cache, because there is no "parsing" as we
have for xml files. This is also the reason why gettext is recommended:
it is actually the fastest adapter.

Greetings
Thomas (Zend_Translate author)
I18N Team Leader


Re: gettext best practices?

by thomasW

Hi,

> In the Drupal community CMS the whole translation seems to be in one package.
> The .mo file content is stored inside the database in one field, at least if I
> have understood it correctly.

From the translator's and the programmer's point of view, one package is
always the best and fastest option.
In terms of speed there is no advantage.

Also worth mentioning: if you have strings which are the same in several
"modules", you would have to translate them multiple times.
Think of an "Ok" and a "Cancel" button... you must translate these 2 strings
for every module where you have such buttons. The bigger your application
is, the more duplicated strings you will have.
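
To get a feeling for how much duplication a split would cause, something like the following Python sketch could be run over the per-module string lists. The module names and strings are invented for the example, and duplicated_msgids is a hypothetical helper, not part of any gettext tool.

```python
from collections import Counter


def duplicated_msgids(modules):
    """Return {msgid: count} for strings that appear in more than one module."""
    counts = Counter(
        msgid for messages in modules.values() for msgid in set(messages)
    )
    return {msgid: n for msgid, n in counts.items() if n > 1}


# Invented sample: the source strings used by three modules.
modules = {
    "frontpage": {"Ok", "Cancel", "Latest news"},
    "admin": {"Ok", "Cancel", "Delete user"},
    "search": {"Ok", "Search"},
}

duplicates = duplicated_msgids(modules)
```

Here "Ok" is counted three times and "Cancel" twice, so a translator working on separate per-module catalogues would translate five entries that a single catalogue would hold as two.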

> And I don't know if Drupal does some kind of caching with gettext data.

As far as I know, Drupal does not do any caching... reading from the cache
is as fast as reading the original gettext file.

Greetings
Thomas (Zend_Translate author)
I18N Team Leader


Re: Re: gettext best practices?

by thomasW

Hi Michael,

> It would appear that the gettext translation module parses the entire
> .mo file into itself and uses that information to perform the
> translations.

That's right

> It's actually compiled into a PHP array, so:

"Compiled" is a bit too strong a word for this, but OK.

> 1) The best practice would be to break your translation into a large
> number of modules (if performance is your main non-functional
> requirement but not if it's getting easy translations done!)

This increases overhead for Zend_Translate and for maintenance.
It also increases the workload for translators.

For a normal application, Zend_Translate's performance is not your problem
if your application is slow...
If you have about 5000 or more translations, I recommend splitting the
translation source.

But this is not because Zend_Translate is slow at that point; it is because
of restrictions of PHP itself.

> 2) gettext is a more expensive version of using the arrays backend.

No... it's a less expensive version. What takes time is reading the original
source. Your processor is always faster than your hard drive.
It is better to do some computation than to read a bigger file, and mo
files are much smaller than array files with the same content.

> So I'm going to be changing my approach and scrapping the gettext api
> in favour of PHP arrays and writing a tool for translators to use to
> ensure I get valid PHP arrays back from them.

If you write such a tool and make it available for free, I think others
would also have a use for it.
The array source is best when you are playing around and want to edit the
source yourself.

Greetings
Thomas (Zend_Translate's author)
I18N Team Leader