WIP - share/{monet,msg,numeric,time}def

by Edwin Groothuis

In the last couple of months I've spent some time with the data in
the share/{monet,msg,numeric,time}def directories and the data from
the CLDR (Common Locale Data Repository) project.

The biggest issues with the way the current data in the *def directories
is maintained are that it is partly high-ASCII (especially for the
non-US-ASCII and non-ISO8859-{1,2,15} character maps) and partly
unsynchronized between the different character maps for the same
locale.

The first approach was to see if I could transform the data from
the CLDR project into the format the FreeBSD project wants to have
it in.  It taught me a lot about the data stored in the CLDR project,
but also that it isn't compatible enough to do this automatically.

The second approach, still in progress, is going much better:
instead of storing the high-ASCII and multiple character map
translations in the SCM, we have one file per locale with a proper
definition of the words and syntax used, which gets converted into
UTF-8 and then transformed to the required character maps.


For example, the file share/msgdef/nl_NL.unicode:

    # yesexpr
    ^[<LATIN SMALL LETTER J><LATIN CAPITAL LETTER J><LATIN SMALL LETTER Y><LATIN CAPITAL LETTER Y>].*
    # noexpr
    ^[<LATIN SMALL LETTER N><LATIN CAPITAL LETTER N>].*
    # EOF

gets converted into nl_NL.UTF-8:

    # yesexpr
    ^[jJyY].*
    # noexpr
    ^[nN].*
    # EOF

and gets transformed into its ISO8859-1 and ISO8859-15 equivalents.
Since this is low-ASCII it is a boring example, but the idea is
there.
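
As a rough illustration of the .unicode to .UTF-8 step shown above, here
is a minimal Python sketch. It is not the Perl script mentioned below and
does not use the CLDR database: it assumes that every <NAME> token is an
official Unicode character name and resolves it with Python's standard
unicodedata module. The function names expand_names and
convert_unicode_source are made up for this example.

```python
import re
import unicodedata

# Matches one <UNICODE CHARACTER NAME> token, e.g. <LATIN SMALL LETTER J>.
NAME_TOKEN = re.compile(r"<([A-Z0-9 -]+)>")


def expand_names(line: str) -> str:
    """Replace every <CHARACTER NAME> token with the character it names."""
    # unicodedata.lookup raises KeyError for an unknown name, so a typo
    # in a source file fails loudly instead of passing through silently.
    return NAME_TOKEN.sub(lambda m: unicodedata.lookup(m.group(1)), line)


def convert_unicode_source(text: str) -> str:
    """Convert the contents of a .unicode file to its expanded text form."""
    return "\n".join(expand_names(line) for line in text.split("\n"))


# The nl_NL.unicode example from this message.
source = """# yesexpr
^[<LATIN SMALL LETTER J><LATIN CAPITAL LETTER J><LATIN SMALL LETTER Y><LATIN CAPITAL LETTER Y>].*
# noexpr
^[<LATIN SMALL LETTER N><LATIN CAPITAL LETTER N>].*
# EOF
"""

if __name__ == "__main__":
    print(convert_unicode_source(source), end="")
```

Run on the nl_NL.unicode contents above, this prints the nl_NL.UTF-8
contents shown above. Writing the result to disk with encoding="utf-8"
would give the actual .UTF-8 file.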


What are the current show-stoppers?

- The conversion between .unicode and .UTF-8 is done via a Perl
  script and the CLDR database, which means that it won't be in the
  base system for now. So we need both the .unicode source files
  and the .UTF-8 files in the SCM system.

- There is no iconv in the base operating system yet. Gabor@ is in
  the process of porting citrus-iconv from NetBSD, but it isn't
  available yet. So we also need the converted charactermaps in the
  SCM for now. I have access to his iconv and will feed the current
  issues back to him.
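
To make the second step concrete, here is a small Python sketch of the
UTF-8 to character map transformation. It uses Python's built-in codecs
as a stand-in for iconv, so it says nothing about how citrus-iconv
behaves. The function name transform_charmaps and the Cyrillic sample
string are assumptions for illustration only, not taken from the locale
data. Python ships no ISCII-DEV codec, so that character map is not
covered by this sketch.

```python
def transform_charmaps(utf8_text, charmaps):
    """Encode one locale definition into each requested character map.

    Returns a dict mapping the character map name to the encoded bytes.
    """
    results = {}
    for charmap in charmaps:
        # The default errors="strict" raises UnicodeEncodeError when a
        # character has no representation in the target character map,
        # which flags a definition that cannot be generated for it.
        results[charmap] = utf8_text.encode(charmap)
    return results


if __name__ == "__main__":
    # The low-ASCII nl_NL example encodes identically in both maps.
    latin = transform_charmaps("^[nN].*\n", ["iso8859-1", "iso8859-15"])
    print(latin["iso8859-1"] == latin["iso8859-15"])

    # Illustrative Cyrillic sample: same text, different bytes per map.
    for name, data in transform_charmaps("да", ["cp866", "koi8-r"]).items():
        print(name, data.hex())
```

The point of the design is that the per-charmap files become pure build
output: only the text is maintained, and a character that a target map
cannot represent shows up as an error at generation time.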


Until these two show-stoppers are resolved, we will have a lot more
data in the SCM system than we have right now. The first one should
not be difficult; the second one is with somebody who understands
it :-)


So the advantages, when everything is ready:

- Human-readable source files with Unicode-style encoding.
- All locales with the different character maps are generated from
  one source and thus up-to-date with each other.


Once this part is working properly (and to other people's satisfaction)
we can update the contents with information from third-party sources
like the CLDR. But that is still a long time away for now.


Edwin
svn://svn.freebsd.org/base/user/edwin/locale

--
Edwin Groothuis Website: http://www.mavetju.org/
edwin@... Weblog:  http://www.mavetju.org/weblog/
_______________________________________________
freebsd-i18n@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-i18n
To unsubscribe, send any mail to "freebsd-i18n-unsubscribe@..."

Re: WIP - share/{monet,msg,numeric,time}def

by Edwin Groothuis

> Once this part is working properly (and to others people satisfaction)

Tumbleweed all over the place!

Right now the output of this WIP matches the FreeBSD datafiles.

So if somebody smart wants to give their 2 cents on this, please do so now,
and not after I have merged the data into head please :-)

Is there somebody here who understands the hi_IN language (.ISCII-DEV
charactermap) and the ru_RU language (CP866 charactermap) for some
questions?

Edwin

--
Edwin Groothuis Website: http://www.mavetju.org/
edwin@... Weblog:  http://www.mavetju.org/weblog/

Re: WIP - share/{monet,msg,numeric,time}def

by Victor Snezhko

Edwin Groothuis <edwin@...> writes:

> Is there somebody here who understands the hi_IN language (.ISCII-DEV
> charactermap) and the ru_RU language (CP866 charactermap) for some
> questions?

Yes for ru_RU. Not quite familiar with locale infrastructure though.

--
Victor Snezhko

Re: WIP - share/{monet,msg,numeric,time}def

by Andrey Chernov-2

On Thu, Sep 17, 2009 at 07:49:56AM +1000, Edwin Groothuis wrote:
> > Once this part is working properly (and to others people satisfaction)
> So if somebody smart wants to give his 2cents on this, please do now.
> And not after I merged the data into head please :-)

Please don't forget about the recently added lt_LT.

> Is there somebody here who understands the hi_IN language (.ISCII-DEV
> charactermap) and the ru_RU language (CP866 charactermap) for some
> questions?

ru_RU.CP866 yes.

--
http://ache.pp.ru/