Re: Rendering of ellipsis for different scripts

by Jean-Marc Desperrier

Rimas Kudelis wrote:
> [...]
> Actually, I think you're wrong about this particular case.
>
> Check http://www.unicode.org/charts/PDF/U3000.pdf, which depicts a few
> CJK punctuation symbols, like IDEOGRAPHIC COMMA and IDEOGRAPHIC FULL
> STOP, for example.

I'll tell you the dirty little secret of Unicode :-)

Unicode is not perfect. Its rules were not always applied coherently,
and a few of the characters it encodes are definitely errors.
Punctuation and spaces are probably the two most inconsistent areas.

> I tend to think, that if there actually exists a tradition to use an
> ellipsis character in Japanese, then perhaps there should be something
> like an IDEOGRAPHIC ELLIPSIS character in Unicode (similarly to the
> above cases).

In the ELLIPSIS case, Unicode correctly applies its rule of character
unification.

It's for the characters you cite that it doesn't.
Note that there is probably a very good reason for most of those
characters.
In addition to unification, Unicode also has a rule of supporting
round-tripping of pre-Unicode encodings.

I think each of those you cite had its own entry in the JIS tables as
well as a counterpart in ASCII. So to support round-tripping of
ASCII + JIS text, they had to have separate code points in Unicode.

The ellipsis was already in JIS, but not in any of the basic Western
encodings, so there was no compatibility need for a separate encoding.
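
The round-trip argument can be tried out with a minimal Python sketch.
It assumes Python's built-in "euc_jp" codec is a fair stand-in for
"ASCII + JIS" text; the sample bytes are made up for illustration.

```python
# Assumption: the built-in "euc_jp" codec stands in for "ASCII + JIS" text.
# Sample bytes: ASCII "a,b" followed by JIS X 0208 1-2 and 1-3.
legacy = b"a,b" + b"\xa1\xa2" + b"\xa1\xa3"

# Decode the legacy bytes to Unicode.
text = legacy.decode("euc_jp")

# The ASCII comma and the JIS comma land on different code points...
code_points = [f"U+{ord(c):04X}" for c in text]
print(code_points)

# ...so encoding back reproduces the original bytes exactly.
round_tripped = text.encode("euc_jp")
print(round_tripped == legacy)
```

If both commas had been unified into a single code point, the second
step could not tell which byte sequence to write back.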

> Remember, that our actual problem is that the user runs an English
> version of Firefox (or any other Latin or even Cyrillic script-based
> language version anyway) on Japanese version of Windows. I can't say for
> sure, and I guess we should consult someone with a good knowledge in
> Japanese here, but perhaps the problem we're dealing with now is
> actually nothing but a bug in MS UI Gothic?

No, it's not. MS UI Gothic displays U+2026 HORIZONTAL ELLIPSIS with the
glyph preferred for use with Japanese text, because U+2026 is defined
as the Unicode code point corresponding to JIS X 0208 1-36 HORIZONTAL
ELLIPSIS, and the official glyph for JIS 1-36 has the dots at
mid-height. Which means that it's not the preferred glyph for use with
Latin text.
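
The correspondence between U+2026 and JIS 1-36 can be checked with a
short Python sketch. The helper name kuten is hypothetical, and the
sketch assumes that Python's "euc_jp" codec follows the JIS X 0208
mapping, where each EUC-JP byte is the row or cell number plus 0xA0.

```python
def kuten(ch):
    """Return the JIS X 0208 (row, cell) of ch, derived from its EUC-JP bytes.

    Hypothetical helper for illustration only. Assumes Python's "euc_jp"
    codec follows the JIS X 0208 mapping.
    """
    raw = ch.encode("euc_jp")
    # JIS X 0208 characters are exactly two bytes, both in 0xA1..0xFE.
    if len(raw) != 2 or raw[0] < 0xA1 or raw[1] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return raw[0] - 0xA0, raw[1] - 0xA0


print(kuten("\u2026"))  # U+2026 HORIZONTAL ELLIPSIS -> (1, 36)
print(kuten("\u3001"))  # U+3001 IDEOGRAPHIC COMMA   -> (1, 2)
```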

The irony is that there is a Unicode character with three middle dots,
U+22EF, but it is a mathematical symbol, not an ellipsis, so converting
'JIS X 0208 1-36 HORIZONTAL ELLIPSIS' to it is not allowed.
Some old versions of Mac OS did it, though:
http://hp.vector.co.jp/authors/VA010341/unicode/
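
The difference between the two characters is visible in their Unicode
properties. A small sketch using Python's standard unicodedata module
(the values come from whatever Unicode database ships with the
interpreter):

```python
import unicodedata

# Compare the punctuation ellipsis with the mathematical midline ellipsis.
for ch in ("\u2026", "\u22ef"):
    name = unicodedata.name(ch)
    category = unicodedata.category(ch)  # "Po" = other punctuation, "Sm" = math symbol
    print(f"U+{ord(ch):04X}  {category}  {name}")
```

U+2026 is reported as punctuation (Po) and U+22EF as a math symbol
(Sm), which is the distinction that rules out the conversion.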

_______________________________________________
dev-i18n mailing list
dev-i18n@...
https://lists.mozilla.org/listinfo/dev-i18n
