Rimas Kudelis wrote:
> Actually, I think you're wrong about this particular case.
> Check http://www.unicode.org/charts/PDF/U3000.pdf, which depicts a few
> CJK punctuation symbols, like IDEOGRAPHIC COMMA and IDEOGRAPHIC FULL
> STOP, for example.
I'll tell you the little dirty secret of Unicode :-)
Unicode is not perfect. Sometimes the rules were not applied in a really
coherent manner, and a few of the characters encoded in Unicode are
definitely errors. Punctuation and spaces are probably the two most
inconsistent areas.
> I tend to think that if there actually is a tradition of using an
> ellipsis character in Japanese, then perhaps there should be something
> like an IDEOGRAPHIC ELLIPSIS character in Unicode (similar to the
> above cases).
In the ELLIPSIS case, Unicode correctly applies its rule of character
unification. It's for the characters you cite that it doesn't.
Note that there is probably a very good reason for most of those
exceptions. In addition to unification, Unicode also has a rule of
supporting round-tripping of pre-Unicode encodings.
I think each of those you cite already existed both in the JIS tables
and in ASCII. So to support round-tripping of ASCII + JIS text, they had
to have separate code points in Unicode.
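To make the round-trip argument concrete, here is a minimal sketch, assuming Python's built-in "shift_jis" codec as a stand-in for "JIS text". Because the ASCII comma/full stop and the IDEOGRAPHIC COMMA/FULL STOP are distinct code points, a string mixing both survives an encode/decode cycle unchanged. The helper name `round_trips` is hypothetical.

```python
# Sketch: separate code points let mixed ASCII + JIS punctuation round-trip.
# Assumption: Python's "shift_jis" codec stands in for a JIS X 0208 encoding.

def round_trips(text: str, encoding: str) -> bool:
    """Return True if text survives encode -> decode unchanged."""
    return text.encode(encoding).decode(encoding) == text

# ASCII comma and full stop next to U+3001 / U+3002 (ideographic forms)
mixed = "a, b. \u3001\u3002"

for ch in ",.\u3001\u3002":
    # Each character keeps its own distinct byte sequence in Shift_JIS
    print(f"U+{ord(ch):04X}", ch.encode("shift_jis"))

print(round_trips(mixed, "shift_jis"))
```

If Unicode had unified the ideographic and ASCII forms into one code point, the decoder could only hand back one of them, and the distinction present in the original JIS text would be lost.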
Ellipsis was already in JIS, but not in any of the basic Western
encodings, so there was no compatibility need for a separate code point.
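A quick check of that claim, as a sketch assuming "basic Western encodings" means ASCII and ISO-8859-1 (Python's "latin-1" codec): U+2026 cannot be encoded in either, while Shift_JIS has it. The helper `encodable` is hypothetical.

```python
# Sketch: which encodings contain U+2026 HORIZONTAL ELLIPSIS?
# Assumption: ASCII and ISO-8859-1 represent the "basic Western encodings".

ELLIPSIS = "\u2026"

def encodable(ch: str, encoding: str) -> bool:
    """Return True if ch can be encoded in the given codec."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for enc in ("ascii", "latin-1", "shift_jis"):
    print(enc, encodable(ELLIPSIS, enc))
# ascii False
# latin-1 False
# shift_jis True
```

Since only the JIS side had an ellipsis, there was no second legacy character that needed its own code point to round-trip.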
> Remember that our actual problem is that the user runs an English
> version of Firefox (or any other Latin or even Cyrillic script-based
> language version) on a Japanese version of Windows. I can't say for
> sure, and I guess we should consult someone with a good knowledge of
> Japanese here, but perhaps the problem we're dealing with now is
> actually nothing but a bug in MS UI Gothic?
No, it's not. MS UI Gothic displays U+2026 HORIZONTAL ELLIPSIS with the
preferred glyph for Japanese text. U+2026 is defined as the Unicode code
point corresponding to JIS X 0208 1-36 HORIZONTAL ELLIPSIS, and the
official glyph for JIS 1-36 has the dots in the middle. That, however,
means it is not the preferred glyph for Latin text.
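The mapping itself can be checked in code, though the glyph cannot: a font only ever receives the code point. The sketch below converts the JIS X 0208 row-cell position 1-36 to Shift_JIS bytes with the standard row/cell arithmetic, then decodes them with Python's "shift_jis" codec. The function name `jis_to_sjis` is hypothetical, and the example only demonstrates the code-point mapping, not how any font draws it.

```python
import unicodedata

def jis_to_sjis(row: int, cell: int) -> bytes:
    """Convert a JIS X 0208 row-cell (kuten) position to Shift_JIS bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    # Two JIS rows share one lead byte; rows 63+ jump past the 0xA0-0xDF gap
    lead = ((row - 1) >> 1) + (0x81 if row <= 62 else 0xC1)
    if row % 2:
        # Odd rows: trail bytes 0x40-0x7E, then 0x80-0x9E (0x7F is skipped)
        trail = cell + 0x3F if cell <= 63 else cell + 0x40
    else:
        # Even rows: trail bytes 0x9F-0xFC
        trail = cell + 0x9E
    return bytes((lead, trail))

ch = jis_to_sjis(1, 36).decode("shift_jis")
print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2026 HORIZONTAL ELLIPSIS
```

Japanese text and Latin text therefore end up with the same U+2026, and the choice between middle dots and baseline dots is left entirely to the font.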
The irony is that there is a Unicode character with three middle dots,
U+22EF, but it is a mathematical symbol, not an ellipsis, so it is not
allowed to convert 'JIS X 0208 1-36 HORIZONTAL ELLIPSIS' to it.
Some old versions of Mac OS did it, though: