[PATCH] Wrong encoding for TT_MS_ID_UCS_4

by Yuriy Kaminskiy

Hello!
In ttf namelists TT_PLATFORM_MICROSOFT/TT_MS_ID_UCS_4 uses UTF-16BE
encoding, not UCS4 (as can be implied from name); see also
freetype-2.3.5/src/sfnt/sfobjs.c. I noticed this problem with the second
(MS PGothic) and third (MS UI Gothic) faces of the msgothic.ttc font
(version 5.00): the Japanese family name and style name are garbled and
familylang is wrong.
The attached patch should work with fontconfig versions from 2.3.95 to
2.6.99; tested on 2.6.0 and 2.4.2.

--- fontconfig-2.5.0/src/fcfreetype.c.orig 2007-10-26 00:49:10.000000000 +0400
+++ fontconfig-2.5.0/src/fcfreetype.c 2008-04-10 04:40:02.000000000 +0400
@@ -123,7 +123,7 @@
  {  TT_PLATFORM_MICROSOFT, TT_MS_ID_BIG_5, "BIG-5" },
  {  TT_PLATFORM_MICROSOFT, TT_MS_ID_WANSUNG, "Wansung" },
  {  TT_PLATFORM_MICROSOFT, TT_MS_ID_JOHAB, "Johab" },
- {  TT_PLATFORM_MICROSOFT, TT_MS_ID_UCS_4, "UCS4" },
+ {  TT_PLATFORM_MICROSOFT, TT_MS_ID_UCS_4, "UTF-16BE" },
  {  TT_PLATFORM_ISO, TT_ISO_ID_7BIT_ASCII, "ASCII" },
  {  TT_PLATFORM_ISO, TT_ISO_ID_10646, "UCS-2BE" },
  {  TT_PLATFORM_ISO, TT_ISO_ID_8859_1, "ISO-8859-1" },
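
To illustrate why the charset label matters, here is a minimal sketch (not fontconfig code) that decodes a name record stored as big-endian UTF-16 and, for contrast, reads the same bytes as one big-endian UCS-4 value. The helper names `utf16be_to_utf8` and `read_ucs4be` are hypothetical, and the sample bytes are assumed for illustration rather than taken from msgothic.ttc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read four bytes as one big-endian UCS-4 value. */
static uint32_t read_ucs4be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: decode big-endian UTF-16 bytes into NUL-terminated
 * UTF-8. Returns the number of bytes written (excluding the NUL), or
 * (size_t)-1 on malformed input or insufficient output space. */
static size_t utf16be_to_utf8(const unsigned char *in, size_t in_len,
                              char *out, size_t out_cap)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (in_len % 2 != 0 || out_cap == 0)
        return (size_t)-1;

    for (size_t i = 0; i < in_len; i += 2) {
        uint32_t cp = ((uint32_t)in[i] << 8) | in[i + 1];
        unsigned char buf[4];
        size_t n;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: it must be followed by a low surrogate. */
            uint32_t lo;
            if (i + 3 >= in_len)
                return (size_t)-1;
            lo = ((uint32_t)in[i + 2] << 8) | in[i + 3];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;      /* stray low surrogate */
        }

        /* Encode the code point as 1 to 4 UTF-8 bytes. */
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            n = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 4;
        }

        if (o + n >= out_cap)       /* keep room for the trailing NUL */
            return (size_t)-1;
        for (size_t k = 0; k < n; k++)
            out[o++] = (char)buf[k];
    }
    out[o] = '\0';
    return o;
}
```

With the assumed input bytes FF 2D FF 33 (two fullwidth letters, U+FF2D and U+FF33), the UTF-16BE decoder yields six bytes of valid UTF-8, whereas `read_ucs4be` returns 0xFF2DFF33, which lies far beyond U+10FFFF. That is consistent with the garbled names described above.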

_______________________________________________
Fontconfig mailing list
Fontconfig@...
http://lists.freedesktop.org/mailman/listinfo/fontconfig

Re: [PATCH] Wrong encoding for TT_MS_ID_UCS_4

by Yuriy Kaminskiy

On 24.06.2009 19:42, Yuriy Kaminskiy wrote:
> In ttf namelists TT_PLATFORM_MICROSOFT/TT_MS_ID_UCS_4 uses UTF-16BE
> encoding, not UCS4 (as can be implied from name);
ping.


Re: [PATCH] Wrong encoding for TT_MS_ID_UCS_4

by Behdad Esfahbod-3

On 07/22/2009 05:06 PM, Yuriy Kaminskiy wrote:
> On 24.06.2009 19:42, Yuriy Kaminskiy wrote:
>> In ttf namelists TT_PLATFORM_MICROSOFT/TT_MS_ID_UCS_4 uses UTF-16BE
>> encoding, not UCS4 (as can be implied from name);
> ping.

Are you sure?  This is what I see in the code:

static const FcFtEncoding   fcFtEncoding[] = {
  {  TT_PLATFORM_APPLE_UNICODE, TT_ENCODING_DONT_CARE,  "UCS-2BE" },
  {  TT_PLATFORM_MACINTOSH,     TT_MAC_ID_ROMAN,        "MACINTOSH" },
  {  TT_PLATFORM_MACINTOSH,     TT_MAC_ID_JAPANESE,     "SJIS" },
  {  TT_PLATFORM_MICROSOFT,     TT_MS_ID_UNICODE_CS,    "UTF-16BE" },
  {  TT_PLATFORM_MICROSOFT,     TT_MS_ID_SJIS,          "SJIS-WIN" },
  {  TT_PLATFORM_MICROSOFT,     TT_MS_ID_GB2312,        "GB2312" },
  {  TT_PLATFORM_MICROSOFT,     TT_MS_ID_BIG_5,         "BIG-5" },
  {  TT_PLATFORM_MICROSOFT,     TT_MS_ID_WANSUNG,       "Wansung" },
  {  TT_PLATFORM_MICROSOFT,     TT_MS_ID_JOHAB,         "Johab" },
  {  TT_PLATFORM_MICROSOFT,     TT_MS_ID_UCS_4,         "UCS4" },
  {  TT_PLATFORM_ISO,           TT_ISO_ID_7BIT_ASCII,   "ASCII" },
  {  TT_PLATFORM_ISO,           TT_ISO_ID_10646,        "UCS-2BE" },
  {  TT_PLATFORM_ISO,           TT_ISO_ID_8859_1,       "ISO-8859-1" },
};


Been there since 2004.
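
For readers who have not seen how a table like this is consulted, the following is a rough sketch of a (platform, encoding) lookup. It is not fontconfig's actual lookup code: the struct, the function `lookup_charset`, and the locally defined constants are assumptions made so the example compiles without FreeType headers. The numeric values follow the name-table platform and encoding IDs (Microsoft platform = 3, Unicode BMP = 1, Shift-JIS = 2, full Unicode repertoire = 10).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for TT_PLATFORM_MICROSOFT and the TT_MS_ID_* constants,
 * defined locally so the sketch builds without FreeType. */
enum { PLATFORM_MICROSOFT = 3 };
enum { MS_ID_UNICODE_CS = 1, MS_ID_SJIS = 2, MS_ID_UCS_4 = 10 };

/* Hypothetical equivalent of one table row. */
typedef struct {
    unsigned short platform_id;
    unsigned short encoding_id;
    const char    *charset;     /* name handed to the converter */
} NameEncoding;

/* A few rows mirroring the table quoted above; the last row is the one
 * under discussion in this thread. */
static const NameEncoding name_encodings[] = {
    { PLATFORM_MICROSOFT, MS_ID_UNICODE_CS, "UTF-16BE" },
    { PLATFORM_MICROSOFT, MS_ID_SJIS,       "SJIS-WIN" },
    { PLATFORM_MICROSOFT, MS_ID_UCS_4,      "UCS4" },
};

/* Return the charset name for a name record, or NULL if the pair of IDs
 * is not in the table. */
static const char *lookup_charset(unsigned platform_id, unsigned encoding_id)
{
    size_t count = sizeof name_encodings / sizeof name_encodings[0];

    assert(count > 0);
    for (size_t i = 0; i < count; i++) {
        if (name_encodings[i].platform_id == platform_id &&
            name_encodings[i].encoding_id == encoding_id)
            return name_encodings[i].charset;
    }
    return NULL;
}
```

Because the charset string is the only thing a caller gets back, a wrong label in a single row is enough to send every name record with that ID pair through the wrong converter.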

behdad

Re: [PATCH] Wrong encoding for TT_MS_ID_UCS_4

by Yuriy Kaminskiy

On 23.07.2009 01:10, Behdad Esfahbod wrote:
> On 07/22/2009 05:06 PM, Yuriy Kaminskiy wrote:
>> On 24.06.2009 19:42, Yuriy Kaminskiy wrote:
>>> In ttf namelists TT_PLATFORM_MICROSOFT/TT_MS_ID_UCS_4 uses UTF-16BE
>>> encoding, not UCS4 (as can be implied from name);
>> ping.
> Are you sure?  This is what I see in the code:
[shrug] I did not check any standards on this, but that's what I see in
practice (i.e. on a real font: before my change it's garbled, after it all is OK),
and what I see in the freetype2 code. See the original post for details.
<http://permalink.gmane.org/gmane.comp.fonts.fontconfig/3193>
>   {  TT_PLATFORM_MICROSOFT,     TT_MS_ID_UCS_4,         "UCS4" },
> Been there since 2004.
Yep. As I said in the original post, `patch applies to fontconfig from 2.3.95 to
2.6.99' (I did not check earlier/later versions).
It's just quite rarely used, and counter-intuitive, so no one noticed.


Re: [PATCH] Wrong encoding for TT_MS_ID_UCS_4

by Yuriy Kaminskiy

On 23.07.2009 03:23, Yuriy Kaminskiy wrote:
> On 23.07.2009 01:10, Behdad Esfahbod wrote:
>> On 07/22/2009 05:06 PM, Yuriy Kaminskiy wrote:
>>> On 24.06.2009 19:42, Yuriy Kaminskiy wrote:
>> Are you sure?  This is what I see in the code:
> [shrug] I did not check any standards on this, but that's what I see in
> practice (i.e. on a real font: before my change it's garbled, after it all is OK),
> and what I see in the freetype2 code.
=== cut freetype-2.3.9/sfnt/sfobjs.c:239 ====
      case TT_MS_ID_UCS_4:
        /* Apparently, if this value is found in a name table entry, it is */
        /* documented as `full Unicode repertoire'.  Experience with the   */
        /* MsGothic font shipped with Windows Vista shows that this really */
        /* means UTF-16 encoded names (UCS-4 values are only used within   */
        /* charmaps).                                                      */
        convert = tt_name_entry_ascii_from_utf16;
=== cut ===
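
The body of the converter is not quoted above. As a rough guess at what a function with that name might do, the sketch below reads big-endian UTF-16 code units and keeps only printable ASCII, substituting '?' for everything else. This is an assumption based on the function name and the comment, not FreeType's actual implementation, and `ascii_from_utf16be` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: reduce a big-endian UTF-16 name record to a
 * NUL-terminated ASCII string. Code units outside the printable ASCII
 * range (32..126) become '?'. A trailing odd byte is ignored, and the
 * output is truncated if it does not fit. Returns the number of
 * characters written, excluding the NUL. */
static size_t ascii_from_utf16be(const unsigned char *in, size_t in_len,
                                 char *out, size_t out_cap)
{
    size_t units = in_len / 2;
    size_t n = 0;

    assert(in != NULL && out != NULL && out_cap > 0);
    while (n < units && n + 1 < out_cap) {
        unsigned code = ((unsigned)in[2 * n] << 8) | in[2 * n + 1];
        out[n] = (code >= 32 && code <= 126) ? (char)code : '?';
        n++;
    }
    out[n] = '\0';
    return n;
}
```

Under this sketch, the bytes 00 4D 00 53 30 B4 (the letters M and S followed by one katakana character) come out as "MS?". Reading the same record four bytes at a time, as a UCS-4 label would suggest, would not recover even the ASCII letters.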


Re: [PATCH] Wrong encoding for TT_MS_ID_UCS_4

by Behdad Esfahbod-3

On 07/22/2009 07:23 PM, Yuriy Kaminskiy wrote:

> On 23.07.2009 01:10, Behdad Esfahbod wrote:
>> On 07/22/2009 05:06 PM, Yuriy Kaminskiy wrote:
>>> On 24.06.2009 19:42, Yuriy Kaminskiy wrote:
>>>> In ttf namelists TT_PLATFORM_MICROSOFT/TT_MS_ID_UCS_4 uses UTF-16BE
>>>> encoding, not UCS4 (as can be implied from name);
>>> ping.
>> Are you sure?  This is what I see in the code:
> [shrug] I did not check any standards on this, but that's what I see in
> practice (i.e. on a real font: before my change it's garbled, after it all is OK),
> and what I see in the freetype2 code. See the original post for details.
> <http://permalink.gmane.org/gmane.comp.fonts.fontconfig/3193>
>>    {  TT_PLATFORM_MICROSOFT,     TT_MS_ID_UCS_4,         "UCS4" },
>> Been there since 2004.
> Yep. As I said in the original post, `patch applies to fontconfig from 2.3.95 to
> 2.6.99' (I did not check earlier/later versions).
> It's just quite rarely used, and counter-intuitive, so no one noticed.

Ah, ok, I thought you meant the other way around.  Fixed.

behdad