What is the difference between two Big5-HKSCS conversion table?

What is the difference between two Big5-HKSCS conversion table?

by yandong.yao

Hi guys,

In icu/source/data/mappings/convrtrs.txt I found the two conversion
tables below for Big5-HKSCS, but I could not find an explanation of ibm-5471
at
http://www-306.ibm.com/software/globalization/ccsid/ccsid_registered.jsp.
Could you clarify what the difference between them is, and
which one should be used in which context?

Thank you very much!

---------------------------------------------
ibm-5471_P100-2007 { UTR22* } # This uses supplementary characters.
ibm-5471 { IBM* }
Big5-HKSCS { IANA* JAVA* }
big5hk { JAVA }
HKSCS-BIG5 # From http://www.openi18n.org/localenameguide/
ibm-1375_P100-2006 { UTR22* } # IBM's interpretation of Windows' Taiwan Big-5 w/ HKSCS extensions
ibm-1375 { IBM* }
Big5-HKSCS
MS950_HKSCS { JAVA* }
hkbig5 # from HP-UX 11i, which can't handle supplementary characters.
big5-hkscs:unicode3.0
# windows-950 # Windows-950 can be w/ or w/o HKSCS extensions. By default it's not.
# windows-950_hkscs
------------------------------------------------
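
For readers who want to see which converter each alias in the excerpt belongs to, here is a minimal Python sketch that parses it. It rests on several assumptions: the indentation is restored by hand (the mail stripped it, and in convrtrs.txt a converter name starts in column 0 with its aliases on indented lines), each line carries a single alias, and the names `parse_aliases` and `converters_for` are made up for this illustration. It is not ICU's own alias resolver, which also normalizes dashes and underscores.

```python
import re

# Excerpt from the mail, re-indented by hand (assumption): converter names
# start in column 0, their aliases sit on indented continuation lines.
EXCERPT = """\
ibm-5471_P100-2007 { UTR22* } # This uses supplementary characters.
    ibm-5471 { IBM* }
    Big5-HKSCS { IANA* JAVA* }
    big5hk { JAVA }
    HKSCS-BIG5 # From http://www.openi18n.org/localenameguide/
ibm-1375_P100-2006 { UTR22* } # IBM's interpretation of Windows' Taiwan Big-5 w/ HKSCS extensions
    ibm-1375 { IBM* }
    Big5-HKSCS
    MS950_HKSCS { JAVA* }
    hkbig5 # from HP-UX 11i, which can't handle supplementary characters.
    big5-hkscs:unicode3.0
"""

# One name, optionally followed by a "{ STD STD* }" tag list.
ENTRY = re.compile(r"(\S+)(?:\s*\{([^}]*)\})?")


def parse_aliases(text):
    """Return {converter: [(alias, {standard: is_default_for_standard})]}."""
    table = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        match = ENTRY.match(line.strip())
        name = match.group(1)
        tags = {}
        for tag in (match.group(2) or "").split():
            tags[tag.rstrip("*")] = tag.endswith("*")  # "*" marks the default
        if not raw[0].isspace():  # column 0 starts a new converter entry
            current = name
            table[current] = []
        table[current].append((name, tags))
    return table


def converters_for(table, alias, standard=None):
    """List converters that carry `alias`, optionally only under `standard`."""
    hits = []
    for converter, entries in table.items():
        for name, tags in entries:
            if name.lower() == alias.lower() and (standard is None or standard in tags):
                hits.append(converter)
    return hits


table = parse_aliases(EXCERPT)
print(converters_for(table, "Big5-HKSCS"))
print(converters_for(table, "Big5-HKSCS", "IANA"))
```

In this sketch the bare alias `Big5-HKSCS` matches both entries, while the IANA- or JAVA-tagged form matches only `ibm-5471_P100-2007`.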

Regards,
Yandong


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
icu-support mailing list - icu-support@...
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support

Re: What is the difference between two Big5-HKSCS conversion table?

by George Rhoten

You're looking at behavior that is not available in any release of ICU
yet.  You're looking at the future ICU 3.8 behavior.

Unfortunately, that CDRA page hasn't been updated with the latest
information.  CCSID 5471 is Big5-HKSCS.  CCSID 1375 is Big5-HKSCS with
Microsoft extensions.  Each has two alternate mapping tables that map the
codepoints to Unicode 3.0 and Unicode 3.1.  So there are at least 4
mapping tables for these two CCSIDs.  There's actually a total of 6 tables
for the 2 CCSIDs, but the other 2 aren't relevant to this discussion.

The last time I checked, Windows has a patch that modifies windows-950 to
support the HKSCS characters, but it's for Unicode 3.0.  This means that
many characters are mapped to the private use area of Unicode.  So this
behavior is mapped to CCSID 1375 with the Unicode 3.0 behavior.  This is
also similar to some implementations on Solaris and HP-UX.

CCSID 5471 tries to match the Big5-HKSCS specification without so many
extensions.  I've also picked the variant table with the Unicode 3.1
mappings, since the Unicode 3.0 mapping table usually isn't used without
the Microsoft extensions.  The official description can be found at <
http://www.info.gov.hk/digital21/eng/hkscs/ >.  It's very similar to Mac
OS X's implementation.  It can also be considered a "proper"
implementation because it's using the Unicode supplementary characters.
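
To make the private-use versus supplementary distinction concrete, here is a small Python sketch that classifies a code point by range. The function names `classify` and `utf16_units` are invented for this illustration, and U+20021 is used only as an example of a Plane 2 code point, not as a claim about any particular HKSCS mapping.

```python
def classify(cp):
    """Classify a Unicode code point by the ranges relevant to this thread."""
    if 0xE000 <= cp <= 0xF8FF:
        return "BMP private use"            # where Unicode 3.0-era tables park characters
    if 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD:
        return "supplementary private use"  # planes 15 and 16
    if cp >= 0x10000:
        return "supplementary"              # e.g. Plane 2, used by Unicode 3.1 tables
    return "BMP"


def utf16_units(cp):
    """Number of 16-bit code units needed for the code point in UTF-16."""
    return len(chr(cp).encode("utf-16-be")) // 2


for cp in (0x4E00, 0xE000, 0x20021):
    print(f"U+{cp:04X}: {classify(cp)}, {utf16_units(cp)} UTF-16 unit(s)")
```

A supplementary code point takes two UTF-16 code units (a surrogate pair), which is why software that cannot handle supplementary characters tends to be paired with the private-use style tables.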

The glibc implementation of Big5-HKSCS is significantly different from
other implementations.  It's Big5-HKSCS with a lot of Unicode 3.1
mappings, but it's incomplete.  It doesn't map some characters that are
mapped in other Big5-HKSCS implementations.  It also maps some characters
to different Unicode private use codepoints.  It's closer to CCSID 5471
with Unicode 3.1 mappings.

George Rhoten
IBM Globalization Center of Competency/ICU  San José, CA, USA
http://www.icu-project.org/



In reply to Yandong Yao <Yandong.Yao@...>, sent 05/14/2007 08:20 PM


Re: What is the difference between two Big5-HKSCS conversion table?

by yandong.yao

Hi George,

George Rhoten wrote:
> You're looking at behavior that is not available in any release of ICU
> yet.  You're looking at the future ICU 3.8 behavior.
>
> Unfortunately, that CDRA page hasn't been updated with the latest
> information.  CCSID 5471 is Big5-HKSCS.
Which version of HKSCS is it: HKSCS-2004, HKSCS-2001, or HKSCS-1999?
According to
http://www.info.gov.hk/digital21/eng/hkscs/download/hkscs-2004-chr-incl.txt,
0xC87A was added
in HKSCS-1999, but I cannot find it in ibm-5471_P100-2007.ucm.

> CCSID 1375 is Big5-HKSCS with
> Microsoft extensions.  Each has two alternate mapping tables that map the
> codepoints to Unicode 3.0 and Unicode 3.1.
How do I get two mapping tables from one file?

>   So there are at least 4
> mapping tables for these two CCSIDs.  There's actually a total of 6 tables
> for the 2 CCSIDs, but the other 2 aren't relevant to this discussion.
>
> The last time I checked, Windows has a patch that modifies windows-950 to
> support the HKSCS characters, but it's for Unicode 3.0.  This means that
> many characters are mapped to the private use area of Unicode.  So this
> behavior is mapped to CCSID 1375 with the Unicode 3.0 behavior.  This is
> also similar to some implementations on Solaris and HP-UX.
>  
Does this mean that, to stay compatible with Windows, CCSID 1375 with
the Unicode 3.0 behavior
should be used?

Thank you very much!

Regards,
Yandong


Re: What is the difference between two Big5-HKSCS conversion table?

by Tetsuji Orita

Hello,

> What is the version of HKSCS? HKSCS-2004 or HKSCS-2001 or HKSCS-1999?

   CCSID 5471 contains the HKSCS-2001 character set, and CCSID 1375
   contains the HKSCS-2004 character set.

> Does this mean that to keep compatibility with Windows, CCSID1375 with the
> Unicode 3.0 behavior should be used?

   Yes, I think so.
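
The answers so far in this thread can be collected into a small lookup, sketched here in Python. The converter names, CCSIDs, HKSCS versions and Unicode versions are the ones stated in the thread; the dictionary layout, the `windows_compatible` flag and the `pick_table` helper are hypothetical names chosen for illustration only.

```python
# Summary of what the thread states about the two tables (layout is illustrative).
TABLES = {
    "ibm-5471_P100-2007": {
        "ccsid": 5471,
        "hkscs": "HKSCS-2001",
        "unicode": "3.1",             # uses supplementary characters
        "windows_compatible": False,
    },
    "ibm-1375_P100-2006": {
        "ccsid": 1375,
        "hkscs": "HKSCS-2004",
        "unicode": "3.0",             # many characters land in the private use area
        "windows_compatible": True,   # matches the patched windows-950 behavior
    },
}


def pick_table(windows_compatible):
    """Return the converter name whose Windows-compatibility flag matches."""
    for name, info in TABLES.items():
        if info["windows_compatible"] == windows_compatible:
            return name
    raise LookupError("no table with the requested property")


print(pick_table(windows_compatible=True))
print(pick_table(windows_compatible=False))
```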

Best regards,
Tetsuji Orita (U+7E54,U+7530,U+54F2,U+6CBB),
Globalization Center of Competency  - Yamato, IBM Japan
T/L: 1808-5425, TEL: +81-46-215-5425, FAX:+81-46-273-7497
e-Mail: orita@...



                                                                           
In reply to Yandong Yao <Yandong.Yao@Sun.COM>, sent 2007/05/15 16:23





Re: What is the difference between two Big5-HKSCS conversion table?

by yandong.yao

Hi Tetsuji,

Tetsuji Orita wrote:
> Hello,
>
> What is the version of HKSCS? HKSCS-2004 or HKSCS-2001 or HKSCS-1999?
>
>    CCSID 5471 contains the character set for HKSCS-2001 and CCSID 1375
>    contains HKSCS-2004 character set.
>  
Then 0xC87A, which is in both HKSCS-1999 and HKSCS-2001, is missing from
ibm-5471_P100-2007.ucm.
Is this a bug?
> Does this mean that to keep compatibility with Windows, CCSID1375 with the
> Unicode 3.0 behavior should be used?
>
>    Yes, I think so.
>  
Thanks.

Regards,
Yandong


Re: What is the difference between two Big5-HKSCS conversion table?

by Tetsuji Orita

Hello,
0xC87A should be in the Unicode table for CCSID 5471. The CCSID 5471 table that I
have here contains 0xC87A. I do not know why the table you are looking at does
not contain it.
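
One way to check a local copy of a .ucm file for a given byte sequence is sketched below in Python. It assumes the usual UCM layout, with mapping lines such as `<U4E00> \xA4\x40 |0` between `CHARMAP` and `END CHARMAP`. The function name `find_bytes` and the two-line sample are hand-written stand-ins for illustration, not the contents of ibm-5471_P100-2007.ucm.

```python
import re

# One or more <Uxxxx> code points, then one or more \xNN bytes, then an
# optional |N precision flag.
MAPPING = re.compile(
    r"^((?:<U[0-9A-Fa-f]{4,6}>)+)\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|(\d))?"
)


def find_bytes(ucm_text, target):
    """Return [(code_points, flag)] for every mapping whose bytes equal `target`."""
    hits = []
    in_charmap = False
    for line in ucm_text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            in_charmap = False
            continue
        if not in_charmap:
            continue
        match = MAPPING.match(line)
        if not match:
            continue  # comments, blank lines, anything that is not a mapping
        raw = bytes(int(h, 16) for h in re.findall(r"\\x([0-9A-Fa-f]{2})", match.group(2)))
        if raw == target:
            cps = [int(h, 16) for h in re.findall(r"<U([0-9A-Fa-f]{4,6})>", match.group(1))]
            hits.append((cps, int(match.group(3) or 0)))
    return hits


# Hand-written sample, not taken from the real table.
SAMPLE = r"""<code_set_name> "sample"
CHARMAP
<U4E00> \xA4\x40 |0
<U4E59> \xA4\x41 |0
END CHARMAP
"""

print(find_bytes(SAMPLE, b"\xA4\x40"))  # a mapping that is present in the sample
print(find_bytes(SAMPLE, b"\xC8\x7A"))  # empty list: not in this sample
```

To test a real file, read it with `open(path, encoding="utf-8").read()` and pass the text in place of `SAMPLE`. An empty result for `b"\xC8\x7A"` would confirm that the copy at hand lacks the mapping.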

Best regards,
Tetsuji Orita (U+7E54,U+7530,U+54F2,U+6CBB),
Globalization Center of Competency  - Yamato, IBM Japan
T/L: 1808-5425, TEL: +81-46-215-5425, FAX:+81-46-273-7497
e-Mail: orita@...



                                                                           
In reply to Yandong Yao <Yandong.Yao@Sun.COM>, sent 2007/05/15 17:47




Hi Tetsuji,

Tetsuji Orita 写道:
> Hello,
>
> What is the version of HKSCS? HKSCS-2004 or HKSCS-2001 or HKSCS-1999?
>
>    CCSID 5471 contains the character set for HKSCS-2001 and CCSID 1375
>    contains HKSCS-2004 character set.
>
then 0xC87A which is in HKSCS-1999 and HKSCS-2001 is not in
ibm-5471_P100-2007.ucm.
is this a bug?
> Does this mean that to keep compatibility with Windows, CCSID1375 with
the
> Unicode 3.0 behavior should be used?
>
>    Yes, I think so.
>
thanks.

Regards,
Yandong
> Best regards,
> Tetsuji Orita (U+7E54,U+7530,U+54F2,U+6CBB),
> Globalization Center of Competency  - Yamato, IBM Japan
> T/L: 1808-5425, TEL: +81-46-215-5425, FAX:+81-46-273-7497
> e-Mail: orita@...
>
>
>
>

>              Yandong Yao

>              <Yandong.Yao@Sun.

>              COM>
To
>              Sent by:                  ICU support mailing list

>              icu-support-bounc
<icu-support@...>
>              es@...
cc
>              rge.net

>
Subject
>                                        Re: [icu-support] What is the

>              2007/05/15 16:23          difference between two  Big5-HKSCS

>                                        conversion table?

>

>              Please respond to

>              Yandong.Yao@...

>                 OM; Please

>                 respond to

>                 ICU support

>                mailing list

>              <icu-support@list

>              s.sourceforge.net

>                      >

>

>

>
>
>
>
> Hi George,
>
> George Rhoten 写道:
>
>> You're looking at behavior that is not available in any release of ICU
>> yet.  You're looking at the future ICU 3.8 behavior.
>>
>> Unfortunately, that CDRA page hasn't been updated with the latest
>> information.  CCSID 5417 is Big5-HKSCS.
>>
> What is the version of HKSCS? HKSCS-2004 or HKSCS-2001 or HKSCS-1999?
>  From
>
http://www.info.gov.hk/digital21/eng/hkscs/download/hkscs-2004-chr-incl.txt

> ,
> 0xC87A was added
> into HKSCS-1999, but I can not find it in ibm-5471_P100-2007.ucm.
>
>
>> CCSID 1375 is Big5-HKSCS with
>> Microsoft extensions.  Each has two alternate mapping tables that map the
>> codepoints to Unicode 3.0 and Unicode 3.1.
>>
> How do I get two mapping tables from one file?
>
>>   So there are at least 4
>> mapping tables for these two CCSIDs.  There's actually a total of 6 tables
>> for the 2 CCSIDs, but the other 2 aren't relevant to this discussion.
>>
>> The last time I checked, Windows has a patch that modifies windows-950 to
>> support the HKSCS characters, but it's for Unicode 3.0.  This means that
>> many characters are mapped to the private use area of Unicode.  So this
>> behavior is mapped to CCSID 1375 with the Unicode 3.0 behavior.  This is
>> also similar to some implementations on Solaris and HP-UX.
>>
> Does this mean that to keep compatibility with Windows, CCSID 1375 with
> the Unicode 3.0 behavior should be used?
>
> Thank you very much!
>
> Regards,
> Yandong
>
>> CCSID 5471 tries to match the Big5-HKSCS specification without so many
>> extensions.  I've also picked the variant table with the Unicode 3.1
>> mappings, since the Unicode 3.0 mapping table usually isn't used without
>> the Microsoft extensions.  The official description can be found at <
>> http://www.info.gov.hk/digital21/eng/hkscs/ >.  It's very similar to Mac
>> OS X's implementation.  It can also be considered a "proper"
>> implementation because it's using the Unicode supplementary characters.
>>
>> The glibc implementation of Big5-HKSCS is significantly different from
>> other implementations.  It's Big5-HKSCS with a lot of Unicode 3.1
>> mappings, but it's incomplete.  It doesn't map some characters that are
>> mapped in other Big5-HKSCS implementations.  It also maps some characters
>> to different Unicode private use codepoints.  It's closer to CCSID 5471
>> with Unicode 3.1 mappings.
>>
>> George Rhoten
>> IBM Globalization Center of Competency/ICU  San José, CA, USA
>> http://www.icu-project.org/

Re: What is the difference between two Big5-HKSCS conversion table?

by yandong.yao

Sorry, I checked the wrong file; 5471 does contain this code point.

Thanks.

Regards,
Yandong

Tetsuji Orita wrote:

> Hello,
> 0xC87A should be in the Unicode table for CCSID 5471. The CCSID 5471 table
> that I have here contains 0xC87A. I do not know why the table you are
> looking at does not contain it.
>
> Best regards,
> Tetsuji Orita (U+7E54,U+7530,U+54F2,U+6CBB),
> Globalization Center of Competency  - Yamato, IBM Japan
> T/L: 1808-5425, TEL: +81-46-215-5425, FAX:+81-46-273-7497
> e-Mail: orita@...
>
>
> Yandong Yao <Yandong.Yao@Sun.COM> wrote on 2007/05/15 17:47
> (sent by icu-support-bounces@... to ICU support mailing list <icu-support@...>):
> Subject: Re: [icu-support] What is the difference between two Big5-HKSCS conversion table?
>
> Hi Tetsuji,
>
> Tetsuji Orita wrote:
>> Hello,
>>
>> What is the version of HKSCS? HKSCS-2004 or HKSCS-2001 or HKSCS-1999?
>>
>>    CCSID 5471 contains the character set for HKSCS-2001 and CCSID 1375
>>    contains HKSCS-2004 character set.
>>
> Then 0xC87A, which is in HKSCS-1999 and HKSCS-2001, is not in
> ibm-5471_P100-2007.ucm.
> Is this a bug?
>
>> Does this mean that to keep compatibility with Windows, CCSID1375 with the
>> Unicode 3.0 behavior should be used?
>>
>>    Yes, I think so.
>>
> Thanks.
>
> Regards,
> Yandong
>

Re: What is the difference between two Big5-HKSCS conversion table?

by George Rhoten

> What is the version of HKSCS? HKSCS-2004 or HKSCS-2001 or HKSCS-1999?
>
>    CCSID 5471 contains the character set for HKSCS-2001 and CCSID 1375
>    contains HKSCS-2004 character set.

I didn't realize that. I had incorrectly assumed that the update of CCSID
1375 was 5471 because many of the other updates to a CCSID have 4096 added
to the CCSID (e.g. 1255 -> 5351 -> 9447). The CDRA database within IBM was
missing this information in the description. This is helpful information.
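
The arithmetic behind that assumption can be checked in a few lines of Python. This is only an illustrative sketch: the helper name `ccsid_updates` is made up for this example, and the +4096 step is the convention described in this message, not a rule that every CCSID update follows.

```python
CCSID_UPDATE_STEP = 4096  # increment described in the message above


def ccsid_updates(base, count):
    """Return base followed by `count` CCSIDs, each 4096 above the previous one."""
    return [base + i * CCSID_UPDATE_STEP for i in range(count + 1)]


# Windows Hebrew example: 1255 and its two updates.
print(ccsid_updates(1255, 2))            # [1255, 5351, 9447]
# The coincidence behind the confusion: 1375 + 4096 happens to be 5471.
print(1375 + CCSID_UPDATE_STEP == 5471)  # True
```

So 5471 looks exactly like a "next version" of 1375 by this convention, even though the two CCSIDs cover different HKSCS versions.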

> Does this mean that to keep compatibility with Windows, CCSID1375 with the
> Unicode 3.0 behavior should be used?
>
>    Yes, I think so.

Actually no. I had used
http://www.icu-project.org/charts/charset/roundtripIndex.html#windows-950_hkscs-2001
to determine the correct CCSID to use. After closer inspection, CCSID 5471
should be used with the Unicode 3.0 mappings (ibm-5471_P100-2006). The
Microsoft implementation typically maps unused characters to random
Unicode characters. When you use the Microsoft API to discover its
behavior, you don't get an error for valid but "unmapped" byte sequences.
The difference between Microsoft's Big5-HKSCS and ibm-5471_P100-2006 is
mainly in how the Unicode PUA is used.

When the original Big5-HKSCS mapping was collected from Windows XP, the
patch at http://www.microsoft.com/hk/hkscs/ was used. This uses
Big5-HKSCS-2001. The page now states that Big5-HKSCS-2004 is natively
supported in Windows Vista. So I'll have to inspect the Windows Vista
behavior to determine the correct table to use. It's likely that CCSID
1375 will be used for the Windows compatible implementation, and a newer
Unicode mapping will be used. So it may be an alternate CCSID 1375. I
don't know yet.

So whatever you see in ICU's trunk is incorrect. This is post ICU 3.6
work. Don't use it for any decisions on your implementation.


Re: What is the difference between two Big5-HKSCS conversion table?

by George Rhoten

Okay, I've taken a closer look at the Windows Vista implementation and the
fine print on the Microsoft web site.  Windows Vista does not support
codepage conversion to Big5-HKSCS-2004 or Big5-HKSCS-2001, but it does
support the characters from Unicode 4.1, which contains the characters
from Big5-HKSCS.  Basically Windows has the fonts and the IME to use the
characters in Big5-HKSCS-2004.

The add-on from the Microsoft website is for Big5-HKSCS-2001, and the site
provides code to convert the PUA characters from Big5-HKSCS-2001 to
Unicode 4.1.
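
One way to see which style of mapping produced a given string is to count private-use versus supplementary code points after decoding. The sketch below is not the Microsoft conversion code mentioned above and uses no ICU API; `classify_code_points` is a hypothetical helper, and the sample string is made up for illustration.

```python
def classify_code_points(text):
    """Count private-use and supplementary code points in a decoded string."""
    counts = {"bmp_pua": 0, "supplementary": 0, "supplementary_pua": 0, "other": 0}
    for ch in text:
        cp = ord(ch)
        if 0xE000 <= cp <= 0xF8FF:
            counts["bmp_pua"] += 1            # BMP Private Use Area
        elif 0xF0000 <= cp <= 0x10FFFF:
            counts["supplementary_pua"] += 1  # planes 15 and 16 (private-use planes)
        elif cp >= 0x10000:
            counts["supplementary"] += 1      # e.g. CJK Extension B
        else:
            counts["other"] += 1
    return counts


# Made-up sample: one ordinary Han character, one BMP PUA code point,
# and one supplementary (CJK Extension B) code point.
sample = "\u4e2d\ue000\U00020021"
print(classify_code_points(sample))
```

A high `bmp_pua` count would suggest the data went through a Unicode 3.0-style table; such data needs a separate PUA-to-Unicode 4.1 conversion step before it is portable.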

So CCSID 1375 with the newer Unicode mappings will be used to denote
Big5-HKSCS-2004 in ICU.  CCSID 5471 with the older Unicode mappings will
be used to denote Big5-HKSCS-2001 and big5-hkscs:unicode3.0 in ICU.

ICU's usage of CCSID 1375 will convert Big5-HKSCS in a way that will be
viewable by Windows Vista.  This will be the default when you generically
request Big5-HKSCS.  ICU's usage of CCSID 5471 will convert Big5-HKSCS in
a way that is compatible with the Microsoft Windows add-on, and the
results *may not* be 100% viewable by Windows Vista due to the font
support.
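
The plan described above can be summarized as a small lookup table. This is a sketch for illustration only, not ICU's alias data or API: the dictionary, the helper `planned_variant`, and the use of the labels as lookup keys are assumptions, and the final converter names in ICU may differ.

```python
# Labels and CCSIDs as described in the message above; this is a summary
# table for illustration, not ICU's actual alias data.
HKSCS_PLAN = {
    "big5-hkscs": {"ccsid": 1375, "hkscs": "2004", "unicode_mappings": "newer"},
    "big5-hkscs-2004": {"ccsid": 1375, "hkscs": "2004", "unicode_mappings": "newer"},
    "big5-hkscs-2001": {"ccsid": 5471, "hkscs": "2001", "unicode_mappings": "older"},
    "big5-hkscs:unicode3.0": {"ccsid": 5471, "hkscs": "2001", "unicode_mappings": "older"},
}


def planned_variant(label):
    """Look up a label case-insensitively; raises KeyError for unknown labels."""
    return HKSCS_PLAN[label.lower()]


print(planned_variant("Big5-HKSCS")["ccsid"])             # 1375
print(planned_variant("big5-hkscs:unicode3.0")["ccsid"])  # 5471
```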

If you read between the lines on the Microsoft Big5-HKSCS pages, they're
saying that you should migrate your Big5-HKSCS data to Unicode 4.1.  This
is a perfectly reasonable migration strategy :-)  You should keep that in
mind, if you are concerned about compatibility with Windows Vista.

George Rhoten
IBM Globalization Center of Competency/ICU  San José, CA, USA
http://www.icu-project.org/


