What is the difference between two Big5-HKSCS conversion tables?

Hi guys,

In icu/source/data/mappings/convrtrs.txt I found the two conversion tables below for Big5-HKSCS, and I have not found an explanation of ibm-5471 at http://www-306.ibm.com/software/globalization/ccsid/ccsid_registered.jsp. Could you help clarify what the difference between them is, and which one should be used in which context?

Thank you very much!

---------------------------------------------
ibm-5471_P100-2007 { UTR22* } # This uses supplementary characters.
    ibm-5471 { IBM* }
    Big5-HKSCS { IANA* JAVA* }
    big5hk { JAVA }
    HKSCS-BIG5 # From http://www.openi18n.org/localenameguide/
ibm-1375_P100-2006 { UTR22* } # IBM's interpretation of Windows' Taiwan Big-5 w/ HKSCS extensions
    ibm-1375 { IBM* }
    Big5-HKSCS
    MS950_HKSCS { JAVA* }
    hkbig5 # from HP-UX 11i, which can't handle supplementary characters.
    big5-hkscs:unicode3.0
    # windows-950 # Windows-950 can be w/ or w/o HKSCS extensions. By default it's not.
    # windows-950_hkscs
------------------------------------------------

Regards,
Yandong

_______________________________________________
icu-support mailing list - icu-support@...
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support

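To make the question concrete, here is a minimal Python sketch that reads the excerpt above and lists which converter entries carry a given alias. It is not ICU code: the parser is a simplified assumption about the alias-table layout (a line starting in column 0 opens a converter, indented lines add aliases, `{ ... }` holds standard tags, `#` starts a comment), and the names `parse_aliases` and `converters_for` are made up for illustration.

```python
import re

# The excerpt quoted in the question, one alias per line.
EXCERPT = """\
ibm-5471_P100-2007 { UTR22* } # This uses supplementary characters.
    ibm-5471 { IBM* }
    Big5-HKSCS { IANA* JAVA* }
    big5hk { JAVA }
    HKSCS-BIG5 # From http://www.openi18n.org/localenameguide/
ibm-1375_P100-2006 { UTR22* } # IBM's interpretation of Windows' Taiwan Big-5 w/ HKSCS extensions
    ibm-1375 { IBM* }
    Big5-HKSCS
    MS950_HKSCS { JAVA* }
    hkbig5 # from HP-UX 11i, which can't handle supplementary characters.
    big5-hkscs:unicode3.0
    # windows-950 # Windows-950 can be w/ or w/o HKSCS extensions. By default it's not.
    # windows-950_hkscs
"""

# An alias name, optionally followed by "{ TAG TAG* }".
ENTRY = re.compile(r"^(\S+)(?:\s*\{([^}]*)\})?")


def parse_aliases(text):
    """Return {converter: [(alias, [tags]), ...]} (simplified, hypothetical parser)."""
    table = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        match = ENTRY.match(line.strip())
        name = match.group(1)
        tags = (match.group(2) or "").split()
        if not raw[0].isspace():  # column 0 starts a new converter entry
            current = name
            table[current] = []
        table[current].append((name, tags))
    return table


def converters_for(table, alias, tag=None):
    """List converters that carry `alias`, optionally only where tagged with `tag`."""
    hits = []
    for converter, entries in table.items():
        for name, tags in entries:
            tagged = tag is None or any(t.rstrip("*") == tag for t in tags)
            if name.lower() == alias.lower() and tagged:
                hits.append(converter)
    return hits


table = parse_aliases(EXCERPT)
print(converters_for(table, "Big5-HKSCS"))          # both converters list the alias
print(converters_for(table, "Big5-HKSCS", "IANA"))  # only one carries the IANA tag
```

In this excerpt the alias Big5-HKSCS appears under both converters, but only the ibm-5471 entry tags it for IANA and JAVA, which is why the name alone does not say which table is meant.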
Re: What is the difference between two Big5-HKSCS conversion tables?

You're looking at behavior that is not available in any release of ICU yet. You're looking at the future ICU 3.8 behavior.

Unfortunately, that CDRA page hasn't been updated with the latest information. CCSID 5471 is Big5-HKSCS. CCSID 1375 is Big5-HKSCS with Microsoft extensions. Each has two alternate mapping tables that map the codepoints to Unicode 3.0 and Unicode 3.1, so there are at least 4 mapping tables for these two CCSIDs. There's actually a total of 6 tables for the 2 CCSIDs, but the other 2 aren't relevant to this discussion.

The last time I checked, Windows has a patch that modifies windows-950 to support the HKSCS characters, but it's for Unicode 3.0. This means that many characters are mapped to the private use area of Unicode. So this behavior is mapped to CCSID 1375 with the Unicode 3.0 behavior. This is also similar to some implementations on Solaris and HP-UX.

CCSID 5471 tries to match the Big5-HKSCS specification without so many extensions. I've also picked the variant table with the Unicode 3.1 mappings, since the Unicode 3.0 mapping table usually isn't used without the Microsoft extensions. The official description can be found at <http://www.info.gov.hk/digital21/eng/hkscs/>. It's very similar to Mac OS X's implementation. It can also be considered a "proper" implementation because it's using the Unicode supplementary characters.

The glibc implementation of Big5-HKSCS is significantly different from other implementations. It's Big5-HKSCS with a lot of Unicode 3.1 mappings, but it's incomplete. It doesn't map some characters that are mapped in other Big5-HKSCS implementations. It also maps some characters to different Unicode private use codepoints. It's closer to CCSID 5471 with Unicode 3.1 mappings.

George Rhoten
IBM Globalization Center of Competency/ICU
San José, CA, USA
http://www.icu-project.org/

(In reply to Yandong Yao's message of 05/14/2007 08:20 PM.)

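The practical difference between the Unicode 3.0 and Unicode 3.1 style tables described above shows up in where decoded characters land: private use area versus supplementary planes. The following Python sketch, using only the standard `unicodedata` module, classifies the characters of an already decoded string. The function names `classify` and `summarize` are hypothetical, and the sample string is made up rather than taken from any real mapping table.

```python
import unicodedata


def classify(ch):
    """Classify one character as 'private-use', 'supplementary', or 'bmp'."""
    if unicodedata.category(ch) == "Co":  # general category Co = private use
        return "private-use"
    if ord(ch) > 0xFFFF:                  # beyond the Basic Multilingual Plane
        return "supplementary"
    return "bmp"


def summarize(text):
    """Count how many characters of `text` fall into each class."""
    counts = {"private-use": 0, "supplementary": 0, "bmp": 0}
    for ch in text:
        counts[classify(ch)] += 1
    return counts


# Made-up sample: one ordinary ideograph, one private use code point,
# one supplementary (plane 2) ideograph.
sample = "\u4e00\ue000\U00020000"
print(summarize(sample))
```

Running the same input bytes through two converters and comparing the two summaries is one way to see which flavor of table a given converter name resolves to: output from a Unicode 3.0 style table would be expected to show more private use characters, and output from a Unicode 3.1 style table more supplementary ones.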
Re: What is the difference between two Big5-HKSCS conversion tables?

Hi George,

George Rhoten wrote:
> Unfortunately, that CDRA page hasn't been updated with the latest
> information. CCSID 5471 is Big5-HKSCS.

Which version of HKSCS is it: HKSCS-2004, HKSCS-2001, or HKSCS-1999? According to http://www.info.gov.hk/digital21/eng/hkscs/download/hkscs-2004-chr-incl.txt, 0xC87A was added in HKSCS-1999, but I cannot find it in ibm-5471_P100-2007.ucm.

> CCSID 1375 is Big5-HKSCS with Microsoft extensions. Each has two
> alternate mapping tables that map the codepoints to Unicode 3.0 and
> Unicode 3.1.

How do you get two mapping tables from one file?

> The last time I checked, Windows has a patch that modifies windows-950 to
> support the HKSCS characters, but it's for Unicode 3.0. This means that
> many characters are mapped to the private use area of Unicode. So this
> behavior is mapped to CCSID 1375 with the Unicode 3.0 behavior. This is
> also similar to some implementations on Solaris and HP-UX.

Does this mean that, to keep compatibility with Windows, CCSID 1375 with the Unicode 3.0 behavior should be used?

Thank you very much!

Regards,
Yandong

Re: What is the difference between two Big5-HKSCS conversion tables?

Hello,

> What is the version of HKSCS? HKSCS-2004 or HKSCS-2001 or HKSCS-1999?

CCSID 5471 contains the HKSCS-2001 character set, and CCSID 1375 contains the HKSCS-2004 character set.

> Does this mean that to keep compatibility with Windows, CCSID 1375 with
> the Unicode 3.0 behavior should be used?

Yes, I think so.

Best regards,
Tetsuji Orita (U+7E54,U+7530,U+54F2,U+6CBB),
Globalization Center of Competency - Yamato, IBM Japan

(In reply to Yandong Yao's message of 2007/05/15 16:23.)

Re: What is the difference between two Big5-HKSCS conversion tables?

Hi Tetsuji,

Tetsuji Orita wrote:
> CCSID 5471 contains the character set for HKSCS-2001 and CCSID 1375
> contains HKSCS-2004 character set.

Then 0xC87A, which is in both HKSCS-1999 and HKSCS-2001, is not in ibm-5471_P100-2007.ucm. Is this a bug?

> > Does this mean that to keep compatibility with Windows, CCSID 1375 with
> > the Unicode 3.0 behavior should be used?
>
> Yes, I think so.

Thanks.

Regards,
Yandong

Re: What is the difference between two Big5-HKSCS conversion tables?

Hello,

0xC87A should be in the Unicode table for CCSID 5471. The CCSID 5471 table that I have here contains 0xC87A. I do not know why the table you are looking at does not contain it.

Best regards,
Tetsuji Orita (U+7E54,U+7530,U+54F2,U+6CBB),
Globalization Center of Competency - Yamato, IBM Japan

(In reply to Yandong Yao's message of 2007/05/15 17:47.)

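Checking whether a particular byte sequence such as 0xC87A is present in a .ucm file can be scripted. The Python sketch below assumes the usual mapping-line shape of a UCM file (`<UXXXX> \xNN\xNN |f`); the function name `find_mappings` is hypothetical, and the sample data uses placeholder values that are NOT the real ibm-5471 mappings.

```python
import re

# One mapping line: one or more <UXXXX> code points (joined by "+"),
# then the \xNN byte sequence, then the "|n" precision flag.
UCM_LINE = re.compile(
    r"^((?:<U[0-9A-Fa-f]{4,6}>\+?)+)\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def find_mappings(ucm_text, codepage_bytes):
    """Return [(code_points, flag), ...] for lines mapping `codepage_bytes`."""
    wanted = codepage_bytes.hex().upper()
    hits = []
    for line in ucm_text.splitlines():
        match = UCM_LINE.match(line.strip())
        if not match:
            continue  # header lines, comments, CHARMAP markers
        found = match.group(2).replace("\\x", "").upper()
        if found == wanted:
            code_points = [
                int(cp, 16) for cp in re.findall(r"<U([0-9A-Fa-f]+)>", match.group(1))
            ]
            hits.append((code_points, int(match.group(3))))
    return hits


# Placeholder data for illustration only; these are not real mappings.
SAMPLE = r"""
<code_set_name> "sample"
CHARMAP
<U0041> \x41 |0
<UE000> \xC8\x7A |0
<U20000> \xC8\x7B |0
END CHARMAP
"""

print(find_mappings(SAMPLE, b"\xC8\x7A"))  # one hit in the placeholder data
print(find_mappings(SAMPLE, b"\xC8\x7C"))  # no hit
```

With a real file, the text would be read from the .ucm file on disk instead of `SAMPLE`. An empty result is a cue to double-check that the right file is being searched before suspecting a bug in the table.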
Re: What is the difference between two Big5-HKSCS conversion table?sorry, i checked the wrong file, 5471 do contain this code point.
thanks. Regards, Yandong Tetsuji Orita 写道: > Hello, > 0xC87A should be in Unicode table for CCSID 5471. CCSID 5471 table that I > have here contains 0xC87A. I do not know why the table you are looking does > not contain it. > > Best regards, > Tetsuji Orita (U+7E54,U+7530,U+54F2,U+6CBB), > Globalization Center of Competency - Yamato, IBM Japan > T/L: 1808-5425, TEL: +81-46-215-5425, FAX:+81-46-273-7497 > e-Mail: orita@... > > > > > Yandong Yao > <Yandong.Yao@Sun. > COM> To > Sent by: ICU support mailing list > icu-support-bounc <icu-support@...> > es@... cc > rge.net > Subject > Re: [icu-support] What is the > 2007/05/15 17:47 difference between two > Big5-HKSCS conversion table? > > Please respond to > Yandong.Yao@... > OM; Please > respond to > ICU support > mailing list > <icu-support@list > s.sourceforge.net > > > > > > > > > Hi Tetsuji, > > Tetsuji Orita 写道: > >> Hello, >> >> What is the version of HKSCS? HKSCS-2004 or HKSCS-2001 or HKSCS-1999? >> >> CCSID 5471 contains the character set for HKSCS-2001 and CCSID 1375 >> contains HKSCS-2004 character set. >> >> > then 0xC87A which is in HKSCS-1999 and HKSCS-2001 is not in > ibm-5471_P100-2007.ucm. > is this a bug? > >> Does this mean that to keep compatibility with Windows, CCSID1375 with >> > the > >> Unicode 3.0 behavior should be used? >> >> Yes, I think so. >> >> > thanks. > > Regards, > Yandong > >> Best regards, >> Tetsuji Orita (U+7E54,U+7530,U+54F2,U+6CBB), >> Globalization Center of Competency - Yamato, IBM Japan >> T/L: 1808-5425, TEL: +81-46-215-5425, FAX:+81-46-273-7497 >> e-Mail: orita@... >> >> >> >> >> > > >> Yandong Yao >> > > >> <Yandong.Yao@Sun. >> > > >> COM> >> > To > >> Sent by: ICU support mailing list >> > > >> icu-support-bounc >> > <icu-support@...> > >> es@... >> > cc > >> rge.net >> > > > Subject > >> Re: [icu-support] What is the >> > > >> 2007/05/15 16:23 difference between two Big5-HKSCS >> > > >> conversion table? >> > > > > >> Please respond to >> > > >> Yandong.Yao@... 
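For anyone who wants to confirm whether a byte sequence such as 0xC87A is present in a given .ucm file, the check can be scripted rather than done by eye. The following is a minimal sketch, assuming the usual ICU .ucm entry layout of `<UXXXX> \xNN\xNN |flag`. The function name `load_ucm_mappings` and the `SAMPLE` excerpt are made up for illustration; the excerpt holds only a few well-known Big5 entries and is not the real ibm-5471 table, so the lookup of 0xC87A in it says nothing about the actual file.

```python
import re

# One CHARMAP entry in ICU's .ucm text format, e.g. "<U4E00> \xA4\x40 |0".
UCM_ENTRY = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-4])"
)

def load_ucm_mappings(lines):
    """Return {byte sequence: [(code point, precision flag), ...]}."""
    table = {}
    for line in lines:
        m = UCM_ENTRY.match(line.strip())
        if m is None:
            continue  # header line, comment, or CHARMAP marker
        code_point = int(m.group(1), 16)
        raw = bytes(int(h, 16) for h in re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2)))
        table.setdefault(raw, []).append((code_point, int(m.group(3))))
    return table

# Hypothetical excerpt for illustration only; NOT the real ibm-5471 table.
SAMPLE = r"""
<code_set_name> "sample-excerpt"
CHARMAP
<U0041> \x41 |0
<U3000> \xA1\x40 |0
<U4E00> \xA4\x40 |0
END CHARMAP
""".splitlines()

table = load_ucm_mappings(SAMPLE)
print(b"\xc8\x7a" in table)   # is 0xC87A mapped in this excerpt?
print(table.get(b"\xa4\x40"))
```

To run it against a real table, pass the lines of the file instead of `SAMPLE`, for example `load_ucm_mappings(open(path, encoding="utf-8"))`, where `path` points at whichever .ucm file you are checking.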
|
|
Re: What is the difference between two Big5-HKSCS conversion table?
> What is the version of HKSCS? HKSCS-2004 or HKSCS-2001 or HKSCS-1999?
>
> CCSID 5471 contains the character set for HKSCS-2001 and CCSID 1375
> contains HKSCS-2004 character set.

I didn't realize that. I had incorrectly assumed that CCSID 5471 was the update of CCSID 1375, because many other updates to a CCSID have 4096 added to the CCSID (e.g. 1255 -> 5351 -> 9447). The CDRA database within IBM was missing this information in the description. This is helpful information.

> Does this mean that to keep compatibility with Windows, CCSID1375 with the
> Unicode 3.0 behavior should be used?
>
> Yes, I think so.

Actually no. I had used http://www.icu-project.org/charts/charset/roundtripIndex.html#windows-950_hkscs-2001 to determine the correct CCSID to use. After closer inspection, CCSID 5471 should be used with the Unicode 3.0 mappings (ibm-5471_P100-2006).

The Microsoft implementation typically maps unused characters to random Unicode characters. When you use the Microsoft API to discover its behavior, you don't get an error for valid but "unmapped" byte sequences. The difference between Microsoft's Big5-HKSCS and ibm-5471_P100-2006 is mainly in how the Unicode PUA is used.

When the original Big5-HKSCS mapping was collected from Windows XP, the patch at http://www.microsoft.com/hk/hkscs/ was used. This uses Big5-HKSCS-2001. The page now states that Big5-HKSCS-2004 is natively supported in Windows Vista. So I'll have to inspect the Windows Vista behavior to determine the correct table to use. It's likely that CCSID 1375 will be used for the Windows-compatible implementation, with a newer Unicode mapping. So it may be an alternate CCSID 1375 table. I don't know yet.

So whatever you see in ICU's trunk is incorrect. This is post-ICU 3.6 work. Don't use it for any decisions on your implementation.
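The "+4096" convention mentioned above is simple arithmetic, and a small sketch makes it easy to see why 5471 looked like an update of 1375. The helper names `base_ccsid` and `ccsid_chain` are made up for illustration, and the rule itself is only the informal pattern described in this thread, not an official CDRA definition.

```python
CCSID_STEP = 4096  # 0x1000, the increment seen between many related CCSIDs

def base_ccsid(ccsid: int) -> int:
    """Strip multiples of 4096 to get the CCSID the number appears derived from."""
    return ccsid % CCSID_STEP

def ccsid_chain(base: int, updates: int) -> list[int]:
    """List a base CCSID followed by the given number of +4096 successors."""
    return [base + i * CCSID_STEP for i in range(updates + 1)]

print(ccsid_chain(1255, 2))  # [1255, 5351, 9447]
print(base_ccsid(5471))      # 1375
```

As the thread shows, the arithmetic alone does not tell you which table is newer: here the lower number, CCSID 1375, carries the HKSCS-2004 character set, while 5471 carries HKSCS-2001.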
|
|
Re: What is the difference between two Big5-HKSCS conversion table?
Okay, I've taken a closer look at the Windows Vista implementation and the fine print on the Microsoft web site. Windows Vista does not support codepage conversion to Big5-HKSCS-2004 or Big5-HKSCS-2001, but it does support the characters from Unicode 4.1, which contains the characters from Big5-HKSCS. Basically, Windows has the fonts and the IME to use the characters in Big5-HKSCS-2004. The add-on from the Microsoft website is for Big5-HKSCS-2001, and the site provides code to convert the PUA characters from Big5-HKSCS-2001 to Unicode 4.1.

So CCSID 1375 with the newer Unicode mappings will be used to denote Big5-HKSCS-2004 in ICU. CCSID 5471 with the older Unicode mappings will be used to denote Big5-HKSCS-2001 and big5-hkscs:unicode3.0 in ICU.

ICU's usage of CCSID 1375 will convert Big5-HKSCS in a way that will be viewable by Windows Vista. This will be the default when you generically request Big5-HKSCS.

ICU's usage of CCSID 5471 will convert Big5-HKSCS in a way that is compatible with the Microsoft Windows add-on, and the results *may not* be 100% viewable by Windows Vista due to the font support.

If you read between the lines on the Microsoft Big5-HKSCS pages, they're saying that you should migrate your Big5-HKSCS data to Unicode 4.1. This is a perfectly reasonable migration strategy :-) You should keep that in mind if you are concerned about compatibility with Windows Vista.

George Rhoten
IBM Globalization Center of Competency/ICU San José, CA, USA
http://www.icu-project.org/
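When planning such a migration, a first step is to find out how much already-converted text still relies on private use code points, since those are the characters that depend on the older Unicode 3.0 style mappings. The following is a minimal sketch of such a scan. The function name `classify_code_points` is made up for illustration, and the sample string is synthetic rather than taken from any real HKSCS mapping.

```python
def classify_code_points(text: str) -> dict[str, int]:
    """Count BMP, private use (PUA), and supplementary characters in text."""
    counts = {"bmp": 0, "pua": 0, "supplementary": 0}
    for ch in text:
        cp = ord(ch)
        in_pua = (
            0xE000 <= cp <= 0xF8FF           # BMP private use area
            or 0xF0000 <= cp <= 0xFFFFD      # plane 15 private use
            or 0x100000 <= cp <= 0x10FFFD    # plane 16 private use
        )
        if in_pua:
            counts["pua"] += 1
        elif cp > 0xFFFF:
            counts["supplementary"] += 1     # e.g. CJK Extension B
        else:
            counts["bmp"] += 1
    return counts

# Synthetic sample: one ASCII letter, one PUA code point, one plane-2 ideograph.
print(classify_code_points("A\ue000\U00020021"))
```

A non-zero "pua" count suggests the text was produced with a PUA-based table and would need remapping before it displays reliably with Unicode 4.1 fonts. A non-zero "supplementary" count means the consumer has to handle characters outside the BMP.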