Problem: Transliterator and Private Use Area
I'm using ICU4J 3.6.1, Java 1.5, on Mac OS X 10.4.9.

I'm creating a Transliterator from rules. I find that I can write rules like " cat > \u0065; " that map a string of characters into a defined single Unicode char (here 'e'), but I cannot map the same string of chars into a char from the Private Use Area, e.g. " cat > \uF500 ; ".

Here's a minimal illustration of the problem:

    import com.ibm.icu.text.Transliterator ;

    public class Foo {

        public static void main(String[] args) {
            // This works: String ruleStr = "cat > \u0065;" ;

            // But the following, transliterating to a char in the
            // Private Use Area, does not work
            String ruleStr = "cat > \uF500;" ;

            String inStr = "tomcat" ;

            Transliterator tr = Transliterator.createFromRules("ID", ruleStr,
                    Transliterator.FORWARD) ;
            String outStr = tr.transliterate(inStr) ;

            for (int i = 0; i < outStr.length(); i++) {
                System.out.printf("0x%04X\n", (int) outStr.charAt(i)) ;
            }
        }
    }

When ruleStr is defined as "cat > \u0065;", the string "tomcat" is transliterated to "tome", and the loop at the end prints out

    0x0074
    0x006F
    0x006D
    0x0065

which is exactly what I expected. But when ruleStr is defined as "cat > \uF500;", I get the following error message from the TransliteratorParser.parseRules() method:

    Exception in thread "main" java.lang.IllegalArgumentException: Variable range character in rule in " \uF500"
        at com.ibm.icu.text.TransliteratorParser.parseRules(TransliteratorParser.java:1099)
        at com.ibm.icu.text.TransliteratorParser.parse(TransliteratorParser.java:855)
        at com.ibm.icu.text.Transliterator.createFromRules(Transliterator.java:1417)
        at Foo.main(Foo.java:14)

Is it really the intent of the parseRules() method to prevent the use of the Private Use Area? Is there a workaround?

Thanks,

Ken

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
icu-support mailing list - icu-support@...
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support
Re: Problem: Transliterator and Private Use Area

That's odd, since I know that the Indic transliterators use PU codes. You can try grabbing some of their lines and seeing how they differ from what you are doing.

Mark

On 5/11/07, Kenneth Reid Beesley <krbeesley@...> wrote:
> RE: Problem: Transliterator and Private Use Area
Re: Problem: Transliterator and Private Use Area

I looked into the ICU4J transliterator code. It looks like the implementation uses the Private Use Area for transliterator variables; as a result, using a code point from that range in a rule ends in the exception. The implementation allows the code range used for variables to be changed by a rule, but I cannot find any documentation about the syntax.

Anyway, the syntax should be:

    use variable range <low> <high>

By default, low = 0xF000 and high = 0xF8FF. In your example, \uF500 is in this default variable range. I reproduced the problem with your code. After I changed the rule string to add "use variable..." as below:

    String ruleStr = "use variable range 0xF000 0xF100; cat > \uF500";

the sample code no longer threw the exception and produced the expected result.

I'm not familiar with this, and it might be an undocumented feature, but it should at least resolve your current problem.

-Yoshito

icu-support-bounces@... wrote on 05/11/2007 04:35:04 PM:

> That's odd, since I know that the Indic transliterators use PU
> codes. You can try grabbing some of their lines and seeing how they
> differ from what you are doing.
>
> Mark
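The workaround above amounts to narrowing the variable range so that the target code point falls outside it. The following is a minimal pure-Java sketch of that reasoning; it makes no ICU4J calls. The class name `VariableRangeRule` and its helpers are hypothetical, the default bounds are the ones reported in this thread for ICU4J 3.6.1, and treating the range as inclusive is an assumption.

```java
// Hypothetical helper, not part of ICU4J: it only builds a rule string and
// checks whether a code point lies in a given variable range.
final class VariableRangeRule {
    // Default variable range as reported in this thread for ICU4J 3.6.1.
    static final int DEFAULT_LOW = 0xF000;
    static final int DEFAULT_HIGH = 0xF8FF;

    /** True if codePoint lies in [low, high]; inclusive bounds are an assumption. */
    static boolean inRange(int codePoint, int low, int high) {
        return codePoint >= low && codePoint <= high;
    }

    /** Prefixes the rules with a "use variable range" rule, in the form quoted above. */
    static String withVariableRange(int low, int high, String rules) {
        return String.format("use variable range 0x%04X 0x%04X; %s", low, high, rules);
    }

    public static void main(String[] args) {
        int target = 0xF500; // the Private Use Area code point from the question

        // With the default range, the target collides with the variable range.
        System.out.println(inRange(target, DEFAULT_LOW, DEFAULT_HIGH)); // true

        // With the narrowed range from the reply, it no longer collides.
        System.out.println(inRange(target, 0xF000, 0xF100)); // false

        // Rule string to hand to the transliterator factory.
        String rules = withVariableRange(0xF000, 0xF100, "cat > \uF500;");
        System.out.println(rules.startsWith("use variable range 0xF000 0xF100; ")); // true
    }
}
```

The resulting string would then be passed to `Transliterator.createFromRules("ID", rules, Transliterator.FORWARD)` exactly as in the original post. The narrowed range still has to be large enough for any variables the rule set itself defines.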
Re: Problem: Transliterator and Private Use Area

Hello Yoshito,

MANY thanks for getting into the code and finding a solution. Much appreciated. This does seem to solve my problem.

Best wishes,

Ken