Problem: Transliterator and Private Use Area

Problem: Transliterator and Private Use Area

by Kenneth R Beesley

RE:  Problem:  Transliterator and Private Use Area

I'm using ICU4J 3.6.1, Java 1.5, on Mac OS X 10.4.9

I'm creating a Transliterator from rules. I find that I can write rules like
"cat > \u0065;" that map a string of characters to a single defined Unicode
character (here 'e'), but I cannot map the same string to a character from
the Private Use Area, e.g. "cat > \uF500;".

Here's a minimal illustration of the problem:

import com.ibm.icu.text.Transliterator ;

public class Foo {

    public static void main(String[] args) {
        // This works: String ruleStr = "cat > \u0065;" ;

        // But the following, transliterating to a char in the Private Use Area,
        // does not work
        String ruleStr = "cat > \uF500;" ;

        String inStr = "tomcat" ;

        Transliterator tr = Transliterator.createFromRules("ID", ruleStr,
                Transliterator.FORWARD) ;
        String outStr = tr.transliterate(inStr) ;

        for (int i = 0; i < outStr.length(); i++) {
            System.out.printf("0x%04X\n", (int) outStr.charAt(i)) ;
        }
    }
}
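
One detail explains why the parser sees a Private Use character at all: javac translates backslash-u escapes in Java source before compiling, so ruleStr in the example above holds the single character U+F500, not a six-character escape sequence (the error message apparently re-escapes it for display). A minimal, stdlib-only check of this; the class and field names are illustrative only:

```java
public class EscapeCheck {

    // javac translates the Unicode escape in this literal into the single
    // character U+F500 before the class is compiled.
    static final String TRANSLATED = "cat > \uF500;";

    // Doubling the backslash keeps the escape as six ordinary characters:
    // a backslash, 'u', 'F', '5', '0', '0'.
    static final String LITERAL = "cat > \\uF500;";

    public static void main(String[] args) {
        System.out.println(TRANSLATED.length()); // 8
        System.out.println(LITERAL.length());    // 13
        System.out.printf("0x%04X%n", (int) TRANSLATED.charAt(6)); // 0xF500
    }
}
```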

When ruleStr is defined as "cat > \u0065;", the string "tomcat" is
transliterated to "tome", and the loop at the end prints out

0x0074
0x006F
0x006D
0x0065

which is exactly what I expected. But when ruleStr is defined as
"cat > \uF500;", I get the following error message from the
TransliteratorParser.parseRules() method:

Exception in thread "main" java.lang.IllegalArgumentException:
Variable range character in rule in " \uF500"
        at com.ibm.icu.text.TransliteratorParser.parseRules(TransliteratorParser.java:1099)
        at com.ibm.icu.text.TransliteratorParser.parse(TransliteratorParser.java:855)
        at com.ibm.icu.text.Transliterator.createFromRules(Transliterator.java:1417)
        at Foo.main(Foo.java:14)

Is it really the intent of the parseRules() method to prevent the use
of the Private Use Area? Is there a workaround?

Thanks,

Ken

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
icu-support mailing list - icu-support@...
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support

Re: Problem: Transliterator and Private Use Area

by Mark Davis-2

That's odd, since I know that the Indic transliterators use Private Use code points. You can try grabbing some of their rules and seeing how they differ from what you are doing.

Mark

On 5/11/07, Kenneth Reid Beesley <krbeesley@...> wrote:
[original message snipped]

Re: Problem: Transliterator and Private Use Area

by yoshito_umaoka

I looked into the ICU4J transliterator code. It looks like the
implementation uses the Private Use Area for transliterator variables; as
a result, using a code point in that range causes the exception. The
implementation allows the code range for the variables to be changed by a
rule, but I cannot find any documentation of the syntax.

Anyway, the syntax should be:

"use variable range <low> <high>"

By default, low = 0xF000 and high = 0xF8FF. In your example, \uF500 is in
this default variable range. I reproduced the problem with your code, then
changed the rule string to add "use variable..." as below:

String ruleStr = "use variable range 0xF000 0xF100; cat > \uF500;";

After that, the sample code no longer threw the exception and produced the
expected result.
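
The diagnosis above can be made concrete with a small check that scans rule text for characters falling inside the range the parser reserves for variables. This is a stdlib-only sketch, not ICU code: the class and method names are hypothetical, both bounds are assumed to be inclusive, and it only mirrors the one check reported in this thread, using the default 0xF000 to 0xF8FF range quoted above.

```java
import java.util.ArrayList;
import java.util.List;

public class VariableRangeCheck {

    // Default variable range reported in this thread; both bounds are
    // assumed to be inclusive.
    static final int DEFAULT_LOW = 0xF000;
    static final int DEFAULT_HIGH = 0xF8FF;

    // Returns every code point in rules that lies inside [low, high].
    static List<Integer> conflicts(String rules, int low, int high) {
        List<Integer> found = new ArrayList<Integer>();
        for (int i = 0; i < rules.length(); ) {
            int cp = rules.codePointAt(i);
            if (cp >= low && cp <= high) {
                found.add(cp);
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        // The same rule as in the original post, built without a source escape.
        String rule = "cat > " + (char) 0xF500 + ";";

        // Under the default range, U+F500 collides with the variable range.
        for (int cp : conflicts(rule, DEFAULT_LOW, DEFAULT_HIGH)) {
            System.out.printf("conflict: U+%04X%n", cp); // conflict: U+F500
        }

        // With "use variable range 0xF000 0xF100;" prepended, nothing collides.
        System.out.println(conflicts(rule, 0xF000, 0xF100).isEmpty()); // true
    }
}
```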

I'm not familiar with this, and it might be an undocumented feature, but
it should at least resolve your current problem.


-Yoshito
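
Going one step further, the range passed to "use variable range" could be picked automatically so that it avoids every Private Use character a rule set actually uses. The helper below is hypothetical and not part of ICU4J: it searches the BMP Private Use Area (U+E000 to U+F8FF) for the lowest run of unused code points of a requested size and formats the pragma in the same 0x-hex style used above. How many variable slots a given rule set needs is an assumption the caller must supply; the 257-slot example simply matches the size of the 0xF000 to 0xF100 range that worked in this thread.

```java
import java.util.BitSet;

public class VariableRangeChooser {

    // Bounds of the Private Use Area in the Basic Multilingual Plane (inclusive).
    static final int PUA_START = 0xE000;
    static final int PUA_END = 0xF8FF;

    // Returns a "use variable range" pragma for the lowest run of `size`
    // consecutive PUA code points that do not occur in rules, or null if
    // no such run exists.
    static String choosePragma(String rules, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        BitSet used = new BitSet();
        for (int i = 0; i < rules.length(); ) {
            int cp = rules.codePointAt(i);
            if (cp >= PUA_START && cp <= PUA_END) {
                used.set(cp);
            }
            i += Character.charCount(cp);
        }
        int runStart = PUA_START;
        for (int cp = PUA_START; cp <= PUA_END; cp++) {
            if (used.get(cp)) {
                runStart = cp + 1; // a used code point breaks the run
            } else if (cp - runStart + 1 == size) {
                return String.format("use variable range 0x%04X 0x%04X;", runStart, cp);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String rules = "cat > " + (char) 0xF500 + ";";
        String pragma = choosePragma(rules, 0x101); // 257 slots
        System.out.println(pragma); // use variable range 0xE000 0xE100;
        System.out.println(pragma + " " + rules);
    }
}
```

Prepending the returned pragma to the rules, as main does, is the same move made by hand above; this check cannot tell whether a given range is large enough for a particular rule set.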


icu-support-bounces@... wrote on 05/11/2007 04:35:04 PM:
[quoted messages snipped]

Re: Problem: Transliterator and Private Use Area

by Kenneth R Beesley

Hello Yoshito,

MANY thanks for getting into the code and finding a solution.
Much appreciated.  This does seem to solve my problem.

Best wishes,

Ken


On 5/11/07, yoshito_umaoka@... <yoshito_umaoka@...> wrote:
[quoted messages snipped]