Changing non-standard ASCII character casing (UTF-8)

Changing non-standard ASCII character casing (UTF-8)

by Matthew Ueckerman

I'm looking for information on changing the case of non-ASCII (UTF-8) characters in JRuby.

In MRI, it appears the unicode gem is the recommended approach for this:
http://ideaharbor.org/notes/technical/working-with-unicode-in-ruby/

Is there a recommended approach in JRuby?

Matthew

Re: Changing non-standard ASCII character casing (UTF-8)

by Matthew Ueckerman

Fortunately, I've discovered a way to do this. Unfortunately, it requires explicit use of Java.

>> s = "Café"
=> "Café"
>> s.upcase
=> "CAFé"
>> s.to_java_string.to_upper_case
=> "CAFé"
>> java.lang.String.new(s).to_upper_case
=> "CAFÉ"

It's intriguing that converting a Ruby String with to_java_string does not behave the same way as constructing a java.lang.String from it.
Furthermore:

>> s.to_java_string.to_s
=> "Café"

This suggests that the encoding of the converted String is incorrect.
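
The garbled output above is consistent with the string's UTF-8 bytes being decoded one byte per character, as in ISO-8859-1. The following is only a sketch of that hypothesis in plain Ruby, not a look at what to_java_string actually does internally. It assumes Ruby 2.4 or later, where String#upcase handles non-ASCII characters; the variable names are made up for illustration.

```ruby
# Assumption: the source file is UTF-8, so this literal is UTF-8 encoded.
s = "Café"

# In UTF-8, "é" is stored as two bytes: 0xC3 0xA9 (195, 169).
utf8_bytes = s.bytes

# Reinterpret the same bytes as ISO-8859-1 (one byte per character),
# then transcode to UTF-8 so the result can be displayed.
misread = s.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts utf8_bytes.inspect
puts misread          # shows the same garbling as s.to_java_string.to_s
puts misread.upcase   # "Ã" is already upper case and "©" has no case
```

If this hypothesis holds, it would also explain why upcasing the converted string leaves the last two characters unchanged: there is no lower-case letter left to convert.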

Regards,

Matthew Ueckerman

Re: Changing non-standard ASCII character casing (UTF-8)

by Charles Oliver Nutter-4

On Fri, Jun 26, 2009 at 2:50 AM, Matthew Ueckerman <matthew@...> wrote:
> >> s.to_java_string.to_upper_case
> => "CAFé"
> >> java.lang.String.new(s).to_upper_case
> => "CAFÉ"

This looks like a bug in to_java_string. It's probably not assuming
UTF-8 when pulling in the string. Can you toss this into a bug?

It might also be fun to look at implementing what the "unicode" gem
does for JRuby, since really all the unicode facilities should "just
be there".

- Charlie
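
Until something like that exists, the workaround from earlier in the thread could be wrapped in a small helper. This is a minimal sketch, not a JRuby or unicode gem API: the name upcase_utf8 is hypothetical, and decoding the bytes explicitly as UTF-8 (via to_java_bytes) is an assumption on top of the java.lang.String.new(s) call shown above. Outside JRuby it simply falls back to String#upcase, which only handles non-ASCII characters on Ruby 2.4 or later.

```ruby
# Hypothetical helper (not part of JRuby or the unicode gem).
def upcase_utf8(str)
  if RUBY_PLATFORM == "java"
    require "java"
    # Assumption: build the Java String from the raw bytes with an
    # explicit charset instead of relying on to_java_string.
    java.lang.String.new(str.to_java_bytes, "UTF-8").to_upper_case.to_s
  else
    # Fallback outside JRuby; Unicode-aware only on Ruby 2.4+.
    str.upcase
  end
end

puts upcase_utf8("Café")
```

Keeping the Java call in one method means the rest of the code does not need to know which runtime it is on.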
