Character / char oddness

by Sven Haiges-3

Hi all,

I have to convert a small piece of Java to Groovy so I can include it in a command-line Groovy script. The part I am working on converts a String into XML: it escapes the five predefined XML entities, leaves the ASCII range from 32 to 126 untouched, and encodes everything else as numeric character references such as &#1234;.

You can paste the code below right into groovyConsole to give it a try.

The assertion fails as soon as I try to encode ¼, which should become &#188; according to the old Java code. It is definitely not in the ASCII range 32 to 126, but I can still see that the println 'in ascii' is called, which means the code block

            if (!isXmlEntity && ch >= 32 && ch <= 126)
            {
                output.append(ch)
                println "in ascii ${ch}"
                continue
            }

is being executed.

The question: why? This currently breaks my conversion... am I hitting a Groovy gotcha?

Cheers
Sven



def test = [
    'this is a test' : 'this is a test',
    '<>\'"&' : '&lt;&gt;&apos;&quot;&amp;',
    '¼' : '&#188;',
    '©¼ÇÈÉÊËÐÑßàäç™' : '&#169;&#188;&#199;&#200;&#201;&#202;&#203;&#208;&#209;&#223;&#224;&#228;&#231;&#8482;' //output from original EntityCodec.XML
]

test.each { input, expected ->
    assert (expected == XMLCodec.encode(input))
}

class XMLCodec
{
    static encode = { original ->
   
        if (original == null)
            return null
           
        char[] originalChars = original.toCharArray()
        StringBuffer output = new StringBuffer()
       
        for (char ch: originalChars)
        {
            Character character = new Character(ch);
            def isXmlEntity = false
           
            if ( ch == '&' || ch == '"' || ch == '\'' || ch == '<' || ch == '>')
                isXmlEntity = true         
       
            if (!isXmlEntity && ch >= 32 && ch <= 126)
            {
                output.append(ch)
                println "in ascii ${ch}"
                continue
            }
       
            if (isXmlEntity)
                output.append('&')               
               
            switch(ch)
            {
                case '&': output.append('amp');break
                case '"': output.append('quot');break
                case '\'': output.append('apos');break
                case '<': output.append('lt');break
                case '>': output.append('gt');break 
                default:
                     println "in default"
                     output.append("&#")
                     output.append((int)ch)
                     output.append(';')                                                      
            }
           

            if (isXmlEntity)
                output.append(';')
          
        }

        def result = output.toString();
        println result
        return result      
           
  
    }

}
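
For comparison, here is a more compact, table-driven sketch of the same escaping rules. It is not the original EntityCodec code: the method name encodeXml and the map named are hypothetical, and the non-ASCII test input is written as a Unicode escape so the snippet does not depend on the encoding of the source file.

```groovy
// Hypothetical helper (not from the original post): same escaping rules, table-driven.
String encodeXml(String original) {
    if (original == null) return null

    // The five predefined XML entities, keyed by the character they replace.
    Map<String, String> named = ['&': 'amp', '"': 'quot', "'": 'apos', '<': 'lt', '>': 'gt']
    StringBuilder output = new StringBuilder()

    for (char ch : original.toCharArray()) {
        String key = String.valueOf(ch)
        int code = (int) ch
        if (named.containsKey(key)) {
            // Named entity, e.g. &amp;
            output.append('&').append(named[key]).append(';')
        } else if (code >= 32 && code <= 126) {
            // Printable ASCII is copied through unchanged.
            output.append(ch)
        } else {
            // Everything else becomes a decimal numeric character reference.
            output.append('&#').append(code).append(';')
        }
    }
    return output.toString()
}

assert encodeXml('this is a test') == 'this is a test'
assert encodeXml('<>\'"&') == '&lt;&gt;&apos;&quot;&amp;'
assert encodeXml('\u00BC') == '&#188;'
```

Like the code above, this works on individual char values, so it assumes the input contains no characters outside the Basic Multilingual Plane.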

--
Sven Haiges
sven.haiges@...

Yahoo Messenger / Skype: hansamann
Personal Homepage, Wiki & Blog: http://www.svenhaiges.de

Subscribe to the Grails Podcast:
http://feeds.grailspodcast.com/grailspodcast
http://www.grailspodcast.com

Re: Character / char oddness

by jstell

Hi Sven

I cut-and-pasted your script into a new file (ANSI encoded), ran it under Groovy 1.5.7 / JDK 1.6.0_11, and it seemed to work fine:


in ascii t
in ascii h
in ascii i
in ascii s
in ascii
in ascii i
in ascii s
in ascii
in ascii a
in ascii
in ascii t
in ascii e
in ascii s
in ascii t
this is a test
&lt;&gt;&apos;&quot;&amp;
in default
&#188;
in default
in default
in default
in default
in default
in default
in default
in default
in default
in default
in default
in default
in default
in default
&#169;&#188;&#199;&#200;&#201;&#202;&#203;&#208;&#209;&#223;&#224;&#228;&#231;&#8482;

Could it be something about your source file encoding?

Jason

Re: Character / char oddness

by Sven Haiges-3

Hi Jason,

I found the issue now: the default charset/encoding on Mac OS X is MacRoman. Although I saved the .groovy file as UTF-8, something seems to go wrong when it is read from the file system.
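
A minimal sketch of the effect, assuming the file's UTF-8 bytes end up being decoded with a different single-byte charset. ISO-8859-1 is used here only as a stand-in, because every JVM is required to support it; MacRoman would map the same bytes to different characters, but the result is the same in kind: the one-character literal turns into two characters, so the encoder can no longer produce &#188;.

```groovy
// '\u00BC' is the one-quarter sign; the escape keeps this snippet itself pure ASCII.
String original = '\u00BC'

// In UTF-8 this single character is stored as two bytes: 0xC2 0xBC.
byte[] utf8Bytes = original.getBytes('UTF-8')

// Decoding those bytes with a single-byte charset yields one character per byte.
// ISO-8859-1 is an assumption standing in for the platform default.
String misread = new String(utf8Bytes, 'ISO-8859-1')

assert original.length() == 1
assert utf8Bytes.length == 2
assert misread.length() == 2
assert misread != original
assert ((int) misread.charAt(0)) == 0xC2
assert ((int) misread.charAt(1)) == 0xBC
```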

Calling groovy --encoding utf-8 script.groovy made it work. It's an annoying Mac OS X quirk...
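
Another way to sidestep the problem, sketched here as a suggestion rather than something tried in this thread, is to keep the source file pure ASCII by writing non-ASCII characters as Unicode escapes. The literal then means the same thing whatever charset is used to read the file. The variable names below are hypothetical; printing the default charset is only there to show what the JVM would otherwise assume.

```groovy
import java.nio.charset.Charset

// Shows which charset this JVM treats as the platform default.
println "default charset: ${Charset.defaultCharset()}"

// Escaped literals do not depend on how the source file is decoded.
String quarter = '\u00BC'     // one-quarter sign, code point 188
String trademark = '\u2122'   // trade mark sign, code point 8482

assert quarter.length() == 1
assert ((int) quarter.charAt(0)) == 188
assert ((int) trademark.charAt(0)) == 8482
```

The test map from the first message could be written the same way, for example with '\u00BC' as the key for '&#188;'.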

Thanks for looking into this!

Cheers
Sven

In reply to Jason Stell <jstell@...>, Wed, Jul 1, 2009 at 6:31 PM.

