Encoding issue

by Pills

I have a problem with JDOM in one of my webapps.

It runs on Linux (CentOS) with Tomcat 5.5.27, JDOM 1.1, etc.

My customer sends me a file that is created like this:

- exported to XML, encoded as UTF-8
- converted to Base64
- POSTed to my webapp (the headers are set to the correct encoding)
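The customer-side steps above can be sketched as follows. This is a minimal sketch, not the customer's actual exporter: it assumes Java 8+ for `java.util.Base64` (which did not exist in the Java versions of 2008), and the class `UploadEncoder` and method `encodeForUpload` are hypothetical names. The point it illustrates is that Base64 carries raw bytes, so the text stays UTF-8 only as long as the receiving side also treats the decoded result as bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical client-side helper mirroring the customer's export steps.
// Assumes Java 8+ for java.util.Base64.
final class UploadEncoder {

    // Serialize the XML text to UTF-8 bytes, then Base64-encode those bytes
    // so they survive transport in the POST body as plain ASCII.
    static String encodeForUpload(String xml) {
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<root><bla>\u00fc</bla></root>";
        System.out.println(encodeForUpload(xml));
    }
}
```

For example, `encodeForUpload("\u00fc")` returns `"w7w="`, the Base64 form of the two UTF-8 bytes 0xC3 0xBC.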

I decode it like this:
- get the data
- convert it back from Base64
- parse the data with new SAXBuilder().build(...)
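A sketch of the receiving side under the same Java 8+ assumption. The key point is to hand the parser the decoded bytes (an `InputStream`), not a `String` or `Reader` built from them, so the parser can honour the `encoding="UTF-8"` declaration itself. JDOM's `SAXBuilder` sits on top of a SAX parser; to keep the sketch self-contained it uses the JDK's `javax.xml.parsers.SAXParser` directly instead of JDOM. With JDOM 1.1 the analogous call would be `new SAXBuilder().build(new ByteArrayInputStream(bytes))`; note that `SAXBuilder.build(String)` treats its argument as a system ID (a URI), not as XML content. `UploadDecoder`, `decodePayload` and `childText` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical server-side helper; uses the JDK SAX parser in place of JDOM.
final class UploadDecoder {

    // Reverse the Base64 step to get back the exact bytes the client sent.
    // The MIME decoder also skips line breaks that some encoders insert.
    static byte[] decodePayload(String base64) {
        return Base64.getMimeDecoder().decode(base64);
    }

    // Parse from the raw bytes (never a String or Reader), so the parser itself
    // detects the encoding from the byte order mark or the XML declaration.
    static String childText(byte[] xmlBytes, String childName) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside; // true while inside the requested element

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals(childName)) inside = true;
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals(childName)) inside = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside) text.append(ch, start, length); // may arrive in several chunks
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xmlBytes), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse uploaded XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] sent = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><bla>\u00fc</bla></root>"
                .getBytes(StandardCharsets.UTF_8);
        String payload = Base64.getEncoder().encodeToString(sent);
        String value = childText(decodePayload(payload), "bla");
        System.out.println(value.equals("\u00fc")); // true
    }
}
```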

After that, when I get strings using "mynode.getChildText("bla")", they are
mis-encoded, e.g. "ü" becomes "ä".
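The symptom can be reproduced at the byte level. "ü" (U+00FC) is two bytes in UTF-8, 0xC3 0xBC; if those bytes are decoded as ISO-8859-1 (the charset the reply below suspects), each byte becomes its own character and "ü" turns into "Ã¼". The exact garbage seen in practice depends on which wrong charset is applied and on how it is displayed. A minimal sketch; `MojibakeDemo` and `misread` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Reproduces the symptom: UTF-8 bytes decoded with the wrong charset.
final class MojibakeDemo {

    // U+00FC is encoded in UTF-8 as 0xC3 0xBC. ISO-8859-1 maps every byte to
    // exactly one character, so the result is U+00C3 followed by U+00BC.
    static String misread(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misread("\u00fc");
        System.out.println(garbled.length());               // 2
        System.out.println(garbled.equals("\u00c3\u00bc")); // true
    }
}
```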

I thought JDOM would handle all the necessary conversion itself. I
really don't want to re-convert extracted strings using
Charset.forName().encode() or something similar.

Any idea what I am doing wrong?

Thank you very much ;)
_______________________________________________
To control your jdom-interest membership:
http://www.jdom.org/mailman/options/jdom-interest/youraddr@...

RE: Encoding issue

by Michael Kay

It looks to me as if SAXBuilder().build() doesn't realize that the data is
in UTF-8 and thinks it is in iso-8859-1. So there's something wrong in the
way data is being passed from the Base64 decoding step to the XML parsing
step. Nothing to do with JDOM.
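The diagnosis can be demonstrated by parsing the same UTF-8 bytes twice: once through a `Reader` that has already decoded them as ISO-8859-1 (one plausible way the handoff could go wrong; the poster's actual code is not shown), and once from the raw bytes. Once the parser receives characters instead of bytes, the `encoding="UTF-8"` declaration can no longer correct the wrong decoding. In JDOM 1.1 terms this is the difference between `SAXBuilder.build(Reader)` and `SAXBuilder.build(InputStream)`. The sketch uses the JDK's DOM parser for self-containment; `HandoffDemo`, `viaReader` and `viaBytes` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Compares two ways of handing decoded Base64 bytes to an XML parser.
final class HandoffDemo {

    // Suspected bug: the bytes are turned into characters with some charset
    // before the parser sees them, so the XML declaration cannot fix a wrong choice.
    static String viaReader(byte[] xml, Charset charset) {
        return blaText(new InputSource(
                new InputStreamReader(new ByteArrayInputStream(xml), charset)));
    }

    // Fix: give the parser the bytes and let it detect the encoding itself.
    static String viaBytes(byte[] xml) {
        return blaText(new InputSource(new ByteArrayInputStream(xml)));
    }

    // Parses the document and returns the text of the first <bla> element.
    private static String blaText(InputSource in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);
            return doc.getElementsByTagName("bla").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><bla>\u00fc</bla></root>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(viaReader(xml, StandardCharsets.ISO_8859_1).equals("\u00c3\u00bc")); // true
        System.out.println(viaBytes(xml).equals("\u00fc")); // true
    }
}
```

If a `Reader` is unavoidable, building it with `StandardCharsets.UTF_8` also gives the right text, but passing bytes is more robust: it keeps working if the customer ever switches to another encoding, as long as the XML declaration matches.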

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: jdom-interest-bounces@...
> [mailto:jdom-interest-bounces@...] On Behalf Of Piller Sébastien
> Sent: 24 October 2008 11:07
> To: jdom-interest@...
> Subject: [jdom-interest] Encoding issue