MarshalProcessor utf-8 bug?

by Magnus Heino

An XML message encoded in UTF-8 that contains åäö is garbled when it is routed
through this route: åäö no longer comes out as åäö. If I remove the marshal
call, the problem goes away.

this.from("direct:request")
    .marshal(soapDataFormat)
    .convertBodyTo(String.class)
    .to(requestUri);

To make sure it wasn't my data format that garbled the text, I wrote this
processor:

final Processor wrapSoap = new Processor() {

    public void process(Exchange exchange) throws Exception {
        // Marshal the String body into an in-memory buffer
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        soapDataFormat.marshal(exchange,
                new StringSource((String) exchange.getIn().getBody()), outputStream);
        // Decode the marshalled bytes explicitly as UTF-8 and set a String body
        exchange.getOut(true).setBody(outputStream.toString("UTF-8"));
    }
};

and used it in this route:

this.from("direct:request")
    .process(this.wrapSoap)
    .convertBodyTo(String.class)
    .to(requestUri);

And now things are working...

Looking at org.apache.camel.processor.MarshalProcessor.java, which is used by
the marshal() call, is this really OK to do with UTF-8 data?

        dataFormat.marshal(exchange, body, buffer);
        byte[] data = buffer.toByteArray();
        out.setBody(data);
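A minimal plain-Java sketch of what is at stake here (not Camel code; marshalUtf8 is a hypothetical stand-in for the data format). Storing the marshalled bytes as a byte[] loses nothing by itself; what matters is which charset is used when the body is later turned back into a String, for example by convertBodyTo(String.class). The thread does not say which charset Camel's converter used, so ISO-8859-1 appears below only as an example of a non-UTF-8 charset:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDecode {
    // Hypothetical stand-in for a data format's marshal(): writes the text as UTF-8 bytes.
    static ByteArrayOutputStream marshalUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Same result as out.toString(charset.name()), without the checked exception.
    static String decode(ByteArrayOutputStream out, Charset charset) {
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // The three Swedish letters from the report, written as Unicode escapes.
        ByteArrayOutputStream out = marshalUtf8("\u00e5\u00e4\u00f6");
        System.out.println(out.size()); // 6: two UTF-8 bytes per letter
        System.out.println(decode(out, StandardCharsets.UTF_8).equals("\u00e5\u00e4\u00f6")); // true
        System.out.println(decode(out, StandardCharsets.ISO_8859_1).equals("\u00e5\u00e4\u00f6")); // false
    }
}
```

Decoding with UTF-8, which is what the wrapSoap processor does with toString("UTF-8"), restores åäö, while ISO-8859-1 turns each two-byte letter into two characters (Ã¥Ã¤Ã¶).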

--

/Magnus Heino

Re: MarshalProcessor utf-8 bug?

by davsclaus

Hi Magnus

I have recently been working on some encoding patches for camel-mina. During this work I came to think that the Camel codebase currently does not consider the encoding when it converts to String.

I think Camel needs to be improved in its core so that its type conversion framework supports encoding parameters. Currently the type converters do not support meta parameters such as the encoding.
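To make the idea of a meta parameter concrete, here is a hypothetical sketch in plain Java: a converter that reads the charset from a metadata map (standing in for exchange headers or properties) instead of relying on the JVM's default charset. The class, the method names and the "charset" key are illustrative assumptions, not Camel's type-converter API:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: this is not Camel's type-converter API.
final class CharsetAwareConverter {
    // Hypothetical metadata key under which a route could record the body's charset.
    static final String CHARSET_KEY = "charset";

    static Charset charsetOf(Map<String, Object> metadata) {
        Object name = metadata.get(CHARSET_KEY);
        // Default to UTF-8 rather than the JVM's platform charset when nothing is set.
        return name == null ? StandardCharsets.UTF_8 : Charset.forName(name.toString());
    }

    static String bytesToString(byte[] data, Map<String, Object> metadata) {
        return new String(data, charsetOf(metadata));
    }

    static byte[] stringToBytes(String text, Map<String, Object> metadata) {
        return text.getBytes(charsetOf(metadata));
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put(CHARSET_KEY, "ISO-8859-1");
        byte[] latin1 = stringToBytes("\u00e5\u00e4\u00f6", metadata);
        System.out.println(latin1.length); // 3: one byte per letter in ISO-8859-1
        System.out.println(bytesToString(latin1, metadata).equals("\u00e5\u00e4\u00f6")); // true
    }
}
```

The design point is simply that both directions of the conversion look up the same charset, so a String to byte[] to String round trip is lossless whatever encoding the data actually uses.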

In camel-mina I managed to get around this, because I could handle the type conversion inside camel-mina and use the provided encoding parameter. However, the mock endpoint used for testing could not do this, so a few tests had to be written without mocks.

James, or any of the core committers, what is your view of this? Is it somehow doable to improve the type conversion framework to support meta parameters such as the encoding?

It is quite common in integrations to pass data around as byte[] and String objects, so the encoding is important.

/Claus

Re: MarshalProcessor utf-8 bug?

by Magnus Heino

>
> James, or any of the core committers, what is your view of this? Is it
> somehow doable to improve the type conversion framework to support meta
> parameters such as the encoding?
>
> It is quite common in integrations to use byte[] and String objects when
> passing data around and thus the encoding is important.
>

Well, the current lack of encoding support makes it hard to trust what Camel
does with your data. The type conversion framework is great, but since it
does not respect the encoding of the data, in practice it is like supporting
ASCII only.
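The "ASCII only" effect can be reproduced in plain Java, assuming for illustration that the bytes are UTF-8 and the decoding side uses a single-byte charset such as ISO-8859-1. ASCII characters have the same byte values in both charsets, so only non-ASCII text is damaged. AsciiOnlyDemo is a hypothetical name:

```java
import java.nio.charset.StandardCharsets;

// Shows why charset-unaware conversion appears to work for ASCII text:
// UTF-8 and ISO-8859-1 encode ASCII identically, so only non-ASCII text breaks.
final class AsciiOnlyDemo {
    // Encode as UTF-8, then decode with a mismatched charset (ISO-8859-1).
    static String mismatchedRoundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(mismatchedRoundTrip("hello").equals("hello")); // true
        // Swedish "halsning" with a-umlaut, written as a Unicode escape.
        String swedish = "h\u00e4lsning";
        System.out.println(mismatchedRoundTrip(swedish).equals(swedish)); // false
    }
}
```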

--

/Magnus Heino

Re: MarshalProcessor utf-8 bug?

by SoaMattH

I have a problem where my incoming XML messages contain ° symbols, which mark
degrees in latitude and longitude values. I am using Camel 2.0.0.

I have specified my data format as follows:

<camel:dataFormats>
    <camel:jaxb encoding="uft-8"
                id="incidentJaxb"
                prettyPrint="true"
                contextPath="au.bla.bla.bla.bla.bla.bla.incident" />
</camel:dataFormats>
and call it in my route:
<camel:unmarshal ref="incidentJaxb" />
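One detail worth double-checking in the configuration above: the encoding attribute reads "uft-8" rather than "utf-8". The thread does not say how camel-jaxb in 2.0.0 handles an unknown name, but a quick plain-Java check (CharsetNameCheck is a hypothetical helper) shows that the JVM does not recognise it:

```java
import java.nio.charset.Charset;

final class CharsetNameCheck {
    // Returns true if this JVM recognises the given charset name or alias.
    static boolean isKnown(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException (a subclass) and a null name.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isKnown("utf-8")); // true: charset names are case-insensitive
        System.out.println(isKnown("uft-8")); // false: transposed letters
    }
}
```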

Thousands of messages are processed without failure, except the ones
containing the invalid ° character, which end up in my dead letter queue.
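A possible explanation, offered as an assumption since the thread does not establish it: if the messages are actually sent as ISO-8859-1 (or Windows-1252) while the parser expects UTF-8, then ° is the single byte 0xB0, which is not a valid UTF-8 sequence, whereas in real UTF-8 it is the two bytes 0xC2 0xB0. A strict decoder, sketched below with a hypothetical StrictUtf8Check class, rejects the first and accepts the second:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8Check {
    // Strictly decodes the bytes as UTF-8; returns false instead of substituting characters.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String lat = "35\u00b0S"; // "35 degrees S", degree sign as a Unicode escape
        System.out.println(isValidUtf8(lat.getBytes(StandardCharsets.UTF_8)));      // true: C2 B0
        System.out.println(isValidUtf8(lat.getBytes(StandardCharsets.ISO_8859_1))); // false: lone B0
    }
}
```

If the failing messages turn out to be single-byte encoded, the fix would be on the encoding declared for or by the input rather than in the JAXB mapping.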

I suppose my question is: is encoding still a problem in Camel, as discussed
in this thread from earlier in the year?

Thanks Matt

/* ----------------------
** Matt Hannay
** Unix Java C
** Software Engineer
** ------------------- */

Re: MarshalProcessor utf-8 bug?

by Claus Ibsen-2

Hi

Can you create a JIRA ticket and attach a small example with one of
your XML files that contains this invalid character?
Then we can use it as the basis for a fix and a unit test.

On Wed, Sep 23, 2009 at 1:02 AM, SoaMattH <matthew@...> wrote:

> I suppose my question is, is encoding still a problem in camel as per this
> thread from earlier in the year?
>
> Thanks Matt
> --
> View this message in context: http://www.nabble.com/MarshalProcessor-utf-8-bug--tp16037986p25530930.html
> Sent from the Camel - Users (activemq) mailing list archive at Nabble.com.
>
>



--
Claus Ibsen
Apache Camel Committer

Open Source Integration: http://fusesource.com
Blog: http://davsclaus.blogspot.com/
Twitter: http://twitter.com/davsclaus