Proper parsing of InputStreams with encoding attribute in XML prologue.

by Dawid Weiss-2

Hi there,

We are using SimpleXML in Carrot2 (Carrot2.org); it's great stuff, thanks.

We would like to point out one potential issue that could be easily
fixed: currently, the Persister class forces UTF-8 decoding on byte
streams, which may not be correct (for example, when the XML is supplied
by users rather than generated by SimpleXML). The relevant section of
Persister could easily be fixed to rely on the XML parser's encoding
detection and decoding. So:

    public <T> T read(Class<? extends T> type, InputStream source) throws Exception {
-      return (T)read(type, source, "utf-8");
+      return (T)read(type, NodeBuilder.read(source));
    }

and then in NodeBuilder add a method that relies on the XML parser's decoding:

+   public static InputNode read(InputStream source) throws Exception {
+      return read(factory.createXMLEventReader(source));
+   }
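A minimal sketch of why this matters, assuming only the JDK's built-in StAX parser (javax.xml.stream) and not SimpleXML itself: the same ISO-8859-1 bytes are parsed once through an InputStreamReader hard-wired to UTF-8 (roughly what passing "utf-8" to the charset overload does) and once as a raw InputStream (what the proposed NodeBuilder.read(InputStream) hands to the parser). The class and method names (EncodingDemo, readForcedUtf8, readDetected, newFactory) are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

class EncodingDemo {

    // Collects all character data from the document into one string.
    static String text(XMLEventReader reader) throws Exception {
        StringBuilder out = new StringBuilder();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isCharacters()) {
                out.append(event.asCharacters().getData());
            }
        }
        reader.close();
        return out.toString();
    }

    // Approximation of the current behaviour: the bytes are decoded as UTF-8
    // before the parser sees them, so the prologue's encoding is ignored.
    static String readForcedUtf8(XMLInputFactory factory, InputStream in) throws Exception {
        return text(factory.createXMLEventReader(new InputStreamReader(in, StandardCharsets.UTF_8)));
    }

    // Proposed behaviour: raw bytes go to the parser, which chooses the
    // encoding from the byte-order mark and/or the XML declaration.
    static String readDetected(XMLInputFactory factory, InputStream in) throws Exception {
        return text(factory.createXMLEventReader(in));
    }

    static XMLInputFactory newFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text chunks so each element's text arrives in one piece.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        return factory;
    }

    public static void main(String[] args) throws Exception {
        // The e-acute in this document is the single ISO-8859-1 byte 0xE9.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        XMLInputFactory factory = newFactory();
        String forced = readForcedUtf8(factory, new ByteArrayInputStream(xml));
        String detected = readDetected(factory, new ByteArrayInputStream(xml));
        System.out.println("forced UTF-8 matches: " + forced.equals("caf\u00e9"));
        System.out.println("parser-detected matches: " + detected.equals("caf\u00e9"));
    }
}
```

On the forced path, 0xE9 is not valid UTF-8 on its own, so InputStreamReader substitutes a replacement character and the text no longer matches; on the raw-stream path the parser reads encoding="ISO-8859-1" from the prologue and decodes the text correctly.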

The entire patch against 2.1.4 is attached. We would appreciate it if
this could be integrated into the next version (or at least if
NodeBuilder's read(XMLEventReader) could be made public to allow
alternative parsing strategies):

private static InputNode read(XMLEventReader source) throws Exception {

Dawid

[byte-streams.patch]

diff --git a/src/org/simpleframework/xml/core/Persister.java b/src/org/simpleframework/xml/core/Persister.java
index 43e2c91..eac218d 100644
--- a/src/org/simpleframework/xml/core/Persister.java
+++ b/src/org/simpleframework/xml/core/Persister.java
@@ -451,7 +451,7 @@ public class Persister implements Serializer {
     * @throws Exception if the object cannot be fully deserialized
     */
    public <T> T read(Class<? extends T> type, InputStream source) throws Exception {
-      return (T)read(type, source, "utf-8");          
+      return (T)read(type, NodeBuilder.read(source));          
    }
   
    /**
@@ -464,6 +464,8 @@ public class Persister implements Serializer {
     * @param type this is the class type to be deserialized from XML
     * @param source this provides the source of the XML document
     * @param charset this is the character set to read the XML with
+    *        (if the encoding is unknown, you should rely on the XML parser
+    *        instead -- see {@link #read(Class, InputStream)}).
     *
     * @return the object deserialized from the XML document
     *
diff --git a/src/org/simpleframework/xml/stream/NodeBuilder.java b/src/org/simpleframework/xml/stream/NodeBuilder.java
index e855002..238dcad 100644
--- a/src/org/simpleframework/xml/stream/NodeBuilder.java
+++ b/src/org/simpleframework/xml/stream/NodeBuilder.java
@@ -22,8 +22,7 @@ package org.simpleframework.xml.stream;
 
 import javax.xml.stream.XMLInputFactory;
 import javax.xml.stream.XMLEventReader;
-import java.io.Reader;
-import java.io.Writer;
+import java.io.*;
 
 /**
  * The <code>NodeBuilder</code> object is used to create either an
@@ -69,6 +68,19 @@ public final class NodeBuilder {
     * @param source this contains the contents of the XML source
     *
     * @throws Exception thrown if there is an I/O exception
+    */  
+   public static InputNode read(InputStream source) throws Exception {
+      return read(factory.createXMLEventReader(source));  
+   }
+
+   /**
+    * This is used to create an <code>InputNode</code> that can be
+    * used to read XML from the specified reader. The reader will
+    * be positioned at the root element in the XML document.
+    *
+    * @param source this contains the contents of the XML source
+    *
+    * @throws Exception thrown if there is an I/O exception
     */    
    private static InputNode read(XMLEventReader source) throws Exception {
       return new NodeReader(source).readRoot();          


------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Simple-support mailing list
Simple-support@...
https://lists.sourceforge.net/lists/listinfo/simple-support

Re: Proper parsing of InputStreams with encoding attribute in XML prologue.

by niall.gallagher

Sounds good, I'll add this in.


Niall Gallagher
RBS Global Banking & Markets
Office: +44 7879498724  

-----Original Message-----
From: Dawid Weiss [mailto:dawid.weiss@...]
Sent: 29 October 2009 18:36
To: simple-support@...
Subject: [Simple-support] Proper parsing of InputStreams with encoding attribute in XML prologue.

***********************************************************************************
The Royal Bank of Scotland plc. Registered in Scotland No 90312. Registered Office: 36 St Andrew Square, Edinburgh EH2 2YB.
Authorised and regulated by the Financial Services Authority.
 
This e-mail message is confidential and for use by the
addressee only. If the message is received by anyone other
than the addressee, please return the message to the sender
by replying to it and then delete the message from your
computer. Internet e-mails are not necessarily secure. The
Royal Bank of Scotland plc does not accept responsibility for
changes made to this message after it was sent.

Whilst all reasonable care has been taken to avoid the
transmission of viruses, it is the responsibility of the recipient to
ensure that the onward transmission, opening or use of this
message and any attachments will not adversely affect its
systems or data. No responsibility is accepted by The
Royal Bank of Scotland plc in this regard and the recipient should carry
out such virus and other checks as it considers appropriate.

Visit our website at www.rbs.com

***********************************************************************************

Re: Proper parsing of InputStreams with encoding attribute in XML prologue.

by Dawid Weiss-2

Thanks, Niall.

Dawid

On Fri, Oct 30, 2009 at 10:04 AM,  <niall.gallagher@...> wrote:

> Sounds good, I'll add this in.

Re: Proper parsing of InputStreams with encoding attribute in XML prologue.

by Dawid Weiss-2

Hi Niall,

I see a new version of Simple is out and, sadly, it does not include
the patch for the forced UTF-8 decoding. Is there a schedule for when
this could be integrated? Our code depends on this change, and we'd hate
to fork and maintain SimpleXML internally just for two lines of code...

Dawid

Re: Proper parsing of InputStreams with encoding attribute in XML prologue.

by niall.gallagher

Hi,

I did not change the Persister's InputStream method this time; however, I did add NodeBuilder.read(InputStream), which returns an InputNode for the stream. I need to test a bit more before switching the Persister over to it. In the meantime, you should be able to use NodeBuilder.read(InputStream) to create an InputNode and pass it to the Persister.read(Class, InputNode) method. I'll try to get it into the next release.
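For callers stuck on a release without NodeBuilder.read(InputStream), one possible alternative is to ask StAX which encoding the document declares and then pass that name to the existing charset overload, Persister.read(Class, InputStream, String) (the overload the old code calls with "utf-8"). A minimal sketch under that assumption, using only the JDK; PrologueEncoding and detect are hypothetical helpers, not part of SimpleXML, and the whole document is buffered so the same bytes can be read twice.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

class PrologueEncoding {

    // Hypothetical helper: reports the encoding the StAX parser would use
    // for these bytes, falling back to UTF-8 if it cannot tell.
    static String detect(byte[] xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xml));
        try {
            // The reader starts at START_DOCUMENT, where the XML declaration
            // has already been read.
            String declared = reader.getCharacterEncodingScheme(); // encoding="..." or null
            if (declared != null) {
                return declared;
            }
            String detected = reader.getEncoding(); // from BOM / first bytes, may be null
            return detected != null ? detected : "UTF-8";
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        String charset = detect(xml);
        System.out.println(charset);
        // With SimpleXML on the classpath, the bytes and the charset could then
        // be handed to persister.read(type, new ByteArrayInputStream(xml), charset).
        System.out.println(new String(xml, charset));
    }
}
```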

Niall


Niall Gallagher
RBS Global Banking & Markets
Office: +44 2070851454  

-----Original Message-----
From: Dawid Weiss [mailto:dawid.weiss@...]
Sent: 23 November 2009 10:13
To: GALLAGHER, Niall, GBM
Cc: simple-support@...
Subject: Re: [Simple-support] Proper parsing of InputStreams with encoding attribute in XML prologue.

Re: Proper parsing of InputStreams with encoding attribute in XML prologue.

by Dawid Weiss-2

Thanks. I really don't see how it could break anything (unless the
Java XML parser itself is broken, which unfortunately does happen
occasionally).
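One way to check the claim that the change should not break existing users, sketched with the JDK's StAX parser only (class and method names are illustrative): a UTF-8 document with no XML declaration and no byte-order mark decodes identically on both paths, because XML 1.0 requires the parser to assume UTF-8 in that case, while a UTF-16 document with a byte-order mark only decodes correctly when the raw bytes reach the parser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

class CompatCheck {

    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    static {
        // Deliver each element's text as a single Characters event.
        FACTORY.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
    }

    // Collects all character data from the document.
    static String text(XMLEventReader reader) throws Exception {
        StringBuilder out = new StringBuilder();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isCharacters()) {
                out.append(event.asCharacters().getData());
            }
        }
        reader.close();
        return out.toString();
    }

    // Bytes decoded as UTF-8 before parsing (the forced-UTF-8 path).
    static String viaUtf8Reader(byte[] xml) throws Exception {
        return text(FACTORY.createXMLEventReader(
                new InputStreamReader(new ByteArrayInputStream(xml), StandardCharsets.UTF_8)));
    }

    // Raw bytes handed to the parser (the proposed path).
    static String viaStream(byte[] xml) throws Exception {
        return text(FACTORY.createXMLEventReader(new ByteArrayInputStream(xml)));
    }

    public static void main(String[] args) throws Exception {
        // No declaration and no BOM: the parser must assume UTF-8.
        byte[] plainUtf8 = "<name>caf\u00e9</name>".getBytes(StandardCharsets.UTF_8);
        // Java's UTF-16 encoder writes a big-endian byte-order mark first.
        byte[] utf16 = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><name>caf\u00e9</name>"
                .getBytes(StandardCharsets.UTF_16);

        System.out.println("UTF-8, both paths agree: "
                + viaUtf8Reader(plainUtf8).equals(viaStream(plainUtf8)));
        System.out.println("UTF-16 via stream: " + viaStream(utf16).equals("caf\u00e9"));
    }
}
```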

I'll look into the public method you mentioned; that should do for us
for the moment.

Dawid

On Mon, Nov 23, 2009 at 11:16 AM,  <niall.gallagher@...> wrote:
