[jira] Created: (ABDERA-251) Charset issue in FOMDiv.getInternalValue() leads to corrupt return value on non-ASCII platforms


[jira] Created: (ABDERA-251) Charset issue in FOMDiv.getInternalValue() leads to corrupt return value on non-ASCII platforms

by JIRA jira@apache.org

Charset issue in FOMDiv.getInternalValue() leads to corrupt return value on non-ASCII platforms
-----------------------------------------------------------------------------------------------

                 Key: ABDERA-251
                 URL: https://issues.apache.org/jira/browse/ABDERA-251
             Project: Abdera
          Issue Type: Bug
    Affects Versions: 0.4.0, 1.0
         Environment: z/OS
            Reporter: Robin Fernandes


In org.apache.abdera.parser.stax.FOMDiv.getInternalValue(), the content of the div is obtained as a byte array using an XMLStreamWriter.
The content of the byte array is then converted to String using the default platform encoding (using ByteArrayOutputStream.toString()), which may not be compatible with the encoding used by the XMLStreamWriter.

One scenario in which this is problematic: the XMLStreamWriter uses UTF-8 (which is the default behaviour), but FOMDiv.getInternalValue() is invoked on z/OS, where the platform encoding is a flavour of EBCDIC. In this situation, the method returns garbage.
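
The mismatch described above can be reproduced outside Abdera. The following is only a minimal sketch: the class and method names are hypothetical, the buffer is filled by hand rather than by an XMLStreamWriter, and IBM1047 is assumed as a typical z/OS EBCDIC charset (with UTF-16BE as a stand-in where the JRE does not ship it).

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Abdera.
class CharsetMismatchDemo {

    // Stand-in for the XMLStreamWriter: fill the buffer with UTF-8 bytes.
    static ByteArrayOutputStream utf8Buffer(String markup) {
        byte[] encoded = markup.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(encoded, 0, encoded.length);
        return out;
    }

    // Decode the buffer with an explicitly named charset instead of the
    // platform default that the no-argument toString() would use.
    static String decode(ByteArrayOutputStream out, String charsetName) {
        try {
            return out.toString(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    // Assumption: IBM1047 plays the role of the z/OS platform encoding.
    // Fall back to UTF-16BE (also not ASCII-compatible) if it is unavailable.
    static String nonAsciiCharsetName() {
        return Charset.isSupported("IBM1047") ? "IBM1047" : "UTF-16BE";
    }

    public static void main(String[] args) {
        String markup = "<p>hello</p>";
        ByteArrayOutputStream out = utf8Buffer(markup);
        String wrong = nonAsciiCharsetName();
        System.out.println("decoded as UTF-8: " + decode(out, "UTF-8"));
        System.out.println("decoded as " + wrong + ": " + decode(out, wrong));
    }
}
```

Decoding with the charset the bytes were written in round-trips the markup; decoding with the other charset does not. Naming the charset explicitly in toString() would be a narrower alternative fix, but it still relies on knowing which encoding the writer used.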

Here's a suggested patch which ensures the XMLStreamWriter writes directly to a StringWriter, so no 'bytes to String' conversion is required in FOMDiv, and therefore no transcoding issues arise. The patch also removes seemingly unnecessary calls to XMLStreamWriter.writeStartElement() and XMLStreamWriter.writeEndElement().

Index: parser/src/main/java/org/apache/abdera/parser/stax/FOMDiv.java
===================================================================
--- parser/src/main/java/org/apache/abdera/parser/stax/FOMDiv.java (revision 834082)
+++ parser/src/main/java/org/apache/abdera/parser/stax/FOMDiv.java (working copy)
@@ -17,7 +17,7 @@
 */
 package org.apache.abdera.parser.stax;
 
-import java.io.ByteArrayOutputStream;
+import java.io.StringWriter;
 import java.util.Iterator;
 
 import javax.xml.namespace.QName;
@@ -143,16 +143,14 @@
 
   protected String getInternalValue() {
     try {
-      ByteArrayOutputStream out = new ByteArrayOutputStream();
+      StringWriter out = new StringWriter();
       XMLStreamWriter writer =
         XMLOutputFactory.newInstance().createXMLStreamWriter(out);
-      writer.writeStartElement("");
       for (Iterator<?> nodes = this.getChildren(); nodes.hasNext();) {
         OMNode node = (OMNode) nodes.next();
         node.serialize(writer);
       }
-      writer.writeEndElement();
-      return out.toString().substring(2);
+      return out.getBuffer().toString();
     } catch (Exception e) {}
     return "";
   }
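
For reference, the same idea in a standalone form. This is a sketch under assumptions rather than the Abdera code: the class name and the <p> element are made up for illustration, and the explicit flush() is an addition not present in the patch above, included in case the StAX implementation buffers output before it reaches the StringWriter.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class, not part of Abdera.
class StringWriterSerializationSketch {

    // Serialize a single <p> element straight into characters, so that no
    // byte-to-String decoding (and no platform charset) is involved.
    static String serializeParagraph(String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("p");
            writer.writeCharacters(text);
            writer.writeEndElement();
            // Assumption: flush so any buffered output reaches the StringWriter.
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            // Surface the failure instead of silently returning an empty string.
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // Non-ASCII input, written with a Unicode escape to stay source-encoding neutral.
        System.out.println(serializeParagraph("caf\u00e9"));
    }
}
```

Because the writer emits characters rather than bytes, the result does not depend on the platform default encoding.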



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ABDERA-251) Charset issue in FOMDiv.getInternalValue() leads to corrupt return value on non-ASCII platforms

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/ABDERA-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Fernandes updated ABDERA-251:
-----------------------------------

    Description:
In org.apache.abdera.parser.stax.FOMDiv.getInternalValue(), the content of the div is obtained as a byte array using an XMLStreamWriter.
The content of the byte array is then converted to String using the default platform encoding (using ByteArrayOutputStream.toString()), which may not be compatible with the encoding used by the XMLStreamWriter.

One scenario in which this is problematic: the XMLStreamWriter uses UTF-8 (which is the default behaviour), but FOMDiv.getInternalValue() is invoked on z/OS, where the platform encoding is a flavour of EBCDIC. In this situation, the method returns garbage.

Here's a suggested patch which ensures the XMLStreamWriter writes directly to a StringWriter, so no 'bytes to String' conversion is required in FOMDiv, and therefore no transcoding issues arise. The patch also removes seemingly unnecessary calls to XMLStreamWriter.writeStartElement() and XMLStreamWriter.writeEndElement().

Index: parser/src/main/java/org/apache/abdera/parser/stax/FOMDiv.java
===================================================================
--- parser/src/main/java/org/apache/abdera/parser/stax/FOMDiv.java (revision 834082)
+++ parser/src/main/java/org/apache/abdera/parser/stax/FOMDiv.java (working copy)
@@ -17,7 +17,7 @@
 */
 package org.apache.abdera.parser.stax;
 
-import java.io.ByteArrayOutputStream;
+import java.io.StringWriter;
 import java.util.Iterator;
 
 import javax.xml.namespace.QName;
@@ -143,16 +143,14 @@
 
   protected String getInternalValue() {
     try {
-      ByteArrayOutputStream out = new ByteArrayOutputStream();
+      StringWriter out = new StringWriter();
       XMLStreamWriter writer =
         XMLOutputFactory.newInstance().createXMLStreamWriter(out);
-      writer.writeStartElement("");
       for (Iterator<?> nodes = this.getChildren(); nodes.hasNext();) {
         OMNode node = (OMNode) nodes.next();
         node.serialize(writer);
       }
-      writer.writeEndElement();
-      return out.toString().substring(2);
+      return out.getBuffer().toString();
     } catch (Exception e) {}
     return "";
   }



  was:
In org.apache.abdera.parser.stax.FOMDiv.getInternalValue(), the content of the div is obtained as a byte array using an XMLStreamWriter.
The content of the byte array is then converted to String using the default platform encoding (using ByteArrayOutputStream.toString()), which may not be compatible with the encoding used by the XMLStreamWriter.

A scenario in which this is problematic is if the XMLStreamWriter uses UTF8 (which is the default behaviour), but FOMDiv.getInternalValue() is invoked on z/OS where the platform encoding is a flavour EBCDIC. In this situation, the method returns garbage.

Here's a suggested patch which ensures the XMLStreamWriter writes directly to a StringWriter, so no 'bytes to String' conversion is required in FOMDiv, and therefore no transcoding issues arise. The patch also removes seemingly unnecessary calls to XMLStreamWriter.writeStartElement() and XMLStreamWriter.writeEndElement().

Index: parser/src/main/java/org/apache/abdera/parser/stax/FOMDiv.java
===================================================================
--- parser/src/main/java/org/apache/abdera/parser/stax/FOMDiv.java (revision 834082)
+++ parser/src/main/java/org/apache/abdera/parser/stax/FOMDiv.java (working copy)
@@ -17,7 +17,7 @@
 */
 package org.apache.abdera.parser.stax;
 
-import java.io.ByteArrayOutputStream;
+import java.io.StringWriter;
 import java.util.Iterator;
 
 import javax.xml.namespace.QName;
@@ -143,16 +143,14 @@
 
   protected String getInternalValue() {
     try {
-      ByteArrayOutputStream out = new ByteArrayOutputStream();
+      StringWriter out = new StringWriter();
       XMLStreamWriter writer =
         XMLOutputFactory.newInstance().createXMLStreamWriter(out);
-      writer.writeStartElement("");
       for (Iterator<?> nodes = this.getChildren(); nodes.hasNext();) {
         OMNode node = (OMNode) nodes.next();
         node.serialize(writer);
       }
-      writer.writeEndElement();
-      return out.toString().substring(2);
+      return out.getBuffer().toString();
     } catch (Exception e) {}
     return "";
   }



