Possible Bug/Feature with SAXBuilder's setExpandEntities()


by David Wang-8

Hello,

I'm not sure if this is a bug or a feature, but I thought I would report it anyway... I have attached (also reproduced below) a simple example that illustrates the problem. I have tested this with Java 1.6EE, and JDOM's Jan 9th, 2009 nightly build as well as the standard 1.1 release.

In this example, I am trying to prevent the expansion of the entity "&minus;" in an XHTML document that is read in and then immediately written out. I create an instance of SAXBuilder, call setExpandEntities(false), then call the build() method on the input XHTML doc. For simplicity, I then use an instance of XMLOutputter to print the parsed document to standard out. (Even though I don't think it's necessary for standard out, I also make sure the encoding is consistent between the Format and the OutputStreamWriter, and that it is the common "US-ASCII" encoding.)

The original XHTML document uses the entity:
&minus;

But, the resulting XHTML printed to standard out shows:
−−

Apparently, calling setExpandEntities(false) had the effect of duplicating the character. I would expect that setting expand entities to 'false' would simply leave the "&minus;" as it is, without duplicating it in the US-ASCII output.

This isn't a big problem because if the default value, 'true', is used for entity expansion, the resulting output will simply contain "−" instead of duplicating the character. Even though the original entity encoding has changed, the resulting output will still behave/appear the same as the original, which is probably what's normally required.

- Thanks for any feedback & Happy 2009,
- David W.

======= INPUT XHTML DOCUMENT START =======
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.w3.org/Math/XSL/pmathml.xsl"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/2000/REC-xhtml1-20000126/DTD/xhtml1-strict.dtd">
<html>
  <head>
  </head>
  <body>
    <p>&minus;</p>
  </body>
</html>
======= INPUT XHTML DOCUMENT END =======


======= TEST JAVA CODE START =======
import java.io.File;
import java.io.OutputStreamWriter;

import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class Test {
    public static void main(String[] args) throws Exception {
        File fileInput = new File("testEntity.xml");

        // Parse the input without expanding entity references.
        SAXBuilder b = new SAXBuilder();
        b.setIgnoringElementContentWhitespace(true);
        b.setExpandEntities(false);
        Document doc = b.build(fileInput);
        doc.getDocType().setInternalSubset(null);

        // Write the document to standard out as US-ASCII.
        XMLOutputter outputter = new XMLOutputter();
        Format format = Format.getPrettyFormat();
        format.setEncoding("US-ASCII");
        outputter.setFormat(format);

        outputter.output(doc, new OutputStreamWriter(System.out, format.getEncoding()));
    }
}
======= TEST JAVA CODE END =======
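
One way to narrow down where the second character might come from is to look at what the underlying SAX parser reports around the entity reference. The sketch below is only a probe under stated assumptions, not a description of what JDOM does internally: it uses the JDK's own SAX parser rather than JDOM, it replaces the external XHTML DTD with a hypothetical inline declaration of `minus` so that no network access is needed, and the class name `EntityEventProbe` is made up for illustration. If the parser reports the entity boundaries and also the expanded character between them, then a builder that keeps the reference and the character data would write both, which would match the doubled output described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class EntityEventProbe {

    // Assumption: an inline entity declaration stands in for the XHTML DTD.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE html [<!ENTITY minus \"&#x2212;\">]>"
        + "<html><body><p>&minus;</p></body></html>";

    // Parses the XML and records entity-boundary and character events in order.
    static List<String> probe(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startEntity(String name) {
                events.add("startEntity(" + name + ")");
            }

            @Override
            public void endEntity(String name) {
                events.add("endEntity(" + name + ")");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Record code points so the result is readable in any encoding.
                StringBuilder sb = new StringBuilder();
                for (int i = start; i < start + length; i++) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(String.format("U+%04X", (int) ch[i]));
                }
                events.add("characters(" + sb + ")");
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Standard SAX2 property for receiving startEntity/endEntity callbacks.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : probe(SAMPLE)) {
            System.out.println(event);
        }
    }
}
```

Running the probe against the real document (with its external DTD) instead of the inline sample would show whether the same sequence of events occurs there.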


Attachment: jdomExpandEntitiesProblem.zip (1K)