writeCharacters() does not escape > (&gt;)

by Chris Custine

Can anyone tell me why BufferingXmlWriter.writeCharacters() only escapes > when it appears to be part of the forbidden sequence "]]>" outside of a CDATA section?  I would have expected > to always be escaped as &gt;, so I'm curious why this code doesn't do that.  The lines in question, from BufferingXmlWriter, are shown below.

                    } else if (c == '<') {
                        ent = "&lt;";
                        break inner_loop;
                    } else if (c == '&') {
                        ent = "&amp;";
                        break inner_loop;
                    } else if (c == '>') {
                        // Let's be conservative; and if there's any
                        // chance it might be part of "]]>" quote it
                        if (inPtr < 2 || text.charAt(inPtr-2) == ']') {
                            ent = "&gt;";
                            break inner_loop;
                        }
                    }

Thanks,
Chris
--
Chris Custine
FUSESource :: http://fusesource.com
My Blog :: http://blog.organicelement.com
Apache ServiceMix :: http://servicemix.apache.org
Apache Directory Server :: http://directory.apache.org

RE: writeCharacters() does not escape > (&gt;)

by Michael Kay

I don't know why the author made that design decision (many people escape ">" wherever it appears), but certainly, XML only requires ">" to be escaped when it appears as part of "]]>".
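
To illustrate the rule, here is a minimal sketch of what spec-minimal escaping of character data could look like. It is not the Woodstox code: the class MinimalEscape and its escape method are hypothetical names, and the sketch assumes the whole text is available as a single string, which a streaming writer generally cannot assume.

```java
// Hypothetical helper for illustration only; not part of Woodstox.
final class MinimalEscape {

    /**
     * Escapes '<' and '&' always, and '>' only when it directly
     * follows "]]" (that is, when it would complete "]]>").
     */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<') {
                out.append("&lt;");
            } else if (c == '&') {
                out.append("&amp;");
            } else if (c == '>' && i >= 2
                    && text.charAt(i - 1) == ']'
                    && text.charAt(i - 2) == ']') {
                out.append("&gt;");
            } else {
                out.append(c); // everything else, including a lone '>', is copied as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a > b && c < d")); // a > b &amp;&amp; c &lt; d
        System.out.println(escape("x]]>y"));          // x]]&gt;y
    }
}
```

A streaming writer may receive "]]" at the end of one call and ">" at the start of the next, so a check more conservative than this one (such as the quoted inPtr < 2 test) is presumably the safer choice in practice.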
 
Michael Kay
http://www.saxonica.com/


From: Chris Custine [mailto:chris.custine@...]
Sent: 13 May 2009 05:04
To: user@...
Subject: [woodstox-user] writeCharacters() does not escape > (&gt;)


Re: writeCharacters() does not escape > (&gt;)

by Cowtowncoder

On Wed, May 13, 2009 at 12:59 AM, Michael Kay <mike@...> wrote:
> I don't know why the author made that design decision (many people escape
> ">" wherever it appears), but certainly, XML only requires ">" to be escaped
> when it appears as part of "]]>".

Correct: the decision was simply to do minimal escaping, that is, to
escape only where it is required (or, more accurately, wherever it
cannot be determined that escaping is unnecessary).

So why minimize escaping? The output is slightly more compact, and (if
I remember correctly) longer runs of contiguous non-escaped characters
can be written as-is, which is faster.
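
As a rough illustration of the second point, the sketch below splits text into the pieces a writer would emit: runs copied as-is, and entity strings. It is only a sketch, not how BufferingXmlWriter is implemented; ChunkedEscape and pieces are made-up names, and the "]]>" check assumes the whole text is in one string. Escaping every > breaks the same text into more, shorter runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Woodstox.
final class ChunkedEscape {

    /**
     * Returns the pieces that would be written for the given text:
     * unescaped runs (copied in bulk) interleaved with entities.
     * If alwaysEscapeGt is false, '>' is escaped only after "]]".
     */
    static List<String> pieces(String text, boolean alwaysEscapeGt) {
        List<String> out = new ArrayList<>();
        int runStart = 0; // start of the current run of unescaped characters
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String ent = null;
            if (c == '<') {
                ent = "&lt;";
            } else if (c == '&') {
                ent = "&amp;";
            } else if (c == '>') {
                boolean endsCdataMarker = i >= 2
                        && text.charAt(i - 1) == ']'
                        && text.charAt(i - 2) == ']';
                if (alwaysEscapeGt || endsCdataMarker) {
                    ent = "&gt;";
                }
            }
            if (ent != null) {
                if (i > runStart) {
                    out.add(text.substring(runStart, i)); // flush the run so far
                }
                out.add(ent);
                runStart = i + 1;
            }
        }
        if (runStart < text.length()) {
            out.add(text.substring(runStart)); // trailing run
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "if a > b then c -> d";
        System.out.println(pieces(s, false).size()); // 1
        System.out.println(pieces(s, true).size());  // 5
    }
}
```

With minimal escaping the sample string goes out as one 20-character run; with unconditional escaping it becomes five pieces and 26 characters.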

Hope this helps,

-+ Tatu +-
