Is this expected behaviour?

by Richard Fine

Hello everyone,

I'm building an input sanitizer around Charles Reitzel's Tidy.NET
bindings; I'm using it to take potentially-malformed XHTML +
proprietary-namespaced tag soup and produce some kind of valid XML from
it. It's working OK so far but I'm a bit surprised by one of my
testcases - the output is valid XML but it's not what I was expecting it
to be.

Starting from the default options, I explicitly turn on:

input-xml
output-xml
force-output

and give it the input string:

<html xmlns="http://www.w3.org/1999/xhtml"><body><b>Hello,
<i<i>world!</b></body></html>

Note the '<i<i>' construct before 'world!'. I was expecting the output:

<html xmlns="http://www.w3.org/1999/xhtml"><body><b>Hello,
&lt;i<i>world!</i></b></body></html>

whereby the first < in the <i<i> is encoded as an entity. Instead, what
I'm getting is:

<html xmlns="http://www.w3.org/1999/xhtml"><body><b>Hello,
<i i="">world!</i></b></body></html>

That is, the <i<i> is becoming <i i="">. How come?
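
To make the difference concrete, here is a minimal sketch in Python (standard library only, it does not call Tidy at all) that parses the three strings above and compares them. The names SOURCE, EXPECTED, ACTUAL, is_well_formed and describe_bold are my own, made up for illustration, and EXPECTED assumes the stray '<' comes out as &lt;.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# The three strings from this post; the line break inside <b> is kept as posted.
SOURCE = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><b>Hello,\n'
          '<i<i>world!</b></body></html>')
# What I was hoping for (assumption: the stray '<' is escaped as &lt;).
EXPECTED = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><b>Hello,\n'
            '&lt;i<i>world!</i></b></body></html>')
# What I actually get back.
ACTUAL = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><b>Hello,\n'
          '<i i="">world!</i></b></body></html>')


def is_well_formed(text):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def describe_bold(text):
    """Return (text directly inside <b>, attributes of the nested <i>)."""
    root = ET.fromstring(text)
    bold = root.find(f".//{{{XHTML}}}b")
    italic = bold.find(f"{{{XHTML}}}i")
    return bold.text, dict(italic.attrib)


if __name__ == "__main__":
    for name, text in (("source", SOURCE), ("expected", EXPECTED), ("actual", ACTUAL)):
        print(name, "well-formed:", is_well_formed(text))
    print("expected:", describe_bold(EXPECTED))
    print("actual:  ", describe_bold(ACTUAL))
```

Both outputs are well-formed, so a validity check alone won't tell them apart. The difference is that in the output I expected, the stray '<i' survives as text inside the <b>, whereas in the output I get, it has been folded into the <i> element as an empty attribute named 'i'.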

(I have a suspicion that the TidyATL/Tidy.NET packages on Charles' page
are a bit out of date - they're certainly missing some of the more
recent options - so if this is a bug that's already been fixed or
something, I apologise for wasting your time...)

Thanks in advance,

- Richard