bug in e4x? missing = in XML attribute

by Leni

Hi, I think I may have encountered a bug in e4x parsing related to a
3-byte sequence of UTF-8.

The reason I think it's a bug is that it seems unreasonable that the
test case XML is parsable by the DOM parser but not the e4x parser.

Before filing a bug in Bugzilla I thought I would post here to see if
anyone has another explanation for the behaviour.

An email describing the problem with a test case is attached.

Regards -

Leni.

Hi -

I am trying to parse the attached xml file as follows:

   var req = new XMLHttpRequest();
   req.open("GET", "chrome://myextension/content/1.xml", false);
   req.send(null);

   // Strip the <?xml ...?> declaration first: the E4X XML constructor
   // does not accept one.
   var xml = new XML(
             String(req.responseText).
               replace(/\<\?xml version=.*?\?\>/,""));
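The regular expression in that snippet removes the XML declaration before the string reaches the XML constructor. A small sketch of how it behaves, runnable in any modern JavaScript engine (E4X itself is not available there, so only the string handling is exercised): because "." does not match a newline, a declaration whose attributes are split across lines would not be stripped. The `tolerant` pattern is an illustrative alternative, not something used in this thread.

```javascript
// The pattern from the message: "." does not match line terminators,
// so a declaration that spans lines is left in place.
const original = /\<\?xml version=.*?\?\>/;

// Illustrative alternative (an assumption, not from the thread):
// anchored at the start, and [\s\S] matches any character including newlines.
const tolerant = /^\s*<\?xml[\s\S]*?\?>/;

const oneLine = "<?xml version='1.0' encoding='UTF-8'?><feed/>";
const twoLines = "<?xml version='1.0'\n encoding='UTF-8'?><feed/>";

console.log(oneLine.replace(original, ""));  // <feed/>
console.log(twoLines.replace(original, "")); // declaration is still there
console.log(twoLines.replace(tolerant, "")); // <feed/>
```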

I'm seeing an error message in the javascript console:

   missing = in XML attribute

The problem relates to a sequence of three bytes in the <content>
element between the "Kerry" and the "Ex".  od -x tells me the bit
pattern is:
   0xe2 0x80 0xa8
which, according to
   http://en.wikipedia.org/wiki/UTF-8#Description
is a valid 3-byte UTF-8 sequence, and the character it encodes is
permitted by the XML Char production:
   http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
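The byte arithmetic can be checked by hand. A minimal JavaScript sketch (the function names are illustrative) that decodes a single 3-byte UTF-8 sequence and tests the resulting code point against the XML 1.0 Char production; it is not a general-purpose UTF-8 decoder.

```javascript
// Decode exactly one 3-byte UTF-8 sequence: 1110xxxx 10xxxxxx 10xxxxxx.
function decodeUtf8ThreeByte(b0, b1, b2) {
  if ((b0 & 0xf0) !== 0xe0) throw new Error("not a 3-byte lead byte");
  if ((b1 & 0xc0) !== 0x80 || (b2 & 0xc0) !== 0x80) {
    throw new Error("not a continuation byte");
  }
  // 4 payload bits from the lead byte, 6 from each continuation byte.
  const cp = ((b0 & 0x0f) << 12) | ((b1 & 0x3f) << 6) | (b2 & 0x3f);
  if (cp < 0x800) throw new Error("overlong encoding");
  if (cp >= 0xd800 && cp <= 0xdfff) throw new Error("surrogate code point");
  return cp;
}

// XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXmlChar(cp) {
  return cp === 0x9 || cp === 0xa || cp === 0xd ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff);
}

const codePoint = decodeUtf8ThreeByte(0xe2, 0x80, 0xa8);
console.log("U+" + codePoint.toString(16).toUpperCase().padStart(4, "0")); // U+2028
console.log(isXmlChar(codePoint)); // true
```

The result is U+2028 (LINE SEPARATOR), which falls inside the [#x20-#xD7FF] range.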

According to this:
https://developer.mozilla.org/en/International_characters_in_XUL_JavaScript#How_the_character_encoding_is_determined_in_Gecko_1.8_and_later
the extension should be defaulting to UTF-8 and signalling that to the
e4x processor.

So I'm not sure what this parsing error is about.

I've noticed through experimentation that changing some of the xml
elsewhere (e.g. removing one of the redundant namespace declarations)
can change the error message or even make it go away entirely.

And if I avoid e4x and stick with DOM:

   var serializer = new XMLSerializer();
   var str = serializer.serializeToString(req.responseXML);
   alert(str);

it parses fine.

Any insight into what might be going on here would be welcome!

Leni.

_______________________________________________
dev-tech-xml mailing list
dev-tech-xml@...
https://lists.mozilla.org/listinfo/dev-tech-xml

Re: bug in e4x? missing = in XML attribute

by Martin Honnen

Leni wrote:

> An email describing the problem with a test case is attached.

Can you post the XML you are trying to parse?

--

        Martin Honnen
        http://JavaScript.FAQTs.com/

Re: bug in e4x? missing = in XML attribute

by Leni

Martin Honnen wrote:

> Can you post the XML you are trying to parse?
Test-case xml is attached.

I also have a question about a workaround I was considering using:

var serializer = new XMLSerializer();
var str = serializer.serializeToString(req.responseXML);
var xml = new XML(str);

By running the DOM document through XMLSerializer to get a string, and
then giving that string to the e4x parser, I at least get something
that parses.

But XMLSerializer turns that three-byte UTF-8 sequence into a '('
character.  So two more questions:
a) can someone offer a pointer to how XMLSerializer is supposed
    to behave when there is a 3-byte UTF-8 sequence in the content
    of an element?
b) can anyone suggest any other workaround?

The real-world thing I am trying to do is get a UTF-8 encoded Atom feed
coming from Google into an e4x XML object.

Leni.

Re: bug in e4x? missing = in XML attribute

by Leni

The earlier attached xml didn't pass through the email correctly so here
it is again in a .zip.

Leni.

Re: bug in e4x? missing = in XML attribute

by Leni

Leni wrote:
> The earlier attached xml didn't pass through the email correctly so here
> it is again in a .zip.

OK, it looks like the mailing list software is removing the attachment,
so here is a URL:
http://www.zindus.com/tmp/1.xml.zip

Leni.

Re: bug in e4x? missing = in XML attribute

by Martin Honnen

Leni wrote:

>> Can you post the XML you are trying to parse?
>
> Test-case xml is attached.

I can't reproduce the issue with Firefox 3.0 (Mozilla/5.0 (Windows; U;
Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6). Test
case is at
http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501.html
and loads XML document from
http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501Test.xml
which is the file you sent.

I don't get any script or XML parsing errors.

Re: bug in e4x? missing = in XML attribute

by Leni

Martin Honnen wrote:
> I can't reproduce the issue with Firefox 3.0 (Mozilla/5.0 (Windows; U;
> Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6). Test
> case is at
> http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501.html
> and loads XML document from
> http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501Test.xml 
> which is the file you sent.
>
> I don't get any script or XML parsing errors.

Yes, you are right.

The extension I am working on is for Thunderbird2 and Thunderbird3, and
I can only reproduce the problem under Thunderbird2, not Thunderbird3.
Sorry for not making this clear in the original posting (I didn't test tb3).

If you are curious to reproduce this problem in Thunderbird using
Martin's test case, install the ThunderBrowse extension:
https://addons.mozilla.org/en-US/thunderbird/addon/5373

Then visit the link in ThunderBrowse:
http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501.html

In Thunderbird3, the page is served correctly - the XML is shown.

In Thunderbird2, the page is not served correctly - the javascript error
console reports:

   Error: e.target.parentNode.hasAttribute is not a function
   Source File: chrome://tbrowse/content/tburlclk.js
   Line: 377

I won't file a bug report for this tb2-only problem then because I doubt
it would get much attention.

About a workaround for Thunderbird 2, the DOM ==> XMLSerializer ==> e4x
technique does parse the XML but converts that 3-byte UTF-8 sequence
into a '(' which makes it lossy.  If someone can shed any light on what
is going on here and in particular, what class of UTF-8 byte sequences
might be affected by such lossy conversion, it would help me evaluate
whether this technique is acceptable.
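One observation that may be useful, offered as a guess rather than a diagnosis: '(' is U+0028, which is exactly the low byte of U+2028. If something along the DOM ==> XMLSerializer ==> e4x path (or in whatever displays the result) narrowed each 16-bit character to 8 bits, then every character above U+00FF would be mangled the same way: all 3- and 4-byte UTF-8 sequences, plus 2-byte sequences from U+0100 upward. ASCII and Latin-1 text would pass through unchanged. The helper functions below are hypothetical and only model that assumption; they are not a description of what XMLSerializer actually does.

```javascript
// Hypothetical model of a lossy conversion: keep only the low 8 bits of
// each UTF-16 code unit. This is an assumption used to explain the '('
// seen for U+2028, not a confirmed account of XMLSerializer's behaviour.
function truncateToLowByte(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    out += String.fromCharCode(str.charCodeAt(i) & 0xff);
  }
  return out;
}

// Under that model, list the characters of a string that would change:
// anything above U+00FF.
function affectedChars(str) {
  const result = [];
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp > 0xff) {
      result.push("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
    }
  }
  return result;
}

console.log(truncateToLowByte("Kerry \u2028Ex")); // Kerry (Ex
console.log(affectedChars("Kerry \u2028Ex: caf\u00e9 \u20ac")); // [ 'U+2028', 'U+20AC' ]
```

If the model is right, running a feed's text through `affectedChars` would show which entries the workaround could damage; the 'é' (U+00E9) in the example would survive, while the euro sign would not.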

Or if anyone can think of a better workaround for tb2 it will be welcome!

Thanks -

Leni.

Re: bug in e4x? missing = in XML attribute

by G4TechTV

On Feb 15, 2:05 pm, Leni <mozilla....@...> wrote:

> In Thunderbird2, the page is not served correctly - the javascript error
> console reports:
>
>    Error: e.target.parentNode.hasAttribute is not a function
>    Source File: chrome://tbrowse/content/tburlclk.js
>    Line: 377

Actually, it's a bug that deals with javascript link handling in
ThunderBrowse. 3.2.3 fixes the bug.

Re: bug in e4x? missing = in XML attribute

by Leni

Martin Honnen wrote:
> I can't reproduce the issue with Firefox 3.0

After the posting from shows.G4TechTV@... I did some more testing
and found that I can't reproduce it when the e4x parsing happens
inside a <browser> element.

So ... here is another test case along the same lines.

To run the test:
- copy and paste the code below into a text editor and join all the
   lines, so that the code is on one line
- copy and paste that line into the javascript error console and click
   Evaluate

The error console reports:
   Error: missing = in XML attribute
   Source File:
   Line: 3, Column: 2
   Source Code:
   le><content>Alice, Kerry

I can reproduce this in tb2, tb3beta1 and Firefox 3.0.6.  It's the \u2028
character in the code below which causes the problem.

   var str = "<?xml version='1.0' encoding='UTF-8'?>" +
     "<feed xmlns='http://www.w3.org/2005/Atom' " +
     "xmlns:openSearch='http://a9.com/-/spec/opensearch/1.1/'>" +
     "<id>example@...</id>" +
     "<updated>2009-02-11T05:58:32.673Z</updated>" +
     "<category scheme='http://schemas.google.com/g/2005#kind' " +
     "term='http://schemas.google.com/contact/2008#contact'/>" +
     "<generator version='1.0' uri='http://www.google.com/m8/feeds'>Contacts</generator>" +
     "<entry><app:edited xmlns:app='http://www.w3.org/2007/app'>" +
     "2009-02-11T05:48:11.672Z</app:edited>" +
     "<title>Alice Midxxxxxx</title>" +
     "<content>Alice, Kerry \u2028Ex: Jones</content></entry></feed>";
   var xml = new XML(str.replace(/\<\?xml version=.*?\?\>/, ""));
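One property of this character that may be relevant, although nothing in this thread establishes it as the cause: U+2028 (LINE SEPARATOR) is a line terminator in ECMAScript, just like \n and \r, so a JavaScript scanner treats it as the end of a line even inside what looks like a one-line test case. That may or may not be related to the line number in the console output above. The sketch below only demonstrates the property in a modern engine; `countScriptLines` is an illustrative name.

```javascript
// ECMAScript's LineTerminator set is \n, \r, U+2028 and U+2029.
const LS = "\u2028"; // LINE SEPARATOR, the character from the test case

// "." in a regular expression never matches a line terminator...
console.log(/^.$/.test(LS));  // false
console.log(/^.$/.test("x")); // true

// ...and with the m flag, "$" matches just before one.
console.log(/Kerry $/m.test("Kerry " + LS + "Ex: Jones")); // true
console.log(/Kerry $/.test("Kerry " + LS + "Ex: Jones"));  // false

// Count lines the way an ECMAScript scanner would split them.
function countScriptLines(text) {
  return text.split(/\r\n|[\n\r\u2028\u2029]/).length;
}
console.log(countScriptLines("Alice, Kerry " + LS + "Ex: Jones")); // 2
```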

Regards -

Leni.

Re: bug in e4x? missing = in XML attribute

by Leni

shows.G4TechTV@... wrote:
> Actually, it's a bug that deals with javascript link handling in
> ThunderBrowse. 3.2.3 fixes the bug.

Yes - thanks for that.  With ThunderBrowse 3.2.3 Martin's test case now
works for me too.

Leni.

Re: bug in e4x? missing = in XML attribute

by Leni

Martin Honnen wrote:
> I can't reproduce the issue with Firefox 3.0

Just for good measure, I can now reproduce the problem using a test case
similar to the one you used.

Test case:
http://www.zindus.com/tmp/test-case-2009-02-17-1.html
The xml:
http://www.zindus.com/tmp/test-case-2009-02-17-1.xml

Firefox 3.0.6 error console reports:
   Error: illegal XML character

The .xml is different to the one provided earlier, but the problem is
the same - related to that unicode character, in this example it is just
before the string "Jones".

I hope I am not making a big noise over something that has a simple
explanation.

Leni.

Re: bug in e4x? missing = in XML attribute

by Martin Honnen

Leni wrote:

> Firefox 3.0.6 error console reports:
>   Error: illegal XML character
>
> The .xml is different to the one provided earlier, but the problem is
> the same - related to that unicode character, in this example it is just
> before the string "Jones".

I see that problem too with Firefox 3.0.6.

To turn this into a bug report it would be best to have a minimal test
case. Since the E4X XML constructor is implemented by the JavaScript
engine itself, the test case should preferably not load an XML document
with XMLHttpRequest at all, but simply be a script that calls
new XML(string) and triggers the error.

I am, however, struggling to identify the character causing the problem.
According to your earlier post, it is encoded in UTF-8 as 0xe2 0x80 0xa8,
which I think would be the Unicode character U+2028.
However, doing
   var el = new XML('<foo>Line 1.\u2028Line 2.</foo>');
in Firefox 3.0.6 does not cause any error, so in that case the character
is parsed fine. So either that character is not what causes the error,
or the error only occurs with longer strings.

Re: bug in e4x? missing = in XML attribute

by Boris Zbarsky

Martin Honnen wrote:
>> Test case:
>> http://www.zindus.com/tmp/test-case-2009-02-17-1.html
>> The xml:
>> http://www.zindus.com/tmp/test-case-2009-02-17-1.xml
>>
>> Firefox 3.0.6 error console reports:
>>   Error: illegal XML character

I get that too in trunk Gecko.

However, if I start reducing it (and it's possible to reduce it a good
bit while still getting that error), I eventually get to a point where
the error starts changing (e.g. complaining about there being a missing
'=' in an attribute).

If I breakpoint on the "illegal XML character" error, I see that it
happens when we get a '<' while we think we're in the process of parsing
an open tag.

In particular, it thinks it's looking at a string that looks something like:

  <author/www.google.com/m8/feeds/contacts/a.b%40gdomain.example.com/thin?start-index=2681&max-results=10'

Which is pretty clearly bogus.

-Boris

Re: bug in e4x? missing = in XML attribute

by Boris Zbarsky

OK, I have this minimized to this script:

       var xmlEl = new XML("<feed " +
         "xmlns:gContact='http://schemas.google.com/contact/2' " +
         "xmlns:batch='http://schemas.google.com/gdata/batch' " +
         "xmlns:gd='http://schemas.google.com/g/2005' " +
         "gd:etag='W/\"xxxxxxxxxxxxxxxxxxxxxxw.\"'>" +
         "<updated>2009-02-1</updated><e><c>\u2028</c></e></feed>");
       var pre = document.createElement('pre');
       pre.appendChild(document.createTextNode(xmlEl.toXMLString()));
       document.body.appendChild(pre);

with no XMLHttpRequest required.  Deleting chars from the string
sometimes changes the error, and sometimes makes it go away entirely,
but I bet it can be minimized some more.  If someone wants to take a
shot at that, great.
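The delete-characters-and-retest process can be automated. A minimal sketch of a greedy reducer (a simplified form of delta debugging); `reduceTestCase` and the predicate are hypothetical names. In practice the predicate would run new XML(candidate) in an engine that still supports E4X and check for the same error message, since, as noted above, some deletions change the error rather than removing it. E4X is not available in current engines, so the usage example plugs in a toy predicate instead.

```javascript
// Greedy test-case reducer, a simplified form of delta debugging.
// stillFails(candidate) must return true while the candidate still shows
// the same failure (for E4X: new XML(candidate) throws the same error).
function reduceTestCase(input, stillFails) {
  if (!stillFails(input)) throw new Error("input does not fail to begin with");
  let current = input;
  let chunk = Math.max(1, Math.floor(current.length / 2));
  while (chunk >= 1) {
    let removedSomething = false;
    for (let start = 0; start < current.length; ) {
      // Try deleting the slice [start, start + chunk).
      const candidate = current.slice(0, start) + current.slice(start + chunk);
      if (stillFails(candidate)) {
        current = candidate; // keep the smaller string, retry at the same offset
        removedSomething = true;
      } else {
        start += chunk; // this slice is needed; move on
      }
    }
    // Move to finer-grained deletions only once a full pass removes nothing.
    if (!removedSomething) chunk = Math.floor(chunk / 2);
  }
  return current;
}

// Toy stand-in for "new XML(str) throws": it "fails" while both
// fragments are still present in the string.
const toyFails = (s) => s.includes("\u2028") && s.includes("gd:etag");
const sample = "<feed gd:etag='W/x'><updated>2009</updated><c>\u2028</c></feed>";
const reduced = reduceTestCase(sample, toyFails);
console.log(reduced.replace(/\u2028/g, "\\u2028")); // gd:etag\u2028
```

The result is 1-minimal: deleting any single remaining character makes the predicate return false, which is a useful stopping point before hand-editing.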

This doesn't look like an XML issue, though, but a JS engine one.

-Boris

Re: bug in e4x? missing = in XML attribute

by Martin Honnen

Boris Zbarsky wrote:

> This doesn't look like an XML issue, though, but a JS engine one.

Thanks for the reduction. I agree it is a JavaScript engine issue; I
have filed https://bugzilla.mozilla.org/show_bug.cgi?id=478905 on this.