bug in e4x? missing = in XML attribute

Hi,

I think I may have encountered a bug in e4x parsing related to a 3-byte sequence of UTF-8.

The reason I think it's a bug is that it seems unreasonable that the test case XML is parsable by the DOM parser but not the e4x parser.

Before filing in bugzilla I thought I would post here to see if anyone has another explanation for the behaviour.

An email describing the problem with a test case is attached.

Regards - Leni.

Hi -

I am trying to parse the attached xml file as follows:

    var req = new XMLHttpRequest();
    req.open("GET", "chrome://myextension/content/1.xml", false);
    req.send(null);
    var xml = new XML(String(req.responseText).replace(/\<\?xml version=.*?\?\>/, ""));

I'm seeing an error message in the javascript console:

    missing = in XML attribute

The problem relates to a sequence of three bytes in the <content> element between the "Kerry" and the "Ex". od -x tells me the byte pattern is:

    0xe2 0x80 0xa8

which according to:
    http://en.wikipedia.org/wiki/UTF-8#Description
and
    http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
is a valid 3-byte UTF-8 sequence.

According to this:
    https://developer.mozilla.org/en/International_characters_in_XUL_JavaScript#How_the_character_encoding_is_determined_in_Gecko_1.8_and_later
the extension should be defaulting to UTF-8 and signalling that to the e4x processor. So I'm not sure what this parsing error is about.

I've noticed through experimentation that changing some of the xml elsewhere (e.g. removing one of the redundant namespace declarations) can change the error message or even make it go away entirely.

And if I avoid e4x and stick with DOM:

    var serializer = new XMLSerializer();
    var str = serializer.serializeToString(req.responseXML);
    alert(str);

it parses fine.

Any insight into what might be going on here would be welcome!

Leni.

_______________________________________________
dev-tech-xml mailing list
dev-tech-xml@...
https://lists.mozilla.org/listinfo/dev-tech-xml

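The byte sequence reported above can be checked without involving any XML parser. The following is a minimal sketch, assuming only a JavaScript engine with the standard `encodeURIComponent`; the helper name `utf8Hex` is made up for illustration and does not come from the thread. It shows that 0xe2 0x80 0xa8 is the UTF-8 encoding of U+2028 (LINE SEPARATOR).

```javascript
// Hypothetical helper (not from the thread): return the UTF-8 bytes of a
// string as space-separated lowercase hex.
function utf8Hex(text) {
  // encodeURIComponent percent-encodes non-ASCII characters as UTF-8 bytes.
  var encoded = encodeURIComponent(text);
  var bytes = [];
  for (var i = 0; i < encoded.length; i++) {
    if (encoded.charAt(i) === "%") {
      // "%E2" style escape: take the two hex digits after the percent sign.
      bytes.push(encoded.substr(i + 1, 2).toLowerCase());
      i += 2;
    } else {
      // Character left unescaped (plain ASCII): emit its code directly.
      var hex = encoded.charCodeAt(i).toString(16);
      bytes.push(hex.length < 2 ? "0" + hex : hex);
    }
  }
  return bytes.join(" ");
}

console.log(utf8Hex("\u2028")); // e2 80 a8
```

Running the same helper over a slice of `req.responseText` around "Kerry" would be one way to confirm which character the parser is actually seeing.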
Re: bug in e4x? missing = in XML attribute

Martin Honnen wrote:
> Leni wrote:
>> Hi, I think I may have encountered a bug in e4x parsing related to a
>> 3-byte sequence of UTF-8.
>>
>> The reason I think it's a bug is that it seems unreasonable that the
>> test case XML is parsable by the DOM parser but not the e4x parser.
>>
>> Before filing in bugzilla I thought I would post here to see if anyone
>> has another explanation for the behaviour.
>>
>> An email describing the problem with a test case is attached.
>
> Can you post the XML you are trying to parse?

I also have a question about a workaround I was considering using:

    var serializer = new XMLSerializer();
    var str = serializer.serializeToString(req.responseXML);
    var xml = new XML(str);

Running the DOM's XML through XMLSerializer to make a string and then giving that string to the e4x parser at least parses. But XMLSerializer turns that three-byte UTF-8 sequence into a '(' character.

So two more questions:

a) can someone offer a pointer to how XMLSerializer is supposed to behave when there is a 3-byte UTF-8 sequence in the content of an element?

b) can anyone suggest any other workaround?

The real-world thing I am trying to do is get a UTF-8 encoded Atom feed coming from Google into an e4x XML object.

Leni.

Re: bug in e4x? missing = in XML attribute

The earlier attached xml didn't pass through the email correctly so here it is again in a .zip.

Leni.

Re: bug in e4x? missing = in XML attribute

Leni wrote:
> The earlier attached xml didn't pass through the email correctly so here
> it is again in a .zip.

Ok, it looks like the mailing list software is removing the attachment, so here is a URL:

    http://www.zindus.com/tmp/1.xml.zip

Leni.

Re: bug in e4x? missing = in XML attribute

Martin Honnen wrote:
> I can't reproduce the issue with Firefox 3.0 (Mozilla/5.0 (Windows; U;
> Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6). Test
> case is at
> http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501.html
> and loads XML document from
> http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501Test.xml
> which is the file you sent.
>
> I don't get any script or XML parsing errors.

Yes, you are right. The extension I am working on is for Thunderbird 2 and Thunderbird 3, and I can only reproduce the problem under Thunderbird 2, not Thunderbird 3. Sorry for not making this clear in the original posting (I didn't test tb3).

If you are curious to reproduce this problem in Thunderbird using Martin's test case, install the ThunderBrowse extension:

    https://addons.mozilla.org/en-US/thunderbird/addon/5373

Then visit the link in ThunderBrowse:

    http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501.html

In Thunderbird 3, the page is served correctly - the XML is shown.

In Thunderbird 2, the page is not served correctly - the javascript error console reports:

    Error: e.target.parentNode.hasAttribute is not a function
    Source File: chrome://tbrowse/content/tburlclk.js
    Line: 377

I won't file a bug report for this tb2-only problem then because I doubt it would get much attention.

About a workaround for Thunderbird 2: the DOM ==> XMLSerializer ==> e4x technique does parse the XML, but it converts that 3-byte UTF-8 sequence into a '(', which makes it lossy. If someone can shed any light on what is going on here, and in particular on what class of UTF-8 byte sequences might be affected by such lossy conversion, it would help me evaluate whether this technique is acceptable.

Or if anyone can think of a better workaround for tb2 it will be welcome!

Thanks - Leni.

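One observation that may bear on the question of which sequences are affected: '(' is U+0028, which is exactly the low byte of U+2028, the character that the bytes 0xe2 0x80 0xa8 encode. That would be consistent with some step keeping only the low 8 bits of each UTF-16 code unit. The thread does not confirm this; it is a guess from a single data point. Under that assumption, any character above U+00FF would be at risk. A rough sketch of the idea, with hypothetical helper names `lowByte` and `findAtRisk`:

```javascript
// Hypothetical model of the lossy step (an assumption, not confirmed by the
// thread): keep only the low 8 bits of a UTF-16 code unit.
function lowByte(ch) {
  return String.fromCharCode(ch.charCodeAt(0) & 0xff);
}

// List the code units above U+00FF, which this model would corrupt.
function findAtRisk(text) {
  var hits = [];
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code > 0xff) {
      hits.push({ index: i, codeUnit: "U+" + code.toString(16).toUpperCase() });
    }
  }
  return hits;
}

console.log(lowByte("\u2028"));                         // (
console.log(JSON.stringify(findAtRisk("Kerry \u2028Ex"))); // [{"index":6,"codeUnit":"U+2028"}]
```

If the guess is right, scanning a feed with `findAtRisk` before using the XMLSerializer route would show how much content the conversion would damage. If the scan finds characters that survive the round trip intact, the guess is wrong.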
Re: bug in e4x? missing = in XML attribute

Martin Honnen wrote:
> I can't reproduce the issue with Firefox 3.0

After the posting from shows.G4TechTV@... I did some more testing and found that I can't reproduce it when the e4x parsing happens inside a <browser> element.

So ... here is another test case along the same lines.

To run the test:
- copy and paste the code below into a text editor and remove all the newlines - all the code should be on one line
- copy and paste into the javascript error console and click evaluate

The error console reports:

    Error: missing = in XML attribute
    Source File:
    Line: 3, Column: 2
    Source Code:
    le><content>Alice, Kerry

I can reproduce this in tb2, tb3beta1 and Firefox 3.0.6.

It's the \u2028 character in the code below which causes the problem.

    var str = "<?xml version='1.0' encoding='UTF-8'?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearch/1.1/'><id>example@...</id><updated>2009-02-11T05:58:32.673Z</updated><category scheme='http://schemas.google.com/g/2005#kind' term='http://schemas.google.com/contact/2008#contact'/><generator version='1.0' uri='http://www.google.com/m8/feeds'>Contacts</generator><entry><app:edited xmlns:app='http://www.w3.org/2007/app'>2009-02-11T05:48:11.672Z</app:edited><title>Alice Midxxxxxx</title><content>Alice, Kerry \u2028Ex: Jones</content></entry></feed>";var xml = new XML(str.replace(/\<\?xml version=.*?\?\>/,""));

Regards - Leni.

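ECMAScript treats U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) as line terminators, which may be related to why the e4x parser trips on this character, though nothing in the thread so far establishes the cause. A possible stopgap is to replace those two characters before handing the string to the parser. The sketch below assumes that turning U+2028 into an ordinary newline is acceptable for the feed content; the helper name `replaceLineSeparators` is hypothetical, and the approach has not been tested against e4x itself.

```javascript
// Hypothetical helper (not from the thread): replace U+2028 and U+2029 with
// a caller-chosen string, defaulting to a plain newline.
function replaceLineSeparators(xmlText, replacement) {
  if (replacement === undefined) {
    replacement = "\n";
  }
  return xmlText.replace(/[\u2028\u2029]/g, replacement);
}

var cleaned = replaceLineSeparators("<content>Alice, Kerry \u2028Ex: Jones</content>");
console.log(JSON.stringify(cleaned)); // "<content>Alice, Kerry \nEx: Jones</content>"
```

In the test case above, the cleaned string would be passed to `new XML(...)` in place of `str`. Unlike the XMLSerializer route, the substitution is explicit, so the affected characters are known in advance. Passing a numeric character reference such as `"&#x2028;"` as the replacement would keep the original character in the parsed document, but whether e4x accepts that form is an open question here.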
Re: bug in e4x? missing = in XML attribute

shows.G4TechTV@... wrote:
> Actually, it's a bug that deals with javascript link handling in
> ThunderBrowse. 3.2.3 fixes the bug.

Yes - thanks for that. With ThunderBrowse 3.2.3 Martin's test case now works for me too.

Leni.

Re: bug in e4x? missing = in XML attribute

Martin Honnen wrote:
> I can't reproduce the issue with Firefox 3.0

Just for good measure, I can now reproduce the problem using a test case similar to the one you used.

Test case:
    http://www.zindus.com/tmp/test-case-2009-02-17-1.html
The xml:
    http://www.zindus.com/tmp/test-case-2009-02-17-1.xml

Firefox 3.0.6 error console reports:

    Error: illegal XML character

The .xml is different to the one provided earlier, but the problem is the same - related to that unicode character, which in this example is just before the string "Jones".

I hope I am not making a big noise over something that has a simple explanation.

Leni.

Re: bug in e4x? missing = in XML attribute

Martin Honnen wrote:
>> Test case:
>> http://www.zindus.com/tmp/test-case-2009-02-17-1.html
>> The xml:
>> http://www.zindus.com/tmp/test-case-2009-02-17-1.xml
>>
>> Firefox 3.0.6 error console reports:
>> Error: illegal XML character

I get that too in trunk Gecko. However, if I start reducing it (and it's possible to reduce it a good bit while still getting that error), I eventually get to a point where the error starts changing (e.g. complaining about there being a missing '=' in an attribute).

If I breakpoint on the "illegal XML character" error, I see that it happens when we get a '<' while we think we're in the process of parsing an open tag. In particular, it thinks it's looking at a string that looks something like:

    <author/www.google.com/m8/feeds/contacts/a.b%40gdomain.example.com/thin?start-index=2681&max-results=10'

Which is pretty clearly bogus.

-Boris

Re: bug in e4x? missing = in XML attribute

OK, I have this minimized to this script:

    var xmlEl = new XML("<feed xmlns:gContact='http://schemas.google.com/contact/2' xmlns:batch='http://schemas.google.com/gdata/batch' xmlns:gd='http://schemas.google.com/g/2005' gd:etag='W/\"xxxxxxxxxxxxxxxxxxxxxxw.\"'><updated>2009-02-1</updated><e><c>\u2028</c></e></feed>");
    var pre = document.createElement('pre');
    pre.appendChild(document.createTextNode(xmlEl.toXMLString()));
    document.body.appendChild(pre);

with no XMLHttpRequest required. Deleting chars from the string sometimes changes the error, and sometimes makes it go away entirely, but I bet it can be minimized some more. If someone wants to take a shot at that, great.

This doesn't look like an XML issue, though, but a JS engine one.

-Boris

Re: bug in e4x? missing = in XML attribute

Boris Zbarsky wrote:
> OK, I have this minimized to this script:
>
> var xmlEl = new XML("<feed
> xmlns:gContact='http://schemas.google.com/contact/2'
> xmlns:batch='http://schemas.google.com/gdata/batch'
> xmlns:gd='http://schemas.google.com/g/2005'
> gd:etag='W/\"xxxxxxxxxxxxxxxxxxxxxxw.\"'><updated>2009-02-1</updated><e><c>\u2028</c></e></feed>");
>
> var pre = document.createElement('pre');
> pre.appendChild(document.createTextNode(xmlEl.toXMLString()));
> document.body.appendChild(pre);
>
> with no XMLHttpRequest required. Deleting chars from the string
> sometimes changes the error, and sometimes makes it go away entirely,
> but I bet it can be minimized some more. If someone wants to take a
> shot at that, great.
>
> This doesn't look like an XML issue, though, but a JS engine one.

Thanks for the reduction. I agree it is a JavaScript engine issue, and I have filed https://bugzilla.mozilla.org/show_bug.cgi?id=478905 on this.

--
Martin Honnen
http://JavaScript.FAQTs.com/