Tag-Soup- versus XML-Parser

Hi folks,

as far as I can see, there are at least two different methods of parsing an (X)HTML document: a "tag-soup" parser, which is very error-prone and makes the best of non-valid websites, and an XML parser, which is only activated if a document is served with the MIME type application/xhtml+xml.

IMHO an XML parser should be a lot faster, because it just stops if there is an error and does not have to concern itself with error handling.

I would be very interested in a performance comparison between the most commonly used tag-soup parser and the XML parser for XHTML 1.x documents that are correctly served as application/xhtml+xml.

Can you maybe give me any metrics? Or, better, is it possible to extract the Gecko document parser and benchmark it standalone with various documents?

The answer to this question is relevant to whether it is preferable to build valid XML XHTML documents or just stick to HTML 4.01.

Thank you very much!

_______________________________________________
dev-performance mailing list
dev-performance@...
https://lists.mozilla.org/listinfo/dev-performance
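The "just stops if there is an error" behaviour described in the post can be sketched with Python's standard library. This is only an illustration: `html.parser` is a tolerant tokenizer, not Gecko's tag-soup parser, and `MALFORMED`, `TagCollector`, `parse_tolerant`, and `parse_strict` are names made up for this sketch.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample: the <br> element is never closed, so this is
# acceptable as HTML but not well-formed XML.
MALFORMED = "<p>first<br>second</p>"


class TagCollector(HTMLParser):
    """Records start tags; html.parser does not reject unclosed elements."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_tolerant(markup):
    """Feed markup to the lenient HTML tokenizer and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def parse_strict(markup):
    """Return True if markup is well-formed XML, False if the XML parser aborts."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parse_tolerant(MALFORMED))  # the tolerant path keeps going
print(parse_strict(MALFORMED))    # the XML path gives up at the first error
```

The sketch only shows the difference in error handling. It says nothing about the relative speed of the two parsers in a browser.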
Re: Tag-Soup- versus XML-Parser

[Please don't cross-post without setting followup-to; set that to .performance]

Ernst Bauernfeind wrote:
> as I can see, there are at least two different methods of parsing a (x)
> html-document: a "tag-soup"-parser which is very error-prone and makes
> the best out of non-valid websites. and a xml-parser, which is only
> activated if a document is served with mime-type
> application/xhtml+xml.

You mean "error-correcting", not "error-prone", right?

> IMHO an xml-parser should be a lot faster, because it just stops if
> there is an error, and has not to concern error-handling.

It also needs to implement very different parsing rules from HTML in general, keep track of namespaces, etc.

> I would be very interested in performance-comparison between the
> mostly used tag-soup-parser and the xml-parser for xhtml1.x documents
> which are correctly served as application/xhtml+xml.
>
> can you maybe give me any metrics?

Parser performance per se is more or less a wash, from what I've seen. It's also generally been a small enough component of pageload time (15% or less) that this aspect should be about the last factor in deciding whether to use XML or not.

> or, better, is it possible to extract the gecko document parser and
> benchmark it standalone with various documents?

Why? That would give you a somewhat useless number for your purposes (see below).

> the answer to this question is relevant if it is preferable to build
> valid xml xhtml documents or just stick to html 4.01.

That's an entirely different question from a performance standpoint. Typically, the HTML codepath receives more attention in terms of optimization and profiling; if we have to sacrifice XHTML-as-XML performance in favor of HTML performance, we do so. There are also common markup constructs that actually do significantly different things in XHTML-as-XML and in HTML.

A good example:

  <table>
    <tr><td>Text</td></tr>
  </table>

This produces different DOMs in HTML and in XHTML-as-XML; the HTML one is faster to lay out, especially if you plan to do any dynamic addition or removal of rows in that table.

-Boris
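A rough way to see the structural difference in the table example, using Python's `xml.etree.ElementTree` as a stand-in XML parser. The reply does not spell out what differs; this sketch assumes the difference is the `tbody` element that HTML parsers insert implicitly around table rows, and `AS_HTML_DOM` is hand-written markup imitating that result, not the output of a real HTML parser. `element_paths` is a helper made up for this sketch.

```python
import xml.etree.ElementTree as ET

# The markup from the example above, on one line.
SNIPPET = "<table> <tr><td>Text</td></tr> </table>"

# Assumption: hand-written equivalent of the tree an HTML parser would build,
# with the implied <tbody> made explicit.
AS_HTML_DOM = "<table><tbody><tr><td>Text</td></tr></tbody></table>"


def element_paths(node, prefix=""):
    """Return the path of every element in the tree, in document order."""
    path = f"{prefix}/{node.tag}"
    paths = [path]
    for child in node:
        paths.extend(element_paths(child, path))
    return paths


# An XML parser adds nothing: the row is a direct child of the table.
as_xml = element_paths(ET.fromstring(SNIPPET))

# The HTML-style tree has one extra level between table and row.
as_html = element_paths(ET.fromstring(AS_HTML_DOM))

for label, paths in (("XML", as_xml), ("HTML", as_html)):
    print(label, paths)
```

Scripts and style rules that expect rows inside a `tbody` would therefore see different trees depending on how the document is served.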