What makes illegal characters non-conformant

validator.nu finds an error in

  http://www.ltg.ed.ac.uk/~ht/char_alias.html

I don't think I have a problem with that, I can imagine an argument
that it's broken (although http://www.ltg.ed.ac.uk/~ht/char_alias.xml
is _not_ broken per the XML specification...), but I can't find
anywhere in the HTML5 spec. which says so. Does it/should it?

ht
-- 
Henry S. Thompson, School of Informatics, University of Edinburgh
Half-time member of W3C Team
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@...
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Re: What makes illegal characters non-conformant

On Wed, 23 Sep 2009 16:12:53 +0200, Henry S. Thompson <ht@...> wrote:
> validator.nu finds an error in
>
> http://www.ltg.ed.ac.uk/~ht/char_alias.html
>
> I don't think I have a problem with that, I can imagine an argument
> that it's broken (although http://www.ltg.ed.ac.uk/~ht/char_alias.xml
> is _not_ broken per the XML specification...), but I can't find
> anywhere in the HTML5 spec. which says so. Does it/should it?

http://whatwg.org/html5#misinterpreted-for-compatibility

-- 
Anne van Kesteren
http://annevankesteren.nl/
Re: What makes illegal characters non-conformant

Anne van Kesteren writes:
> http://whatwg.org/html5#misinterpreted-for-compatibility

That's about agents, not documents.

ht
-- 
Henry S. Thompson, School of Informatics, University of Edinburgh
Re: What makes illegal characters non-conformant

On Sep 23, 2009, at 20:34, Henry S. Thompson wrote:
> Anne van Kesteren writes:
>
>> http://whatwg.org/html5#misinterpreted-for-compatibility
>
> That's about agents, not documents.

What happens here is that Validator.nu is out of date and doesn't
misinterpret US-ASCII for compatibility, so the US-ASCII decoder finds a
bad byte.

However, what makes the document non-conforming (though it isn't the
reason Validator.nu gives) is the sentence "The character encoding name
given must be the name of the character encoding used to serialize the
file." under http://www.whatwg.org/specs/web-apps/current-work/#charset

The byte 0x80 is not valid in US-ASCII. Thus, US-ASCII isn't the name of
the encoding used.

Note that for encodings that aren't "misinterpreted for compatibility",
the reasoning would be that the normative requirements of the encoding
become part of the conformance criteria by reference. Since Validator.nu
is out of date and treats US-ASCII like any non-special encoding, this is
the reason why it complains.

-- 
Henri Sivonen
hsivonen@...
http://hsivonen.iki.fi/
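The difference between a strict US-ASCII decoder and one that misinterprets the label for compatibility can be sketched in a few lines of Python. This is only an illustration, not Validator.nu's implementation: the payload is made up, and it assumes the compatibility rule maps US-ASCII to windows-1252 (Python's `cp1252`), as the HTML5 draft linked above appears to do.

```python
# Sketch: strict US-ASCII decoding vs. a windows-1252 "misinterpretation".
# The payload is hypothetical; only the byte 0x80 comes from the thread.
data = b"price: \x80"


def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return None


# 0x80 is outside the 7-bit range, so a strict US-ASCII decoder rejects it.
strict_ascii = try_decode(data, "ascii")

# Assumption: the compatibility rule treats the label as windows-1252,
# where 0x80 maps to U+20AC (euro sign).
as_windows_1252 = try_decode(data, "cp1252")

print(strict_ascii)
print(as_windows_1252)
```

Either way the document itself still names an encoding it was not serialized in; the lenient decoder only changes what the agent does with it.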
Re: What makes illegal characters non-conformant

On 23 Sep 2009, at 15:12, Henry S. Thompson wrote:
> although http://www.ltg.ed.ac.uk/~ht/char_alias.xml
> is _not_ broken per the XML specification...

It should be, per:

> It is a fatal error if an XML entity is determined (via default,
> encoding declaration, or higher-level protocol) to be in a certain
> encoding but contains byte sequences that are not legal in that
> encoding.

That said, though processors must throw a fatal error, I can't see
anything saying the document isn't well-formed (bug?).

-- 
Geoffrey Sneddon
<http://gsnedders.com/>
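For what it's worth, the fatal-error behaviour quoted above is easy to observe with an off-the-shelf parser. A rough sketch using Python's standard `xml.etree.ElementTree` (expat underneath); the document bytes are invented for illustration and are not the contents of char_alias.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity: declares US-ASCII but contains the byte 0x80.
bad_doc = b"<?xml version='1.0' encoding='US-ASCII'?><x>\x80</x>"

# Same document with the character written as a numeric reference instead,
# so every byte in the file is legal US-ASCII.
good_doc = bad_doc.replace(b"\x80", b"&#8364;")


def parses(raw: bytes) -> bool:
    """Return True if the parser accepts the bytes, False on a parse error."""
    try:
        ET.fromstring(raw)
        return True
    except ET.ParseError:
        return False


bad_ok = parses(bad_doc)
good_ok = parses(good_doc)
print(bad_ok, good_ok)
```

This shows what one processor does, which is a separate question from whether the specification calls the document itself not well-formed.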
Re: What makes illegal characters non-conformant

Geoffrey Sneddon writes:
> On 23 Sep 2009, at 15:12, Henry S. Thompson wrote:
>
>> although http://www.ltg.ed.ac.uk/~ht/char_alias.xml
>> is _not_ broken per the XML specification...
>
> It should be, per:
>
>> It is a fatal error if an XML entity is determined (via default,
>> encoding declaration, or higher-level protocol) to be in a certain
>> encoding but contains byte sequences that are not legal in that
>> encoding.

You're right, I was mistaken.

> That said, though processors must throw a fatal error, I can't see
> anything saying the document isn't well-formed (bug?).

Hmm.

ht
-- 
Henry S. Thompson, School of Informatics, University of Edinburgh
Re: What makes illegal characters non-conformant

* Henry S. Thompson wrote:
>I don't think I have a problem with that, I can imagine an argument
>that it's broken (although http://www.ltg.ed.ac.uk/~ht/char_alias.xml
>is _not_ broken per the XML specification...), but I can't find
>anywhere in the HTML5 spec. which says so. Does it/should it?

It is not broken per the XML specification by the same reasoning that a
PNG image is not broken per the XML specification. Procedurally, in both
cases the XML processor determines some character encoding, attempts to
decode the document, and then encounters byte sequences that do not have
a well-defined meaning according to the encoding's specification. It is
therefore not possible to restore the textual data the binary data
represents, and the XML specification only defines conformance for
processors and textual data objects.

Consider that the XML specification does not normatively define exactly
how to determine the character encoding (and I am ignoring that you've
used text/xml as media type for the document, which has other theoretical
considerations rarely met in practice), so you can easily define a new
character encoding very-bogus-encoding as "Any sequence of bytes stands
for the text <?xml version='1.0' encoding='very-bogus-encoding'?><x/>"
and your document would be perfectly conforming if the processor does
indeed support that encoding.

Cases like this do in fact exist in the real world. For example, with
UTF-32 encoded documents the processor may not support UTF-32 and may
instead detect UTF-16 or UTF-8 and encounter illegal byte sequences or
disallowed characters. The only difference is in perception, as UTF-32
is widely recognized while very-bogus-encoding is not.

It is ultimately entirely irrelevant whether your document is broken per
the XML specification, as it is, as far as common sense goes, broken per
the US-ASCII specification. You might just as well have your web server
send out malformed TCP datagrams or a malformed HTTP response and muse
how that is or is not broken per unrelated specifications. Similarly,
very-bogus-encoding is irrelevant because it violates what is considered
common sense. http://xkcd.com/468/ comes to mind.

(The XML specification actually considers your case a fatal error, and
those are errors, which in turn are violations of the constraints of the
specification. I've argued unsuccessfully against that in the past, as
having specification violations dependent on processor capabilities is a
violation of common sense.)

-- 
Björn Höhrmann · mailto:bjoern@... · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
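The UTF-32 case mentioned above can be reproduced with a small Python sketch. The four-character document is made up, and the "processor" here is just Python's built-in decoders standing in for a hypothetical XML processor that lacks UTF-32 support and falls back to UTF-8 or UTF-16.

```python
import codecs

# A tiny document serialized as UTF-32, little-endian, with a BOM.
utf32_bytes = codecs.BOM_UTF32_LE + "<x/>".encode("utf-32-le")

# Misreading the bytes as UTF-8: the first BOM byte (0xFF) can never
# appear in UTF-8, so this is an illegal byte sequence.
try:
    utf32_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Misreading the bytes as UTF-16: the first two BOM bytes (FF FE) look
# like a UTF-16 little-endian BOM, so decoding succeeds, but every other
# code unit comes out as U+0000, a character XML does not allow.
as_utf16 = utf32_bytes.decode("utf-16")
has_nul = "\x00" in as_utf16

print(utf8_failed, has_nul, repr(as_utf16))
```

So the same bytes give an illegal byte sequence under one guess and disallowed characters under the other, depending only on what the processor happens to support.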