19 Messages
Re: Notes on validome test suite / validators comparison

Validome-Staff wrote:
> Validome advises the user to use our XML-Validator, as an HTML-Validator
> is not the appropriate tool to check XML...;-)

When I try to validate <http://idn.icann.org/IDNwiki> at your site I get the same incorrect "valid" result as with the W3C validator.

For the W3C validator I know that it can't (yet) check URI syntax, but it's disappointing that your validator also fails. Is that an issue in the "XHTML 1.0 transitional" schema or in your code?

Frank
----
Re: Notes on validome test suite / validators comparison

Validome-Staff wrote:
> http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#anyURI
> "such rules and restrictions are not part of type validity
> and are not checked by ·minimally conforming· processors.
> Thus in practice the above definition imposes only very
> modest obligations on ·minimally conforming· processors."

The 2nd edition (2004) still has the same text, referring to RFC 2396 as amended by RFC 2732 instead of RFC 3986 (STD 66). Okay, I just checked: STD 66 was published in January 2005.

> As you know, providing a reliable URI check is not as simple
> as you claim.

The regexp in STD 66 is a one-liner, and determining the set of visible ASCII characters allowed in a URI is "possible". (Actually it's trivial, but it took me almost a year to figure it out with the help of Roy and others on the W3C URI list. ;-)

> as we know, there is no validator at the moment that
> handles it much better.

Indeed, I just asked WDG and schneegans.de what they think; they also said "valid".

Frank
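The "one-liner" is presumably the parsing expression in RFC 3986 Appendix B. A minimal sketch of how it pairs with a character check follows; note that the Appendix B expression matches every string (it only splits a reference into components), so the flagging is done by the separate test against the characters RFC 3986 section 2 allows unencoded. The function names `split_uri` and `check_uri_chars` are hypothetical, and the check is character-level only (it does not, for example, restrict square brackets to IP literals).

```python
import re

# Regular expression from RFC 3986, Appendix B. It splits a URI reference
# into scheme, authority, path, query and fragment; it never rejects input.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Characters that may appear unencoded in a URI (RFC 3986 section 2):
# unreserved, gen-delims, sub-delims, plus "%" for percent-encoding.
UNRESERVED = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
ALLOWED = set(UNRESERVED + GEN_DELIMS + SUB_DELIMS + "%")


def split_uri(uri):
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    m = URI_SPLIT.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def check_uri_chars(uri):
    """Return the sorted characters in `uri` that may not appear unencoded."""
    return sorted({c for c in uri if c not in ALLOWED})


print(split_uri("http://idn.icann.org/IDNwiki?a=1#top"))
# ('http', 'idn.icann.org', '/IDNwiki', 'a=1', 'top')
print(check_uri_chars("http://idn.icann.org/IDNwiki"))    # []
print(check_uri_chars("http://example.org/bücher.html"))  # ['ü']
```

The character set is fixed by the RFC, which is why this part is "trivial"; the harder questions in this thread are about which document types permit IRIs at all.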
----
Critical bug 4916 (was: Notes on validome test suite / validators comparison)

Alex wrote:
>> The regexp in STD 66 is a one-liner, and determining the set
>> of visible ASCII characters allowed in a URI is "possible".
> My post was not only about the ASCII character issue... There
> are some more "URI" problems that a schema validator doesn't catch
> at the moment.

Well, it's about time to fix this. After the installation of a "popular browser" on a "popular OS", virtually all applications that let users click on URIs could indirectly start malware. It's hard to decide whose fault that is, but blaming only the user is not an option.

All, please "vote" for bug 4916 and support its reclassification as "critical" with "priority 1" for an immediate fix. We all had almost three years to think about RFC 3986 and 3987. It's a good thing that the IDN test finally forces some action.

> A "RFC Conformity Checker" for URIs is much more than this single
> ASCII issue.

The generic RFC 3986 syntax is no rocket science; just ignore all idiosyncrasies of legacy definitions such as RFC 2368 (admittedly mailto: is a hard case; the syntax in the expired mailto-bis draft is better). A validator is not forced to guess what invalid syntax is supposed to mean: simply flag it as invalid and be done with it.

> NONtrivial...;-)

Maybe we can agree on "interesting clerical task". The xmpp folks (i.e. Peter) had to fix their syntax for 3986 compatibility, and they (i.e. he) managed.

Frank
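To make "simply flag it as invalid" concrete, here is a hedged sketch of a generic-syntax checker for absolute URIs that reports problems instead of guessing what was meant. It checks three things from RFC 3986: the scheme grammar (section 3.1), well-formed percent-encoding (section 2.1), and the set of characters allowed unencoded. The name `generic_syntax_errors` is hypothetical, and a full checker would also parse the authority, path, query and fragment productions.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986 section 3.1)
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")

# A "%" that is not followed by two hex digits is a malformed pct-encoded triplet.
BAD_PCT = re.compile(r"%(?![0-9A-Fa-f]{2})")

# Characters allowed unencoded: unreserved, gen-delims, sub-delims and "%".
ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
    ":/?#[]@" "!$&'()*+,;=" "%"
)


def generic_syntax_errors(uri):
    """Return a list of generic-syntax problems in an absolute URI (empty if none)."""
    errors = []
    scheme, sep, _ = uri.partition(":")
    if not sep:
        errors.append("missing scheme")
    elif not SCHEME.fullmatch(scheme):
        errors.append(f"bad scheme {scheme!r}")
    for m in BAD_PCT.finditer(uri):
        errors.append(f"bad percent-encoding at offset {m.start()}")
    bad = "".join(sorted({c for c in uri if c not in ALLOWED}))
    if bad:
        errors.append(f"characters not allowed unencoded: {bad!r}")
    return errors


print(generic_syntax_errors("http://example.org/%41"))   # []
print(generic_syntax_errors("http://example.org/100%"))  # ['bad percent-encoding at offset 22']
```

Each problem is reported once and nothing is repaired, which matches the "flag it and be done with it" position rather than a browser's error recovery.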
----
Re: Notes on validome test suite / validators comparison

Hi Alex,

Thanks a lot for going through the list and giving more references. This is very useful.

On Oct 20, 2007, at 00:05, Validome-Staff wrote:
> Here Validome advises the user to use our XML-Validator, as an HTML-
> Validator is not the appropriate tool to check XML...;-)

Understood, but as I wrote, I think it's poor usability to call this a fatal error when you could transparently redirect to your XML checker.

> Here we corrected our claims, sorry for not keeping the comparison
> up to date.

Appreciated.

> > * http://www.validome.org/out/ena4011
> > HTML 4.01 document with no system Id.
> > Validome sends a warning... Not necessary per the spec.
> > W3C Markup validator passes validation.
> > Why is W3C validator marked as faulty here? References please?
>
> http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.2
> Other way round: where is it specified that the System-Id can be omitted?

SGML, of which HTML 4.01 is an application. Only in XML is the system identifier required, per:
http://www.w3.org/TR/xml/#NT-ExternalID

> > * http://www.validome.org/out/ena4023
> > Validome says valid. OpenSP and W3C Markup validator say not valid.
> > I'd tend to trust OpenSP here. The comparison page's claim that
> > validome is the only validator doing the right thing is very dubious.
> >
> > * http://www.validome.org/out/ena4024
> > Ditto above. The comparison page's claim that validome is the only
> > validator doing the right thing is very dubious.
>
> What is dubious here? It's about SGML (not HTML) documents.

And?

> The "old" W3C-Validator made a fallback to US-ASCII, the "new" one to
> UTF-8. Can you explain this, please?
> We asked W3C-Germany and Bjoern Hoehrmann many times about
> the *correct* behaviour of a validator in the case of a fallback,
> but we didn't get any *exact* answer. In this case, the specs are
> very inexact and ambiguous. Please give us a *mandatory* answer,
> with a link to the appropriate specifications, for this
> case. The only clear case till now is XHTML, where validators
> should fall back to UTF-8 (depending on MIME type); HTML is
> still ambiguous...

There is no authoritative answer as far as I can tell, which supports my question: why do you consider sending a fatal error the right thing to do, and other validators trying a fallback wrong? If there is no rule, you are not supposed to make up arbitrary ones and claim you are the only ones to respect them.

> > * http://www.validome.org/out/ena7003
> > I'd like to see a reference for this.
>
> http://www.w3.org/TR/html401/struct/links.html#h-12.2.3
> "...The id attribute, on the other hand, may not contain character
> references."

Interesting discrepancy between prose and DTD here, thanks for the pointer.

> > * http://www.validome.org/out/ena7005 (and 7006)
> > This has nothing to do with validation. If validome emulates some of
> > the features of a link checker, compare it to link checkers, not
> > validators. This test is moot.
>
> http://www.w3.org/TR/html401/struct/links.html#h-12.2.4
> "A reference to an unavailable or unidentifiable resource is an error"
> ...
> "If a user agent cannot locate a linked resource, it should alert
> the user"
>
> Where is the "moot" here? The W3C-Specification is very clear in
> this case...

This is the usual confusion between user agent conformance (which the sections you quote are about) and document conformance (which validome and the markup validator are checking).

> > * http://www.validome.org/out/ena3002
> > This test is bogus. Sorry. An XML declaration also happens to be a
> > proper SGML PI. Giving a warning asking the HTML4 author "are you
> > sure you want this here" may be a good idea. Making this a fatal
> > error is wrong, wrong, wrong.
>
> If an XML declaration is allowed in SGML, I'd like to see a
> reference for this.

What I have is the SGML spec, chapter 8, Processing instructions.

[on shorttags]
> Oh, here we have a hundred opinions on the case. Could you please show
> us an *exact* reference?

The best I have is the informative
http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.7
and the normative DTD, which allows the shorttags. As such, the spec clearly allows the construct, while informatively warning against it.

I'll reply to your notes on distributing the markup validator in a separate mail.

Thank you,
--
olivier
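The system-identifier point can be shown with a small sketch: in XML, `ExternalID ::= 'PUBLIC' S PubidLiteral S SystemLiteral` makes the system literal mandatory after a public identifier, while an SGML application such as HTML 4.01 may omit it. The regex and the `missing_system_id` helper below are hypothetical and only handle the common `PUBLIC` form; they are not how either validator is implemented.

```python
import re

# <!DOCTYPE name PUBLIC "public-id" ["system-id"]>, either quote style.
# Matched case-insensitively, as SGML allows; XML itself requires upper case.
DOCTYPE_PUBLIC = re.compile(
    r"<!DOCTYPE\s+(\S+)\s+PUBLIC\s+(\"[^\"]*\"|'[^']*')(?:\s+(\"[^\"]*\"|'[^']*'))?\s*>",
    re.IGNORECASE,
)


def missing_system_id(doctype, xml):
    """True if a PUBLIC doctype lacks a system identifier where one is required.

    XML (ExternalID production) requires a SystemLiteral after the PubidLiteral;
    SGML, and therefore HTML 4.01, lets it be omitted.
    """
    m = DOCTYPE_PUBLIC.search(doctype)
    if m is None:
        raise ValueError("not a PUBLIC doctype declaration")
    has_system_id = m.group(3) is not None
    return xml and not has_system_id


html401 = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'
print(missing_system_id(html401, xml=False))  # False: fine in SGML
print(missing_system_id(html401, xml=True))   # True: an XML parser would reject it
```

The same declaration is acceptable or not depending only on whether it is parsed as SGML or XML, which is the distinction at issue in the ena4011 test.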
----
Re: validator catalogs

Hi Alex,

On Oct 20, 2007, at 00:05, Validome-Staff wrote:
> 1. The W3C-SGML-Parser uses two catalog files: xml.soc and
> sgml.soc. Within xml.soc there are 21 entries missing, all regarding
> SVG 1.1 Tiny and SVG 1.1 Basic.

The issues with SVG 1.1 Tiny and Basic are actually a bit more complicated. See this mail:
http://lists.w3.org/Archives/Public/www-svg/2007Oct/0005.html
I think the workaround we found last month is better, see:
http://lists.w3.org/Archives/Public/www-validator-cvs/2007Oct/0018.html

I note that you also added a number of modules and files for XHTML Print and Basic; good idea.

> 2. We missed 6 DTDs necessary to get the download package running.

Added, thanks.

> 3. Your LibXML implementation was not correct: you just use the
> catalog files of your SGML parser instead of following the
> "official" catalog specification (http://www.xmlsoft.org/catalog.html#Simple).
> Because of this, LibXML tries to fetch the external DTDs instead of
> the local ones.

Indeed, it was incorrect, but in the end we decided not to fix it, because loading the catalog is only supported from a certain version of XML::LibXML onward; hence we just didn't load anything and muted the entity errors. It may still be a good idea to fix it, although I'm not sure which version of XML::LibXML is available on most systems.

Thank you,
--
olivier Thereaux - W3C - http://www.w3.org/People/olivier/
W3C Open Source Software: http://www.w3.org/Status
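What a catalog buys the parser can be sketched briefly: it maps public and system identifiers to local copies so the DTDs are not fetched over the network. The sketch assumes the OASIS XML Catalogs format (the `urn:oasis:names:tc:entity:xmlns:xml:catalog` namespace) that libxml2 documents; the file paths are hypothetical, and the `prefer` attribute, delegation and rewrite entries are ignored. Python's ElementTree does no catalog resolution itself, so this only illustrates the lookup a parser such as libxml2 performs.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Example catalog; the local file paths are hypothetical.
EXAMPLE_CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="file:///usr/share/xml/xhtml1-strict.dtd"/>
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="file:///usr/share/xml/xhtml1-strict.dtd"/>
</catalog>"""


def load_catalog(text):
    """Collect public and system entries; the first matching entry wins."""
    root = ET.fromstring(text)
    public, system = {}, {}
    for entry in root.iter(f"{{{CATALOG_NS}}}public"):
        public.setdefault(entry.get("publicId"), entry.get("uri"))
    for entry in root.iter(f"{{{CATALOG_NS}}}system"):
        system.setdefault(entry.get("systemId"), entry.get("uri"))
    return public, system


def resolve(public_id, system_id, catalog):
    """Return a local URI for the DTD, or None if the catalog has no entry."""
    public, system = catalog
    # System identifiers are tried first, then public identifiers.
    if system_id is not None and system_id in system:
        return system[system_id]
    return public.get(public_id)


cat = load_catalog(EXAMPLE_CATALOG)
print(resolve("-//W3C//DTD XHTML 1.0 Strict//EN", None, cat))
```

A `None` result is the case where a parser without a usable catalog falls back to fetching the external DTD, which is the behaviour described in point 3.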
----
Re: Notes on validome test suite / validators comparison

Frank,

On Oct 20, 2007, at 23:22, Frank Ellermann wrote:
> When I try to validate <http://idn.icann.org/IDNwiki> at your site
> I get the same incorrect "valid" result as with the W3C validator.
>
> For the W3C validator I know that it can't (yet) check URI syntax,
> but it's disappointing that your validator also fails. Is that an
> issue in the "XHTML 1.0 transitional" schema or in your code?

I'm curious as to why you so adamantly want to ban non-ASCII IRIs from HTML?

More on this later, but from what I am gathering from the experts, given the spirit of the specs (written before IDNs and IRIs) and the level of support for IDNs, barking at IRIs in href and src would be counterproductive for the internationalization of the web.

--
olivier
----
Re: validator catalogs

Alex, all

On Oct 24, 2007, at 15:04, olivier Thereaux wrote:
> I note that you also added a number of modules and files for XHTML
> print and basic, good idea.

FWIW, the file included in the RAR had a number of typos that would break any validator using it. I committed a proper version to CVS; I suggest you use it if you are going to package the W3C markup validator.

Thanks again.
--
olivier
----
Re: Notes on validome test suite / validators comparison

olivier Thereaux wrote:
> I'm curious as to why you so adamantly want to ban non-ascii IRIs
> from HTML?

Please tell me that you're joking. Native IRIs are nice where they are permitted. But on ICANN's Wiki, which uses XHTML 1.0, they will cause havoc:

Sooner or later mediawiki will be fixed to generate valid XHTML 1.0, translating native IRIs to equivalent URIs on the fly. After all, that's REQUIRED for backwards compatibility in the numerous Wikis based on mediawiki. Users want something to happen when they click on a link, without upgrading their browser. And native IRIs are designed to have an equivalent URI form.

Sooner or later validators will be fixed to validate URIs, what with all those "URI exploits" we've seen in the last weeks for XP after the installation of IE7. And when validators do their job, all users who naively followed ICANN and W3C into the realm of "who cares about validity if it works" will be seriously annoyed. I can still tell you the day when the W3C validator started to flag &#128; as invalid on a windows-1252 page. I was working on this page; it was stunning.

> from what I am gathering from the experts, given the spirit of the
> specs (written before IDNs and IRIs) and the level of support for
> IDNs, barking at IRIs in href and src would be counterproductive
> for the internationalization of the web.

I'm curious which expert advocates violating specifications. Want to know how long it took me to create an XHTML ersatz-DTD permitting IRIs everywhere? 30 minutes. Check out
http://hmdmhdfmhdjmzdtjmzdtzktdkztdjz.googlepages.com/IDN-IRI-test.html

Frank
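The "equivalent URI form" that a wiki could generate on the fly can be sketched as the RFC 3987 section 3.1 mapping: the host goes through IDNA ToASCII, and every other non-ASCII character is percent-encoded as its UTF-8 octets. This is a sketch under stated assumptions: `iri_to_uri` is a hypothetical helper, it uses Python's built-in `idna` codec (which implements IDNA 2003), and it drops userinfo and does not handle IPv6 literals.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII characters left as they are; everything else becomes %XX of its UTF-8 bytes.
SAFE = "-._~:/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri):
    """Map an IRI to a URI: IDNA ToASCII for the host, UTF-8 percent-encoding elsewhere.

    Simplifications: userinfo is dropped and IPv6 literals are not handled.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""                    # urlsplit lowercases the host
    netloc = host.encode("idna").decode("ascii")   # Python's IDNA 2003 codec
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=SAFE),
        quote(parts.query, safe=SAFE),
        quote(parts.fragment, safe=SAFE),
    ))


print(iri_to_uri("http://bücher.example/"))  # http://xn--bcher-kva.example/
print(iri_to_uri("http://example.испытание/испытание"))
```

Because "%" is in the safe set, existing percent-escapes pass through unchanged, so running the mapping twice does not double-encode.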
----
Re: IRIs in href (Was: Notes on validome test suite / validators comparison)

On Oct 25, 2007, at 03:39, Frank Ellermann wrote:
> Users want something to happen when they
> click on a link, without upgrading their browser. And native IRIs
> are designed to have an equivalent URI form.

Lack of support for IRIs in legacy user agents is an issue, understood. Now, if today the HTML 4.01 and XHTML 1.0 specs and above were updated to say "IRIs" instead of "URIs", what would you do?

As I wrote before, these specs were written before IRIs were a reality. The HTML4 spec contains advice on how to treat "URIs containing non-ASCII characters". See http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1
Although it clearly calls these illegal, it prepares the ground for IRIs (which didn't yet have that name at the time).

Saying that IRIs should not be used because they break in legacy software is an argument I have sympathy for, but have trouble accepting. This reminds me of the situation whereby, in Japan, one still can't safely use Unicode in mails, because so many MUAs or webmails just don't support it.

> Sooner or later validators will be fixed to validate URIs, what with
> all those "URI exploits" we've seen in the last weeks for XP after
> the installation of IE7.

This is irrelevant to the discussion about IRIs. Please don't use internationalization as a scapegoat for bad coding.

> I can still tell you the day when the W3C validator started to flag
> &#128; as invalid on a windows-1252 page. I was working on this
> page, it was stunning.

There once was a bug, and IIRC it was fixed in a few hours. Now, how is that relevant to the discussion at hand?

> I'm curious which expert advocates violating specifications. Want
> to know how long it took me to create an XHTML ersatz-DTD permitting
> IRIs everywhere? 30 minutes.

Here you must be joking, bluffing, or mistaken, Frank. The current XHTML DTD says that URIs are CDATA, and thus any SGML or XML validator has to accept all the characters allowed in the document, which includes all those usable in IRIs.

--
olivier
----
Re: IRIs in href (Was: Notes on validome test suite / validators comparison)

olivier Thereaux wrote:
>
> On Oct 25, 2007, at 03:39, Frank Ellermann wrote:
>> Users want something to happen when they
>> click on a link, without upgrading their browser. And native IRIs
>> are designed to have an equivalent URI form.
>
> Lack of support for IRIs in legacy user agents is an issue, understood.
> Now, if today the HTML 4.01 and XHTML 1.0 specs and above were updated
> to say "IRIs" instead of "URIs", what would you do?
>
> As I wrote before, these specs were written before IRIs were a reality.
> The HTML4 spec contains advice on how to treat "URIs containing
> non-ASCII characters".
> See http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1
> Although it clearly calls these illegal, it prepares the ground for IRIs
> (which didn't yet have that name at the time).
>
> Saying that IRIs should not be used because they break in legacy
> software is an argument I have sympathy for, but have trouble
> accepting. This reminds me of the situation whereby, in Japan, one still
> can't safely use Unicode in mails, because so many MUAs or webmails just
> don't support it.

What about saying that IRIs should not be used because RFC 3987 section 1.2, item (a), says that this standard is not intended to apply to any protocol or format element unless those formats or protocols explicitly say that IRIs are supported?

- Sam Ruby
----
Re: IRIs in href

olivier Thereaux wrote:
> Now, if today the HTML 4.01 and XHTML 1.0 specs and above were
> updated to say "IRIs" instead of "URIs", what would you do?

Maybe ditch the W3C and post the reasons in an Internet-Draft. I'd certainly consider it unethical.

RFC 3987 does not "update" 3986. The specs should be updated with s/2396/3986/g, s/3066/4646/g, and similar clerical tasks, e.g. explaining why xml:lang is still forced to be an NMTOKEN in these document types.

But for incompatible modifications we need new document types, not worldwide "upgrade your browser" campaigns. Some users can't upgrade, and besides it's completely unnecessary: all IRIs by definition have an equivalent URI that works with "any browser".

> Saying that IRIs should not be used because they break in
> legacy software, is an argument I have sympathy for, but
> have trouble accepting.

I'm not surprised if folks active in the W3C don't care much about "backwards compatibility". But admittedly I was very surprised when you introduced "let's not care about formally valid" as a new concept. A user armed with an old text-mode browser could take out the ICANN IDN test; AFAIK "formally invalid" is a FAIL in any accessibility test, isn't it?

> This reminds me of the situation whereby, in Japan, one
> still can't safely use unicode in mails, because so many
> MUAs or webmails just don't support it.

Maybe they have plausible reasons why they don't need or don't like it. BTW, the (formally valid) IDN test page I created last week was the first XHTML page where I actually needed UTF-8. Now I'm curious what browsers do with an IRI in a legacy charset. RFC 3987 allows this.

>> Sooner or later validators will be fixed to validate
>> URIs, what with all those "URI exploits" we've seen in
>> the last weeks for XP after the installation of IE7.
> This is irrelevant to the discussion about IRIs. Please
> don't use internationalization as a scapegoat for bad
> coding.

It's relevant for the discussion of bug 4916, which you submitted on 2007-08-07. If that bug is fixed, the fix might also detect IRIs where only URIs are allowed. Admittedly that's almost impossible for a validator based on DTDs; maybe you end up with a clumsy hack working only for a few very important document types.

>> I can still tell you the day when the W3C validator
>> started to flag &#128; as invalid on a windows-1252
>> page. I was working on this page, it was stunning.
> There once was a bug, and IIRC it was fixed in a few
> hours.

Two days after 9/11; it's good if it only took you a few hours. But it took me several months to figure out why I need octet 128 instead of the NCR &#128;.

> Now, how is that relevant to the discussion at hand?

If everybody and his dog start to use IRIs in document types where they're not permitted, and some time later an improved validator informs them that this was invalid, the disturbed users will be annoyed.

> The current XHTML DTD says that URIs are CDATA

Sure, the details are specified in the prose; the DTD only uses an entity name %URI;. It could also use %FOO; or %IRI; as the name. Likewise the RFC 2396 in the DTD is only a comment.

Frank
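Why octet 128 and the NCR &#128; differ can be shown in a few lines: in a windows-1252 document the single octet 0x80 decodes to the euro sign, but a numeric character reference always names a Unicode code point, whatever the page's charset, and U+0080 is a C1 control character. The sketch uses Python's `cp1252` codec and `unicodedata`; the last line shows that `html.unescape`, which follows the later HTML5 parsing rules, repairs &#128; to the euro sign, which is error recovery in the parser, not validity.

```python
import html
import unicodedata

# In a windows-1252 document, the single octet 0x80 is the euro sign.
octet = bytes([0x80]).decode("cp1252")

# A numeric character reference refers to a Unicode code point regardless
# of the document's charset: &#128; is U+0080, a C1 control character.
ncr = chr(128)

print(hex(ord(octet)), unicodedata.name(octet))  # 0x20ac EURO SIGN
print(hex(ord(ncr)), unicodedata.category(ncr))  # 0x80 Cc

# HTML5-style parsers (and Python's html.unescape) map &#128; to the euro
# sign as error recovery; an SGML/DTD validator still reports it.
print(html.unescape("&#128;") == "\u20ac")       # True
```

So the correct euro in a windows-1252 page is either the raw octet 0x80 or the reference &#8364;, never &#128;.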
----
Re: IRIs in href

Frank Ellermann wrote:
>olivier Thereaux wrote:
>
>> Now, if today the HTML 4.01 and XHTML 1.0 specs and above were
>> updated to say "IRIs" instead of "URIs", what would you do?
>
>Maybe ditch the W3C and post the reasons in an Internet-Draft.
>I'd certainly consider it unethical.

I can understand your feelings, but they'd only clarify what they meant when they recommended, in http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1, to convert URIs with non-ASCII characters to UTF-8 and then to use percent-encoding, or what they meant when they used %URI; but declared it as being CDATA in the DTD.

>RFC 3987 does not "update" 3986.

Very correct. It was never intended as that; it was intended to serve as a stable specification for all those specs (including HTML4) that wanted to use the concept but had to use circumscriptive language.

>The specs should be updated
>with s/2396/3986/g, s/3066/4646/g, and similar clerical tasks,
>e.g. explaining why xml:lang is still forced to be an NMTOKEN
>in these document types.
>
>But for incompatible modifications we need new document types.

The new document type would not differ at all in functionality from the old one. The only changes might be comments and the names of parameter entities, but, as with programs, that doesn't change the functionality at all.

>Not worldwide "upgrade your browser" campaigns. Some users
>can't upgrade, and besides it's completely unnecessary: all IRIs by
>definition have an equivalent URI that works with "any browser".

Yes, but for people who can actually read and understand "испытание" better than "testing", испытание may be very helpful, whereas %D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5 would be just garbage to them.

>> Saying that IRIs should not be used because they break in
>> legacy software, is an argument I have sympathy for, but
>> have trouble accepting.
>
>I'm not surprised if folks active in the W3C don't care much
>about "backwards compatibility". But admittedly I was very
>surprised when you introduced "let's not care about formally
>valid" as a new concept.

Formally valid means valid according to the DTD, I guess. In this respect, IRIs have always been valid.

>A user armed with an old text-mode
>browser could take out the ICANN IDN test; AFAIK "formally
>invalid" is a FAIL in any accessibility test, isn't it?

I'm not sure what you are after here, but if you want to claim that IRIs are somehow anti-accessibility, then I think you should consider the following two points:

a) There are temporary accessibility issues and long-term accessibility issues. Temporary accessibility issues are issues of the kind "The current screen readers/audio browsers/... only support foo, so in order to be accessible, use foo, not bar". Once the technology has caught up (and accessibility technology improves in the same way other technology improves), such a requirement may no longer apply.

b) A Russian screen reader will definitely do a better job with испытание than with %D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5. Other screen readers may have problems, but then, испытание will mostly appear in Russian documents.

>> This reminds me of the situation whereby, in Japan, one
>> still can't safely use unicode in mails, because so many
>> MUAs or webmails just don't support it.
>
>Maybe they have plausible reasons why they don't need or
>don't like it. BTW, the (formally valid) IDN test page
>I created last week was the first XHTML page where I
>actually needed UTF-8. Now I'm curious what browsers do
>with an IRI in a legacy charset. RFC 3987 allows this.

For some tests, please see http://www.sw.it.aoyama.ac.jp/2005/iritest/, in particular the "Legacy Human" section at http://www.sw.it.aoyama.ac.jp/2005/iritest/HTML/index.html.

>>> Sooner or later validators will be fixed to validate
>>> URIs, what with all those "URI exploits" we've seen in
>>> the last weeks for XP after the installation of IE7.
>
>> This is irrelevant to the discussion about IRIs. Please
>> don't use internationalization as a scapegoat for bad
>> coding.
>
>It's relevant for the discussion of bug 4916, which you
>submitted on 2007-08-07. If that bug is fixed, the fix might
>also detect IRIs where only URIs are allowed.

In the original mail, Olivier actually wrote:

>>>> 1) a parser to check that a given string is a proper URI/IRI
>>>> This surely already exists, hopefully as open source code, or
>>>> even better, as a perl module. Does anyone want to investigate this?

That then got shortened in the bug report. But there is a rather fundamental reason why this actually may be a bad idea: URIs/IRIs are supposed to be very flexible. If somebody came along tomorrow with a very great idea for an extension to the URI syntax, and the community agreed with that extension, even if it wouldn't fit the current syntax definition, then this would lead to an update of the URI spec.

Let me illustrate the idea behind the above with a somewhat more concrete example: if you want to create some software that tries to spot potential mistakes in an HTML document, I'd guess you'd surely flag something like <a href='htpp://www.w3.org'... But even reading the URI spec in detail, you find nothing there that says it's illegal. Somebody could register the "htpp:" scheme at any time.

As another example, consider the following:
<img src='http://example.org/top.html'>
Again, this clearly looks like a mistake; one wouldn't use a link to a Web page in an src attribute. But you never know when some browsers might actually implement something like a thumbnail view of a web page in such a case (apart from the fact that you also don't know that top.html is a Web page and not some image). Again, <img src='mailto:abc@...'> looks like nonsense, but again, it may make sense in the future.

The point is that URIs and IRIs are intended to be a very general mechanism to connect resources on the Web, and that any restriction has to be considered very carefully.

>Admittedly that's almost impossible for a validator based on
>DTDs; maybe you end up with a clumsy hack working only
>for a few very important document types.

Yes, trying to restrict the syntax in a field declared as CDATA (for attributes) or PCDATA (for elements) based only on the name of the field, the name of a parameter entity, or a comment found nearby would be difficult.

>>> I can still tell you the day when the W3C validator
>>> started to flag &#128; as invalid on a windows-1252
>>> page. I was working on this page, it was stunning.
>
>> There once was a bug, and IIRC it was fixed in a few
>> hours.
>
>Two days after 9/11; it's good if it only took you a few
>hours. But it took me several months to figure out why
>I need octet 128 instead of the NCR &#128;.

Sorry to be a bit direct here, but if it took you several months to figure out why you need octet 128 rather than the NCR &#128;, then at least at that point in time, you didn't really know much about the fundamentals of Web internationalization. If you look at the SGML declaration and at the DTD, it's very clear that &#128; is illegal and not valid.

>> Now, how is that relevant to the discussion at hand?
>
>If everybody and his dog start to use IRIs in document
>types where they're not permitted, and some time later an
>improved validator informs them that this was invalid,
>the disturbed users will be annoyed.

Well, this is a circular argument. "Let's annoy users now so that we don't need to annoy them later." doesn't make sense if "Let's not annoy them at all." is the best option anyway.

>> The current XHTML DTD says that URIs are CDATA
>
>Sure, the details are specified in the prose; the DTD
>only uses an entity name %URI;. It could also use %FOO;
>or %IRI; as the name. Likewise the RFC 2396 in the DTD is
>only a comment.

Exactly. Validation means validation according to the DTD, and parameter entity names and comments don't affect this process.

Regards, Martin.

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@...
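Martin's readability point runs in the other direction from the IRI-to-URI mapping: a display layer can turn percent-encoded UTF-8 back into characters, in the spirit of RFC 3987 section 3.2. The sketch below, with the hypothetical helper `uri_to_display_iri`, decodes only runs of escaped octets 0x80-0xFF and only when the whole run is valid UTF-8; ASCII escapes such as %20 or %2F are never touched. It is simpler than the RFC, which also excludes characters not allowed in IRIs (for example bidi formatting characters).

```python
import re

# A run of percent-encoded octets in the range 0x80-0xFF.
NON_ASCII_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def uri_to_display_iri(uri):
    """Decode percent-encoded non-ASCII UTF-8 sequences for display.

    Runs that are not valid UTF-8 are left percent-encoded, and ASCII
    escapes such as %20 or %2F are never touched.
    """
    def decode(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)

    return NON_ASCII_RUN.sub(decode, uri)


print(uri_to_display_iri(
    "http://example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5"
))  # http://example.org/испытание
```

Leaving invalid runs encoded (for example a Latin-1 %FC) avoids showing the reader a guessed character that does not correspond to the actual octets.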
----
Re: IRIs in href

Martin Duerst wrote:
> they'd only clarify what they meant when they recommended,
> in http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1,
> to convert URIs with non-ASCII characters to UTF-8 and
> then to use percent-encoding

The given example states that it's *illegal*; after that it explains a best-guess implementation clearly written before you published RFC 3987. It doesn't address IDNs; IDNs didn't exist in 1999. When browsers try to guess what broken URIs mean, they could run into the recent flood of "XP with IE7" security issues.

>> for incompatible modifications we need new document types.
> The new document type would not differ at all in functionality
> from the old one. The only changes might be comments and
> the names of parameter entities, but, as with programs, that
> doesn't change the functionality at all.

It's a "formally valid" experiment with unencoded IRIs in links, which can be (legally) relevant for accessibility. It might also help for an RFC 3987 implementation and interoperability report; "it's illegal but it works" wouldn't be convincing (of course there would still be Atom and XMPP if all else fails). Maybe somebody will create a corresponding schema that allows checking IRI syntax.

> %D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5
> would be just garbage to them.

The W3C validator apparently hates this in a system identifier:
http://hmdmhdfmhdjmzdtjmzdtzktdkztdjz.googlepages.com/IDN-XML-test.htm
(sorry, I can't read your unencoded ISO-2022-JP examples at the moment)

> Formally valid means valid according to the DTD, I guess.

No, I meant the prose RFC 2396 specification of URIs, not CDATA.

> Temporary accessibility issues are issues of the kind "The
> current screen readers/audio browsers/... only support foo,
> so in order to be accessible, use foo, not bar". Once the
> technology has caught up (and accessibility technology
> improves in the same way other technology improves), such
> a requirement may no longer apply.

"Temporary" can be a rather long time; RFC 2277 talks about 50 years with respect to UTF-8. Worldwide upgrades take some time. The "real" IDN TLD test started less than four weeks ago, and on another list you argued that not much will happen before real IDN TLDs are introduced.

> For some tests, please see
> http://www.sw.it.aoyama.ac.jp/2005/iritest/,

Thanks. Firefox 2.0.0.9 already fails the Latin-1 "Bücher" test; of course it works for UTF-8. What I had in mind would be slightly harder: using "Bücher" in an unencoded IDN on a Web page using a legacy charset. Obviously I can forget this for now: if it doesn't work in an <ipath>, it also won't work in an <ihost>.

> URIs/IRIs are supposed to be very flexible.

Actually I'm lost with LEIRIs, HRRIs, options allowing unencoded ASCII characters in IRIs that are not permitted in URIs, and the recent discussion about allowing unencoded square brackets outside of <IP-literal>. With URIs it's clear: if they're valid, they must match the generic STD 66 syntax. No unencoded spaces etc.

> If somebody came along tomorrow with a very great idea
> for an extension to the URI syntax, and the community
> agreed with that extension, even if it wouldn't fit the
> current syntax definition, then this would lead to an
> update of the URI spec.

There's no "updates RFC 3986" in the URI template draft.

> If you want to create some software that tries to spot
> potential mistakes in an HTML document, I'd guess you'd
> surely flag something like <a href='htpp://www.w3.org'...

Flag and warn, yes, but it's no STD 66 syntax error. The tool could restrict schemes to registered schemes and allow configuring additional unregistered schemes.

> As another example, consider the following:
> <img src='http://example.org/top.html'>
> Again, this clearly looks like a mistake

The "top.html" is just a name; admittedly a bad name if the resource is something that can be displayed as an image. It is a legal URI. OTOH "bücher.html" isn't a legal URI.

> Again, <img src='mailto:abc@...'> looks like
> nonsense, but again, it may make sense in the future.

"Syntactically valid" isn't the same as "makes sense"; I think we don't disagree about this. Where we might disagree is about "syntactically invalid". Browsers are forced to make sense out of (some kinds of) garbage, but a syntax check is supposed to report syntax errors.

> if it took you several months to figure out why you
> need octet 128 rather than the NCR &#128;, then at least
> at that point in time, you didn't really know much
> about the fundamentals of Web internationalization.

In 2001 I knew _nothing_ about it. I was armed with a Netscape 2.02 that didn't support UTF-8 and treated &#128; as the euro sign, an O'Reilly book with "XHTML" in its title published in 2000, the W3C validator for online syntax checks, and a box with the local codepages 850 and 437.

> Well, this is a circular argument. "Let's annoy users
> now so that we don't need to annoy them later." doesn't
> make sense if "Let's not annoy them at all." is the
> best option anyway.

"Let's not annoy them at all" won't fly if the STD 66 syntax is checked later. It was good when the validator finally (2001-09-13) informed me that &#128; is crap; it would have been better if it had done that a few months earlier.

Frank
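Frank's proposed tool policy (a scheme that fails the STD 66 grammar is an error, a syntactically valid but unknown scheme such as "htpp" is only a warning, and users can configure extra schemes) can be sketched as follows. `classify_scheme` and `KNOWN_SCHEMES` are hypothetical; the set is an illustrative subset, where a real tool would load the IANA URI scheme registry. The check applies to absolute URIs; relative references have no scheme by design.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986 section 3.1)
SCHEME_SYNTAX = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")

# Illustrative subset only; a real tool would load the IANA URI scheme registry.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "urn", "xmpp"}


def classify_scheme(uri, extra_schemes=()):
    """Return ("error", msg), ("warning", msg) or ("ok", scheme) for an absolute URI."""
    scheme, sep, _ = uri.partition(":")
    if not sep or not SCHEME_SYNTAX.fullmatch(scheme):
        return ("error", "no syntactically valid scheme")
    name = scheme.lower()  # schemes are case-insensitive
    if name in KNOWN_SCHEMES or name in {s.lower() for s in extra_schemes}:
        return ("ok", name)
    return ("warning", f"unregistered scheme {name!r}")


print(classify_scheme("htpp://www.w3.org"))  # ('warning', "unregistered scheme 'htpp'")
print(classify_scheme("htpp://www.w3.org", extra_schemes={"htpp"}))  # ('ok', 'htpp')
```

Keeping the warning separate from the error matches both positions in the exchange: the "htpp:" link is syntactically legal, yet still worth pointing out to an author.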
----
Re: IRIs in href

Martin Dürst wrote:
> For some tests, please see
> http://www.sw.it.aoyama.ac.jp/2005/iritest/,
> in particular the "Legacy Human" section at
> http://www.sw.it.aoyama.ac.jp/2005/iritest/HTML/index.html.

Update: while Firefox 2 fails the Latin-1 test case in this test suite at
http://www.sw.it.aoyama.ac.jp/2005/iritest/HTML/UTF-8check/iso-8859-1_deB/a_href/index.html
it works fine with the two IRIs (Cyrillic and Greek) on my KOI8-R test page
http://hmdmhdfmhdjmzdtjmzdtzktdkztdjz.googlepages.com/IDN-IRI-koi8-r.html

Frank
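What these legacy-charset tests exercise is step 1 of the RFC 3987 section 3.1 mapping: an IRI stored in a non-Unicode charset is first decoded to characters (normalized to NFC), and only then percent-encoded as UTF-8. A browser that instead percent-encodes the page's raw octets produces a different, wrong URI. The sketch below covers only the path and query part of an href (the host would also need IDNA); `legacy_href_to_uri` and `naive_octet_encoding` are hypothetical names used for illustration.

```python
import unicodedata
from urllib.parse import quote

SAFE = "-._~:/?#[]@!$&'()*+,;=%"


def legacy_href_to_uri(raw_href, page_charset):
    """Convert href octets from a legacy-charset page into a URI path/query."""
    # Step 1: decode with the page's charset and normalize to NFC.
    text = unicodedata.normalize("NFC", raw_href.decode(page_charset))
    # Step 2: percent-encode non-ASCII characters as their UTF-8 octets.
    return quote(text, safe=SAFE)


def naive_octet_encoding(raw_href):
    """For contrast: percent-encode the legacy octets directly (not what RFC 3987 asks for)."""
    return quote(raw_href, safe=SAFE)


latin1_href = "/Bücher".encode("iso-8859-1")
print(legacy_href_to_uri(latin1_href, "iso-8859-1"))  # /B%C3%BCcher
print(naive_octet_encoding(latin1_href))              # /B%FCcher
print(legacy_href_to_uri("/испытание".encode("koi8_r"), "koi8_r"))
```

The same characters yield the same URI whether the page is Latin-1, KOI8-R or UTF-8, which is what makes the legacy-charset pages in these test suites comparable.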