Proposed Charter and Agenda for IRI BOF at IETF 76

While I am still hoping a working group isn't necessary, the fallback is to start an IETF working group. This message proposes an agenda for a meeting to form the working group (a "BOF" at IETF 76) and a draft charter. Is there interest in participating in a BOF at the Hiroshima meeting? Please discuss on public-iri@....

AGENDA:
=======

The primary agenda is to review the charter and discover the level of community interest:

90 minute agenda

* 10 minutes, Masinter
  Review of URI and IRI history & current documents
* 20 minutes, Masinter & Duerst
  Review of IRI draft(s) and open issues
* 20 minutes, Masinter
  Review of other drafts being coordinated with URI / IRI sections, and other committees, and schedules for these.
* 20 minutes, discussion:
  which documents can be ignored, which other documents added? what other committees involved?
* 20 minutes, Masinter
  Working group charter
  Hum on charter section by section: AS IS vs NEEDS WORK vs NO WAY
  Show of interest in work in working group

Proposed Working Group Charter

MOTIVATION:
===========

Having documents in conflict with widely deployed implementations is a general pain which we should work to correct. This working group proposal is specifically being made to address the problem that the current draft of the HTML 5 specification:

  http://dev.w3.org/html5/spec/Overview.html#urls

in order to match current and expected browser behavior, contains a definition of the term "URL" and associated algorithms. For example,

  "NOTE: The term "URL" in this specification is used in a manner distinct from the precise technical meaning it is given in RFC 3986. Readers familiar with that RFC will find it easier to read this specification if they pretend the term "URL" as used herein is really called something else altogether. This is a willful violation of RFC 3986. [RFC3986]"

In addition, several other specifications also have contained redefinitions of URI-like elements to match their own idea of what should or shouldn't be allowed in a URI when it is expanded to allow non-ASCII characters.

The primary goal of this working group is to bring together the authors and editors of the multiple specifications being developed which have independent definitions of what is or isn't a valid (Uniform Internationalized) Resource (Identifier Locator), in a way that conservative producers can produce identifiers that will work with any number of consumers.

The fact that the IETF documents for URI and IRI are, in fact, not suitable for direct citation by those wishing to describe deployed browser behavior seems like something the IETF can correct.

Further goals are discussed in http://larry.masinter.net/iribis-hack.html, section 12.

Charter
=======

This working group is scoped to produce a new version of the RFC 3987 IRI specification which can be used directly by future W3C XML and HTML specifications, as well as other IETF documents.

Current Internet Draft: draft-duerst-iri-bis-06
(see also http://larry.masinter.net/iribis-hack.html for update in prep ... will update agenda when ID is available)

In addition, the working group MAY consider, if absolutely necessary to accomplish the goal, updates to RFC 3986 (URI), RFC 4395 (URI Guidelines), either within the IRI document (as an 'Updates') or as separate small update documents.

NOTE that topics relating to URIs and IRIs not necessary to resolve IETF/W3C differences are Out of Scope.

SCHEDULE:

Review of Internet Draft(s) and selection of direction
October 2009: Review of Internet Drafts and selection of ONE of the two directions
December 2009: Working group Last Call
March 2010: Publish IRI update as Draft Standard

Documents for review by this group:
-----------------------------------

The PRIMARY role of this working group is to bring together a core group to resolve conflicts between various documents in preparation. For this reason the liaison function is most important:

IETF HTTPBIS working group
  http://tools.ietf.org/wg/httpbis/charters
  preparing: draft-ietf-httpbis-p1-messaging-07
  See section 9.2 on HTTP URI scheme

HTML5 definition of "URLs"
  http://dev.w3.org/html5/spec/Overview.html#urls
  by W3C HTML Working Group
  charter: http://www.w3.org/html/wg/
  URL/IRI issue: http://www.w3.org/html/wg/tracker/issues/56
  and [WHATWG] http://www.whatwg.org/

Other Documents
  currently individual submissions
  draft-duerst-mailto-bis-06

IETF IDNABIS working group
  http://www.ietf.org/dyn/wg/charter/idnabis-charter.html

W3C TAG issue:
  http://www.w3.org/2001/tag/group/track/issues/27
|
|
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

Larry, your changes to the IRI draft make it incomprehensible.

I think this is getting ridiculous. We don't need a working group.
We don't even need an updated draft, at least not for LEIRI, Href,
and whatever it is that we call HTML5 references.

HTML5 wants to specify the *process* of taking arbitrary data entry
in various places and transforming it into a) something the browser
displays, and b) a URI for use on the wire. What they are calling URL
is the arbitrary data entry part, NOT the resulting URI, which is why
it is so frigging annoying and inconsistent with all other standards.
LEIRI made the same mistake.

The purpose of IRI is to specify the allowed syntax for what one
might see on the side of a bus as a Web address in i18n-friendly,
human-readable form. That is why the IRI syntax does not allow
common delimiters like whitespace, quotes, and brackets (except
for IPv6 literals). It does not define a data-entry box.

URI is in the same boat, except that it also defines the allowed
syntax for on-the-wire usage in HTTP, etc. It is intentionally
limited for use in embedded plain text. It does not define a
data entry box.

Both IRI and URI are intended to define standards for the Internet
in the same way as the US Postal Service residential addresses have
a standard normal form. The fact that an envelope does not prevent
a person from writing an arbitrary form of address in the hope that
a mail carrier can interpret it for them is not an indication that
the standard is somehow "wrong" -- what matters is that following
the standard is known to be interoperable, and everything else is
just an experiment in forgiveness.

What HTML5 wants to define is how to process a data entry box
in the same way across all browser implementations, and there is
nothing wrong with such a definition appearing in HTML5 *except*
for the fact that the editor has chosen an existing well-known
term that means something else to describe it, which conflicts
with all prior uses of that term. Just stop that nonsense by
changing the HTML5 draft wording to talk about references, not URLs.
HTML5 does not require changes to IRI, and certainly not to URI.

Changing IRI (or URI) so that it conforms both to the side of a
bus definition and a data entry definition is insane. They are
not the same thing. They do not share the same concerns. A
reference might allow anything, depending on its context and the
technology used to parse it; it is the post-processing that
produces an IRI/URI.

....Roy
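The distinction Roy draws between a strict identifier and arbitrary data entry can be illustrated with a character-level check. This is a minimal sketch, not a validator: it only tests whether a string stays inside the character repertoire RFC 3986 allows in a URI (unreserved, gen-delims, sub-delims, and "%"), which is a necessary but not sufficient condition for validity. The function names are hypothetical and chosen for this illustration.

```python
import re

# Characters RFC 3986 permits somewhere in a URI: unreserved, gen-delims,
# sub-delims, plus "%" for percent-escapes. Staying inside this set is
# necessary for validity but does not prove the string is a valid URI.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def uses_only_uri_characters(text: str) -> bool:
    """Return True if every character of text may appear in a URI."""
    return _URI_CHARS.fullmatch(text) is not None


def offending_characters(text: str) -> list[str]:
    """Return the distinct characters of text that a URI cannot contain."""
    return sorted({ch for ch in text if not _URI_CHARS.fullmatch(ch)})


if __name__ == "__main__":
    for candidate in ("http://example.com/a%20b", "http://example.com/<a b>"):
        print(candidate, uses_only_uri_characters(candidate),
              offending_characters(candidate))
```

Whitespace, quotes, and angle brackets fail the check, which matches the delimiters Roy lists as excluded from the syntax; a data-entry string containing them would need post-processing before it could be called a URI.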
|
|
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

I agree with Roy, and would add that this document hides the new information (i.e., how to get from random bits to a valid URI or IRI) too deeply; for example, if HTTPbis wanted to reference this thing, it would need to do so by specifying a section in the IRI spec, even though HTTP doesn't use IRIs at all.

What I'd like to see is:
a. A revision of the IRI spec (if necessary), and
b. A new spec defining how to get from random bits to a URI or an IRI (allowing the application to choose which one it needs to end up with).

Then, different specs can refer to URIs if they want to, IRIs if they want to, and optionally specify this processing as a step beforehand, and do so clearly.

On 26/09/2009, at 5:19 AM, Roy T. Fielding wrote:

> Larry, your changes to the IRI draft make it incomprehensible.
>
> I think this is getting ridiculous. We don't need a working group.
> We don't even need an updated draft, at least not for LEIRI, Href,
> and whatever it is that we call HTML5 references.
>
> HTML5 wants to specify the *process* of taking arbitrary data entry
> in various places and transforming it into a) something the browser
> displays, and b) a URI for use on the wire. What they are calling URL
> is the arbitrary data entry part, NOT the resulting URI, which is why
> it is so frigging annoying and inconsistent with all other standards.
> LEIRI made the same mistake.
>
> The purpose of IRI is to specify the allowed syntax for what one
> might see on the side of a bus as a Web address in i18n-friendly,
> human-readable form. That is why the IRI syntax does not allow
> common delimiters like whitespace, quotes, and brackets (except
> for IPv6 literals). It does not define a data-entry box.
>
> URI is in the same boat, except that it also defines the allowed
> syntax for on-the-wire usage in HTTP, etc. It is intentionally
> limited for use in embedded plain text. It does not define a
> data entry box.
>
> Both IRI and URI are intended to define standards for the Internet
> in the same way as the US Postal Service residential addresses have
> a standard normal form. The fact that an envelope does not prevent
> a person from writing an arbitrary form of address in the hope that
> a mail carrier can interpret it for them is not an indication that
> the standard is somehow "wrong" -- what matters is that following
> the standard is known to be interoperable, and everything else is
> just an experiment in forgiveness.
>
> What HTML5 wants to define is how to process a data entry box
> in the same way across all browser implementations, and there is
> nothing wrong with such a definition appearing in HTML5 *except*
> for the fact that the editor has chosen an existing well-known
> term that means something else to describe it, which conflicts
> with all prior uses of that term. Just stop that nonsense by
> changing the HTML5 draft wording to talk about references, not URLs.
> HTML5 does not require changes to IRI, and certainly not to URI.
>
> Changing IRI (or URI) so that it conforms both to the side of a
> bus definition and a data entry definition is insane. They are
> not the same thing. They do not share the same concerns. A
> reference might allow anything, depending on its context and the
> technology used to parse it; it is the post-processing that
> produces an IRI/URI.
>
> ....Roy

--
Mark Nottingham http://www.mnot.net/
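The "random bits to a URI" step that Mark Nottingham asks for as a separate spec might look roughly like the following. This is a hedged sketch under simplifying assumptions, not the algorithm from any of the drafts discussed in this thread: it trims surrounding whitespace and percent-encodes disallowed and non-ASCII characters (as UTF-8) in the path, query, and fragment, and it leaves the authority untouched, so internationalized host names are not handled. The name `lenient_to_uri` and the safe-character sets are assumptions made for the example.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left unescaped in each component. "%" is kept so that existing
# percent-escapes survive; a stray "%" is therefore passed through as well.
_PATH_SAFE = "/%:@!$&'()*+,;="
_QUERY_SAFE = _PATH_SAFE + "?"


def lenient_to_uri(reference: str) -> str:
    """Turn a loosely written reference into an ASCII-only URI reference.

    Simplifying assumption: the authority (host, port, userinfo) is copied
    through unchanged, so non-ASCII host names are out of scope here.
    """
    parts = urlsplit(reference.strip())
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe=_PATH_SAFE),       # UTF-8, then %XX
        quote(parts.query, safe=_QUERY_SAFE),
        quote(parts.fragment, safe=_QUERY_SAFE),
    ))


if __name__ == "__main__":
    print(lenient_to_uri("  http://example.com/a b/é?q=x y#frag ment "))
```

Keeping this step in its own function mirrors the point made above: a caller that needs a URI (rather than an IRI) can apply the processing without depending on anything IRI-specific.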
|
|
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

On Sep 26, 2009, at 3:04 PM, Mark Nottingham wrote:

> I agree with Roy, and would add that this document hides the new
> information (i.e., how to get from random bits to a valid URI or
> IRI) too deeply; for example, if HTTPbis wanted to reference this
> thing, it would need to do so by specifying a section in the IRI
> spec, even though HTTP doesn't use IRIs at all.
>
> What I'd like to see is:
> a. A revision of the IRI spec (if necessary), and
> b. A new spec defining how to get from random bits to a URI or an
> IRI (allowing the application to choose which one it needs to end up
> with).
>
> Then, different specs can refer to URIs if they want to, IRIs if
> they want to, and optionally specify this processing as a step
> beforehand, and do so clearly.

It seems like the only difference between your proposal and Larry's is whether strict processing of IRIs and lenient processing of strings that may or may not be valid IRIs are in the same spec or two separate specs.

I don't see a great advantage in splitting the specs, as this makes cross-references more complicated.

If it were just a matter of transforming the kind of string that may appear in an "href" attribute into a valid IRI, then your proposal might be plausible. However, in addition to converting to a URI, HTML UAs also need to be able to do the following to resource identifiers treated with lenient processing: (a) separate into components, even when the string is not a valid URI or IRI, and in a way that is not necessarily equivalent to first converting to a valid IRI or URI; (b) resolve a reference relative to a base when either the reference or the base might not be a valid URI or IRI; (c) determine if a reference is "absolute" even if it might not be a valid URI or IRI. That would mean a great deal of algorithms defined in a totally separate place from the URI spec. This is what the Web Address spec[1] attempted to do, and it ends up duplicating a lot of concepts from IRI/URI. This effort was set aside in favor of IRIbis incorporating the necessary content.

I do agree that the lenient processing rules are hidden too deeply in Larry's current draft, but I do not think this is intrinsic to the content being in a single document.

Note: it's not clear to me why HTTPbis would want to reference lenient processing rules for URIs/IRIs. Are HTTP servers and proxies not strict in what they accept?

Regards,
Maciej

[1] http://www.w3.org/html/wg/href/draft.html
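The three operations Maciej lists, (a) splitting into components, (b) resolving against a base, and (c) testing for absoluteness, can be tried out on strings that are not valid URIs using Python's standard `urllib.parse`, which happens to be forgiving about characters such as spaces. This is only a sketch of the idea: the behavior shown is that of the Python library, not of the HTML5, Web Address, or IRIbis algorithms, and the wrapper names are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def components(reference: str) -> dict[str, str]:
    """(a) Split a reference into components without validating it first."""
    parts = urlsplit(reference)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def resolve(base: str, reference: str) -> str:
    """(b) Resolve a possibly invalid reference against a base."""
    return urljoin(base, reference)


def is_absolute(reference: str) -> bool:
    """(c) Treat a reference as absolute when it carries a scheme."""
    return bool(urlsplit(reference).scheme)


if __name__ == "__main__":
    print(components("http://example.com/a b?q=1#top"))
    print(resolve("http://example.com/dir/page", "../other file"))
    print(is_absolute("../other file"))
```

Note that the space survives splitting and resolution unchanged, so the result is still not a valid URI; converting it is a separate step, which is the reason these operations cannot simply be defined as "convert first, then apply the strict rules".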
|
|
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

On 27/09/2009, at 9:37 AM, Maciej Stachowiak wrote:

> On Sep 26, 2009, at 3:04 PM, Mark Nottingham wrote:
>
>> I agree with Roy, and would add that this document hides the new
>> information (i.e., how to get from random bits to a valid URI or
>> IRI) too deeply; for example, if HTTPbis wanted to reference this
>> thing, it would need to do so by specifying a section in the IRI
>> spec, even though HTTP doesn't use IRIs at all.
>>
>> What I'd like to see is:
>> a. A revision of the IRI spec (if necessary), and
>> b. A new spec defining how to get from random bits to a URI or an
>> IRI (allowing the application to choose which one it needs to end
>> up with).
>>
>> Then, different specs can refer to URIs if they want to, IRIs if
>> they want to, and optionally specify this processing as a step
>> beforehand, and do so clearly.
>
> It seems like the only difference between your proposal and Larry's
> is whether strict processing of IRIs and lenient processing of
> strings that may or may not be valid IRIs are in the same spec or
> two separate specs.
>
> I don't see a great advantage in splitting the specs, as this makes
> cross-references more complicated.

I don't have a lie-down-in-the-road issue with structuring these as one document, although I do think it's more natural to separate them. What I want to avoid is having this extra step hidden away in a non-obvious place that's difficult to reference and specify externally; as it currently sits, the processing is specified in an informally named section of the IRI spec, which is the last place I'd look for it if I were working with URIs.

So, at a minimum, the section needs to be re-cast as something more prominent and normative (i.e., if someone chooses to conform to it, they should be able to know what that means), and the spec needs to be named to reflect that.

> If it were just a matter of transforming the kind of string that may
> appear in an "href" attribute into a valid IRI, then your proposal
> might be plausible. However, in addition to converting to a URI,
> HTML UAs also need to be able to do the following to resource
> identifiers treated with lenient processing: (a) separate into
> components, even when the string is not a valid URI or IRI, and in a
> way that is not necessarily equivalent to first converting to a
> valid IRI or URI; (b) resolve a reference relative to a base when
> either the reference or the base might not be a valid URI or IRI;
> (c) determine if a reference is "absolute" even if it might not be a
> valid URI or IRI. That would mean a great deal of algorithms defined
> in a totally separate place from the URI spec. This is what the Web
> Address spec[1] attempted to do, and it ends up duplicating a lot of
> concepts from IRI/URI. This effort was set aside in favor of IRIbis
> incorporating the necessary content.
>
> I do agree that the lenient processing rules are hidden too deeply
> in Larry's current draft, but I do not think this is intrinsic to
> the content being in a single document.
>
> Note: it's not clear to me why HTTPbis would want to reference
> lenient processing rules for URIs/IRIs. Are HTTP servers and proxies
> not strict in what they accept?

It's been discussed for the Location header. No decision as of yet, though.

If something like Location (i.e., something that needs a URI, not an IRI, as output) needs this algorithm, including this in the IRI spec is going to make things more complex.

> Regards,
> Maciej
>
> [1] http://www.w3.org/html/wg/href/draft.html

--
Mark Nottingham http://www.mnot.net/
|
|
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

The definition for how to perform forgiving processing of resource identifiers originally started out in the HTML5 spec, where you suggest it should go. However, it was moved to a separate document based on strong objections from many parties.

I understand from the below that your objection was solely to the use of the term "URL", and not to these processing rules being in the HTML spec. But that was not the sole objection. Many thought it was architecturally wrong to define these rules in the HTML spec. Thus, while I'm sure Ian Hickson would be perfectly happy to put the processing requirements back in HTML5, I'm not sure that is an acceptable long-term solution.

Furthermore, besides the general architectural objection, there may be applications and technologies that wish to use HTML-style loose processing rules. Having those rules in the HTML spec instead of in a standalone specification makes it more difficult to reuse the technology.

On a more philosophical level: a lot more resource identifiers are extracted from attributes in HTML documents than from the sides of busses. It is not clear to me why the side-of-bus use case should be privileged. IRIs are a standard for the Internet, not for vehicular advertising. And indeed, many print ads these days drop the initial http: from the addresses they print.

For an Internet standard, there is nothing wrong with defining rules for lenient processing as well as the syntax of strictly conforming input. Doing so can convert "experiment[s] in forgiveness" into interoperability.

Regards,
Maciej

On Sep 25, 2009, at 12:19 PM, Roy T. Fielding wrote:

> Larry, your changes to the IRI draft make it incomprehensible.
>
> I think this is getting ridiculous. We don't need a working group.
> We don't even need an updated draft, at least not for LEIRI, Href,
> and whatever it is that we call HTML5 references.
>
> HTML5 wants to specify the *process* of taking arbitrary data entry
> in various places and transforming it into a) something the browser
> displays, and b) a URI for use on the wire. What they are calling URL
> is the arbitrary data entry part, NOT the resulting URI, which is why
> it is so frigging annoying and inconsistent with all other standards.
> LEIRI made the same mistake.
>
> The purpose of IRI is to specify the allowed syntax for what one
> might see on the side of a bus as a Web address in i18n-friendly,
> human-readable form. That is why the IRI syntax does not allow
> common delimiters like whitespace, quotes, and brackets (except
> for IPv6 literals). It does not define a data-entry box.
>
> URI is in the same boat, except that it also defines the allowed
> syntax for on-the-wire usage in HTTP, etc. It is intentionally
> limited for use in embedded plain text. It does not define a
> data entry box.
>
> Both IRI and URI are intended to define standards for the Internet
> in the same way as the US Postal Service residential addresses have
> a standard normal form. The fact that an envelope does not prevent
> a person from writing an arbitrary form of address in the hope that
> a mail carrier can interpret it for them is not an indication that
> the standard is somehow "wrong" -- what matters is that following
> the standard is known to be interoperable, and everything else is
> just an experiment in forgiveness.
>
> What HTML5 wants to define is how to process a data entry box
> in the same way across all browser implementations, and there is
> nothing wrong with such a definition appearing in HTML5 *except*
> for the fact that the editor has chosen an existing well-known
> term that means something else to describe it, which conflicts
> with all prior uses of that term. Just stop that nonsense by
> changing the HTML5 draft wording to talk about references, not URLs.
> HTML5 does not require changes to IRI, and certainly not to URI.
>
> Changing IRI (or URI) so that it conforms both to the side of a
> bus definition and a data entry definition is insane. They are
> not the same thing. They do not share the same concerns. A
> reference might allow anything, depending on its context and the
> technology used to parse it; it is the post-processing that
> produces an IRI/URI.
>
> ....Roy
|
|
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

On Sep 26, 2009, at 5:01 PM, Mark Nottingham wrote:

> On 27/09/2009, at 9:37 AM, Maciej Stachowiak wrote:
>
>> On Sep 26, 2009, at 3:04 PM, Mark Nottingham wrote:
>>
>>> I agree with Roy, and would add that this document hides the new
>>> information (i.e., how to get from random bits to a valid URI or
>>> IRI) too deeply; for example, if HTTPbis wanted to reference this
>>> thing, it would need to do so by specifying a section in the IRI
>>> spec, even though HTTP doesn't use IRIs at all.
>>>
>>> What I'd like to see is:
>>> a. A revision of the IRI spec (if necessary), and
>>> b. A new spec defining how to get from random bits to a URI or an
>>> IRI (allowing the application to choose which one it needs to end
>>> up with).
>>>
>>> Then, different specs can refer to URIs if they want to, IRIs if
>>> they want to, and optionally specify this processing as a step
>>> beforehand, and do so clearly.
>>
>> It seems like the only difference between your proposal and Larry's
>> is whether strict processing of IRIs and lenient processing of
>> strings that may or may not be valid IRIs are in the same spec or
>> two separate specs.
>>
>> I don't see a great advantage in splitting the specs, as this makes
>> cross-references more complicated.
>
> I don't have a lie-down-in-the-road issue with structuring these as
> one document, although I do think it's more natural to separate
> them. What I want to avoid is having this extra step hidden away in
> a non-obvious place that's difficult to reference and specify
> externally; as it currently sits, the processing is specified in an
> informally named section of the IRI spec, which is the last place
> I'd look for it if I were working with URIs.
>
> So, at a minimum, the section needs to be re-cast as something more
> prominent and normative (i.e., if someone chooses to conform to it,
> they should be able to know what that means), and the spec needs to
> be named to reflect that.

I agree that the rules should be more prominent and normative. The HTML spec, and any other referencing spec, should be able to cite a specific section for the algorithm to convert a loosely-processed reference into a URI, or to perform a lenient resolution relative to a base, or whatever. And that algorithm should be normatively defined, even if it is only applicable to cases where other specs require lenient processing.

[...snip...]

>> Note: it's not clear to me why HTTPbis would want to reference
>> lenient processing rules for URIs/IRIs. Are HTTP servers and
>> proxies not strict in what they accept?
>
> It's been discussed for the Location header. No decision as of yet,
> though.
>
> If something like Location (i.e., something that needs a URI, not an
> IRI, as output) needs this algorithm, including this in the IRI spec
> is going to make things more complex.

It seems like putting these rules in the HTML5 spec would make things even harder than that for the Location header, since it would create a dependency inversion. So perhaps you don't entirely agree with Roy after all?

Regards,
Maciej
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

On 27/09/2009, at 10:19 AM, Maciej Stachowiak wrote:

>>>
>>> Note: it's not clear to me why HTTPbis would want to reference
>>> lenient processing rules for URIs/IRIs. Are HTTP servers and
>>> proxies not strict in what they accept?
>>
>> It's been discussed for the Location header. No decision as of yet,
>> though.
>>
>> If something like Location (i.e., something that needs a URI, not
>> an IRI, as output) needs this algorithm, including this in the IRI
>> spec is going to make things more complex.
>
> It seems like putting these rules in the HTML5 spec would make
> things even harder than that for the Location header, since it would
> create a dependency inversion. So perhaps you don't entirely agree
> with Roy after all?

Stranger things have happened.

For the benefit of those who haven't heard me droning on about it
before, I believe that what HTML5 should be doing is defining:

- a syntax specification for HTML5
- a bits -> URI/IRI spec
- various other specs as needed
- a "browser profile" that pulls all of these various specifications
  together, along with refs to HTTP, URI, etc. as appropriate

In this fashion, the individual functions will be more independent, and
thus more easily referenced by other, non-web-browser applications that
might need them. Furthermore, they can evolve more independently, as
can the browser profile.

Cheers,

--
Mark Nottingham http://www.mnot.net/
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

On Sep 26, 2009, at 5:13 PM, Maciej Stachowiak wrote:

> The definition for how to perform forgiving processing of resource
> identifiers originally started out in the HTML5 spec, where you
> suggest it should go. However, it was moved to a separate document
> based on strong objections from many parties. I understand from the
> below that your objection was solely to the use of the term "URL",
> and not to these processing rules being in the HTML spec. But that
> was not the sole objection. Many thought it was architecturally
> wrong to define these rules in the HTML spec. Thus, while I'm sure
> Ian Hickson would be perfectly happy to put the processing
> requirements back in HTML5, I'm not sure that is an acceptable
> long-term solution.

I think it is hopeless to trace back all the screwed-up
misunderstandings of Web architecture that led to anyURI, LEIRI, and
now HTML5-URL. I think I explained how it is supposed to work,
succinctly and to the point where actual text can be applied to the
HTML5 draft that will resolve all objections and settle this matter
once and for all. If not, then we can deal with those new objections
when they arise.

> Furthermore, besides the general architectural objection, there may
> be applications and technologies that wish to use HTML-style loose
> processing rules. Having those rules in the HTML spec instead of in
> a standalone specification makes it more difficult to reuse the
> technology.

Those rules already exist in RFC3986, Appendix B. What does not exist
there is the behavior after parsing into the components, since that
behavior is entirely application-dependent. If HTML5 wants to define
that behavior, it can do so only if the requirements are stated to be
specific to browser-like applications.

> On a more philosophical level: a lot more resource identifiers are
> extracted from attributes in HTML documents than from the sides of
> busses. It is not clear to me why the side-of-bus use case should
> be privileged. IRIs are a standard for the Internet, not for
> vehicular advertising. And indeed, many print ads these days drop
> the initial http: from the addresses they print.

Also explained in 3986. I don't remember if that was copied into 3987.

> For an Internet standard, there is nothing wrong with defining
> rules for lenient processing as well as the syntax of strictly
> conforming input. Doing so can convert "experiment[s] in
> forgiveness" into interoperability.

There is nothing wrong with defining correct processing rules for
whatever thing you are trying to process, whether those rules be
strict or lenient. The problem is saying that the rules are for
processing X when in fact you are actually processing Y and then
unilaterally declaring that Y is the new definition of X.

....Roy
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

On Sep 26, 2009, at 6:06 PM, Roy T. Fielding wrote:

> On Sep 26, 2009, at 5:13 PM, Maciej Stachowiak wrote:
>
>> The definition for how to perform forgiving processing of resource
>> identifiers originally started out in the HTML5 spec, where you
>> suggest it should go. However, it was moved to a separate document
>> based on strong objections from many parties. I understand from the
>> below that your objection was solely to the use of the term "URL",
>> and not to these processing rules being in the HTML spec. But that
>> was not the sole objection. Many thought it was architecturally
>> wrong to define these rules in the HTML spec. Thus, while I'm sure
>> Ian Hickson would be perfectly happy to put the processing
>> requirements back in HTML5, I'm not sure that is an acceptable
>> long-term solution.
>
> I think it is hopeless to trace back all the screwed-up
> misunderstandings of Web architecture that led to anyURI, LEIRI,
> and now HTML5-URL. I think I explained how it is supposed to work,
> succinctly and to the point where actual text can be applied to the
> HTML5 draft that will resolve all objections and settle this matter
> once and for all. If not, then we can deal with those new objections
> when they arise.

I think removing the use of the term "URL" from HTML5 would remove some
objections, but I don't think folding the text of Web Address into
HTML5 would address any objections, except perhaps the concern about
lack of timely progress in this area.

>
>> Furthermore, besides the general architectural objection, there may
>> be applications and technologies that wish to use HTML-style loose
>> processing rules. Having those rules in the HTML spec instead of in
>> a standalone specification makes it more difficult to reuse the
>> technology.
>
> Those rules already exist in RFC3986, Appendix B. What does not
> exist there is the behavior after parsing into the components,
> since that behavior is entirely application-dependent. If HTML5
> wants to define that behavior, it can do so only if the requirements
> are stated to be specific to browser-like applications.

As far as I can tell, RFC3986 *only* defines how to extract components.
It does not define how to turn an arbitrary string into a URI, which is
potentially needed for HTTPbis. It does not define how to perform a
relative resolution on a possibly-invalid reference against a
possibly-invalid base.

That being said, I think what RFC3986 Appendix B says is a good
definition of how to extract components from possibly-invalid strings.
It seems way easier to understand than what the Web Address draft says,
and better matches what implementations actually do.

>
>> On a more philosophical level: a lot more resource identifiers are
>> extracted from attributes in HTML documents than from the sides of
>> busses. It is not clear to me why the side-of-bus use case should
>> be privileged. IRIs are a standard for the Internet, not for
>> vehicular advertising. And indeed, many print ads these days drop
>> the initial http: from the addresses they print.
>
> Also explained in 3986. I don't remember if that was copied into
> 3987.

Either way, the upshot is that strings may appear in bus ads that are
not allowed to appear in format or protocol elements that require an
IRI.

>
>> For an Internet standard, there is nothing wrong with defining
>> rules for lenient processing as well as the syntax of strictly
>> conforming input. Doing so can convert "experiment[s] in
>> forgiveness" into interoperability.
>
> There is nothing wrong with defining correct processing rules for
> whatever thing you are trying to process, whether those rules be
> strict or lenient. The problem is saying that the rules are for
> processing X when in fact you are actually processing Y and then
> unilaterally declaring that Y is the new definition of X.

I don't think Larry proposed to do that. He just suggested that a
certain form of reusable lenient processing rules should be in the same
spec as the normative definition of an IRI. I don't think he suggested
that these rules should redefine what an IRI is.

Regards,
Maciej
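For readers following the Appendix B discussion in this message, here is a minimal sketch of what "extracting components from possibly-invalid strings" looks like in practice. The regular expression is the one printed in RFC 3986, Appendix B; the function name `split_components` and the dictionary layout are illustrative assumptions, not part of any spec. The sketch only splits; it does not validate, convert, or resolve anything.

```python
import re

# Regular expression taken from RFC 3986, Appendix B.
# Group numbers follow the RFC: 2 = scheme, 4 = authority,
# 5 = path, 7 = query, 9 = fragment.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_components(reference):
    """Split a string into the five generic components (hypothetical helper).

    Every part of the pattern is optional, so the match succeeds even for
    strings that are not valid URIs or IRIs. Components that are absent
    come back as None; the path is always a (possibly empty) string.
    """
    match = URI_REFERENCE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


# The example string from Appendix B itself.
print(split_components("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
# A string with spaces and angle brackets, which RFC 3986 does not allow,
# is still split without complaint.
print(split_components("http://example.com/a b?q=<x>#f g"))
```

As the message notes, this covers component extraction only. Turning the pieces into a valid URI, or resolving them against a base, is a separate step that the regular expression does not address.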
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

Maciej Stachowiak wrote:

> ...
> I don't see a great advantage in splitting the specs, as this makes
> cross-references more complicated. If it were just a matter of
> transforming the kind of string that may appear in an "href" attribute
> into a valid IRI, then your proposal might be plausible. However, in
> addition to converting to a URI, HTML UAs also need to be able to do the
> following to resource identifiers treated with lenient processing: (a)
> separate into components, even when the string is not a valid URI or
> IRI, and in a way that is not necessarily equivalent to first converting
> to a valid IRI or URI; (b) resolve a reference relative to a base when
> either the reference or the base might not be a valid URI or IRI; (c)
> determine if a reference is "absolute" even if it might not be a valid
> URI or IRI. That would mean a great deal of algorithms defined in a
> totally separate place from the URI spec. This is what the Web Address
> spec[1] attempted to do, and it ends up duplicating a lot of concepts
> from IRI/URI. This effort was set aside in favor of IRIbis incorporating
> the necessary content.
> ...

a) As already mentioned in this thread, it appears this is covered by
the text in <http://tools.ietf.org/html/rfc3986#appendix-B>.

b) Could you remind us why it's not possible to translate both to valid
URI/IRI and references first and then use the standard behavior?

c) Again, see <http://tools.ietf.org/html/rfc3986#appendix-B>.

I do realize that a more normative way of what Appendix B says may be
needed, but please let's not dismiss what's already in the spec as
unusable.

BR, Julian
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

On Sep 26, 2009, at 11:51 PM, Julian Reschke wrote:

> Maciej Stachowiak wrote:
>> ...
>> I don't see a great advantage in splitting the specs, as this makes
>> cross-references more complicated. If it were just a matter of
>> transforming the kind of string that may appear in an "href"
>> attribute into a valid IRI, then your proposal might be plausible.
>> However, in addition to converting to a URI, HTML UAs also need to
>> be able to do the following to resource identifiers treated with
>> lenient processing: (a) separate into components, even when the
>> string is not a valid URI or IRI, and in a way that is not
>> necessarily equivalent to first converting to a valid IRI or URI;
>> (b) resolve a reference relative to a base when either the
>> reference or the base might not be a valid URI or IRI; (c)
>> determine if a reference is "absolute" even if it might not be a
>> valid URI or IRI. That would mean a great deal of algorithms
>> defined in a totally separate place from the URI spec. This is what
>> the Web Address spec[1] attempted to do, and it ends up duplicating
>> a lot of concepts from IRI/URI. This effort was set aside in favor
>> of IRIbis incorporating the necessary content.
>> ...
>
> a) As already mentioned in this thread, it appears this is covered
> by the text in <http://tools.ietf.org/html/rfc3986#appendix-B>.

I haven't thought carefully about Appendix B to determine if its
component splitting does the right thing in all cases. Tentatively it
looks about right.

> b) Could you remind us why it's not possible to translate both to
> valid URI/IRI and references first and then use the standard behavior?

That is what the Web Address algorithm (the former HTML5 algorithm) in
fact does, see step 9:
<http://www.w3.org/html/wg/href/draft.html#resolving-urls>. However,
the steps to translate to valid URIs (and the required postprocessing
rules) are quite involved and must be specified somewhere.

>
> c) Again, see <http://tools.ietf.org/html/rfc3986#appendix-B>.

I can't see where Appendix B defines this.

>
> I do realize that a more normative way of what Appendix B says may
> be needed, but please let's not dismiss what's already in the spec
> as unusable.

I don't believe I dismissed it. But I agree, a normative version is
needed.

Regards,
Maciej
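To make point (b) concrete, the "translate first, then use the standard behavior" approach could be sketched roughly as below. This is a deliberately simplified illustration and is not the Web Address algorithm referenced above: the cleanup rules (trim surrounding whitespace, percent-encode anything outside the RFC 3986 reserved and unreserved sets as UTF-8, leave existing "%" characters alone) are assumptions chosen for brevity, and the names `clean_reference` and `resolve_leniently` are hypothetical. Resolution itself is delegated to Python's standard `urllib.parse.urljoin`.

```python
from urllib.parse import quote, urljoin

# RFC 3986 reserved characters, plus "%" so that escapes already
# present in the input are not encoded a second time (an assumption).
KEEP_AS_IS = ":/?#[]@!$&'()*+,;=%"


def clean_reference(text):
    """Percent-encode characters a URI may not contain (assumed rules)."""
    # Stripping surrounding whitespace is an assumption of this sketch.
    return quote(text.strip(), safe=KEEP_AS_IS)


def resolve_leniently(base, reference):
    """Clean both strings first, then apply ordinary relative resolution."""
    return urljoin(clean_reference(base), clean_reference(reference))


print(resolve_leniently("http://example.com/a/b", " c d?x=é "))
print(resolve_leniently("http://example.com/dir/", "../x y"))
```

Even this toy version has to make several policy decisions (what to trim, what to encode, which character encoding to use), which is a small illustration of why the real steps "must be specified somewhere".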
Re: Proposed Charter and Agenda for IRI BOF at IETF 76

Maciej Stachowiak wrote:

> ...
> That is what the Web Address algorithm (the former HTML5 algorithm) in
> fact does, see step 9:
> <http://www.w3.org/html/wg/href/draft.html#resolving-urls>. However, the
> steps to translate to valid URIs (and the required postprocessing rules)
> are quite involved and must be specified somewhere.
> ...

Yes. I think there's consensus that this should be defined for HTML.

>> c) Again, see <http://tools.ietf.org/html/rfc3986#appendix-B>.
>
> I can't see where Appendix B defines this.

It's absolute if it has a scheme. Or does this include absolute paths
as well? Should still be simple to determine from the parts extracted
by the regexp.

>> I do realize that a more normative way of what Appendix B says may be
>> needed, but please let's not dismiss what's already in the spec as
>> unusable.
>
> I don't believe I dismissed it. But I agree, a normative version is needed.
> ...

That was a general comment; I didn't intend to claim *you* dismissed it.

BR, Julian
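As a rough illustration of "simple to determine from the parts extracted by the regexp", the following sketch classifies a reference using only the groups of the RFC 3986 Appendix B regular expression. The function name `classify_reference` and the returned labels are assumptions made for this example; the labels borrow the RFC 3986 Section 4.2 terms for relative references. Nothing here checks whether the string is a valid URI or IRI.

```python
import re

# Regular expression taken from RFC 3986, Appendix B
# (group 2 = scheme, group 4 = authority, group 5 = path).
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def classify_reference(reference):
    """Label a reference by which components are present (hypothetical helper)."""
    match = URI_REFERENCE.match(reference)
    scheme, authority, path = match.group(2), match.group(4), match.group(5)
    if scheme is not None:
        return "has-scheme"          # "absolute if it has a scheme"
    if authority is not None:
        return "network-path"        # starts with "//"
    if path.startswith("/"):
        return "absolute-path"       # the other reading of "absolute"
    return "relative-path"


for text in ["http://example.com/", "//example.com/x", "/a/b", "a/b:c"]:
    print(text, "->", classify_reference(text))
```

Separating "has-scheme" from "absolute-path" keeps both readings of "absolute" raised in the message visible, so a caller can pick whichever one its spec means.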