Proposed Charter and Agenda for IRI BOF at IETF 76

Proposed Charter and Agenda for IRI BOF at IETF 76

by Larry Masinter

While I am still hoping a working group isn't necessary, the
fallback is to start an IETF working group. This message
proposes an agenda for a meeting to form the working group
("BOF" at IETF 76) and a draft charter.

Is there interest in participating in a BOF at the
Hiroshima meeting? Please discuss on public-iri@....



AGENDA:
=======
The primary agenda is to review the charter and discover
the level of community interest:

90-minute agenda

* 10 minutes, Masinter
    Review of URI and IRI history & current documents
* 20 minutes, Masinter & Duerst
    Review of IRI draft(s) and open issues
* 20 minutes, Masinter
    Review of other drafts with URI / IRI sections being
    coordinated, the other committees involved, and their
    schedules.
* 20 minutes, discussion: which documents can be ignored?
    Which other documents should be added? What other
    committees are involved?
* 20 minutes, Masinter
    Working group charter
      Hum on charter section by section:
        AS IS vs NEEDS WORK vs NO WAY
      Show of interest in participating in the working group

Proposed Working Group Charter

MOTIVATION:
===========

Having documents in conflict with widely deployed implementations is a
general pain which we should work to correct. This working group
proposal is specifically being made to address the problem that the
current draft of the HTML 5 specification,

   http://dev.w3.org/html5/spec/Overview.html#urls

in order to match current and expected browser behavior, contains its
own definition of the term "URL" and associated algorithms. For example:

 "NOTE: The term "URL" in this specification is used in a manner
  distinct from the precise technical meaning it is given in RFC
  3986. Readers familiar with that RFC will find it easier to read this
  specification if they pretend the term "URL" as used herein is really
  called something else altogether. This is a willful violation of RFC
  3986. [RFC3986]"

In addition, several other specifications have also contained
redefinitions of URI-like elements to match their own idea of what
should or shouldn't be allowed in a URI when it is expanded to allow
non-ASCII characters.

The primary goal of this working group is to bring together the
authors and editors of the multiple specifications being developed
that have independent definitions of what is or isn't a valid
(Uniform Internationalized) Resource (Identifier Locator), and to
reconcile those definitions so that conservative producers can
produce identifiers that will work with any number of consumers.

That the IETF documents for URI and IRI are not suitable for direct
citation by those wishing to describe deployed browser behavior seems
like something the IETF can correct.

Further goals are discussed in
     http://larry.masinter.net/iribis-hack.html, section 12

Charter
=======

This working group is scoped to produce a new version of
the RFC 3987 IRI specification which can be used directly by
future W3C XML and HTML specifications, as well as other
IETF documents.

Current Internet Draft
       draft-duerst-iri-bis-06

(see also http://larry.masinter.net/iribis-hack.html for an
update in preparation; the agenda will be updated when the I-D is available)


In addition, the working group MAY consider, if absolutely
necessary to accomplish the goal, updates to RFC 3986 (URI)
and RFC 4395 (URI Guidelines), either within the IRI document
(as an 'Updates') or as separate small update documents.
NOTE that topics relating to URIs and IRIs that are not necessary
to resolve IETF/W3C differences are Out of Scope.

SCHEDULE:
=========

October 2009:  Review of Internet Drafts and selection
               of ONE of the two directions
December 2009: Working group Last Call
March 2010:    Publish IRI update as Draft Standard

Documents for review by this group:
-----------------------------------
The PRIMARY role of this working group is to bring together a core
group to resolve conflicts between various documents in
preparation. For this reason, the liaison function is most important:

IETF HTTPBIS working group
    http://tools.ietf.org/wg/httpbis/charters
    preparing:
      draft-ietf-httpbis-p1-messaging-07
      See section 9.2 on HTTP URI scheme

HTML5 definition of "URLs"
    http://dev.w3.org/html5/spec/Overview.html#urls
    by W3C HTML Working Group
      charter: http://www.w3.org/html/wg/
      URL/IRI issue: http://www.w3.org/html/wg/tracker/issues/56
    and [WHATWG] http://www.whatwg.org/

Other documents (currently individual submissions):
    draft-duerst-mailto-bis-06

IETF IDNABIS working group
    http://www.ietf.org/dyn/wg/charter/idnabis-charter.html

W3C TAG
    issue: http://www.w3.org/2001/tag/group/track/issues/27




Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by Roy T. Fielding

Larry, your changes to the IRI draft make it incomprehensible.

I think this is getting ridiculous.  We don't need a working group.
We don't even need an updated draft, at least not for LEIRI, Href,
and whatever it is that we call HTML5 references.

HTML5 wants to specify the *process* of taking arbitrary data entry
in various places and transforming it into a) something the browser
displays, and b) a URI for use on the wire.  What they are calling URL
is the arbitrary data entry part, NOT the resulting URI, which is why
it is so frigging annoying and inconsistent with all other standards.
LEIRI made the same mistake.
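The paragraph above separates two outputs. The following rough Python sketch covers the display side only and is not taken from HTML5 or any IRI draft. Percent-escaped UTF-8 in a URI is decoded back into readable characters. Escaped ASCII such as %2F or %20 stays escaped, so the reference keeps its meaning. The approach is loosely modeled on the URI-to-IRI conversion in RFC 3987 section 3.2. It omits that section's additional exclusions (for example, bidi formatting characters). The function name uri_to_display is hypothetical.

```python
import re

# One or more consecutive percent-escapes, e.g. "%C3%BC%20".
PCT_RUN = re.compile(r'(?:%[0-9A-Fa-f]{2})+')


def uri_to_display(uri: str) -> str:
    """Sketch: decode percent-escaped UTF-8 for display, keeping ASCII escaped."""
    def decode_run(match):
        run = match.group(0)
        data = bytes.fromhex(run.replace('%', ''))
        try:
            text = data.decode('utf-8')
        except UnicodeDecodeError:
            return run  # not valid UTF-8: leave the escapes exactly as they were
        # Re-escape ASCII (uppercase hex) so that e.g. %2F or %20 keep their meaning.
        return ''.join(ch if ord(ch) >= 0x80 else '%{:02X}'.format(ord(ch))
                       for ch in text)
    return PCT_RUN.sub(decode_run, uri)
```

For example, uri_to_display("http://example.com/%C3%BC%20x") returns "http://example.com/ü%20x". An invalid sequence such as "%FF" is returned unchanged.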

The purpose of IRI is to specify the allowed syntax for what one
might see on the side of a bus as a Web address in i18n-friendly,
human-readable form.  That is why the IRI syntax does not allow
common delimiters like whitespace, quotes, and brackets (except
for IPv6 literals).  It does not define a data-entry box.
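The following Python sketch roughly illustrates that restriction. It flags strings containing whitespace, control characters, or delimiters such as quotes and angle brackets. It allows square brackets only around an IP-literal host. This is a partial approximation chosen for illustration, not the RFC 3987 ABNF. For instance, it does not validate percent-escapes or the permitted Unicode ranges. The names passes_side_of_bus_check and strip_ip_literal are hypothetical.

```python
import re

# Delimiters the IRI grammar never allows literally (a partial list for this sketch).
EXCLUDED = set('"<>\\^`{|}')

# "scheme://authority", where authority runs up to the next '/', '?' or '#'.
AUTHORITY = re.compile(r'^[A-Za-z][A-Za-z0-9+.\-]*://([^/?#]*)')


def strip_ip_literal(s: str) -> str:
    """Drop a bracketed IP-literal host such as [::1] from the authority, if any."""
    m = AUTHORITY.match(s)
    if not m:
        return s
    userinfo, at, hostport = m.group(1).rpartition('@')
    if hostport.startswith('[') and ']' in hostport:
        hostport = hostport[hostport.index(']') + 1:]
    return s[:m.start(1)] + userinfo + at + hostport + s[m.end(1):]


def passes_side_of_bus_check(s: str) -> bool:
    """True if s has no whitespace, controls, excluded delimiters, or stray brackets."""
    for ch in strip_ip_literal(s):
        if ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F:
            return False
        if ch in EXCLUDED or ch in '[]':
            return False
    return True
```

Under these assumptions, passes_side_of_bus_check("http://[::1]:8080/") is True. Both "http://example.com/a b" and "http://example.com/[x]" are rejected.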

URI is in the same boat, except that it also defines the allowed
syntax for on-the-wire usage in HTTP, etc.  It is intentionally
limited for use in embedded plain text.  It does not define a
data entry box.

Both IRI and URI are intended to define standards for the Internet
in the same way as the US Postal Service residential addresses have
a standard normal form.  The fact that an envelope does not prevent
a person from writing an arbitrary form of address in the hope that
a mail carrier can interpret it for them is not an indication that
the standard is somehow "wrong" -- what matters is that following
the standard is known to be interoperable, and everything else is
just an experiment in forgiveness.

What HTML5 wants to define is how to process a data entry box
in the same way across all browser implementations, and there is
nothing wrong with such a definition appearing in HTML5 *except*
for the fact that the editor has chosen an existing well-known
term that means something else to describe it, which conflicts
with all prior uses of that term.  Just stop that nonsense by
changing the HTML5 draft wording to talk about references, not URLs.
HTML5 does not require changes to IRI, and certainly not to URI.

Changing IRI (or URI) so that it conforms both to the side of a
bus definition and a data entry definition is insane.  They are
not the same thing.  They do not share the same concerns.  A
reference might allow anything, depending on its context and the
technology used to parse it; it is the post-processing that
produces an IRI/URI.

....Roy


Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by mnot

I agree with Roy, and would add that this document hides the new  
information (i.e., how to get from random bits to a valid URI or IRI)  
too deeply; for example, if HTTPbis wanted to reference this thing, it  
would need to do so by specifying a section in the IRI spec, even  
though HTTP doesn't use IRIs at all.

What I'd like to see is:
   a. A revision of the IRI spec (if necessary), and
   b. A new spec defining how to get from random bits to a URI or an  
IRI (allowing the application to choose which one it needs to end up  
with).

Then, different specs can refer to URIs if they want to, IRIs if they  
want to, and optionally specify this processing as a step beforehand,  
and do so clearly.
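One possible shape for the separate step proposed above is sketched here in Python, purely for illustration:

* The caller chooses whether the output should be a URI or an IRI.
* Surrounding whitespace is trimmed.
* Characters that cannot appear literally are percent-encoded as UTF-8.
* Non-ASCII characters are encoded only when the target is a URI.

The function name clean_reference and its target parameter are hypothetical. The sketch makes two simplifying assumptions: it leaves existing '%' sequences untouched, and it percent-encodes a non-ASCII host rather than applying IDNA.

```python
from urllib.parse import quote

# RFC 3986 reserved characters plus '%', so existing escapes pass through unchanged.
SAFE = ":/?#[]@!$&'()*+,;=%"


def clean_reference(raw: str, target: str = "uri") -> str:
    """Hypothetical 'random bits -> URI or IRI' step; target is 'uri' or 'iri'."""
    if target not in ("uri", "iri"):
        raise ValueError("target must be 'uri' or 'iri'")
    text = raw.strip(" \t\n\r\f")  # lenient: drop surrounding ASCII whitespace
    out = []
    for ch in text:
        if target == "iri" and ord(ch) >= 0x80:
            out.append(ch)  # an IRI may carry non-ASCII characters as-is
        else:
            # quote() encodes to UTF-8 and escapes anything outside SAFE and the
            # unreserved set: spaces, quotes, angle brackets, and (for URIs) non-ASCII.
            out.append(quote(ch, safe=SAFE))
    return "".join(out)
```

With this sketch, clean_reference(" http://example.com/a b/ü\n") yields "http://example.com/a%20b/%C3%BC". With target="iri", it yields "http://example.com/a%20b/ü". A header such as Location would need the "uri" form.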


--
Mark Nottingham     http://www.mnot.net/



Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by Maciej Stachowiak


On Sep 26, 2009, at 3:04 PM, Mark Nottingham wrote:

> I agree with Roy, and would add that this document hides the new  
> information (i.e., how to get from random bits to a valid URI or  
> IRI) too deeply; for example, if HTTPbis wanted to reference this  
> thing, it would need to do so by specifying a section in the IRI  
> spec, even though HTTP doesn't use IRIs at all.
>
> What I'd like to see is:
>  a. A revision of the IRI spec (if necessary), and
>  b. A new spec defining how to get from random bits to a URI or an  
> IRI (allowing the application to choose which one it needs to end up  
> with).
>
> Then, different specs can refer to URIs if they want to, IRIs if  
> they want to, and optionally specify this processing as a step  
> beforehand, and do so clearly.

It seems like the only difference between your proposal and Larry's is  
whether strict processing of IRIs and lenient processing of strings  
that may or may not be valid IRIs are in the same spec or two separate  
specs.

I don't see a great advantage in splitting the specs, as this makes  
cross-references more complicated. If it were just a matter of  
transforming the kind of string that may appear in an "href" attribute  
into a valid IRI, then your proposal might be plausible. However, in  
addition to converting to a URI, HTML UAs also need to be able to do  
the following to resource identifiers treated with lenient processing:  
(a) separate into components, even when the string is not a valid URI  
or IRI, and in a way that is not necessarily equivalent to first  
converting to a valid IRI or URI; (b) resolve a reference relative to  
a base when either the reference or the base might not be a valid URI  
or IRI; (c) determine if a reference is "absolute" even if it might  
not be a valid URI or IRI. That would mean a great deal of algorithms  
defined in a totally separate place from the URI spec. This is what  
the Web Address spec[1] attempted to do, and it ends up duplicating a  
lot of concepts from IRI/URI. This effort was set aside in favor of  
IRIbis incorporating the necessary content.
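For (b) and (c), a lenient processor would presumably still lean on the strict building blocks. Below is a minimal Python sketch. It assumes that "absolute" simply means "starts with an RFC 3986 scheme followed by a colon, once leading spaces and control characters are ignored." It also includes the remove_dot_segments step of RFC 3986 section 5.2.4, which operates on any string, whether or not it is a valid URI or IRI. The name looks_absolute and the trimming rule are assumptions for illustration. They are not taken from HTML5 or the IRI drafts.

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r'[A-Za-z][A-Za-z0-9+.\-]*:')
# Leading characters a lenient parser is assumed to skip: C0 controls and space.
LEADING_JUNK = ''.join(chr(c) for c in range(0x21))


def looks_absolute(ref: str) -> bool:
    """Lenient check: after skipping leading junk, does ref start with 'scheme:'?"""
    return SCHEME.match(ref.lstrip(LEADING_JUNK)) is not None


def remove_dot_segments(path: str) -> str:
    """RFC 3986 section 5.2.4; works on any string, valid or not."""
    inp, out = path, []
    while inp:
        if inp.startswith('../'):
            inp = inp[3:]                      # rule A
        elif inp.startswith('./'):
            inp = inp[2:]                      # rule A
        elif inp.startswith('/./'):
            inp = inp[2:]                      # rule B: "/./" -> "/"
        elif inp == '/.':
            inp = '/'                          # rule B
        elif inp.startswith('/../') or inp == '/..':
            inp = '/' + inp[4:]                # rule C: "/../" or "/.." -> "/"
            if out:
                out.pop()                      # ...and drop the last output segment
        elif inp in ('.', '..'):
            inp = ''                           # rule D
        else:
            # rule E: move the first segment (with its leading "/", if any) to output
            end = inp.find('/', 1 if inp.startswith('/') else 0)
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
    return ''.join(out)
```

The RFC's own examples behave as specified: "/a/b/c/./../../g" becomes "/a/g", and "mid/content=5/../6" becomes "mid/6". A path containing a space, such as "/a b/../c d", is still reduced to "/c d" without any validity check.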

I do agree that the lenient processing rules are hidden too deeply in  
Larry's current draft, but I do not think this is intrinsic to the  
content being in a single document.

Note: it's not clear to me why HTTPbis would want to reference lenient  
processing rules for URIs/IRIs. Are HTTP servers and proxies not  
strict in what they accept?

Regards,
Maciej

[1] http://www.w3.org/html/wg/href/draft.html



Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by mnot


On 27/09/2009, at 9:37 AM, Maciej Stachowiak wrote:

>
> On Sep 26, 2009, at 3:04 PM, Mark Nottingham wrote:
>
>> I agree with Roy, and would add that this document hides the new  
>> information (i.e., how to get from random bits to a valid URI or  
>> IRI) too deeply; for example, if HTTPbis wanted to reference this  
>> thing, it would need to do so by specifying a section in the IRI  
>> spec, even though HTTP doesn't use IRIs at all.
>>
>> What I'd like to see is:
>> a. A revision of the IRI spec (if necessary), and
>> b. A new spec defining how to get from random bits to a URI or an  
>> IRI (allowing the application to choose which one it needs to end  
>> up with).
>>
>> Then, different specs can refer to URIs if they want to, IRIs if  
>> they want to, and optionally specify this processing as a step  
>> beforehand, and do so clearly.
>
> It seems like the only difference between your proposal and Larry's  
> is whether strict processing of IRIs and lenient processing of  
> strings that may or may not be valid IRIs are in the same spec or  
> two separate specs.
>
> I don't see a great advantage in splitting the specs, as this makes  
> cross-references more complicated.

I don't have a lie-down-in-the-road issue with structuring these as  
one document, although I do think it's more natural to separate them.  
What I want to avoid is having this extra step hidden away in a non-
obvious place that's difficult to reference and specify externally; as  
it currently sits, the processing is specified in an informally named  
section of the IRI spec, which is the last place I'd look for it if I  
were working with URIs.

So, at a minimum, the section needs to be re-cast as something more  
prominent and normative (i.e., if someone chooses to conform to it,  
they should be able to know what that means), and the spec needs to be  
named to reflect that.

> If it were just a matter of transforming the kind of string that may  
> appear in an "href" attribute into a valid IRI, then your proposal  
> might be plausible. However, in addition to converting to a URI,  
> HTML UAs also need to be able to do the following to resource  
> identifiers treated with lenient processing: (a) separate into  
> components, even when the string is not a valid URI or IRI, and in a  
> way that is not necessarily equivalent to first converting to a  
> valid IRI or URI; (b) resolve a reference relative to a base when  
> either the reference or the base might not be a valid URI or IRI;  
> (c) determine if a reference is "absolute" even if it might not be a  
> valid URI or IRI. That would mean a great deal of algorithms defined  
> in a totally separate place from the URI spec. This is what the Web  
> Address spec[1] attempted to do, and it ends up duplicating a lot of  
> concepts from IRI/URI. This effort was set aside in favor of IRIbis  
> incorporating the necessary content.
>
> I do agree that the lenient processing rules are hidden too deeply  
> in Larry's current draft, but I do not think this is intrinsic to  
> the content being in a single document.
>
> Note: it's not clear to me why HTTPbis would want to reference  
> lenient processing rules for URIs/IRIs. Are HTTP servers and proxies  
> not strict in what they accept?

It's been discussed for the Location header. No decision as of yet,  
though.

If something like Location (i.e., something that needs a URI, not an  
IRI, as output) needs this algorithm, including this in the IRI spec  
is going to make things more complex.

>
> Regards,
> Maciej
>
> [1] http://www.w3.org/html/wg/href/draft.html
>


--
Mark Nottingham     http://www.mnot.net/



Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by Maciej Stachowiak


The definition for how to perform forgiving processing of resource  
identifiers originally started out in the HTML5 spec, where you  
suggest it should go. However, it was moved to a separate document  
based on strong objections from many parties. I understand from your
message below that your objection was solely to the use of the term "URL", and
not to these processing rules being in the HTML spec. But that was not  
the sole objection. Many thought it was architecturally wrong to  
define these rules in the HTML spec. Thus, while I'm sure Ian Hickson  
would be perfectly happy to put the processing requirements back in  
HTML5, I'm not sure that is an acceptable long-term solution.

Furthermore, besides the general architectural objection, there may be  
applications and technologies that wish to use HTML-style loose  
processing rules. Having those rules in the HTML spec instead of in a  
standalone specification makes it more difficult to reuse the  
technology.

On a more philosophical level: a lot more resource identifiers are  
extracted from attributes in HTML documents than from the sides of  
buses. It is not clear to me why the side-of-bus use case should be
privileged. IRIs are a standard for the Internet, not for vehicular  
advertising. And indeed, many print ads these days drop the initial  
http: from the addresses they print.

For an Internet standard, there is nothing wrong with defining rules  
for lenient processing as well as the syntax of strictly conforming  
input. Doing so can convert "experiment[s] in forgiveness" into  
interoperability.

Regards,
Maciej

On Sep 25, 2009, at 12:19 PM, Roy T. Fielding wrote:

> Larry, your changes to the IRI draft make it incomprehensible.
>
> I think this is getting ridiculous.  We don't need a working group.
> We don't even need an updated draft, at least not for LEIRI, Href,
> and whatever it is that we call HTML5 references.
>
> HTML5 wants to specify the *process* of taking arbitrary data entry
> in various places and transforming it into a) something the browser
> displays, and b) a URI for use on the wire.  What they are calling URL
> is the arbitrary data entry part, NOT the resulting URI, which is why
> it is so frigging annoying and inconsistent with all other standards.
> LEIRI made the same mistake.
>
> The purpose of IRI is to specify the allowed syntax for what one
> might see on the side of a bus as a Web address in i18n-friendly,
> human-readable form.  That is why the IRI syntax does not allow
> common delimiters like whitespace, quotes, and brackets (except
> for IPv6 literals).  It does not define a data-entry box.
>
> URI is in the same boat, except that it also defines the allowed
> syntax for on-the-wire usage in HTTP, etc.  It is intentionally
> limited for use in embedded plain text.  It does not define a
> data entry box.
>
> Both IRI and URI are intended to define standards for the Internet
> in the same way as the US Postal Service residential addresses have
> a standard normal form.  The fact that an envelope does not prevent
> a person from writing an arbitrary form of address in the hope that
> a mail carrier can interpret it for them is not an indication that
> the standard is somehow "wrong" -- what matters is that following
> the standard is known to be interoperable, and everything else is
> just an experiment in forgiveness.
>
> What HTML5 wants to define is how to process a data entry box
> in the same way across all browser implementations, and there is
> nothing wrong with such a definition appearing in HTML5 *except*
> for the fact that the editor has chosen an existing well-known
> term that means something else to describe it, which conflicts
> with all prior uses of that term.  Just stop that nonsense by
> changing the HTML5 draft wording to talk about references, not URLs.
> HTML5 does not require changes to IRI, and certainly not to URI.
>
> Changing IRI (or URI) so that it conforms both to the side of a
> bus definition and a data entry definition is insane.  They are
> not the same thing.  They do not share the same concerns.  A
> reference might allow anything, depending on its context and the
> technology used to parse it; it is the post-processing that
> produces an IRI/URI.
>
> ....Roy
>



Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by Maciej Stachowiak


On Sep 26, 2009, at 5:01 PM, Mark Nottingham wrote:

>
> On 27/09/2009, at 9:37 AM, Maciej Stachowiak wrote:
>
>>
>> On Sep 26, 2009, at 3:04 PM, Mark Nottingham wrote:
>>
>>> I agree with Roy, and would add that this document hides the new  
>>> information (i.e., how to get from random bits to a valid URI or  
>>> IRI) too deeply; for example, if HTTPbis wanted to reference this  
>>> thing, it would need to do so by specifying a section in the IRI  
>>> spec, even though HTTP doesn't use IRIs at all.
>>>
>>> What I'd like to see is:
>>> a. A revision of the IRI spec (if necessary), and
>>> b. A new spec defining how to get from random bits to a URI or an  
>>> IRI (allowing the application to choose which one it needs to end  
>>> up with).
>>>
>>> Then, different specs can refer to URIs if they want to, IRIs if  
>>> they want to, and optionally specify this processing as a step  
>>> beforehand, and do so clearly.
>>
>> It seems like the only difference between your proposal and Larry's  
>> is whether strict processing of IRIs and lenient processing of  
>> strings that may or may not be valid IRIs are in the same spec or  
>> two separate specs.
>>
>> I don't see a great advantage in splitting the specs, as this makes  
>> cross-references more complicated.
>
> I don't have a lie-down-in-the-road issue with structuring these as  
> one document, although I do think it's more natural to separate  
> them. What I want to avoid is having this extra step hidden away in  
> a non-obvious place that's difficult to reference and specify  
> externally; as it currently sits, the processing is specified in an  
> informally named section of the IRI spec, which is the last place  
> I'd look for it if I were working with URIs.
>
> So, at a minimum, the section needs to be re-cast as something more  
> prominent and normative (i.e., if someone chooses to conform to it,  
> they should be able to know what that means), and the spec needs to  
> be named to reflect that.

I agree that the rules should be more prominent and normative. The  
HTML spec, and any other referencing spec, should be able to cite a  
specific section for the algorithm to convert a loosely-processed  
reference into a URI, or to perform a lenient resolution relative to a  
base, or whatever. And that algorithm should be normatively defined,  
even if it is only applicable to cases where other specs require  
lenient processing.

[...snip...]

>>
>> Note: it's not clear to me why HTTPbis would want to reference  
>> lenient processing rules for URIs/IRIs. Are HTTP servers and  
>> proxies not strict in what they accept?
>
> It's been discussed for the Location header. No decision as of yet,  
> though.
>
> If something like Location (i.e., something that needs a URI, not an  
> IRI, as output) needs this algorithm, including this in the IRI spec  
> is going to make things more complex.

It seems like putting these rules in the HTML5 spec would make things  
even harder than that for the Location header, since it would create a  
dependency inversion. So perhaps you don't entirely agree with Roy  
after all?

Regards,
Maciej


>
>>
>> Regards,
>> Maciej
>>
>> [1] http://www.w3.org/html/wg/href/draft.html
>>
>>>
>>>
>>> On 26/09/2009, at 5:19 AM, Roy T. Fielding wrote:
>>>
>>>> Larry, your changes to the IRI draft make it incomprehensible.
>>>>
>>>> I think this is getting ridiculous.  We don't need a working group.
>>>> We don't even need an updated draft, at least not for LEIRI, Href,
>>>> and whatever it is that we call HTML5 references.
>>>>
>>>> HTML5 wants to specify the *process* of taking arbitrary data entry
>>>> in various places and transforming it into a) something the browser
>>>> displays, and b) a URI for use on the wire.  What they are  
>>>> calling URL
>>>> is the arbitrary data entry part, NOT the resulting URI, which is  
>>>> why
>>>> it is so frigging annoying and inconsistent with all other  
>>>> standards.
>>>> LEIRI made the same mistake.
>>>>
>>>> The purpose of IRI is to specify the allowed syntax for what one
>>>> might see on the side of a bus as a Web address in i18n-friendly,
>>>> human-readable form.  That is why the IRI syntax does not allow
>>>> common delimiters like whitespace, quotes, and brackets (except
>>>> for IPv6 literals).  It does not define a data-entry box.
>>>>
>>>> URI is in the same boat, except that it also defines the allowed
>>>> syntax for on-the-wire usage in HTTP, etc.  It is intentionally
>>>> limited for use in embedded plain text.  It does not define a
>>>> data entry box.
>>>>
>>>> Both IRI and URI are intended to define standards for the Internet
>>>> in the same way as the US Postal Service residential addresses have
>>>> a standard normal form.  The fact that an envelope does not prevent
>>>> a person from writing an arbitrary form of address in the hope that
>>>> a mail carrier can interpret it for them is not an indication that
>>>> the standard is somehow "wrong" -- what matters is that following
>>>> the standard is known to be interoperable, and everything else is
>>>> just an experiment in forgiveness.
>>>>
>>>> What HTML5 wants to define is how to process a data entry box
>>>> in the same way across all browser implementations, and there is
>>>> nothing wrong with such a definition appearing in HTML5 *except*
>>>> for the fact that the editor has chosen an existing well-known
>>>> term that means something else to describe it, which conflicts
>>>> with all prior uses of that term.  Just stop that nonsense by
>>>> changing the HTML5 draft wording to talk about references, not URLs.
>>>> HTML5 does not require changes to IRI, and certainly not to URI.
>>>>
>>>> Changing IRI (or URI) so that it conforms both to the side of a
>>>> bus definition and a data entry definition is insane.  They are
>>>> not the same thing.  They do not share the same concerns.  A
>>>> reference might allow anything, depending on its context and the
>>>> technology used to parse it; it is the post-processing that
>>>> produces an IRI/URI.
>>>>
>>>> ....Roy
>>>>
>>>
>>>
>>> --
>>> Mark Nottingham     http://www.mnot.net/
>>>
>>>
>>
>
>
> --
> Mark Nottingham     http://www.mnot.net/
>



Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by mnot

On 27/09/2009, at 10:19 AM, Maciej Stachowiak wrote:

>>>
>>> Note: it's not clear to me why HTTPbis would want to reference  
>>> lenient processing rules for URIs/IRIs. Are HTTP servers and  
>>> proxies not strict in what they accept?
>>
>> It's been discussed for the Location header. No decision as of yet,  
>> though.
>>
>> If something like Location (i.e., something that needs a URI, not  
>> an IRI, as output) needs this algorithm, including this in the IRI  
>> spec is going to make things more complex.
>
> It seems like putting these rules in the HTML5 spec would make  
> things even harder than that for the Location header, since it would  
> create a dependency inversion. So perhaps you don't entirely agree  
> with Roy after all?


Stranger things have happened.

For the benefit of those who haven't heard me droning on about it  
before, I believe that what HTML5 should be doing is defining:

   - a syntax specification for HTML5
   - a bits -> URI/IRI spec
   - various other specs as needed
   - a "browser profile" that pulls all of these various
     specifications together, along with refs to HTTP, URI, etc. as
     appropriate

In this fashion, the individual functions will be more independent,  
and thus more easily referenced by other, non-web-browser applications  
that might need them. Furthermore, they can evolve more independently,  
as can the browser profile.

Cheers,

--
Mark Nottingham     http://www.mnot.net/



Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by Roy T. Fielding

On Sep 26, 2009, at 5:13 PM, Maciej Stachowiak wrote:

> The definition for how to perform forgiving processing of resource  
> identifiers originally started out in the HTML5 spec, where you  
> suggest it should go. However, it was moved to a separate document  
> based on strong objections from many parties. I understand from the  
> below that your objection was solely to the use of the term "URL",  
> and not to these processing rules being in the HTML spec. But that  
> was not the sole objection. Many thought it was architecturally  
> wrong to define these rules in the HTML spec. Thus, while I'm sure  
> Ian Hickson would be perfectly happy to put the processing  
> requirements back in HTML5, I'm not sure that is an acceptable
> long-term solution.

I think it is hopeless to trace back all the screwed-up misunderstandings
of Web architecture that led to anyURI, LEIRI, and now HTML5-URL.
I think I explained how it is supposed to work, succinctly and to the
point where actual text can be applied to the HTML5 draft that will
resolve all objections and settle this matter once and for all.
If not, then we can deal with those new objections when they arise.

> Furthermore, besides the general architectural objection, there may  
> be applications and technologies that wish to use HTML-style loose  
> processing rules. Having those rules in the HTML spec instead of in  
> a standalone specification makes it more difficult to reuse the  
> technology.

Those rules already exist in RFC3986, Appendix B.  What does not
exist there is the behavior after parsing into the components,
since that behavior is entirely application-dependent.  If HTML5
wants to define that behavior, it can do so only if the requirements
are stated to be specific to browser-like applications.
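For reference, Appendix B gives a single regular expression that splits any string into scheme, authority, path, query and fragment without validating anything. A minimal Python sketch of it follows; the names `APPENDIX_B`, `Components` and `split_components` are illustrative, not from the RFC, and `re.DOTALL` is an addition so the fragment group also matches newlines.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B. It only splits; it never rejects.
APPENDIX_B = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?", re.DOTALL
)


class Components(NamedTuple):
    scheme: Optional[str]     # group 2
    authority: Optional[str]  # group 4
    path: str                 # group 5 (always present, possibly empty)
    query: Optional[str]      # group 7
    fragment: Optional[str]   # group 9


def split_components(ref: str) -> Components:
    """Split a (possibly invalid) reference; None means 'component absent'."""
    # Always matches: every group is optional or may match the empty string.
    m = APPENDIX_B.match(ref)
    return Components(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


print(split_components("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_components("http://ex ample.com/a b?q=1 2"))
```

The second call illustrates the point being argued: the split succeeds on a string that is not a valid URI, but what an application should do afterwards (with the space in the host, for instance) is left unspecified.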

> On a more philosophical level: a lot more resource identifiers are  
> extracted from attributes in HTML documents than from the sides of  
> busses. It is not clear to me why the side-of-bus use case should  
> be privileged. IRIs are a standard for the Internet, not for  
> vehicular advertising. And indeed, many print ads these days drop  
> the initial http: from the addresses they print.

Also explained in 3986.  I don't remember if that was copied into 3987.

> For an Internet standard, there is nothing wrong with defining  
> rules for lenient processing as well as the syntax of strictly  
> conforming input. Doing so can convert "experiment[s] in  
> forgiveness" into interoperability.

There is nothing wrong with defining correct processing rules for
whatever thing you are trying to process, whether those rules be
strict or lenient.  The problem is saying that the rules are for
processing X when in fact you are actually processing Y and then
unilaterally declaring that Y is the new definition of X.

....Roy


Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by Maciej Stachowiak

On Sep 26, 2009, at 6:06 PM, Roy T. Fielding wrote:

> On Sep 26, 2009, at 5:13 PM, Maciej Stachowiak wrote:
>
>> The definition for how to perform forgiving processing of resource  
>> identifiers originally started out in the HTML5 spec, where you  
>> suggest it should go. However, it was moved to a separate document  
>> based on strong objections from many parties. I understand from the  
>> below that your objection was solely to the use of the term "URL",  
>> and not to these processing rules being in the HTML spec. But that  
>> was not the sole objection. Many thought it was architecturally  
>> wrong to define these rules in the HTML spec. Thus, while I'm sure  
>> Ian Hickson would be perfectly happy to put the processing  
>> requirements back in HTML5, I'm not sure that is an acceptable
>> long-term solution.
>
> I think it is hopeless to trace back all the screwed-up misunderstandings
> of Web architecture that led to anyURI, LEIRI, and now HTML5-URL.
> I think I explained how it is supposed to work, succinctly and to the
> point where actual text can be applied to the HTML5 draft that will
> resolve all objections and settle this matter once and for all.
> If not, then we can deal with those new objections when they arise.

I think removing the use of the term "URL" from HTML5 would remove  
some objections, but I don't think folding the text of Web Address  
into HTML5 would address any objections, except perhaps the concern  
about lack of timely progress in this area.

>
>> Furthermore, besides the general architectural objection, there may  
>> be applications and technologies that wish to use HTML-style loose  
>> processing rules. Having those rules in the HTML spec instead of in  
>> a standalone specification makes it more difficult to reuse the  
>> technology.
>
> Those rules already exist in RFC3986, Appendix B.  What does not
> exist there is the behavior after parsing into the components,
> since that behavior is entirely application-dependent.  If HTML5
> wants to define that behavior, it can do so only if the requirements
> are stated to be specific to browser-like applications.

As far as I can tell, RFC3986 *only* defines how to extract  
components. It does not define how to turn an arbitrary string into a  
URI, which is potentially needed for HTTPbis. It does not define how  
to perform a relative resolution on a possibly-invalid reference  
against a possibly-invalid base.
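To make the gap concrete, here is a hypothetical sketch of the simplest conceivable "arbitrary string to URI" step. It assumes the step only strips leading and trailing ASCII whitespace and then percent-encodes, as UTF-8, every character outside the URI repertoire (the RFC 3987 IRI-to-URI mapping applied to everything). `to_uri_characters` is an illustrative name; this is not the Web Address algorithm or a description of what any browser does.

```python
import string

# Hypothetical helper: illustrates the kind of step RFC 3986 leaves undefined.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
RESERVED = set(":/?#[]@!$&'()*+,;=")
HEXDIGITS = set(string.hexdigits)
# Assumption: strip the same ASCII whitespace that HTML strips from attributes.
ASCII_WHITESPACE = " \t\n\f\r"


def to_uri_characters(text: str) -> str:
    """Percent-encode everything outside the URI character repertoire."""
    s = text.strip(ASCII_WHITESPACE)
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        escape = s[i + 1:i + 3]
        if c == "%" and len(escape) == 2 and set(escape) <= HEXDIGITS:
            out.append(s[i:i + 3])  # keep an existing, well-formed %XX escape
            i += 3
            continue
        if c in UNRESERVED or c in RESERVED:
            out.append(c)
        else:
            # Space, quotes, a stray "%", non-ASCII, etc.: encode as UTF-8,
            # then write each byte as %XX.
            out.extend(f"%{b:02X}" for b in c.encode("utf-8"))
        i += 1
    return "".join(out)


print(to_uri_characters("  http://example.com/caf\u00e9 menu?q=a b#x "))
```

Real algorithms have more to decide, for example whether a non-ASCII host name should be converted with IDNA rather than percent-encoded, which is part of why they end up long.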

That being said, I think what RFC3986 Appendix B says is a good  
definition of how to extract components from possibly-invalid strings.  
It seems way easier to understand than what the Web Address draft  
says, and better matches what implementations actually do.

>
>> On a more philosophical level: a lot more resource identifiers are  
>> extracted from attributes in HTML documents than from the sides of  
>> busses. It is not clear to me why the side-of-bus use case should  
>> be privileged. IRIs are a standard for the Internet, not for  
>> vehicular advertising. And indeed, many print ads these days drop  
>> the initial http: from the addresses they print.
>
> Also explained in 3986.  I don't remember if that was copied into 3987.

Either way, the upshot is that strings may appear in bus ads that are  
not allowed to appear in format or protocol elements that require an  
IRI.
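The distinction can be shown with a rough, character-level check for a URI: a scheme followed by only the characters RFC 3986 allows. This is a simplified sketch with hypothetical names; it does not verify the full grammar (for example, where "[" and "]" may appear), and an IRI check would additionally accept the non-ASCII ranges RFC 3987 permits.

```python
import re
import string

# Simplified approximation of RFC 3986's "URI" production: scheme ":" followed
# by characters from the generic syntax. Structure is NOT checked.
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")
ALLOWED = set(string.ascii_letters + string.digits + "-._~" + ":/?#[]@!$&'()*+,;=")


def looks_like_uri(s: str) -> bool:
    if SCHEME.match(s) is None:
        return False  # e.g. "example.com/page" as printed on a bus
    i = 0
    while i < len(s):
        if s[i] == "%":
            escape = s[i + 1:i + 3]
            if len(escape) != 2 or not set(escape) <= set(string.hexdigits):
                return False  # "%" must introduce exactly two hex digits
            i += 3
        elif s[i] in ALLOWED:
            i += 1
        else:
            return False  # whitespace, quotes, "<", ">", non-ASCII, ...
    return True


print(looks_like_uri("http://example.com/a%20b"), looks_like_uri("example.com/page"))
```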

>
>> For an Internet standard, there is nothing wrong with defining  
>> rules for lenient processing as well as the syntax of strictly  
>> conforming input. Doing so can convert "experiment[s] in  
>> forgiveness" into interoperability.
>
> There is nothing wrong with defining correct processing rules for
> whatever thing you are trying to process, whether those rules be
> strict or lenient.  The problem is saying that the rules are for
> processing X when in fact you are actually processing Y and then
> unilaterally declaring that Y is the new definition of X.

I don't think Larry proposed to do that. He just suggested that a  
certain form of reusable lenient processing rules should be in the  
same spec as the normative definition of an IRI. I don't think he  
suggested that these rules should redefine what an IRI is.

Regards,
Maciej



Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by Julian Reschke

Maciej Stachowiak wrote:

> ...
> I don't see a great advantage in splitting the specs, as this makes
> cross-references more complicated. If it were just a matter of
> transforming the kind of string that may appear in an "href" attribute
> into a valid IRI, then your proposal might be plausible. However, in
> addition to converting to a URI, HTML UAs also need to be able to do the
> following to resource identifiers treated with lenient processing: (a)
> separate into components, even when the string is not a valid URI or
> IRI, and in a way that is not necessarily equivalent to first converting
> to a valid IRI or URI; (b) resolve a reference relative to a base when
> either the reference or the base might not be a valid URI or IRI; (c)
> determine if a reference is "absolute" even if it might not be a valid
> URI or IRI. That would mean a great many algorithms defined in a
> totally separate place from the URI spec. This is what the Web Address
> spec[1] attempted to do, and it ends up duplicating a lot of concepts
> from IRI/URI. This effort was set aside in favor of IRIbis incorporating
> the necessary content.
> ...

a) As already mentioned in this thread, it appears this is covered by
the text in <http://tools.ietf.org/html/rfc3986#appendix-B>.

b) Could you remind us why it's not possible to translate both to valid
URI/IRI references first and then use the standard behavior?

c) Again, see <http://tools.ietf.org/html/rfc3986#appendix-B>.

I do realize that a more normative version of what Appendix B says may be
needed, but please let's not dismiss what's already in the spec as unusable.

BR, Julian


Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by Maciej Stachowiak

On Sep 26, 2009, at 11:51 PM, Julian Reschke wrote:

> Maciej Stachowiak wrote:
>> ...
>> I don't see a great advantage in splitting the specs, as this makes  
>> cross-references more complicated. If it were just a matter of  
>> transforming the kind of string that may appear in an "href"  
>> attribute into a valid IRI, then your proposal might be plausible.  
>> However, in addition to converting to a URI, HTML UAs also need to  
>> be able to do the following to resource identifiers treated with  
>> lenient processing: (a) separate into components, even when the  
>> string is not a valid URI or IRI, and in a way that is not  
>> necessarily equivalent to first converting to a valid IRI or URI;  
>> (b) resolve a reference relative to a base when either the  
>> reference or the base might not be a valid URI or IRI; (c)  
>> determine if a reference is "absolute" even if it might not be a  
>> valid URI or IRI. That would mean a great many algorithms
>> defined in a totally separate place from the URI spec. This is what  
>> the Web Address spec[1] attempted to do, and it ends up duplicating  
>> a lot of concepts from IRI/URI. This effort was set aside in favor  
>> of IRIbis incorporating the necessary content.
>> ...
>
> a) As already mentioned in this thread, it appears this is covered  
> by the text in <http://tools.ietf.org/html/rfc3986#appendix-B>.

I haven't thought carefully about Appendix B to determine if its  
component splitting does the right thing in all cases. Tentatively it  
looks about right.

> b) Could you remind us why it's not possible to translate both to valid
> URI/IRI references first and then use the standard behavior?

That is what the Web Address algorithm (the former HTML5 algorithm) in
fact does; see step 9:
<http://www.w3.org/html/wg/href/draft.html#resolving-urls>. However, the
steps to translate to valid URIs (and the required postprocessing rules)
are quite involved and must be specified somewhere.
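The "standard behavior" that applies once both strings are valid is the reference resolution algorithm of RFC 3986, section 5.2. A minimal Python sketch of it follows, parsing with the Appendix B regular expression. The function names are illustrative, and the involved lenient translation step is deliberately not shown: both inputs are assumed to be valid already, and the base is assumed to have a scheme.

```python
import re

APPENDIX_B = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?", re.DOTALL
)


def split(ref):
    m = APPENDIX_B.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """RFC 3986, section 5.2.4."""
    out = []  # each entry is one segment, including its leading "/" if any
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = "/" + path[3:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()  # drop the last output segment and its "/"
        elif path in (".", ".."):
            path = ""
        else:
            end = path.find("/", 1)  # move the first segment to the output
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """RFC 3986, section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """Strict resolution per RFC 3986, section 5.2.2."""
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    r_scheme, r_auth, r_path, r_query, r_frag = split(ref)
    if r_scheme is not None:
        scheme, auth, path, query = r_scheme, r_auth, remove_dot_segments(r_path), r_query
    else:
        scheme = b_scheme
        if r_auth is not None:
            auth, path, query = r_auth, remove_dot_segments(r_path), r_query
        else:
            auth = b_auth
            if r_path == "":
                path = b_path
                query = r_query if r_query is not None else b_query
            else:
                if r_path.startswith("/"):
                    path = remove_dot_segments(r_path)
                else:
                    path = remove_dot_segments(merge(b_auth, b_path, r_path))
                query = r_query
    # Recomposition, RFC 3986 section 5.3.
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if r_frag is not None:
        result += "#" + r_frag
    return result
```

With the base `http://a/b/c/d;p?q` this reproduces the examples in RFC 3986, section 5.4.1; for instance `../g` resolves to `http://a/b/g`.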

>
> c) Again, see <http://tools.ietf.org/html/rfc3986#appendix-B>.

I can't see where Appendix B defines this.

>
> I do realize that a more normative version of what Appendix B says may
> be needed, but please let's not dismiss what's already in the spec  
> as unusable.

I don't believe I dismissed it. But I agree, a normative version is  
needed.

Regards,
Maciej



Re: Proposed Charter and Agenda for IRI BOF at IETF 76

by Julian Reschke

Maciej Stachowiak wrote:
> ...
> That is what the Web Address algorithm (the former HTML5 algorithm) in
> fact does, see step 9:
> <http://www.w3.org/html/wg/href/draft.html#resolving-urls>. However, the
> steps to translate to valid URIs (and the required postprocessing rules)
> are quite involved and must be specified somewhere.
> ...

Yes. I think there's consensus that this should be defined for HTML.

>> c) Again, see <http://tools.ietf.org/html/rfc3986#appendix-B>.
>
> I can't see where Appendix B defines this.

It's absolute if it has a scheme. Or does this include absolute paths as
well? Should still be simple to determine from the parts extracted by
the regexp.
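The check can be sketched in a few lines of Python: given the Appendix B groups, "has a scheme" and "begins with a slash" are separate tests, matching the reference kinds named in RFC 3986, section 4.2. The function name and the return labels are hypothetical. Note that the Appendix B scheme group is looser than the ABNF, so a string such as `C:\temp` would be classified as having the scheme `C`.

```python
import re

APPENDIX_B = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?", re.DOTALL
)


def reference_kind(ref: str) -> str:
    """Classify a reference using only the Appendix B groups."""
    m = APPENDIX_B.match(ref)
    if m.group(2) is not None:
        return "uri"            # has a scheme
    if m.group(3) is not None:
        return "network-path"   # begins with "//"
    if m.group(5).startswith("/"):
        return "absolute-path"  # begins with a single "/"
    return "relative-path"      # everything else, including ""


print(reference_kind("/a/b"))
```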

>> I do realize that a more normative version of what Appendix B says may be
>> needed, but please let's not dismiss what's already in the spec as
>> unusable.
>
> I don't believe I dismissed it. But I agree, a normative version is needed.
> ...

That was a general comment; I didn't intend to claim *you* dismissed it.

BR, Julian