Re: '#' in mailto URIs
Larry Masinter wrote:
> What about encouraging URI/IRI scheme registrations to
> say about whether fragment identifiers are necessary,
> important, useful, allowed.
>
> mailto: could then disallow # fragment identifiers.
> ...

Not sure. Consider a scheme that was designed not to be resolvable, and thus its spec says that fragment identifiers do not make sense. If it *becomes* resolvable later on, they would.

I think fragment identifiers always make sense if you can use the scheme to obtain a representation of the resource. That's all that should be said.

BR, Julian
|
|
RE: '#' in mailto URIs
Martin Dürst wrote:
> The text that I might put in (if we think we need some) is: > > >>>> > Note that this specification, like any URI scheme specification, does > not define syntax or meaning of a fragment identifier, because these > depend on the media type of the retrieved resource. In the currently > known usage scenarios, a 'mailto' URI does not serve to retreive a > resource with a media type. Therefore, fragment identifiers are > meaningless, SHOULD NOT be used on 'mailto' URIs, and SHOULD be ignored > upon resolution. > >>>> So, this reminds me of an aspect of RFC 3986 that I find surprising. It says [1] : > The fragment identifier component of a URI allows indirect > identification of a secondary resource by reference to a primary > resource and additional identifying information. The identified > secondary resource may be some portion or subset of the primary > resource, some view on representations of the primary resource, or > some other resource defined or described by those representations. A > fragment identifier component is indicated by the presence of a > number sign ("#") character and terminated by the end of the URI. > > fragment = *( pchar / "/" / "?" ) > > The semantics of a fragment identifier are defined by the set of > representations that might result from a retrieval action on the > primary resource. The fragment's format and resolution is therefore > dependent on the media type [RFC2046] of a potentially retrieved > representation, even though such a retrieval is only performed if the > URI is dereferenced. If no such representation exists, then the > semantics of the fragment are considered unknown and are effectively > unconstrained. Fragment identifier semantics are independent of the > URI scheme and thus cannot be redefined by scheme specifications. What surprises me in the above is the specific reference to media types. 
If I hadn't read the above, I would have assumed that the Web worked something like this:

* Resources are identified with URIs, each of which has a scheme
* For some such URIs, protocols such as HTTP can be used to retrieve representations of the resource
* For the representation to be usable, it will typically be necessary for the protocol to convey (explicitly or implicitly) the type of each such representation. In the case of HTTP, typing is done using media types [RFC 2046], but other protocols may use different typing schemes.

The quote from RFC 3986 seems to imply that media types are the only supported typing mechanism for representations, regardless of the protocol used for retrieval. I understand that we are also trying to achieve a situation in which fragment identifier resolution is defined with respect to the type of the representation, not the URI scheme or retrieval protocol. Still, I would have thought it should say something like:

"The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on >the type< of a potentially retrieved representation >(media type [RFC2046] in the case of HTTP retrievals)<, even though such a retrieval is only performed if the URI is dereferenced."

Martin: given what's in 3986, your specific reference to media type is OK, I guess, but it still feels strange to me in the context of mailto. I also find it somewhat more appropriate to speak of retrieving representations than retrieving resources. Therefore, I wonder whether it might be a little better to say (changes marked with >...<):

---Proposed---
Note that this specification, like any URI scheme specification, does not define syntax or meaning of a fragment identifier, because these depend on the >type of a retrieved representation<.
In the currently known usage scenarios, a 'mailto' URI >cannot be used to retreive such representations<. Therefore, fragment identifiers are meaningless, SHOULD NOT be used on 'mailto' URIs, and SHOULD be ignored upon resolution. ---End Proposed--- Noah [1] http://www.ietf.org/rfc/rfc3986.txt -------------------------------------- Noah Mendelsohn IBM Corporation One Rogers Street Cambridge, MA 02142 1-617-693-4036 -------------------------------------- Larry Masinter <masinter@...> Sent by: uri-request@... 10/14/2009 01:31 PM To: "Martin J. Dürst" <duerst@...>, "Michael A. Puls II" <shadow@...> cc: "jwz@..." <jwz@...>, "PUBLIC-IRI@..." <PUBLIC-IRI@...>, (bcc: Noah Mendelsohn/Cambridge/IBM) Subject: RE: '#' in mailto URIs What about encouraging URI/IRI scheme registrations to say about whether fragment identifiers are necessary, important, useful, allowed. mailto: could then disallow # fragment identifiers. Larry -----Original Message----- From: "Martin J. Dürst" [mailto:duerst@...] Sent: Tuesday, October 13, 2009 9:37 PM To: Michael A. Puls II Cc: Larry Masinter; jwz@... Subject: Re: '#' in mailto URIs This is some very old mail. The current mailto: draft doesn't contain anything about fragment identifiers. Should it? The text that I might put in (if we think we need some) is: >>>> Note that this specification, like any URI scheme specification, does not define syntax or meaning of a fragment identifier, because these depend on the media type of the retrieved resource. In the currently known usage scenarios, a 'mailto' URI does not serve to retreive a resource with a media type. Therefore, fragment identifiers are meaningless, SHOULD NOT be used on 'mailto' URIs, and SHOULD be ignored upon resolution. >>>> Regards, Martin. On 2008/04/02 6:32, Michael A. 
Puls II wrote: > > <!--"charset=utf-8"--> > On Tue, 01 Apr 2008 13:18:27 -0400, Larry Masinter <LMM@...> wrote: > >>> So, it sounds like, in short, you're saying that Safari and Firefox >>> shouldn't use # that way because it's reserved for future use in mailto >>> URIs. >>> >>> Perhaps you could explicitly note that in your next draft? >> >> It isn't reserved "for future use", it's just not allowed. > > Martin said that # is *always* a fragment identifier. If it's not > allowed, ever, then you're saying that mailto URIs don't support > fragment identifiers and won't ever support fragment identifiers because > # is not allowed. (Which would make sense to me) > > If that's true, then a raw # that is found in a mailto URI (even though > it's not allowed) would not be anything special and could just be > accepted literally (if you were not going to throw an error). > > That would make sense to me. > > However, if mailto URIs support fragment identifiers or might support > fragment identiers in the future, then # and everything after it in the > URI needs to be ignored (at least by the mail client itself when parsing > and filling in the compose fields). > > What I got from Martin's response is that mailto URIs (like http URIs) > support fragment identifiers. It's just that no client *currently* makes > use of them in any way for 'mailto'. > > Basically, I just need to be sure what to do with a raw # in a mailto > URI (even if it's an error). > >> Not every possible string has to have an interpretation. > > I don't know what you mean by that sentence or what it pertains to. > Please clarify. > > Thanks > -- #-# Martin J. Dürst, Professor, Aoyama Gakuin University #-# http://www.sw.it.aoyama.ac.jp mailto:duerst@... |
|
|
Re: '#' in mailto URIs
On Wed, 14 Oct 2009 13:31:09 -0400, Larry Masinter <masinter@...>
wrote: > This is some very old mail. The current mailto: draft doesn't contain > anything about fragment identifiers. Should it? > The text that I might put in (if we think we need some) is: > >>>> > Note that this specification, like any URI scheme specification, does > not define syntax or meaning of a fragment identifier, because these > depend on the media type of the retrieved resource. In the currently > known usage scenarios, a 'mailto' URI does not serve to retreive a > resource with a media type. Therefore, fragment identifiers are > meaningless, SHOULD NOT be used on 'mailto' URIs, and SHOULD be ignored > upon resolution. > >>>> Actually, I think we should just look at how clients handle # in mailto URIs. 1. msimn.exe /mailurl:mailto:?body=before#after 2. opera.exe mailto:?body=before#after 3. opera.exe page.html (where page has <a href="mailto:?body=before#after">click me</a>) 4. sylpheed.exe --compose mailto:?body=before#after 5. thunderbird31.exe mailto:?body=before#after All those mail clients (except for the #2 situation where Opera's address field's URI parser splits up the URI into its pieces and Opera's mail code doesn't recompose the URI right (forgets the fragid part) before parsing) emit "before#after" in the body field of the compose window instead of just "before". That says that for mailto, # should just be treated literally as a non-reserved character like a-zA-Z0-9 etc. is. (Note that IE's address field can do what Opera does for #2 too before it passes the URI to the mail client.) So, I'd rather just say that for mailto URIs, # is just another character and any client that doesn't follow that notion should be fixed. However, if others are not fine with that, in addition to the proposed text quoted above, it should be said 'why' you should not use # in mailto URIs. 
The reason of course would be that whether the # gets treated as part of a header field value or whether the URI gets chopped off at the first # depends on the client and situation. It should also be explicitly mentioned that if you need a # to be part of a header field value, you should use %23 instead.

So, although I'm fine with the proposed text, it could be improved to mention (more like I do) 'why' you *currently at least* should not use # in mailto URIs.

Also, fwiw, the mention of 'media type', or the lack of, in the presence of 'mailto' seems odd.

Ultimately though, what I need from specs is a definitive answer as to what should happen when there's a # in a mailto URI. Even if # is invalid in a mailto URI, I need to know what to do so I can file bugs for clients and make things interoperable. There should be no ambiguity on how to *handle* # in a mailto URI. I need that explicitly spelled out in a spec so that vendors will be willing to fix it. And, the fewer changes vendors have to make to their code, the better.

So, to recap, here are 2 questions that must be answered:

1. If I enter mailto:?body=before#after in my browser's address field (or click on an HTML link of the same), must "mailto:?body=before#after" or "mailto:?body=before" be passed to the mail client?

2. If a mail client receives mailto:?body=before#after, must it ignore the first # and everything after it, so that only "before" shows up in the body field, or must the body hfvalue be "before#after"?

-- Michael
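To make the two behaviours in the recap concrete, here is a minimal Python sketch. It uses the generic regular expression from RFC 3986, Appendix B for the first reading; the function names (`split_generic`, `body_generic`, `body_literal`) are hypothetical and do not model any of the clients tested above.

```python
import re
from urllib.parse import unquote

# Generic URI-reference regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_generic(uri):
    """Split per RFC 3986: the first '#' starts the fragment."""
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def _field(query, name):
    """Return the percent-decoded value of one header field, or None."""
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if key == name:
            return unquote(value)
    return None


def body_generic(uri):
    """Reading A: the fragment is cut off before the header fields are read."""
    return _field(split_generic(uri)["query"] or "", "body")


def body_literal(uri):
    """Reading B (assumed lenient client): '#' is treated as ordinary data."""
    return _field(uri.partition("?")[2], "body")


uri = "mailto:?body=before#after"
print(body_generic(uri))  # before
print(body_literal(uri))  # before#after
```

With the `#` written as `%23`, both readings return the same body, which is why the encoded form is the only one that does not depend on the client.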
|
|
Re: '#' in mailto URIs
Michael A. Puls II wrote:
> ...
> Actually, I think we should just look at how clients handle # in mailto URIs.
>
> 1. msimn.exe /mailurl:mailto:?body=before#after
> 2. opera.exe mailto:?body=before#after
> 3. opera.exe page.html (where page has <a href="mailto:?body=before#after">click me</a>)
> 4. sylpheed.exe --compose mailto:?body=before#after
> 5. thunderbird31.exe mailto:?body=before#after
>
> All those mail clients (except for the #2 situation where Opera's
> address field's URI parser splits up the URI into its pieces and Opera's
> mail code doesn't recompose the URI right (forgets the fragid part)
> before parsing) emit "before#after" in the body field of the compose
> window instead of just "before". That says that for mailto, # should
> just be treated literally as a non-reserved character like a-zA-Z0-9
> etc. is.
>
> (Note that IE's address field can do what Opera does for #2 too before
> it passes the URI to the mail client.)
>
> So, I'd rather just say that for mailto URIs, # is just another
> character and any client that doesn't follow that notion should be fixed.
> ...

Nope. Treating "#after" as fragment in a URI is what RFC 3986 defines. We should not require clients to violate the base specification.

> However, if others are not fine with that, in addition to the proposed
> text quoted above, it should be said 'why' you should not use # in
> mailto URIs. The reason of course would be that whether the # gets
> treated as part of a header field value or whether the URI gets chopped
> off at the first # depends on the client and situation. It should also
> be explicitly mentioned that if you need a # to be part of a header
> field value, you should use %23 instead.

...must...

> ...

BR, Julian
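The "must use %23" rule is easiest to satisfy on the producing side. The following is a small sketch, assuming a hypothetical helper `make_mailto` that percent-encodes every header field value with Python's `urllib.parse.quote` before assembling the URI.

```python
from urllib.parse import quote, urlsplit


def make_mailto(to="", **hfields):
    """Build a mailto URI, percent-encoding each header field name and value.

    Hypothetical helper for illustration; quote(..., safe="") encodes '#',
    '&', '=' and '?' so none of them can be mistaken for URI syntax.
    """
    uri = "mailto:" + quote(to, safe="@")
    pairs = [
        quote(name, safe="") + "=" + quote(value, safe="")
        for name, value in hfields.items()
    ]
    if pairs:
        uri += "?" + "&".join(pairs)
    return uri


uri = make_mailto(body="before#after")
print(uri)  # mailto:?body=before%23after

# A generic RFC 3986 split now finds no fragment at all.
parts = urlsplit(uri)
print(repr(parts.fragment))  # ''
```

A URI produced this way reads the same whether the consumer cuts at the first `#` or treats `#` literally, because no raw `#` is left in it.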
|
|
Re: '#' in mailto URIs
On Thu, 15 Oct 2009 04:47:08 -0400, Julian Reschke <julian.reschke@...>
wrote: > Michael A. Puls II wrote: >> ... >> Actually, I think we should just look at how clients handle # in mailto >> URIs. >> 1. msimn.exe /mailurl:mailto:?body=before#after >> 2. opera.exe mailto:?body=before#after >> 3. opera.exe page.html (where page has <a >> href="mailto:?body=before#after">click me</a>) >> 4. sylpheed.exe --compose mailto:?body=before#after >> 5. thunderbird31.exe mailto:?body=before#after >> All those mail clients (except for the #2 situation where Opera's >> address field's URI parser splits up the URI into its pieces and >> Opera's mail code doesn't recompose the URI right (forgets the fragid >> part) before parsing) emit "before#after" in the body field of the >> compose window instead of just "before". That says that for mailto, # >> should just be treated literally as a non-reserved character like >> a-zA-Z0-9 etc. is. >> (Note that IE's address field can do what Opera does for #2 too before >> it passes the URI to the mail client.) >> So, I'd rather just say that for mailto URIs, # is just another >> character and any client that doesn't follow that notion should be >> fixed. >> ... > > Nope. Treating "#after" as fragment in a URI is what RFC 3986 defines. > We should not require clients to violate the base specification. > >> However, if others are not fine with that, in addition to the proposed >> text quoted above, it should be said 'why' you should not use # in >> mailto URIs. The reason of course would be that whether the # gets >> treated as part of a header field value or whether the URI gets chopped >> off at the first # depends on the client and situation. It should also >> be explicitly mentioned that if you need a # to be part of a header >> field value, you should use %23 instead. > > ...must... So, because of RFC3986's definition of #, all mail clients encountering "mailto:?body=before#after" must only put "before" in the body field of the compose window? 
If so, that means I have bugs to file for Opera, Thunderbird, Sylpheed, probably Outlook if it behaves like Outlook Express, and any other clients that put "before#after" in the body field in that situation.

Now, what does RFC3986 say about the case where you click on a link to "mailto:?body=before#after" or enter "mailto:?body=before#after" in the address field? Is "#after" part of the URI, so that "mailto:?body=before#after" must be passed to the mail client, or is "#after" not part of the URI, so that only "mailto:?body=before" must be passed to the mail client?

If no RFC or spec says anything about that, what is your personal expectation?

Depending on the answer, I have bugs to file for IE and Opera or Firefox and Safari.

-- Michael
|
|
Re: '#' in mailto URIs
On Fri, Oct 16, 2009 at 11:55 AM, Michael A. Puls II
<shadow2531@...> wrote: > On Thu, 15 Oct 2009 04:47:08 -0400, Julian Reschke <julian.reschke@...> > wrote: > >> Michael A. Puls II wrote: >>> >>> ... >>> Actually, I think we should just look at how clients handle # in mailto >>> URIs. >>> 1. msimn.exe /mailurl:mailto:?body=before#after >>> 2. opera.exe mailto:?body=before#after >>> 3. opera.exe page.html (where page has <a >>> href="mailto:?body=before#after">click me</a>) >>> 4. sylpheed.exe --compose mailto:?body=before#after >>> 5. thunderbird31.exe mailto:?body=before#after >>> All those mail clients (except for the #2 situation where Opera's >>> address field's URI parser splits up the URI into its pieces and Opera's >>> mail code doesn't recompose the URI right (forgets the fragid part) before >>> parsing) emit "before#after" in the body field of the compose window instead >>> of just "before". That says that for mailto, # should just be treated >>> literally as a non-reserved character like a-zA-Z0-9 etc. is. >>> (Note that IE's address field can do what Opera does for #2 too before >>> it passes the URI to the mail client.) >>> So, I'd rather just say that for mailto URIs, # is just another >>> character and any client that doesn't follow that notion should be fixed. >>> ... >> >> Nope. Treating "#after" as fragment in a URI is what RFC 3986 defines. We >> should not require clients to violate the base specification. >> >>> However, if others are not fine with that, in addition to the proposed >>> text quoted above, it should be said 'why' you should not use # in mailto >>> URIs. The reason of course would be that whether the # gets treated as part >>> of a header field value or whether the URI gets chopped off at the first # >>> depends on the client and situation. It should also be explicitly mentioned >>> that if you need a # to be part of a header field value, you should use %23 >>> instead. >> >> ...must... 
> > So, because of RFC3986's definition of #, all mail clients encountering > "mailto:?body=before#after" must only put "before" in the body field of the > compose window? > > If so, that, means I have bugs to file for Opera, Thunderbird, Sylpheed, > probably Outlook if it does like Outlook Express, and any other clients that > put "before#after" in the body field in that situation. > > Now, what does RFC3986 say about the case where you click on a link to > "mailto:?body=before#after" or enter "mailto:?body=before#after" in the > address field. Is "#after" part of the URI and "mailto:?body=before#after" > must be passed to the mail client, or is "#after" not part of the URI and > "mailto:?body=before" must only be passed to the mail client? > > If no RFC or spec says anything about that, what is your personal > expectation? > > Depending on the answer, I have bugs to file for IE and Opera or Firefox and > Safari. The main problem that I see is where "#" is being used multiple times in such a uri, e.g. mailto:?subject=asdf#ghij&body=before#after Per RFC3986, the first "#" creates the fragment, so the body is never regarded as another query parameter. I would think that "#" has to be escaped in mailto uris. If there weren't multiple query parameters in a mailto uri, one could simply make the user agent append the fragment part to the query parameter data to get around the contradiction, but that is not possible with multiple "#" parameters. Regards, Silvia. |
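The multiple-"#" case can be checked with Python's standard `urllib.parse`, which splits at the first `#` in the way RFC 3986 describes. This is only a sketch; `split_mailto` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def split_mailto(uri):
    """Return (header fields, fragment) using a generic RFC 3986 style split."""
    parts = urlsplit(uri)
    return parse_qs(parts.query), parts.fragment


fields, fragment = split_mailto("mailto:?subject=asdf#ghij&body=before#after")
print(fields)    # {'subject': ['asdf']}
print(fragment)  # ghij&body=before#after
```

Everything after the first `#`, including `&body=before#after`, ends up in the fragment, so the `body` field is never seen as a query parameter.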
|
|
Re: '#' in mailto URIs
On Thu, 15 Oct 2009 22:43:45 -0400, Silvia Pfeiffer
<silviapfeiffer1@...> wrote: > The main problem that I see is where "#" is being used multiple times > in such a uri, e.g. > > mailto:?subject=asdf#ghij&body=before#after > > Per RFC3986, the first "#" creates the fragment, so the body is never > regarded as another query parameter. I would think that "#" has to be > escaped in mailto uris. If there weren't multiple query parameters in > a mailto uri, one could simply make the user agent append the fragment > part to the query parameter data to get around the contradiction, but > that is not possible with multiple "#" parameters. Well, since frag ids are of no use in mailto URIs currently, if you encounter "mailto:?subject=asdf#ghij&body=before#after", what do you think the creator of the URI intended? For me, the creator obviously meant "mailto:?subject=asdf%23ghij&body=before%23after" and could not have meant anything else. So, although # is invalid in a header field value, in the case of mailto, it's obvious what the creator meant, imo. For mutliple # in the case above, if the first # starts a fragid for mailto, and fragids in mailto URIs actually did something, then, I would consider the fragid segment to just be "#ghij&body=before#after", where the creator actually meant "#ghij%26body%3Dbefore%23after". (Or, you can assume the creator meant "mailto:?subject=asdf%23ghij&body=before#after" where the creator meant the first # to be %23 and actually meant to use a fragid of #after. But that's highly unlikely the creator meant that.) To be clear though, the concern I have is how to handle mailto URIs where the creator meant %23, but used a raw # instead, because they did it on accident or didn't know that it had to be encoded as %23. You could even say that in all cases where you find a # in a mailto URI, the creator meant %23. The only reason for UAs not to make that assumption is so things don't get messed up in the future if fragid support for mailto is actually defined and does something. 
That's my reasoning fwiw. But, if UAs should just chop off the mailto URI at the first # no matter what, then O.K., but that should be explicitly mentioned.

-- Michael
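The "the creator meant %23" assumption amounts to a normalization step before parsing. A minimal Python sketch of that lenient error handling follows; the names `normalize_hash` and `lenient_hfields` are hypothetical, and this is one possible recovery strategy, not behaviour required by any spec.

```python
from urllib.parse import unquote


def normalize_hash(uri):
    """Rewrite every raw '#' as '%23', assuming it was meant as data."""
    return uri.replace("#", "%23")


def lenient_hfields(uri):
    """Parse header fields after normalizing raw '#' characters."""
    query = normalize_hash(uri).partition("?")[2]
    fields = {}
    for pair in query.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        fields[unquote(name)] = unquote(value)
    return fields


uri = "mailto:?subject=asdf#ghij&body=before#after"
print(normalize_hash(uri))   # mailto:?subject=asdf%23ghij&body=before%23after
print(lenient_hfields(uri))  # {'subject': 'asdf#ghij', 'body': 'before#after'}
```

The trade-off is the one named above: this recovers the likely intent today, but it would swallow a real fragment if fragids were ever given a meaning for mailto.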
|
|
Re: '#' in mailto URIs
On Oct 16, 2009, at 8:54 AM, Michael A. Puls II wrote:
> On Thu, 15 Oct 2009 22:43:45 -0400, Silvia Pfeiffer <silviapfeiffer1@... > > wrote: >> The main problem that I see is where "#" is being used multiple times >> in such a uri, e.g. >> >> mailto:?subject=asdf#ghij&body=before#after >> >> Per RFC3986, the first "#" creates the fragment, so the body is never >> regarded as another query parameter. I would think that "#" has to be >> escaped in mailto uris. If there weren't multiple query parameters in >> a mailto uri, one could simply make the user agent append the >> fragment >> part to the query parameter data to get around the contradiction, but >> that is not possible with multiple "#" parameters. > > Well, since frag ids are of no use in mailto URIs currently, if you > encounter "mailto:?subject=asdf#ghij&body=before#after", what do you > think the creator of the URI intended? For me, the creator obviously > meant "mailto:?subject=asdf%23ghij&body=before%23after" and could > not have meant anything else. That's only because you think there is no client-side role for a fragment on mailto, which is probably right today and most likely wrong eventually. I have no doubt that someone is going to write a javascript handler that does something funky based on the fragid in a mailto reference, eventually. > So, although # is invalid in a header field value, in the case of > mailto, it's obvious what the creator meant, imo. No, it isn't. > For mutliple # in the case above, if the first # starts a fragid for > mailto, and fragids in mailto URIs actually did something, then, I > would consider the fragid segment to just be > "#ghij&body=before#after", where the creator actually meant "#ghij > %26body%3Dbefore%23after". (Or, you can assume the creator meant > "mailto:?subject=asdf%23ghij&body=before#after" where the creator > meant the first # to be %23 and actually meant to use a fragid of > #after. But that's highly unlikely the creator meant that.) 
> > To be clear though, the concern I have is how to handle mailto URIs > where the creator meant %23, but used a raw # instead, because they > did it on accident or didn't know that it had to be encoded as %23. Actually, your concern is how to parse an invalid reference and transform it into something that is valid but may or may not be what the author intended. That is simple error handling and the "right" answer depends on whether your parser is a browser, a link checker, or something else. > You could even say that in all cases where you find a # in a mailto > URI, the creator meant %23. The only reason for UAs not to make that > assumption is so things don't get messed up in the future if fragid > support for mailto is actually defined and does something. > > That's my reasoning fwiw. But, if UAs should just chop off the maito > URI at the first # no matter what, then O.K., but that should be > explicitly mentioned. It should be explicitly mentioned by something, most likely a browser implementation spec for parsing arbitrary data as IRI references. It doesn't belong in the definition of the URI because the only interoperable string is the one with %23 where the # is used as data. Anything else is going to break at least one of the many forms of web components. ....Roy |
|
|
Re: '#' in mailto URIs
On Fri, 16 Oct 2009 04:28:08 -0400, Roy T. Fielding <fielding@...>
wrote: > On Oct 16, 2009, at 8:54 AM, Michael A. Puls II wrote: > >> On Thu, 15 Oct 2009 22:43:45 -0400, Silvia Pfeiffer >> <silviapfeiffer1@...> wrote: >>> The main problem that I see is where "#" is being used multiple times >>> in such a uri, e.g. >>> >>> mailto:?subject=asdf#ghij&body=before#after >>> >>> Per RFC3986, the first "#" creates the fragment, so the body is never >>> regarded as another query parameter. I would think that "#" has to be >>> escaped in mailto uris. If there weren't multiple query parameters in >>> a mailto uri, one could simply make the user agent append the fragment >>> part to the query parameter data to get around the contradiction, but >>> that is not possible with multiple "#" parameters. >> >> Well, since frag ids are of no use in mailto URIs currently, if you >> encounter "mailto:?subject=asdf#ghij&body=before#after", what do you >> think the creator of the URI intended? For me, the creator obviously >> meant "mailto:?subject=asdf%23ghij&body=before%23after" and could not >> have meant anything else. > > That's only because you think there is no client-side role > for a fragment on mailto, which is probably right today > and most likely wrong eventually. I have no doubt that someone > is going to write a javascript handler that does something funky > based on the fragid in a mailto reference, eventually. O.K., so you're saying that # has to be reserved for all URIs no matter what and no matter if it currently has any use for the scheme, because, it might have some use in the future for that scheme and we must not screw up that use-case? >> So, although # is invalid in a header field value, in the case of >> mailto, it's obvious what the creator meant, imo. > > No, it isn't. Thanks. It's good to hear everyone's interpretation on that. 
>> For mutliple # in the case above, if the first # starts a fragid for >> mailto, and fragids in mailto URIs actually did something, then, I >> would consider the fragid segment to just be "#ghij&body=before#after", >> where the creator actually meant "#ghij%26body%3Dbefore%23after". (Or, >> you can assume the creator meant >> "mailto:?subject=asdf%23ghij&body=before#after" where the creator meant >> the first # to be %23 and actually meant to use a fragid of #after. But >> that's highly unlikely the creator meant that.) >> >> To be clear though, the concern I have is how to handle mailto URIs >> where the creator meant %23, but used a raw # instead, because they did >> it on accident or didn't know that it had to be encoded as %23. > > Actually, your concern is how to parse an invalid reference > and transform it into something that is valid but may or may > not be what the author intended. That is simple error handling The simple error handling that I use for mailto parsers I've written is to treat # as %23 (and even normalize # to %23). As mentioned, the parsers of a lot of mail clients treat # as %23 too, which I think makes sense for error handling (since there's currently nothing they do with fragids). > and the "right" answer depends on whether your parser is a > browser, a link checker, or something else. 1. A browser that passes the URI to a mail client. Must it pass #value to the mail client or only pass everything before the first #? 2. A mail client. If it doesn't support any type of handling of a fragid, must it assume that # is %23, or must it chop off the URI at the first # before parsing? What are the right answers for those 2 specifically? >> You could even say that in all cases where you find a # in a mailto >> URI, the creator meant %23. The only reason for UAs not to make that >> assumption is so things don't get messed up in the future if fragid >> support for mailto is actually defined and does something. >> >> That's my reasoning fwiw. 
But, if UAs should just chop off the maito >> URI at the first # no matter what, then O.K., but that should be >> explicitly mentioned. > > It should be explicitly mentioned by something, most likely > a browser implementation spec for parsing arbitrary data as > IRI references. > > It doesn't belong in the definition of the URI because the > only interoperable string is the one with %23 where the # is > used as data. Anything else is going to break at least one > of the many forms of web components. So, what you're saying is that you don't want the mailto URI spec touching error handling with a 10ft pole and only want to assume that perfect URIs will be processed? -- Michael |
|
|
Re: '#' in mailto URIs
On Oct 16, 2009, at 11:41 AM, Michael A. Puls II wrote:
> On Fri, 16 Oct 2009 04:28:08 -0400, Roy T. Fielding
> <fielding@...> wrote:
>
>> On Oct 16, 2009, at 8:54 AM, Michael A. Puls II wrote:
>>
>>> On Thu, 15 Oct 2009 22:43:45 -0400, Silvia Pfeiffer
>>> <silviapfeiffer1@...> wrote:
>>>> The main problem that I see is where "#" is being used multiple
>>>> times in such a uri, e.g.
>>>>
>>>> mailto:?subject=asdf#ghij&body=before#after
>>>>
>>>> Per RFC3986, the first "#" creates the fragment, so the body is
>>>> never regarded as another query parameter. I would think that "#"
>>>> has to be escaped in mailto uris. If there weren't multiple query
>>>> parameters in a mailto uri, one could simply make the user agent
>>>> append the fragment part to the query parameter data to get around
>>>> the contradiction, but that is not possible with multiple "#"
>>>> parameters.
>>>
>>> Well, since frag ids are of no use in mailto URIs currently, if
>>> you encounter "mailto:?subject=asdf#ghij&body=before#after", what
>>> do you think the creator of the URI intended? For me, the creator
>>> obviously meant "mailto:?subject=asdf%23ghij&body=before%23after"
>>> and could not have meant anything else.
>>
>> That's only because you think there is no client-side role
>> for a fragment on mailto, which is probably right today
>> and most likely wrong eventually. I have no doubt that someone
>> is going to write a javascript handler that does something funky
>> based on the fragid in a mailto reference, eventually.
>
> O.K., so you're saying that # has to be reserved for all URIs no
> matter what and no matter if it currently has any use for the
> scheme, because, it might have some use in the future for that
> scheme and we must not screw up that use-case?

Yes.

>>> So, although # is invalid in a header field value, in the case of
>>> mailto, it's obvious what the creator meant, imo.
>>
>> No, it isn't.
>
> Thanks. It's good to hear everyone's interpretation on that.
>
>>> For multiple # in the case above, if the first # starts a fragid
>>> for mailto, and fragids in mailto URIs actually did something,
>>> then, I would consider the fragid segment to just be
>>> "#ghij&body=before#after", where the creator actually meant
>>> "#ghij%26body%3Dbefore%23after". (Or, you can assume the creator meant
>>> "mailto:?subject=asdf%23ghij&body=before#after" where the creator
>>> meant the first # to be %23 and actually meant to use a fragid of
>>> #after. But that's highly unlikely the creator meant that.)
>>>
>>> To be clear though, the concern I have is how to handle mailto
>>> URIs where the creator meant %23, but used a raw # instead,
>>> because they did it on accident or didn't know that it had to be
>>> encoded as %23.
>>
>> Actually, your concern is how to parse an invalid reference
>> and transform it into something that is valid but may or may
>> not be what the author intended. That is simple error handling
>
> The simple error handling that I use for mailto parsers I've written
> is to treat # as %23 (and even normalize # to %23). As mentioned,
> the parsers of a lot of mail clients treat # as %23 too, which I
> think makes sense for error handling (since there's currently
> nothing they do with fragids).

I think that is fine error handling for a current parser of
references (not URI/IRIs) that is not doing validation. A CMS
is going to have very different error handling in that case,
as will a third-party link checker, as will a spider that is
reading mailto links for an entirely different reason.

I am not saying that error handling is bad or cannot be standardized
within limited contexts. I am saying that it is not standard across
all contexts and thus cannot be defined for a standard that is, by
definition, context-free.

>> and the "right" answer depends on whether your parser is a
>> browser, a link checker, or something else.
>
> 1. A browser that passes the URI to a mail client. Must it pass
> #value to the mail client or only pass everything before the first #?
>
> 2. A mail client. If it doesn't support any type of handling of a
> fragid, must it assume that # is %23, or must it chop off the URI at
> the first # before parsing?
>
> What are the right answers for those 2 specifically?

It is not a URI, and the right answer depends on context. If it is
a data entry dialog (like location bar) then I would "internally
redirect" the reference so that it is rewritten with %23. If it is
in an href attribute, then I would not use the fragment portion
(i.e., ignore it). The reason is because this is not a common
scenario in existing content and it is far better to correct bad
content than to just assume you know what the author wanted.

>>> You could even say that in all cases where you find a # in a
>>> mailto URI, the creator meant %23. The only reason for UAs not to
>>> make that assumption is so things don't get messed up in the
>>> future if fragid support for mailto is actually defined and does
>>> something.
>>>
>>> That's my reasoning fwiw. But, if UAs should just chop off the
>>> mailto URI at the first # no matter what, then O.K., but that
>>> should be explicitly mentioned.
>>
>> It should be explicitly mentioned by something, most likely
>> a browser implementation spec for parsing arbitrary data as
>> IRI references.
>>
>> It doesn't belong in the definition of the URI because the
>> only interoperable string is the one with %23 where the # is
>> used as data. Anything else is going to break at least one
>> of the many forms of web components.
>
> So, what you're saying is that you don't want the mailto URI spec
> touching error handling with a 10ft pole and only want to assume
> that perfect URIs will be processed?

The role of the scheme spec is to define what is interoperable
for all consumers of the URI -- that's why we have a restricted
syntax. We don't need all consumers to behave the same way when
encountering an invalid reference, even if all browsers do behave
the same way (browsers are easily the smallest subset of Web
implementations with the least variance among them).

....Roy
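As an illustration of the context-dependent error handling Roy describes, here is a minimal Python sketch of the two strategies applied to the invalid reference from this thread. The function names `repair_as_data` and `drop_fragment` are hypothetical, chosen only for this example; neither is taken from any specification or mail client.

```python
def repair_as_data(reference: str) -> str:
    """Data-entry context (e.g. a location bar): assume every raw '#'
    was meant as data and rewrite it as %23."""
    return reference.replace("#", "%23")


def drop_fragment(reference: str) -> str:
    """href context: treat the first '#' as the start of a fragment
    and ignore the fragment portion entirely."""
    return reference.split("#", 1)[0]


ref = "mailto:?subject=asdf#ghij&body=before#after"
print(repair_as_data(ref))  # mailto:?subject=asdf%23ghij&body=before%23after
print(drop_fragment(ref))   # mailto:?subject=asdf
```

The two results differ, which is the point being made above: the same invalid reference has no single context-free repair.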
|
|
Re: '#' in mailto URIs
Thanks to everybody for their contributions to this discussion.

I have added

>>>>>>>>
Note that this specification, like any URI scheme specification, does
not define syntax or meaning of a fragment identifier, because these
depend on the type of a retrieved representation. In the currently
known usage scenarios, a 'mailto' URI cannot be used to retrieve
such representations. Therefore, fragment identifiers are meaningless,
SHOULD NOT be used on 'mailto' URIs, and SHOULD be ignored upon
resolution. The character '#' in hfvalues MUST be escaped as %23.
>>>>>>>>

to my internal version.

Regards, Martin.

On 2009/10/15 6:54, noah_mendelsohn@... wrote:
> Martin Dürst wrote:
>
>> The text that I might put in (if we think we need some) is:
>>
>> >>>>
>> Note that this specification, like any URI scheme specification, does
>> not define syntax or meaning of a fragment identifier, because these
>> depend on the media type of the retrieved resource. In the currently
>> known usage scenarios, a 'mailto' URI does not serve to retrieve a
>> resource with a media type. Therefore, fragment identifiers are
>> meaningless, SHOULD NOT be used on 'mailto' URIs, and SHOULD be ignored
>> upon resolution.
>> >>>>
>
> So, this reminds me of an aspect of RFC 3986 that I find surprising. It
> says [1] :
>
>> The fragment identifier component of a URI allows indirect
>> identification of a secondary resource by reference to a primary
>> resource and additional identifying information. The identified
>> secondary resource may be some portion or subset of the primary
>> resource, some view on representations of the primary resource, or
>> some other resource defined or described by those representations. A
>> fragment identifier component is indicated by the presence of a
>> number sign ("#") character and terminated by the end of the URI.
>>
>> fragment = *( pchar / "/" / "?" )
>>
>> The semantics of a fragment identifier are defined by the set of
>> representations that might result from a retrieval action on the
>> primary resource. The fragment's format and resolution is therefore
>> dependent on the media type [RFC2046] of a potentially retrieved
>> representation, even though such a retrieval is only performed if the
>> URI is dereferenced. If no such representation exists, then the
>> semantics of the fragment are considered unknown and are effectively
>> unconstrained. Fragment identifier semantics are independent of the
>> URI scheme and thus cannot be redefined by scheme specifications.
>
>
> What surprises me in the above is the specific reference to media types.
> If I hadn't read the above, I would have assumed that the Web worked
> something like this:
>
> * Resources are identified with URIs, each of which has a scheme
> * For some such URIs, protocols such as HTTP can be used to retrieve
> representations of the resource
> * For the representation to be usable, it will typically be necessary for
> the protocol to convey (explicitly or implicitly) the type of each such
> representation. In the case of HTTP, typing is done using media types
> [RFC 2046], but other protocols may use different typing schemes.
>
> The quote from RFC 3986 seems to imply that media types are the only
> supported typing mechanism for media types, regardless of the protocol
> used for retrieval. I understand that we are also trying to achieve a
> situation in which fragment identifier resolution is defined with respect
> to the type of the representation, not the URI scheme or retrieval
> protocol. Still, I would have thought it should say something like:
>
> "The semantics of a fragment identifier are defined by the set of
> representations that might result from a retrieval action on the
> primary resource. The fragment's format and resolution is therefore
> dependent on>the type< of a potentially retrieved representation>(media
> type [RFC2046] in the case of HTTP retrievals)<, even though such a
> retrieval is only performed if the URI is dereferenced.
>
> Martin: given what's in 3986, your specific reference to media type is OK,
> I guess, but it still feels strange to me in the context of mailto. I
> also find it somewhat more appropriate to speak of retrieving
> representations than retrieving resources. Therefore, I wonder whether it
> might be a little better to say (changes marked with>...<):
>
> ---Proposed---
> Note that this specification, like any URI scheme specification, does
> not define syntax or meaning of a fragment identifier, because these
> depend on the>type of a retrieved representation<. In the currently
> known usage scenarios, a 'mailto' URI>cannot be used to retrieve
> such representations<. Therefore, fragment identifiers are meaningless,
> SHOULD NOT be used on 'mailto' URIs, and SHOULD be ignored upon
> resolution.
> ---End Proposed---
>
> Noah
>
> [1] http://www.ietf.org/rfc/rfc3986.txt
>
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
>
> Larry Masinter<masinter@...>
> Sent by: uri-request@...
> 10/14/2009 01:31 PM
>
> To: "Martin J. Dürst"<duerst@...>, "Michael A.
> Puls II"<shadow@...>
> cc: "jwz@..."<jwz@...>, "PUBLIC-IRI@..."
> <PUBLIC-IRI@...>, (bcc: Noah Mendelsohn/Cambridge/IBM)
> Subject: RE: '#' in mailto URIs
>
>
> What about encouraging URI/IRI scheme registrations to
> say about whether fragment identifiers are necessary,
> important, useful, allowed.
>
> mailto: could then disallow # fragment identifiers.
>
> Larry
>
> -----Original Message-----
> From: "Martin J. Dürst" [mailto:duerst@...]
> Sent: Tuesday, October 13, 2009 9:37 PM
> To: Michael A. Puls II
> Cc: Larry Masinter; jwz@...
> Subject: Re: '#' in mailto URIs
>
> This is some very old mail. The current mailto: draft doesn't contain
> anything about fragment identifiers. Should it?
>
> The text that I might put in (if we think we need some) is:
>
> >>>>
> Note that this specification, like any URI scheme specification, does
> not define syntax or meaning of a fragment identifier, because these
> depend on the media type of the retrieved resource. In the currently
> known usage scenarios, a 'mailto' URI does not serve to retrieve a
> resource with a media type. Therefore, fragment identifiers are
> meaningless, SHOULD NOT be used on 'mailto' URIs, and SHOULD be ignored
> upon resolution.
> >>>>
>
> Regards, Martin.
>
> On 2008/04/02 6:32, Michael A. Puls II wrote:
>> On Tue, 01 Apr 2008 13:18:27 -0400, Larry Masinter<LMM@...> wrote:
>>
>>>> So, it sounds like, in short, you're saying that Safari and Firefox
>>>> shouldn't use # that way because it's reserved for future use in
>>>> mailto URIs.
>>>>
>>>> Perhaps you could explicitly note that in your next draft?
>>> It isn't reserved "for future use", it's just not allowed.
>> Martin said that # is *always* a fragment identifier. If it's not
>> allowed, ever, then you're saying that mailto URIs don't support
>> fragment identifiers and won't ever support fragment identifiers because
>> # is not allowed. (Which would make sense to me)
>>
>> If that's true, then a raw # that is found in a mailto URI (even though
>> it's not allowed) would not be anything special and could just be
>> accepted literally (if you were not going to throw an error).
>>
>> That would make sense to me.
>>
>> However, if mailto URIs support fragment identifiers or might support
>> fragment identifiers in the future, then # and everything after it in the
>> URI needs to be ignored (at least by the mail client itself when parsing
>> and filling in the compose fields).
>>
>> What I got from Martin's response is that mailto URIs (like http URIs)
>> support fragment identifiers. It's just that no client *currently* makes
>> use of them in any way for 'mailto'.
>>
>> Basically, I just need to be sure what to do with a raw # in a mailto
>> URI (even if it's an error).
>>
>>> Not every possible string has to have an interpretation.
>> I don't know what you mean by that sentence or what it pertains to.
>> Please clarify.
>>
>> Thanks
>>
>

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@...
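To make the rule "The character '#' in hfvalues MUST be escaped as %23" concrete, here is a small Python sketch of how a creator of mailto URIs might percent-encode header field values so that no raw '#' ever reaches the URI. The helper name `build_mailto` is hypothetical and exists only for this illustration; it relies on the standard `urllib.parse.quote` function and makes no claim to cover every detail of the mailto syntax.

```python
from urllib.parse import quote


def build_mailto(to: str, **hfields: str) -> str:
    """Build a mailto URI, percent-encoding every header field name and
    value so that reserved characters such as '#' and '&' become data."""
    query = "&".join(
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in hfields.items()
    )
    uri = "mailto:" + quote(to, safe="@")
    return uri + ("?" + query if query else "")


uri = build_mailto("user@example.com", subject="issue #42", body="before#after")
print(uri)
# mailto:user@example.com?subject=issue%20%2342&body=before%23after
```

Because every '#' is emitted as %23, a generic URI parser will not find a fragment in the result, whatever a given client decides to do with fragments.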
|
|
Re: '#' in mailto URIs
Terrific, thank you!

Noah

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Martin J. Dürst" <duerst@...>
10/16/2009 01:05 PM

To: noah_mendelsohn@...
cc: Larry Masinter <masinter@...>, "jwz@..." <jwz@...>, "PUBLIC-IRI@..." <PUBLIC-IRI@...>, "Michael A. Puls II" <shadow@...>
Subject: Re: '#' in mailto URIs

Thanks to everybody for their contributions to this discussion.

I have added

>>>>>>>>
Note that this specification, like any URI scheme specification, does
not define syntax or meaning of a fragment identifier, because these
depend on the type of a retrieved representation. In the currently
known usage scenarios, a 'mailto' URI cannot be used to retrieve
such representations. Therefore, fragment identifiers are meaningless,
SHOULD NOT be used on 'mailto' URIs, and SHOULD be ignored upon
resolution. The character '#' in hfvalues MUST be escaped as %23.
>>>>>>>>

to my internal version.

Regards, Martin.
|
|
Re: '#' in mailto URIs
On Fri, 16 Oct 2009 13:05:40 -0400, Martin J. Dürst <duerst@...> wrote:

> I have added
>
>>>>>>>>>
> Note that this specification, like any URI scheme specification, does
> not define syntax or meaning of a fragment identifier, because these
> depend on the type of a retrieved representation. In the currently
> known usage scenarios, a 'mailto' URI cannot be used to retrieve
> such representations. Therefore, fragment identifiers are meaningless,
> SHOULD NOT be used on 'mailto' URIs, and SHOULD be ignored upon
> resolution. The character '#' in hfvalues MUST be escaped as %23.
>>>>>>>>>
>
> to my internal version.

Thanks everyone. Here's what I'm going to take from this discussion:

Mailto parsing error handling in the mail client:

For "mailto:test#abc", "test" ends up in the To field.

For "mailto:?subject=1#2", "1" ends up in the subject field.

For "mailto:?body=before#after&subject=2", "before" ends up in the body field and all other fields are empty.

In short, it gets cut off at the first # before parsing. This will make it so creators of mailto URIs don't try to use # without encoding it to %23 (mailto URI creators and validators can help with this too of course). This will pave the way for the future so that if mail clients want to use a fragid for something, there's no legacy/broken handling of # getting in the way.

Browsers/UAs passing mailto links (via clickable link or user input) to mail clients:

For "mailto:test#abc", "mailto:?subject=1#2", and "mailto:?body=before#after&subject=2", the browser passes all of it. This will leave the handling up to the mail client and if a mail client in the future does something with fragment identifiers, the older browser will have no problem passing all of it to the new mail client. After all, the UA is just supposed to say to the mail client, "Here's what I was given, have at it!".

--
Michael
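The mail-client behaviour Michael settles on (cut the URI at the first # and only then parse it) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `parse_mailto` and the choice of the three fields `to`, `subject` and `body` are assumptions made for this example, not the behaviour of any particular mail client.

```python
from urllib.parse import unquote


def parse_mailto(uri: str) -> dict:
    """Parse a mailto URI the way the summary above describes:
    everything from the first '#' onward is discarded before parsing."""
    without_fragment = uri.split("#", 1)[0]
    scheme, _, rest = without_fragment.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError("not a mailto URI")
    to_part, _, query = rest.partition("?")
    fields = {"to": unquote(to_part), "subject": "", "body": ""}
    for pair in query.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        # Percent-decoding happens after the split, so %23 survives as '#'.
        fields[unquote(name).lower()] = unquote(value)
    return fields


print(parse_mailto("mailto:test#abc"))
# {'to': 'test', 'subject': '', 'body': ''}
print(parse_mailto("mailto:?subject=1#2"))
# {'to': '', 'subject': '1', 'body': ''}
print(parse_mailto("mailto:?body=before#after&subject=2"))
# {'to': '', 'subject': '', 'body': 'before'}
print(parse_mailto("mailto:?subject=1%232"))
# {'to': '', 'subject': '1#2', 'body': ''}
```

The last call shows the properly escaped form: a '#' written as %23 reaches the subject field intact, while a raw '#' truncates the URI.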