Re: [EAI] [Fwd: AD review of draft-duerst-mailto-bis-06.txt]

by "Martin J. Dürst"

Hello Alex,

On 2009/08/04 20:54, Alexey Melnikov wrote:
> I believe I completed the todo item assigned to me during the Stockholm
> meeting.

Many thanks for your review. Very helpful.

 > as it seems to be blocking some EAI drafts.

Which? EAI seems to move forward nicely.

 > In Section 2:
 >
 > addr-spec = local-part "@" domain
 > local-part = dot-atom / quoted-string
 >
 > I don't think this change goes all the way to clarify that obsolete RFC
 > 5322 syntax and comments are disallowed.
 > RFC 5322:
 > domain = dot-atom / domain-literal / obs-domain
 >
 > domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
 >
 > dot-atom-text = 1*atext *("." 1*atext)
 >
 > dot-atom = [CFWS] dot-atom-text [CFWS]
 >
 > atom = [CFWS] 1*atext [CFWS]
 >
 > obs-domain = atom *("." atom)
 >
 > I think "obs-domain" and "domain-literal" definitions are problematic
 > (at least).

I changed 'domain' to simply be 'dot-atom'. I hope this works.
(not exactly my area of expertise)

 > Within 'mailto' URIs, the characters "?", "=", and "&" are reserved.
 >
 > "Reserved" in URI sense? If yes, I think this can be made clearer.

Changed the text to read:

Within 'mailto' URIs, the characters "?", "=", and "&" are reserved,
serving as delimiters. They must be escaped (as "%3F", "%3D", and "%26",
respectively) when not serving as delimiters.

But these are explained elsewhere in the spec, too, so that may now be
too much, and may get reduced again on proofreading (after careful
cross-checking).
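
As a rough illustration of the escaping rule above, here is a minimal Python sketch. The helper name build_mailto and the example address are made up for illustration and do not come from the draft; the sketch relies only on urllib.parse.quote from the standard library.

```python
from urllib.parse import quote


def build_mailto(address: str, **hfields: str) -> str:
    """Assemble a 'mailto' URI, escaping delimiters that occur inside values.

    Hypothetical helper for illustration; not taken from the draft.
    """
    # Keep "@" literal in the address; quote() percent-encodes everything
    # else outside the unreserved set, including "?", "=" and "&".
    uri = "mailto:" + quote(address, safe="@")
    if hfields:
        pairs = [
            quote(name, safe="") + "=" + quote(value, safe="")
            for name, value in hfields.items()
        ]
        # "?" and "&" appear unescaped only here, as real delimiters.
        uri += "?" + "&".join(pairs)
    return uri


print(build_mailto("a@example.com", subject="Q&A: why?", body="1+1=2"))
```

With these inputs, the "&", "?" and "=" inside the values come out as "%26", "%3F" and "%3D", while the "?", "=" and "&" that separate the header fields stay unescaped.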

 > 4. Percent-encoding can be used in the <domain> part of an <addr-
 > spec>, in order to denote an internationalized domain name. The
 > considerations for <reg-name> in [STD66] apply. In particular,
 > non-ASCII characters must
 >
 > s/must/MUST ?
 >
 > first be encoded according to UTF-8
 > [STD63], and then each octet of the corresponding UTF-8 sequence
 > must
 >
 > s/must/MUST ?
 >
 > be percent-encoded to be represented as URI characters. URI
 > producing applications must not
 >
 > s/must not/MUST NOT ?

Fixed all of those. Sometimes adjusted wording, sometimes upper-casing
and sometimes using clearly non-normative wording.

 > use percent-encoding in domain
 > names unless it is used to represent a UTF-8 character sequence.
 > When the internationalized domain name is used to compose a
 > message, the name must be transformed to the IDNA encoding where
 > appropriate [RFC3490]. URI producers should provide these domain
 > names in the IDNA encoding, rather than percent-encoded, if they
 > wish to maximize interoperability with legacy 'mailto' URI
 > interpreters.
 >
 > As per IRI bar BOF in Stockholm: this needs to be aligned with any
 > [potential] changes to the IRI spec.

Yes. I personally don't think we need to change this (except for some
more careful wording).
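
To make the two representations concrete, here is a small Python sketch of what a resolver might do with a percent-encoded domain when composing a message. The function name domain_for_message is hypothetical. Python's built-in "idna" codec implements RFC 3490 (IDNA 2003), so it matches the reference in the quoted text but not any later IDNA revision.

```python
from urllib.parse import unquote


def domain_for_message(uri_domain: str) -> str:
    """Convert a (possibly percent-encoded) URI domain to its IDNA form.

    Hypothetical helper for illustration; not taken from the draft.
    """
    # Step 1: undo the percent-encoding, insisting that the octets are UTF-8.
    unicode_domain = unquote(uri_domain, encoding="utf-8", errors="strict")
    # Step 2: apply the ToASCII operation of RFC 3490 label by label.
    return unicode_domain.encode("idna").decode("ascii")


print(domain_for_message("sh%C3%A4wn.com"))  # an "xn--" form of shäwn.com
print(domain_for_message("example.com"))     # plain ASCII passes through
```

Percent-encoded octets that are not valid UTF-8 raise an error in step 1, which corresponds to the rule that percent-encoding in domain names is only for UTF-8 character sequences.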


 > 5. Percent-encoding of non-ASCII octets in the <local-part> of an
 > <addr-spec> is reserved for the internationalization of the
 > <local-part>. Non-ASCII characters must
 >
 > s/must/MUST ?
 >
 > first be encoded
 > according to UTF-8 [STD63], and then each octet of the
 > corresponding UTF-8 sequence must
 >
 > s/must/MUST ?
 >
 > be percent-encoded to be
 > represented as URI characters. Any other percent-encoding of
 > non-ASCII characters is prohibited. When a <local-part>
 > containing non-ASCII characters will be used to compose a
 > message, the <local-part> must
 >
 > s/must/MUST ?
 >
 > be transformed to conform to
 > whatever encoding may be defined in a future specification for
 > the internationalization of email addresses.
 >
 > [...]
 >
 > Non-ASCII characters can be encoded in hfvalue as follows:
 > [...]
 >
 > 2. Non-ASCII characters can be encoded according to UTF-8 [STD63],
 > and then each octet of the corresponding UTF-8 sequence is
 > percent-encoded to be represented as URI characters. When header
 > field values encoded in this way are used to compose a message,
 > the <hfvalue> must
 >
 > s/must/MUST ?

Done (sometimes with wording changes).

 > be transformed into MIME encoded words
 > [RFC2047], except for an <hfvalue> of a "body" <hfname>, which
 > has to be encoded according to [RFC2045]. Please note that for
 > MIME encoded words and for bodies in composed email messages,
 > encodings other than UTF-8 MAY be used as long as the characters
 > are properly transcoded.
 >
 > [...]
 >
 > MIME encoded words and UTF-8-based percent-encoding SHOULD NOT both
 > be used sequentially in the same <hfvalue>, and MUST NOT be combined.
 >
 > Can you clarify what you are trying to say here?
 > In particular I am not clear on the meaning of "sequentially" here.

Ok. Sequentially means e.g. using MIME for the first word in the
subject, and UTF-8-based percent-encoding for the second word.

As for the "MUST NOT be combined", that either makes MIME completely
impossible ('?' and '=' used in MIME encoded words have to be reencoded,
but that isn't allowed) or leaves that provision hanging in the air ('?'
and '=' are US-ASCII, so UTF-8 is irrelevant when percent-encoding them)
depending on the interpretation of 'UTF-8'. So that has to be fixed.

First I was thinking about replacing the paragraph with something like:
"In mailto: URIs, UTF-8-based percent-encoding is preferred to MIME
encoded words because for the latter, the '=' and '?' characters have to
be percent-encoded."

But then that's also slightly inappropriate because MIME encoded words
may work in some old implementations where UTF-8 doesn't. Then I went
ahead and deleted that paragraph (because even 'sequential' mixing may
be okay assuming implementations peel off one encoding layer after the
other), and just inserted a short notice about the need to
percent-encode '=' and '?' in point 1. a few lines above.
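
The point about '=' and '?' and about peeling off one encoding layer after the other can be sketched in Python as follows. The function names and the sample word "café" are assumptions made for illustration. The sketch handles a value that is either one encoded word or plain UTF-8-based percent-encoding, and does not attempt the 'sequential' mixing case.

```python
from email.header import decode_header, make_header
from urllib.parse import quote, unquote

# RFC 2047 encoded word for "café" (illustrative sample value).
ENCODED_WORD = "=?utf-8?Q?caf=C3=A9?="


def hfvalue_from_encoded_word(word: str) -> str:
    """Percent-encode an encoded word so it can sit inside a 'mailto' URI."""
    # "=" and "?" are reserved in 'mailto' URIs, so they become %3D and %3F.
    return quote(word, safe="")


def decode_hfvalue(hfvalue: str) -> str:
    """Peel off the percent-encoding layer, then any encoded-word layer."""
    text = unquote(hfvalue, encoding="utf-8")       # layer 1: %-escapes
    return str(make_header(decode_header(text)))    # layer 2: RFC 2047


via_mime = hfvalue_from_encoded_word(ENCODED_WORD)
print(via_mime)
print(decode_hfvalue(via_mime))
print(decode_hfvalue("caf%C3%A9"))
```

Both routes decode to the same text; the encoded-word route is simply longer in the URI because every '=' and '?' has to be escaped.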


 > In Section 3:
 >
 > In current practice, resolving URIs such as those in the 'http' URI
 > scheme causes an immediate interaction between client software and a
 > host running an interactive server. The 'mailto' URI has unusual
 > semantics because resolving such a URI does not cause an immediate
 > interaction. Instead, the client creates a message to the designated
 > address with the various header fields set as default. The user can
 > edit the message, send this message unedited, or choose not to send
 > the message. The operation of how any URI scheme is resolved is not
 > mandated by the URI specifications.
 >
 > The last sentence doesn't seem to be related to the rest of the
 > paragraph. Should it be deleted or moved to a separate paragraph?

This sentence is giving the motivation for why the paragraph starts with
"in current practice" and why there isn't a more normative definition
along the lines of "to resolve a 'mailto' URI scheme, you MUST ...". So
the position of this sentence seems okay to me. If you have any proposal
for how to make this clearer, I'll be glad to use that.


 > In Section 4:
 >
 > The creator of a 'mailto' URI cannot expect the resolver of a URI to
 > understand more than the "subject" header field and "body".
 >
 > What about the "To" header field?

I don't know too much about actual implementations, but the fact that
what corresponds to 'To' is usually given before the '?' seems to suggest
to me that universal support for 'To' is neither necessary nor therefore
guaranteed.


 > Clients
 > that resolve 'mailto' URIs into mail messages MUST be able to
 > correctly create [RFC5322]-compliant mail messages using the
 > "subject" header field and "body".
 >
 > In Section 8:
 >
 > A 'mailto' URI gives a template for a message that can be sent by
 > mail client software. The contents of that template may be opaque or
 > difficult to read by the user at the time of specifying the URI.
 > Thus, a mail client should never send a message based on a 'mailto'
 >
 > s/should/SHOULD ?
 >
 > URI without first showing the full message that will be sent to the
 > user (including all header fields that were specified by the 'mailto'
 > URI), fully decoded, and asking the user for approval to send the
 > message as electronic mail. The mail client should also make it
 >
 > s/should/SHOULD
 >
 > clear that the user is about to send an electronic mail message,
 > since the user may not be aware that this is the result of a 'mailto'
 > URI.
 >
 > A mail client should never send anything without complete disclosure
 >
 > s/should/SHOULD
 >
 > to the user of what will be sent; it should disclose not only the
 >
 > s/should/SHOULD

Done.


 > message destination, but also any header fields. Unrecognized header
 > fields, or header fields with values inconsistent with those the mail
 > client would normally send should be especially suspect. MIME header
 > fields (MIME-Version, Content-*) are most likely inappropriate,
 > except when added by the MUA to correctly encode the text(s) being
 > sent, as are those relating to routing (From, Apparently-To, etc.)
 >
 >
 > 9. IANA Considerations
 >
 > This document changes the definition of the 'mailto' URI scheme; the
 > registry of URI schemes needs to be updated to refer to this document
 > rather than its predecessor, [RFC2368].
 >
 > It doesn't look like the proper URI registration template was ever
 > specified in this document or its predecessor.

Of course not in its predecessor, that was before we had any templates,
I guess. Anyway, I added a template, please have a look at it when I
post the draft.

Regards,    Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...


Re: [EAI] [Fwd: AD review of draft-duerst-mailto-bis-06.txt]

by "Martin J. Dürst"

Hello Alex, others,

Sorry to be late with this answer. Please check
draft-duerst-mailto-bis-07.txt (soon to appear) to see whether the
updates are according to your expectations.

On 2009/10/17 4:49, Alexey Melnikov wrote:

> Martin J. Dürst wrote:
>
>> Hello Alex,
>
> Hi Martin,
>
>> On 2009/08/04 20:54, Alexey Melnikov wrote:
>>
>>> I believe I completed the todo item assigned to me during the Stockholm
>>> meeting.
>>
>> Many thanks for your review. Very helpful.
>>
>> > as it seems to be blocking some EAI drafts.
>>
>> Which? EAI seems to move forward nicely.
>
> I think it was either the "mailing lists" or the "downgrade" document.

Ok.

>> > In Section 2:
>> >
>> > addr-spec = local-part "@" domain
>> > local-part = dot-atom / quoted-string
>> >
>> > I don't think this change goes all the way to clarify that obsolete RFC
>> > 5322 syntax and comments are disallowed.
>> > RFC 5322:
>> > domain = dot-atom / domain-literal / obs-domain
>> >
>> > domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
>> >
>> > dot-atom-text = 1*atext *("." 1*atext)
>> >
>> > dot-atom = [CFWS] dot-atom-text [CFWS]
>> >
>> > atom = [CFWS] 1*atext [CFWS]
>> >
>> > obs-domain = atom *("." atom)
>> >
>> > I think "obs-domain" and "domain-literal" definitions are problematic
>> > (at least).
>>
>> I changed 'domain' to simply be 'dot-atom'. I hope this works.
>> (not exactly my area of expertise)
>
> I think a version of "domain-literal" which doesn't allow CFWS/FWS is
> also needed. E.g. to allow IPv6.
> Ok, this might be obscure. But I think earlier versions allowed for that.

Can you (or somebody else) tell me exactly what to do? For me, all this
mail-related syntax is very unsure ground.
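
For illustration only, the two forms under discussion could be checked along the following lines in Python. The atext and dtext character sets are those of RFC 5322. The whitespace-free domain-literal pattern is an assumption about what such a rule might look like ("[" *dtext "]"), not text from the draft, and the function names are hypothetical.

```python
import re

# atext per RFC 5322, section 3.2.3.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# dot-atom-text = 1*atext *("." 1*atext), i.e. dot-atom without [CFWS].
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")

# dtext per RFC 5322: printable US-ASCII except "[", "]" and "\".
DTEXT = r"[\x21-\x5A\x5E-\x7E]"
# Assumed rule: "[" *dtext "]" with no CFWS or FWS anywhere.
PLAIN_DOMAIN_LITERAL = re.compile(rf"\[{DTEXT}*\]")


def is_dot_atom_text(domain: str) -> bool:
    """True if the domain is a bare dot-atom-text (no comments, no spaces)."""
    return DOT_ATOM_TEXT.fullmatch(domain) is not None


def is_plain_domain_literal(domain: str) -> bool:
    """True if the domain is a bracketed literal without any whitespace."""
    return PLAIN_DOMAIN_LITERAL.fullmatch(domain) is not None


for candidate in ["example.com", " example.com", "[IPv6:2001:db8::1]", "[ 192.0.2.1 ]"]:
    print(repr(candidate), is_dot_atom_text(candidate), is_plain_domain_literal(candidate))
```

This only checks the outer shape; it does not verify that the content of a literal is a well-formed IPv4 or IPv6 address.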



>> > 4. Percent-encoding can be used in the <domain> part of an <addr-
>> > spec>, in order to denote an internationalized domain name. The
>> > considerations for <reg-name> in [STD66] apply. In particular,
>> > non-ASCII characters must
>> >
>> > s/must/MUST ?
>> >
>> > first be encoded according to UTF-8
>> > [STD63], and then each octet of the corresponding UTF-8 sequence
>> > must
>> >
>> > s/must/MUST ?
>> >
>> > be percent-encoded to be represented as URI characters. URI
>> > producing applications must not
>> >
>> > s/must not/MUST NOT ?
>>
>> Fixed all of those. Sometimes adjusted wording, sometimes upper-casing
>> and sometimes using clearly non-normative wording.
>
> Ok.

Please check the newest draft.

>> > [...]
>> >
>> > MIME encoded words and UTF-8-based percent-encoding SHOULD NOT both
>> > be used sequentially in the same <hfvalue>, and MUST NOT be combined.
>> >
>> > Can you clarify what you are trying to say here?
>> > In particular I am not clear on the meaning of "sequentially" here.
>
>> Ok. Sequentially means e.g. using MIME for the first word in the
>> subject, and UTF-8-based percent-encoding for the second word.
>>
>> As for the "MUST NOT be combined", that either makes MIME completely
>> impossible ('?' and '=' used in MIME encoded words have to be
>> reencoded, but that isn't allowed) or leaves that provision hanging in
>> the air ('?' and '=' are US-ASCII, so UTF-8 is irrelevant when
>> percent-encoding them) depending on the interpretation of 'UTF-8'. So
>> that has to be fixed.
>>
>> First I was thinking about replacing the paragraph with something like:
>> "In mailto: URIs, UTF-8-based percent-encoding is preferred to MIME
>> encoded words because for the latter, the '=' and '?' characters have
>> to be percent-encoded."
>>
>> But then that's also slightly inappropriate because MIME encoded words
>> may work in some old implementations where UTF-8 doesn't. Then I went
>> ahead and deleted that paragraph (because even 'sequential' mixing may
>> be okay assuming implementations peel off one encoding layer after the
>> other), and just inserted a short notice about the need to
>> percent-encode '=' and '?' in point 1. a few lines above.
>
> Some explanation of this in the document might be useful though.

I have looked through the document again. I think that points 1. and 2.
(just after "Non-ASCII characters can be encoded in hfvalue as
follows:") are perfectly enough.

There's also an example of both using UTF-8 and encoded-word syntax at
the start of Section 6.3.

If you really think more is needed, then please propose some wording.


>> > In Section 3:
>> >
>> > In current practice, resolving URIs such as those in the 'http' URI
>> > scheme causes an immediate interaction between client software and a
>> > host running an interactive server. The 'mailto' URI has unusual
>> > semantics because resolving such a URI does not cause an immediate
>> > interaction. Instead, the client creates a message to the designated
>> > address with the various header fields set as default. The user can
>> > edit the message, send this message unedited, or choose not to send
>> > the message. The operation of how any URI scheme is resolved is not
>> > mandated by the URI specifications.
>> >
>> > The last sentence doesn't seem to be related to the rest of the
>> > paragraph. Should it be deleted or moved to a separate paragraph?
>>
>> This sentence is giving the motivation for why the paragraph starts
>> with "in current practice" and why there isn't a more normative
>> definition along the lines of "to resolve a 'mailto' URI scheme, you
>> MUST ...". So the position of this sentence seems okay to me. If you
>> have any proposal for how to make this clearer, I'll be glad to use that.
>
> It might be better to move this sentence to the beginning of this
> paragraph?

Done.


>> > In Section 4:
>> >
>> > The creator of a 'mailto' URI cannot expect the resolver of a URI to
>> > understand more than the "subject" header field and "body".
>> >
>> > What about the "To" header field?
>>
>> I don't know too much about actual implementations, but the fact that
>> what corresponds to 'To' is usually given before the '?' seems to
>> suggest to me that universal support for 'To' is neither necessary nor
>> therefore guaranteed.
>
> It might be good to do some research on this.
> If mailto URI parameters are handled at all, I would be expecting To and
> the address at the beginning to be always handled.

My expectation is different. The mailto scheme started just with the
part before the '?', nothing else. Later, "Subject", and even later,
"Body" were added, because they proved to be most useful (e.g. for
subscription/unsubscription links,...). From such a historic viewpoint,
"To" is least important, because it essentially just duplicates
functionality. It's also more difficult to implement correctly than the
average header field, because the data has to be merged.
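
To show what "the data has to be merged" means in practice, here is a minimal Python sketch of a resolver that combines the addresses before the '?' with a "to" header field. The function name parse_mailto and the returned dictionary layout are assumptions for illustration. Splitting the "to" value on commas after decoding is a simplification that would mishandle a quoted local part containing a comma.

```python
from urllib.parse import unquote


def parse_mailto(uri: str) -> dict:
    """Split a 'mailto' URI into recipients and other header fields.

    Hypothetical helper for illustration; not taken from the draft.
    """
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError("not a mailto URI")
    path, _, query = rest.partition("?")

    # Addresses before the "?" come first; split on literal commas, then decode.
    recipients = [unquote(addr) for addr in path.split(",") if addr]
    fields = {}
    for pair in filter(None, query.split("&")):
        name, _, value = pair.partition("=")
        name = unquote(name).lower()
        # unquote(), not unquote_plus(): "+" is a literal plus sign here.
        value = unquote(value)
        if name == "to":
            # Merge step: a "to" field adds to the addresses from the path.
            recipients.extend(a.strip() for a in value.split(",") if a.strip())
        else:
            fields[name] = value
    return {"to": recipients, **fields}


print(parse_mailto("mailto:a@example.com?to=b@example.com&subject=Hi%20there"))
```

"subject" and "body" are simply copied into the result, whereas "to" needs the extra merge step, which is the asymmetry described above.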

Of course if somebody can come up with some research that shows
something else, I'm glad to adjust the wording.


Regards,   Martin.


--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...


Re: [EAI] [Fwd: AD review of draft-duerst-mailto-bis-06.txt]

by "Martin J. Dürst"

[cc'ed to public-iri@...]

Hello Shawn,

Many thanks for your comments.

On 2009/10/27 4:26, Shawn Steele wrote:
> Currently lots of software seems to just allow Unicode in the mailto: (or actually converts it based on the encoding of the web page). I'm not going to argue whether that's "correct" or not, but it does seem to be the prevailing current practice.

Of course. These are mailto: IRIs. Nothing wrong with that, as far as I
understand. Actually, one main point of update from RFC 2368 is to allow
that, by allowing mailto: URIs to use UTF-8-based %-encoding.


> From the experience with IDN where non-ASCII values are being directly encoded in http, I think it's unrealistic that all mailto: URIs will be "correctly" % escaped. On the contrary, I think most probably won't be.

If by "not correctly escaped" you mean "not escaped", then that's just
fine, they are IRIs. If you meant "not escaped based on UTF-8, but based
on some other encoding", then that would be a problem.


> So I'd like the mailto-bis to allow that applications MAY recognize unescaped UTF-8 in UTF-8 documents.

It's not the mailto: URI scheme definition, but the spec for that
application that has to say whether you can use IRIs or not. Once you
can use IRIs, you can use unescaped UTF-8 in UTF-8 documents, unescaped
Shift_JIS in Shift_JIS documents (being converted to UTF-8 when
conversion to an URI is necessary), and so on.
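
The conversion described here can be sketched in a few lines of Python. The function names and the sample address are made up for illustration, and the sketch only covers the "decode from the document encoding, then percent-encode as UTF-8" step. It leaves out other parts of a full IRI-to-URI mapping, such as normalization or choosing the IDNA form for domains.

```python
from urllib.parse import quote


def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character as its UTF-8 octets."""
    # ASCII characters (including existing %-escapes) are left untouched.
    return "".join(ch if ord(ch) < 128 else quote(ch) for ch in iri)


def uri_from_document_bytes(raw: bytes, document_encoding: str) -> str:
    """Read an IRI from raw document bytes and convert it to a URI."""
    # The document's own encoding matters only for this decoding step;
    # the resulting URI is always based on UTF-8.
    return iri_to_uri(raw.decode(document_encoding))


iri = "mailto:user@日本.example"
print(uri_from_document_bytes(iri.encode("shift_jis"), "shift_jis"))
print(uri_from_document_bytes(iri.encode("utf-8"), "utf-8"))
```

Both calls yield the same URI, which is the point: the document encoding is an input detail, and the URI form does not depend on it.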

So I don't see the need for any textual changes. If you think some
textual changes are necessary, please send a more concrete proposal.

> -Shawn
>
> P.S: Yes, "lots" includes Microsoft software since it's easy to play with.  On my machine if I open "run", then type mailto:shäwn, then Outlook opens up with shäwn in the To: line.  Same thing happens if I stick it in an href in an HTML document.  I think I even tried it with a different browser (sorry, don't remember which one, don't have others installed at the moment).  Of course I couldn't actually send the mail, but the "mailto" part worked.

I'm not sure who at Microsoft writes the spec for the "run" command, and
whether this spec is publicly viewable or not, but essentially it seems
to treat input that looks like an IRI/URI as an IRI, and do the right
encoding conversion (I assume that internally, it uses UTF-16, not
UTF-8, or might even use some OEM encoding or whatever).

Also, while the HTML spec only allows IRI processing as error behavior,
implementations actually allow IRIs.

So everything is fine as far as I understand.

Regards,    Martin.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...


Re: [EAI] [Fwd: AD review of draft-duerst-mailto-bis-06.txt]

by "Martin J. Dürst"

[copying public-iri@...]

Hello Ted, others,

[everybody: There is a short summary at the end of this mail.]

On 2009/10/27 5:03, Ted Hardie wrote:

> On Mon, Oct 26, 2009 at 12:26 PM, Shawn Steele
> <Shawn.Steele@...>  wrote:
>>   On my machine if I open "run", then type mailto:shäwn, then Outlook
>> opens up with shäwn in the To: line.  Same thing happens if I stick it
>> in an href in an HTML document.  I think I even tried it with a
>> different browser (sorry, don't remember which one, don't have others
>> installed at the moment).  Of course I couldn't actually send the
>> mail, but the "mailto" part worked.
>
> For "the 'mailto' part" to be considered to have worked, the mail has
> to get end to end, at least in my opinion.

At first sight, this seems reasonable. But it's basically the same
difference as the difference between an http: URI/IRI that is
*syntactically* correct and one that *actually resolves* (i.e. does not
produce a 404 or something similar).

> The problem here is fundamentally that some part of the system has to
> take what the user thinks is correct and turn it into something that the
> mail system can deliver.  The more pieces we allow to contain "what
> the user thinks is correct" rather than "what the mail system can
> deliver", the further down into the system any translation between the
> two must occur.

There are two problems with mailto:shäwn:

1) There's no At-sign
2) In particular for LHS, shäwn only works under EAI

For your everyday email user agent, both of these mean that the mail
won't be sent.


> Unless "shäwn" is a valid email address, showing that in a protocol
> slot (which mailto is) seems like the wrong trade-off to me.

First, the current draft clearly says that an at-sign is needed, by
using <addr-spec> (http://tools.ietf.org/html/rfc5322#section-3.4.1).
So mailto:shäwn is not a valid IRI even according to an updated mailto:
spec. So to conform to the spec, we may have to try with something like
mailto:shawn@shäwn.com or so.

But the question is where this should be checked. An important principle
that many people often forget is that IRIs/URIs are just "carriers".
Because there are many schemes, it is completely unrealistic to expect a
generic IRI/URI handler to check these. So checking is up to the
application that "resolves" the mailto: IRI/URI, which is the mail user
agent.

Now what does the average mail user agent do if you put "shäwn" into a
To: field? I can only report about Thunderbird (Eudora version,
3.0b1pre). It absolutely has no problem putting "shäwn" into a To: field
(I have to admit that I actually tested with a Cc: field, but I don't
think there would be any difference for a To: field). Thinking about it,
that's quite understandable, it also shouldn't produce an error if I put
"Dürst" into such a field, in particular if I continue input with "
<duerst@...>". The "Dürst" will result in an encoded word,
but it would be weird to ask the user to input that encoded word, or to
show that encoded word to the user.

[The syntax proposed in the draft, as far as I understand, excludes
something like "name <lhs@rhs>" anyway. This is different in RFC 2368,
which I think allows this, but I got told that this wasn't actually
supported well, and RFC 2368's predecessor (RFC 1738) also didn't allow
it. (see http://tools.ietf.org/html/rfc1738#section-3.5)]

When putting "shäwn" into a Cc: field, what happened is that when I
tried to send the mail, there was an error message that to me suggested
that between the mail user agent and the server, the non-7bit byte
caused a connection abort. Not a very helpful message for a general
user, but not an issue for the mailto: spec. I have no idea whether
other mail user agents do better here (e.g. checking and telling the
user that the address isn't well-formed before they actually try to send
something).

On top of all the above thoughts, if we want to claim:

 > For "the 'mailto' part" to be considered to have worked, the mail has
 > to get end to end, at least in my opinion.

then things such as
mailto:nobody@...
would also not be valid mailto: IRIs/URIs. I hope you agree that it
doesn't make sense to actually send an email just to figure out whether
a mailto: URI/IRI is valid or not. So I don't think it makes sense to
make the existence of a mail address a precondition for the validity
of a mailto: URI.

Also, the draft does not contain any syntax restrictions for any of the
other fields (body, Subject:, To:, Cc:, Bcc:,...). So according to the
draft, there has to be an at-sign before the '?', but there is no check
for an at-sign in the "foo" part of mailto:a@...?cc=foo.


So in summary:

- It's the responsibility of the resolver (in this case the mail user
agent), not some generic IRI/URI software, to check for possible syntax
problems in the IRI/URI.

- There's a whole series of different cases ranging from a fully
workable example to a syntactically totally invalid example. Therefore,
problems may be detected (or show up) sooner or later.

- IDNA-aware slots, or EAI, may be a different protocol from old-style
SMTP without any extensions, but the former use (somewhat different)
protocol elements nevertheless.

- If and where interoperability can be achieved with a protocol element
that is closer to the user (i.e. an IDN, or an IRI,...), there is no
reason to use a lower-level protocol element (e.g. xn--... or lots of
%-escapes or a MIME encoded word, maybe with %-escapes on top of that).


> YMMV; offer not good in jurisdictions legislating the value of pi.

IMHO, offer not good even in jurisdictions that leave the value of pi to
mathematicians :-)


Regards,   Martin.


--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@...