HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?

Attached is 1251.html. It's a Windows-1251 Russian page. Load it and also see the source of it, please.

In Firefox, Safari and Opera, I get the following for the resolved .href value of the link. (IE8 just shows the value as it is in the source)

<mailto:?Subject=%D0%9C%D0%B0%D0%B9%D0%BE%D1%80%D1%83%20%D0%95%D0%B2%D1%81%D1%8E%D0%BA%D0%BE%D0%B2%D1%83%20%D0%BF%D1%80%D0%B5%D0%B4%D1%8A%D1%8F%D0%B2%D0%BB%D0%B5%D0%BD%D0%BE%20%D0%BE%D0%BA%D0%BE%D0%BD%D1%87%D0%B0%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D0%BE%D0%B5%20%D0%BE%D0%B1%D0%B2%D0%B8%D0%BD%D0%B5%D0%BD%D0%B8%D0%B5&Body=%D0%9C%D0%9E%D0%A1%D0%9A%D0%92%D0%90,%201%20%D1%81%D0%B5%D0%BD%D1%82%D1%8F%D0%B1%D1%80%D1%8F.%20%D0%9C%D0%B0%D0%B9%D0%BE%D1%80%D1%83%20%D0%94%D0%B5%D0%BD%D0%B8%D1%81%D1%83%20%D0%95%D0%B2%D1%81%D1%8E%D0%BA%D0%BE%D0%B2%D1%83,%20%D1%83%D1%81%D1%82%D1%80%D0%BE%D0%B8%D0%B2%D1%88%D0%B5%D0%BC%D1%83%2027%20%D0%B0%D0%BF%D1%80%D0%B5%D0%BB%D1%8F%20%D1%81%D1%82%D1%80%D0%B5%D0%BB%D1%8C%D0%B1%D1%83%20%D0%B2%20%D1%81%D1%83%D0%BF%D0%B5%D1%80%D0%BC%D0%B0%D1%80%D0%BA%D0%B5%D1%82%D0%B5%20%D0%BD%D0%B0%20%D1%8E%D0%B3%D0%B5%20%D0%9C%D0%BE%D1%81%D0%BA%D0%B2%D1%8B,%20%D0%BF%D1%80%D0%B5%D0%B4%D1%8A%D1%8F%D0%B2%D0%BB%D0%B5%D0%BD%D0%BE%20%D0%BE%D0%BA%D0%BE%D0%BD%D1%87%D0%B0%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D0%BE%D0%B5%20%D0%BE%D0%B1%D0%B2%D0%B8%D0%BD%D0%B5%D0%BD%D0%B8%D0%B5,%20%D0%BF%D0%B5%D1%80%D0%B5%D0%B4%D0%B0%D0%B5%D1%82%20%D0%98%D0%A2%D0%90%D0%A0-%D0%A2%D0%90%D0%A1%D0%A1.%20%0D%0A%D0%9F%D0%BE%D0%BB%D0%BD%D0%B0%D1%8F%20%D0%B2%D0%B5%D1%80%D1%81%D0%B8%D1%8F%20%D1%81%D1%82%D0%B0%D1%82%D1%8C%D0%B8%20%D0%BD%D0%B0>

However, should I instead be getting:

<mailto:?Subject=%CC%E0%E9%EE%F0%F3%20%C5%E2%F1%FE%EA%EE%E2%F3%20%EF%F0%E5%E4%FA%FF%E2%EB%E5%ED%EE%20%EE%EA%EE%ED%F7%E0%F2%E5%EB%FC%ED%EE%E5%20%EE%E1%E2%E8%ED%E5%ED%E8%E5&Body=%CC%CE%D1%CA%C2%C0,%201%20%F1%E5%ED%F2%FF%E1%F0%FF.%20%CC%E0%E9%EE%F0%F3%20%C4%E5%ED%E8%F1%F3%20%C5%E2%F1%FE%EA%EE%E2%F3,%20%F3%F1%F2%F0%EE%E8%E2%F8%E5%EC%F3%2027%20%E0%EF%F0%E5%EB%FF%20%F1%F2%F0%E5%EB%FC%E1%F3%20%E2%20%F1%F3%EF%E5%F0%EC%E0%F0%EA%E5%F2%E5%20%ED%E0%20%FE%E3%E5%20%CC%EE%F1%EA%E2%FB,%20%EF%F0%E5%E4%FA%FF%E2%EB%E5%ED%EE%20%EE%EA%EE%ED%F7%E0%F2%E5%EB%FC%ED%EE%E5%20%EE%E1%E2%E8%ED%E5%ED%E8%E5,%20%EF%E5%F0%E5%E4%E0%E5%F2%20%C8%D2%C0%D0-%D2%C0%D1%D1.%20%0D%0A%CF%EE%EB%ED%E0%FF%20%E2%E5%F0%F1%E8%FF%20%F1%F2%E0%F2%FC%E8%20%ED%E0>

in browsers, according to HTML5/web addresses/iri-bis? (That's what I'd get in browsers if it was an http link instead)

The reason I ask is that for 'http', browsers use the document's charset to resolve the link into a URI, but with 'mailto', they always (by default at least) force UTF-8 in this case (which makes things a lot easier for passing the data to webmails and other mail clients, which usually want percent-encoded utf-8 to decode).

What do HTML5/web addresses/iri-bis say about this exactly? Do they allow Firefox, Opera and Safari to do what they do, or do they say that the resolving is like http for all protocols?

Or, is this undefined and the browser does what it wants?

Thanks

--
Michael
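For readers who want to reproduce the two forms, here is a minimal Python sketch. It is not taken from the thread; it only assumes that the first word of the Subject is "Майору", which is what the escapes above decode to, and it uses the standard library's urllib.parse.quote to percent-encode that word from UTF-8 bytes and from Windows-1251 bytes.

```python
from urllib.parse import quote

# First word of the Subject in the example link (decoded from the escapes above).
subject = "Майору"

# Percent-encode the same text starting from two different byte encodings.
utf8_form = quote(subject, encoding="utf-8")
cp1251_form = quote(subject, encoding="windows-1251")

print(utf8_form)    # %D0%9C%D0%B0%D0%B9%D0%BE%D1%80%D1%83
print(cp1251_form)  # %CC%E0%E9%EE%F0%F3
```

The first output matches the start of the link that Firefox, Safari and Opera produce; the second matches the start of the document-encoding variant.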
|
|
Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?

On Sat, 05 Sep 2009 13:04:20 +0200, Michael A. Puls II <shadow2531@...> wrote:
> What do HTML5/web addresses/iri-bis say about this exactly? Do they allow
> Firefox, Opera and Safari to do what they do, or do they say that the
> resolving is like http for all protocols?

It seems to be scheme-independent at this point. I guess it should be specific to http/https but someone should verify this to be sure.

--
Anne van Kesteren
http://annevankesteren.nl/
|
|
Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?

Hello Michael,

Many thanks for this example. I hope Anne can do some checks on the HTML5 side. I just tried your example in Opera 10, and it gave the UTF-8 based URI when I asked for 'copy link address'. I also clicked on the link and asked it to use my default MUA (Thunderbird with Eudora), and I got a draft email with legible text (Moscow at the start, and ITAR-TASS, that's about how much Russian I read).

It makes quite a bit of sense to limit the special processing for query parts (reencode back to the document encoding) to http/https. The reason for this special processing in the first place is that it is customary in Web forms (submitted with http/https) to use the encoding of the page the form is in for query parameters, and this custom was transferred to direct activation of links with query parts.

For actually submitting a form, what happens isn't part of the IRI or URI spec, but part of the preparation; if the form submission URI/IRI had a query part, it's either ignored or the data is inserted into the form fields (don't know which one actually applies), but either way, there is no need to reencode the data, it's just a matter of saying what bytes you send to the server from the form.

Regards, Martin.

On 2009/09/05 20:04, Michael A. Puls II wrote:
> Attached is 1251.html. It's a Windows-1251 russian page. Load it and
> also see the source of it, please.
>
> In Firefox, Safari and Opera, I get the following for the resolved .href
> value of the link.
> [...]

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@...
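The scheme-dependent behavior discussed in this message can be sketched roughly as follows. This is an illustration of the idea only, not the algorithm of any specification or browser: the helper name percent_encode_href, the set DOC_ENCODING_SCHEMES and the example.org address are all assumptions, and fragments are ignored for brevity.

```python
from urllib.parse import quote, urlsplit

# Assumption for illustration: only these schemes use the page encoding for the query.
DOC_ENCODING_SCHEMES = {"http", "https"}

# Keep URI delimiters and already-present percent escapes untouched.
SAFE = ":/?#[]@!$&'()*+,;=%"


def percent_encode_href(href: str, doc_encoding: str) -> str:
    """Hypothetical helper: percent-encode the non-ASCII characters of href.

    The part before '?' is always encoded from UTF-8. The query part is encoded
    from the document encoding for http/https and from UTF-8 for other schemes.
    """
    head, sep, query = href.partition("?")
    scheme = urlsplit(href).scheme.lower()
    query_encoding = doc_encoding if scheme in DOC_ENCODING_SCHEMES else "utf-8"
    return (
        quote(head, safe=SAFE, encoding="utf-8")
        + sep
        + quote(query, safe=SAFE, encoding=query_encoding)
    )


mailto_result = percent_encode_href("mailto:?Subject=Майору", "windows-1251")
http_result = percent_encode_href("http://example.org/?q=Майору", "windows-1251")
print(mailto_result)  # mailto:?Subject=%D0%9C%D0%B0%D0%B9%D0%BE%D1%80%D1%83
print(http_result)    # http://example.org/?q=%CC%E0%E9%EE%F0%F3
```

With this split, a mailto: link on a Windows-1251 page ends up UTF-8 based throughout, while an http link on the same page keeps the form-submission convention for its query part.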
|
|
Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?

On Thu, 10 Sep 2009 11:28:14 +0200, Martin J. Dürst <duerst@...> wrote:
> Many thanks for this example. I hope Anne can do some checks on the
> HTML5 side.

http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#urls has the HTML5 rules from when this was still in the HTML5 specification. As far as I can tell, the encoding of <query> was done irrespective of the scheme per that specification. Someone should probably study implementations to see if this should be changed to just affect http/https or more.

--
Anne van Kesteren
http://annevankesteren.nl/
|
|
Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?

On Thu, 10 Sep 2009 05:28:14 -0400, Martin J. Dürst <duerst@...> wrote:
> Hello Michael,
> Many thanks for this example. I hope Anne can do some checks on the
> HTML5 side. I just tried your example in Opera 10, and it gave the UTF-8
> based URI when I asked for 'copy link address'. I also clicked on the
> link and asked it to use my default MUA (Thunderbird with Eudora), and I
> got a draft email with legible text (Moscow at the start, and ITAR-TASS,
> that's about how much Russian I read).

Thanks.

Yes, I get that utf-8 behavior in Firefox and Safari also. I think things are more interoperable that way.

However, my concern is that HTML5 (well, the iri/uri spec additions for HTML5) contradicts that and says to use the page's encoding instead. I do not feel that is a good idea for some schemes.

Also, for 'mailto:' links in web pages, I want to specifically avoid the part before '?' and the part after '?' being resolved against different encodings. For mailto:, that would be undesirable and would force authors to use "mailto:?to=value" instead of "mailto:value" so that the to value is resolved against the same encoding as the other values (like subject and body etc.). But, using mailto:?to= still isn't supported as well as mailto:value, so that'd be bad too.

For mailto links in html pages, I think the resolving should always be (by default at least) utf-8 all the way through. (So that the .href getter on a link and copy link address etc. all return something utf-8-based regardless of the page's encoding). This is basically what browsers do now. Just want to make sure the specs don't contradict that, as browsers do it that way for a reason.

For mailto in HTML forms, I don't have too much preference as no one uses it.

I also think that for javascript:, it's probably best to always resolve to percent-encoded utf-8 too.

Also, if I remember correctly, it was desired that http(s) in HTML5 pages be utf-8-only, but that wasn't possible for legacy reasons. I don't think mailto: and some other schemes have that constraint.

With that said, as Anne said, maybe using the page encoding should only be a must for http(s), and other protocols may ignore the page's encoding and resolve to percent-encoded UTF-8.

Now, if JS in browsers had an iconv() so that you could easily convert to what you want, and browsers had options to control the encoding, per-protocol, for .href etc., per-site, then maybe it wouldn't matter. But, for now, just always using utf-8 for some schemes makes things consistent and allows that expectation to be relied upon.

Now, I'm not 100% sure what iri-bis/HTML5 says about this. It's really low-level, which is why I'm asking for clarification (which Larry said he'd respond to when he gets a chance).

--
Michael
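To make the iconv() remark concrete, here is a small Python sketch of the kind of conversion meant: taking a value that was percent-encoded from one charset and re-expressing it as percent-encoded UTF-8. The function name transcode_percent_encoded is hypothetical, and the sketch assumes the caller already knows the source charset, which is exactly the information a mail client usually lacks.

```python
from urllib.parse import quote, unquote


def transcode_percent_encoded(value: str, source_encoding: str,
                              target_encoding: str = "utf-8") -> str:
    """Hypothetical helper: re-encode a percent-encoded component value.

    The escapes in value are decoded as source_encoding bytes, then the
    resulting text is percent-encoded again from target_encoding bytes.
    """
    text = unquote(value, encoding=source_encoding)
    # safe="" so that every reserved character in the value is escaped too.
    return quote(text, safe="", encoding=target_encoding)


converted = transcode_percent_encoded("%CC%E0%E9%EE%F0%F3", "windows-1251")
print(converted)  # %D0%9C%D0%B0%D0%B9%D0%BE%D1%80%D1%83
```

Without such a step on the receiving side, always resolving mailto: to UTF-8 is what lets the recipient decode the value without guessing.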
|
|
Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?

Hello Michael,

On 2009/09/11 7:24, Michael A. Puls II wrote:
> [...]
> For mailto links in html pages, I think the resolving should always be
> (by default at least) utf-8 all the way through. (So that the .href
> getter on a link and copy link address etc. all return something
> utf-8-based regardless of the page's encoding). This is basically what
> browsers do now. Just want to make sure the specs don't contradict that,
> as browsers do it that way for a reason.

I agree, and I haven't found anybody who disagrees yet. If that stays as it is, I'll make sure that the spec says what it should say on that point.

Regards, Martin.

> [...]

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@...
|
|
What schemes take query parts? (was: Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?)

Dear URI experts,

[I have copied the URI mailing list because I hope to get some information from there.]

In the context of HTML5-specific treatment of query parts in IRIs/URIs (using the document encoding rather than UTF-8 when converting non-ASCII characters to %-encoding), Michael A. Puls II recently reported that such behavior should not apply to mailto: URIs.

Now we are trying to figure out what happens, or what's appropriate, for other kinds of URI schemes. In particular, we also want to know which schemes do not take query parameters (e.g. data, ftp). Or it may be easier to pose the question the other way round: Which schemes do take query parts (we know of http, https, and mailto)?

For the schemes that take query parts, we would like to know whether these parts are restricted to fixed parameters and values or whether they can contain natural-language (and therefore potentially non-ASCII) data (even if that is encoded with %-escaping), and in the latter case, whether there are any encoding conventions for that query part (UTF-8, document encoding, ...).

Many thanks in advance for your help.

Regards, Martin.

On 2009/09/10 18:45, Anne van Kesteren wrote:
> On Thu, 10 Sep 2009 11:28:14 +0200, Martin J. Dürst
> <duerst@...> wrote:
>> Many thanks for this example. I hope Anne can do some checks on the
>> HTML5 side.
>
> http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#urls has
> the HTML5 rules from when this was still in the HTML5 specification. As
> far as I can tell the encoding <query> was done irrespective of the
> scheme per that specification. Someone should probably study
> implementations to see if this should be changed to just affect
> http/https or more.

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@...
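As a side note to the question above, a generic parser will split off a query component for any scheme, whether or not that scheme's own specification defines one. The Python sketch below only demonstrates this generic splitting with the standard library's urlsplit; the example addresses are made up for illustration, and the result says nothing about what each scheme's specification actually allows.

```python
from urllib.parse import urlsplit

# Hypothetical example addresses, one per scheme mentioned in the thread.
examples = [
    "http://example.org/search?q=test",
    "mailto:?Subject=Hello",
    "xmpp:user@example.org?message;subject=Hello",
    "data:text/plain,hello",
]

# Generic split: everything between the first '?' and an optional '#'.
queries = {url: urlsplit(url).query for url in examples}

for url, query in queries.items():
    print(f"{url!r}: query={query!r}")
```

So the generic syntax alone cannot answer the question; the per-scheme specifications have to be consulted, which is what the request to the list is about.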
|
|
Re: What schemes take query parts? (was: Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?)

XMPP [RFC4622] has a query part.

   iquerycomp = iquerytype [ *ipair ]
   iquerytype = *iunreserved
   ipair      = ";" ikey "=" ivalue
   ikey       = *iunreserved
   ivalue     = *( iunreserved / pct-encoded )

likewise IMAP [RFC5092]

   If the "?<enc-search>" field is present, the program interpreting the URL should use the contents of this field as arguments following an IMAP4 SEARCH command. These arguments are likely to contain unsafe characters such as " " (space) (which are likely to be present in the <enc-search>). If unsafe characters are present, they MUST be percent-encoded as described in [URI-GEN].

   Note that quoted strings and non-synchronizing literals [LITERAL+] are allowed in the <enc-search> content; however, synchronizing literals are not allowed, as their presence would effectively mean that the agent interpreting IMAP URLs needs to parse an <enc-search> content, find all synchronizing literals, and perform proper command continuation request handling (see Sections 4.3 and 7 of [IMAP4]).

Tom Petch

----- Original Message -----
From: "Martin J. Dürst" <duerst@...>
To: "Anne van Kesteren" <annevk@...>
Cc: "Michael A. Puls II" <shadow2531@...>; <public-iri@...>; <uri@...>
Sent: Friday, September 11, 2009 10:16 AM
Subject: What schemes take query parts? (was: Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?)

[...]
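The XMPP query grammar quoted in this message (a query type followed by ";key=value" pairs) can be read with a few lines of Python. This is a rough sketch, not a validating parser: the function name parse_xmpp_query and the sample query are made up for illustration, and decoding the percent escapes as UTF-8 is an assumption of the sketch rather than something stated in the excerpt above.

```python
from urllib.parse import unquote


def parse_xmpp_query(query: str):
    """Hypothetical helper: split 'querytype;key=value;...' into its parts."""
    querytype, *pairs = query.split(";")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        # Assumption: percent escapes in values represent UTF-8 bytes.
        params[key] = unquote(value, encoding="utf-8")
    return querytype, params


parsed = parse_xmpp_query("message;subject=Test%20Message")
print(parsed)  # ('message', {'subject': 'Test Message'})
```

Because ivalue allows pct-encoded data, such a query part can carry natural-language text, which is the case the original question asks about.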