URL query component

The URL query component for URLs found in HTML (exact set still to be defined, I think) uses the page encoding when the page encoding is not utf-8/utf-16 (in which case it uses utf-8). E.g. "?€" maps to "?%80" in a windows-1252 encoded page.

Currently browsers differ for what happens when the code point cannot be encoded. E.g. for "?€" Opera uses "?". Internet Explorer uses "?" (but when the URL hits the network layer, not when you inspect it via script). WebKit uses "...;". Gecko encodes it using utf-8.

What Gecko does makes the resulting data impossible to interpret. What WebKit does is consistent with form submission. I like it. Also, given that encoding behavior is not exposed besides form submission and URLs, consistently using "...;" for code points not represented in legacy encodings makes sense to me.

Am I missing something?

--
Anne van Kesteren
http://annevankesteren.nl/
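To make the four behaviors concrete, here is a rough Python sketch. The function names, the mode names, and the snowman test character are illustrative assumptions, not anything browsers expose. It also assumes the utf-8 fallback re-encodes the whole query string, which the description above does not confirm.

```python
from urllib.parse import quote


def query_encoding(page_encoding: str) -> str:
    """Pages in utf-8/utf-16 use utf-8 for the query; others use the page encoding."""
    if page_encoding.lower() in ("utf-8", "utf-16"):
        return "utf-8"
    return page_encoding


def encode_query(text: str, page_encoding: str, on_error: str = "ncr") -> str:
    """Percent-encode query text, with a selectable fallback for unencodable code points.

    on_error (hypothetical mode names):
      "ncr"           - numeric character reference, the WebKit-style behavior
      "question-mark" - substitute "?", the Opera/IE-style behavior
      "utf-8"         - re-encode as utf-8, the Gecko-style behavior (assumed whole-string)
    """
    enc = query_encoding(page_encoding)
    if on_error == "ncr":
        raw = text.encode(enc, errors="xmlcharrefreplace")
    elif on_error == "question-mark":
        raw = text.encode(enc, errors="replace")
    elif on_error == "utf-8":
        try:
            raw = text.encode(enc)
        except UnicodeEncodeError:
            raw = text.encode("utf-8")
    else:
        raise ValueError(f"unknown mode: {on_error}")
    # "?" is left literal here so the substitution stays visible in the output.
    return quote(raw, safe="?")


if __name__ == "__main__":
    print("?" + encode_query("€", "windows-1252"))  # encodable in windows-1252
    for mode in ("ncr", "question-mark", "utf-8"):
        # U+2603 SNOWMAN has no windows-1252 representation
        print(mode, "?" + encode_query("\u2603", "windows-1252", mode))
```

Python's built-in `xmlcharrefreplace` error handler happens to produce the same decimal `&#NNNN;` form, which is why it serves as a stand-in here.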
Re: URL query component

On 2012-04-20 09:15, Anne van Kesteren wrote:
> Currently browsers differ for what happens when the code point cannot be encoded.
> What Gecko does [?%C2%A3] makes the resulting data impossible to interpret.
> What WebKit does [?%26%23163%3B] is consistent with form submission. I like it.

I do not! It makes the data impossible to recover just as Gecko does... in fact worse, because at least Gecko preserves ASCII. With the WebKit behaviour it becomes impossible to determine from a pure ASCII string '&#163;' whether the user really typed '&#163;' or '£' into the input field.

It has the advantage of consistency with the POST behaviour, but that behaviour is an unpleasant legacy hack which encourages a misunderstanding of HTML-escaping that promotes XSS vulns. I would not like to see it spread any further than it already has.

cheers,

--
And Clover
mailto:and@...
http://www.doxdesk.com/
gtalk:chat?jid=bobince@...
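The ambiguity can be reproduced with a small Python sketch. The helper name `webkit_style` is hypothetical, and windows-1251 is only an assumed example of a legacy encoding with no '£'; the thread does not say which page encoding the bracketed examples refer to.

```python
from urllib.parse import quote, unquote

LEGACY = "windows-1251"  # assumption: a legacy page encoding that lacks U+00A3


def webkit_style(text: str) -> str:
    """Encode with the page encoding, replacing unencodable code points by &#N; references."""
    return quote(text.encode(LEGACY, errors="xmlcharrefreplace"), safe="")


typed_pound = webkit_style("£")           # the user typed the pound sign itself
typed_reference = webkit_style("&#163;")  # the user typed the six ASCII characters

if __name__ == "__main__":
    print(typed_pound)
    print(typed_reference)
    # The server decodes both to the same ASCII text and cannot tell them apart.
    print(unquote(typed_pound, encoding=LEGACY))
```

Both inputs yield the same percent-encoded bytes, so the information about what was typed is lost before the request leaves the browser.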
Re: URL query component

On Fri, 20 Apr 2012 14:37:10 +0200, And Clover <and-py@...> wrote:
> On 2012-04-20 09:15, Anne van Kesteren wrote:
>> Currently browsers differ for what happens when the code point cannot
>> be encoded.
>> What Gecko does [?%C2%A3] makes the resulting data impossible to
>> interpret.
>> What WebKit does [?%26%23163%3B] is consistent with form submission. I
>> like it.
>
> I do not! It makes the data impossible to recover just as Gecko does...
> in fact worse, because at least Gecko preserves ASCII. With the WebKit
> behaviour it becomes impossible to determine from a pure ASCII string
> '&#163;' whether the user really typed '&#163;' or '£' into the input
> field.

You have the same problem with Gecko's behavior and multi-byte encodings. That's actually worse, since an erroneous three-byte sequence will put the multi-byte decoders off.

> It has the advantage of consistency with the POST behaviour, but that
> behaviour is an unpleasant legacy hack which encourages a
> misunderstanding of HTML-escaping that promotes XSS vulns. I would not
> like to see it spread any further than it already has.

It's both GET and POST. So really the only difference here is manually constructed URLs.

Also, I think we should flag all non-utf-8 usage. This is mostly about deciding behavior for legacy content, which will already be broken if it runs into this minor edge case.

--
Anne van Kesteren
http://annevankesteren.nl/
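A small Python sketch of the multi-byte concern, assuming a Shift_JIS page and a server that decodes the query with the page encoding. Both are illustrative choices rather than details from the thread, and the utf-8 fallback is modeled as re-encoding the whole string.

```python
from urllib.parse import quote, unquote_to_bytes

PAGE_ENCODING = "shift_jis"  # assumption: a multi-byte legacy page encoding
text = "a€b"                 # the euro sign has no Shift_JIS representation

# Gecko-style fallback (as modeled here): emit utf-8 bytes when the page
# encoding cannot represent the text.
try:
    raw = text.encode(PAGE_ENCODING)
except UnicodeEncodeError:
    raw = text.encode("utf-8")

query = quote(raw, safe="")

# A server expecting the page encoding feeds the three utf-8 bytes of the
# euro sign to a Shift_JIS decoder, which cannot produce the original text.
decoded = unquote_to_bytes(query).decode(PAGE_ENCODING, errors="replace")

if __name__ == "__main__":
    print(query)
    print(repr(decoded))
```

The decoder has no way to know that those three bytes belonged together as one utf-8 sequence, so the euro sign cannot come back out.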
Re: URL query component

On 2012-04-20 14:37, And Clover wrote:
> On 2012-04-20 09:15, Anne van Kesteren wrote:
>> Currently browsers differ for what happens when the code point cannot
>> be encoded.
>> What Gecko does [?%C2%A3] makes the resulting data impossible to
>> interpret.
>> What WebKit does [?%26%23163%3B] is consistent with form submission. I
>> like it.
>
> I do not! It makes the data impossible to recover just as Gecko does...
> in fact worse, because at least Gecko preserves ASCII. With the WebKit
> behaviour it becomes impossible to determine from a pure ASCII string
> '&#163;' whether the user really typed '&#163;' or '£' into the input
> field.
>
> It has the advantage of consistency with the POST behaviour, but that
> behaviour is an unpleasant legacy hack which encourages a
> misunderstanding of HTML-escaping that promotes XSS vulns. I would not
> like to see it spread any further than it already has.

+1

Indeed. I think this is a case where you want to fail early (for some value of "fail"); so maybe substituting with "?" makes most sense.

Do any servers *expect* the WebKit behavior? If they do so, why don't they just fix the pages they serve to use UTF-8 to get consistent behavior throughout?

Best regards, Julian
Re: URL query component

On Fri, 20 Apr 2012 15:19:10 +0200, Julian Reschke <julian.reschke@...> wrote:
> I think this is a case where you want to fail early (for some value of
> "fail"); so maybe substituting with "?" makes most sense.
>
> Do any servers *expect* the WebKit behavior? If they do so, why don't
> they just fix the pages they serve to use UTF-8 to get consistent
> behavior throughout?

Given that every browser does something different, I doubt anyone expects anything to work here.

Note that this is an edge case; form submission, both GET and POST, uses the "...;" pattern whenever an encoder error is emitted. This is solely about URL query parameters appearing as the string value of HTML attributes that take URLs. Given that it is such an edge case, using the same encoder behavior seems nice, as it means one code path less.

Having said that, if there are other places where we expose the encoder and where something other than "...;" or a fatal error is required, that would be very interesting to know. I have not been able to think of anything myself thus far.

--
Anne van Kesteren
http://annevankesteren.nl/
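As a loose illustration of the "one code path" argument, Python's standard library can apply one encoder configuration both to form-style serialization and to a hand-written query. The constants and the snowman character are assumptions for the sketch; browsers do not work this way internally, and form encoding differs in other details such as how spaces are written.

```python
from urllib.parse import quote, urlencode

PAGE_ENCODING = "windows-1252"     # assumed legacy page encoding
ON_ERROR = "xmlcharrefreplace"     # stand-in for the numeric character reference fallback

value = "\u2603"  # U+2603 SNOWMAN, not representable in windows-1252

# Form submission path: serialize name/value pairs.
form_query = urlencode({"q": value}, encoding=PAGE_ENCODING, errors=ON_ERROR)

# URL attribute path: a manually constructed query string.
attr_query = "q=" + quote(value, safe="", encoding=PAGE_ENCODING, errors=ON_ERROR)

if __name__ == "__main__":
    print(form_query)
    print(attr_query)
```

With the same encoding and the same error handling, both paths produce identical output for the unencodable character, so a server sees no difference between a submitted form and a hand-written link.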