On Fri, 20 Apr 2012 14:37:10 +0200, And Clover <and-py@...> wrote:
> On 2012-04-20 09:15, Anne van Kesteren wrote:
>> Currently browsers differ for what happens when the code point cannot
>> be encoded.
>> What Gecko does [?%C2%A3] makes the resulting data impossible to
>> What WebKit does [?%26%23163%3B] is consistent with form submission. I
>> like it.
> I do not! It makes the data impossible to recover just as Gecko does...
> in fact worse, because at least Gecko preserves ASCII. With the WebKit
> behaviour it becomes impossible to determine from a pure ASCII string
> '&#163;' whether the user really typed '&#163;' or '£' into the input
> field.
You have the same problem with Gecko's behavior and multi-byte encodings.
That's actually worse, since an erroneous three-byte sequence will put the
multi-byte decoders off.
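
For reference, a minimal Python sketch of the two behaviours being compared.
It is an approximation, not either engine's actual code: the function names
gecko_style and webkit_style are made up for illustration, and ISO-8859-2 is
assumed as the page encoding only because it has no '£'.

```python
from urllib.parse import quote

# Assumption: a legacy single-byte page encoding that cannot represent '£'.
PAGE_ENCODING = "iso-8859-2"


def gecko_style(text, encoding=PAGE_ENCODING):
    """Approximate the Gecko behaviour: characters the page encoding
    cannot represent are sent as percent-encoded UTF-8 bytes instead."""
    out = []
    for ch in text:
        try:
            raw = ch.encode(encoding)
        except UnicodeEncodeError:
            raw = ch.encode("utf-8")  # silent fallback to UTF-8
        out.append(quote(raw, safe=""))
    return "".join(out)


def webkit_style(text, encoding=PAGE_ENCODING):
    """Approximate the WebKit behaviour: unencodable characters become
    decimal numeric character references, then get percent-encoded."""
    raw = text.encode(encoding, errors="xmlcharrefreplace")
    return quote(raw, safe="")


print(gecko_style("£"))        # %C2%A3
print(webkit_style("£"))       # %26%23163%3B

# Both schemes are lossy: two different inputs produce the same query.
print(webkit_style("&#163;") == webkit_style("£"))  # True
print(gecko_style("ÂŁ") == gecko_style("£"))        # True
```

The last two lines show why neither form can be decoded reliably on the
server: in ISO-8859-2 the bytes C2 A3 are also the legitimate encoding of
'ÂŁ', and a literally typed '&#163;' is indistinguishable from an escaped
'£'.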
> It has the advantage of consistency with the POST behaviour, but that
> behaviour is an unpleasant legacy hack which encourages a
> misunderstanding of HTML-escaping that promotes XSS vulns. I would not
> like to see it spread any further than it already has.
It's both GET and POST. So really the only difference here is manually
Also, I think we should flag all non-utf-8 usage. This is mostly about
deciding behavior for legacy content, which will already be broken if it
runs into this minor edge case.