[XHR] Sending a Document with a mismatched encoding

[XHR] Sending a Document with a mismatched encoding

by Cameron McCormack


It seems to be possible to send a Document using XHR where the encoding
specified by the Content-Type charset parameter differs from the actual
encoding used to encode the serialisation.  For example:

  var r = new XMLHttpRequest();
  r.open("POST", "somewhere");
  // Declare US-ASCII, although the document below has a non-ASCII element name.
  r.setRequestHeader("Content-Type", "application/xml;charset=US-ASCII");
  var doc = document.implementation.createDocument(null, "á", null);
  r.send(doc);

Since passing a String to send() will cause the charset to be fixed up
to match the actual encoding used (UTF-8, in that case), shouldn’t
passing a Document to send() do the same?
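
To make the "fix up" idea concrete, here is a minimal sketch of rewriting
the charset parameter of a Content-Type value.  The helper name
fixUpCharset is hypothetical and not part of XHR or any browser API; it
only illustrates the kind of adjustment being asked about.  It splits
naively on ";", so it assumes no parameter value contains a quoted
semicolon.

```javascript
// Hypothetical helper, for illustration only (not an XHR API).
// Drops any existing charset parameter and appends one naming the
// encoding actually used for the request body.
function fixUpCharset(contentType, actualEncoding) {
  // Naive split: assumes no quoted parameter value contains ";".
  const parts = contentType
    .split(";")
    .map((p) => p.trim())
    .filter((p) => p !== "");
  const mediaType = parts[0];
  // Parameter names are case-insensitive, so match "charset" loosely.
  const params = parts.slice(1).filter((p) => !/^charset\s*=/i.test(p));
  params.push("charset=" + actualEncoding);
  return [mediaType, ...params].join(";");
}

// The header from the example above, corrected to the real encoding.
const fixedHeader = fixUpCharset("application/xml;charset=US-ASCII", "UTF-8");
```

With the header from the example, fixedHeader would be
"application/xml;charset=UTF-8".  Note that this sketch moves the charset
parameter to the end when other parameters are present.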

--
Cameron McCormack ≝ http://mcc.id.au/


Re: [XHR] Sending a Document with a mismatched encoding

by Cameron McCormack


Cameron McCormack:
> Since passing a String to send() will cause the charset to be fixed up
> to match the actual encoding used (UTF-8, in that case), shouldn’t
> passing a Document to send() do the same?

For that matter, how about defaulting to sending

  Content-Type: text/plain; charset=UTF-8

if a String is passed to send() without a Content-Type having been given
by setRequestHeader()?
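
As a rough sketch of that defaulting rule (the function name
effectiveContentType and the plain-object representation of the
author-set headers are assumptions made for illustration, not anything
defined by XHR):

```javascript
// Hypothetical sketch of the proposed default, not an actual XHR algorithm.
// `headers` is assumed to be a plain object of headers set by the author
// through setRequestHeader(); `body` is the argument passed to send().
function effectiveContentType(headers, body) {
  // Header names are case-insensitive.
  const key = Object.keys(headers).find(
    (k) => k.toLowerCase() === "content-type"
  );
  if (key !== undefined) {
    return headers[key]; // the author supplied one, so keep it
  }
  if (typeof body === "string") {
    return "text/plain; charset=UTF-8"; // the proposed default for Strings
  }
  return null; // no default is proposed here for other body types
}
```

For instance, effectiveContentType({}, "hello") would yield
"text/plain; charset=UTF-8", while a header set by the author is
returned unchanged.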

--
Cameron McCormack ≝ http://mcc.id.au/


Re: [XHR] Sending a Document with a mismatched encoding

by Boris Zbarsky


Cameron McCormack wrote:
> It seems to be possible to send a Document using XHR where the encoding
> specified by the Content-Type charset parameter differs from the actual
> encoding used to encode the serialisation.

For what it's worth, this got changed from Gecko 1.8 to Gecko 1.9.
Firefox 3 will tweak the charset parameter to correspond to the
inputEncoding of the Document being serialized (which is the encoding
used for the serialization).

We discovered that this will in fact cause some issues, since some
servers treat the data as UTF-8 no matter what headers we send, and
documents created via createDocument currently have a default
inputEncoding of ISO-8859-1 in Gecko.  We will likely change that to UTF-8...
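
To illustrate why that matters, here is a small sketch of what a receiver
that assumes UTF-8 would see if the root element name from the original
example were serialized as ISO-8859-1.  The encodeLatin1 helper is
hypothetical and written out by hand; TextEncoder and TextDecoder are the
standard Encoding API objects.

```javascript
// Hypothetical helper: encode a string as ISO-8859-1, one byte per
// character, accepting only code points up to U+00FF.
function encodeLatin1(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError("not representable in ISO-8859-1");
    }
    bytes[i] = code;
  }
  return bytes;
}

const rootName = "\u00E1"; // "á", the element name from the first message

const latin1Bytes = encodeLatin1(rootName);           // single byte 0xE1
const utf8Bytes = new TextEncoder().encode(rootName); // bytes 0xC3 0xA1

// A receiver that decodes as UTF-8 regardless of the declared charset:
const misread = new TextDecoder("utf-8").decode(latin1Bytes);
const correct = new TextDecoder("utf-8").decode(utf8Bytes);
```

Here misread comes out as the replacement character U+FFFD, because a
lone 0xE1 byte is not valid UTF-8, whereas correct round-trips to "á".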

Oh, and if there is no charset parameter to start with, Gecko will add
one for both Document and String arguments, just like you suggest in
your followup mail.

-Boris