Character Encoding and Jetty 7


by janb

Greg (lists in cc),

Jetty seems to have changed the character set it assumes by default for
form data to ISO-8859-1, whereas previously Jetty assumed UTF-8.

What was the reason for the change? A few people are reporting issues with
jetty-7 that are all due to the change in char-encoding defaults.
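To illustrate the kind of breakage being reported, here is a minimal sketch using only the JDK's java.net.URLDecoder (the Charset overload needs Java 10+). It is not Jetty code, and the class name is made up for illustration. The same percent-encoded bytes decode to different strings depending on which default charset the server assumes:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class FormCharsetMismatch {
    // Decode one URL-encoded form value, interpreting %XX bytes in the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // "caf" + U+00E9 as submitted from a UTF-8 page: U+00E9 becomes bytes C3 A9.
        String submitted = "caf%C3%A9";
        // Interpreted as UTF-8: "caf" + U+00E9, as the user typed it.
        System.out.println(decode(submitted, StandardCharsets.UTF_8));
        // Interpreted as ISO-8859-1: "caf" + U+00C3 + U+00A9, i.e. mojibake.
        System.out.println(decode(submitted, StandardCharsets.ISO_8859_1));
    }
}
```

Values like "cafÃ©" are the typical symptom when a browser sends UTF-8 but the server assumes ISO-8859-1.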

The HTML5 draft spec makes it clear that for URL-encoded form data the
default should be UTF-8: http://www.w3.org/TR/html5/forms.html#url-encoded-form-data
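For reference, a UTF-8 application/x-www-form-urlencoded body can be produced with the JDK's java.net.URLEncoder (Java 10+ for the Charset overload). This is a sketch of the encoding step only, not the full HTML5 form-submission algorithm, and encodeForm is a hypothetical helper:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class FormEncoder {
    // Serialize name/value pairs as an application/x-www-form-urlencoded body,
    // percent-encoding the UTF-8 bytes of every name and value.
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("name", "Jos\u00e9");      // U+00E9 -> %C3%A9
        fields.put("city", "S\u00e3o Paulo"); // U+00E3 -> %C3%A3, space -> '+'
        System.out.println(encodeForm(fields)); // name=Jos%C3%A9&city=S%C3%A3o+Paulo
    }
}
```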

Note that we have not changed our default character encoding for URLs in requests,
and still use UTF-8 as per http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars
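The UTF-8 convention for URLs referenced above (encode non-ASCII characters as UTF-8, then percent-escape each byte) is what java.net.URI applies in toASCIIString(), and URI.getPath() decodes escapes as UTF-8 as well. A minimal sketch; the host and helper names are placeholders:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UrlPathEncoding {
    // Build an http URI from a raw (unescaped) path and return its US-ASCII form.
    // java.net.URI escapes non-ASCII characters as percent-encoded UTF-8 bytes.
    static String toAsciiUri(String host, String rawPath) {
        try {
            return new URI("http", host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Parse an ASCII URI and return its path with %XX escapes decoded as UTF-8.
    static String decodedPath(String asciiUri) {
        return URI.create(asciiUri).getPath();
    }

    public static void main(String[] args) {
        String ascii = toAsciiUri("example.com", "/caf\u00e9");
        System.out.println(ascii);                                   // http://example.com/caf%C3%A9
        System.out.println(decodedPath(ascii).equals("/caf\u00e9")); // true
    }
}
```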

Do you think we should revert to UTF-8 for form data for jetty-7.0.0 final?

cheers
Jan
--
Jan Bartel, Webtide LLC | janb@... | http://www.webtide.com

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email



Re: Character Encoding and Jetty 7

by Greg Wilkins

Jan

I don't think this was done on purpose.
Do you know where the change was made or what is needed to change it
back to UTF-8?

regards



Re: Character Encoding and Jetty 7

by janb

Digging a little deeper, in fact it seems we made this change waaay back for
jetty-6.1.12.

Here are the two JIRA issues related to it:

http://jira.codehaus.org/browse/JETTY-633
http://jira.codehaus.org/browse/JETTY-853

I'm not sure why we decided to change to ISO-8859-1 in JETTY-633.
No clues in the commit comments as to the reason.

I seem to have commented in the later issue JETTY-853 that
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1
specifies a default encoding of ISO-8859-1. But looking at that section now,
it only stipulates ISO-8859-1 as the default for media subtypes of the "text"
type, so I'm not sure why I thought that was relevant to form data.
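The distinction in RFC 2616 section 3.7.1 can be written as a small lookup: an explicit charset parameter wins, text/* falls back to ISO-8859-1, and other media types such as application/x-www-form-urlencoded get no default from that section. The parser below is a simplified, hypothetical helper (it only strips double quotes and does not handle every quoted-string case), not Jetty's implementation:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

final class ContentTypeCharset {
    // Charset implied by a Content-Type header value (simplified parser):
    //  1. an explicit "charset" parameter wins;
    //  2. otherwise "text/*" defaults to ISO-8859-1 (RFC 2616, section 3.7.1);
    //  3. otherwise empty: that section defines no default for other types,
    //     e.g. application/x-www-form-urlencoded.
    static Optional<Charset> impliedCharset(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                // Strip optional double quotes; Charset.forName throws on unknown names.
                String name = param.substring(eq + 1).trim().replace("\"", "");
                return Optional.of(Charset.forName(name));
            }
        }
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        if (mediaType.startsWith("text/")) {
            return Optional.of(StandardCharsets.ISO_8859_1);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(impliedCharset("text/plain"));                        // Optional[ISO-8859-1]
        System.out.println(impliedCharset("application/x-www-form-urlencoded")); // Optional.empty
    }
}
```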

So in short, I don't know why we changed from UTF-8 to ISO-8859-1 in
the first place, but it has been that way since at least
jetty-6.1.12, so the change has some history to it.

The wiki page at http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings
seems to be documenting our earlier UTF-8 behaviour.

Do you want to keep it at ISO-8859-1 and I'll update the wiki page,
or do you want to change (back) to UTF-8?

cheers
Jan



Re: Character Encoding and Jetty 7

by Greg Wilkins

Jan,

I think we should change it back to UTF-8, as that will work with
almost all 8859 content as well.
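The overlap relied on here is the ASCII range: bytes 0x00-0x7F mean the same thing in ISO-8859-1 and UTF-8, while any byte 0x80 or above is decoded differently (and a lone one, such as 0xE9 for "é" in ISO-8859-1, is not valid UTF-8 at all). A sketch with the JDK's strict CharsetDecoder; the class and method names are illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Latin1Overlap {
    // True if the bytes are well-formed UTF-8 (strict: malformed input is an error,
    // not silently replaced with U+FFFD as new String(bytes, UTF_8) would do).
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if the bytes yield the same text under UTF-8 and ISO-8859-1,
    // which happens exactly when every byte is in the ASCII range 0x00-0x7F.
    static boolean decodesIdentically(byte[] bytes) {
        return isValidUtf8(bytes)
                && new String(bytes, StandardCharsets.UTF_8)
                        .equals(new String(bytes, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] ascii = "name=value".getBytes(StandardCharsets.ISO_8859_1);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1); // 63 61 66 E9
        System.out.println(decodesIdentically(ascii));  // true
        System.out.println(decodesIdentically(latin1)); // false: lone 0xE9 is not valid UTF-8
    }
}
```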

cheers



Re: Re: Character Encoding and Jetty 7

by jmcconnell

Why did this change in the first place? I seem to remember some
discussion around that time about the spec stating it ought to be 8859,
so it was switched.

I prefer UTF-8... just curious now.

jesse

--
jesse mcconnell
jesse.mcconnell@...



On Mon, Sep 7, 2009 at 22:46, Greg Wilkins <gregw@...> wrote:

> Jan,
>
> I think we should change it back to UTF-8, as that will work with
> almost all 8859 content as well.
>
> cheers
> _______________________________________________
> jetty-dev mailing list
> jetty-dev@...
> https://dev.eclipse.org/mailman/listinfo/jetty-dev
>



Re: Re: Character Encoding and Jetty 7

by dyu

For compatibility with the servlet spec (as users/devs expect it to be), we might want to stick with ISO-8859-1.
Hopefully the default encoding will be changed to UTF-8 in Servlet 3.0.

On Tue, Sep 8, 2009 at 10:57 PM, Jesse McConnell <jesse.mcconnell@...> wrote:
> Why did this change in the first place? I seem to remember some
> discussion around that time about the spec stating it ought to be 8859,
> so it was switched.
>
> I prefer UTF-8... just curious now.
>
> jesse


Re: Re: Character Encoding and Jetty 7

by Gregw

Jesse McConnell wrote:
> Why did this change in the first place? I seem to remember some
> discussion around that time about the spec stating it ought to be 8859,
> so it was switched.
>
> I prefer UTF-8... just curious now.


We used to not even have special 8859 handling; we just handled everything as UTF-8.
Then we added the 8859 handling and made that the default.

So now we have handling for both, but UTF-8 is used if no charset is
provided.
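The behaviour described above (honour an explicit charset, otherwise fall back to UTF-8) might look roughly like this for a URL-encoded body. This is a hypothetical sketch built on java.net.URLDecoder (Java 10+), not Jetty's actual form parser; decodeForm and effectiveCharset are made-up names:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FormDecoding {
    // An explicit charset from the request (e.g. its Content-Type) wins;
    // with none provided, fall back to UTF-8.
    static Charset effectiveCharset(String requestCharset) {
        return requestCharset == null ? StandardCharsets.UTF_8 : Charset.forName(requestCharset);
    }

    // Parse an application/x-www-form-urlencoded body into name -> values,
    // keeping repeated names and their order.
    static Map<String, List<String>> decodeForm(String body, String requestCharset) {
        Charset charset = effectiveCharset(requestCharset);
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2" and an empty body
            }
            int eq = pair.indexOf('=');
            String name = URLDecoder.decode(eq < 0 ? pair : pair.substring(0, eq), charset);
            String value = eq < 0 ? "" : URLDecoder.decode(pair.substring(eq + 1), charset);
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        // No charset provided: the two bytes %C3%A9 are read as UTF-8 (U+00E9).
        System.out.println(decodeForm("q=caf%C3%A9", null).get("q").get(0).equals("caf\u00e9"));      // true
        // Explicit ISO-8859-1: the single byte %E9 is read as U+00E9.
        System.out.println(decodeForm("q=caf%E9", "ISO-8859-1").get("q").get(0).equals("caf\u00e9")); // true
    }
}
```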

cheers
