
|
Encoding issues on Linux
Hi,
I've run into encoding issues on a SUSE Linux box when entering non-ASCII characters (Swedish ones, such as åäö) in task names. To solve this I had to set the LANG env variable to a UTF-8 locale ("sv_SE.utf8") in the Nexus start script. Should I really have to do this?
My thinking is that since this is stored in an XML file that is declared as UTF-8 encoded, Nexus should handle this correctly anyway. Thoughts on this?
I've done some testing and it looks like a bug. When the LANG variable is set to an ISO-8859-1 locale, the characters seem to be written correctly as UTF-8 to the XML file, but are then read back incorrectly (as ISO-8859-1, I believe) from that file.
If the LANG variable is not set (no effective LANG setting at all), the Swedish/special characters are written as two question marks in the XML file.
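The two symptoms described above can be reproduced in isolation. The following is only a sketch of one plausible mechanism, not a confirmed account of what Nexus does internally: it assumes the UTF-8 bytes of a character are decoded with the wrong charset somewhere along the way. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // UTF-8 bytes decoded as ISO-8859-1: every byte becomes one (wrong) character.
    static String readAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // UTF-8 bytes decoded as US-ASCII and then written out as US-ASCII again.
    static String throughAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Bytes above 0x7F are not valid ASCII, so each one decodes to U+FFFD.
        String decoded = new String(utf8, StandardCharsets.US_ASCII);
        // U+FFFD cannot be encoded in ASCII, so each one is written as '?'.
        byte[] written = decoded.getBytes(StandardCharsets.US_ASCII);
        return new String(written, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String a = "\u00e5"; // Swedish "a with ring", two bytes in UTF-8
        System.out.println(readAsLatin1(a)); // two garbage characters
        System.out.println(throughAscii(a)); // two question marks
    }
}
```

If this assumption holds, a single Swedish character would show up as two question marks because it occupies two bytes in UTF-8.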
/Anders
|

|
Re: Encoding issues on Linux
Hi there,
SUSE has defaulted to UTF-8 since 9.x, if I remember correctly (maybe 10.x?). The JVM detects the environment settings and uses the corresponding encoding. For the REST server <-> web browser communication, I think the default should be UTF-8 (look at index.html), but maybe this is not true...
It is possible that our UI creates REST calls that do _not_ specify an encoding (even when they carry intl characters), and that the server side then defaults to something other than UTF-8?
Could you send us some "wire logs"? (using Firebug, or some similar tool, to catch server-UI communication)
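To make the suspected failure mode concrete: a server that receives a request without a charset has to pick one, and the pick matters. The sketch below shows one way to read the charset parameter from a Content-Type header value and fall back to UTF-8 when it is missing. `RequestCharset` and `fromContentType` are hypothetical names, and this is not how Nexus or its REST layer is known to behave.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class RequestCharset {

    // Hypothetical helper: returns the charset named in a Content-Type header
    // value, or UTF-8 when the header is absent or names no usable charset.
    static Charset fromContentType(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .replace("\"", "")
                                   .trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("application/json"));
        System.out.println(fromContentType("application/json; charset=ISO-8859-1"));
    }
}
```

The point of the explicit fallback is that the result no longer depends on the platform default of the machine the server runs on.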
Thanks, ~t~
On Wed, Sep 9, 2009 at 1:56 PM, Anders Hammar <anders@...> wrote:
|

|
Re: Encoding issues on Linux
I'll look into this.
/Anders
2009/9/9 Tamás Cservenák <tamas@...>
|

|
Re: Encoding issues on Linux
Ok, we're having trouble reproducing this. What I understand, though, is that UTF-8 is the default for users that log on. However, (daemon) services don't get LANG set. In the initial case described, Nexus was started as a service.
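A quick way to compare the interactive and the service environment is to print what the JVM actually sees in each. The small diagnostic below is a sketch with a made-up class name; it only reports values and assumes nothing about Nexus itself. Note that on JDK 18 and later the default charset is UTF-8 regardless of the locale, so the difference would mainly show up on older JVMs such as those in use at the time of this thread.

```java
import java.nio.charset.Charset;

public class DefaultCharsetReport {

    // Collects the environment and JVM settings that influence the default encoding.
    static String report() {
        String lang = System.getenv("LANG");
        return "LANG=" + (lang == null ? "(unset)" : lang)
                + ", file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it once from a login shell and once from the same start mechanism as the service should show whether the two environments really differ.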
I'll be back when I've reproduced the issue and have some communication logs.
/Anders
2009/9/9 Tamás Cservenák <tamas@...>
|

|
Re: Encoding issues on Linux
Hi,
Ok, I was finally able to reproduce this. It turns out that it is related to the browser. Since I had switched to Firefox 3 (from v2) to use Firebug, this confused me.
If Nexus is started as a service and thus has no LANG env variable set:
* It seems to work in Firefox 3. It's correct in the GUI, but I haven't verified that the actual encoding of the int'l chars in nexus.xml is correct with regard to the XML header.
* In MS Internet Explorer 6 it does not work. The chars turn into garbage. This is most likely what I saw in Firefox 2 initially.
Using an HTTP sniffer I see that one difference is that MSIE 6 does not set a charset in the Content-Type header (of the request). In FF3 this is set. In both cases there is a charset defined in the response.
I'm thinking that maybe I should start a JIRA issue for this instead of bothering the mailing list.
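For the unverified part (whether the bytes in nexus.xml match its UTF-8 XML header), a strict decode is one possible check. The sketch below uses a made-up class name and expects the file path as its first argument; it only tests whether the bytes are well-formed UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true only if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the file to check, for example a copy of nexus.xml.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

This catches text that was written as ISO-8859-1, but not characters that were already replaced by question marks or stored as double-encoded garbage, since both of those are still well-formed UTF-8.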
/Anders
2009/9/9 Tamás Cservenák <tamas@...>
|

|
Re: Encoding issues on Linux
https://issues.sonatype.org/browse/NEXUS-2618
On Thu, Sep 10, 2009 at 14:53, Anders Hammar <anders@...> wrote:
|