UTF-8 characters mangled after updating code on server

UTF-8 characters mangled after updating code on server

by JensB

Hello,

I am using Rails 1.2.1 on Mac OS X to develop my application (Locomotive, Jan2007 rmagick bundle) and Rails 1.2.3 on Linux (Debian Sarge) with Ruby 1.8.6 to run it in production. Everything works fine so far, but I have a strange problem.

Whenever I update my code via Capistrano & rsync, I need to restart the Apache server that runs my app via FastCGI; otherwise all accented characters (i.e. two-byte UTF-8 chars) in all my translations are mangled.

This is what such a text should look like:
(1)   Einverständniserklärung der Eltern

This is how it is encoded in UTF-8:
(2)   Einverst\303\244ndniserkl\303\244rung der Eltern

This is what it looks like when I update the code, restart only the FastCGI processes with 'killall -USR2 ruby1.8', and then refresh the page in Firefox:
(3)   Unterschriebene Einverst\357\277\275ndniserkl\357\277\275rung der Eltern

I created this by copying it out of the browser window and pasting it into a Terminal window on OS X.

The strange thing is that even when Firefox shows (3), I can still see the correct characters in Firefox's HTML source view.
Throughout, all strings are stored correctly in the MySQL globalize_translations table. Restarting Apache helps; restarting the FastCGI process doesn't.
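
For readers who want to see what the escaped sequences in (2) and (3) actually contain, here is a minimal Python sketch. The helper names (unescape_octal, non_ascii_names) are made up for this illustration and are not part of Rails or Globalize. It turns the backslash-octal escapes back into bytes and decodes them as UTF-8; the "Unterschriebene" prefix of (3) is left out so both strings cover the same words.

```python
import re
import unicodedata


def unescape_octal(text):
    """Turn backslash-octal escapes such as \\303\\244 into raw bytes."""
    out = bytearray()
    pos = 0
    for match in re.finditer(r"\\([0-3][0-7]{2})", text):
        out += text[pos:match.start()].encode("ascii")
        out.append(int(match.group(1), 8))  # e.g. "303" (octal) -> 0xC3
        pos = match.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


def non_ascii_names(text):
    """Unicode names of all non-ASCII characters that occur in text."""
    return sorted({unicodedata.name(ch) for ch in text if ord(ch) > 127})


good_text = unescape_octal(
    r"Einverst\303\244ndniserkl\303\244rung der Eltern"
).decode("utf-8")
bad_text = unescape_octal(
    r"Einverst\357\277\275ndniserkl\357\277\275rung der Eltern"
).decode("utf-8")

print(non_ascii_names(good_text))
print(non_ascii_names(bad_text))
```

Both byte strings are valid UTF-8. The bytes in (2) decode to the expected umlaut, while \357\277\275 (EF BF BD) is the UTF-8 encoding of U+FFFD, the Unicode replacement character that decoders emit in place of bytes they could not interpret.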

Does anybody have an idea how I can track this down further? I'm lost.


Thank you! :)

Jens

Re: UTF-8 characters mangled after updating code on server

by Sven Fuchs

Just stabbing around in the dark, but did you check which Content-Type/encoding
headers Apache sends in those cases?

        curl -I yoursite.de

should always contain

        Content-Type: text/html; charset=utf-8



On 22.09.2007 at 11:56, JensB wrote:

> [...]

--
sven fuchs svenfuchs@...
artweb design http://www.artweb-design.de
grünberger 65 + 49 (0) 30 - 47 98 69 96 (phone)
d-10245 berlin + 49 (0) 171 - 35 20 38 4 (mobile)




Re: UTF-8 characters mangled after updating code on server

by JensB


Hello Sven,

I looked at Firefox's "Document information", which says the page is UTF-8 encoded. A meta tag in the resulting HTML document says the same. Would that be the same as checking the header?

Thanks,

Jens

Re: UTF-8 characters mangled after updating code on server

by Sven Fuchs

No, the meta tag within the HTML doc is not the same as the HTTP
header sent by the server. A mismatch between the two may lead to
some confusion, which is why I asked. I believe that most browsers
will ignore the HTML meta tag when an HTTP header is present that also
specifies the encoding.

As for the Firefox Document Info: I'm not really sure where this
information is gathered from. To me it seems as if Firefox gets it
from the document's HTML meta tags.

You can see which HTTP headers are sent by opening your Terminal
(you mentioned you're on Mac OS X?) and running:

curl -I yoursite.de

(that option being an uppercase i)
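
To illustrate the precedence described above, here is a rough Python sketch that extracts the charset from a Content-Type header value and from a meta tag, and prefers the header when both are present. The function names are hypothetical, the meta-tag regex is deliberately simplistic, and real browsers apply further rules (byte order marks, sniffing), so treat this as an approximation rather than a description of what Firefox does.

```python
import re
from email.message import Message

# Simplistic pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def charset_from_http_header(content_type):
    """Charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, None if absent


def charset_from_meta(html_bytes):
    """Charset declared in a <meta> tag of the document, or None."""
    match = META_CHARSET.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, html_bytes):
    """The HTTP header wins; the meta tag is only a fallback."""
    return charset_from_http_header(content_type) or charset_from_meta(html_bytes)


page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=iso-8859-1"></head></html>')

print(effective_charset("text/html; charset=UTF-8", page))
print(effective_charset("text/html", page))
```

The header value to feed in is whatever the Content-Type line of the curl -I output shows; the page bytes here are an invented example with a deliberately conflicting meta tag.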



On 22.09.2007 at 13:51, JensB wrote:

> [...]





Re: UTF-8 characters mangled after updating code on server

by JensB

Hi,

I checked the Document Info in Firefox, the HTTP header (by using 'telnet') and the http-equiv headers in the document source, and all point to UTF-8. I restart my Apache nightly, and today the output was in the "wrong" encoding again. I suspect *something* is recoding the characters here. See this:

"Bestätigte Ankünfte" should read "Best\303\244tigte Ank\303\274nfte", but when my Rails app displays the characters incorrectly, it reads "Best\357\277\275tigte Ank\357\277\275nfte", making three bytes out of every two-byte UTF-8 character.
Even stranger, *every* two-byte character is replaced by the *same* (incorrect) sequence of bytes when this happens. So maybe it's not a translation from one encoding to another after all, but a real loss of the original characters.
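
One way such output can arise, offered here only as a hypothesis that the thread does not confirm: some layer hands the text over in a single-byte encoding (ISO-8859-1 is assumed below), and a later step decodes those bytes as UTF-8, substituting U+FFFD for every byte it cannot interpret. The Python sketch below, with a made-up function name, reproduces the exact byte pattern quoted above under that assumption.

```python
def lossy_round_trip(text, wrong_encoding="latin-1"):
    """Encode in a single-byte charset, then mis-decode the bytes as UTF-8.

    Assumption for illustration: the bytes are ISO-8859-1 at some point.
    Every byte that is not valid UTF-8 becomes U+FFFD, so different
    accented letters all end up as the same three bytes EF BF BD.
    """
    single_byte = text.encode(wrong_encoding)            # a-umlaut -> b"\xe4"
    decoded = single_byte.decode("utf-8", errors="replace")
    return decoded.encode("utf-8")


expected = "Best\u00e4tigte Ank\u00fcnfte"   # "Bestätigte Ankünfte"
mangled = lossy_round_trip(expected)

print(expected.encode("utf-8"))
print(mangled)
```

Under this assumption both the a-umlaut and the u-umlaut collapse into the same sequence, which matches the observation that every two-byte character is shown as identical bytes. It would also mean the original characters are gone at that point and cannot be recovered from the mangled output.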

Any ideas how to track this down further would be greatly appreciated! :)

Jens

