ajax character encoding issue solved, but WHY?


by seasprocket

I had a character encoding issue that I finally solved, but I don't understand why the fix works. I'm hoping someone can explain this to me!

The issue was that non-ASCII characters appeared as junk, but only when retrieved via ajax calls; otherwise they displayed fine. The junk was caused by the data being interpreted as ISO-8859-1, but I could not figure out why the browser was interpreting it that way. All my data is handled as UTF-8.

The problem was fixed by calling utf8::decode on the data before sending it back via ajax. BUT WHY?

I am using the JSON view to render ajax responses, and it sets the charset header correctly to UTF-8. Of course, even after you decode, Perl still represents the string internally as UTF-8. But why should the decode be necessary?

Thanks!



--
==========================
http://www.bikewise.org

2People citizen's network for climate action: http://www.2people.org

Greater Seattle Climate Dialogues: http://www.climatedialogues.org
==========================

_______________________________________________
List: Catalyst@...
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@.../
Dev site: http://dev.catalyst.perl.org/

Re: ajax character encoding issue solved, but WHY?

by Moritz Onken


On 19.06.2009 at 06:23, seasprocket@... wrote:

> I had a character encoding issue that I finally solved, but I don't  
> understand why the fix works. I'm hoping someone can explain this to  
> me!
>
> The issue was that non-ascii chars were appearing as junk BUT only  
> when retrieved via ajax calls. Otherwise, they displayed fine. The  
> junk display was due to them being interpreted as ISO-8859-1, but I  
> could not figure out why the browser was interpreting that way. All  
> my data is handled as UTF-8.
>
> The problem was fixed by calling utf8::decode on the data prior to  
> sending back via ajax. BUT WHY?
>
> I am using the JSON view to render ajax responses, and it sets the  
> charset header correctly to UTF-8. Of course, even when you decode,  
> perl still represents as "internal" utf8. But why should this be  
> necessary?
>
> Thanks!
>

What is the encoding of the web page that issues the ajax request?
Does this occur in different browsers as well?
I had similar problems and solved them by making sure that
every page has the UTF-8 encoding header set.

IMHO using utf8::decode is a hack and should be avoided if possible.

moritz


Re: ajax character encoding issue solved, but WHY?

by Phil Mitchell-2

On Fri, Jun 19, 2009 at 12:52 AM, Moritz Onken <onken@...> wrote:

> What is the encoding of the web page that issues that ajax request?

charset=UTF-8

> Does this occur on different browser as well?

Yes (tested on FF and IE).

> I had similar problems and solved it by making sure that
> every page has the utf8 encoding header set.
>
> IMHO using utf8::decode is a hack and should be avoided if possible.

I totally agree, but it needs to be fixed!




Re: ajax character encoding issue solved, but WHY?

by Francesc Romà i Frigolé-2


On Fri, Jun 19, 2009 at 6:23 AM, <seasprocket@...> wrote:

> The problem was fixed by calling utf8::decode on the data prior to sending back via ajax. BUT WHY?
>
> I am using the JSON view to render ajax responses, and it sets the charset header correctly to UTF-8. Of course, even when you decode, perl still represents as "internal" utf8. But why should this be necessary?

I had exactly the same problem, and the same solution, using Catalyst::Controller::REST with the JSON serializer. It is still on my list of 'big mysteries to be solved'.

I hadn't discovered Catalyst::Plugin::Unicode back then. I wonder whether using it would help; I haven't tried it myself yet.

Cheers
Francesc


Re: ajax character encoding issue solved, but WHY?

by Aristotle Pagaltzis

* seasprocket@... <seasprocket@...> [2009-06-19 06:30]:
> The issue was that non-ascii chars were appearing as junk BUT
> only when retrieved via ajax calls. Otherwise, they displayed
> fine. The junk display was due to them being interpreted as
> ISO-8859-1, but I could not figure out why the browser was
> interpreting that way. All my data is handled as UTF-8.
>
> The problem was fixed by calling utf8::decode on the data prior
> to sending back via ajax. BUT WHY?

Looks like your code is broken and assumes bytes throughout; as
long as all your data is UTF-8 you won’t notice. Apparently the
JSON serialiser is trying to produce UTF-8 output correctly by
encoding the strings you pass it; since they’re already encoded,
you get double-encoding gremlins.
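
A minimal sketch of the double-encoding effect described above, using only the core Encode module. The sample string and the variable names are illustrative assumptions; this is not code from Catalyst::View::JSON or from the original application.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string: 14 characters, one of them non-ASCII (U+00F3).
my $chars = "Madrid Alarc\x{F3}n";

# Encoding once gives the UTF-8 octets that belong on the wire.
my $bytes = encode('UTF-8', $chars);

# If a later step treats those octets as characters and encodes again,
# every octet >= 0x80 becomes a two-octet sequence: double encoding.
my $double = encode('UTF-8', $bytes);

printf "characters:    %d\n", length $chars;    # 14
printf "encoded once:  %d\n", length $bytes;    # 15
printf "encoded twice: %d\n", length $double;   # 17

# Decoding the octets first means the single encode step that follows
# reproduces the correct UTF-8 output.
my $fixed = encode('UTF-8', decode('UTF-8', $bytes));
print "decode-then-encode matches the single encoding\n" if $fixed eq $bytes;
```

If this is what happens in the application, it would explain why decoding the data before handing it to the serialiser makes the output come out right.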

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>


Re: Re: ajax character encoding issue solved, but WHY?

by seasprocket

On Sat, Jun 20, 2009 at 3:50 AM, Aristotle Pagaltzis <pagaltzis@...> wrote:
> * seasprocket@... <seasprocket@...> [2009-06-19 06:30]:
> > The problem was fixed by calling utf8::decode on the data prior
> > to sending back via ajax. BUT WHY?
>
> Looks like your code is broken and assumes bytes throughout; as
> long as all your data is UTF-8 you won’t notice. Apparently the
> JSON serialiser is trying to produce UTF-8 output correctly by
> encoding the strings you pass it; since they’re already encoded,
> you get double-encoding gremlins.

Thanks for your suggestion, but I'm pretty sure that the data is not getting encoded twice. C::V::JSON (Catalyst::View::JSON) tests the data with Encode::is_utf8() before it encodes, and only encodes if this test is true. This test only passes if the data is decoded.

I have confirmed this by checking whether Encode::encode is getting called in C::V::JSON (it's not).

I agree something's broken, I just don't know what it is. My suspicion is that I don't really understand what's happening inside sqlite: I assume it's storing the data as UTF-8, but I don't really know what it's doing.

 



Re: ajax character encoding issue solved, but WHY?

by Aristotle Pagaltzis

* seasprocket@... <seasprocket@...> [2009-06-23 03:00]:
> Thanks for your suggestion, but I'm pretty sure that the data
> is not getting encoded twice. C::V::JSON tests the data before
> it encodes ( Encode::is_utf8() ) and only encodes if this test
> is true. This test only passes if the data is decoded.

Augh! Augh! Why do people keep reading stuff into the UTF8 flag
that it doesn’t mean? (Yeah, I know why: because it’s called the
UTF8 flag when it should’ve been the UOK flag or something.) You
can have decoded data with the UTF8 flag off, and you can have
encoded data with the UTF8 flag on. The UTF8 flag is about the
internals-level format of the byte buffer of a scalar; it has
nothing to do with the meaning of the data on the Perl level.
Testing the flag in pure-Perl code is an almost certain sign of
brokenness.
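
A small sketch of both cases named above (decoded data with the flag off, encoded data with the flag on), using only the built-in utf8:: functions. The sample string and variable names are illustrative assumptions.

```perl
use strict;
use warnings;

# The same 14-character string held in two internal formats.
my $narrow = "Madrid Alarc\x{F3}n";
my $wide   = $narrow;
utf8::downgrade($narrow);   # one byte per character internally, flag off
utf8::upgrade($wide);       # UTF-8 internally, flag on

printf "narrow: flag %d, length %d\n",
    (utf8::is_utf8($narrow) ? 1 : 0), length $narrow;
printf "wide:   flag %d, length %d\n",
    (utf8::is_utf8($wide) ? 1 : 0), length $wide;
print "narrow and wide are equal on the Perl level\n" if $narrow eq $wide;

# Encoded data can carry the flag as well: these are UTF-8 octets,
# upgraded internally, and still 15 elements long.
my $octets = $narrow;
utf8::encode($octets);      # now a string of 15 octets
utf8::upgrade($octets);     # flag on, meaning unchanged
printf "octets: flag %d, length %d\n",
    (utf8::is_utf8($octets) ? 1 : 0), length $octets;
```

Here $narrow is decoded text with the flag off and $octets is encoded data with the flag on, so the flag alone says nothing about whether a string still needs decoding.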

> My suspicion is that I don't really understand what's happening
> inside sqlite -- I assume it's storing as UTF-8, but I don't
> really know what it's doing.

Try Devel::Peek to examine the strings that come out of it?
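
For illustration, a sketch of what that inspection could look like. The value in $from_db is a stand-in and an assumption (a driver handing back raw UTF-8 octets); in the real application it would be whatever the database query returns.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

# Hypothetical value as it might arrive from the database:
# 15 octets, with the non-ASCII character as the two octets C3 B3.
my $from_db = "Madrid Alarc\xC3\xB3n";

# Dump writes to STDERR; the interesting fields are FLAGS, PV and CUR.
Dump($from_db);

# Decode a copy in place and dump it again. FLAGS should now include UTF8.
my $decoded = $from_db;
utf8::decode($decoded);
Dump($decoded);

printf "before: length %d, after: length %d\n",
    length $from_db, length $decoded;
```

Comparing the two dumps shows whether the strings coming out of the database are octets or already-decoded characters.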

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>


Re: Re: ajax character encoding issue solved, but WHY?

by seasprocket

On Tue, Jun 23, 2009 at 1:28 AM, Aristotle Pagaltzis <pagaltzis@...> wrote:
> * seasprocket@... <seasprocket@...> [2009-06-23 03:00]:
> > Thanks for your suggestion, but I'm pretty sure that the data
> > is not getting encoded twice. C::V::JSON tests the data before
> > it encodes ( Encode::is_utf8() ) and only encodes if this test
> > is true. This test only passes if the data is decoded.
>
> Augh! Augh! Why do people keep reading stuff into the UTF8 flag
> that it doesn’t mean. (Yeah, I know why, because it’s called the
> UTF8 flag when it should’ve been the UOK flag or something.) You
> can have decoded data with the UTF8 flag off, and you can have
> encoded data with the UTF8 flag on.

(Sorry to be so slow to reply. I wanted to find time to fully investigate this, but haven't.)

The Encode docs state:

# When you encode, the resulting UTF8 flag is always off.
# When you decode, the resulting UTF8 flag is on unless you can unambiguously represent data [as ASCII].

I was interpreting this as applying to all encoding and decoding, but I now realize that it may only apply to the Encode package. Which really just leaves me more confused. :)
 
> > My suspicion is that I don't really understand what's happening
> > inside sqlite -- I assume it's storing as UTF-8, but I don't
> > really know what it's doing.
>
> Try Devel::Peek to examine the strings that come out of it?

I used Devel::StringInfo and found:

[info] string: Madrid Alarcón
is_utf8: 0
octet_length: 15
valid_utf8: 1
decoded_is_same: 0
decoded:
  octet_length: 15
  downgradable: 1
  char_length: 14
  string: Madrid Alarc
  is_utf8: 1
raw = <<Madrid Alarcón>>

I did not draw any brilliant conclusions from this, although I'm curious why the decoded version has the non-ASCII char cut off.
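
One possible reading of those figures, sketched with built-in functions rather than with Devel::StringInfo itself: a value with is_utf8 0, octet_length 15 and valid_utf8 1 behaves like UTF-8 octets that have not been decoded yet. The literal below is an assumption that stands in for the database value.

```perl
use strict;
use warnings;

# Stand-in for the reported value: 15 octets, UTF8 flag off.
my $raw = "Madrid Alarc\xC3\xB3n";

my $is_utf8      = utf8::is_utf8($raw) ? 1 : 0;
my $octet_length = length $raw;

# utf8::decode works in place on a copy; it returns true here because
# the octets are well-formed UTF-8.
my $decoded     = $raw;
my $valid_utf8  = utf8::decode($decoded) ? 1 : 0;
my $char_length = length $decoded;

print "is_utf8: $is_utf8\n";            # 0
print "octet_length: $octet_length\n";  # 15
print "valid_utf8: $valid_utf8\n";      # 1
print "char_length: $char_length\n";    # 14

# The two octets C3 B3 have become the single character U+00F3.
printf "character 13 is U+%04X\n", ord substr($decoded, 12, 1);
```

Under that reading the data arrives as octets, which is consistent with utf8::decode making a difference before the JSON view runs.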

At this point, obviously, I need to find the time to dig in further. Thanks for your thoughts!

 

