Re: Re: ajax character encoding issue solved, but WHY?

by seasprocket

On Tue, Jun 23, 2009 at 1:28 AM, Aristotle Pagaltzis <pagaltzis@...> wrote:
* seasprocket@... <seasprocket@...> [2009-06-23 03:00]:
> Thanks for your suggestion, but I'm pretty sure that the data
> is not getting encoded twice. C::V::JSON tests the data before
> it encodes ( Encode::is_utf8() ) and only encodes if this test
> is true. This test only passes if the data is decoded.

Augh! Augh! Why do people keep reading stuff into the UTF8 flag
that it doesn’t mean. (Yeah, I know why, because it’s called the
UTF8 flag when it should’ve been the UOK flag or something.) You
can have decoded data with the UTF8 flag off, and you can have
encoded data with the UTF8 flag on.
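
Whatever the flag says, double encoding leaves a visible signature in the bytes themselves, so it can be checked directly. A minimal sketch of that signature, written in Python rather than Perl and not taken from C::V::JSON; the sample string is borrowed from later in this message, and the Latin-1 step is an assumption about how the second encoding pass would see the bytes:

```python
# Illustrative only: what "encoded twice" looks like at the byte level.
text = "Madrid Alarc\u00f3n"          # decoded characters (o-acute written as an escape)

once = text.encode("utf-8")           # correct: one encoding pass
# A second pass happens when the bytes are mistaken for Latin-1 characters
# and encoded again. Each non-ASCII byte then becomes two bytes.
twice = once.decode("latin-1").encode("utf-8")

print(once)    # the o-acute appears as the two bytes C3 B3
print(twice)   # the same spot is now four bytes: C3 83 C2 B3
```

If the bytes on the wire show the four-byte pattern, the data was encoded twice somewhere; if they show the two-byte pattern, it was not, regardless of what any flag test reported.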

(Sorry to be so slow to reply. I wanted to find time to fully investigate this, but haven't.)

The Encode docs state:

# When you encode, the resulting UTF8 flag is always off.
# When you decode, the resulting UTF8 flag is on unless you can unambiguously represent data [as ASCII].

I had taken this to apply to all encoding and decoding, but I now realize that it may apply only to the functions in the Encode package itself. Which really just leaves me more confused .. :)
 
> My suspicion is that I don't really understand what's happening
> inside sqlite -- I assume it's storing as UTF-8, but I don't
> really know what it's doing.

Try Devel::Peek to examine the strings that come out of it?

I used Devel::StringInfo and found:

[info] string: Madrid Alarcón
is_utf8: 0
octet_length: 15
valid_utf8: 1
decoded_is_same: 0
decoded:
  octet_length: 15
  downgradable: 1
  char_length: 14
  string: Madrid Alarc
  is_utf8: 1
raw = <<Madrid Alarcón>>

I did not draw any brilliant conclusions from this, although I'm curious why the decoded version has the non-ASCII char cut off.
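
The two lengths in that report can at least be sanity-checked by hand. A rough sketch in Python (not Perl, and not how Devel::StringInfo works internally), assuming the value coming out of the database is the UTF-8 encoding of the city name; the variable names are made up for illustration:

```python
# Assumed stand-in for the raw value: UTF-8 bytes of "Madrid Alarcón".
raw = "Madrid Alarc\u00f3n".encode("utf-8")

octet_length = len(raw)            # bytes in the undecoded string
decoded = raw.decode("utf-8")      # succeeds only if the bytes are valid UTF-8
char_length = len(decoded)         # characters after decoding

print(octet_length, char_length)   # compare with octet_length / char_length above
```

Under that assumption the numbers match the report: one character takes two bytes, so the byte count is one more than the character count. That is consistent with the string still being encoded UTF-8 when it leaves the database.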

At this point, obviously, I need to find the time to dig in further. Thanks for your thoughts!

 


Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

_______________________________________________
List: Catalyst@...
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@.../
Dev site: http://dev.catalyst.perl.org/



--
==========================
http://www.bikewise.org

2People citizen's network for climate action: http://www.2people.org

Greater Seattle Climate Dialogues: http://www.climatedialogues.org
==========================
