[bug #26577] Changed semantic for unpack breaks UTF-8


[bug #26577] Changed semantic for unpack breaks UTF-8

by Phil Carmody-5


URL:
  <http://savannah.nongnu.org/bugs/?26577>

                 Summary: Changed semantic for unpack breaks UTF-8
                 Project: MHonArc
            Submitted by: formorer
            Submitted on: Thu 14 May 2009 12:38:50 GMT
                Category: Mail Parsing
                Severity: 3 - Normal
              Item Group: Incorrect Behavior
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
        Operating System: Linux
            Perl Version: 5.10
       Component Version: 2.6.16
           Fixed Release:

    _______________________________________________________

Details:

Hi,

with Perl 5.10, the semantics of unpack's U0 template changed from
character-based to byte-based processing [1]. As a result, _utf8_to_sgml in
CharEnt.pm fails for multibyte characters. The fix is pretty simple:

-$char = unpack('U0U*',$1);
+$char = unpack('C0U*',$1);

Of course this took me some time to find out ;). You should wrap it in
a Perl version check, but I guess that should be no problem.
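
A version check along those lines might look like the sketch below. It
assumes the unpack template is the only thing that has to differ between
versions; utf8_unpack_template is a hypothetical helper name, not something
that exists in MHonArc, and the 5.010 threshold is taken from the
perl5100delta note [1].

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MHonArc): choose the unpack template
# for the running perl. $] is numeric, e.g. 5.008008 for perl 5.8.8 and
# 5.010000 for perl 5.10.0. An explicit version can be passed for testing.
sub utf8_unpack_template {
    my $version = @_ ? shift : $];
    return $version >= 5.010 ? 'C0U*' : 'U0U*';
}

# Decode one UTF-8 byte sequence (here the three bytes of U+20AC) to its
# code point, the way _utf8_to_sgml does for each matched sequence.
my $template = utf8_unpack_template();
my $char     = unpack($template, "\xE2\x82\xAC");
printf "template %s gives U+%04X\n", $template, $char;
```

Computing the template once at load time, rather than inside the
substitution, would keep the per-character cost the same as before.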

Thanks

Alex - Debian listmaster

[1]
http://search.cpan.org/dist/perl-5.10.0/pod/perl5100delta.pod#Packing_and_UTF-8_strings





    _______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/bugs/?26577>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.nongnu.org/

---------------------------------------------------------------------
To sign-off this list, send email to majordomo@... with the
message text UNSUBSCRIBE MHONARC-DEV


[bug #26577] Changed semantic for unpack breaks UTF-8

by Phil Carmody-5


Follow-up Comment #1, bug #26577 (project mhonarc):

Are you able to test the code change with earlier
versions of Perl?

If the change works for older versions, there will
be no need to conditionalize based upon version
of Perl.



[approved] Re: [bug #26577] Changed semantic for unpack breaks UTF-8

by Alexander Wirt-2

Earl Hood wrote on Thursday, 14 May 2009:

>
> Follow-up Comment #1, bug #26577 (project mhonarc):
>
> Are you able to test the code change with earlier
> versions of Perl?
Sure, I tested against Perl 5.8. As stated in perldelta, using C0 there
gives me byte-by-byte results for multibyte characters: in Perl 5.8, one
char == one byte. So unfortunately you need the per-version switch.

Alex
