crash while converting charset


by Manuel Jung

Hi,

My program converts a lot of text from arbitrary charsets to UTF-16 and finally
to UTF-8 for the database. This works most of the time, but sometimes I get
crashes. They look like the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1467647088 (LWP 9868)]
0xb78e322d in ucnv_fromUnicode_UTF8_OFFSETS_LOGIC_3_6 (
    args=0xa7ef5370, err=0xa8856cec) at ucnv_u8.c:490
490                 *(myOffsets++) = offsetNum++;
Current language:  auto; currently c
(gdb) bt
#0  0xb78e322d in ucnv_fromUnicode_UTF8_OFFSETS_LOGIC_3_6 (
    args=0xa7ef5370, err=0xa8856cec) at ucnv_u8.c:490
#1  0xb78d99b5 in _fromUnicodeWithCallback (pArgs=0xa7ef5370,
    err=0xa8856cec) at ucnv.c:893
#2  0xb78d9f5d in ucnv_fromUnicode_3_6 (cnv=0x83be168,
    target=0xa8856ce4, targetLimit=0x82fa4644 "", source=0xa8856ce8,
    sourceLimit=0x8329b420, offsets=0xa8058000, flush=0 '\0', err=0x72)
    at ucnv.c:1202
#3  0x080637d7 in diver::conv (cnv=@0x83b7750, _source=@0xa8856f20,
    _dest=0x83b74a8, enc=0 '\0') at src/diver.cpp:252
#4  0x0809c0a1 in whale::Call (this=0x83b749c) at src/whale.cpp:413
#5  0x08083acc in octopus::slave::Call (this=0x83b7488)
    at src/octopus.cpp:239
#6  0x0805e853 in slot::Start (this=0x83b76ec) at src/slots.cpp:38
#7  0x08062422 in boost::_mfi::mf0<void, slot>::operator() (
    this=0x83be8f8, p=0x83b76ec)
    at /usr/local/include/boost/bind/mem_fn_template.hpp:45
#8  0x08062af3 in boost::_bi::list1<boost::_bi::value<slot*>
>::operator()<boost::_mfi::mf0<void, slot>, boost::_bi::list0> (this=0x83be900,
    f=@0x83be8f8, a=@0xa8857322)
    at /usr/local/include/boost/bind.hpp:229
#9  0x08062b47 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, slot>,
boost::_bi::list1<boost::_bi::value<slot*> > >::operator() (
    this=0x83be8f8)
    at /usr/local/include/boost/bind/bind_template.hpp:20
#10 0x08062b7a in
boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void,
boost::_mfi::mf0<void, slot>, boost::_bi::list1<boost::_bi::value<slot*> > >,
void>::invoke (function_obj_ptr=
      {obj_ptr = 0x83be8f8, const_obj_ptr = 0x83be8f8, func_ptr = 0x83be8f8,
data = "�"})
    at /usr/local/include/boost/function/function_template.hpp:136
#11 0xb7b9857b in ?? ()
   from /usr/local/lib/libboost_thread-gcc-mt-1_33_1.so.1.33.1
#12 0xb7aff183 in start_thread () from /lib/libpthread.so.0
#13 0xb7a869de in clone () from /lib/libc.so.6


Could this be an ICU bug? Do any ICU hackers have an idea what is happening
here?
Thank you,
kind regards
Manuel Jung

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
icu-support mailing list - icu-support@...
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support

Re: crash while converting charset

by Steven R. Loomis

Is your offsets buffer as large as your target buffer?

It should be (targetLimit - target) * sizeof(offsets[0]) bytes in size.
It is an int32_t array, so 4 bytes per entry.
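
As a rough sketch of that sizing rule: the offsets array needs one int32_t entry per byte of target capacity, that is (targetLimit - target) entries. The helper name makeOffsetsBuffer is hypothetical and not part of ICU; it only illustrates the arithmetic, and it assumes a heap-allocated std::vector is acceptable in place of a stack array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an ICU function): allocates one int32_t offset
// slot for every byte between target and targetLimit.
std::vector<std::int32_t> makeOffsetsBuffer(const char *target,
                                            const char *targetLimit) {
    // Both pointers must refer to the same buffer, with the limit at or
    // after the start.
    assert(target != nullptr && targetLimit >= target);
    const std::size_t slots = static_cast<std::size_t>(targetLimit - target);
    // Value-initialized, so every offset starts out as 0.
    return std::vector<std::int32_t>(slots);
}
```

The pointer returned by data() on the resulting vector could then be passed as the offsets argument of ucnv_fromUnicode.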

-s

On 07 May 2007, at 01:49, Manuel Jung wrote:

> [...]


Re: crash while converting charset

by Manuel Jung

> Is your offsets buffer as large as your target buffer?
>
> It should be     (targetLimit-target)*sizeof(offsets[0])   in size.
> It is an int32_t array, so 4 bytes per.
Hi
My target buffer is defined by:

<snip>
        int dest_buffer_len =
                UCNV_GET_MAX_BYTES_FOR_STRING(_source.length(),
                                              ucnv_getMaxCharSize(cnv[enc].data));
<snip/>

cnv[enc].data is just a UConverter object. I keep one per thread so that I
can reuse it.

<snip>
        const UChar *source = _source.getBuffer();
        const UChar *source_limit = source + _source.length();

        *_dest = new char[dest_buffer_len+1];
<snip/>

The extra "1" is for NUL termination.

<snip>
        char *target = *_dest;
        char *target_limit = target + dest_buffer_len;

        int offsets[dest_buffer_len];
<snip/>

As you can see, the offsets buffer is defined to be as large as the target
buffer. The converter does not need to know about the extra byte for termination.

<snip>
        ucnv_fromUnicode(cnv[enc].data,
                         &target, target_limit,
                         &source, source_limit,
                         offsets,
                         FALSE,
                         &status);
<snip/>

This is just to make clear what the variables are for.
After this, the status is checked to see whether a "U_BUFFER_OVERFLOW_ERROR" occurred.

Sometimes I get an error similar to this: "encoding error : input conversion
failed due to input error, bytes 0x8D 0x38 0x03 0xD0", but with different bytes.
Does this help? The error is printed to the terminal even if I redirect stdout
to a file with ">"!
Greetings
Manuel Jung



Re: crash while converting charset

by George Rhoten

> <snip>
>    int dest_buffer_len =
>          UCNV_GET_MAX_BYTES_FOR_STRING(_source.length(),
>                                 ucnv_getMaxCharSize(cnv[enc].data));
> <snip/>
> <snip>
>    char *target = *_dest;
>    char *target_limit = target + dest_buffer_len;
>
>    int offsets[dest_buffer_len];
> <snip/>

This is invalid syntax. Declaring the offsets array with a non-const size
won't work. You're probably glossing over something important. I suspect
you're giving ICU an offsets array with the wrong size, as Steven
suspects.

If you don't have a need for the offsets, just pass in NULL for the
offsets array. Most people don't use it. It's only helpful if you're
trying to correlate the source and target buffers with each other. The
charset conversion code also works faster without an offsets array.

Also the _source.getBuffer() seems like an ICU call. Is _source a
UnicodeString? If so, then you may want to consider using this
UnicodeString function instead:

  int32_t extract(char *dest, int32_t destCapacity,
                  UConverter *cnv,
                  UErrorCode &errorCode) const;

It removes some of the complication of using the charset API directly.
There are other UnicodeString extract functions you could use, if you
don't plan on caching a UConverter or setting conversion options on it.



Re: crash while converting charset

by Manuel Jung

On Monday, 7 May 2007 at 22:16, George Rhoten wrote:

> > <snip>
> >    int dest_buffer_len =
> >          UCNV_GET_MAX_BYTES_FOR_STRING(_source.length(),
> >                                 ucnv_getMaxCharSize(cnv[enc].data));
> > <snip/>
> > <snip>
> >    char *target = *_dest;
> >    char *target_limit = target + dest_buffer_len;
> >
> >    int offsets[dest_buffer_len];
> > <snip/>
>
> This is invalid syntax. Declaring the offsets array with a non-const size
> won't work. You're probably glossing over something important. I suspect
> you're giving ICU an offsets array with the wrong size, as Steven
> suspects.
>
> If you don't have a need for the offsets, just pass in NULL for the
> offsets array. Most people don't use it. It's only helpful if you're
> trying to correlate the source and target buffers with each other. The
> charset conversion code also works faster without an offsets array.

Hm, OK. I copied this mostly from a thread on this list. I cannot find the post
again; maybe it was buggy or is outdated. I have to say I still do not
understand every detail of the conversion routine.

> Also the _source.getBuffer() seems like an ICU call. Is _source a
> UnicodeString? If so, then you may want to consider using this
> UnicodeString function instead:
>
>   int32_t extract(char *dest, int32_t destCapacity,
>                   UConverter *cnv,
>                   UErrorCode &errorCode) const;
>
> It removes some of the complication of using the charset API directly.
> There are other UnicodeString extract functions you could use, if you
> don't plan on caching a UConverter or setting conversion options on it.

Yeah, thanks, I stumbled across these functions a few hours before you
posted. "_source" really is a UnicodeString. I am now using an extract-based
function and have seen no crashes after 500 minutes of CPU time. Looks fine! :-)

For the sake of completeness, for everyone else out there, my function now looks like this:

<snip>
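// Converts _source with the per-thread converter cnv[enc].data.
// On return *_dest points to a new[]-allocated buffer owned by the caller;
// the return value is the number of bytes written.
int32_t conv(const TCnv& cnv,
             const UnicodeString& _source,
             char **_dest,
             unsigned char enc)
{
    UErrorCode status = U_ZERO_ERROR;
    int32_t dest_buffer_len =
        UCNV_GET_MAX_BYTES_FOR_STRING(_source.length(),
                                      ucnv_getMaxCharSize(cnv[enc].data));
    *_dest = new char[dest_buffer_len];
    int32_t len = _source.extract(*_dest, dest_buffer_len, cnv[enc].data, status);
    if (status == U_BUFFER_OVERFLOW_ERROR) {
        // extract() returned the required length; retry with a buffer
        // that also has room for the terminating NUL.
        cout << "OVERFLOW" << endl;
        status = U_ZERO_ERROR;
        delete[] *_dest;
        dest_buffer_len = len + 1;
        *_dest = new char[dest_buffer_len];
        len = _source.extract(*_dest, dest_buffer_len, cnv[enc].data, status);
    }
    return len;
}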
<snip/>

So thanks for your help!
Kind Regards
Manuel Jung

