crash while converting charset

Hi,

My program converts a lot of text from arbitrary charsets to UTF-16 and finally to UTF-8 for the database. This works most of the time, but sometimes I get crashes. They look like the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1467647088 (LWP 9868)]
0xb78e322d in ucnv_fromUnicode_UTF8_OFFSETS_LOGIC_3_6 (args=0xa7ef5370, err=0xa8856cec) at ucnv_u8.c:490
490         *(myOffsets++) = offsetNum++;
Current language: auto; currently c
(gdb) bt
#0  0xb78e322d in ucnv_fromUnicode_UTF8_OFFSETS_LOGIC_3_6 (args=0xa7ef5370, err=0xa8856cec) at ucnv_u8.c:490
#1  0xb78d99b5 in _fromUnicodeWithCallback (pArgs=0xa7ef5370, err=0xa8856cec) at ucnv.c:893
#2  0xb78d9f5d in ucnv_fromUnicode_3_6 (cnv=0x83be168, target=0xa8856ce4, targetLimit=0x82fa4644 "", source=0xa8856ce8, sourceLimit=0x8329b420, offsets=0xa8058000, flush=0 '\0', err=0x72) at ucnv.c:1202
#3  0x080637d7 in diver::conv (cnv=@0x83b7750, _source=@0xa8856f20, _dest=0x83b74a8, enc=0 '\0') at src/diver.cpp:252
#4  0x0809c0a1 in whale::Call (this=0x83b749c) at src/whale.cpp:413
#5  0x08083acc in octopus::slave::Call (this=0x83b7488) at src/octopus.cpp:239
#6  0x0805e853 in slot::Start (this=0x83b76ec) at src/slots.cpp:38
#7  0x08062422 in boost::_mfi::mf0<void, slot>::operator() (this=0x83be8f8, p=0x83b76ec) at /usr/local/include/boost/bind/mem_fn_template.hpp:45
#8  0x08062af3 in boost::_bi::list1<boost::_bi::value<slot*> >::operator()<boost::_mfi::mf0<void, slot>, boost::_bi::list0> (this=0x83be900, f=@0x83be8f8, a=@0xa8857322) at /usr/local/include/boost/bind.hpp:229
#9  0x08062b47 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, slot>, boost::_bi::list1<boost::_bi::value<slot*> > >::operator() (this=0x83be8f8) at /usr/local/include/boost/bind/bind_template.hpp:20
#10 0x08062b7a in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, slot>, boost::_bi::list1<boost::_bi::value<slot*> > >, void>::invoke (function_obj_ptr= {obj_ptr = 0x83be8f8, const_obj_ptr = 0x83be8f8, func_ptr = 0x83be8f8, data = "�"}) at /usr/local/include/boost/function/function_template.hpp:136
#11 0xb7b9857b in ?? () from /usr/local/lib/libboost_thread-gcc-mt-1_33_1.so.1.33.1
#12 0xb7aff183 in start_thread () from /lib/libpthread.so.0
#13 0xb7a869de in clone () from /lib/libc.so.6

Is this maybe an ICU bug? Do any ICU hackers have an idea what is happening here?

Thank you,
kind regards
Manuel Jung

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
icu-support mailing list - icu-support@...
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support
Re: crash while converting charset

Is your offsets buffer as large as your target buffer?

It should be (targetLimit-target)*sizeof(offsets[0]) bytes in size. It is an int32_t array, so 4 bytes per entry.

-s

On 07 May 2007, at 01:49, Manuel Jung wrote:
> Is this maybe an ICU bug? Do any ICU hackers have an idea what is
> happening here?
Re: crash while converting charset

> Is your offsets buffer as large as your target buffer?
>
> It should be (targetLimit-target)*sizeof(offsets[0]) in size.
> It is an int32_t array, so 4 bytes per.

Hi,

My target buffer is defined by:

<snip>
int dest_buffer_len = UCNV_GET_MAX_BYTES_FOR_STRING(_source.length(), ucnv_getMaxCharSize(cnv[enc].data));
<snip/>

cnv[enc].data is just a UConverter object. I hold one per thread so that I can reuse it.

<snip>
const UChar *source = _source.getBuffer();
const UChar *source_limit = source + _source.length();
*_dest = new char[dest_buffer_len+1];
<snip/>

The extra "1" is for NUL termination.

<snip>
char *target = *_dest;
char *target_limit = target + dest_buffer_len;

int offsets[dest_buffer_len];
<snip/>

Here you can see that the offsets buffer is defined to be as large as the target buffer. The converter does not need to know about the extra byte for termination.

<snip>
ucnv_fromUnicode(cnv[enc].data, &target, target_limit, &source, source_limit, offsets, FALSE, &status);
<snip/>

This is just to make clear what the variables are for. After this call, the status is checked for U_BUFFER_OVERFLOW_ERROR.

Sometimes I get an error similar to this one, but with different bytes: "encoding error : input conversion failed due to input error, bytes 0x8D 0x38 0x03 0xD0". Does this help? The error is printed to the terminal even if I redirect stdout to a file with ">"!

Greetings
Manuel Jung
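For readers following along, a back-of-the-envelope check of how large these buffers can get may be useful. The sketch below assumes that UCNV_GET_MAX_BYTES_FOR_STRING(length, maxCharSize) expands to (length + 10) * maxCharSize and that ucnv_getMaxCharSize returns 3 for a UTF-8 converter; both assumptions should be checked against the ucnv.h of the ICU version in use. The helper names are hypothetical and not part of ICU.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: mirrors the expansion of UCNV_GET_MAX_BYTES_FOR_STRING;
// verify against ucnv.h before relying on the numbers.
constexpr std::int64_t max_bytes_for_string(std::int64_t length,
                                            std::int64_t max_char_size) {
    return (length + 10) * max_char_size;
}

// Bytes occupied by an offsets array that holds one int32_t per target byte.
constexpr std::int64_t offsets_array_bytes(std::int64_t length,
                                           std::int64_t max_char_size) {
    return max_bytes_for_string(length, max_char_size) *
           static_cast<std::int64_t>(sizeof(std::int32_t));
}

// One million UChars at up to 3 bytes each: roughly 12 MB of offsets.
static_assert(offsets_array_bytes(1000000, 3) == 12000120,
              "worked example for a 1,000,000-UChar string");
```

Under these assumptions, the offsets array for a one-million-UChar string is about four times the size of the target buffer. When such an array is declared as a local variable it lives on the thread's stack, which may well be smaller than that. Whether this is what happened in the backtrace above is a guess; the thread does not confirm it.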
Re: crash while converting charset

> <snip>
> int dest_buffer_len =
> UCNV_GET_MAX_BYTES_FOR_STRING(_source.length(),
> ucnv_getMaxCharSize(cnv[enc].data));
> <snip/>
> <snip>
> char *target = *_dest;
> char *target_limit = target + dest_buffer_len;
>
> int offsets[dest_buffer_len];
> <snip/>

This is invalid syntax. Declaring the offsets array with a non-const size won't work. You're probably glossing over something important. Like Steven, I suspect you're giving ICU an offsets array with the wrong size.

If you don't need the offsets, just pass in NULL for the offsets array. Most people don't use it. It's only helpful if you're trying to correlate the source and target buffers with each other. The charset conversion code also runs faster without an offsets array.

Also, _source.getBuffer() looks like an ICU call. Is _source a UnicodeString? If so, you may want to consider using this UnicodeString function instead:

    int32_t extract(char *dest, int32_t destCapacity,
                    UConverter *cnv,
                    UErrorCode &errorCode) const;

It removes some of the complication of using the charset API directly. There are other UnicodeString extract functions you could use if you don't plan on caching a UConverter or setting conversion options on it.
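As an illustration of the two options mentioned above, here is a minimal sketch of a heap-backed offsets buffer and of passing a null pointer when offsets are not needed. Note that `int offsets[n]` with a runtime `n` is not standard C++, although some compilers (GCC, for one) accept it as an extension and place the array on the stack. The function `copy_with_offsets` is a hypothetical toy that only imitates the "offsets may be null" contract; it is not an ICU function, and `make_offsets` is likewise an invented helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a converter call: copies n bytes and records,
// for each output byte, the index of the input unit it came from.
// Offsets are written only when the caller supplies a non-null array.
std::size_t copy_with_offsets(const char* src, std::size_t n,
                              char* dst, std::int32_t* offsets) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
        if (offsets != nullptr) {
            offsets[i] = static_cast<std::int32_t>(i);
        }
    }
    return n;
}

// Heap-backed replacement for `int offsets[dest_buffer_len];`:
// one zero-initialized int32_t entry per byte of target capacity.
std::vector<std::int32_t> make_offsets(std::size_t target_capacity) {
    return std::vector<std::int32_t>(target_capacity);
}
```

With a real converter, the pointer passed as the offsets argument would be `offsets.data()` for the vector variant, or NULL when the offsets are not needed.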
Re: crash while converting charset

On Monday, 7 May 2007 22:16, George Rhoten wrote:
> > <snip>
> > int dest_buffer_len =
> > UCNV_GET_MAX_BYTES_FOR_STRING(_source.length(),
> > ucnv_getMaxCharSize(cnv[enc].data));
> > <snip/>
> > <snip>
> > char *target = *_dest;
> > char *target_limit = target + dest_buffer_len;
> >
> > int offsets[dest_buffer_len];
> > <snip/>
>
> This is invalid syntax. Declaring the offsets array with a non-const size
> won't work. You're probably glossing over something important. I suspect
> you're giving ICU an offsets array with the wrong size, as Steven
> suspects.
>
> If you don't have a need for the offsets, just pass in NULL for the
> offsets array. Most people don't use it. It's only helpful if you're
> trying to correlate the source and target buffers with each other. The
> charset conversion code also works faster without an offsets array.

Hm, OK. I copied this mostly from a topic on this list. I cannot find the post again; maybe it was buggy or is outdated. I have to say I still do not understand every detail of the conversion routine.

> Also the _source.getBuffer() seems like an ICU call. Is _source a
> UnicodeString? If so, then you may want to consider using this
> UnicodeString function instead:
>
> int32_t extract(char *dest, int32_t destCapacity,
> UConverter *cnv,
> UErrorCode &errorCode) const;
>
> It removes some of the complication of using the charset API directly.
> There are other UnicodeString extract functions you could use, if you
> don't plan on caching a UConverter or setting conversion options on it.

Yeah, thanks, I stumbled upon these functions just a few hours before you posted. _source really is a UnicodeString. I'm now using an extract-based function, with no crashes after 500 minutes of CPU time. Looks fine! :-)

For completeness, and for everyone else out there, my function now looks like this:

<snip>
int32_t conv(const TCnv& cnv, const UnicodeString & _source, char **_dest, unsigned char enc)
{
    UErrorCode status = U_ZERO_ERROR;

    // Worst-case size of the converted string in bytes.
    int dest_buffer_len = UCNV_GET_MAX_BYTES_FOR_STRING(_source.length(), ucnv_getMaxCharSize(cnv[enc].data));

    *_dest = new char[dest_buffer_len];
    // extract() returns the length of the converted output.
    dest_buffer_len = _source.extract(*_dest, dest_buffer_len, cnv[enc].data, status);

    if (status == U_BUFFER_OVERFLOW_ERROR) {
        // Output did not fit: reallocate with the reported length and retry.
        cout << "OVERFLOW" << endl;
        status = U_ZERO_ERROR;
        delete[] (*_dest);
        *_dest = new char[dest_buffer_len];
        dest_buffer_len = _source.extract(*_dest, dest_buffer_len, cnv[enc].data, status);
    }

    // Return the length on every path, not only after an overflow.
    return dest_buffer_len;
}
<snip/>

So thanks for your help!

Kind regards
Manuel Jung
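The retry-on-overflow idea in the function above can also be sketched without raw new/delete. The following is only an illustration of the pattern, not ICU code: `ExtractFn` is a hypothetical callback standing in for an extract-style call that returns the full length required and signals overflow when the output did not fit, and `extract_with_retry` is an invented helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical extract-style callback: writes at most `capacity` bytes to
// `dest`, returns the full length required, and sets `overflow` to true
// when the output did not fit.
using ExtractFn =
    std::function<std::int32_t(char* dest, std::int32_t capacity, bool& overflow)>;

// Try once with an initial capacity; on overflow, resize to the reported
// length and try once more. The std::string owns the buffer, so nothing
// needs to be freed by the caller.
std::string extract_with_retry(const ExtractFn& extract,
                               std::int32_t initial_capacity) {
    std::string out(static_cast<std::size_t>(initial_capacity), '\0');
    bool overflow = false;
    std::int32_t needed = extract(&out[0], initial_capacity, overflow);
    if (overflow) {
        out.assign(static_cast<std::size_t>(needed), '\0');
        overflow = false;
        needed = extract(&out[0], needed, overflow);
    }
    // Trim the buffer down to the bytes actually produced.
    out.resize(static_cast<std::size_t>(needed));
    return out;
}
```

Returning a std::string instead of filling a `char **` removes the manual `delete[]` on the retry path; whether that fits the surrounding code is a design choice for the caller.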