hipe crash with compiler modules

Hello,

I have been experiencing a random crash with hipe on FreeBSD 32bits
(R13B) and MacOS X 10.6 64bits (R13B01) when compiler modules have
been recompiled with native code.

The compiler modules have been recompiled with some code that goes
like this:

    {_, Beam, Path} = code:get_object_code(Module),
    {ok, _, Chunks} = beam_lib:all_chunks(Beam),
    {ok, {Target, HipeBinary}} = hipe:compile(Module),
    ChunkName = hipe_unified_loader:chunk_name(Target),
    {ok, NewBeam} = beam_lib:build_module(Chunks ++
                                          [{ChunkName, HipeBinary}]),

The crash happens when I compile several files (a dozen) at once with
an rpc:pmap. I believe the rpc:pmap is the reason why the crash happens
randomly. This is with an internal tool called erl_make. If I run
erl_make clean && erl_make install, I get a crash, but if I do
erl_make install; erl_make install, the second operation (almost
always) succeeds. Or sometimes, I need to run erl_make clean to
successfully compile with erl_make install.

The stack trace (on MacOS X) looks like this:

Thread 4 Crashed:
0  beam.smp           0x000000000055dc0f gensweep_nstack + 623
1  beam.smp           0x00000000004e5591 do_minor + 313
2  beam.smp           0x00000000004e4ef9 minor_collection + 547
3  beam.smp           0x00000000004e34f4 erts_garbage_collect + 590
4  beam.smp           0x00000000004e31de erts_gc_after_bif_call + 153
5  beam.smp           0x000000000051acee process_main + 42816
6  beam.smp           0x000000000047a833 sched_thread_func + 357
7  beam.smp           0x000000000059ca27 thr_wrapper + 103
8  libSystem.B.dylib  0x00007fff86da4f66 _pthread_start + 331
9  libSystem.B.dylib  0x00007fff86da4e19 thread_start + 13

If all compiler beam files are replaced with the original ones (i.e.
without the hipe chunk), there is no crash. I couldn't single out a
compiler module that causes the crash. It looks as though the crash
happens when several of them are native.

I found a reference to a crash in gensweep_nstack in the archives:
http://erlang.org/pipermail/erlang-bugs/2008-December/001131.html

In this case, the code that gets compiled natively is just part of
OTP. Do you have any hint about what can be done to track down the bug?

Paul
--
Semiocast  http://titema.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris

________________________________________________________________
erlang-bugs mailing list. See http://www.erlang.org/faq.html
erlang-bugs (at) erlang.org

Re: hipe crash with compiler modules

Paul Guyot writes:
> Hello,
>
> I have been experiencing a random crash with hipe on FreeBSD 32bits
> (R13B) and MacOS X 10.6 64bits (R13B01) when compiler modules have
> been recompiled with native code.

64-bit native code on OSX has not been validated by the HiPE group,
so it is unsupported. 32-bit native code on OSX 10.5 seems to work,
but has been only very lightly tested by us.

> The compiler modules have been recompiled with some code that goes
> like this:
>
>     {_, Beam, Path} = code:get_object_code(Module),
>     {ok, _, Chunks} = beam_lib:all_chunks(Beam),
>     {ok, {Target, HipeBinary}} = hipe:compile(Module),
>     ChunkName = hipe_unified_loader:chunk_name(Target),
>     {ok, NewBeam} = beam_lib:build_module(Chunks ++
>                                           [{ChunkName, HipeBinary}]),

The proper way to compile modules is to pass 'native' as
an option to the BEAM compiler. I do not consider hipe:compile
or hipe_unified_loader:chunk_name to be public APIs.

So why do you do it in this awkward way?

> The crash happens when I compile several files (a dozen) at once with
> an rpc:pmap. I believe the rpc:pmap is the reason why the crash happens
> randomly. This is with an internal tool called erl_make. If I run
> erl_make clean && erl_make install, I get a crash, but if I do
> erl_make install; erl_make install, the second operation (almost
> always) succeeds. Or sometimes, I need to run erl_make clean to
> successfully compile with erl_make install.
>
> The stack trace (on MacOS X) looks like this:
>
> Thread 4 Crashed:
> 0  beam.smp           0x000000000055dc0f gensweep_nstack + 623
> 1  beam.smp           0x00000000004e5591 do_minor + 313
> 2  beam.smp           0x00000000004e4ef9 minor_collection + 547
> 3  beam.smp           0x00000000004e34f4 erts_garbage_collect + 590
> 4  beam.smp           0x00000000004e31de erts_gc_after_bif_call + 153
> 5  beam.smp           0x000000000051acee process_main + 42816
> 6  beam.smp           0x000000000047a833 sched_thread_func + 357
> 7  beam.smp           0x000000000059ca27 thr_wrapper + 103
> 8  libSystem.B.dylib  0x00007fff86da4f66 _pthread_start + 331
> 9  libSystem.B.dylib  0x00007fff86da4e19 thread_start + 13
>
> If all compiler beam files are replaced with the original ones (i.e.
> without the hipe chunk), there is no crash. I couldn't single out a
> compiler module that causes the crash. It looks as though the crash
> happens when several of them are native.
>
> I found a reference to a crash in gensweep_nstack in the archives:
> http://erlang.org/pipermail/erlang-bugs/2008-December/001131.html
>
> In this case, the code that gets compiled natively is just part of
> OTP. Do you have any hint about what can be done to track down the bug?

There is a known problem with concurrent invocations of the HiPE
compiler. It looks like the serialization of code loading that the
BEAM loader is supposed to do isn't happening, or it is bypassed.
This corrupts certain runtime system data structures, causing crashes
during GC. I'm currently trying to debug this problem.

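As a rough illustration of the advice above (requesting native code through the ordinary compiler interface instead of calling hipe:compile directly), a minimal sketch could look like this. The module name native_build and both helper functions are made up for this example and do not come from the thread; compile:file/2 with the native option is the public interface the reply refers to. The option only produces native code on Erlang/OTP releases that ship HiPE.

```erlang
-module(native_build).
-export([options/1, compile/2]).

%% Hypothetical helper: prepend the 'native' flag to the options the
%% caller already passes to the compiler.
options(Extra) ->
    [native | Extra].

%% Hypothetical helper: compile File through the public compiler API,
%% asking for native code via the 'native' option.
compile(File, Extra) ->
    compile:file(File, options(Extra)).
```

A call such as native_build:compile("foo.erl", [report]) would then go through compile:file/2 only, with no direct use of hipe:compile or hipe_unified_loader.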
Re: hipe crash with compiler modules

Hello Mikael,

Thank you for your reply.

> 64-bit native code on OSX has not been validated by the HiPE group,
> so it is unsupported. 32-bit native code on OSX 10.5 seems to work,
> but has been only very lightly tested by us.

I have been using the patches from MacPorts
(http://trac.macports.org/browser/trunk/dports/lang/erlang/files/ ),
which I authored, so I realize they're not supported :)

>> The compiler modules have been recompiled with some code that goes
>> like this:
>>
>>     {_, Beam, Path} = code:get_object_code(Module),
>>     {ok, _, Chunks} = beam_lib:all_chunks(Beam),
>>     {ok, {Target, HipeBinary}} = hipe:compile(Module),
>>     ChunkName = hipe_unified_loader:chunk_name(Target),
>>     {ok, NewBeam} = beam_lib:build_module(Chunks ++
>>                                           [{ChunkName, HipeBinary}]),
>
> The proper way to compile modules is to pass 'native' as
> an option to the BEAM compiler. I do not consider hipe:compile
> or hipe_unified_loader:chunk_name to be public APIs.
>
> So why do you do it in this awkward way?

These lines were inspired by what dialyzer does. My first goal was to
factor out the 1 or 2 minutes that dialyzer spends, whenever it has to
process more than 20 modules, natively recompiling "key modules" (by
calling hipe:compile/1). It seems such a waste to recompile those
modules over and over, so I wrote some code that recompiles those
modules once and for all, and saves the altered beam. These are the 5
lines above, and indeed, I call hipe_unified_loader:chunk_name/1 to
avoid putting a constant in the code there, so the code works on all
development and continuous integration machines.

I did it this way because it seemed easier than recompiling OTP
modules in an OTP binary deployment. Of course, I realize this doesn't
use public API.

I thought I could natively recompile more modules than those selected
by dialyzer. This is how I ended up recompiling all compiler modules.
It seems useless to recompile several key OTP modules (e.g. lists)
because they are loaded before HiPE is actually loaded, but compiler
modules are a good target.

Everything went fine as long as the process consisted of running erlc
for each of our modules and then dialyzer. Then we moved to a new
toolchain that calls compile:file/2 and dialyzer from a single VM,
with all calls to compile:file/2 going through an rpc:pmap, and this
is when we started to observe those crashes.

>> In this case, the code that gets compiled natively is just part of
>> OTP. Do you have any hint about what can be done to track down the
>> bug?
>
> There is a known problem with concurrent invocations of the HiPE
> compiler. It looks like the serialization of code loading that the
> BEAM loader is supposed to do isn't happening, or it is bypassed.
> This corrupts certain runtime system data structures, causing crashes
> during GC. I'm currently trying to debug this problem.

Great. I was just asking how we could help fix this bug. I realize a
VM crash is high priority. We're not observing this crash in
production (since it's purely related to compiling), and we definitely
don't use unsupported HiPE patches such as MacOS X 10.6/64bits on
production servers.

Thanks again,

Paul
--
Semiocast  http://titema.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris

Re: hipe crash with compiler modules

On 3 Nov 2009, at 23:57, Mikael Pettersson wrote:

> There is a known problem with concurrent invocations of the HiPE
> compiler. It looks like the serialization of code loading that the
> BEAM loader is supposed to do isn't happening, or it is bypassed.
> This corrupts certain runtime system data structures, causing crashes
> during GC. I'm currently trying to debug this problem.

Hello,

I've just changed the code to load all native modules sequentially
before calling the rpc:pmap, and the crash disappeared. So it sounds
like it's exactly this bug.

Thanks again,

Paul
--
Semiocast  http://titema.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris
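
The workaround described in this last message (load the native modules one at a time, then start the parallel compilation) can be sketched roughly as follows. This is not the erl_make code, which was not posted to the list; the module name preload_then_pmap and its function names are hypothetical. It assumes only the standard OTP calls code:ensure_loaded/1, rpc:pmap/3 and compile:file/2.

```erlang
-module(preload_then_pmap).
-export([preload/1, compile_all/3]).

%% Load each module in turn from the calling process, so that no two
%% code loads run concurrently. Crashes if a module cannot be loaded.
preload(Mods) ->
    lists:foreach(fun(M) -> {module, M} = code:ensure_loaded(M) end, Mods).

%% Preload Mods sequentially, then compile Files in parallel.
%% rpc:pmap/3 calls compile:file(File, Opts) once for every File.
compile_all(Mods, Files, Opts) ->
    ok = preload(Mods),
    rpc:pmap({compile, file}, [Opts], Files).
```

Here Mods would be the list of natively recompiled compiler modules. The point of the design is only the ordering: every native module is already loaded before any parallel worker can trigger a load of its own.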