hipe crash with compiler modules


by Paul Guyot

Hello,

I have been experiencing an intermittent crash with HiPE on FreeBSD
32-bit (R13B) and Mac OS X 10.6 64-bit (R13B01) when the compiler
modules have been recompiled to native code.

The compiler modules were recompiled with code along these lines:

             {_, Beam, Path} = code:get_object_code(Module),
             {ok, _, Chunks} = beam_lib:all_chunks(Beam),
             {ok, {Target, HipeBinary}} = hipe:compile(Module),
             ChunkName = hipe_unified_loader:chunk_name(Target),
             {ok, NewBeam} = beam_lib:build_module(Chunks ++ [{ChunkName, HipeBinary}]),

The crash happens when I compile several files (about a dozen) at once
with rpc:pmap. I believe rpc:pmap is the reason the crash happens only
intermittently. This is with an internal tool called erl_make. If I run
erl_make clean && erl_make install, I get a crash, but if I run
erl_make install; erl_make install, the second run (almost always)
succeeds. Sometimes, though, I need to run erl_make clean before
erl_make install compiles successfully.

The stack trace (on Mac OS X) looks like this:

Thread 4 Crashed:
0   beam.smp                       0x000000000055dc0f gensweep_nstack + 623
1   beam.smp                       0x00000000004e5591 do_minor + 313
2   beam.smp                       0x00000000004e4ef9 minor_collection + 547
3   beam.smp                       0x00000000004e34f4 erts_garbage_collect + 590
4   beam.smp                       0x00000000004e31de erts_gc_after_bif_call + 153
5   beam.smp                       0x000000000051acee process_main + 42816
6   beam.smp                       0x000000000047a833 sched_thread_func + 357
7   beam.smp                       0x000000000059ca27 thr_wrapper + 103
8   libSystem.B.dylib             0x00007fff86da4f66 _pthread_start + 331
9   libSystem.B.dylib             0x00007fff86da4e19 thread_start + 13

If all the compiler beam files are replaced with the original ones
(i.e. without the HiPE chunk), there is no crash. I couldn't single out
one compiler module that causes the crash; it seems to happen whenever
several of them are native.

I found a reference to a crash in gensweep_nstack in the archives:
http://erlang.org/pipermail/erlang-bugs/2008-December/001131.html

In this case, the code that gets compiled natively is just part of
OTP. Do you have any hints on how to track down the bug?

Paul
--
Semiocast                       http://titema.com/
+33.175000290 - 62 bis rue Gay-Lussac, 75005 Paris


________________________________________________________________
erlang-bugs mailing list. See http://www.erlang.org/faq.html
erlang-bugs (at) erlang.org


Re: hipe crash with compiler modules

by Mikael Pettersson

Paul Guyot writes:
 > Hello,
 >
 > I have been experiencing a random crash with hipe on FreeBSD 32bits  
 > (R13B) and MacOS X 10.6 64bits (R13B01) when compiler modules have  
 > been recompiled with native code.

64-bit native code on OS X has not been validated by the HiPE group,
so it is unsupported. 32-bit native code on OS X 10.5 seems to work,
but we have only tested it very lightly.

 > The compiler modules have been recompiled with some code that goes  
 > like this :
 >
 >              {_, Beam, Path} = code:get_object_code(Module),
 >              {ok, _, Chunks} = beam_lib:all_chunks(Beam),
 >              {ok, {Target, HipeBinary}} = hipe:compile(Module),
 >              ChunkName = hipe_unified_loader:chunk_name(Target),
 >              {ok, NewBeam} = beam_lib:build_module(Chunks ++ [{ChunkName, HipeBinary}]),

The proper way to compile modules to native code is to pass 'native'
as an option to the BEAM compiler. I do not consider hipe:compile
or hipe_unified_loader:chunk_name to be public APIs.

So why do you do it in this awkward way?
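For reference, a minimal sketch of the route Mikael describes: hand the documented 'native' option to compile:file/2 instead of calling hipe:compile/1 and splicing chunks together with beam_lib. It assumes the sources are available and that the emulator was built with HiPE support, as in the R13 releases discussed here; the module native_build and its functions are hypothetical names. On the command line the same option is spelled erlc +native.

```erlang
-module(native_build).
-export([native_opts/1, compile_native/2]).

%% Hypothetical helper: compiler options for a native (HiPE) build.
%% 'native' is the documented compiler option; {outdir, Dir} selects
%% where the resulting .beam file is written.
native_opts(OutDir) ->
    [native, report, {outdir, OutDir}].

%% Compile each source file through the public compiler entry point.
%% Returns one {File, Result} pair per file, where Result is whatever
%% compile:file/2 returned (e.g. {ok, Module} or error).
compile_native(Files, OutDir) ->
    Opts = native_opts(OutDir),
    [{File, compile:file(File, Opts)} || File <- Files].
```

Going through the compiler keeps the chunk name and layout an internal detail of OTP, which is the point of Mikael's objection.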

 > The crash happens when I compile several files (a dozen) at once with  
 > a rpc:pmap. I believe the rpc:pmap is the reason why the crash happens  
 > randomly. This is with an internal tool called erl_make. If I run  
 > erl_make clean && erl_make install, I get a crash, but if I do  
 > erl_make install; erl_make install, the second operation (almost  
 > always) succeeds. Or sometimes, I need to run erl_make clean to  
 > successfully compile with erl_make install.
 >
 > The stack trace (on MacOS X) looks like this :
 >
 > Thread 4 Crashed:
 > 0   beam.smp                       0x000000000055dc0f gensweep_nstack + 623
 > 1   beam.smp                       0x00000000004e5591 do_minor + 313
 > 2   beam.smp                       0x00000000004e4ef9 minor_collection + 547
 > 3   beam.smp                       0x00000000004e34f4 erts_garbage_collect + 590
 > 4   beam.smp                       0x00000000004e31de erts_gc_after_bif_call + 153
 > 5   beam.smp                       0x000000000051acee process_main + 42816
 > 6   beam.smp                       0x000000000047a833 sched_thread_func + 357
 > 7   beam.smp                       0x000000000059ca27 thr_wrapper + 103
 > 8   libSystem.B.dylib             0x00007fff86da4f66 _pthread_start + 331
 > 9   libSystem.B.dylib             0x00007fff86da4e19 thread_start + 13
 >
 > If all compiler beam files are replaced with the original ones (i.e.  
 > without the hipe chunk), there is no crash. I couldn't single out a  
 > compiler module that causes the crash. It looks like that if several  
 > of them are native, the crash does happen.
 >
 > I found a reference to a crash in gensweep_nstack in the archives :
 > http://erlang.org/pipermail/erlang-bugs/2008-December/001131.html
 >
 > In this case, the code that gets compiled natively is just part of  
 > OTP. Do you have any hint about what can be done to track down the bug ?

There is a known problem with concurrent invocations of the HiPE compiler.
It looks like the serialization of code loading that the BEAM loader is
supposed to do isn't happening, or is being bypassed. This corrupts certain
runtime system data structures, causing crashes during GC. I'm currently
trying to debug this problem.



Re: hipe crash with compiler modules

by Paul Guyot

Hello Mikael,

Thank you for your reply.

> 64-bit native code on OSX has not been validated by the HiPE group,
> so it is unsupported. 32-bit native code on OSX 10.5 seems to work,
> but has been only very lightly tested by us.

I have been using the patches from MacPorts
(http://trac.macports.org/browser/trunk/dports/lang/erlang/files/),
which I authored, so I realize they're not supported :)

>> The compiler modules have been recompiled with some code that goes
>> like this :
>>
>>             {_, Beam, Path} = code:get_object_code(Module),
>>             {ok, _, Chunks} = beam_lib:all_chunks(Beam),
>>             {ok, {Target, HipeBinary}} = hipe:compile(Module),
>>             ChunkName = hipe_unified_loader:chunk_name(Target),
>>             {ok, NewBeam} = beam_lib:build_module(Chunks ++ [{ChunkName, HipeBinary}]),
>
> The proper way to compile modules is to pass 'native' as
> an option to the BEAM compiler. I do not consider hipe:compile
> or hipe_unified_loader:chunk_name to be public APIs.
>
> So why do you do it in this awkward way?

These lines were inspired by what dialyzer does. My first goal was to
avoid spending, on every run, the 1 or 2 minutes dialyzer takes when it
has to process more than 20 modules and decides to natively recompile
"key modules" (by calling hipe:compile/1). It seems such a waste to
recompile those modules over and over, so I wrote some code that
recompiles them once and for all and saves the altered beam. These are
the 5 lines above, and indeed, I call hipe_unified_loader:chunk_name/1
to avoid hard-coding a constant there, so the code works on all
development and continuous integration machines. I did it this way
because it seemed easier than recompiling OTP modules in an OTP binary
deployment. Of course, I realize this doesn't use a public API.

I thought I could natively recompile more modules than those selected
by dialyzer. This is how I ended up recompiling all the compiler
modules. There seems to be little point in natively recompiling some
key OTP modules (e.g. lists), because they are loaded before HiPE
itself is loaded, but the compiler modules are a good target.

Everything went fine as long as the process consisted of running erlc
for each of our modules and then dialyzer. Then we moved to a new
toolchain that calls compile:file/2 and dialyzer from a single VM,
with all calls to compile:file/2 made through rpc:pmap, and this is
when we started to observe these crashes.
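A minimal sketch of the single-VM pattern described above, where every compile:file/2 call runs in its own process through rpc:pmap/3. rpc:pmap({M, F}, ExtraArgs, List) evaluates apply(M, F, [Elem | ExtraArgs]) for each element in parallel and returns the results in the order of List; on a distributed node it may also spread the calls over connected nodes. The module name par_build is hypothetical. If the compiler modules are not loaded yet, these parallel calls can trigger loading of their native code from several processes at once, which is the situation Mikael describes.

```erlang
-module(par_build).
-export([build/2]).

%% Compile every file in Files concurrently from the current VM.
%% rpc:pmap/3 evaluates compile:file(File, Opts) for each File in
%% parallel and returns the results in the same order as Files.
build(Files, Opts) ->
    rpc:pmap({compile, file}, [Opts], Files).
```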

>> In this case, the code that gets compiled natively is just part of
>> OTP. Do you have any hint about what can be done to track down the bug ?
>
> There is a known problem with concurrent invocations of the HiPE compiler.
> It looks like the serialization of code loading that the BEAM loader is
> supposed to do isn't happening, or it is bypassed. This corrupts certain
> runtime system data structures causing crashes during GC. I'm currently
> trying to debug this problem.


Great. I was just asking how we could help fix this bug. I realize a
VM crash is high priority. We are not observing this crash in
production (it is purely related to compiling), and we definitely
don't use unsupported HiPE patches such as the Mac OS X 10.6 64-bit
ones on production servers.

Thanks again,

Paul




Re: hipe crash with compiler modules

by Paul Guyot

On 3 Nov 2009, at 23:57, Mikael Pettersson wrote:

> There is a known problem with concurrent invocations of the HiPE compiler.
> It looks like the serialization of code loading that the BEAM loader is
> supposed to do isn't happening, or it is bypassed. This corrupts certain
> runtime system data structures causing crashes during GC. I'm currently
> trying to debug this problem.

Hello,

I've just changed the code to load all native modules sequentially
before calling rpc:pmap, and the crash disappeared. So it looks like
it is exactly this bug.
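The workaround can be sketched as follows: load the compiler's modules one at a time in the calling process, and only then start the parallel compilation. It assumes the natively compiled modules are the ones listed in the compiler application's .app file; the module preload_build and its functions are hypothetical names, and rpc:pmap/3 is used as in the thread.

```erlang
-module(preload_build).
-export([preload_app/1, build/2]).

%% Load every module of application App sequentially, in this process,
%% so that later parallel compile:file/2 calls find them already loaded
%% instead of triggering their loading concurrently.
%% Returns the modules that are loaded afterwards.
preload_app(App) ->
    case application:load(App) of
        ok -> ok;
        {error, {already_loaded, App}} -> ok
    end,
    {ok, Mods} = application:get_key(App, modules),
    [M || M <- Mods, code:ensure_loaded(M) =:= {module, M}].

%% Preload the compiler first, then compile the files in parallel.
build(Files, Opts) ->
    _ = preload_app(compiler),
    rpc:pmap({compile, file}, [Opts], Files).
```

The list comprehension runs in a single process, so the loads happen one after another rather than from the pmap workers.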

Thanks again,

Paul

