Re: Segmentation Fault R13B01

Re: Segmentation Fault R13B01

by Cliff Moon-2

What's the status of this issue?  I've seen this behavior as well.  It
appears to be a concurrency issue with active TCP packet delivery.  You
can reproduce it fairly easily using the janus app found here:

http://github.com/cliffmoon/janus/tree/master

You need to test on a Linux machine with many CPUs to see it happen with
any frequency.  I've found that 8 cores or more is enough to see it
after a few tries; an EC2 High-CPU XL instance seems to do the trick.
Start the server in one VM with `make run1`, then start the workers by
running `make sh` and issuing the Erlang command
`bot:test(flashbot, 10000).`

You have a pretty good chance of seeing one of the VMs segfault.  If
not, restart the VM and start from scratch.  When I run this under gdb I
get a backtrace similar to the one posted earlier in this thread, which
again points to a problem in active TCP delivery.
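
For anyone left with a core file rather than a live gdb session, here is a minimal sketch of pulling the same kind of backtrace out of it. The helper name `core_backtrace` and both arguments are placeholders for illustration, not part of the janus repository; `-batch` and `-ex` are standard gdb options.

```shell
#!/bin/sh
# Hypothetical helper: print backtraces for every thread in a core dump.
# $1 = emulator binary that produced the core (e.g. beam.smp)
# $2 = core file
core_backtrace() {
    emulator="$1"
    core="$2"
    if [ ! -f "$emulator" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace <emulator-binary> <core-file>" >&2
        return 2
    fi
    # 'bt' shows the crashing thread; 'thread apply all bt' shows every
    # thread, which helps when a concurrency bug is suspected.
    gdb -batch -ex "bt" -ex "thread apply all bt" "$emulator" "$core"
}
```

The binary passed in should be the exact one that dumped core, otherwise the symbols will not line up.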


------------------------------------------------------------------------------------------

    Hi, I built Erlang using gcc 4.1.2 (the default for CentOS).
    I started erl using
     -env ERL_MAX_PORTS 110000 +K true +P 110000 +S4 -smp -detached

    You can download 3 core dumps:
    http://94.75.214.130/core.12514.gz
    http://94.75.214.130/core.939.gz
    http://94.75.214.130/core.28223.gz

    Unfortunately, I have no clue which part of the code triggers the
    segfault, other than that it happens constantly, and I cannot
    redistribute the whole program. The program makes heavy use of TCP
    connections, though; typically I have over 10k established TCP
    connections.

    I will try to build the debug emulator tonight and let you know if
    I find something.

    Thanks,
    Georgos

    2009/7/1 Raimo Niskanen <raimo+erlang-bugs@...>

     > On Wed, Jul 01, 2009 at 05:25:14PM +0200, Georgos Siganos wrote:
     > > Hi All, I am having problems with R13B01 and segmentation faults,
     > > such as the one at the bottom of this message.
     > > Unfortunately, I am not sure which part of the code triggers the
     > > segmentation fault.
     > >
     > > I am running CentOS 5.3 (2.6.18-128.1.16.el5 #1 SMP x86_64) on a
     > > quad-core Intel processor.
     > > The program quits with a segfault both when compiled with and
     > > without HiPE.
     > >
     > > Please let me know if there is anything else I can report to help
     > > fix this problem. This segmentation fault is quite consistent and
     > > is a show stopper for my code.
     > > Thanks,
     > > Georgos
     >
     > How did you build the Erlang emulator, how did you start it
     > (arguments), how did you provoke the segfault?
     >
     > Can you post the code that provokes this to see
     > if it is reproducible on other OSes?
     >
     > Can you post the core dump for the Erlang/OTP team to dissect?
     >
     > Can you build and run a debug emulator and see if you get an earlier
     > fault detection? (gmake smp TYPE=debug in the emulator directory)
     >
     > >
     > >
     > > ----------------------- gdb output --------------------------
     > > Program terminated with signal 11, Segmentation fault.
     > > [New process 12531]
     > > [New process 12533]
     > > [New process 12532]
     > > [New process 12530]
     > > [New process 12526]
     > > [New process 12517]
     > > [New process 12516]
     > > [New process 12514]
     > > #0  0x00002add1f7b570b in memcpy () from /lib64/libc.so.6
     > > (gdb) bt
     > > #0  0x00002add1f7b570b in memcpy () from /lib64/libc.so.6
     > > #1  0x0000000000486849 in driver_deliver_term (port=<value optimized out>,
     > >     to=4816451, data=<value optimized out>,
     > >     len=<value optimized out>) at beam/io.c:2994
     > > #2  0x00000000005513cf in tcp_deliver (desc=0x2aab17c17548, len=3)
     > >     at drivers/common/inet_drv.c:2980
     > > #3  0x0000000000551891 in tcp_recv (desc=0x2aab17c17548, request_len=0)
     > >     at drivers/common/inet_drv.c:8043
     > > #4  0x0000000000551afc in tcp_inet_drv_input (data=0x2aaae21a9fc4,
     > >     event=<value optimized out>) at drivers/common/inet_drv.c:8381
     > > #5  0x00000000004a3d78 in erts_port_task_execute (runq=0x2add1fc19340,
     > >     curr_port_pp=0x2aaaaaacb1e8) at beam/erl_port_task.c:853
     > > #6  0x000000000049ebc5 in schedule (p=0x349, calls=<value optimized out>)
     > >     at beam/erl_process.c:6116
     > > #7  0x0000000000505afd in process_main () at beam/beam_emu.c:1126
     > > #8  0x0000000000499126 in sched_thread_func (vesdp=<value optimized out>)
     > >     at beam/erl_process.c:3015
     > > #9  0x000000000057a0f4 in thr_wrapper (vtwd=<value optimized out>)
     > >     at common/ethread.c:475
     > > #10 0x00002add1f31b367 in start_thread () from /lib64/libpthread.so.0
     > > #11 0x00002add1f80cf7d in clone () from /lib64/libc.so.6
     > >
     > > ---------------------------------------------------------------------------
     >
     > --
     >
     > / Raimo Niskanen, Erlang/OTP, Ericsson AB
     >



________________________________________________________________
erlang-bugs mailing list. See http://www.erlang.org/faq.html
erlang-bugs (at) erlang.org


Re: Re: Segmentation Fault R13B01

by Rickard Green

We've not been able to reproduce this. Next time you get a core, please
make the core and the beam.smp file (located in
lib/erlang/erts-5.7.2/bin) available to us. Also include the Linux
distribution, kernel version, hardware architecture, and the gcc version
used.
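
A minimal sketch of gathering those details on the affected host. The helper name `collect_env_info` is made up for illustration, and it assumes the gcc found in PATH is the one that built the emulator, which may not hold on every machine.

```shell
#!/bin/sh
# Hypothetical helper: print distribution, kernel, architecture and gcc version.
collect_env_info() {
    # CentOS keeps the release string in /etc/redhat-release; fall back to
    # /etc/os-release on distributions that lack it.
    for f in /etc/redhat-release /etc/os-release; do
        if [ -r "$f" ]; then
            echo "distribution: $(head -n 1 "$f")"
            break
        fi
    done
    echo "kernel: $(uname -r)"
    echo "arch: $(uname -m)"
    # Assumption: the gcc in PATH is the compiler that built beam.smp.
    if command -v gcc >/dev/null 2>&1; then
        echo "gcc: $(gcc --version | head -n 1)"
    else
        echo "gcc: not found in PATH"
    fi
}

collect_env_info
```

The output can be pasted into the report alongside the links to the core and beam.smp files.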

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB.


try-catch doesn't work in werl

by John Hughes-7

Here's an example in erl:

1> try throw(foo) catch A:B -> {A,B} end.
{throw,foo}

Here's the same example in werl:

20> try throw(foo) catch A:B -> {A,B} end.
** exception throw: foo

Exits and errors behave the same way: the exception is not caught when the
try...catch is evaluated in the shell.

I'm running emulator version 5.7.2 (in both cases!) and OTP release R13B01,
under Windows Vista.

John Hughes

