2.4.1 broker failure/crash

by dukeFan

We are running RabbitMQ 2.4.1 in production and recently had a failure whose root cause we cannot determine. We also tried restarting the broker, but the restart hung and never returned. We rebooted the machine to restore the broker.

At this point we have only the RabbitMQ and SASL logs, and the error messages don't mean much to us.

RabbitMQ log snippet:

=INFO REPORT==== 11-Apr-2012::05:04:08 ===
starting TCP connection <0.28490.65> from 172.17.208.67:1522

=INFO REPORT==== 11-Apr-2012::05:04:08 ===
closing TCP connection <0.9195.65> from 10.70.20.75:62045

=INFO REPORT==== 11-Apr-2012::05:04:31 ===
closing TCP connection <0.10243.65> from 10.70.40.77:53173

=ERROR REPORT==== 11-Apr-2012::05:04:31 ===
** Generic server msg_store_transient terminating
** Last message in was {'$gen_cast',
                           {client_dying,
                               <<74,18,61,37,8,55,8,91,210,27,70,185,112,89,
                                 171,154>>}}
** When Server state == {msstate,
                         "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1/msg_store_transient",
                         rabbit_msg_store_ets_index,
                         {state,417861,
                          "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1/msg_store_transient"},
                         0,#Ref<0.0.0.875>,
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         [],undefined,0,12073198,[],<0.233.0>,421958,413764,
                         426055,
                         {set,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
...skipping...
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                            []}}}}
** Reason for termination == 
** {{badmatch,false},
    [{rabbit_msg_store_ets_index,insert,2},
     {rabbit_msg_store,write_message,3},
     {rabbit_msg_store,handle_cast,2},
     {gen_server2,handle_msg,2},
     {proc_lib,wake_up,3}]}
...skipping...
=INFO REPORT==== 11-Apr-2012::05:04:43 ===
closing TCP connection <0.5032.4496> from 172.16.216.217:60234

=INFO REPORT==== 11-Apr-2012::05:04:43 ===
closing TCP connection <0.8419.6115> from 10.65.10.72:54580

=ERROR REPORT==== 11-Apr-2012::05:04:43 ===
** Generic server <0.31907.9> terminating
** Last message in was {'EXIT',<0.241.0>,shutdown}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/alarming">>,queue,<<"alarming.9">>},
                          false,false,none,[],<0.31907.9>},
                         none,true,rabbit_variable_queue,
                         {vqstate,
                          {[],[]},
                          {0,{[],[]}},
                          {delta,undefined,0,undefined},
...skipping...
                         {state,fine,undefined},
                         {dict,0,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                         undefined,undefined}
** Reason for termination == 
** {noproc,
       {gen_server2,call,
           [msg_store_transient,
            {client_terminate,
                <<17,102,9,148,6,184,165,141,162,246,194,57,36,62,208,135>>},
            infinity]}}
** In 'terminate' callback with reason ==
** shutdown

=ERROR REPORT==== 11-Apr-2012::05:04:43 ===
** gen_event handler rabbit_error_logger crashed.
** Was installed in error_logger
** Last event was: {error,<0.146.0>,
                    {<0.9700.6>,
                     "** Generic server ~p terminating~n** Last message in was ~p~n** When Server state == ~p~n** Reason for termination == ~n** ~p~n** In 'terminate' callback with reason ==~n** ~p~n",
                     [<0.9700.6>,
                      {'EXIT',<0.241.0>,shutdown},
                      {q,
                       {amqqueue,
                        {resource,<<"/rssm">>,queue,
                         <<"cse.rssm.logManager.sqlserver">>},
                        false,false,none,[],<0.9700.6>},
                       none,true,rabbit_variable_queue,
                       {vqstate,
                        {[],[]},
                        {0,{[],[]}},
                        {delta,undefined,0,undefined},
                        {0,{[],[]}},
...skipping...
                      {noproc,
                       {gen_server2,call,
                        [msg_store_transient,
                         {client_terminate,
                          <<143,174,238,76,144,209,125,211,110,123,56,1,237,
                            217,136,2>>},
                         infinity]}},
                      shutdown]}}
** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}
** Reason == {badarg,[{ets,lookup,[rabbit_registry,{exchange,topic}]},
                      {rabbit_registry,lookup_module,2},
                      {rabbit_exchange,type_to_module,1},
                      {rabbit_exchange,route,2},
                      {rabbit_exchange,publish,2},
                      {rabbit_basic,publish,1},
                      {rabbit_error_logger,publish1,4},
                      {rabbit_error_logger,handle_event,2}]}

=INFO REPORT==== 11-Apr-2012::05:04:43 ===
    application: rabbit
    exited: shutdown
    type: permanent


SASL log snippet:

=SUPERVISOR REPORT==== 11-Apr-2012::00:15:30 ===
     Supervisor: {<0.5419.34>,rabbit_channel_sup_sup}
     Context:    shutdown_error
     Reason:     shutdown
     Offender:   [{pid,<0.5731.34>},
                  {name,channel_sup},
                  {mfa,{rabbit_channel_sup,start_link,[]}},
                  {restart_type,temporary},
                  {shutdown,infinity},
                  {child_type,supervisor}]


=CRASH REPORT==== 11-Apr-2012::05:04:32 ===
  crasher:
    initial call: gen:init_it/7
    pid: <0.232.0>
    registered_name: msg_store_transient
    exception exit: {{badmatch,false},
                     [{rabbit_msg_store_ets_index,insert,2},
                      {rabbit_msg_store,write_message,3},
                      {rabbit_msg_store,handle_cast,2},
                      {gen_server2,handle_msg,2},
                      {proc_lib,wake_up,3}]}
      in function  gen_server2:terminate/3
    ancestors: [rabbit_sup,<0.147.0>]
    messages: [{'EXIT',<0.233.0>,normal}]
    links: [<0.148.0>]
    dictionary: [{fhc_age_tree,{0,nil}}]
    trap_exit: true
    status: running
    heap_size: 10946
    stack_size: 24
    reductions: 98380626
  neighbours:
=SUPERVISOR REPORT==== 11-Apr-2012::05:04:32 ===
     Supervisor: {local,rabbit_sup}
     Context:    child_terminated
     Reason:     {{badmatch,false},
                  [{rabbit_msg_store_ets_index,insert,2},
                   {rabbit_msg_store,write_message,3},
                   {rabbit_msg_store,handle_cast,2},
                   {gen_server2,handle_msg,2},
                   {proc_lib,wake_up,3}]}
     Offender:   [{pid,<0.232.0>},
                  {name,msg_store_transient},
                  {mfargs,
                      {rabbit_msg_store,start_link,
                          [msg_store_transient,
                           "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1",
                           undefined,
                           {#Fun<rabbit_variable_queue.0.66952436>,ok}]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]


=SUPERVISOR REPORT==== 11-Apr-2012::05:04:32 ===
     Supervisor: {local,rabbit_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.232.0>},
                  {name,msg_store_transient},
                  {mfargs,
                      {rabbit_msg_store,start_link,
                          [msg_store_transient,
                           "/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1",
                           undefined,
                           {#Fun<rabbit_variable_queue.0.66952436>,ok}]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]
...skipping...
=CRASH REPORT==== 11-Apr-2012::05:04:43 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.31907.9>
    registered_name: []
    exception exit: {noproc,
                        {gen_server2,call,
                            [msg_store_transient,
                             {client_terminate,
                                 <<213,104,174,241,176,121,164,159,98,43,221,
                                   160,120,109,6,107>>},
                             infinity]}}
      in function  gen_server2:terminate/3
    ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.147.0>]
    messages: []
    links: []
    dictionary: [{guid,{{9,<0.31907.9>},0}}]
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 443158598
  neighbours:

=SUPERVISOR REPORT==== 11-Apr-2012::05:04:43 ===
     Supervisor: {local,rabbit_amqqueue_sup}
     Context:    shutdown_error
     Reason:     {noproc,
                     {gen_server2,call,
                         [msg_store_transient,
                          {client_terminate,
                              <<213,104,174,241,176,121,164,159,98,43,221,160,
                                120,109,6,107>>},
                          infinity]}}
     Offender:   [{pid,<0.31907.9>},
                  {name,rabbit_amqqueue},
                  {mfa,{rabbit_amqqueue_process,start_link,[]}},
                  {restart_type,temporary},
                  {shutdown,4294967295},
                  {child_type,worker}]
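
To line up the two logs, here is a minimal sketch in Python (not part of RabbitMQ; the function name `non_info_reports` is made up for illustration) that lists every non-INFO report in time order. It assumes only the `=KIND REPORT==== dd-Mon-yyyy::hh:mm:ss ===` header format seen in the snippets above.

```python
import re
from datetime import datetime

# Matches headers such as "=ERROR REPORT==== 11-Apr-2012::05:04:31 ===".
HEADER = re.compile(
    r"^=(\w+) REPORT==== (\d{1,2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===\s*$"
)

def non_info_reports(log_text):
    """Return (timestamp, kind, first_body_line) for each non-INFO report, oldest first."""
    lines = log_text.splitlines()
    reports = []
    for i, line in enumerate(lines):
        m = HEADER.match(line)
        if not m or m.group(1) == "INFO":
            continue
        when = datetime.strptime(m.group(2), "%d-%b-%Y::%H:%M:%S")
        # First non-blank line after the header, used as a one-line summary.
        body = next((l.strip() for l in lines[i + 1:] if l.strip()), "")
        reports.append((when, m.group(1), body))
    # sorted() is stable, so reports with the same timestamp keep file order.
    return sorted(reports, key=lambda r: r[0])
```

Feeding it the concatenated text of both logs puts the `msg_store_transient` `badmatch` report (05:04:31) ahead of the `noproc` reports (05:04:43), so the later errors appear to follow from the first crash rather than being separate failures.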

Any help determining the cause would be appreciated.

Mark.


