We are running RabbitMQ 2.4.1 in production and recently had a failure whose root cause we cannot determine. We also tried restarting the broker, but the restart hung and never returned, so we rebooted the machine to restore the broker.
We have only the rabbitmq and sasl logs at this point, but the error messages don't mean much to us.
rabbitmq log snippet:
=INFO REPORT==== 11-Apr-2012::05:04:08 ===
starting TCP connection <0.28490.65> from 172.17.208.67:1522
=INFO REPORT==== 11-Apr-2012::05:04:08 ===
closing TCP connection <0.9195.65> from 10.70.20.75:62045
=INFO REPORT==== 11-Apr-2012::05:04:31 ===
closing TCP connection <0.10243.65> from 10.70.40.77:53173
=ERROR REPORT==== 11-Apr-2012::05:04:31 ===
** Generic server msg_store_transient terminating
** Last message in was {'$gen_cast',
{client_dying,
<<74,18,61,37,8,55,8,91,210,27,70,185,112,89,
171,154>>}}
** When Server state == {msstate,
"/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1/msg_store_transient",
rabbit_msg_store_ets_index,
{state,417861,
"/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1/msg_store_transient"},
0,#Ref<0.0.0.875>,
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
[],undefined,0,12073198,[],<0.233.0>,421958,413764,
426055,
{set,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
...skipping...
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[]}}}}
** Reason for termination ==
** {{badmatch,false},
[{rabbit_msg_store_ets_index,insert,2},
{rabbit_msg_store,write_message,3},
{rabbit_msg_store,handle_cast,2},
{gen_server2,handle_msg,2},
{proc_lib,wake_up,3}]}
...skipping...
=INFO REPORT==== 11-Apr-2012::05:04:43 ===
closing TCP connection <0.5032.4496> from 172.16.216.217:60234
=INFO REPORT==== 11-Apr-2012::05:04:43 ===
closing TCP connection <0.8419.6115> from 10.65.10.72:54580
=ERROR REPORT==== 11-Apr-2012::05:04:43 ===
** Generic server <0.31907.9> terminating
** Last message in was {'EXIT',<0.241.0>,shutdown}
** When Server state == {q,
{amqqueue,
{resource,<<"/alarming">>,queue,<<"alarming.9">>},
false,false,none,[],<0.31907.9>},
none,true,rabbit_variable_queue,
{vqstate,
{[],[]},
{0,{[],[]}},
{delta,undefined,0,undefined},
...skipping...
{state,fine,undefined},
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
undefined,undefined}
** Reason for termination ==
** {noproc,
{gen_server2,call,
[msg_store_transient,
{client_terminate,
<<17,102,9,148,6,184,165,141,162,246,194,57,36,62,208,135>>},
infinity]}}
** In 'terminate' callback with reason ==
** shutdown
=ERROR REPORT==== 11-Apr-2012::05:04:43 ===
** gen_event handler rabbit_error_logger crashed.
** Was installed in error_logger
** Last event was: {error,<0.146.0>,
{<0.9700.6>,
"** Generic server ~p terminating~n** Last message in was ~p~n** When Server state == ~p~n** Reason for termination == ~n** ~p~n** In 'terminate' callback with reason ==~n** ~p~n",
[<0.9700.6>,
{'EXIT',<0.241.0>,shutdown},
{q,
{amqqueue,
{resource,<<"/rssm">>,queue,
<<"cse.rssm.logManager.sqlserver">>},
false,false,none,[],<0.9700.6>},
none,true,rabbit_variable_queue,
{vqstate,
{[],[]},
{0,{[],[]}},
{delta,undefined,0,undefined},
{0,{[],[]}},
...skipping...
{noproc,
{gen_server2,call,
[msg_store_transient,
{client_terminate,
<<143,174,238,76,144,209,125,211,110,123,56,1,237,
217,136,2>>},
infinity]}},
shutdown]}}
** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}
** Reason == {badarg,[{ets,lookup,[rabbit_registry,{exchange,topic}]},
{rabbit_registry,lookup_module,2},
{rabbit_exchange,type_to_module,1},
{rabbit_exchange,route,2},
{rabbit_exchange,publish,2},
{rabbit_basic,publish,1},
{rabbit_error_logger,publish1,4},
{rabbit_error_logger,handle_event,2}]}
=INFO REPORT==== 11-Apr-2012::05:04:43 ===
application: rabbit
exited: shutdown
type: permanent
sasl log snippet:
=SUPERVISOR REPORT==== 11-Apr-2012::00:15:30 ===
Supervisor: {<0.5419.34>,rabbit_channel_sup_sup}
Context: shutdown_error
Reason: shutdown
Offender: [{pid,<0.5731.34>},
{name,channel_sup},
{mfa,{rabbit_channel_sup,start_link,[]}},
{restart_type,temporary},
{shutdown,infinity},
{child_type,supervisor}]
=CRASH REPORT==== 11-Apr-2012::05:04:32 ===
crasher:
initial call: gen:init_it/7
pid: <0.232.0>
registered_name: msg_store_transient
exception exit: {{badmatch,false},
[{rabbit_msg_store_ets_index,insert,2},
{rabbit_msg_store,write_message,3},
{rabbit_msg_store,handle_cast,2},
{gen_server2,handle_msg,2},
{proc_lib,wake_up,3}]}
in function gen_server2:terminate/3
ancestors: [rabbit_sup,<0.147.0>]
messages: [{'EXIT',<0.233.0>,normal}]
links: [<0.148.0>]
dictionary: [{fhc_age_tree,{0,nil}}]
trap_exit: true
status: running
heap_size: 10946
stack_size: 24
reductions: 98380626
neighbours:
=SUPERVISOR REPORT==== 11-Apr-2012::05:04:32 ===
Supervisor: {local,rabbit_sup}
Context: child_terminated
Reason: {{badmatch,false},
[{rabbit_msg_store_ets_index,insert,2},
{rabbit_msg_store,write_message,3},
{rabbit_msg_store,handle_cast,2},
{gen_server2,handle_msg,2},
{proc_lib,wake_up,3}]}
Offender: [{pid,<0.232.0>},
{name,msg_store_transient},
{mfargs,
{rabbit_msg_store,start_link,
[msg_store_transient,
"/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1",
undefined,
{#Fun<rabbit_variable_queue.0.66952436>,ok}]}},
{restart_type,transient},
{shutdown,4294967295},
{child_type,worker}]
=SUPERVISOR REPORT==== 11-Apr-2012::05:04:32 ===
Supervisor: {local,rabbit_sup}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.232.0>},
{name,msg_store_transient},
{mfargs,
{rabbit_msg_store,start_link,
[msg_store_transient,
"/var/lib/rabbitmq/mnesia/rabbit@che-csebrokerp1",
undefined,
{#Fun<rabbit_variable_queue.0.66952436>,ok}]}},
{restart_type,transient},
{shutdown,4294967295},
{child_type,worker}]
...skipping...
=CRASH REPORT==== 11-Apr-2012::05:04:43 ===
crasher:
initial call: gen:init_it/6
pid: <0.31907.9>
registered_name: []
exception exit: {noproc,
{gen_server2,call,
[msg_store_transient,
{client_terminate,
<<213,104,174,241,176,121,164,159,98,43,221,
160,120,109,6,107>>},
infinity]}}
in function gen_server2:terminate/3
ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.147.0>]
messages: []
links: []
dictionary: [{guid,{{9,<0.31907.9>},0}}]
trap_exit: true
status: running
heap_size: 987
stack_size: 24
reductions: 443158598
neighbours:
=SUPERVISOR REPORT==== 11-Apr-2012::05:04:43 ===
Supervisor: {local,rabbit_amqqueue_sup}
Context: shutdown_error
Reason: {noproc,
{gen_server2,call,
[msg_store_transient,
{client_terminate,
<<213,104,174,241,176,121,164,159,98,43,221,160,
120,109,6,107>>},
infinity]}}
Offender: [{pid,<0.31907.9>},
{name,rabbit_amqqueue},
{mfa,{rabbit_amqqueue_process,start_link,[]}},
{restart_type,temporary},
{shutdown,4294967295},
{child_type,worker}]
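Because the events are split across two logs, it can help to merge the non-INFO report headers from both into one timeline to see what failed first. Below is a minimal sketch of that, assuming the header format shown in the snippets above (`=KIND REPORT==== DD-Mon-YYYY::HH:MM:SS ===`). The names `HEADER`, `MONTHS` and `report_timeline` are made up for illustration and are not part of RabbitMQ or Erlang.

```python
import re
from datetime import datetime

# Assumed header shape, taken from the snippets above, e.g.
# "=ERROR REPORT==== 11-Apr-2012::05:04:31 ==="
HEADER = re.compile(
    r"^=(?P<kind>[A-Z ]+) REPORT==== "
    r"(?P<day>\d{1,2})-(?P<mon>[A-Za-z]{3})-(?P<year>\d{4})::"
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) ===$"
)

# Explicit month table so parsing does not depend on the system locale.
MONTHS = {name: i + 1 for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}


def report_timeline(lines, skip_kinds=("INFO",)):
    """Return (timestamp, kind) pairs for report headers, oldest first.

    Lines that are not report headers are ignored. Headers with the same
    timestamp keep the order in which they appeared in the input.
    """
    events = []
    for line in lines:
        match = HEADER.match(line.strip())
        if match is None:
            continue
        kind = match.group("kind")
        if kind in skip_kinds:
            continue
        stamp = datetime(
            int(match.group("year")), MONTHS[match.group("mon")],
            int(match.group("day")), int(match.group("h")),
            int(match.group("m")), int(match.group("s")),
        )
        events.append((stamp, kind))
    events.sort(key=lambda event: event[0])  # stable sort keeps ties in input order
    return events
```

Feeding it the lines of both the rabbitmq log and the sasl log together gives one ordered list of ERROR, CRASH and SUPERVISOR reports. It only orders the headers; it does not interpret the report bodies.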
Any help determining the cause would be appreciated.
Mark.
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@...
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss