

Re: rabbitmqctl stall/hang when leaving a cluster

by Matt Pietrek


Some other work came up so I needed to drop this thread for a few weeks. However, coming back to it, I can easily reproduce this issue within one or two tries.

In a nutshell, in a clustered environment, simply stop one node, wait a few seconds, then restart it. The last output seen is:

starting database ...

I've let it wait for much longer than 30 seconds and it has never come back.

Any chance this may have been stamped out in RabbitMQ 2.8?



On Fri, Feb 24, 2012 at 1:43 PM, Matt Pietrek <mpietrek@...> wrote:
| So how long are you waiting when determining it's hanging? Less than 30 seconds?

Just to be double sure, I let it sit for an hour yesterday. I would have expected a timeout, but it never came.

It's a pretty easy scenario to script and try out. I'd send you my code, but it relies on other internal commands.

There may also be a timing issue. If I put a 10 second delay after restarting one broker, and before stopping the next, it seems to help.

That is:

for x in broker_list:
    stop x
    start x
    sleep(10)

Matt
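
For anyone wanting to script the loop above, here is a minimal Python sketch. The `stop` and `start` callables are hypothetical placeholders for whatever commands stop and start a broker in your environment (the original script relied on internal commands that were not posted), and `rolling_restart` and `settle_seconds` are illustrative names, not part of RabbitMQ.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    brokers: Iterable[str],
    stop: Callable[[str], None],
    start: Callable[[str], None],
    settle_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart each broker in turn, pausing after each restart.

    `stop` and `start` are hypothetical hooks supplied by the caller;
    the pause mirrors the 10 second delay that seemed to help.
    """
    restarted = []
    for broker in brokers:
        stop(broker)           # take this node down
        start(broker)          # bring it back up
        sleep(settle_seconds)  # let it settle before touching the next node
        restarted.append(broker)
    return restarted
```

Passing `sleep` as a parameter makes it easy to try different delays, or to dry-run the loop without actually waiting.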


On Fri, Feb 24, 2012 at 4:22 AM, Simon MacMullen <simon@...> wrote:
On 23/02/12 21:00, Matt Pietrek wrote:
The nohup.out on the failing node ends with:

<snip>

starting database ...

So how long are you waiting when determining it's hanging? Less than 30 seconds?

Because that looks like Rabbit is waiting for another cluster node (if it was not the last to shut down but is the first to start up, it will wait for the one that was the last to shut down). But it will only wait for 30 seconds before spitting out an error. I'm not sure how else you could get it to stop there *without* any further output though.


Cheers, Simon

--
Simon MacMullen
RabbitMQ, VMware
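
The startup wait described above can be pictured as a bounded poll: keep checking for the peer and give up with an error once the deadline passes. The following is a conceptual Python sketch only, not RabbitMQ's actual code; `wait_for_peer`, `peer_is_up`, and `poll_interval` are made-up names, and only the 30 second figure comes from the message above.

```python
import time
from typing import Callable


def wait_for_peer(
    peer_is_up: Callable[[], bool],
    timeout: float = 30.0,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the peer is reachable or the timeout elapses.

    Returns True if the peer came up in time, False on timeout.
    `peer_is_up` is a hypothetical check supplied by the caller.
    """
    deadline = clock() + timeout
    while True:
        if peer_is_up():
            return True
        if clock() >= deadline:
            return False  # caller is expected to report an error here
        sleep(poll_interval)
```

A wait shaped like this always ends, one way or the other, which is why a node that sits at "starting database" for an hour with no further output is surprising.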



_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@...
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
