Broker failover

Niko Felger
Broker failover
Hi,

Are there any best practices for achieving broker failover?

We are currently using two clustered nodes with durable queues and exchanges. The clients are configured to connect to the first node. In the event that this node dies, I would like both existing consumers as well as newly started ones to connect to the other node. Are there standard patterns or recipes to achieve this?

Thanks!
Niko

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@...
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Matthew Sackman
Re: Broker failover
Hi Niko,

On Wed, Aug 19, 2009 at 03:06:50PM +0100, Niko Felger wrote:
> Are there any best practices for achieving broker failover?
>
> We are currently using two clustered nodes with durable queues and
> exchanges. The clients are configured to connect to the first node. In the
> event that this node dies, I would like both existing consumers as well as
> newly started ones to connect to the other node. Are there standard patterns
> or recipes to achieve this?

There's nothing standard just yet, but we're getting a lot of interest
in this area and are working on solutions. Just at the moment the
situation is as follows:

Because of the way Mnesia works, you can't just transfer the files from
one machine to another and start the broker up: Mnesia records the
node's hostname in the database, so both machines would need to have the
same hostname. To get around this, you can use the nodename
rabbit@localhost on every machine. However, this prevents you from
clustering, which is a shame.

Therefore, if HA and failover are important to you, we'd recommend the
following:

1) Put a simple TCP/IP load balancer in front of the rabbit nodes, but
do this only for producers. The load balancer needs to be able to cope
dynamically with nodes going down, reappearing etc.
2) For consumers, you really want them all to try to consume from all
the nodes at the same time. They also need to be able to cope silently
with nodes going down and reappearing. Obviously the exact details of
this vary between applications.
3) Have a SAN with some shared storage which is not partitioned. All the
rabbit nodes need access to this.
4) Use Linux-HA or an equivalent to monitor your rabbit nodes, and
start up all the brokers with the nodename rabbit@localhost.
Now, when a node fails, Linux-HA will notice, and should tell a spare
node to start up, setting the RABBITMQ_MNESIA_DIR to the location on the
SAN of the files for the failed node. It should all just start up.
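
To make the takeover step a little more concrete, here is a small Python sketch of what the script run on the spare node might do: build an environment with the shared nodename and the failed node's Mnesia directory, then launch the broker. How Linux-HA actually invokes such a script is not covered here, and the function names and any SAN path you pass in are assumptions for illustration. RABBITMQ_NODENAME and RABBITMQ_MNESIA_DIR are the environment variables read by the rabbitmq-server script.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def takeover_env(mnesia_dir: str,
                 base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the environment for a spare broker taking over a failed node."""
    env = dict(os.environ if base_env is None else base_env)
    # Every broker uses the same nodename so the Mnesia files stay valid.
    env["RABBITMQ_NODENAME"] = "rabbit@localhost"
    # Point the broker at the failed node's files on the shared storage.
    env["RABBITMQ_MNESIA_DIR"] = mnesia_dir
    return env


def start_spare_broker(mnesia_dir: str) -> subprocess.Popen:
    """Launch rabbitmq-server (assumed to be on PATH) with the takeover env."""
    return subprocess.Popen(["rabbitmq-server"], env=takeover_env(mnesia_dir))
```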

Obviously, this depends on the reliability and availability of your SAN,
and not having clustering available complicates matters, at least for
consumers. However, if HA and failover are more important to you, then
this may be a tradeoff you're willing to make just at the moment.

Also, be aware that with this solution, non-persistent messages can be
lost as a node goes down, and even persistent messages which are not
part of a transaction can be lost.

Needless to say, a more comprehensive solution is on our TODO list, but
may be a little way off just at the moment.

I hope this helps,

Matthew

Matthew Sackman
Re: Broker failover
Niko,

On Wed, Aug 19, 2009 at 03:41:59PM +0100, Matthew Sackman wrote:
> Therefore, if HA and failover are important to you, we'd recommend the
> following:

...

One further issue with this is that all the nodes really need to be
manually configured identically, in terms of queues, exchanges and
bindings. As producers don't know which node they're connected to, this
really demands that:
a) Every producer can attempt configuration whenever it connects; or
b) As consumers may need to be connected to every node, they could do
   the configuration, as they're not in front of the load balancer; or
c) You have some other process that does configuration.

This is definitely one area where the clustered setup saves you effort
as all nodes implicitly get configured in the same way.
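
Whichever of a), b) or c) you pick, the configuration step has to be safe to repeat against every node. Below is a minimal Python sketch of that idea. The `channel` argument is a stand-in: its method names are modelled on those commonly found in AMQP client libraries, so check your own client's signatures. `InMemoryChannel`, `Binding`, `declare_topology` and the direct exchange type are assumptions made only so the example runs without a broker.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Binding(NamedTuple):
    exchange: str
    queue: str
    routing_key: str


def declare_topology(channel, bindings: Iterable[Binding]) -> None:
    """Declare every exchange, queue and binding; safe to call on each connect."""
    for b in bindings:
        channel.exchange_declare(exchange=b.exchange, exchange_type="direct",
                                 durable=True)
        channel.queue_declare(queue=b.queue, durable=True)
        channel.queue_bind(queue=b.queue, exchange=b.exchange,
                           routing_key=b.routing_key)


class InMemoryChannel:
    """Hypothetical stand-in for a real channel; it just records declarations."""

    def __init__(self) -> None:
        self.exchanges: Set[Tuple[str, str, bool]] = set()
        self.queues: Set[Tuple[str, bool]] = set()
        self.bindings: Set[Tuple[str, str, str]] = set()

    def exchange_declare(self, exchange: str, exchange_type: str,
                         durable: bool) -> None:
        self.exchanges.add((exchange, exchange_type, durable))

    def queue_declare(self, queue: str, durable: bool) -> None:
        self.queues.add((queue, durable))

    def queue_bind(self, queue: str, exchange: str, routing_key: str) -> None:
        self.bindings.add((queue, exchange, routing_key))
```

Calling `declare_topology` twice on the same channel leaves it in the same state as calling it once, which is the property a producer or consumer relies on when it attempts configuration on every connect.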

Matthew

Niko Felger
Re: Broker failover
Matthew,

Thanks a lot for all this info!

Is there a way to achieve some of this in a clustered setup? I guess our requirements are not so much HA of the whole messaging subsystem, but rather that an as-large-as-possible proportion of messages gets processed _eventually_. The scenario I am mainly worried about is when producers suddenly cannot publish anymore because the server has gone away and thus any messages are lost at that point.

We tried using a dumb load balancer (in front of both producers and consumers) to achieve this, but so far this has caused us more trouble than it saved, see here: http://www.nabble.com/RabbitMQ-load-balancing-failover-with-LVS-td24683230.html#a24683230

Thanks!
niko

Matthew Sackman
Re: Broker failover
Hi Niko,

On Thu, Aug 20, 2009 at 11:09:32AM +0100, Niko Felger wrote:

> Is there a way to achieve some of this in a clustered setup? I guess our
> requirements are not so much HA of the whole messaging subsystem, but rather
> that an as-large-as-possible proportion of messages gets processed
> _eventually_. The scenario I am mainly worried about is when producers
> suddenly cannot publish anymore because the server has gone away and thus
> any messages are lost at that point.
>
> We tried using a dumb load balancer (in front of both producers and
> consumers) to achieve this, but so far this has caused us more trouble than
> it saved, see here:
> http://www.nabble.com/RabbitMQ-load-balancing-failover-with-LVS-td24683230.html#a24683230

Ahh, interesting.

We do have some suspicions that the failover can be made to work with
clustering - provided that when the new node comes up it takes over the
IP / hostname of the failed node, it *might* just work. However, be
aware this pretty much came out of a 5 minute conversation in the office
yesterday and we've not even attempted it let alone fully tested it.
However, we think it might work! :D

Linux-HA can indeed do MAC address stealing, and thus take over the IP
etc. So I would suggest, if you have the time to spare, that you start
down that route.

Matthew

jasonjwwilliams
Re: Broker failover
Hey Niko,

Can you set your SLB to persistent mapping based on client IP? That should keep each client on the server it is initially mapped to until that server fails.
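
To show the behaviour I mean, here is a toy Python model of persistent mapping. It is not how any particular load balancer is implemented or configured; the class name, the least-loaded choice for new clients and the server names are all assumptions for the sake of the example.

```python
from typing import Dict, List, Set


class StickyBalancer:
    """Toy model: a client keeps its server until that server is marked down."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.alive: Set[str] = set(servers)
        self.assigned: Dict[str, str] = {}  # client IP -> server

    def mark_down(self, server: str) -> None:
        self.alive.discard(server)

    def mark_up(self, server: str) -> None:
        self.alive.add(server)

    def server_for(self, client_ip: str) -> str:
        current = self.assigned.get(client_ip)
        if current in self.alive:
            return current  # sticky: keep the existing mapping
        candidates = [s for s in self.servers if s in self.alive]
        if not candidates:
            raise RuntimeError("no live servers")
        # New or orphaned client: pick the live server with the fewest clients.
        load = {s: 0 for s in candidates}
        for server in self.assigned.values():
            if server in load:
                load[server] += 1
        choice = min(candidates, key=lambda s: load[s])
        self.assigned[client_ip] = choice
        return choice
```

Note that a client which was moved after a failure stays on its new server even when the old one comes back, so connections are not bounced a second time.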

That being said, I still believe HA should be done in Rabbit. SLB is not the right hammer in my opinion. 

-J
