RabbitMQ memory management


Re: RabbitMQ memory management

by Edwin Fine-3

Very good point. I did ask that question in my previous email, but I don't think it has been answered yet.

On Fri, Sep 12, 2008 at 4:16 PM, tsuraan <tsuraan@...> wrote:
> So it's a bit of a time bomb, but will not be the show-stopper I imagined,
> because if the VM does crash for any reason, I will make sure it gets
> restarted by a watchdog, so service won't be interrupted for long and nobody
> besides myself and my client will know any better or even care. I'll also
> have to make sure that my code notices that Rabbit died and reconnect when
> it comes back up (which the code should be doing anyway). The worst that
> will happen is that some in-flight messages that didn't make it to disk may
> need to be re-sent.

If RabbitMQ crashes because it's out of memory, I understand that it
should be able to start again without losing any data.  Will the next
message sent to it (before any messages are dequeued) cause the queue
to crash again?  I assume that must be the case, since nothing was
lost when the program crashed.  I don't know what to do about this,
but it may be something to try to plan around.

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@...
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss




Re: RabbitMQ memory management

by Dmitriy Samovskiy



Ben Hood wrote:

> No, you've understood it correctly. If you send enough non-persistent
> messages, eventually the queues will fill up, the broker will crash
> and you will lose all of the messages that you sent to that node. ATM

I assume this condition can't happen by design if messages are published with
the immediate flag set to true, right?


<field name = "immediate" type = "bit">
  request immediate delivery
  <doc>
    This flag tells the server how to react if the message cannot be
    routed to a queue consumer immediately.  If this flag is set, the
    server will return an undeliverable message with a Return method.
    If this flag is zero, the server will queue the message, but with
    no guarantee that it will ever be consumed.
  </doc>
  <doc name = "rule" test = "amq_basic_16">
    The server SHOULD implement the immediate flag.
  </doc>
</field>


Does it mean a message is returned if it can't be immediately routed to a queue or a
message is returned if it can't be immediately routed to a consumer?

- Dmitriy



Re: RabbitMQ memory management

by Matthias Radestock-2

Dmitriy,

Dmitriy Samovskiy wrote:
> Ben Hood wrote:
>
>> No, you've understood it correctly. If you send enough non-persistent
>> messages, eventually the queues will fill up, the broker will crash
>> and you will lose all of the messages that you sent to that node. ATM
>
> I assume this condition can't happen by design if messages are
> published with immediate flag set to true, right?

Correct.

> Does it mean a message is returned if it can't be immediately routed
> to a queue or a message is returned if it can't be immediately routed
> to a consumer?

The latter. The former is done by the 'mandatory' flag.

With immediate=true no queuing takes place.
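
The difference between the two flags can be sketched with a toy in-memory model. This is only an illustration of the behaviour described above, not RabbitMQ's implementation; ToyQueue, publish and the result strings are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyQueue:
    """Hypothetical in-memory queue; not RabbitMQ's data structure."""
    name: str
    ready_consumers: int = 0  # consumers able to take a message right now
    backlog: List[str] = field(default_factory=list)


def publish(msg: str, bound_queues: List[ToyQueue],
            mandatory: bool = False, immediate: bool = False) -> str:
    """Return 'delivered', 'queued', 'returned' or 'dropped' for one message."""
    if not bound_queues:
        # No queue to route to: this is the case the 'mandatory' flag covers.
        return "returned" if mandatory else "dropped"

    delivered = False
    for q in bound_queues:
        if q.ready_consumers > 0:
            delivered = True          # handed straight to a consumer
        elif not immediate:
            q.backlog.append(msg)     # held in the queue for later
        # immediate=True with no ready consumer: nothing is queued

    if delivered:
        return "delivered"
    # No consumer anywhere: this is the case the 'immediate' flag covers.
    return "returned" if immediate else "queued"
```

In this model a message published with immediate=True never lands in a backlog, which is why it cannot contribute to a queue filling up.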


Matthias.


Re: RabbitMQ memory management

by Matthias Radestock-2

Hi,

tsuraan wrote:
> If RabbitMQ crashes because it's out of memory, I understand that it
> should be able to start again without losing any data

Only *durable* queues and exchanges, and *persistent* messages are
recoverable, as per the spec.

> Will the next message sent to it (before any messages are dequeued)
> cause the queue to crash again? I assume that must be the case, since
> nothing was lost when the program crashed.

Messages aren't the only entities consuming memory. I would hope that
RabbitMQ is able to recover the state (as defined above) after an OoM
error, and take in a few more messages, but it's not something we have
tested. It's possible that the startup sequence uses just that little
bit more memory which would push the system over the edge.

Generally, if there are worries about RabbitMQ running out of memory one
should try catching that case well before it happens. It's not really
advisable to run a system so close to the limits.


Matthias.


Re: RabbitMQ memory management

by Alexis Richardson-2

Hi all,

I have been following this discussion on the list.  To recap:

1 - the overhead of storing a message in rabbitmq is very low meaning
that the case 'broker has filled up, I need more nodes' is quite
unusual and certainly manageable
2 - anyone who needs more queue memory can add more nodes to the broker
cluster transparently
3 - durable/persistent messages are also stored on disk for recovery purposes

If you have slower consumers than producers, then either:
a) the number of queues will grow
or:
b) at least one queue will grow in size.

Note that case (a) is solved by 2 above.  Add more nodes.  How often would you
have to add more nodes?  Due to 1, you can work this out based on your message
size.  For almost all use cases the consumers will have to lag producers by
several days.  Think about it.  And don't forget you can add more consumers.

Let's say that you cannot add more nodes and you cannot add more consumers.
Assume also that rabbitmq is set up to persist messages to disk for recovery per
point 3 above, but that due to your application you need copies of each message
in memory.  Finally assume you have only one queue, many fast producers, and
one slow consumer.  One example would be SMS.  Even a small old server could
store millions of SMS messages without any of them being consumed.  So that
is your scenario.

[ the above is extreme but the problem use cases are variants of this ]

In cases like this we would need to do one of the following:

* provide a means to tell producers to back off, or alert an operator
* or, implement a system that created more queues and nodes for you,
if one queue became ginormous
* implement either rolling buffers, or agents, that flushed/paged
messages to pluggable stores or directly to disk
* .. or (more extremely) implement rolling buffers that deleted old
unconsumed messages completely when queue size maxed out, and notified
consumers and operators as needed
* some other more fancy flow control TBD
* OR - recommend that the application designer think about why their
queues are growing to a very large size, and whether this can be
prevented through application changes

You'll appreciate that the first couple of fixes in this list are
fairly simple to implement, with potential difficulties increasing as
you go down the list ;-)
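
To make the "rolling buffer" option concrete, here is a minimal sketch of a bounded queue that deletes the oldest unconsumed message once it is full and notifies a callback. It is a toy in plain Python, not a proposal for how the broker would do it; RollingBuffer and on_evict are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Optional


class RollingBuffer:
    """Bounded FIFO that evicts the oldest message when it is full."""

    def __init__(self, max_size: int,
                 on_evict: Optional[Callable[[str], None]] = None) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self._items: Deque[str] = deque()
        self._max_size = max_size
        self._on_evict = on_evict  # e.g. notify a consumer or an operator
        self.evicted = 0           # running count of deleted messages

    def put(self, msg: str) -> None:
        if len(self._items) >= self._max_size:
            oldest = self._items.popleft()  # drop the oldest unconsumed message
            self.evicted += 1
            if self._on_evict is not None:
                self._on_evict(oldest)
        self._items.append(msg)

    def get(self) -> Optional[str]:
        """Return the next message, or None if the buffer is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)
```

The trade-off is explicit here: memory stays bounded, but messages are lost, so this only suits applications where stale messages have little value.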

Yet despite all this, in summary, it is reasonable for users to ask
for a solution that is more 'self healing'.

So - any takers?  Comments, thoughts?

alexis




Re: RabbitMQ memory management

by Matthias Radestock-2


Alexis Richardson wrote:

> * provide a means to tell producers to back off, or alert an operator

The easiest solution I can think of is to

1) configure Erlang's memsup
(http://www.erlang.org/doc/apps/os_mon/index.html) to trigger alarms
when memory consumption gets tight.

This can be done without any code change; in the rabbitmq-server startup
script simply change the "-os_mon start_memsup false" to "true" and
adjust the thresholds with additional options of the form "-memsup
<param> <value>"

When a threshold is reached, a message like this will appear in the
rabbit.log:

=INFO REPORT==== 13-Sep-2008::10:59:37 ===
     alarm_handler: {set,{process_memory_high_watermark,<0.31.0>}}

When the memory usage drops below the threshold again a similar message
is logged.

One can also set up SNMP monitoring, but that is more complicated.
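
For alerting an operator without SNMP, the alarm lines can simply be picked out of rabbit.log. The sketch below is a rough example: the "set" pattern follows the log line shown above, while the exact shape of the "clear" line is an assumption, and active_alarms is a hypothetical helper name.

```python
import re
from typing import Iterable, Set

# Matches e.g. "alarm_handler: {set,{process_memory_high_watermark,<0.31.0>}}".
# The "clear" form, "alarm_handler: {clear,process_memory_high_watermark}",
# is assumed here rather than taken from a real log.
ALARM_RE = re.compile(r"alarm_handler:\s*\{(set|clear),\s*\{?(\w+)")


def active_alarms(log_lines: Iterable[str]) -> Set[str]:
    """Replay set/clear events and return the alarms still raised."""
    active: Set[str] = set()
    for line in log_lines:
        match = ALARM_RE.search(line)
        if match is None:
            continue
        action, alarm = match.groups()
        if action == "set":
            active.add(alarm)
        else:
            active.discard(alarm)
    return active
```

A watchdog could run this over the log periodically and page someone whenever the returned set is non-empty.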


2) get queues to drop messages when memory consumption is above the
thresholds.

This does require some coding, but not very much.

We set up an alarm handler that informs all a node's queues when a "high
memory" alarm is set/cleared. Queues record that information as part of
their state.

When a message is routed to a queue while the alarm is set, and the
queue cannot immediately route the message to an auto-ack consumer - in
other words, the message requires queueing - it discards the message. If
that happens and either the mandatory or immediate flag were set, and
the message could not be routed to any other queues / consumers, then
the message is returned to the sender with basic.return.
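
As a rough sketch of the behaviour just described (a toy model in Python, not the Erlang that would actually be written), each queue records the alarm state and discards anything that would need queueing while the alarm is set. AlarmAwareQueue and publish_under_alarm are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlarmAwareQueue:
    """Toy queue that remembers whether the high-memory alarm is set."""
    auto_ack_consumer_ready: bool = False
    memory_alarm: bool = False
    backlog: List[str] = field(default_factory=list)

    def set_alarm(self, on: bool) -> None:
        # Called by the (hypothetical) alarm handler on set/clear.
        self.memory_alarm = on

    def accept(self, msg: str, immediate: bool = False) -> bool:
        """True if the message was passed on or queued, False otherwise."""
        if self.auto_ack_consumer_ready:
            return True   # goes straight to the consumer, nothing is stored
        if immediate or self.memory_alarm:
            return False  # would require queueing: refuse it
        self.backlog.append(msg)
        return True


def publish_under_alarm(msg: str, queues: List[AlarmAwareQueue],
                        mandatory: bool = False,
                        immediate: bool = False) -> str:
    """Return 'accepted', 'returned' (basic.return) or 'discarded'."""
    # Build the full list so that every queue sees the message.
    results = [q.accept(msg, immediate) for q in queues]
    if any(results):
        return "accepted"
    return "returned" if (mandatory or immediate) else "discarded"
```

Publishers that set neither flag lose messages silently while the alarm is set; those that set mandatory or immediate get the message back and can retry later.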


We can think of other actions to take instead of discarding messages,
but the above is simple and neatly exploits the existing
mandatory/immediate functionality.



Matthias.


Re: RabbitMQ memory management

by Ben Hood

Matthias,

On Sat, Sep 13, 2008 at 11:28 AM, Matthias Radestock
<matthias@...> wrote:

>
> Alexis Richardson wrote:
>
>> * provide a means to tell producers to back off, or alert an operator
>
> The easiest solution I can think of is to
>
> 1) configure Erlang's memsup
> (http://www.erlang.org/doc/apps/os_mon/index.html) to trigger alarms
> when memory consumption gets tight.
>
> This can be done without any code change; in the rabbitmq-server startup
> script simply change the "-os_mon start_memsup false" to "true" and
> adjust the thresholds with additional options of the form "-memsup
> <param> <value>"
>
> When a threshold is reached, a message like this will appear in the
> rabbit.log:
>
> =INFO REPORT==== 13-Sep-2008::10:59:37 ===
>     alarm_handler: {set,{process_memory_high_watermark,<0.31.0>}}
>
> When the memory usage drops below the threshold again a similar message
> is logged.
>
> One can also set up SNMP monitoring, but that is more complicated.
>
>
> 2) get queues to drop messages when memory consumption is above the
> thresholds.
>
> This does require some coding, but not very much.
>
> We set up an alarm handler that informs all a node's queues when a "high
> memory" alarm is set/cleared. Queues record that information as part of
> their state.
>
> When a message is routed to a queue while the alarm is set, and the
> queue cannot immediately route the message to an auto-ack consumer - in
> other words, the message requires queueing - it discards the message. If
> that happens and either the mandatory or immediate flag were set, and
> the message could not be routed to any other queues / consumers, then
> the message is returned to the sender with basic.return.
>
> We can think of other actions to take instead of discarding messages,
> but the above is simple and neatly exploits the existing
> mandatory/immediate functionality.

I think this is a good idea of how to use the management functionality
that comes with OTP. This is one of the reasons why we are using OTP
in the first place.

Ben


Re: RabbitMQ memory management

by Ben Hood

Alexis,

On Sat, Sep 13, 2008 at 10:07 AM, Alexis Richardson
<alexis.richardson@...> wrote:
> Note that case (a) is solved by 2 above.  Add more nodes.  How often would you
> have to add more nodes?  Due to 1, you can work this out based on your message
> size.  For almost all use cases the consumers will have to lag producers by
> several days.  Think about it.  And don't forget you can add more consumers.

Good point.

The main reason why I asked Edwin about his realistic expectations
surrounding volumetrics was to find out what the breaking point was
for a simple OTS Rabbit installation to do a *very* naive reality
check.

So in the absence of better knowledge, I just thought to myself that
an SMS is roughly 160 bytes long (160 chars with an encoding that is
something less than 8 bit/char plus some routing headers) and just
created an infinite loop to publish them. A single instance of Rabbit
got overfed after publishing 2.5 million of these messages (on a
simple pizzabox setup).

So under the assumption that you may also use more than one logical
queue (by way of natural application partitioning), you may be
spreading the total system load over multiple queues that reside in
memory on different nodes.

In the degenerate case that you send 1 million messages per day to a
single instance, you still have a day and a bit to find some way to
drain the queues. Presumably, if no SMSs were getting delivered to
the downstream consumers over the course of a day, somebody would
start to care about the fact that the system wasn't actually doing
something. This person would still have a fair amount of time to find
out what is going on and drain the queues before resource consumption
becomes acute.
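
The estimate above can be written out as a short calculation. The sketch uses only the figures from this message (roughly 160 bytes per SMS, about 2.5 million messages before a single node was overfed, 1 million messages per day). The function names are made up for illustration, and the 2.5 million figure is one observation on one machine, not a general limit.

```python
def days_until_full(capacity_msgs: float, published_per_day: float,
                    consumed_per_day: float = 0.0) -> float:
    """Days before the backlog reaches capacity at a steady net inflow."""
    net_inflow = published_per_day - consumed_per_day
    if net_inflow <= 0:
        return float("inf")  # consumers keep up, the backlog never grows
    return capacity_msgs / net_inflow


def payload_bytes(messages: int, bytes_per_message: int) -> int:
    """Raw payload size only; per-message broker overhead is not included."""
    return messages * bytes_per_message


# Figures quoted in the message above.
CAPACITY = 2_500_000   # messages published before the node was overfed
SMS_SIZE = 160         # approximate bytes per SMS
DAILY_RATE = 1_000_000 # messages per day in the degenerate case

days = days_until_full(CAPACITY, DAILY_RATE)
raw = payload_bytes(CAPACITY, SMS_SIZE)
```

With nothing being consumed, these figures give about 2.5 days before the node fills, and roughly 400 MB of raw payload at that point, before any per-message overhead. Any consumption at all stretches the time further.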

Ben


Re: RabbitMQ memory management

by Edwin Fine-3

Hi Ben,

Actually, we have a bit of a different use case. Our company (my client, as it were) provides SMS aggregation services. Essentially, we provide a unified interface for SMS, MMS, USSD and other types of messages on behalf of our customers, who are content providers. The main value add is that they only have to deal with us and our interface, and not the interfaces of multiple cellular companies. Without disclosing confidential information, I can tell you that the way it works is as follows:
  1. SMS and other kinds of messages (MMS, USSD, and others) sent by the content providers (i.e. MT messages, not from cell phones) arrive at a non-Erlang front end where they are captured into a database.
  2. The messages are forwarded to the appropriate cell phone companies.
  3. At the same time, MO messages are sent from the cell phone companies which are destined for the content providers (our customers). These MO messages go through the Erlang portion of the system.
  4. Status notifications are also sent to the Erlang portion of the system as the messages go through various processing states (e.g. Queued, Acknowledged, Delivered, Failed etc).
  5. The front-end sends these status notifications as fast as the Erlang back-end can take them. The back end then splits the messages into multiple RabbitMQ queues based on content and provider (e.g. SMS for company X goes to one queue, while MMS for company Y will go to another). Each queue has a consumer that delivers the status messages to a content provider's web site/URL. The status notifications are essentially the final step in the process, confirming what the content provider initially sent. It is necessary to split the initial transmission from the status messages like this because delivery could take minutes or hours in some cases, and the sender can't wait around for that long to get a response. So the request is sent in one operation to us, and the actual response is sent in a separate operation back to them.
  6. The rationale behind having queues like this is to avoid a bottleneck where, if there was just one delivery process and a URL choked up, all deliveries would be held back unnecessarily.
  7. If one of the URLs is offline, or incorrectly specified, the associated queue will build up persistent messages until the situation is rectified. This is where the scenario we discussed could come into play. Where the real problem comes in is that sometimes a client will send a batch of a few hundred thousand messages. Ideally, this batch would be queued up by Rabbit and drained as the system is able to process the load. It may even be kept for off-peak times.
  8. I was hoping that I could use Rabbit the way I used to use MQ, which is as a database-backed queue. Now that I understand I cannot, I must make other design decisions. The batches, for example, will have to be stored in files or a database, and trickled into the system at the correct rate. Now here is the second kicker: without any flow control, it is not trivial to figure out what the optimum rate is. Too slow, and the batch does not complete quickly enough. Too fast, and I risk excessively loading the Rabbit node's memory.
  9. Distributing the load across multiple Rabbit nodes may solve an individual node's memory issues, but will place more pressure on the overall system's memory load. This will have cost implications because additional hardware will need to be purchased, and additional complexity added to distribute traffic to, and manage and monitor, the additional nodes. Sure, we have os_mon and SNMP and all that, but it has to be set up and configured, and ultimately someone has to sit and watch that. With more nodes, it just becomes more of an administrative burden, especially if traffic-wise, a single node would do the job just fine, but because of implementation-specific behavior, it will not be good enough without incurring some risk.
  10. The bottom line is, having all the persistent data needing to be in memory is a regrettable situation for the reasons outlined above, a situation which I accept, but the consequences of which I wish you, the developers, to be fully aware; not to feel bad or be beaten up, but simply to know and to use in decision-making processes as appropriate.
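
One way to trickle a stored batch in without guessing a fixed rate (point 8) is to gate publishing on the observed queue depth. The following is a minimal sketch under that assumption, using an in-memory deque as a stand-in for the Rabbit queue; trickle, high_water and drain are hypothetical names, and in a real system drain would be replaced by waiting and re-checking the queue's message count.

```python
from collections import deque
from typing import Callable, Deque, Iterable


def trickle(batch: Iterable[str], queue: Deque[str], high_water: int,
            drain: Callable[[Deque[str]], None]) -> int:
    """Feed batch into queue without exceeding high_water entries.

    drain is called whenever the queue is full and is expected to remove
    at least one message. Returns the peak queue depth observed.
    """
    peak = 0
    for msg in batch:
        while len(queue) >= high_water:
            before = len(queue)
            drain(queue)  # stand-in for "wait until consumers catch up"
            if len(queue) >= before:
                raise RuntimeError("consumer made no progress; stopping")
        queue.append(msg)
        peak = max(peak, len(queue))
    return peak
```

The publisher then runs as fast as the consumers allow and no faster, at the cost of having to poll the queue depth, which is what built-in flow control would otherwise provide.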
Thanks for your time. I hope the above information gives you a better feel for what I am trying to achieve, and perhaps will generate some more useful thoughts about how I can do so using your very excellent product (which I am committed to using, by the way).

Best regards,
Edwin Fine

