Socket error disconnect resource leak

Socket error disconnect resource leak

by bwarren

We've got a scenario where our STOMP clients are on a (very bad) cellular connection, so they disconnect and reconnect a lot. StompConnect is built around the assumption that client connections are basically permanent until the service is stopped.

I've found a memory leak in the StompConnect code that shows up when the network connection is interrupted. In TcpTransport's run() method, an error causes stop() to be called. That part is fine, but the thread and socket references are never nulled out, and the TcpTransportServer is never notified that the transport has failed. The server therefore keeps the dead transport in its transport collection, and nothing is ever removed from that collection unless you call stop() on the server object itself.

I was able to fix the problem by adding a callback on the server object that the transport invokes when it catches the exception, telling the server to remove that transport from the collection.
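
The callback approach described above could look roughly like the following. This is a minimal, self-contained sketch, not bwarren's actual patch and not StompConnect's real API: SketchTransport, SketchTransportServer, TransportFailureListener, FrameLoop and onTransportFailed are hypothetical names standing in for TcpTransport and TcpTransportServer, and the frame-reading loop is injected so the example runs without a real socket.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical callback interface (not part of StompConnect): lets a transport
// tell the server that owns it that its connection has died.
interface TransportFailureListener {
    void onTransportFailed(SketchTransport transport);
}

// Hypothetical stand-in for the frame-reading loop inside TcpTransport.run().
interface FrameLoop {
    void readUntilClosed() throws IOException;
}

// Hypothetical stand-in for TcpTransport.
class SketchTransport implements Runnable {
    private final FrameLoop loop;
    private final TransportFailureListener listener;
    private Closeable socket; // e.g. a java.net.Socket
    private Thread thread;

    SketchTransport(Closeable socket, FrameLoop loop, TransportFailureListener listener) {
        this.socket = socket;
        this.loop = loop;
        this.listener = listener;
    }

    synchronized Thread start() {
        Thread t = new Thread(this, "sketch-transport");
        thread = t;
        t.start();
        return t;
    }

    @Override
    public void run() {
        try {
            loop.readUntilClosed();
        } catch (IOException e) {
            stop();
            // The missing step: tell the server so it drops its reference to us.
            listener.onTransportFailed(this);
        }
    }

    synchronized void stop() {
        try {
            if (socket != null) {
                socket.close();
            }
        } catch (IOException ignored) {
            // Closing an already-broken socket may fail; nothing more to do.
        }
        // Null out the references so a stopped transport holds on to nothing.
        socket = null;
        thread = null;
    }
}

// Hypothetical stand-in for TcpTransportServer.
class SketchTransportServer implements TransportFailureListener {
    // Concurrent set, because the callback runs on each transport's own thread.
    private final Set<SketchTransport> transports = ConcurrentHashMap.newKeySet();

    SketchTransport accept(Closeable socket, FrameLoop loop) {
        SketchTransport transport = new SketchTransport(socket, loop, this);
        transports.add(transport);
        return transport;
    }

    @Override
    public void onTransportFailed(SketchTransport transport) {
        transports.remove(transport);
    }

    int activeCount() {
        return transports.size();
    }
}
```

If the reading loop can also return normally when a client disconnects cleanly, the same stop-and-notify cleanup probably belongs in a finally block rather than only in the catch.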

Re: Socket error disconnect resource leak

by Roger Hoover

It may be useful to post this to the Apache ActiveMQ mailing list: users@...

On Mon, May 18, 2009 at 12:30 PM, bwarren <brad.warren@...> wrote the message above (sent from the stomp - user mailing list archive at Nabble.com).

Re: Socket error disconnect resource leak

by wbustraan

How do we get this incorporated into a StompConnect release? I'm experiencing an issue related to this.

In the process of testing a STOMP connector, I wrote an application that connects via STOMP, sends a batch of messages, receives those same messages, and then compares the checksum of each original message with that of the received copy.
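
The test application itself isn't posted, but the comparison step might look something like this minimal sketch. ChecksumCheck and missing are hypothetical names, CRC32 is an assumed choice of checksum, and the sketch assumes every message body is unique (for example, it carries a sequence number) so that identical bodies can't mask a missing message.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.CRC32;

final class ChecksumCheck {
    // CRC32 of the message body, encoded as UTF-8.
    static long checksum(String body) {
        CRC32 crc = new CRC32();
        crc.update(body.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Indices of sent messages whose checksum never appears among the received ones.
    static List<Integer> missing(List<String> sent, List<String> received) {
        Set<Long> seen = new HashSet<>();
        for (String body : received) {
            seen.add(checksum(body));
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < sent.size(); i++) {
            if (!seen.contains(checksum(sent.get(i)))) {
                result.add(i);
            }
        }
        return result;
    }
}
```

Reporting indices rather than a simple pass/fail makes a pattern like "the first few messages are missing" show up directly.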

During testing, I noticed that about 50% of the time the application would miss the first 3-6 messages, and at first I thought the issue might be with my client. However, in the log I was getting exceptions like this:

Failed to process message due to: java.net.SocketException: Software caused connection abort: socket write error. Message: 

java.net.SocketException: Software caused connection abort: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
	at org.codehaus.stomp.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:106)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at org.codehaus.stomp.tcp.TcpTransport.onStompFrame(TcpTransport.java:105)
	at org.codehaus.stomp.jms.ProtocolConverter.sendToStomp(ProtocolConverter.java:467)
	at org.codehaus.stomp.jms.StompSession.sendToStomp(StompSession.java:83)
	at org.codehaus.stomp.jms.StompSubscription.onMessage(StompSubscription.java:92)
	at com.ibm.mq.jms.MQMessageConsumer.receiveAsync(MQMessageConsumer.java:3020)
	at com.ibm.mq.jms.SessionAsyncHelper.run(SessionAsyncHelper.java:412)
	at java.lang.Thread.run(Thread.java:619)

This was very confusing because the exceptions were happening during the sending phase of the app, not the receiving phase, yet they show StompConnect trying to write a message to a socket.

What appears to be happening is this: when I run the client the first time, StompConnect registers a subscription for the queue and starts sending messages to the client over the connected socket. But when the client exits, StompConnect doesn't close and clean up the socket. So the next time I run the client and send more messages to the queue, StompConnect pulls the first few messages off the queue and attempts to send them out the old, disconnected socket, which fails with the "socket write error" above.

Unfortunately, those failed messages disappear from the queue permanently, so they never get delivered to the second instance of the client. After a few messages, StompConnect figures out that the old socket is defunct and starts delivering messages to the new subscription.
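
To make the failure mode easier to see, here is a toy model; it is not StompConnect or JMS provider code. It assumes messages on a queue are handed round-robin to the consumers attached to it, and that a message is lost once it has been handed to a consumer whose socket write fails, which matches what's described above. ToyDispatcher, FrameSink and removeOnFailure are hypothetical names. With removeOnFailure set to false, the stale subscription keeps taking every other message; with it set to true (an assumed cleanup-on-failure, in the spirit of the callback fix discussed earlier), only the first message sent to the dead socket is lost.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model only: a "sink" is whatever writes a frame to a client's socket.
interface FrameSink {
    void write(String frame) throws IOException;
}

// Toy dispatcher: hands each queued message to one subscriber, round-robin.
final class ToyDispatcher {
    private final List<FrameSink> subscribers = new ArrayList<>();
    private final boolean removeOnFailure;
    private int next = 0;
    final List<String> lost = new ArrayList<>();

    ToyDispatcher(boolean removeOnFailure) {
        this.removeOnFailure = removeOnFailure;
    }

    void subscribe(FrameSink sink) {
        subscribers.add(sink);
    }

    void dispatch(String message) {
        if (subscribers.isEmpty()) {
            throw new IllegalStateException("no subscribers");
        }
        int index = next % subscribers.size();
        try {
            subscribers.get(index).write(message);
            next = index + 1;
        } catch (IOException e) {
            // The message has already left the queue, so in this model it is gone.
            lost.add(message);
            if (removeOnFailure) {
                // Drop the dead subscriber; the next one shifts into this index.
                subscribers.remove(index);
                next = index;
            } else {
                next = index + 1;
            }
        }
    }
}
```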
