deadlock on com.swiftmq.tools.concurrent.Semaphore?

by Leoš Bitto

Hello,

we are using JBoss 4.0.5, Spring Framework 2.5.3 and SwiftMQ 6.2.1. We do not use org.springframework.jms.core.JmsTemplate because of its performance issues. I know that SwiftMQ 7.3 offers com.swiftmq.jms.springsupport.SingleSharedConnectionFactory, but we do not use it. When using XA transactions with large objects (many aggregated messages processed together), this sometimes happens:

WARN  [TransactionImpl] XAException: tx=TransactionImpl:XidImpl[FormatId=257, GlobalId=someserver.somedomain/580977207/153, BranchQual=, localId=153] errorCode=XA_UNKNOWN(0)
javax.transaction.xa.XAException: Request time out (70000) ms!
        at com.swiftmq.jms.v610.XAResourceImpl.end(Unknown Source)
        at org.jboss.tm.TransactionImpl$Resource.endResource(TransactionImpl.java:2143)
        at org.jboss.tm.TransactionImpl$Resource.endResource(TransactionImpl.java:2118)
        at org.jboss.tm.TransactionImpl.endResources(TransactionImpl.java:1462)
        at org.jboss.tm.TransactionImpl.beforePrepare(TransactionImpl.java:1116)
        at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:324)
        at org.jboss.tm.TxManager.commit(TxManager.java:240)
        at org.springframework.transaction.jta.UserTransactionAdapter.commit(UserTransactionAdapter.java:76)
        at org.springframework.transaction.jta.JtaTransactionManager.doCommit(JtaTransactionManager.java:1028)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:709)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:678)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)

This is part of the thread dump taken a few moments before the exception occurred:

"Thread_aoEngine_2" prio=1 tid=0xa6847240 nid=0x1465 in Object.wait() [0xa5357000..0xa5357eb0]
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb3d78f58> (a com.swiftmq.tools.concurrent.Semaphore)
        at java.lang.Object.wait(Object.java:474)
        at com.swiftmq.tools.util.UninterruptableWaiter.doWait(Unknown Source)
        at com.swiftmq.tools.concurrent.Semaphore.waitHere(Unknown Source)
        - locked <0xb3d78f58> (a com.swiftmq.tools.concurrent.Semaphore)
        at com.swiftmq.tools.requestreply.RequestRegistry.request(Unknown Source)
        at com.swiftmq.jms.v610.SessionImpl.requestBlockable(Unknown Source)
        at com.swiftmq.jms.v610.XASessionImpl.request(Unknown Source)
        at com.swiftmq.jms.v610.XAResourceImpl.end(Unknown Source)
        - locked <0xb24decf0> (a com.swiftmq.jms.v610.XAResourceImpl)
        at org.jboss.tm.TransactionImpl$Resource.endResource(TransactionImpl.java:2143)
        at org.jboss.tm.TransactionImpl$Resource.endResource(TransactionImpl.java:2118)
        at org.jboss.tm.TransactionImpl.endResources(TransactionImpl.java:1462)
        at org.jboss.tm.TransactionImpl.beforePrepare(TransactionImpl.java:1116)
        at org.jboss.tm.TransactionImpl.commit(TransactionImpl.java:324)
        at org.jboss.tm.TxManager.commit(TxManager.java:240)
        at org.springframework.transaction.jta.UserTransactionAdapter.commit(UserTransactionAdapter.java:76)
        at org.springframework.transaction.jta.JtaTransactionManager.doCommit(JtaTransactionManager.java:1028)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:709)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:678)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)

When we use smaller objects in the XA transactions, this error has not happened so far. Is there anything we could do to prevent this behaviour? I have no idea when, why, or by which thread notify is supposed to be called on that com.swiftmq.tools.concurrent.Semaphore.

Thanks in advance for any hints.


Leoš Bitto

Re: deadlock on com.swiftmq.tools.concurrent.Semaphore?

by IIT Software

It is not a deadlock. Rather, it seems to be a buffer issue: extending the network buffers is what takes that long and leads to a request timeout.

Look here.

Re: deadlock on com.swiftmq.tools.concurrent.Semaphore?

by Leoš Bitto

Thanks for the suggestion. However, after I configured the SwiftMQ router as suggested, nothing changed. Some XA transactions still cannot be committed, failing with the same exception and thread dump. The messages sent to the JMS queue are about 260 kilobytes long, which is far smaller than the 10 megabytes described in the document you referred to. Additionally, that document states "SwiftMQ can handle the transfer of very large messages (10 MB and more) out of the box with its default configuration", which is not what I am seeing. Most of the XA transactions complete (from start to successful commit) in 5 seconds or less; only some of them time out during the commit. All the transactions do the same kind of work. Looking at the thread dump, it seems like a deadlock; however, without the SwiftMQ source code, it is obviously difficult for me to judge.

Re: deadlock on com.swiftmq.tools.concurrent.Semaphore?

by IIT Software

You wrote that you see this behavior with large transactions only. Messages sent within an XA transaction are stored internally in the client-side XA session (which is bound to an XAResource) and transferred to the router when XAResource.end is called, which disassociates the XAResource from the current Xid. If you send many messages within an XA transaction, this XA-end request can become quite large. With the default network buffer sizes, extending the client output buffer and the router input buffer in the default 64 KB steps takes a long time. This can lead to a request timeout.
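To put the buffer-extension argument into numbers, here is a rough back-of-the-envelope sketch in Java. It assumes that the whole XA-end request has to fit into one buffer, that the buffer grows in fixed 64 KB steps as described above, and that each step copies the current contents into a larger array. The 128 KB initial size is a made-up illustrative value, not SwiftMQ's actual default, and the class and method names are hypothetical.

```java
// Back-of-the-envelope model of fixed-step buffer growth for a large XA-end request.
// ASSUMPTIONS: 128 KB initial size (illustrative), 64 KB growth steps, and a full copy per step.
public class XaEndBufferGrowth {

    static final long KB = 1024;
    static final long INITIAL_SIZE = 128 * KB; // hypothetical initial buffer size
    static final long EXTEND_SIZE = 64 * KB;   // extension step mentioned in the thread

    /** Number of fixed-size extensions needed before the buffer can hold requestSize bytes. */
    static long extensionsNeeded(long requestSize) {
        if (requestSize <= INITIAL_SIZE) {
            return 0;
        }
        long missing = requestSize - INITIAL_SIZE;
        return (missing + EXTEND_SIZE - 1) / EXTEND_SIZE; // ceiling division
    }

    /** Total bytes copied if every extension copies the current buffer into a larger one. */
    static long bytesCopied(long requestSize) {
        long steps = extensionsNeeded(requestSize);
        long total = 0;
        long size = INITIAL_SIZE;
        for (long i = 0; i < steps; i++) {
            total += size;       // copy the existing contents...
            size += EXTEND_SIZE; // ...into a buffer that is one step larger
        }
        return total;
    }

    public static void main(String[] args) {
        long messageSize = 260 * KB; // message size reported in the thread
        for (int messages : new int[] {1, 10, 50}) {
            long request = messages * messageSize; // all messages of the tx travel with XA-end
            System.out.printf("%2d messages: %6d KB request, %4d extensions, %8d KB copied%n",
                    messages, request / KB, extensionsNeeded(request), bytesCopied(request) / KB);
        }
    }
}
```

Under these assumptions the number of extensions grows linearly with the size of the transaction, while the total amount of copying grows roughly quadratically, which is one plausible reason why only the large transactions run into the timeout.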

What you see in the stack traces is a request timeout on XA-end in the upper trace and a normal wait for completion of XA-end in the thread dump below. There is no deadlock. The Semaphore is used to get notified asynchronously when a router-side operation has completed.
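Since the SwiftMQ source is not available here, the following is a generic Java sketch of the request/reply pattern described above, not SwiftMQ's actual implementation; ReplySemaphore, release(), await() and request() are hypothetical names. The requesting thread blocks on a monitor, and the thread that delivers the reply is the one that calls notifyAll(). If no reply arrives in time, for example because it is still being pushed through slowly growing buffers, or because the thread that should answer has died, the waiter simply runs into its timeout. While it waits, a thread dump shows it in Object.wait(), which by itself is not a deadlock.

```java
import java.util.concurrent.TimeUnit;

// Generic request/reply-with-timeout pattern: the requester waits on a monitor and
// whichever thread receives the reply calls notifyAll(). Names are hypothetical, not SwiftMQ's API.
public class RequestReplySketch {

    /** One-shot "reply arrived" flag guarded by its own monitor. */
    static final class ReplySemaphore {
        private boolean released = false;

        /** Called by the thread that processes the reply. */
        synchronized void release() {
            released = true;
            notifyAll();
        }

        /** Called by the requesting thread; returns false if no reply arrived in time. */
        synchronized boolean await(long timeoutMillis) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (!released) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false; // request timeout: nobody called release() in time
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining); // releases the monitor while waiting
            }
            return true;
        }
    }

    /**
     * Simulates one request. The replier thread answers after replyDelayMillis;
     * a negative delay simulates a thread that dies (e.g. after an OOM) without replying.
     */
    static boolean request(long replyDelayMillis, long timeoutMillis) throws InterruptedException {
        ReplySemaphore sem = new ReplySemaphore();
        Thread replier = new Thread(() -> {
            if (replyDelayMillis < 0) {
                return; // no reply is ever sent
            }
            try {
                Thread.sleep(replyDelayMillis); // time needed to process the request
            } catch (InterruptedException e) {
                return;
            }
            sem.release(); // reply received: wake up the waiting requester
        });
        replier.setDaemon(true); // do not keep the JVM alive for late replies
        replier.start();
        return sem.await(timeoutMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fast reply: " + request(50, 1000));
        System.out.println("slow reply: " + request(2000, 200));
        System.out.println("no reply:   " + request(-1, 200));
    }
}
```

In this model a slow reply and a missing reply look the same from the requester's side: both end in a timeout after a plain timed wait.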

However, it would be wrong if XA-end had been called on the same XAResource in both cases. I can't tell from what you posted, because the upper call is the one that timed out.

Another possibility is that the request timeout was caused by an OutOfMemoryError (due to the large transaction) on the router side. In that case a thread dies and no reply is sent, which leads to a request timeout on the client side.

Can you check the router logs for an OOM?

Are you able to reproduce the behavior with a single XA tx?


Re: deadlock on com.swiftmq.tools.concurrent.Semaphore?

by Leos Bitto

FYI: the behaviour I described has gone away since the application was rewritten to use only one connection with multiple sessions. Previously it used a new connection for each session, which apparently does not work well with the JMS implementation by SwiftMQ.

Re: deadlock on com.swiftmq.tools.concurrent.Semaphore?

by IIT Software

Leos Bitto wrote:
... which apparently does not work well with the JMS implementation by SwiftMQ.
We have no open issues here, and if there were any, they would be fixed ASAP.

First, there was no deadlock, so the title of your post is misleading. Next, a request timeout can happen under some circumstances (network buffer extension, OOM). Since you didn't answer my questions above, I assume you are not willing or able to investigate this issue further. On the other hand, if it works with a single connection, you save network buffer space (avoiding OOM) and CPU time, because there is only a single connection whose network buffers need to be extended. That is why you don't see this behavior anymore.
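The saving described above can be illustrated with the same kind of rough model. This Java sketch assumes that every connection owns its own network buffer (per side), that the buffer grows in 64 KB steps until it holds the largest request sent over that connection and is then kept, and that a shared connection writes one request at a time. The 128 KB initial size, the session count and all names are illustrative assumptions, not SwiftMQ internals.

```java
// Rough comparison of total network buffer memory: one connection per session vs. one shared connection.
// ASSUMPTIONS: one buffer per connection (per side), 128 KB initial size (illustrative),
// 64 KB growth steps, buffers are not shrunk, and a shared connection writes one request at a time.
public class ConnectionBufferComparison {

    static final long KB = 1024;
    static final long INITIAL_SIZE = 128 * KB; // hypothetical initial buffer size
    static final long EXTEND_SIZE = 64 * KB;   // extension step mentioned in the thread

    /** Size a buffer ends up with after growing in fixed steps until it can hold the request. */
    static long grownSize(long requestSize) {
        if (requestSize <= INITIAL_SIZE) {
            return INITIAL_SIZE;
        }
        long steps = (requestSize - INITIAL_SIZE + EXTEND_SIZE - 1) / EXTEND_SIZE; // ceiling division
        return INITIAL_SIZE + steps * EXTEND_SIZE;
    }

    /** Each session has its own connection, so each connection grows its own buffer. */
    static long connectionPerSession(int sessions, long largestRequest) {
        return sessions * grownSize(largestRequest);
    }

    /** All sessions share one connection, so only one buffer is grown. */
    static long sharedConnection(long largestRequest) {
        return grownSize(largestRequest);
    }

    public static void main(String[] args) {
        int sessions = 10;        // hypothetical number of concurrent sessions
        long largest = 2600 * KB; // e.g. an XA-end request carrying 10 messages of 260 KB
        System.out.printf("connection per session: %d KB%n", connectionPerSession(sessions, largest) / KB);
        System.out.printf("shared connection:      %d KB%n", sharedConnection(largest) / KB);
    }
}
```

The absolute numbers are only illustrative; the point is that with one connection per session the buffer growth, and its memory and CPU cost, is paid once per connection rather than once overall.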