
swiftmq HA hangs after switching to standby instance

by Yu L


Hi,

This is SwiftMQ 7.4.0 HA running on two Solaris 9 boxes (192.168.1.212 and 192.168.1.213, referred to below as 212 and 213). One is the primary, the other is the backup. I started the primary SwiftMQ first, then the backup. The first became "Active/Active" and the second "Standby/Standby". The primary was running on the 212 box, as were all JMS clients. The backup SwiftMQ was running on the 213 box, with no JMS clients on it.

Then I shut down the primary SwiftMQ with "stop.sh" on 212 and saw the backup on 213 change its state to "standalone/standalone". My JMS clients on 212 kept working because their provider URL was set up to point to both instances. This is from the client log:

 "2009-04-08 11:26:01,156 DEBUG [main] (MDriverSwiftMQ.java:66) - providerUrl = smqp://admin:resolve@192.168.1.212:4004/host2=192.168.1.213;port2=4004;type=com.swiftmq.net.JSSESocketFactory;reconnect=true;retrydelay=5000;maxretries=720;keepalive=5000;timeout=5000;"

Then I restarted the primary SwiftMQ on 212 and saw all TCP connections switch back to port 4004 on 212, which is the primary SwiftMQ. My JMS clients on 212 were still responding and working. So far so good.

Then I repeated this process a couple of times: kill the primary, wait for the backup SwiftMQ to take over, test my JMS clients, restart the primary...

After about 3 or 4 repetitions, sometimes more, sometimes fewer, all my JMS clients would hang and stop responding while the primary SwiftMQ was down and the backup was in "standalone/standalone" mode. Even if I restarted my JMS client application, it would still hang and stop responding. It seems to hang while the JMS client is doing a JNDI lookup against SwiftMQ. Here are the last log messages from my JMS client when I tried to restart it:

2009-04-08 19:48:23,842 DEBUG [main] (MServer.java:45) - JMS platform: SWIFTMQ
2009-04-08 19:48:23,844  INFO [main] (MServer.java:86) - Initializing JMS service
2009-04-08 19:48:23,854 DEBUG [main] (MDriverSwiftMQ.java:66) - providerUrl = smqp://admin:resolve@localhost:4004/host2=192.168.1.213;port2=4004;type=com.swiftmq.net.JSSESocketFactory;reconnect=true;retrydelay=5000;maxretries=720;keepalive=5000;timeout=5000;
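One way to confirm that the client really is blocked inside the JNDI lookup, rather than somewhere after it, is to run the suspect call on a separate thread with a timeout and print that thread's stack if the timeout expires. The following is a minimal sketch using only the JDK; the class name `HangProbe` and the method `callWithTimeout` are hypothetical helpers for illustration and are not part of SwiftMQ or of the client code shown in the log.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical diagnostic helper, not part of SwiftMQ.
class HangProbe {

    /**
     * Runs the given call on a daemon worker thread and waits at most
     * timeoutMillis for it. On timeout, prints the worker's current stack
     * to stderr (showing where it is blocked) and rethrows TimeoutException.
     */
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis) throws Exception {
        AtomicReference<Thread> worker = new AtomicReference<>();
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hang-probe");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            worker.set(t);
            return t;
        });
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            Thread t = worker.get();
            if (t != null) {
                System.err.println("Call still blocked after " + timeoutMillis + " ms:");
                for (StackTraceElement frame : t.getStackTrace()) {
                    System.err.println("    at " + frame);
                }
            }
            future.cancel(true); // best effort; blocking socket reads may ignore interrupts
            throw e;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In the client, the `Callable` would wrap whatever lookup is performed at startup, for example a lambda that calls `lookup` on the `javax.naming.InitialContext` the application already creates. If the printed stack ends in a socket read, the client has connected and is waiting for a reply that never arrives. A `kill -QUIT` thread dump of the client JVM gives the same information without any code change.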

So basically, when the backup SwiftMQ instance is running in standalone mode after the primary has died, the JNDI lookup and other JMS calls on the client side sometimes simply hang. But after I restart the backup SwiftMQ instance, without starting the primary, all my JMS clients recover and are able to connect to the backup instance running in standalone mode.
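When the clients hang, it may also help to check whether the backup's listener on port 4004 still accepts new TCP connections at all. The sketch below uses only `java.net` from the JDK; `PortProbe` and `accepts` are hypothetical names chosen for illustration. A successful connect only shows that the listener accepts connections. It does not perform the SSL handshake or speak the SwiftMQ protocol, so it cannot prove the router is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper, not part of SwiftMQ.
class PortProbe {

    /** Returns true if a plain TCP connection to host:port succeeds within timeoutMillis. */
    static boolean accepts(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, unreachable host, or connect timeout.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hosts and port taken from the provider URL in the log above.
        System.out.println("primary 212: " + accepts("192.168.1.212", 4004, 5000));
        System.out.println("backup  213: " + accepts("192.168.1.213", 4004, 5000));
    }
}
```

If the backup reports `true` while the clients hang, the listener is up and the problem is more likely in what happens after the connection is accepted. If it reports `false`, the listener itself has stopped accepting connections.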

Attached are the log files of the primary and backup SwiftMQ instances, as well as their routerconfig.xml files.

Any suggestions about this problem? Thanks very much.

Yu

Attachments: info.log, routerconfig.xml, error.log, warning.log, info.log, routerconfig.xml, error.log, warning.log



