Performance issues with 7.4.0 Cluster

Problem:
After ~10 days of constant load (~20 million messages per day), the producer throughput of the HA cluster drops to 150-200 messages per second, which is far below our regular load. We have noticed that the CPU load of the Node2 (passive) VM is then at 20% (see node2_cpu_slow.png) and the allocated memory is at 2 GB (see node2_vm_slow.png). After a restart of the Standby SMQ on Node2 we see an initial load of 4-5%, even if we load the queue with ~800 messages per second. In progress.xls you can see the average throughput right before and right after the restart.

Setup/Background:
Our regular throughput is 400 messages per second. We have one single-threaded producer which can load a clean HA-SMQ at ~1,400 messages per second. The consumer service is a multi-threaded Spring application which uses springsupport.jar. We use an Active-Standby file-replicated 7.4.0 HA cluster of two similar nodes, each with a quad-core Xeon X5450 @ 3.00 GHz, 4 GB RAM and a 256 MB RAID controller with two 146 GB SAS disks. The SMQ store is on an ext2 filesystem. Both nodes are connected with a crossover network cable on a 1 Gbit full-duplex interface. Both VMs can access up to 2.5 GB of memory. (Node1 = Active SMQ, Node2 = Standby SMQ)

Thanks for your help,
Michael

Some JVM screenshots:
- Fast processing - CPU Passive Standby
- Fast processing - VM Passive Standby
- Slow processing - CPU Passive Standby
- Slow processing - VM Passive Standby
|
|
Re: Performance issues with 7.4.0 Cluster

In the VM Summary of the slow processing I only see 668 MB of used heap. Although the graph shows peaks up to 2 GB, the memory is released thereafter, so it's not a memory leak. I also see 8,096 total threads started in this VM summary (44 are live). I don't know why the started threads are stopped at all, because they are pooled and should be reused.

If you want, you can take a thread dump (kill -3) when the standby consumes memory and CPU again, to see which threads are working so hard. If we don't find out what the problem is, I suggest you submit a Gold incident here so that we can dive into it.
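The live and total-started thread counts mentioned above, and the kind of information a kill -3 dump prints, can also be read from inside a JVM through the standard java.lang.management API. The following is only a minimal sketch: the class name ThreadReport and its methods are hypothetical, and it inspects the JVM it runs in. Pointing it at the Standby router would additionally require remote JMX access, which is assumed here and not something the thread confirms.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of SwiftMQ: reports thread statistics of the current JVM.
class ThreadReport {

    /** Number of threads currently alive in this JVM. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Number of threads started since JVM start; a large gap to liveThreads() means thread churn. */
    static long totalStarted() {
        return ManagementFactory.getThreadMXBean().getTotalStartedThreadCount();
    }

    /** Builds a plain-text dump of all live threads with their state and stack frames. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip lock/monitor details, which not every JVM supports
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("live=" + liveThreads() + " totalStarted=" + totalStarted());
        System.out.print(dump());
    }
}
```

Comparing totalStarted() with liveThreads() over time would show whether pooled threads keep being stopped and recreated, which is the oddity noted in the VM summary.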
|
|
Re: Performance issues with 7.4.0 Cluster

Today we had the issue again:
- Standby HA Node shows 1.3 GB in old gen and we got an OOM:

INFO | jvm 1 | 2009/07/24 13:07:17 | java.lang.OutOfMemoryError: Java heap space
INFO | jvm 1 | 2009/07/24 13:07:17 | Got OutOfMemoryError:
INFO | jvm 1 | 2009/07/24 13:07:17 | ThreadGroup: hacontroller.stagecontroller
INFO | jvm 1 | 2009/07/24 13:07:17 | ActiveTask : PipelineQueue, dispatchToken=sys$hacontroller.stagecontroller
INFO | jvm 1 | 2009/07/24 13:07:17 | Stack Trace:
INFO | jvm 1 | 2009/07/24 13:07:17 | java.lang.OutOfMemoryError: Java heap space
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.tools.collection.IntRingBuffer.add(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.store.standard.cache.StableStore.a(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.store.standard.cache.StableStore.free(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.store.standard_ha.v600.StandbyVisitor.visit(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.store.standard_ha.v600.protocol.PageDBFreeRequest.accept(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.store.standard_ha.v600.StandbyVisitor.visit(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.store.standard_ha.v600.protocol.TransactionEndRequest.accept(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.store.standard_ha.v600.SinkProxyImpl.newReplicationItem(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.hacontroller.standard.v600.stage.StandbyStage.visit(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.hacontroller.standard.v600.smqpha.UpdateDeliveryRequest.accept(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.hacontroller.standard.v600.stage.StandbyStage.process(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.hacontroller.standard.stage.StageController.visit(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.hacontroller.standard.stage.po.PORequestReceived.accept(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.tools.pipeline.PipelineQueue.process(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.tools.queue.SingleProcessorQueue.dequeue(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.tools.pipeline.PipelineQueue$QueueProcessor.run(Unknown Source)
INFO | jvm 1 | 2009/07/24 13:07:17 | at com.swiftmq.impl.threadpool.standard.PoolThread.run(Unknown Source)

At the same time, producers can no longer deliver any messages to the active node either, which is quite unexpected behavior for a cluster/HA solution. Restarting the Standby that had the OOM also resolves the issue on the Active node.

Today I analysed an older heap dump which was generated on the same host/cluster with the same issue, and it seems there is a bug; please see the attachment: SmwiftMQ_74_HA_HeapDump_Standby.pdf

If needed/helpful, I can send the compressed heap dump (138 MB) by FTP.

[Screenshot from a different heap analyser]

Thanks,
Michael
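Since the problem takes days to build up, it may help to log heap pool usage periodically instead of waiting for the OutOfMemoryError. Below is a minimal sketch using the standard java.lang.management API. The class name HeapPools and its methods are hypothetical, pool names (for example the old generation) differ between JVMs and garbage collectors, and the code reports on the JVM it runs in, so monitoring the Standby router this way would assume remote JMX access.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.Locale;

// Hypothetical helper, not part of SwiftMQ: summarizes heap memory pools of the current JVM.
class HeapPools {

    /** Percentage of max that is used, or -1 when the pool reports no maximum. */
    static double percentUsed(long used, long max) {
        return max > 0 ? 100.0 * used / max : -1.0;
    }

    /** One line per heap pool: name, used megabytes and fill level where a max is defined. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long usedMb = usage.getUsed() / (1024 * 1024);
            double pct = percentUsed(usage.getUsed(), usage.getMax());
            String fill = pct < 0
                    ? "max undefined"
                    : String.format(Locale.ROOT, "%.1f%% of max", pct);
            sb.append(String.format(Locale.ROOT, "%s: used=%d MB (%s)%n",
                    pool.getName(), usedMb, fill));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

A steadily rising fill level of the old generation pool across several days, without dropping back after collections, would match the behaviour described in this post.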
|
|
Re: Performance issues with 7.4.0 Cluster

It seems there is a leak in the free page list of the page.db at the STANDBY. We will try to reproduce it to figure out whether it is a bug or a misconfiguration.
|
|
|
Re: Performance issues with 7.4.0 Cluster

We've stress tested SwiftMQ HA 7.4.0 but have not been able to reproduce it so far.

Do you use the "shrink" job on your ACTIVE instance? If yes, on which schedule do you run it?
|
|
Re: Performance issues with 7.4.0 Cluster

The config is mostly default; we just configured our queues. There is no job scheduled at all, please see screenshot. Please note that, as mentioned earlier, it takes a number of days (8-20) until the issue happens.

I would also like to note that it doesn't matter which of the two nodes/servers is active/standby; it happened in both cases. I can send you routerconfig.xml and/or the heap dump if you need them for further analysis.

Best regards,
Michael
|
|
Re: Performance issues with 7.4.0 Cluster

Just to complete this thread:
Fixed in 7.5.3, released today: http://www.swiftmq.com/products/releasenotes/v753/index.html

Thanks,
Michael