CMS collection keep working during holiday

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 - 3 | Next >

CMS collection keep working during holiday

by Ken--@newsgroupstats.hk :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

01/10 is our holiday and my stock quote application was in leisure time on 01/10.

I am surprised that during non-busy hours, my application keep draw around 15-20% cpu loading due to continueously CMS collections.

It seems that -XX:CMSInitiatingOccupancyFraction=70 will not take effect during non-busy hours. Please find the attached memory chart from jconsole and gc log.

jvm options:
%JAVA_HOME%\jre\bin\java -server -verbose:gc -XX:+PrintGCDetails -Xloggc:powerserver_gc.log -showversion -Djava.security.policy=.\java.policy -Xms4G -Xmx4G -XX:NewSize=1G -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=70 -XX:TargetSurvivorRatio=90 -Dcom.sun.management.jmxremote.port=18002 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

Any idea? Please advise.






62446.219: [GC 62446.219: [ParNew: 873856K->174720K(873856K), 0.2927771 secs] 2109161K->1491505K(4019584K), 0.2928679 secs] [Times: user=1.64 sys=0.02, real=0.30 secs]
63537.932: [GC 63537.932: [ParNew: 873856K->174720K(873856K), 0.2936245 secs] 2190641K->1573411K(4019584K), 0.2937111 secs] [Times: user=1.66 sys=0.03, real=0.31 secs]
64576.245: [GC 64576.245: [ParNew: 873856K->174720K(873856K), 0.2805543 secs] 2272547K->1652492K(4019584K), 0.2806343 secs] [Times: user=1.52 sys=0.05, real=0.28 secs]
65665.444: [GC 65665.444: [ParNew: 873856K->174720K(873856K), 0.3126828 secs] 2351628K->1733865K(4019584K), 0.3127696 secs] [Times: user=1.61 sys=0.08, real=0.31 secs]
66761.009: [GC 66761.009: [ParNew: 873856K->174720K(873856K), 0.3386468 secs] 2433001K->1815987K(4019584K), 0.3387269 secs] [Times: user=1.64 sys=0.17, real=0.36 secs]
67839.979: [GC 67839.979: [ParNew: 873856K->174720K(873856K), 0.3359659 secs] 2515123K->1897775K(4019584K), 0.3360704 secs] [Times: user=1.72 sys=0.09, real=0.34 secs]
68936.295: [GC 68936.295: [ParNew: 873856K->174720K(873856K), 0.3439460 secs] 2596911K->1979557K(4019584K), 0.3440326 secs] [Times: user=1.58 sys=0.19, real=0.34 secs]
70019.873: [GC 70019.873: [ParNew: 873856K->174720K(873856K), 0.3758143 secs] 2678693K->2065201K(4019584K), 0.3759026 secs] [Times: user=1.81 sys=0.13, real=0.38 secs]
71123.565: [GC 71123.566: [ParNew: 873856K->174719K(873856K), 0.3515356 secs] 2764337K->2147170K(4019584K), 0.3516129 secs] [Times: user=1.47 sys=0.11, real=0.34 secs]
72227.180: [GC 72227.180: [ParNew: 873855K->174720K(873856K), 0.3775019 secs] 2846306K->2230692K(4019584K), 0.3775869 secs] [Times: user=1.64 sys=0.19, real=0.38 secs]
73334.657: [GC 73334.657: [ParNew: 873856K->174719K(873856K), 0.3546221 secs] 2929828K->2313182K(4019584K), 0.3547042 secs] [Times: user=1.61 sys=0.27, real=0.36 secs]
74434.184: [GC 74434.184: [ParNew: 873855K->174720K(873856K), 0.3884261 secs] 3012318K->2395247K(4019584K), 0.3885106 secs] [Times: user=1.63 sys=0.20, real=0.39 secs]
74434.573: [GC [1 CMS-initial-mark: 2220527K(3145728K)] 2396315K(4019584K), 0.2436187 secs] [Times: user=0.25 sys=0.00, real=0.23 secs]
74434.817: [CMS-concurrent-mark-start]
74437.338: [CMS-concurrent-mark: 2.521/2.521 secs] [Times: user=5.06 sys=0.16, real=2.53 secs]
74437.338: [CMS-concurrent-preclean-start]
74437.369: [CMS-concurrent-preclean: 0.031/0.031 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
74437.374: [GC[YG occupancy: 176540 K (873856 K)]74437.374: [Rescan (parallel) , 0.1324063 secs]74437.507: [weak refs processing, 0.1494715 secs] [1 CMS-remark: 2220527K(3145728K)] 2397068K(4019584K), 0.3048030 secs] [Times: user=0.42 sys=0.00, real=0.31 secs]
74437.694: [CMS-concurrent-sweep-start]
74438.946: [CMS-concurrent-sweep: 1.248/1.252 secs] [Times: user=2.19 sys=0.05, real=1.25 secs]
74438.946: [CMS-concurrent-reset-start]
74438.961: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
75525.633: [GC 75525.633: [ParNew: 873856K->174719K(873856K), 0.3452037 secs] 2940674K->2323239K(4019584K), 0.3453060 secs] [Times: user=1.73 sys=0.11, real=0.34 secs]
76628.997: [GC 76628.997: [ParNew: 873855K->174720K(873856K), 0.3680628 secs] 3022375K->2405373K(4019584K), 0.3681462 secs] [Times: user=1.88 sys=0.13, real=0.38 secs]
76629.366: [GC [1 CMS-initial-mark: 2230653K(3145728K)] 2407331K(4019584K), 0.2614019 secs] [Times: user=0.27 sys=0.00, real=0.27 secs]
76629.627: [CMS-concurrent-mark-start]
76632.373: [CMS-concurrent-mark: 2.746/2.746 secs] [Times: user=5.22 sys=0.34, real=2.75 secs]
76632.373: [CMS-concurrent-preclean-start]
76632.383: [CMS-concurrent-preclean: 0.010/0.010 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
76632.391: [GC[YG occupancy: 176683 K (873856 K)]76632.391: [Rescan (parallel) , 0.1555281 secs]76632.546: [weak refs processing, 0.0003456 secs] [1 CMS-remark: 2230653K(3145728K)] 2407337K(4019584K), 0.1559517 secs] [Times: user=0.36 sys=0.00, real=0.16 secs]
76632.562: [CMS-concurrent-sweep-start]
76633.784: [CMS-concurrent-sweep: 1.222/1.222 secs] [Times: user=1.31 sys=0.00, real=1.22 secs]
76633.784: [CMS-concurrent-reset-start]
76633.799: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
77722.107: [GC 77722.107: [ParNew: 873856K->174719K(873856K), 0.3346014 secs] 3005732K->2390264K(4019584K), 0.3346905 secs] [Times: user=1.61 sys=0.13, real=0.33 secs]
77722.442: [GC [1 CMS-initial-mark: 2215544K(3145728K)] 2391202K(4019584K), 0.2670180 secs] [Times: user=0.28 sys=0.00, real=0.28 secs]
77722.710: [CMS-concurrent-mark-start]
77725.468: [CMS-concurrent-mark: 2.758/2.758 secs] [Times: user=5.66 sys=0.11, real=2.77 secs]
77725.468: [CMS-concurrent-preclean-start]
77725.478: [CMS-concurrent-preclean: 0.010/0.011 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
77725.478: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 77730.692: [CMS-concurrent-abortable-preclean: 0.340/5.213 secs] [Times: user=3.24 sys=0.00, real=5.22 secs]
77730.699: [GC[YG occupancy: 181794 K (873856 K)]77730.700: [Rescan (parallel) , 0.1554305 secs]77730.855: [weak refs processing, 0.0002146 secs] [1 CMS-remark: 2215544K(3145728K)] 2397339K(4019584K), 0.1557162 secs] [Times: user=0.27 sys=0.00, real=0.16 secs]
77730.870: [CMS-concurrent-sweep-start]
77732.117: [CMS-concurrent-sweep: 1.247/1.247 secs] [Times: user=1.26 sys=0.00, real=1.24 secs]
77732.117: [CMS-concurrent-reset-start]
77732.132: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
77734.136: [GC [1 CMS-initial-mark: 2208798K(3145728K)] 2392348K(4019584K), 0.2699426 secs] [Times: user=0.26 sys=0.00, real=0.27 secs]
77734.406: [CMS-concurrent-mark-start]
77737.261: [CMS-concurrent-mark: 2.851/2.856 secs] [Times: user=5.27 sys=0.34, real=2.86 secs]
77737.261: [CMS-concurrent-preclean-start]
77737.272: [CMS-concurrent-preclean: 0.010/0.010 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
77737.272: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 77742.481: [CMS-concurrent-abortable-preclean: 0.339/5.209 secs] [Times: user=3.30 sys=0.00, real=5.22 secs]
77742.487: [GC[YG occupancy: 191586 K (873856 K)]77742.487: [Rescan (parallel) , 0.1631453 secs]77742.651: [weak refs processing, 0.0001524 secs] [1 CMS-remark: 2208798K(3145728K)] 2400385K(4019584K), 0.1633895 secs] [Times: user=0.28 sys=0.00, real=0.17 secs]
77742.666: [CMS-concurrent-sweep-start]
77743.864: [CMS-concurrent-sweep: 1.198/1.198 secs] [Times: user=1.22 sys=0.00, real=1.20 secs]
77743.864: [CMS-concurrent-reset-start]
77743.879: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
77745.883: [GC [1 CMS-initial-mark: 2203641K(3145728K)] 2398172K(4019584K), 0.2750780 secs] [Times: user=0.25 sys=0.00, real=0.27 secs]
77746.159: [CMS-concurrent-mark-start]
77748.880: [CMS-concurrent-mark: 2.721/2.721 secs] [Times: user=5.36 sys=0.09, real=2.73 secs]
77748.880: [CMS-concurrent-preclean-start]
77748.890: [CMS-concurrent-preclean: 0.010/0.010 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
77748.890: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 77754.112: [CMS-concurrent-abortable-preclean: 0.467/5.222 secs] [Times: user=3.33 sys=0.00, real=5.23 secs]
77754.118: [GC[YG occupancy: 197250 K (873856 K)]77754.118: [Rescan (parallel) , 0.1562518 secs]77754.274: [weak refs processing, 0.0001109 secs] [1 CMS-remark: 2203641K(3145728K)] 2400891K(4019584K), 0.1564752 secs] [Times: user=0.36 sys=0.00, real=0.17 secs]
77754.290: [CMS-concurrent-sweep-start]
77755.425: [CMS-concurrent-sweep: 1.135/1.135 secs] [Times: user=1.14 sys=0.00, real=1.14 secs]
77755.425: [CMS-concurrent-reset-start]
77755.441: [CMS-concurrent-reset: 0.016/0.016 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
77757.445: [GC [1 CMS-initial-mark: 2203641K(3145728K)] 2405052K(4019584K), 0.2801865 secs] [Times: user=0.26 sys=0.00, real=0.28 secs]
77757.725: [CMS-concurrent-mark-start]
77760.493: [CMS-concurrent-mark: 2.768/2.768 secs] [Times: user=5.52 sys=0.11, real=2.80 secs]
77760.493: [CMS-concurrent-preclean-start]
77760.504: [CMS-concurrent-preclean: 0.010/0.010 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
77760.504: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 77765.719: [CMS-concurrent-abortable-preclean: 0.339/5.216 secs] [Times: user=3.23 sys=0.03, real=5.22 secs]
77765.725: [GC[YG occupancy: 203843 K (873856 K)]77765.725: [Rescan (parallel) , 0.1573253 secs]77765.883: [weak refs processing, 0.0000936 secs] [1 CMS-remark: 2203641K(3145728K)] 2407484K(4019584K), 0.1575000 secs] [Times: user=0.38 sys=0.00, real=0.17 secs]
77765.898: [CMS-concurrent-sweep-start]
77767.132: [CMS-concurrent-sweep: 1.234/1.234 secs] [Times: user=1.25 sys=0.00, real=1.23 secs]
77767.132: [CMS-concurrent-reset-start]
77767.147: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
77769.151: [GC [1 CMS-initial-mark: 2203641K(3145728K)] 2409821K(4019584K), 0.2889787 secs] [Times: user=0.26 sys=0.00, real=0.28 secs]
77769.440: [CMS-concurrent-mark-start]
77772.168: [CMS-concurrent-mark: 2.728/2.728 secs] [Times: user=5.36 sys=0.13, real=2.75 secs]
77772.168: [CMS-concurrent-preclean-start]
77772.178: [CMS-concurrent-preclean: 0.010/0.010 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
77772.178: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 77777.294: [CMS-concurrent-abortable-preclean: 0.344/5.116 secs] [Times: user=3.39 sys=0.00, real=5.13 secs]
77777.302: [GC[YG occupancy: 218737 K (873856 K)]77777.302: [Rescan (parallel) , 0.1573837 secs]77777.460: [weak refs processing, 0.0000945 secs] [1 CMS-remark: 2203641K(3145728K)] 2422379K(4019584K), 0.1575605 secs] [Times: user=0.39 sys=0.00, real=0.17 secs]
77777.475: [CMS-concurrent-sweep-start]
77778.609: [CMS-concurrent-sweep: 1.134/1.134 secs] [Times: user=1.14 sys=0.00, real=1.14 secs]
77778.609: [CMS-concurrent-reset-start]
77778.625: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
77780.628: [GC [1 CMS-initial-mark: 2203641K(3145728K)] 2424717K(4019584K), 0.2928494 secs] [Times: user=0.28 sys=0.00, real=0.30 secs]
77780.921: [CMS-concurrent-mark-start]
77783.638: [CMS-concurrent-mark: 2.717/2.717 secs] [Times: user=5.33 sys=0.13, real=2.73 secs]
77783.638: [CMS-concurrent-preclean-start]
77783.654: [CMS-concurrent-preclean: 0.016/0.016 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
77783.654: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 77788.832: [CMS-concurrent-abortable-preclean: 0.386/5.178 secs] [Times: user=3.36 sys=0.00, real=5.19 secs]
77788.840: [GC[YG occupancy: 226606 K (873856 K)]77788.840: [Rescan (parallel) , 0.1651500 secs]77789.005: [weak refs processing, 0.0000958 secs] [1 CMS-remark: 2203641K(3145728K)] 2430247K(4019584K), 0.1653619 secs] [Times: user=0.30 sys=0.00, real=0.17 secs]
77789.021: [CMS-concurrent-sweep-start]
77790.051: [CMS-concurrent-sweep: 1.031/1.031 secs] [Times: user=1.05 sys=0.00, real=1.03 secs]
77790.051: [CMS-concurrent-reset-start]
77790.067: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
77792.070: [GC [1 CMS-initial-mark: 2203641K(3145728K)] 2430289K(4019584K), 0.2979185 secs] [Times: user=0.28 sys=0.00, real=0.30 secs]
77792.368: [CMS-concurrent-mark-start]
77795.085: [CMS-concurrent-mark: 2.717/2.717 secs] [Times: user=5.34 sys=0.20, real=2.72 secs]
77795.085: [CMS-concurrent-preclean-start]
77795.095: [CMS-concurrent-preclean: 0.010/0.010 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
77795.095: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 77800.269: [CMS-concurrent-abortable-preclean: 0.484/5.174 secs] [Times: user=3.28 sys=0.00, real=5.17 secs]
77800.275: [GC[YG occupancy: 233897 K (873856 K)]77800.275: [Rescan (parallel) , 0.1660870 secs]77800.442: [weak refs processing, 0.0001040 secs] [1 CMS-remark: 2203641K(3145728K)] 2437538K(4019584K), 0.1662851 secs] [Times: user=0.31 sys=0.00, real=0.17 secs]
77800.457: [CMS-concurrent-sweep-start]
77801.459: [CMS-concurrent-sweep: 1.002/1.002 secs] [Times: user=1.06 sys=0.00, real=1.00 secs]
77801.459: [CMS-concurrent-reset-start]
77801.474: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
77803.477: [GC [1 CMS-initial-mark: 2203641K(3145728K)] 2438755K(4019584K), 0.3026378 secs] [Times: user=0.30 sys=0.00, real=0.30 secs]
77803.780: [CMS-concurrent-mark-start]
77806.515: [CMS-concurrent-mark: 2.735/2.735 secs] [Times: user=5.41 sys=0.11, real=2.73 secs]
77806.515: [CMS-concurrent-preclean-start]
77806.525: [CMS-concurrent-preclean: 0.010/0.010 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
77806.525: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 77811.723: [CMS-concurrent-abortable-preclean: 0.374/5.197 secs] [Times: user=3.33 sys=0.00, real=5.22 secs]
77811.726: [GC[YG occupancy: 237551 K (873856 K)]77811.726: [Rescan (parallel) , 0.1518526 secs]77811.878: [weak refs processing, 0.0000969 secs] [1 CMS-remark: 2203641K(3145728K)] 2441193K(4019584K), 0.1520584 secs] [Times: user=0.38 sys=0.00, real=0.16 secs]
77811.893: [CMS-concurrent-sweep-start]
77812.890: [CMS-concurrent-sweep: 0.997/0.997 secs] [Times: user=1.08 sys=0.00, real=1.00 secs]
77812.890: [CMS-concurrent-reset-start]
77812.905: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
77814.909: [GC [1 CMS-initial-mark: 2203641K(3145728K)] 2445359K(4019584K), 0.3071683 secs] [Times: user=0.28 sys=0.00, real=0.30 secs]
77815.216: [CMS-concurrent-mark-start]
77817.955: [CMS-concurrent-mark: 2.739/2.739 secs] [Times: user=5.33 sys=0.13, real=2.75 secs]
77817.955: [CMS-concurrent-preclean-start]
77817.965: [CMS-concurrent-preclean: 0.010/0.010 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
77817.965: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 77823.178: [CMS-concurrent-abortable-preclean: 0.482/5.213 secs] [Times: user=3.27 sys=0.00, real=5.20 secs]
77823.184: [GC[YG occupancy: 243681 K (873856 K)]77823.184: [Rescan (parallel) , 0.1612718 secs]77823.345: [weak refs processing, 0.0001191 secs] [1 CMS-remark: 2203641K(3145728K)] 2447322K(4019584K), 0.1615056 secs] [Times: user=0.31 sys=0.00, real=0.17 secs]
77823.361: [CMS-concurrent-sweep-start]
77824.356: [CMS-concurrent-sweep: 0.995/0.995 secs] [Times: user=1.08 sys=0.00, real=1.00 secs]
77824.356: [CMS-concurrent-reset-start]
77824.371: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
77825.194: [GC [1 CMS-initial-mark: 2203641K(3145728K)] 2449146K(4019584K), 0.3122333 secs] [Times: user=0.30 sys=0.00, real=0.31 secs]
77825.506: [CMS-concurrent-mark-start]
77828.241: [CMS-concurrent-mark: 2.735/2.735 secs] [Times: user=5.31 sys=0.17, real=2.73 secs]
77828.241: [CMS-concurrent-preclean-start]
77828.252: [CMS-concurrent-preclean: 0.010/0.010 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
77828.252: [CMS-concurrent-abortable-preclean-start]

Re: CMS collection keep working during holiday

by Ken--@newsgroupstats.hk :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Anyone help? Someone has exactly same case to me and he posted the case at Jon Masamitsu's blog:

http://blogs.sun.com/jonthecollector/entry/jdk6_gc_release_notes

quote:
Trying the new CMS in 1.6 and it seems to work nicely for a while but then it starts kicking out these CMS aborts. I can find nothing anywhere talking about what is happening. This happens over and over until a full GC happens. From what I can tell the CMS collector never completes and is therefore not doing anything except taking CPU time. CMS: abort preclean due to time 407.606: [CMS-concurrent-abortable-preclean: 0.149/5.058 secs]

The CMS will be triggered by -XX:CMSInitiatingOccupancyFraction=70 but it seems that it has no effect and continuously CMS until a Full GC (caused by 'concurrent mode failure'). In my case, it always happen in non-busy hours but the CMS behave normally in busy / peak hours.

I think it is not related to 'cms abort preclean' but it's really taking CPU time but do nothing (around 4 hours) and it has chance to have a big GC when the old-gen is full.

Best Regards,
Ken







Re: CMS collection keep working during holiday

by Jon Masamitsu :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On 10/05/08 10:38, Ken--@... wrote:

> Anyone help? Someone has exactly same case to me and he posted the case at
> Jon Masamitsu's blog:
>
> http://blogs.sun.com/jonthecollector/entry/jdk6_gc_release_notes
>
> quote:
> Trying the new CMS in 1.6 and it seems to work nicely for a while but then
> it starts kicking out these CMS aborts. I can find nothing anywhere talking
> about what is happening. This happens over and over until a full GC happens.
>>From what I can tell the CMS collector never completes and is therefore not
> doing anything except taking CPU time. CMS: abort preclean due to time
> 407.606: [CMS-concurrent-abortable-preclean: 0.149/5.058 secs]

This "abort" message stops a second precleaning phase.  This second
precleaning phase is only meant to run for a lesser amount of time
and the "abort" does not indicate any problem.  That phase is meant
to be "aborted".  It's rather an unfortunate choice of words
on our part.

>
> The CMS will be triggered by -XX:CMSInitiatingOccupancyFraction=70 but it
> seems that it has no effect but continue keeping CMS until a Full GC (caused
> by 'concurrent mode failure'). In my case, it always happen in non-busy
> hours but the CMS behave normally in busy / peak hours.
>
> I think it is not related to 'cms abort preclean' but it's really taking CPU
> time but do nothing (around 4 hours) and it has chance to have a big GC when
> the old-gen is full.

The CMS collections cycles are being started because the old (tenured)
generation does have > 70% occupancy (at least in this fragment from
the log that your sent earlier).

76632.391: [GC[YG occupancy: 176683 K (873856 K)]76632.391: [Rescan
(parallel) , 0.1555281 secs]76632.546: [weak refs processing, 0.0003456
secs] [1 CMS-remark: 2230653K(3145728K)] 2407337K(4019584K), 0.1559517 secs]

Looking at [1 CMS-remark: 2230653K(3145728K)] the 2230653K is the
amount of the old generation that is being used and 3145728K is
the total old generation size and 2230653K/3145728K is just over
70%.

I didn't find a concurrent mode failure in your mail from 10/1.  Can
you sent a log showing the output with the concurrent mode failure?


>
> Best Regards,
> Ken
>
>
>
>
>
>
>

Re: CMS collection keep working during holiday

by Ken--@newsgroupstats.hk :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thanks for your quick response. The application started at 1/10 was crash this morning. I lost the gc log file as I quickly restart the application to avoid service interruption to our clients.

From the hotspot error file(attached), it show that it crash when old generation 99% full. Anyway, it's another problem.

hs_err_pid2476.log

I have another application with similar JVM options set and have exactly behavior:

JVM Options:

-Djava.security.policy=./java.policy -XX:+PrintGCDetails -XX:+PrintGCTaskTimeStamps -XX:+PrintHeapAtGC -Xloggc:hkexquoteserver_gc.log -Xms4G -Xmx4G -XX:NewSize=1G -XX:MaxNewSize=1G -XX:+PrintTenuringDistribution -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=70 -XX:TargetSurvivorRatio=90 -Dcom.sun.management.jmxremote.port=18003 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

GC Logs:
gc.zip

JConsole:





- long lasting CMSs from 426789 to 504376
- CMS cannot free space in old generation within this period. What I expected is to free old-gen memory to around 500M full at 426789.
- concurrent mode failure at 504379

Thanks and Best Regards,
Ken






Re: CMS collection keep working during holiday

by kirk-17 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Ken,

Can I suggest that you back off on some of the switch settings. I think
that GC might work better if ergonomics where allowed to adjust some of
the sizes. This may prevent the horrible GC you are experiencing. Is
suspect that an attempt at compaction in this condition (which the
switch settings would exacerbate) would be devastating. I see
SurvivorRatio=4 and then TargetSurvivorRatio=90. These completely
conflicting settings. I don't know which one would win but I would guess
SurvivorRatio=4. Some switch settings are fixing values where as other
switch setting work as suggestions for ergonomics. Fixing values limit
what ergonomics can do for you and ergonomics are designed to adjust
configurations to help you meet your pause time and GC throughput goals.
Parallel in new is default for 1.5+. For a first run I'd use....

-Xmx4 -XX:+UseConcMarkSweepGC + diagnostics + security settings

Regards,
Kirk

 


Ken--@... wrote:

> Thanks for your quick response. The application started at 1/10 was crash
> this morning. I lost the gc log file as I quickly restart the application to
> avoid service interruption to our clients.
>
> >From the hotspot error file(attached), it show that it crash when old
> generation 99% full. Anyway, it's another problem.
>
> http://www.nabble.com/file/p19841670/hs_err_pid2476.log hs_err_pid2476.log
>
> I have another application with similar JVM options set and have exactly
> behavior:
>
> JVM Options:
>
> -Djava.security.policy=./java.policy -XX:+PrintGCDetails
> -XX:+PrintGCTaskTimeStamps -XX:+PrintHeapAtGC -Xloggc:hkexquoteserver_gc.log
> -Xms4G -Xmx4G -XX:NewSize=1G -XX:MaxNewSize=1G
> -XX:+PrintTenuringDistribution -XX:+DisableExplicitGC
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=4
> -XX:CMSInitiatingOccupancyFraction=70 -XX:TargetSurvivorRatio=90
> -Dcom.sun.management.jmxremote.port=18003
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false
>
> GC Logs:
> http://www.nabble.com/file/p19841670/gc.zip gc.zip
>
> JConsole:
> http://www.nabble.com/file/p19841670/jconsole.jpg 
>
> http://www.nabble.com/file/p19841670/jconsole2.jpg 
>
>
> - long lasting CMSs from 426789 to 504376
> - CMS cannot free space in old generation within this period. What I
> expected is to free old-gen memory to around 500M full at 426789.
> - concurrent mode failure at 504379
>
> Thanks and Best Regards,
> Ken
>
>
>
>
>
>
>  


Re: CMS collection keep working during holiday

by Ken--@newsgroupstats.hk :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Just for info. I double check with the vm options, I did edit the startup bat to disable auto duty cycle of CMS at 2/10 morning.

Regards,
Ken

Ken--@newsgroupstats.hk wrote:
Thanks for your quick response. The application started at 1/10 was crash this morning. I lost the gc log file as I quickly restart the application to avoid service interruption to our clients.

From the hotspot error file(attached), it show that it crash when old generation 99% full. Anyway, it's another problem.

hs_err_pid2476.log

I have another application with similar JVM options set and have exactly behavior:

JVM Options:

-Djava.security.policy=./java.policy -XX:+PrintGCDetails -XX:+PrintGCTaskTimeStamps -XX:+PrintHeapAtGC -Xloggc:hkexquoteserver_gc.log -Xms4G -Xmx4G -XX:NewSize=1G -XX:MaxNewSize=1G -XX:+PrintTenuringDistribution -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=70 -XX:TargetSurvivorRatio=90 -Dcom.sun.management.jmxremote.port=18003 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

GC Logs:
gc.zip

JConsole:





- long lasting CMSs from 426789 to 504376
- CMS cannot free space in old generation within this period. What I expected is to free old-gen memory to around 500M full at 426789.
- concurrent mode failure at 504379

Thanks and Best Regards,
Ken





Re: CMS collection keep working during holiday

by Jon Masamitsu :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



On 10/06/08 10:58, Ken--@... wrote:
> Just for info. I double check with the vm options, I did edit the startup bat
> to disable auto duty cycle of CMS at 2/10 morning.

How did you change the flags for 2/10 from those below?

>
> Regards,
> Ken
>
>
> Ken--@... wrote:
>> Thanks for your quick response. The application started at 1/10 was crash
>> this morning. I lost the gc log file as I quickly restart the application
>> to avoid service interruption to our clients.
>>
>> From the hotspot error file(attached), it show that it crash when old
>> generation 99% full. Anyway, it's another problem.
>>
>>  http://www.nabble.com/file/p19841670/hs_err_pid2476.log
>> hs_err_pid2476.log
>>
>> I have another application with similar JVM options set and have exactly
>> behavior:
>>
>> JVM Options:
>>
>> -Djava.security.policy=./java.policy -XX:+PrintGCDetails
>> -XX:+PrintGCTaskTimeStamps -XX:+PrintHeapAtGC
>> -Xloggc:hkexquoteserver_gc.log -Xms4G -Xmx4G -XX:NewSize=1G
>> -XX:MaxNewSize=1G -XX:+PrintTenuringDistribution -XX:+DisableExplicitGC
>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=4
>> -XX:CMSInitiatingOccupancyFraction=70 -XX:TargetSurvivorRatio=90
>> -Dcom.sun.management.jmxremote.port=18003
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dcom.sun.management.jmxremote.ssl=false
>>


Re: CMS collection keep working during holiday

by Jon Masamitsu :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ken,

As you've indicated the CMS concurrent collections look like they are not
behaving correctly.  I'll see if I can come up with a simple test case
to reproduce this behavior.   I'll get back to you if I learn anything.

Jon

Ken--@... wrote:

> Just for info. I double check with the vm options, I did edit the startup bat
> to disable auto duty cycle of CMS at 2/10 morning.
>
> Regards,
> Ken
>
>
> Ken--@... wrote:
>  
>> Thanks for your quick response. The application started at 1/10 was crash
>> this morning. I lost the gc log file as I quickly restart the application
>> to avoid service interruption to our clients.
>>
>> From the hotspot error file(attached), it show that it crash when old
>> generation 99% full. Anyway, it's another problem.
>>
>>  http://www.nabble.com/file/p19841670/hs_err_pid2476.log
>> hs_err_pid2476.log
>>
>> I have another application with similar JVM options set and have exactly
>> behavior:
>>
>> JVM Options:
>>
>> -Djava.security.policy=./java.policy -XX:+PrintGCDetails
>> -XX:+PrintGCTaskTimeStamps -XX:+PrintHeapAtGC
>> -Xloggc:hkexquoteserver_gc.log -Xms4G -Xmx4G -XX:NewSize=1G
>> -XX:MaxNewSize=1G -XX:+PrintTenuringDistribution -XX:+DisableExplicitGC
>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=4
>> -XX:CMSInitiatingOccupancyFraction=70 -XX:TargetSurvivorRatio=90
>> -Dcom.sun.management.jmxremote.port=18003
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dcom.sun.management.jmxremote.ssl=false
>>
>> GC Logs:
>>  http://www.nabble.com/file/p19841670/gc.zip gc.zip
>>
>> JConsole:
>>  http://www.nabble.com/file/p19841670/jconsole.jpg 
>>
>>  http://www.nabble.com/file/p19841670/jconsole2.jpg 
>>
>>
>> - long lasting CMSs from 426789 to 504376
>> - CMS cannot free space in old generation within this period. What I
>> expected is to free old-gen memory to around 500M full at 426789.
>> - concurrent mode failure at 504379
>>
>> Thanks and Best Regards,
>> Ken
>>
>>
>>
>>
>>
>>
>>
>>    
>
>  


Re: CMS collection keep working during holiday

by Ken--@newsgroupstats.hk :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> How did you change the flags for 2/10 from those below?

I changed the jvm options from :

-verbose:gc -XX:+PrintGCDetails -Xloggc:powerserver_gc.log -Djava.security.policy=.\java.policy -Xms4G -Xmx4G -XX:NewSize=1G -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=70 -XX:TargetSurvivorRatio=90 -Dcom.sun.management.jmxremote.port=18002 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

to :

-verbose:gc -XX:+PrintGCDetails -Xloggc:powerserver_gc.log -Djava.security.policy=.\java.policy -Xms4G -Xmx4G -XX:NewSize=1G -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSIncrementalPacing -XX:CMSIncrementalDutyCycle=10 -XX:TargetSurvivorRatio=90 -Dcom.sun.management.jmxremote.port=18002 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false


just added 2 more flag: XX:-CMSIncrementalPacing -XX:CMSIncrementalDutyCycle=10

Re: CMS collection keep working during holiday

by Ken--@newsgroupstats.hk :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Follow up to the jconsole above. This gif is captured today. You can see the first CMS after a FULL GC is working. The 2nd one is working too but the memory free from 2nd generation is much less than the first CMS. The memory free by 3rd CMS is further decreased.

The 5th CMS cannot free enough memory to make old-gen occupancy less than 70%. So, continue non-stopping CMS happen again until a Full GC (triggered by Concurrent Mode Failure).








Re: CMS collection keep working during holiday

by y.s.ramakrishna :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Could you run with the additional option +PrintReferenceGC and see if
it sheds additional light on the problem? (Please share the resulting
GC logs.)

If at all possible, you might try rev'ing down to 6.0 and seeing
if the same behaviour reproduces or not?

I understand that some or all of this may not be possible in a
production setting.

-- ramki

----- Original Message -----
From: "Ken--@..." <dragonken@...>
Date: Tuesday, October 7, 2008 10:17 pm
Subject: Re: CMS collection keep working during holiday
To: hotspot-gc-dev@...


> Follow up to the jconsole above. This gif is captured today. You can
> see the
> first CMS after a FULL GC is working. The 2nd one is working too but the
> memory free from 2nd generation is much less than the first CMS. The memory
> free by 3rd CMS is further decreased.
>
> The 5th CMS cannot free enough memory to make occanpcy in old-gen to less
> than 70%. So, continue non-stopping CMS happen again until a Full GC
> (triggered by Concurrent Mode Failure).
>
> http://www.nabble.com/file/p19872463/jconsole_20081008.jpg 
>
>
>
>
>
>
> --
> View this message in context: http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872463.html
> Sent from the OpenJDK Hotspot Garbage Collection mailing list archive
> at Nabble.com.
>

Re: CMS collection keep working during holiday

by Ken--@newsgroupstats.hk :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thanks to your quick response. I will do it now but it require 1 - 2 days to reproduce the result.

It seems that my case is very special.

What the application is simply a stock quote client server and store daily stock quote and chart data in memory. The lived objects need around 80M memory at day open and 500M memory at day end.



y.s.ramakrishna wrote:
Could you run with the additional option +PrintReferenceGC and see if
it sheds additional light on the problem? (Please share the resulting
GC logs.)

If at all possible, you might try rev'ing down to 6.0 and seeing
if the same behaviour reproduces or not?

I understand that some or all of this may not be possible in a
production setting.

-- ramki

----- Original Message -----
From: "Ken--@newsgroupstats.hk" <dragonken@gmail.com>
Date: Tuesday, October 7, 2008 10:17 pm
Subject: Re: CMS collection keep working during holiday
To: hotspot-gc-dev@openjdk.java.net


> Follow up to the jconsole above. This gif is captured today. You can
> see the
> first CMS after a FULL GC is working. The 2nd one is working too but the
> memory free from 2nd generation is much less than the first CMS. The memory
> free by 3rd CMS is further decreased.
>
> The 5th CMS cannot free enough memory to make occanpcy in old-gen to less
> than 70%. So, continue non-stopping CMS happen again until a Full GC
> (triggered by Concurrent Mode Failure).
>
> http://www.nabble.com/file/p19872463/jconsole_20081008.jpg 
>
>
>
>
>
>
> --
> View this message in context: http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872463.html
> Sent from the OpenJDK Hotspot Garbage Collection mailing list archive
> at Nabble.com.
>

Re: CMS collection keep working during holiday

by kirk-17 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ken--@... wrote:

> Follow up to the jconsole above. This gif is captured today. You can see the
> first CMS after a FULL GC is working. The 2nd one is working too but the
> memory free from 2nd generation is much less than the first CMS. The memory
> free by 3rd CMS is further decreased.
>
> The 5th CMS cannot free enough memory to make occanpcy in old-gen to less
> than 70%. So, continue non-stopping CMS happen again until a Full GC
> (triggered by Concurrent Mode Failure).
>
> http://www.nabble.com/file/p19872463/jconsole_20081008.jpg 
>
>
>
>
>
>
>  
If GC is failing to recover memory, you either have a memory leak or
long loitering objects. I'd suggest visualvm or netbeans memory
profiling using the generations option to find out what objects are
surviving

Regards,
Kirk

Re: CMS collection keep working during holiday

by y.s.ramakrishna :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Ken --

You do see of course that in the gc.zip that you made available,
there were 4 promotion failures, followed by compacting collections,
as listed below in chronological order:-

163116.821: [GC 163116.821: [ParNew: 873856K->174720K(873856K), 0.4830286 secs] 1508484K->961405K(4019584K), 0.4831137 secs] [Times: user=1.59 sys=0.01, real=0.
49 secs]
...
165310.759: [GC 165310.759: [ParNew (promotion failed): 873856K->873856K(873856K), 0.8919148 secs]165311.651:[CMS: 1911965K->785641K(3145728K), 2.9784355 secs] 2733287K->785641K(4019584K), 3.8704903 secs] [Times: user=4.80 sys=0.16, real=3.88 secs]
...
231411.554: [GC 231411.554: [ParNew: 873856K->174720K(873856K), 0.5915618 secs] 1676990K->1122182K(4019584K), 0.5916525 secs] [Times: user=1.45 sys=0.00, real=0
.59 secs]
...
233946.300: [GC 233946.300: [ParNew (promotion failed): 873856K->873856K(873856K), 0.9428034 secs]233947.243: [CMS: 2172134K->570964K(3145728K), 2.7655578 secs] 2969591K->570964K(4019584K), 3.7084973 secs] [Times: user=5.17 sys=0.09, real=3.70 secs]
...
247864.029: [GC 247864.029: [ParNew (promotion failed): 873856K->873855K(873856K), 0.8641325 secs]247864.893: [CMS: 1845477K->499108K(3145728K), 2.2010342 secs] 2628538K->499108K(4019584K), 3.0652834 secs] [Times: user=4.30 sys=0.06, real=3.06 secs]
...
248187.856: [GC 248187.856: [ParNew: 699136K->174719K(873856K), 0.3926926 secs] 1050016K->610565K(4019584K), 0.3927797 secs] [Times: user=1.45 sys=0.03, real=0.
39 secs]
...
265611.034: [GC 265611.034: [ParNew (promotion failed): 873856K->873856K(873856K), 0.6459451 secs]265611.680: [CMS: 1854348K->502128K(3145728K), 2.2556597 secs] 2714667K->502128K(4019584K), 2.9017069 secs] [Times: user=4.19 sys=0.08, real=2.89 secs]
...
502712.808: [GC 502712.808: [ParNew: 873856K->174720K(873856K), 1.2972687 secs] 3806995K->3167193K(4019584K), 1.2973536 secs] [Times: user=1.75 sys=0.00, real=1
.30 secs]
...
504379.249: [GC 504379.249: [ParNew: 873856K->873856K(873856K), 0.0000241 secs]504379.249: [CMS504379.358: [CMS-concurrent-abortable-preclean: 0.209/2.471 secs] [Times: user=1.28 sys=0.00, real=2.48 secs] (concurrent mode failure): 2938816K->424257K(3145728K), 5.2334306 secs] 3812672K->424257K(4019584K), 5.2335621 secs] [Times:
 user=5.11 sys=0.02, real=5.23 secs]

In other words, CMS often does not manage to clear the allocated
memory. But the CMS collection preceding the promotion failure
indicates heap usage only a little more than what you see after
the mark-compact collection. There is almost certainly something
fishy going on here. It may be related to Reference objects and their
referents being split across generations, but it's not completely clear.

[Perhaps you can check if your application makes use of Soft/Weak/References
in some funny way, so as to perhaps cause the Reference and referent to be
split between the old and new generations?]

Now one thing we know about the difference between CMS and a full mark-compact
collection is that CMS treats the young gen as a source of roots for the
CMS and perm gen collection, whereas this is not so for the latter.
In pathological cases, a dead cycle spanning the young gen and the
old gen would remain uncollected by CMS but be collected by the mark-compact
collection. Now, I don't think that would be happening in your case, but I
do see that the frequency of ParNew collections in your case is fairly low --
about 1 every 6 minutes or so. With a tenuring threshold of 11, it's possible
for a dead cycle to live for a full hour before it gets promoted into the
old generation whence it would be eligible for collection.

In some cases, the CMS initiating occupancy fraction has not been hit
(and a CMS collection not even attempte) before a promotion failure
occurs. This would appear to call for more frequent CMS collections
following ParNew collections, even something drastic like:-

    CMSInitiatingOccupancyFraction=25

This would of course cause your back to back CMS collections that
would peg your CPU, in order to avoid that, you might disable the
CMS "timeout" via:-

     CMSWaitDuration=1200000

i.e. make it 20 minutes (=120000 ms).

In addition, I find that your scavenges are unnecessarily copying
survivors back-and-forth 11 times or so, I'd just scrunch that down
to:-

      MaxTenuringThreshold=1 (or at most 2)

This may or may not address the problem you are seeing here but
it's at least a start towards eliminating at least some of
the suspects and of course of eliminating the "busy work"
that CMS is doing during the holidays.

-- ramki

----- Original Message -----
From: "Ken--@..." <dragonken@...>
Date: Tuesday, October 7, 2008 11:14 pm
Subject: Re: CMS collection keep working during holiday
To: hotspot-gc-dev@...


> Thanks to your quick response. I will do it now but it require 1 - 2
> days to
> reproduce the result.
>
> It seems that my case is very special.
>
> What the application is simply a stock quote client server and store daily
> stock quote and chart data in memory. The lived objects need around 80M
> memory at day open and 500M memory at day end.
>
>
>
>
> y.s.ramakrishna wrote:
> >
> >
> > Could you run with the additional option +PrintReferenceGC and see if
> > it sheds additional light on the problem? (Please share the resulting
> > GC logs.)
> >
> > If at all possible, you might try rev'ing down to 6.0 and seeing
> > if the same behaviour reproduces or not?
> >
> > I understand that some or all of this may not be possible in a
> > production setting.
> >
> > -- ramki
> >
> > ----- Original Message -----
> > From: "Ken--@..." <dragonken@...>
> > Date: Tuesday, October 7, 2008 10:17 pm
> > Subject: Re: CMS collection keep working during holiday
> > To: hotspot-gc-dev@...
> >
> >
> >> Follow up to the jconsole above. This gif is captured today. You
> can
> >> see the
> >> first CMS after a FULL GC is working. The 2nd one is working too
> but the
> >> memory free from 2nd generation is much less than the first CMS. The
> >> memory
> >> free by 3rd CMS is further decreased.
> >>
> >> The 5th CMS cannot free enough memory to make occanpcy in old-gen
> to less
> >> than 70%. So, continue non-stopping CMS happen again until a Full GC
> >> (triggered by Concurrent Mode Failure).
> >>
> >> http://www.nabble.com/file/p19872463/jconsole_20081008.jpg 
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872463.html
> >> Sent from the OpenJDK Hotspot Garbage Collection mailing list
> archive
> >> at Nabble.com.
> >>
> >
> >
>
> --
> View this message in context: http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872900.html
> Sent from the OpenJDK Hotspot Garbage Collection mailing list archive
> at Nabble.com.
>

Re: CMS collection keep working during holiday

by y.s.ramakrishna :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Let me see if i understood that:-

> What the application is simply a stock quote client server and store daily
> stock quote and chart data in memory. The lived objects need around 80M
> memory at day open and 500M memory at day end.

So you expect that the long-lived objects that will eventually get
promoted into the old generation, will start out at about 80 MB
at the start of the trading day and will end up about 500 MB at the
end of the trading day.

What happens at the end of the trading day? Do you discard the entire
table, so that the set of long-lived objects would be expected to shrink
back to 80 MB after the trading day ends?

If I understood your problem correctly (and correct me if i am wrong)
you find that the old gen occupancy does not shrink down with
concurrent collections at the end of the trading day, but shrinks only
upon a mark-compact collection (which typically happens as a result of
a promotion failure).

(Loud thinking: I am assuming that the same effect would be achieved
by calling a System.gc() at the end of the trading day after the
table of quotes has been dropped. Of course, we need to figure out
why CMS may not reclaim the table even though it has been dropped.]

Please correct any misunderstanding of the situation.
-- ramki
 

>
> y.s.ramakrishna wrote:
> >
> >
> > Could you run with the additional option +PrintReferenceGC and see if
> > it sheds additional light on the problem? (Please share the resulting
> > GC logs.)
> >
> > If at all possible, you might try rev'ing down to 6.0 and seeing
> > if the same behaviour reproduces or not?
> >
> > I understand that some or all of this may not be possible in a
> > production setting.
> >
> > -- ramki
> >
> > ----- Original Message -----
> > From: "Ken--@..." <dragonken@...>
> > Date: Tuesday, October 7, 2008 10:17 pm
> > Subject: Re: CMS collection keep working during holiday
> > To: hotspot-gc-dev@...
> >
> >
> >> Follow up to the jconsole above. This gif is captured today. You
> can
> >> see the
> >> first CMS after a FULL GC is working. The 2nd one is working too
> but the
> >> memory free from 2nd generation is much less than the first CMS. The
> >> memory
> >> free by 3rd CMS is further decreased.
> >>
> >> The 5th CMS cannot free enough memory to make occanpcy in old-gen
> to less
> >> than 70%. So, continue non-stopping CMS happen again until a Full GC
> >> (triggered by Concurrent Mode Failure).
> >>
> >> http://www.nabble.com/file/p19872463/jconsole_20081008.jpg 
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872463.html
> >> Sent from the OpenJDK Hotspot Garbage Collection mailing list
> archive
> >> at Nabble.com.
> >>
> >
> >
>
> --
> View this message in context: http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872900.html
> Sent from the OpenJDK Hotspot Garbage Collection mailing list archive
> at Nabble.com.
>

Re: CMS collection keep working during holiday

by Ken--@newsgroupstats.hk :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thanks to your quick response.

I will firstly produce the result with +XX:+PrictReferece and then alter the vm options to your set for further testing.

> [Perhaps you can check if your application makes use of Soft/Weak/References
> in some funny way, so as to perhaps cause the Reference and referent to be
> split between the old and new generations?]

I didn't specially play soft / weak references. Even I don't exactly know what is weak reference. Your comment alert me that I have called javolution's api (javolution.text.Text and javolution.text.TypeFormat.parseFloat).

In the past, we parse the incoming data feed by calling a lot of Float.parseFloat(String). Last year I found that parseFloat create many temp objects resulting many garbages to be collected. After google I found javolution api. It really have better performance.

Javolution's Text Object create instance by ObjectFactory which allocate in stack context. http://javolution.org/api/javolution/context/StackContext.html. I am not sure it's cased by javoltion api.

Anyway or method to shoot it? Is adding -XX:+PrintReference help?

y.s.ramakrishna wrote:
Hi Ken --

You do see of course that in the gc.zip that you made available,
there were 4 promotion failures, followed by compacting collections,
as listed below in chronological order:-

163116.821: [GC 163116.821: [ParNew: 873856K->174720K(873856K), 0.4830286 secs] 1508484K->961405K(4019584K), 0.4831137 secs] [Times: user=1.59 sys=0.01, real=0.
49 secs]
...
165310.759: [GC 165310.759: [ParNew (promotion failed): 873856K->873856K(873856K), 0.8919148 secs]165311.651:[CMS: 1911965K->785641K(3145728K), 2.9784355 secs] 2733287K->785641K(4019584K), 3.8704903 secs] [Times: user=4.80 sys=0.16, real=3.88 secs]
...
231411.554: [GC 231411.554: [ParNew: 873856K->174720K(873856K), 0.5915618 secs] 1676990K->1122182K(4019584K), 0.5916525 secs] [Times: user=1.45 sys=0.00, real=0
.59 secs]
...
233946.300: [GC 233946.300: [ParNew (promotion failed): 873856K->873856K(873856K), 0.9428034 secs]233947.243: [CMS: 2172134K->570964K(3145728K), 2.7655578 secs] 2969591K->570964K(4019584K), 3.7084973 secs] [Times: user=5.17 sys=0.09, real=3.70 secs]
...
247864.029: [GC 247864.029: [ParNew (promotion failed): 873856K->873855K(873856K), 0.8641325 secs]247864.893: [CMS: 1845477K->499108K(3145728K), 2.2010342 secs] 2628538K->499108K(4019584K), 3.0652834 secs] [Times: user=4.30 sys=0.06, real=3.06 secs]
...
248187.856: [GC 248187.856: [ParNew: 699136K->174719K(873856K), 0.3926926 secs] 1050016K->610565K(4019584K), 0.3927797 secs] [Times: user=1.45 sys=0.03, real=0.
39 secs]
...
265611.034: [GC 265611.034: [ParNew (promotion failed): 873856K->873856K(873856K), 0.6459451 secs]265611.680: [CMS: 1854348K->502128K(3145728K), 2.2556597 secs] 2714667K->502128K(4019584K), 2.9017069 secs] [Times: user=4.19 sys=0.08, real=2.89 secs]
...
502712.808: [GC 502712.808: [ParNew: 873856K->174720K(873856K), 1.2972687 secs] 3806995K->3167193K(4019584K), 1.2973536 secs] [Times: user=1.75 sys=0.00, real=1
.30 secs]
...
504379.249: [GC 504379.249: [ParNew: 873856K->873856K(873856K), 0.0000241 secs]504379.249: [CMS504379.358: [CMS-concurrent-abortable-preclean: 0.209/2.471 secs] [Times: user=1.28 sys=0.00, real=2.48 secs] (concurrent mode failure): 2938816K->424257K(3145728K), 5.2334306 secs] 3812672K->424257K(4019584K), 5.2335621 secs] [Times:
 user=5.11 sys=0.02, real=5.23 secs]

In other words, CMS often does not manage to clear the allocated
memory. But the CMS collection preceding the promotion failure
indicates heap usage only a little more than what you see after
the mark-compact collection. There is almost certainly something
fishy going on here. It may be related to Reference objects and their
referents being split across generations, but it's not completely clear.

[Perhaps you can check if your application makes use of Soft/Weak/References
in some funny way, so as to perhaps cause the Reference and referent to be
split between the old and new generations?]

Now one thing we know about the difference between CMS and a full mark-compact
collection is that CMS treats the young gen as a source of roots for the
CMS and perm gen collection, whereas this is not so for the latter.
In pathological cases, a dead cycle spanning the young gen and the
old gen would remain uncollected by CMS but be collected by the mark-compact
collection. Now, I don't think that would be happening in your case, but I
do see that the frequency of ParNew collections in your case is fairly low --
about 1 every 6 minutes or so. With a tenuring threshold of 11, it's possible
for a dead cycle to live for a full hour before it gets promoted into the
old generation whence it would be eligible for collection.

In some cases, the CMS initiating occupancy fraction has not been hit
(and a CMS collection not even attempte) before a promotion failure
occurs. This would appear to call for more frequent CMS collections
following ParNew collections, even something drastic like:-

    CMSInitiatingOccupancyFraction=25

This would of course cause your back to back CMS collections that
would peg your CPU, in order to avoid that, you might disable the
CMS "timeout" via:-

     CMSWaitDuration=1200000

i.e. make it 20 minutes (=120000 ms).

In addition, I find that your scavenges are unnecessarily copying
survivors back-and-forth 11 times or so, I'd just scrunch that down
to:-

      MaxTenuringThreshold=1 (or at most 2)

This may or may not address the problem you are seeing here but
it's at least a start towards eliminating at least some of
the suspects and of course of eliminating the "busy work"
that CMS is doing during the holidays.

-- ramki

----- Original Message -----
From: "Ken--@newsgroupstats.hk" <dragonken@gmail.com>
Date: Tuesday, October 7, 2008 11:14 pm
Subject: Re: CMS collection keep working during holiday
To: hotspot-gc-dev@openjdk.java.net


> Thanks to your quick response. I will do it now but it require 1 - 2
> days to
> reproduce the result.
>
> It seems that my case is very special.
>
> What the application is simply a stock quote client server and store daily
> stock quote and chart data in memory. The lived objects need around 80M
> memory at day open and 500M memory at day end.
>
>
>
>
> y.s.ramakrishna wrote:
> >
> >
> > Could you run with the additional option +PrintReferenceGC and see if
> > it sheds additional light on the problem? (Please share the resulting
> > GC logs.)
> >
> > If at all possible, you might try rev'ing down to 6.0 and seeing
> > if the same behaviour reproduces or not?
> >
> > I understand that some or all of this may not be possible in a
> > production setting.
> >
> > -- ramki
> >
> > ----- Original Message -----
> > From: "Ken--@newsgroupstats.hk" <dragonken@gmail.com>
> > Date: Tuesday, October 7, 2008 10:17 pm
> > Subject: Re: CMS collection keep working during holiday
> > To: hotspot-gc-dev@openjdk.java.net
> >
> >
> >> Follow up to the jconsole above. This gif is captured today. You
> can
> >> see the
> >> first CMS after a FULL GC is working. The 2nd one is working too
> but the
> >> memory free from 2nd generation is much less than the first CMS. The
> >> memory
> >> free by 3rd CMS is further decreased.
> >>
> >> The 5th CMS cannot free enough memory to make occanpcy in old-gen
> to less
> >> than 70%. So, continue non-stopping CMS happen again until a Full GC
> >> (triggered by Concurrent Mode Failure).
> >>
> >> http://www.nabble.com/file/p19872463/jconsole_20081008.jpg 
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872463.html
> >> Sent from the OpenJDK Hotspot Garbage Collection mailing list
> archive
> >> at Nabble.com.
> >>
> >
> >
>
> --
> View this message in context: http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872900.html
> Sent from the OpenJDK Hotspot Garbage Collection mailing list archive
> at Nabble.com.
>

Re: CMS collection keep working during holiday

by Ken--@newsgroupstats.hk :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thanks to your quick response.

You are correct.

I use this option set to find out how large is my lived objects:

-Xmx1536M -Xms1536M -XX:NewSize=256M -XX:MaxNewSize=256M -XX:+ExplicitGCInvokesConcurrent

(Yes, pressing Full-GC in jconsole or call System.gc() in application can have same effect.)

The application will daily reset all lived objects at 9:00am(holiday or non-trading day will not). Day open at 9:30am and day end at 4:30pm. From jconsole, I can see the heap chart raise steadily from 80M(9:30am) to 5xxM(4:30pm), if use the vm options above.


Y Srinivas Ramakrishna wrote:
Let me see if i understood that:-

> What the application is simply a stock quote client server and store daily
> stock quote and chart data in memory. The lived objects need around 80M
> memory at day open and 500M memory at day end.

So you expect that the long-lived objects that will eventually get
promoted into the old generation, will start out at about 80 MB
at the start of the trading day and will end up about 500 MB at the
end of the trading day.

What happens at the end of the trading day? Do you discard the entire
table, so that the set of long-lived objects would be expected to shrink
back to 80 MB after the trading day ends?

If I understood your problem correctly (and correct me if i am wrong)
you find that the old gen occupancy does not shrink down with
concurrent collections at the end of the trading day, but shrinks only
upon a mark-compact collection (which typically happens as a result of
a promotion failure).

(Loud thinking: I am assuming that the same effect would be achieved
by calling a System.gc() at the end of the trading day after the
table of quotes has been dropped. Of course, we need to figure out
why CMS may not reclaim the table even though it has been dropped.]

Please correct any misunderstanding of the situation.
-- ramki
 
>
> y.s.ramakrishna wrote:
> >
> >
> > Could you run with the additional option +PrintReferenceGC and see if
> > it sheds additional light on the problem? (Please share the resulting
> > GC logs.)
> >
> > If at all possible, you might try rev'ing down to 6.0 and seeing
> > if the same behaviour reproduces or not?
> >
> > I understand that some or all of this may not be possible in a
> > production setting.
> >
> > -- ramki
> >
> > ----- Original Message -----
> > From: "Ken--@newsgroupstats.hk" <dragonken@gmail.com>
> > Date: Tuesday, October 7, 2008 10:17 pm
> > Subject: Re: CMS collection keep working during holiday
> > To: hotspot-gc-dev@openjdk.java.net
> >
> >
> >> Follow up to the jconsole above. This gif is captured today. You
> can
> >> see the
> >> first CMS after a FULL GC is working. The 2nd one is working too
> but the
> >> memory free from 2nd generation is much less than the first CMS. The
> >> memory
> >> free by 3rd CMS is further decreased.
> >>
> >> The 5th CMS cannot free enough memory to make occanpcy in old-gen
> to less
> >> than 70%. So, continue non-stopping CMS happen again until a Full GC
> >> (triggered by Concurrent Mode Failure).
> >>
> >> http://www.nabble.com/file/p19872463/jconsole_20081008.jpg 
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872463.html
> >> Sent from the OpenJDK Hotspot Garbage Collection mailing list
> archive
> >> at Nabble.com.
> >>
> >
> >
>
> --
> View this message in context: http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872900.html
> Sent from the OpenJDK Hotspot Garbage Collection mailing list archive
> at Nabble.com.
>

Re: CMS collection keep working during holiday

by Ken--@newsgroupstats.hk :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I have started same applications in a separated old 32bits windows with your additional suggested options

VM arguments: 
-Djava.security.policy=.\java.policy -Xms1536M -Xmx1536M -XX:NewSize=256M -XX:+PrintGCDetails -XX:+PrintGCTaskTimeStamps -XX:+PrintHeapAtGC -Xloggc:hkexquoteserver_gc.log -XX:+PrintReferenceGC -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=70 -XX:TargetSurvivorRatio=90 -Dcom.sun.management.jmxremote.port=18002 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

gc_20081008.zip

I will upload the gc log again once the problem is reproduced.

Thanks to all.

Regards,
Ken

Re: CMS collection keep working during holiday

by kirk-17 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Srinivas, Ken,

Nice analysis. Maybe I've missed it but I've not seen anything that
rules out a simple object leak. Until the leak has been ruled out would
you prefer to keep objects out of tenured rather than quickly promoting
them to a space that is already under enormous pressure? Do you see
extra copying as even being relevant to this problem?

Regards,
Kirk

Y Srinivas Ramakrishna wrote:

> Hi Ken --
>
> You do see of course that in the gc.zip that you made available,
> there were 4 promotion failures, followed by compacting collections,
> as listed below in chronological order:-
>
> 163116.821: [GC 163116.821: [ParNew: 873856K->174720K(873856K), 0.4830286 secs] 1508484K->961405K(4019584K), 0.4831137 secs] [Times: user=1.59 sys=0.01, real=0.
> 49 secs]
> ...
> 165310.759: [GC 165310.759: [ParNew (promotion failed): 873856K->873856K(873856K), 0.8919148 secs]165311.651:[CMS: 1911965K->785641K(3145728K), 2.9784355 secs] 2733287K->785641K(4019584K), 3.8704903 secs] [Times: user=4.80 sys=0.16, real=3.88 secs]
> ...
> 231411.554: [GC 231411.554: [ParNew: 873856K->174720K(873856K), 0.5915618 secs] 1676990K->1122182K(4019584K), 0.5916525 secs] [Times: user=1.45 sys=0.00, real=0
> .59 secs]
> ...
> 233946.300: [GC 233946.300: [ParNew (promotion failed): 873856K->873856K(873856K), 0.9428034 secs]233947.243: [CMS: 2172134K->570964K(3145728K), 2.7655578 secs] 2969591K->570964K(4019584K), 3.7084973 secs] [Times: user=5.17 sys=0.09, real=3.70 secs]
> ...
> 247864.029: [GC 247864.029: [ParNew (promotion failed): 873856K->873855K(873856K), 0.8641325 secs]247864.893: [CMS: 1845477K->499108K(3145728K), 2.2010342 secs] 2628538K->499108K(4019584K), 3.0652834 secs] [Times: user=4.30 sys=0.06, real=3.06 secs]
> ...
> 248187.856: [GC 248187.856: [ParNew: 699136K->174719K(873856K), 0.3926926 secs] 1050016K->610565K(4019584K), 0.3927797 secs] [Times: user=1.45 sys=0.03, real=0.
> 39 secs]
> ...
> 265611.034: [GC 265611.034: [ParNew (promotion failed): 873856K->873856K(873856K), 0.6459451 secs]265611.680: [CMS: 1854348K->502128K(3145728K), 2.2556597 secs] 2714667K->502128K(4019584K), 2.9017069 secs] [Times: user=4.19 sys=0.08, real=2.89 secs]
> ...
> 502712.808: [GC 502712.808: [ParNew: 873856K->174720K(873856K), 1.2972687 secs] 3806995K->3167193K(4019584K), 1.2973536 secs] [Times: user=1.75 sys=0.00, real=1
> .30 secs]
> ...
> 504379.249: [GC 504379.249: [ParNew: 873856K->873856K(873856K), 0.0000241 secs]504379.249: [CMS504379.358: [CMS-concurrent-abortable-preclean: 0.209/2.471 secs] [Times: user=1.28 sys=0.00, real=2.48 secs] (concurrent mode failure): 2938816K->424257K(3145728K), 5.2334306 secs] 3812672K->424257K(4019584K), 5.2335621 secs] [Times:
>  user=5.11 sys=0.02, real=5.23 secs]
>
> In other words, CMS often does not manage to clear the allocated
> memory. But the CMS collection preceding the promotion failure
> indicates heap usage only a little more than what you see after
> the mark-compact collection. There is almost certainly something
> fishy going on here. It may be related to Reference objects and their
> referents being split across generations, but it's not completely clear.
>
> [Perhaps you can check if your application makes use of Soft/Weak/References
> in some funny way, so as to perhaps cause the Reference and referent to be
> split between the old and new generations?]
>
> Now one thing we know about the difference between CMS and a full mark-compact
> collection is that CMS treats the young gen as a source of roots for the
> CMS and perm gen collection, whereas this is not so for the latter.
> In pathological cases, a dead cycle spanning the young gen and the
> old gen would remain uncollected by CMS but be collected by the mark-compact
> collection. Now, I don't think that would be happening in your case, but I
> do see that the frequency of ParNew collections in your case is fairly low --
> about 1 every 6 minutes or so. With a tenuring threshold of 11, it's possible
> for a dead cycle to live for a full hour before it gets promoted into the
> old generation whence it would be eligible for collection.
>
> In some cases, the CMS initiating occupancy fraction has not been hit
> (and a CMS collection not even attempte) before a promotion failure
> occurs. This would appear to call for more frequent CMS collections
> following ParNew collections, even something drastic like:-
>
>     CMSInitiatingOccupancyFraction=25
>
> This would of course cause your back to back CMS collections that
> would peg your CPU, in order to avoid that, you might disable the
> CMS "timeout" via:-
>
>      CMSWaitDuration=1200000
>
> i.e. make it 20 minutes (=120000 ms).
>
> In addition, I find that your scavenges are unnecessarily copying
> survivors back-and-forth 11 times or so, I'd just scrunch that down
> to:-
>
>       MaxTenuringThreshold=1 (or at most 2)
>
> This may or may not address the problem you are seeing here but
> it's at least a start towards eliminating at least some of
> the suspects and of course of eliminating the "busy work"
> that CMS is doing during the holidays.
>
> -- ramki
>
> ----- Original Message -----
> From: "Ken--@..." <dragonken@...>
> Date: Tuesday, October 7, 2008 11:14 pm
> Subject: Re: CMS collection keep working during holiday
> To: hotspot-gc-dev@...
>
>
>  
>> Thanks to your quick response. I will do it now but it require 1 - 2
>> days to
>> reproduce the result.
>>
>> It seems that my case is very special.
>>
>> What the application is simply a stock quote client server and store daily
>> stock quote and chart data in memory. The lived objects need around 80M
>> memory at day open and 500M memory at day end.
>>
>>
>>
>>
>> y.s.ramakrishna wrote:
>>    
>>> Could you run with the additional option +PrintReferenceGC and see if
>>> it sheds additional light on the problem? (Please share the resulting
>>> GC logs.)
>>>
>>> If at all possible, you might try rev'ing down to 6.0 and seeing
>>> if the same behaviour reproduces or not?
>>>
>>> I understand that some or all of this may not be possible in a
>>> production setting.
>>>
>>> -- ramki
>>>
>>> ----- Original Message -----
>>> From: "Ken--@..." <dragonken@...>
>>> Date: Tuesday, October 7, 2008 10:17 pm
>>> Subject: Re: CMS collection keep working during holiday
>>> To: hotspot-gc-dev@...
>>>
>>>
>>>      
>>>> Follow up to the jconsole above. This gif is captured today. You
>>>>        
>> can
>>    
>>>> see the
>>>> first CMS after a FULL GC is working. The 2nd one is working too
>>>>        
>> but the
>>    
>>>> memory free from 2nd generation is much less than the first CMS. The
>>>> memory
>>>> free by 3rd CMS is further decreased.
>>>>
>>>> The 5th CMS cannot free enough memory to make occanpcy in old-gen
>>>>        
>> to less
>>    
>>>> than 70%. So, continue non-stopping CMS happen again until a Full GC
>>>> (triggered by Concurrent Mode Failure).
>>>>
>>>> http://www.nabble.com/file/p19872463/jconsole_20081008.jpg 
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872463.html
>>>> Sent from the OpenJDK Hotspot Garbage Collection mailing list
>>>>        
>> archive
>>    
>>>> at Nabble.com.
>>>>
>>>>        
>>>      
>> --
>> View this message in context: http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872900.html
>> Sent from the OpenJDK Hotspot Garbage Collection mailing list archive
>> at Nabble.com.
>>
>>    
>
>  


Re: CMS collection keep working during holiday

by y.s.ramakrishna :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Hi Kirk --

> Nice analysis. Maybe I've missed it but I've not seen anything that
> rules out a simple object leak. Until the leak has been ruled out
> would
> you prefer to keep objects out of tenured rather than quickly
> promoting
> them to a space that is already under enormous pressure? Do you see
> extra copying as even being relevant to this problem?

The extra copying is relevant only in as much as avoiding it will
reduce the scavenge pauses to approximately 1/5 (or lower) of
their current value. However, you are right that this will not
change by much the promotion _rate_ into the tenured gen.
(it will increase by a tad, but not by much.) Thus, tenuring
after aging for a single scavenge appears to be a win-win.

Now the reason I suggested the quick tenuring was because
i suspected that the difference between CMS and mark-compact
collections is showing up because the new gen is treated
as a source of roots into the old gen and perhaps the quicker
tenuring will reduce the # "false roots" if you will
(see my reference to Reference objects and to cycles
spanning across the generations; the idea is to avoid
carrying around such potential garbage).

You are right though that perhaps it's a simple object leak.
What makes it strange, and makes it somewhat unlikely to be a simple
memory leak is how a mark-compact collection reclaims the
space whereas a concurrent collection does not (even accounting
for some small amount of floating garbage because of concurrent
marking -- of which there should be little in the present case
of long inter-scavenge durations and frequent back-to-back CMS
collections).

As regards pressure on the tenured gen: the logs actually indicate
that the promotion rate into the old gen is not that great (note
that each scavenge promotes only about 10-11 MB or so of data
into the old gen and scavenges happen only about once every 6 minutes,
making for a promotion rate of about 30 KB/s which is fairly low.
As you will have noted the CMS cycles are essentially "busy work"
cycles (in that the work is unnecessary and does not accomplish
anything; thus must not be mistaken for the need for a lot of
work). It is true, however, that there is some "pressure"
in the sense that the occupancy, as described by Ken, keeps
increasing over the course of the day ...

-- ramki

>
> Regards,
> Kirk
>
> Y Srinivas Ramakrishna wrote:
> > Hi Ken --
> >
> > You do see of course that in the gc.zip that you made available,
> > there were 4 promotion failures, followed by compacting collections,
> > as listed below in chronological order:-
> >
> > 163116.821: [GC 163116.821: [ParNew: 873856K->174720K(873856K),
> 0.4830286 secs] 1508484K->961405K(4019584K), 0.4831137 secs] [Times:
> user=1.59 sys=0.01, real=0.
> > 49 secs]
> > ...
> > 165310.759: [GC 165310.759: [ParNew (promotion failed):
> 873856K->873856K(873856K), 0.8919148 secs]165311.651:[CMS:
> 1911965K->785641K(3145728K), 2.9784355 secs]
> 2733287K->785641K(4019584K), 3.8704903 secs] [Times: user=4.80
> sys=0.16, real=3.88 secs]
> > ...
> > 231411.554: [GC 231411.554: [ParNew: 873856K->174720K(873856K),
> 0.5915618 secs] 1676990K->1122182K(4019584K), 0.5916525 secs] [Times:
> user=1.45 sys=0.00, real=0
> > .59 secs]
> > ...
> > 233946.300: [GC 233946.300: [ParNew (promotion failed):
> 873856K->873856K(873856K), 0.9428034 secs]233947.243: [CMS:
> 2172134K->570964K(3145728K), 2.7655578 secs]
> 2969591K->570964K(4019584K), 3.7084973 secs] [Times: user=5.17
> sys=0.09, real=3.70 secs]
> > ...
> > 247864.029: [GC 247864.029: [ParNew (promotion failed):
> 873856K->873855K(873856K), 0.8641325 secs]247864.893: [CMS:
> 1845477K->499108K(3145728K), 2.2010342 secs]
> 2628538K->499108K(4019584K), 3.0652834 secs] [Times: user=4.30
> sys=0.06, real=3.06 secs]
> > ...
> > 248187.856: [GC 248187.856: [ParNew: 699136K->174719K(873856K),
> 0.3926926 secs] 1050016K->610565K(4019584K), 0.3927797 secs] [Times:
> user=1.45 sys=0.03, real=0.
> > 39 secs]
> > ...
> > 265611.034: [GC 265611.034: [ParNew (promotion failed):
> 873856K->873856K(873856K), 0.6459451 secs]265611.680: [CMS:
> 1854348K->502128K(3145728K), 2.2556597 secs]
> 2714667K->502128K(4019584K), 2.9017069 secs] [Times: user=4.19
> sys=0.08, real=2.89 secs]
> > ...
> > 502712.808: [GC 502712.808: [ParNew: 873856K->174720K(873856K),
> 1.2972687 secs] 3806995K->3167193K(4019584K), 1.2973536 secs] [Times:
> user=1.75 sys=0.00, real=1
> > .30 secs]
> > ...
> > 504379.249: [GC 504379.249: [ParNew: 873856K->873856K(873856K),
> 0.0000241 secs]504379.249: [CMS504379.358:
> [CMS-concurrent-abortable-preclean: 0.209/2.471 secs] [Times:
> user=1.28 sys=0.00, real=2.48 secs] (concurrent mode failure):
> 2938816K->424257K(3145728K), 5.2334306 secs]
> 3812672K->424257K(4019584K), 5.2335621 secs] [Times:
> >  user=5.11 sys=0.02, real=5.23 secs]
> >
> > In other words, CMS often does not manage to clear the allocated
> > memory. But the CMS collection preceding the promotion failure
> > indicates heap usage only a little more than what you see after
> > the mark-compact collection. There is almost certainly something
> > fishy going on here. It may be related to Reference objects and their
> > referents being split across generations, but it's not completely clear.
> >
> > [Perhaps you can check if your application makes use of Soft/Weak/References
> > in some funny way, so as to perhaps cause the Reference and referent
> to be
> > split between the old and new generations?]
> >
> > Now one thing we know about the difference between CMS and a full mark-compact
> > collection is that CMS treats the young gen as a source of roots for
> the
> > CMS and perm gen collection, whereas this is not so for the latter.
> > In pathological cases, a dead cycle spanning the young gen and the
> > old gen would remain uncollected by CMS but be collected by the mark-compact
> > collection. Now, I don't think that would be happening in your case,
> but I
> > do see that the frequency of ParNew collections in your case is
> fairly low --
> > about 1 every 6 minutes or so. With a tenuring threshold of 11, it's
> possible
> > for a dead cycle to live for a full hour before it gets promoted
> into the
> > old generation whence it would be eligible for collection.
> >
> > In some cases, the CMS initiating occupancy fraction has not been hit
> > (and a CMS collection not even attempte) before a promotion failure
> > occurs. This would appear to call for more frequent CMS collections
> > following ParNew collections, even something drastic like:-
> >
> >     CMSInitiatingOccupancyFraction=25
> >
> > This would of course cause your back to back CMS collections that
> > would peg your CPU, in order to avoid that, you might disable the
> > CMS "timeout" via:-
> >
> >      CMSWaitDuration=1200000
> >
> > i.e. make it 20 minutes (=120000 ms).
> >
> > In addition, I find that your scavenges are unnecessarily copying
> > survivors back-and-forth 11 times or so, I'd just scrunch that down
> > to:-
> >
> >       MaxTenuringThreshold=1 (or at most 2)
> >
> > This may or may not address the problem you are seeing here but
> > it's at least a start towards eliminating at least some of
> > the suspects and of course of eliminating the "busy work"
> > that CMS is doing during the holidays.
> >
> > -- ramki
> >
> > ----- Original Message -----
> > From: "Ken--@..." <dragonken@...>
> > Date: Tuesday, October 7, 2008 11:14 pm
> > Subject: Re: CMS collection keep working during holiday
> > To: hotspot-gc-dev@...
> >
> >
> >  
> >> Thanks to your quick response. I will do it now but it require 1 -
> 2
> >> days to
> >> reproduce the result.
> >>
> >> It seems that my case is very special.
> >>
> >> What the application is simply a stock quote client server and
> store daily
> >> stock quote and chart data in memory. The lived objects need around
> 80M
> >> memory at day open and 500M memory at day end.
> >>
> >>
> >>
> >>
> >> y.s.ramakrishna wrote:
> >>    
> >>> Could you run with the additional option +PrintReferenceGC and see
> if
> >>> it sheds additional light on the problem? (Please share the resulting
> >>> GC logs.)
> >>>
> >>> If at all possible, you might try rev'ing down to 6.0 and seeing
> >>> if the same behaviour reproduces or not?
> >>>
> >>> I understand that some or all of this may not be possible in a
> >>> production setting.
> >>>
> >>> -- ramki
> >>>
> >>> ----- Original Message -----
> >>> From: "Ken--@..." <dragonken@...>
> >>> Date: Tuesday, October 7, 2008 10:17 pm
> >>> Subject: Re: CMS collection keep working during holiday
> >>> To: hotspot-gc-dev@...
> >>>
> >>>
> >>>      
> >>>> Follow up to the jconsole above. This gif is captured today. You
>
> >>>>        
> >> can
> >>    
> >>>> see the
> >>>> first CMS after a FULL GC is working. The 2nd one is working too
>
> >>>>        
> >> but the
> >>    
> >>>> memory free from 2nd generation is much less than the first CMS.
> The
> >>>> memory
> >>>> free by 3rd CMS is further decreased.
> >>>>
> >>>> The 5th CMS cannot free enough memory to make occanpcy in old-gen
>
> >>>>        
> >> to less
> >>    
> >>>> than 70%. So, continue non-stopping CMS happen again until a Full
> GC
> >>>> (triggered by Concurrent Mode Failure).
> >>>>
> >>>> http://www.nabble.com/file/p19872463/jconsole_20081008.jpg 
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> View this message in context:
> >>>> http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872463.html
> >>>> Sent from the OpenJDK Hotspot Garbage Collection mailing list
> >>>>        
> >> archive
> >>    
> >>>> at Nabble.com.
> >>>>
> >>>>        
> >>>      
> >> --
> >> View this message in context: http://www.nabble.com/CMS-collection-keep-working-during-holiday-tp19773575p19872900.html
> >> Sent from the OpenJDK Hotspot Garbage Collection mailing list
> archive
> >> at Nabble.com.
> >>
> >>    
> >
> >  
>
< Prev | 1 - 2 - 3 | Next >