|
View:
New views
11 Messages
—
Rating Filter:
Alert me
|
|
|
|
|
|
|
|
|
|
|
|
Re: rrdcached performance problemThorsten von Eicken wrote:
>> 37.1 % of the time it spent in “handle_request_update” the daemon is >> actually waiting for “realloc”. This is (to me) very unexpected and a >> schoolbook example of “measure before you optimize”. >> >> I think we can get rid of this bottleneck by writing a specialized >> version of “rrd_add_strdup” which reallocates powers of ten. Something >> like: >> >> [...] >> >> It'd be great if you could give the attached patch a try > spends all of its time (more than 99 %) in “realloc”. So in consequence > Test is running, including Kevin's simplification... Thanks for the > help!!! thing that still makes me a bit uncomfortable is that at the end of the second hour of run-time there was a cpu spike which caused collectd to grow rapidly. (Still using -w 3600 -z 3600 -f 7200, I put a load of ~50k tree nodes right from the start.) You can see the graphs at http://www.voneicken.com/dl/rrd/ look for the rrdcached-6* series. It flattens out nicely after the spike, but it's one of those things that tend to bite sooner or later. I'm not sure what to do about it. Thorsten _______________________________________________ rrd-developers mailing list rrd-developers@... https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers |
|
|
Re: rrdcached performance problemOn Fri, Oct 30, 2009 at 09:19:30AM -0700, Thorsten von Eicken wrote:
> Thorsten von Eicken wrote: > >> 37.1 % of the time it spent in ?handle_request_update? the daemon is > >> actually waiting for ?realloc?. This is (to me) very unexpected and a > >> schoolbook example of ?measure before you optimize?. > >> > >> I think we can get rid of this bottleneck by writing a specialized > >> version of ?rrd_add_strdup? which reallocates powers of ten. Something > >> like: > >> > >> [...] > >> > >> It'd be great if you could give the attached patch a try > > spends all of its time (more than 99 %) in ?realloc?. So in consequence > > Test is running, including Kevin's simplification... Thanks for the > > help!!! > Things are again looking much better, almost great I should say! The one > thing that still makes me a bit uncomfortable is that at the end of the > second hour of run-time there was a cpu spike which caused collectd to > grow rapidly. (Still using -w 3600 -z 3600 -f 7200, I put a load of ~50k > tree nodes right from the start.) You can see the graphs at > http://www.voneicken.com/dl/rrd/ look for the rrdcached-6* series. It > flattens out nicely after the spike, but it's one of those things that > tend to bite sooner or later. I'm not sure what to do about it. Thorsten, What's the start time on that daemon? If it was started slightly before the graphs (~21:30-21:35), then it's possible that the CPU increase is associated with the first flush (-f). Something is queueing a lot of files; I would only expect that to result from the flush process if writing had stopped for a set of your RRDs. For example, if you have a large number of RRD files in the journal that aren't being re-written afterwards, then the first flush would contain all those. Did you use journal on startup for this test? If you could profile during that "hump" it would be very instructive... -- kevin brintnall =~ /kbrint@.../ _______________________________________________ rrd-developers mailing list rrd-developers@... https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers |
|
|
Re: rrdcached performance problemkevin brintnall wrote:
> On Fri, Oct 30, 2009 at 09:19:30AM -0700, Thorsten von Eicken wrote: > >> Thorsten von Eicken wrote: >> >>>> 37.1 % of the time it spent in ?handle_request_update? the daemon is >>>> actually waiting for ?realloc?. This is (to me) very unexpected and a >>>> schoolbook example of ?measure before you optimize?. >>>> >>>> I think we can get rid of this bottleneck by writing a specialized >>>> version of ?rrd_add_strdup? which reallocates powers of ten. Something >>>> like: >>>> >>>> [...] >>>> >>>> It'd be great if you could give the attached patch a try >>>> >>> spends all of its time (more than 99 %) in ?realloc?. So in consequence >>> Test is running, including Kevin's simplification... Thanks for the >>> help!!! >>> >> Things are again looking much better, almost great I should say! The one >> thing that still makes me a bit uncomfortable is that at the end of the >> second hour of run-time there was a cpu spike which caused collectd to >> grow rapidly. (Still using -w 3600 -z 3600 -f 7200, I put a load of ~50k >> tree nodes right from the start.) You can see the graphs at >> http://www.voneicken.com/dl/rrd/ look for the rrdcached-6* series. It >> flattens out nicely after the spike, but it's one of those things that >> tend to bite sooner or later. I'm not sure what to do about it. >> > > Thorsten, > > What's the start time on that daemon? If it was started slightly before > the graphs (~21:30-21:35), then it's possible that the CPU increase is > associated with the first flush (-f). > "write-data_sets" graph, you can see very clearly how the flushing starts at 22:38-22:39 and the first hour of flushing ends 23:46-23:47. The cpu spike starts to build at 23:26-23:28 so it's not on a clean boundary at all.If you look at the if-packets graph you can see how inbound traffic is 100% stable throughout the whole run. > Something is queueing a lot of files; I would only expect that to result > from the flush process if writing had stopped for a set of your RRDs. For > example, if you have a large number of RRD files in the journal that > aren't being re-written afterwards, then the first flush would contain all > those. > > Did you use journal on startup for this test? > I cleared the journals before starting rrdcached. > If you could profile during that "hump" it would be very instructive... > Yeah, no kidding. The question is how to reproduce this given the callgrind slow-down. I may try later next week when I hope to have some more time. Thorsten _______________________________________________ rrd-developers mailing list rrd-developers@... https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers |
|
|
Re: rrdcached performance problemOn Fri, Oct 30, 2009 at 09:47:43AM -0700, Thorsten von Eicken wrote:
> > What's the start time on that daemon? If it was started slightly before > > the graphs (~21:30-21:35), then it's possible that the CPU increase is > > associated with the first flush (-f). > > > It was right at the start of the graphs. If you look at the > "write-data_sets" graph, you can see very clearly how the flushing > starts at 22:38-22:39 and the first hour of flushing ends 23:46-23:47. > The cpu spike starts to build at 23:26-23:28 so it's not on a clean > boundary at all.If you look at the if-packets graph you can see how > inbound traffic is 100% stable throughout the whole run. OK.. That's what I thought :| > > Something is queueing a lot of files; I would only expect that to > > result from the flush process if writing had stopped for a set of your > > RRDs. For example, if you have a large number of RRD files in the > > journal that aren't being re-written afterwards, then the first flush > > would contain all those. When the queue length jumps up, try issuing the "QUEUE" command to the daemon. Maybe the list of queued files (or their order) will be instructive. -- kevin brintnall =~ /kbrint@.../ _______________________________________________ rrd-developers mailing list rrd-developers@... https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers |
|
|
Re: rrdcached performance problemQuick follow-up. I decided to add another 3k updates per second (extra
30k tree nodes) to my test run. See results in http://www.voneicken.com/dl/rrd/rrdcached-7.png What's interesting is that the server got somewhat overloaded sitting a lot in I/O wait. By and large the flush queue length remained under control, except when doing backups (10pm, 8:30am). Memory usage by rrdcached and collectd remained under control, but there is a long term upward-trending slope to rrdcached's memory usage which is not good. Possibly related to the power-of-two allocator patch that Florian provided. The graph I find the most interesting one is the disk sdk disk ops (3rd from the end). Before adding the last chunk of traffic the disk load was write-dominated, which means that rrds were mostly cached in memory (5-6 GB left after the processes). After adding the extra load the disk load became read-dominated indicating that the rrd working set exceeded memory. Thorsten Thorsten von Eicken wrote: > Thorsten von Eicken wrote: > >>> 37.1 % of the time it spent in “handle_request_update” the daemon is >>> actually waiting for “realloc”. This is (to me) very unexpected and a >>> schoolbook example of “measure before you optimize”. >>> >>> I think we can get rid of this bottleneck by writing a specialized >>> version of “rrd_add_strdup” which reallocates powers of ten. Something >>> like: >>> >>> [...] >>> >>> It'd be great if you could give the attached patch a try >>> >> spends all of its time (more than 99 %) in “realloc”. So in consequence >> Test is running, including Kevin's simplification... Thanks for the >> help!!! >> > Things are again looking much better, almost great I should say! The one > thing that still makes me a bit uncomfortable is that at the end of the > second hour of run-time there was a cpu spike which caused collectd to > grow rapidly. (Still using -w 3600 -z 3600 -f 7200, I put a load of ~50k > tree nodes right from the start.) You can see the graphs at > http://www.voneicken.com/dl/rrd/ look for the rrdcached-6* series. It > flattens out nicely after the spike, but it's one of those things that > tend to bite sooner or later. I'm not sure what to do about it. > Thorsten > > _______________________________________________ > rrd-developers mailing list > rrd-developers@... > https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers > > _______________________________________________ rrd-developers mailing list rrd-developers@... https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers |
|
|
Re: rrdcached performance problemOn Sat, Oct 31, 2009 at 09:52:15AM -0700, Thorsten von Eicken wrote:
> Quick follow-up. I decided to add another 3k updates per second (extra > 30k tree nodes) to my test run. See results in > http://www.voneicken.com/dl/rrd/rrdcached-7.png > What's interesting is that the server got somewhat overloaded sitting a > lot in I/O wait. By and large the flush queue length remained under > control, except when doing backups (10pm, 8:30am). Memory usage by > rrdcached and collectd remained under control, but there is a long term > upward-trending slope to rrdcached's memory usage which is not good. > Possibly related to the power-of-two allocator patch that Florian > provided. The graph I find the most interesting one is the disk sdk disk > ops (3rd from the end). Before adding the last chunk of traffic the disk > load was write-dominated, which means that rrds were mostly cached in > memory (5-6 GB left after the processes). After adding the extra load > the disk load became read-dominated indicating that the rrd working set > exceeded memory. Thorsten, If you're becoming read dominated, you should consider lowering your file update/sec rate by increasing your -w/-f timers. This just trades one kind of cache memory (f/s blocks) for another (update strings). I'm sending a linear chunk allocator along for allocating cache_item_t.values in operator-defined block sizes.. I'd appreciate if you'd test it with your load to see if it reduces your CPU usage related to frequent realloc(). -- kevin brintnall =~ /kbrint@.../ > Thorsten > > > Thorsten von Eicken wrote: > > Thorsten von Eicken wrote: > > > >>> 37.1 % of the time it spent in ?handle_request_update? the daemon is > >>> actually waiting for ?realloc?. This is (to me) very unexpected and a > >>> schoolbook example of ?measure before you optimize?. > >>> > >>> I think we can get rid of this bottleneck by writing a specialized > >>> version of ?rrd_add_strdup? which reallocates powers of ten. Something > >>> like: > >>> > >>> [...] > >>> > >>> It'd be great if you could give the attached patch a try > >>> > >> spends all of its time (more than 99 %) in ?realloc?. So in consequence > >> Test is running, including Kevin's simplification... Thanks for the > >> help!!! > >> > > Things are again looking much better, almost great I should say! The one > > thing that still makes me a bit uncomfortable is that at the end of the > > second hour of run-time there was a cpu spike which caused collectd to > > grow rapidly. (Still using -w 3600 -z 3600 -f 7200, I put a load of ~50k > > tree nodes right from the start.) You can see the graphs at > > http://www.voneicken.com/dl/rrd/ look for the rrdcached-6* series. It > > flattens out nicely after the spike, but it's one of those things that > > tend to bite sooner or later. I'm not sure what to do about it. > > Thorsten > > > > _______________________________________________ > > rrd-developers mailing list > > rrd-developers@... > > https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers > > > > > > _______________________________________________ > rrd-developers mailing list > rrd-developers@... > https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers _______________________________________________ rrd-developers mailing list rrd-developers@... https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers |
|
|
Re: rrdcached performance problemkevin brintnall wrote:
> On Sat, Oct 31, 2009 at 09:52:15AM -0700, Thorsten von Eicken wrote: > >> Quick follow-up. I decided to add another 3k updates per second (extra >> 30k tree nodes) to my test run. See results in >> http://www.voneicken.com/dl/rrd/rrdcached-7.png >> What's interesting is that the server got somewhat overloaded sitting a >> lot in I/O wait. By and large the flush queue length remained under >> control, except when doing backups (10pm, 8:30am). Memory usage by >> rrdcached and collectd remained under control, but there is a long term >> upward-trending slope to rrdcached's memory usage which is not good. >> Possibly related to the power-of-two allocator patch that Florian >> provided. The graph I find the most interesting one is the disk sdk disk >> ops (3rd from the end). Before adding the last chunk of traffic the disk >> load was write-dominated, which means that rrds were mostly cached in >> memory (5-6 GB left after the processes). After adding the extra load >> the disk load became read-dominated indicating that the rrd working set >> exceeded memory. >> > > Thorsten, > > If you're becoming read dominated, you should consider lowering your file > update/sec rate by increasing your -w/-f timers. This just trades one > kind of cache memory (f/s blocks) for another (update strings). > getting a bit uncomfortable. I haven't re-tested the journal reading yet, but at some point restarting rrdcached becomes really difficult. But regardless, I suspect there are 3 performance regimes: - working set much smaller than memory - cpu scales linearly/smoothly with update rate - working set much larger than memory - it's all about random small disk I/O throughput - working set similar to memory - relatively sharp (?) transition from cpu/memory speed to disk speed It's pretty clear that the performance when operating in memory is very nice and smooth, but when memory is exceeded things become bumpy. It degrades nicely and recovers well: I saw 9k queue items at one point and that got worked off over close to an hour but did work itself out as opposed to spinning out of control. But it doesn't look like a safe operating regime. So basically gotta keep the rrd working set in memory. > I'm sending a linear chunk allocator along for allocating > cache_item_t.values in operator-defined block sizes.. I'd appreciate if > you'd test it with your load to see if it reduces your CPU usage related > to frequent realloc(). > > What value of -m do you suggest? -w divided by step size? Thanks, Thorsten _______________________________________________ rrd-developers mailing list rrd-developers@... https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers |
|
|
Re: rrdcached performance problem> It's pretty clear that the performance when operating in memory is very
> nice and smooth, but when memory is exceeded things become bumpy. It > degrades nicely and recovers well: I saw 9k queue items at one point and > that got worked off over close to an hour but did work itself out as > opposed to spinning out of control. 9k/hour seems awfully slow. Even my old development box can flush 55k/hour. Maybe more entries are being enqueued at the same time... > What value of -m do you suggest? -w divided by step size? That's the maximum; any more than that is just waste. IIRC you had around ~400 cached values per RRD file. So, if you use '-m 400' you'll only have call realloc() one time. However, the 401st value will cause a large realloc()... The ideal value is highly dependent on your malloc() implementation.. I would start with a power of two large enough to get you relatively few realloc() but small enough that you don't waste too much... In your case, maybe '-m 64'. -- kevin brintnall =~ /kbrint@.../ _______________________________________________ rrd-developers mailing list rrd-developers@... https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers |
| Free embeddable forum powered by Nabble | Forum Help |