|
View:
New views
20 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 | Next > |
|
|
Re: [xpressive] Performance Tuning?>> Do you think the overhead of calling through boost::bind could be
>> comparable to the length of time it takes to run the function? > > I don't know -- haven't looked -- where boost::bind was added or > why, but any unnecessary overhead should be eliminated unless it is > proven to be insignificant. It seems that it's not a problem. [....] >> I suggest something that simply iterates over the test data but >> does not check for correctness of parsing. Although it won't >> make a fat lot of difference in this case at least it's then >> consistent - you're timing the parsers not the tests for >> equality. The correctness test could then be done later once >> the timings are complete. > > I should have done that from the beginning. It would be best, I > think, to run through the data once for each test, to verify the > result. Then, timing runs can be done without any checks. Both > should run from main() each time so that any optimizations are > validated before considering the performance. Sure. >> Does the size of the test data set matter? In other words do you >> notice similar speedups if the test data will all fit in cache? > > Wouldn't that give less representative performance results? I guess it depends what you're trying to represent when measuring the performance. If you're trying to represent the [totally contrived] case of parsing short segments of text that all fit in cache surely it's better? Or is your question more regarding the distribution of the actual values under test rather than the length? My suggestion is merely the curiosity of seeing if there is a cache dependent effect or not rather than a 'representative' measurement. I don't for a minute think it's a good idea to permanently modify someone's hard thought through and exhaustive testing. As a tongue-in-cheek aside - there's potentially some merit in having a parser that's particularly fast for numbers starting with "123". http://en.wikipedia.org/wiki/Benford%27s_law Regards, -ed ------------------------------------------------ "No more boom and bust." -- Dr. J. G. Brown, 1997 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?Edward Grace wrote:
> > >> Does the size of the test data set matter? In other words do you > >> notice similar speedups if the test data will all fit in cache? > > > > Wouldn't that give less representative performance results? > > I guess it depends what you're trying to represent when > measuring the performance. If you're trying to represent the > [totally contrived] case of parsing short segments of text that > all fit in cache surely it's better? I was presuming to suggest that the more representative test was that of parsing across the spectrum of possible inputs such code likely will parse in a real finance program rather than contriving small tests to focus on isolated characteristics. > Or is your question more regarding the distribution of the > actual values under test rather than the length? I don't think the lengths will vary much beyond what appears in the test input other than, possibly, the lack of really short inputs. > My suggestion is merely the curiosity of seeing if there is a > cache dependent effect or not rather than a 'representative' > measurement. I don't for a minute think it's a good idea to > permanently modify someone's hard thought through and > exhaustive testing. I was afraid that the test code might be permanently altered and lose sight of what I think is a fairly representative set of test inputs. _____ Rob Stewart robert.stewart@... Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On Tue, Jul 28, 2009 at 7:35 AM, Edward Grace<ej.grace@...> wrote:
> Perhaps you could try a more canonical test - running > "ejg_uint_parser_0_0_4.cpp" > > http://tinyurl.com/lro5ok > > That does not make use of boost::bind - but tries to avoid the optimizer > getting rid of void(void) functions by using globals. I have no clue why I am still awake, brain is already shut down, code is a blurry blob to me right now, but I still compiled and ran your latest file. :) The results: initializing input strings... Calibrating overhead......done Timer overhead (t_c) ~= : 12 Jitter ~= : 8.43769e-015 qi_parse vs atoi : 1381.05 1385.57 1389.6% faster. qi_parse vs strtol : 1365.82 1389.6 1597.94% faster. strtol vs atoi : 1.03202 1.04052 10.6428% faster. qi_parse vs qi_parse : 0 0 0% faster. Checking that the results are correct... atoi is behaving itself! strtol is behaving itself! qi is behaving itself! All done! _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?>>> Wouldn't that give less representative performance results?
>> >> I guess it depends what you're trying to represent when >> measuring the performance. If you're trying to represent the >> [totally contrived] case of parsing short segments of text that >> all fit in cache surely it's better? > > I was presuming to suggest that the more representative test was > that of parsing across the spectrum of possible inputs such code > likely will parse in a real finance program Benford's law it is then... ;-) > rather than > contriving small tests to focus on isolated characteristics. Sure. > I was afraid that the test code might be permanently altered and > lose sight of what I think is a fairly representative set of > test inputs. No - no not at all... I fully understand your concern -- I'm just being curious! -ed _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?>> That does not make use of boost::bind - but tries to avoid the
>> optimizer >> getting rid of void(void) functions by using globals. > > I have no clue why I am still awake, brain is already shut down, code > is a blurry blob to me right now, but I still compiled and ran your > latest file. :) Thanks. > The results: > > initializing input strings... > Calibrating overhead......done > Timer overhead (t_c) ~= : 12 > Jitter ~= : 8.43769e-015 > qi_parse vs atoi : 1381.05 1385.57 1389.6% faster. > qi_parse vs strtol : 1365.82 1389.6 1597.94% faster. > strtol vs atoi : 1.03202 1.04052 10.6428% faster. > qi_parse vs qi_parse : 0 0 0% faster. Hmmm.... So you observe the same as me. Spirit2 ~1000% faster than the basic libraries as opposed to 25% faster from the previous tests. "Curiouser and curiouser" said Alice. -ed ------------------------------------------------ "No more boom and bust." -- Dr. J. G. Brown, 1997 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On Tue, Jul 28, 2009 at 8:49 AM, Edward Grace<ej.grace@...> wrote:
>>> That does not make use of boost::bind - but tries to avoid the optimizer >>> getting rid of void(void) functions by using globals. >> >> I have no clue why I am still awake, brain is already shut down, code >> is a blurry blob to me right now, but I still compiled and ran your >> latest file. :) > > Thanks. > >> The results: >> >> initializing input strings... >> Calibrating overhead......done >> Timer overhead (t_c) ~= : 12 >> Jitter ~= : 8.43769e-015 >> qi_parse vs atoi : 1381.05 1385.57 1389.6% faster. >> qi_parse vs strtol : 1365.82 1389.6 1597.94% faster. >> strtol vs atoi : 1.03202 1.04052 10.6428% faster. >> qi_parse vs qi_parse : 0 0 0% faster. > > Hmmm.... So you observe the same as me. Spirit2 ~1000% faster than the > basic libraries as opposed to 25% faster from the previous tests. > > "Curiouser and curiouser" said Alice. Heh, when my brain is functional again, if I remember (or if I am reminded by this email) I will run it through a profiler and see what is going on as well as looking at the disassembly to see what kind of code it all creates. It is definitely odd though, if Spirit2.1 really is faster, then you would think the compiler library programmers would do a better job then what they are currently doing. I am really curious to find out what is going on, hopefully delving into assembly later will shed some light. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?>
> Heh, when my brain is functional again, if I remember (or if I am > reminded by this email) I will run it through a profiler and see what > is going on as well as looking at the disassembly to see what kind of > code it all creates. That would be very instructive - not something I'm really much good at anymore.... > Cheers, -ed ------------------------------------------------ "No more boom and bust." -- Dr. J. G. Brown, 1997 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On Tue, Jul 28, 2009 at 9:00 AM, Edward Grace<ej.grace@...> wrote:
>> >> Heh, when my brain is functional again, if I remember (or if I am >> reminded by this email) I will run it through a profiler and see what >> is going on as well as looking at the disassembly to see what kind of >> code it all creates. > > That would be very instructive - not something I'm really much good at > anymore.... I used to do it all the time, editing disassembly to make programs do other things was always fun. :) Been a while though. I do not think I have the cognitive capabilities to be able to read assembler at the moment, but I went ahead and profiled the application, results are attached to preserve formatting. CS:EIP Symbol + Offset 64-bit Total % Timer samples 0x409dd0 ejg::statistics::ks_test<std::_Vector_iterator<double,std::allocator<double> >,double> 12.37 11661 0x40a400 std::_Unguarded_partition<double *> 9.74 9183 0x40d330 strtoxl 9.71 9157 0x408f40 ejg::generic_timer<_LARGE_INTEGER,long>::measure_execution_result<_LARGE_INTEGER (__cdecl*)(void)> 3.75 3533 0x401a50 wrap_qi_parse 3.47 3272 0x40aa80 std::_Insertion_sort1<double *,double> 0.94 884 0x401910 wrap_atoi 0.66 625 0x40d55b strtol 0.6 569 0x4019b0 wrap_strtol 0.43 401 0x40d2ae _LocaleUpdate::_LocaleUpdate 0.39 364 0x40d823 atol 0.21 194 0x40a7f0 std::_Median<double *> 0.18 174 0x4099d0 std::_Sort<double *,int> 0.08 77 0x413db4 TrailUpVec 0.05 47 0x401900 elapsed 0.03 27 0x409d20 ejg::generic_timer<_LARGE_INTEGER,long>::time_raw_atom<void (__cdecl*)(void)> 0.03 27 0x40d834 atoi 0.03 26 0x40d839 memmove_s 0.03 25 0x413c40 memmove 0.02 19 0x4018e0 getticks 0.01 14 0x40a6d0 ejg::statistics::qks<double> 0.01 6 0x413ee4 UnwindDownVec 0.01 8 0x4203fa __libm_sse2_exp 0.01 5 0x4068d0 std::vector<double,std::allocator<double> >::_Construct_n 0 1 0x40cc1c clock 0 3 0x40cf80 ceil 0 3 0x40cfc0 _ceil_pentium4 0 2 0x411970 _aulldiv 0 2 0x4119e0 _allmul 0 1 0x413d48 UnwindUpVec 0 2 0x413f50 TrailDownVec 0 4 31 functions, 718 instructions, Total: 40316 samples, 100.00% of samples in the module, 42.76% of total session samples _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On Tue, Jul 28, 2009 at 9:05 AM, OvermindDL1<overminddl1@...> wrote:
> On Tue, Jul 28, 2009 at 9:00 AM, Edward Grace<ej.grace@...> wrote: >>> >>> Heh, when my brain is functional again, if I remember (or if I am >>> reminded by this email) I will run it through a profiler and see what >>> is going on as well as looking at the disassembly to see what kind of >>> code it all creates. >> >> That would be very instructive - not something I'm really much good at >> anymore.... > > I used to do it all the time, editing disassembly to make programs do > other things was always fun. :) > Been a while though. > > I do not think I have the cognitive capabilities to be able to read > assembler at the moment, but I went ahead and profiled the > application, results are attached to preserve formatting. > Well, atoi's assembly is a function call, no inlining there... And strtol in assembly is a function call too, no inlining or anything fancy. And heh, Spirit is completely inlined except for the call to boost::spirit::qi::detail::extract_int<int,10,1,-1,etc... However that function is only called the very first time that function is called. Remember that Spirit increments the passed in iterator, so all the start iterators in that vector ended up being incremented to be equal to the last iterator. :p I change the wrap_qi_parse function to this (introduced a temporary so the temporary is incremented instead of the thing you have stored in the vector): void wrap_qi_parse() { for (int i = 0; i < BUFFER_SIZE; ++i) { char const *iter = f[i]; qi::parse(iter, l[i], int_, v[i]); } } When I compile and run the tests now, I get this: initializing input strings... Calibrating overhead......done Timer overhead (t_c) ~= : 12 Jitter ~= : 8.43769e-015 qi_parse vs atoi : 170.429 170.438 170.482% faster. qi_parse vs strtol : 167.589 167.601 167.668% faster. strtol vs atoi : 1.04669 1.05746 1.06165% faster. qi_parse vs qi_parse : 0 0 0% faster. Checking that the results are correct... atoi is behaving itself! strtol is behaving itself! qi is behaving itself! All done! That is a *lot* more reasonable, although Spirit is still most definitely faster then the built-in functions. :) Heh, what do you know, my shut down brain still had some living brain cells. I have actually found that I, oddly, do my best debugging when really tired, at least when I was younger, guess I still do. :) _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?>>>
> And heh, Spirit is completely inlined except for the call to > boost::spirit::qi::detail::extract_int<int,10,1,-1,etc... > However that function is only called the very first time that function > is called. Remember that Spirit increments the passed in iterator, so > all the start iterators in that vector ended up being incremented to > be equal to the last iterator. :p When you write the above, do you mean ** -> my change > However that function is only called the very first time *qi::parse* > is called. Remember that Spirit increments the passed in iterator, so > In other words only the first call does anything - repeated calls to qi::parse do nothing because the iterator is already at the end? Good catch. > I change the wrap_qi_parse function to this (introduced a temporary so > the temporary is incremented instead of the thing you have stored in > the vector): > void wrap_qi_parse() { > for (int i = 0; i < BUFFER_SIZE; ++i) > { > char const *iter = f[i]; > qi::parse(iter, l[i], int_, v[i]); > } > > } > > When I compile and run the tests now, I get this: > initializing input strings... > Calibrating overhead......done > Timer overhead (t_c) ~= : 12 > Jitter ~= : 8.43769e-015 > qi_parse vs atoi : 170.429 170.438 170.482% faster. > qi_parse vs strtol : 167.589 167.601 167.668% faster. > strtol vs atoi : 1.04669 1.05746 1.06165% faster. > qi_parse vs qi_parse : 0 0 0% faster. > > > > Checking that the results are correct... > atoi is behaving itself! > strtol is behaving itself! > qi is behaving itself! Good stuff. Thanks. I might just re-jig that to try and make use of boost::bind instead of that bad roll-your-own wrapper function. What's needed is a 'fire and forget' wrapping technique that can be applied to arbitrary functions and still give sensible answers. > That is a *lot* more reasonable, although Spirit is still most > definitely faster then the built-in functions. :) That's good though - one up for Boost! > Heh, what do you know, my shut down brain still had some living brain > cells. I have actually found that I, oddly, do my best debugging when > really tired, at least when I was younger, guess I still do. :) It's amazing what the promise of imminent sleep can do to motivate a battered cerebral cortex. It's only playing along because it knows it's the only way it'll get to rest! -ed _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?> When I compile and run the tests now, I get this:
> initializing input strings... > Calibrating overhead......done > Timer overhead (t_c) ~= : 12 > Jitter ~= : 8.43769e-015 > qi_parse vs atoi : 170.429 170.438 170.482% faster. > qi_parse vs strtol : 167.589 167.601 167.668% faster. > strtol vs atoi : 1.04669 1.05746 1.06165% faster. > qi_parse vs qi_parse : 0 0 0% faster. Hi OvermindDL1, When you've woken up would you mind taking a quick squiz at the following (anyone else - please feel free) ejg_uint_parser_0_0_4_bind_1.cpp in the following part of Boost Vault, http://tinyurl.com/lro5ok It's an attempt to crystallise the boo-boo you pointed out. I've tried to do everything without ghastly global variables - it's also a salient lesson on const correctness. If that'd been observed in the first place the iterator cock-up wouldn't have happened. ================== $ ./ejg_uint_parser Enter buffer size: 10000 initializing input strings... Checking that the parsers are functioning correctly... atoi is behaving itself! strtol is behaving itself! qi is behaving itself! Proceeding to timing tests.Calibrating overhead......done Timer overhead (t_c) ~= : 117.426 Jitter ~= : 25.9133 qi_parse vs atoi : 86.0764 86.3074 86.4471% faster. qi_parse vs strtol : 71.9253 72.1881 72.5288% faster. strtol vs atoi : 8.0502 8.26097 8.47215% faster. qi_parse vs qi_parse : -0.0274542 0.0393936 0.231944% faster. All done! ==================== On my platform this is entirely consistent with the simple one-liner modification you mentioned to the previous code. Take home message - yes Spirit really *is* faster. ------------------------------------------------ "No more boom and bust." -- Dr. J. G. Brown, 1997 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?Edward Grace wrote:
>> That is a *lot* more reasonable, although Spirit is still most >> definitely faster then the built-in functions. :) > > That's good though - one up for Boost! My latest benchmarks for integers and floating points reveal a 3x speed over atoi/strtol and friend C functions. You mentioned a need to parse small numbers very quickly? Spirit does. The tests I have take that into consideration too. If you guys want to take a peek, it's in the Boost trunk in libs/benchmarks. Some numbers: /////////////////////////////////////////////////////////////////////////// atoi_test: 0.9265067422 [s] {checksum: d5b76d60} strtol_test: 1.0766213977 [s] {checksum: d5b76d60} spirit_int_test: 0.3097019879 [s] {checksum: d5b76d60} /////////////////////////////////////////////////////////////////////////// atof_test: 7.3012049917 [s] {checksum: 3b7d82b0} strtod_test: 8.0042894122 [s] {checksum: 3b7d82b0} spirit_double_test: 2.6729373333 [s] {checksum: 3b7d82b0} This time, I am using the benchmarking harness by David Abrahams, Matthias Troyer, Michael Gauckler. It's the one used in timing the overhead of Boost.Parameter. You are welcome to try it out with your timer, Edward. Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On 28 Jul 2009, at 18:17, Joel de Guzman wrote: > Edward Grace wrote: > >>> That is a *lot* more reasonable, although Spirit is still most >>> definitely faster then the built-in functions. :) >> >> That's good though - one up for Boost! > > My latest benchmarks for integers and floating points reveal > a 3x speed over atoi/strtol and friend C functions. You > mentioned a need to parse small numbers very quickly? Spirit > does. The tests I have take that into consideration too. If you > guys want to take a peek, it's in the Boost trunk in libs/benchmarks. Sure. Can you give me an exact url? The last time I went on a code hunt in SVN I found the wrong thing. > Some numbers: > > ////////////////////////////////////////////////////////////////////// > ///// > atoi_test: 0.9265067422 [s] {checksum: d5b76d60} > strtol_test: 1.0766213977 [s] {checksum: d5b76d60} > spirit_int_test: 0.3097019879 [s] {checksum: d5b76d60} > > ////////////////////////////////////////////////////////////////////// > ///// > atof_test: 7.3012049917 [s] {checksum: 3b7d82b0} > strtod_test: 8.0042894122 [s] {checksum: 3b7d82b0} > spirit_double_test: 2.6729373333 [s] {checksum: 3b7d82b0} > > This time, I am using the benchmarking harness by David Abrahams, > Matthias Troyer, Michael Gauckler. This? http://tinyurl.com/kk858o There's some interesting trickery in there by the looks of things for eliminating the optimiser nastiness - that's not something I've thought about much I'll take a look. In the comments, 42 // operation to at least update the L1 cache. *** Note: This 43 // concern is specific to the particular application at which 44 // we're targeting the test. *** that seems quite important but a little opaque out of context. One thing I take exception to is the (effective) use of the mean as a measurement of central tendency - perhaps their trickery has eliminated the heavy tail. I'll have to take a look and see how it compares to my approach. How do your relative timings compare if you repeat them while (say) watching a DVD? [*] > It's the one used in timing > the overhead of Boost.Parameter. You are welcome to try it out > with your timer, Edward. Likewise, can you be specific about where to look? Thanks for flagging this up by the way - who knows maybe I've been reinventing the wheel! Cheers, -ed [*] This may seem a perverse question. I'm interested in robust performance measurement, in other words accurately working out which function is fastest while the machine is under choppy loading -- not a sanitised testing environment - so the fastest function can be selected by the code itself. ------------------------------------------------ "No more boom and bust." -- Dr. J. G. Brown, 1997 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?on Tue Jul 28 2009, Edward Grace <ej.grace-AT-imperial.ac.uk> wrote: > This? > > http://tinyurl.com/kk858o > > There's some interesting trickery in there by the looks of things for eliminating the > optimiser nastiness - that's not something I've thought about much I'll take a look. > > In the comments, > > 42 // operation to at least update the L1 cache. *** Note: This > 43 // concern is specific to the particular application at which > 44 // we're targeting the test. *** > > that seems quite important but a little opaque out of context. It means the application we were going to use this technique on was going to update a long series of accumulators, which will certainly not fit in a few registers. We wanted to force the test to reflect that reality. > One thing I take exception to is the (effective) use of the mean as a > measurement of central tendency Why? > - perhaps their trickery has eliminated the heavy tail. Why would one assume there is a heavy tail in the first place? -- Dave Abrahams BoostPro Computing http://www.boostpro.com _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?I'm really appreciative of the testing you are all doing with parsing code
and hope that at the end we can see both how fast and how clear and maintainable the various styles of parser code become (focus on accuracy, speed then perhaps more on the clarity). Its great to see the enthusiasm and results. Looking forward to more tips once we have a fair test directly pitting spirit with expressive and other parsing methodologies. I also am encouraged by what Edward's timer might add though I'm a little wary of possible inclusion of code following FFTW's license as it may be incompatible with Boost. Edward, I do wonder about the statistical significance metrics you provide (great idea by the way). I'm wondering if your code assumes a normal timing duration distribution and if so, can it 'measure' whether the distribution of results conform to this assumption. In my experience, timing benchmarks can suffer from having outliers (usually OS-induced by not having a RTOS) that make the distribution less than 'normal'. Would it be possible to establish that the given sample set of timings correspond to a normal distribution (or perhaps discard a certain percentage of outliers if necessary). I'm no statistics person, but I have seen cases where 10,000 timing samples have been biased by 10 samples that probably relate to a program issue or a severe OS issue. e.g. normally 1 ms per iteration, 10 @ 1-10s each _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?>>
>> 42 // operation to at least update the L1 cache. *** Note: This >> 43 // concern is specific to the particular application at which >> 44 // we're targeting the test. *** >> >> that seems quite important but a little opaque out of context. > > It means the application we were going to use this technique on was > going to update a long series of accumulators, which will certainly > not > fit in a few registers. We wanted to force the test to reflect that > reality. Ah, I did some hunting and found things for the iterative calculation of the mean - presumably this is an embodiment of the original application. How various caches interact is not something I've given a great deal of thought to. In practice it can presumably make or break the real 'speed' of something. I can see that there's a great difference in timing a large bunch of iterations of the same thing one after another and the actual typical execution characteristics of the same function in a different context. This kind of thing is something I have to shrug my shoulders at - I can see it's a potential issue but don't know enough to comment. >> One thing I take exception to is the (effective) use of the mean as a >> measurement of central tendency > > Why? The mean is not a robust measure of central tendency. Robust, in this context, means that a single arbitrarily large observation can shift the estimator (the mean in this case) arbitrarily far. The median on the other hand is robust, a single arbitrarily large observation may not move the median at all. Technically the 'breakdown point' of the mean is zero. For the obligatory Wikipedia overview see: http://tinyurl.com/l4vldr >> - perhaps their trickery has eliminated the heavy tail. > > Why would one assume there is a heavy tail in the first place? Informally speaking one may expect the OS to very-very occasionally interrupt a running function that nominally takes (say) 100 cycles and run of to do some disk IO. That delay may will run into hundreds of millions of clock cycles. Consequently that observation may differ from all the others by (say) six orders of magnitude. Most probably of course that will not occur; that's a long tail - very rare events of extreme magnitude. A bit like returns in stock market crashes only less frequent. ;-) Less informally I've measured it! On a log-log histogram plot of ~10^7 measurements even trivial functions display a definite hint of Pareto-like [ http://tinyurl.com/ngnyk2 ] behaviour in the wings. -ed ------------------------------------------------ "No more boom and bust." -- Dr. J. G. Brown, 1997 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?>
Paul, [Excuse the following - perhaps it's an attack of logorrhea ] > I also am encouraged by what Edward's timer might add though I'm a > little > wary of possible inclusion of code following FFTW's license as it > may be > incompatible with Boost. I'm pretty sure the FFTW license is Boost compatible (judging by the comments in the header and the BSD-like adoption of FFTW in many commercial and open source projects) however my aim is for a robust performance framework that is chronometer agnostic - there's no need to use FFTW's timer. The getticks() macro just happens to be a handy cross platform implementation! There are other concerns that I need to address regarding the licensing and provenance of this code. For now I am simply keen to see if it works for other people and in other real world situations - hence the application to the boost::spirit / xpressive debate. After all, you can read books on nonparametric statistical inference until you are paralysed with inaction - the only real way to know if it works is to try it and get other people involved and apply it in real-world situations. From that I'm sure I will learn a lot and then be in a situation to weed out unnecessary cruft (e.g. bootstrap resampling) and add useful stuff (potentially the cache related code mentioned earlier). After that it's a case of reviewing everything that's left after I've done my weeding and making sure it's appropriate. > Edward, I do wonder about the statistical significance metrics you > provide > (great idea by the way). > I'm wondering if your code assumes a normal timing duration > distribution and > if so, can it 'measure' whether the distribution of results conform > to this > assumption. It makes few assumptions about any underlying probability density function (not even that it's strongly stationary) since it's based on nonparametric techniques. Key to this is that, while the underlying distribution may not be normal or even known, the distribution of certain statistics estimated from a sample *is* well known. For example, the median is asymptotically normal (with certain constraints on the PDF). In other words if you take a large sample and calculate the median - then repeat this ad nauseam, the resulting medians will be approximately normally distributed. For a concrete example this is put to use in the calculation of the confidence bounds - by using the Wilcoxon signed-rank test for which the W+ statistic has a well known distribution. We can therefore determine if one function is faster than another in a (hopefully) statistically significant way. > In my experience, timing benchmarks can suffer from having outliers I don't view these as 'outliers' in the normal usage of 'measurement error' - they are measurements that are just as valid as any other - one just has to ask "what is this measuring - the function under test or the OS?". Often it's the latter but there is (as I see it) no clear border-line between classifying any given measurement as one and not the other. Consequently I view 'throwing away' (trimming) measurements with great suspicion. > (usually > OS-induced by not having a RTOS) RTOS? Real-time OS?? Do these eliminate this effect? > that make the distribution less than > 'normal'. The distribution would not be normal even in principle. It must have a well defined minimum, but (in principle) has no upper bound - the (gross) lack of symmetry rules out normality in a heartbeat! > Would it be possible to establish that the given sample set of timings > correspond to a normal distribution (or perhaps discard a certain > percentage > of outliers if necessary). It is possible to determine if the sample corresponds to a known distribution - the Kolmogorov-Smirnov test. However which one should we check? There is no fundamental requirement for the measurements to belong to any family - even vaguely. In fact it may not even be smooth! > I'm no statistics person, but I have seen cases > where 10,000 timing samples have been biased by 10 samples that > probably > relate to a program issue or a severe OS issue. e.g. normally 1 ms per > iteration, 10 @ 1-10s each Yes - that issue has a dramatic impact on the robustness of (or lack thereof) of the mean. Often of course these things are even vaguely IID (independent and identically distributed) - burst like behaviour can lead to strongly non-stationary behaviour. For example some garbage collector might go nuts and skew your timing. That's why, in my view, it's so important to compare relative timings and to obtain confidence bounds. Pragmatically one might argue that when you're doing these things at the console during development and making decisions it's not really a major issue. One can look at the numbers and say "hmm, fishy - I don't trust that" and tinker around until the result is in line with your expectation. That's part of the art that is experimental science and many of us do it everyday without even realising. How to codify that opinion into something that's autonomous and capable of e.g. selecting the 'correct' function in a high-load multi- user environment in the real world is far from clear however. That is my goal. -ed ------------------------------------------------ "No more boom and bust." -- Dr. J. G. Brown, 1997 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On Tue, Jul 28, 2009 at 11:09 AM, Edward Grace<ej.grace@...> wrote:
>> When I compile and run the tests now, I get this: >> initializing input strings... >> Calibrating overhead......done >> Timer overhead (t_c) ~= : 12 >> Jitter ~= : 8.43769e-015 >> qi_parse vs atoi : 170.429 170.438 170.482% faster. >> qi_parse vs strtol : 167.589 167.601 167.668% faster. >> strtol vs atoi : 1.04669 1.05746 1.06165% faster. >> qi_parse vs qi_parse : 0 0 0% faster. > > Hi OvermindDL1, > > When you've woken up would you mind taking a quick squiz at the following > (anyone else - please feel free) > > ejg_uint_parser_0_0_4_bind_1.cpp > > in the following part of Boost Vault, http://tinyurl.com/lro5ok > > It's an attempt to crystallise the boo-boo you pointed out. I've tried to > do everything without ghastly global variables - it's also a salient lesson > on const correctness. If that'd been observed in the first place the > iterator cock-up wouldn't have happened. > > > ================== > $ ./ejg_uint_parser > Enter buffer size: 10000 > initializing input strings... > > > > Checking that the parsers are functioning correctly... > atoi is behaving itself! > strtol is behaving itself! > qi is behaving itself! > > Proceeding to timing tests.Calibrating overhead......done > Timer overhead (t_c) ~= : 117.426 > Jitter ~= : 25.9133 > qi_parse vs atoi : 86.0764 86.3074 86.4471% faster. > qi_parse vs strtol : 71.9253 72.1881 72.5288% faster. > strtol vs atoi : 8.0502 8.26097 8.47215% faster. > qi_parse vs qi_parse : -0.0274542 0.0393936 0.231944% faster. > > > All done! > ==================== > > > On my platform this is entirely consistent with the simple one-liner > modification you mentioned to the previous code. > > Take home message - yes Spirit really *is* faster. Enter buffer size: 10000 initializing input strings... Checking that the parsers are functioning correctly... atoi is behaving itself! strtol is behaving itself! qi is behaving itself! Proceeding to timing tests.Calibrating overhead......done Timer overhead (t_c) ~= : 12 Jitter ~= : 8.43769e-015 qi_parse vs atoi : 160.834 187.892 197.781% faster. qi_parse vs strtol : 152.088 173.709 197.184% faster. strtol vs atoi : 5.34019 7.29527 9.82952% faster. qi_parse vs qi_parse : -3.12862 -0.194198 1.53912% faster. All done! MSVC definitely compiles templates code better then GCC it seems (you said you were using GCC yes?). Also, I think I might know why QI is faster then atoi/strtol. atoi/strtol handle local as I recall, QI does not... _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?OvermindDL1 wrote:
> > Also, I think I might know why QI is faster then atoi/strtol. > atoi/strtol handle local as I recall, QI does not... Sorry, I can't parse that. What do you mean by "handle local"? Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On Tue, Jul 28, 2009 at 8:14 PM, Joel de
Guzman<joel@...> wrote: > OvermindDL1 wrote: >> >> Also, I think I might know why QI is faster then atoi/strtol. >> atoi/strtol handle local as I recall, QI does not... > > Sorry, I can't parse that. What do you mean by "handle local"? Er, locale* _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
| < Prev | 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 | Next > |
| Free embeddable forum powered by Nabble | Forum Help |