|
View:
New views
20 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 | Next > |
|
|
Re: [xpressive] Performance Tuning?OvermindDL1 wrote:
> On Tue, Jul 28, 2009 at 8:14 PM, Joel de > Guzman<joel@...> wrote: >> OvermindDL1 wrote: >>> Also, I think I might know why QI is faster then atoi/strtol. >>> atoi/strtol handle local as I recall, QI does not... >> Sorry, I can't parse that. What do you mean by "handle local"? > > Er, locale* Sure, that's one. The reason is because spirit numerics are customizable. And they are even more customizable than the locales provide. For example, you can tweak the floating point parsers to handle numbers like: 1,234,567,890 or 1.234567890 exp -200 with the same high performance. And, surely, you can customize them to handle locales. Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?Paul Baxter wrote:
> I'm really appreciative of the testing you are all doing with parsing > code and hope that at the end we can see both how fast and how clear and > maintainable the various styles of parser code become (focus on > accuracy, speed then perhaps more on the clarity). > > Its great to see the enthusiasm and results. Looking forward to more > tips once we have a fair test directly pitting spirit with expressive > and other parsing methodologies. Keep i mind that we are testing just one small aspect of parsing/ text processing. It just so happens that spirit is designed to be optimal in this *specific* use case. xpressive will definitely shine in other areas (e.g. searching, replacing). So, as always, use the right tool for the job. Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?Joel de Guzman wrote:
> Paul Baxter wrote: >> I'm really appreciative of the testing you are all doing with parsing >> code and hope that at the end we can see both how fast and how clear >> and maintainable the various styles of parser code become (focus on >> accuracy, speed then perhaps more on the clarity). >> >> Its great to see the enthusiasm and results. Looking forward to more >> tips once we have a fair test directly pitting spirit with expressive >> and other parsing methodologies. > > Keep i mind that we are testing just one small aspect of parsing/ > text processing. It just so happens that spirit is designed to be > optimal in this *specific* use case. xpressive will definitely > shine in other areas (e.g. searching, replacing). So, as always, > use the right tool for the job. IIRC, xpressive was faster than Spirit 1. Spirit 2 has leapfrogged xpressive in performance. That pleases me -- a regex engine with exhaustive backtracking semantics has no business being faster than a parser generator. :-) Kudos to Joel and Hartmut. Sorry I haven't kept up with this discussion. Things in my day job have taken a turn for the busy. -- Eric Niebler BoostPro Computing http://www.boostpro.com _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?Edward Grace wrote:
> On 28 Jul 2009, at 18:17, Joel de Guzman wrote: >> This time, I am using the benchmarking harness by David Abrahams, >> Matthias Troyer, Michael Gauckler. > > This? > > http://tinyurl.com/kk858o > > There's some interesting trickery in there by the looks of things for > eliminating the optimiser nastiness - that's not something I've thought > about much I'll take a look. The technique of using an accumulator to keep the whole system "plugged" is one of the ideas you can use versus the need for global variables, volatile, etc. This comment and code returning from main says a lot: // This is ultimately responsible for preventing all the test code // from being optimized away. Change this to return 0 and you // unplug the whole test's life support system. return test::live_code != 0; Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?>> This?
>> >> http://tinyurl.com/kk858o >> >> There's some interesting trickery in there by the looks of things for >> eliminating the optimiser nastiness - that's not something I've >> thought >> about much I'll take a look. > > The technique of using an accumulator to keep the whole system > "plugged" is one of the ideas you can use versus the need for > global variables, volatile, etc. I'm clearly going to have to ponder this in some depth. While I've thought about trying to get the maths (statistics) right, I've not really given the nitty gritty of the machine's operation a great deal of thought. Then again, that's the whole point of engaging in discussion on this mailing list! > This comment and code returning > from main says a lot: > > // This is ultimately responsible for preventing all the test > code > // from being optimized away. Change this to return 0 and you > // unplug the whole test's life support system. > return test::live_code != 0; > Indeed, I'd spotted that. It's a nice comment! ;-) -ed _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?>> Proceeding to timing tests.Calibrating overhead......done
>> Timer overhead (t_c) ~= : 117.426 >> Jitter ~= : 25.9133 >> qi_parse vs atoi : 86.0764 86.3074 86.4471% faster. >> qi_parse vs strtol : 71.9253 72.1881 72.5288% faster. >> strtol vs atoi : 8.0502 8.26097 8.47215% faster. >> qi_parse vs qi_parse : -0.0274542 0.0393936 0.231944% faster. >> >> >> All done! >> ==================== >> >> >> On my platform this is entirely consistent with the simple one-liner >> modification you mentioned to the previous code. >> >> Take home message - yes Spirit really *is* faster. > > Enter buffer size: 10000 > initializing input strings... > > > > Checking that the parsers are functioning correctly... > atoi is behaving itself! > strtol is behaving itself! > qi is behaving itself! > > Proceeding to timing tests.Calibrating overhead......done > Timer overhead (t_c) ~= : 12 > Jitter ~= : 8.43769e-015 > qi_parse vs atoi : 160.834 187.892 197.781% faster. > qi_parse vs strtol : 152.088 173.709 197.184% faster. > strtol vs atoi : 5.34019 7.29527 9.82952% faster. > qi_parse vs qi_parse : -3.12862 -0.194198 1.53912% faster. > > All done! > [fixed with font required] Interesting. So, there is no difference between the speedup of strtol and atoi between your platform and mine (their confidence intervals overlap). strtol vs atoi OvermindDL1: [5.3 ---------x-------- 9.8] % Ed: [8.1 --x- 8.5] % on the other hand qi_parse is significantly faster under Windows on your architecture compared to atoi than OS X. qi_parse vs atoi OvermindDL1: [160.8 ---------x-- 197.8] % Ed: [86.0 -----x----- 86.4] % Likewise in both cases the timing of qi_parse against itself shows no difference since the confidence interval includes zero. qi_parse vs qi_parse OvermindDL1: [-3.13 ----------- 0 ----------- +1.54] % Ed: [-0.03 -- 0 -- +0.23] % > MSVC definitely compiles templates code better then GCC More warnings? > it seems (you > said you were using GCC yes?). Yes.... I wonder why am I starting to feel sheepish about that... -ed _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?Edward Grace wrote:
>>> This? >>> >>> http://tinyurl.com/kk858o >>> >>> There's some interesting trickery in there by the looks of things for >>> eliminating the optimiser nastiness - that's not something I've thought >>> about much I'll take a look. >> >> The technique of using an accumulator to keep the whole system >> "plugged" is one of the ideas you can use versus the need for >> global variables, volatile, etc. > > I'm clearly going to have to ponder this in some depth. While I've > thought about trying to get the maths (statistics) right, I've not > really given the nitty gritty of the machine's operation a great deal of > thought. Well, there you have it. I'd love to have your expertise on statistics plus Matthias Troyer, et al. test harness be combined in an easy to use benchmarking library. Benchmarking is such a black art :-) ! Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On Wed, Jul 29, 2009 at 4:20 AM, Edward Grace<ej.grace@...> wrote:
>>> Proceeding to timing tests.Calibrating overhead......done >>> Timer overhead (t_c) ~= : 117.426 >>> Jitter ~= : 25.9133 >>> qi_parse vs atoi : 86.0764 86.3074 86.4471% faster. >>> qi_parse vs strtol : 71.9253 72.1881 72.5288% faster. >>> strtol vs atoi : 8.0502 8.26097 8.47215% faster. >>> qi_parse vs qi_parse : -0.0274542 0.0393936 0.231944% faster. >>> >>> >>> All done! >>> ==================== >>> >>> >>> On my platform this is entirely consistent with the simple one-liner >>> modification you mentioned to the previous code. >>> >>> Take home message - yes Spirit really *is* faster. >> >> Enter buffer size: 10000 >> initializing input strings... >> >> >> >> Checking that the parsers are functioning correctly... >> atoi is behaving itself! >> strtol is behaving itself! >> qi is behaving itself! >> >> Proceeding to timing tests.Calibrating overhead......done >> Timer overhead (t_c) ~= : 12 >> Jitter ~= : 8.43769e-015 >> qi_parse vs atoi : 160.834 187.892 197.781% faster. >> qi_parse vs strtol : 152.088 173.709 197.184% faster. >> strtol vs atoi : 5.34019 7.29527 9.82952% faster. >> qi_parse vs qi_parse : -3.12862 -0.194198 1.53912% faster. >> >> All done! >> > > [fixed with font required] > > Interesting. So, there is no difference between the speedup of strtol and > atoi between your platform and mine (their confidence intervals overlap). > > > strtol vs atoi > > OvermindDL1: [5.3 ---------x-------- 9.8] % > > Ed: [8.1 --x- 8.5] % > > on the other hand qi_parse is significantly faster under Windows on your > architecture compared to atoi than OS X. > > qi_parse vs atoi > > OvermindDL1: [160.8 > ---------x-- 197.8] % > > Ed: [86.0 -----x----- 86.4] % > > > Likewise in both cases the timing of qi_parse against itself shows no > difference since the confidence interval includes zero. > > qi_parse vs qi_parse > > OvermindDL1: [-3.13 ----------- 0 ----------- +1.54] % > > Ed: [-0.03 -- 0 -- +0.23] % > >> MSVC definitely compiles templates code better then GCC > > More warnings? > >> it seems (you >> said you were using GCC yes?). > > Yes.... I wonder why am I starting to feel sheepish about that... Heh, I do have cygwin completely installed and fully updated on my computer here (the recommended beta version that uses gcc 4.3 as I recall) and I do have boost trunk (from a couple weeks ago anyway) in there and compiled for it. If you can tell me what command I need to type to compile the file with all necessary optimizations, I will do that here too. Since I will be running it on the exact same computer with the same OS, but different compilers, that will prove if it really is GCC being much slower then VC, or if they are near the same on my computer, then it is something else. So, what should I type assuming I have the ejg test file and cycle.h in the current directory with the ejg in ./other_includes/ejg for full optimizations and everything? _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning? - was (Re: [regex] "I want everything" should build "out of the box"(onWindows))>>
>>> it seems (you >>> said you were using GCC yes?). >> >> Yes.... I wonder why am I starting to feel sheepish about that... > > Heh, I do have cygwin completely installed and fully updated on my > computer here (the recommended beta version that uses gcc 4.3 as I > recall) and I do have boost trunk (from a couple weeks ago anyway) in > there and compiled for it. If you can tell me what command I need to > type to compile the file with all necessary optimizations, I will do > that here too. Hi, assuming: * Your homed directory is $HOME * Spirit2 is in $HOME/spirit21_root, which should contain boost/ spirit/actor.hpp * The latest Boost is in $HOME/boost_root, which should contain boost/any.hpp * cycle.h is in the same directory as the file ejg_uint_parser_0_0_4_bind_1.cpp * The ejg timer stuff is in $HOME/ejg_root, this should contain ejg/timer.hpp the following stanza will work in bash (note the backslashes to break the line), first we define some environment variables for legibility, then fire up g++, # ------ cut ----- SPIRIT2=$HOME/spirit21_root BOOST=$HOME/boost_root EJG=$HOME/ejg_root g++ -DNDEBUG -O3 -ansi -pedantic -Wall -Wno-long-long -Werror \ -I$SPIRIT2 -I$BOOST -I$EJG -o ejg_uint_parser \ ejg_uint_parser_0_0_4_bind_1.cpp # ----- cut ------- The following is a synopsis of what the bits mean, in case it's not obvious. -DNDEBUG -> equivalent to #define NDEBUG, should switch off any debug parts of Boost -O3 -> Optimisation level 3 - pretty much all in! -ansi -> Require ANSI compliance of the language! -pedantic -> Really really mean it! -Wall -> Warn about everything (alegedly) -Wno-long-long -> Do not warn about long long not being a mandated C+ + standard type. -Werror -> Convert warnings to errors -I<blah> -> Include <blah> as a directory to search for include files along with the standard locations. -o <blah> -> Generate the binary file <blah> I often forget about -DNDEBUG - this can have a significant impact ~10% for Spirit2 over atoi. Presumably you define NDEBUG when compiling on Windows (or is that automatically assumed for 'Release' builds?). > Since I will be running it on the exact same computer > with the same OS, but different compilers, that will prove if it > really is GCC being much slower then VC, or if they are near the same > on my computer, then it is something else. I await with trepidation.... -ed _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?>> I'm clearly going to have to ponder this in some depth. While I've
>> thought about trying to get the maths (statistics) right, I've not >> really given the nitty gritty of the machine's operation a great >> deal of >> thought. > > Well, there you have it. I'd love to have your expertise on statistics I claim none! I've been trying to learn as I go - perhaps a little knowledge is a dangerous thing! > plus Matthias Troyer, et al. test harness be combined in an easy to > use benchmarking library. Be careful what you wish for, it might come true... ;-) > Benchmarking is such a black art :-) ! I totally agree! Like presumably many others, this appeared to me to be a trivial problem - at first sight. In fact it's anything but straightforward. I bet there are half a dozen or so PhDs sitting in this particular dark recess of computing. -ed _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On Wed, Jul 29, 2009 at 9:17 AM, Edward Grace<ej.grace@...> wrote:
>>> >>>> it seems (you >>>> said you were using GCC yes?). >>> >>> Yes.... I wonder why am I starting to feel sheepish about that... >> >> Heh, I do have cygwin completely installed and fully updated on my >> computer here (the recommended beta version that uses gcc 4.3 as I >> recall) and I do have boost trunk (from a couple weeks ago anyway) in >> there and compiled for it. If you can tell me what command I need to >> type to compile the file with all necessary optimizations, I will do >> that here too. > > Hi, assuming: > > * Your homed directory is $HOME > * Spirit2 is in $HOME/spirit21_root, which should contain > boost/spirit/actor.hpp > * The latest Boost is in $HOME/boost_root, which should contain > boost/any.hpp > * cycle.h is in the same directory as the file > ejg_uint_parser_0_0_4_bind_1.cpp > * The ejg timer stuff is in $HOME/ejg_root, this should contain > ejg/timer.hpp > > the following stanza will work in bash (note the backslashes to break the > line), first we define some environment variables for legibility, then fire > up g++, > > # ------ cut ----- > > SPIRIT2=$HOME/spirit21_root > BOOST=$HOME/boost_root > EJG=$HOME/ejg_root > > g++ -DNDEBUG -O3 -ansi -pedantic -Wall -Wno-long-long -Werror \ > -I$SPIRIT2 -I$BOOST -I$EJG -o ejg_uint_parser \ > ejg_uint_parser_0_0_4_bind_1.cpp > > > # ----- cut ------- > > The following is a synopsis of what the bits mean, in case it's not obvious. > > -DNDEBUG -> equivalent to #define NDEBUG, should switch off any debug parts > of Boost > -O3 -> Optimisation level 3 - pretty much all in! > -ansi -> Require ANSI compliance of the language! > -pedantic -> Really really mean it! > -Wall -> Warn about everything (alegedly) > -Wno-long-long -> Do not warn about long long not being a mandated C++ > standard type. > -Werror -> Convert warnings to errors > -I<blah> -> Include <blah> as a directory to search for include files along > with the standard locations. > -o <blah> -> Generate the binary file <blah> > > I often forget about -DNDEBUG - this can have a significant impact ~10% for > Spirit2 over atoi. Presumably you define NDEBUG when compiling on Windows > (or is that automatically assumed for 'Release' builds?). > >> Since I will be running it on the exact same computer >> with the same OS, but different compilers, that will prove if it >> really is GCC being much slower then VC, or if they are near the same >> on my computer, then it is something else. > > I await with trepidation.... Very useful info, thanks. I altered slightly to support my setup, since boost trunk is in the /etc/usr/lib or something like that, it is picked up globally. I tried compiling it multiple times, got the same warning each time, never could find the output file. Here is my latest attempt, I tried fully qualified names, did not help: OvermindDL1@overmind /cygdrive/s/cygwin/home/OvermindDL1/ejg_test $ g++ -DNDEBUG -O3 -ansi -pedantic -Wall -Wno-long-long -Werror -I/cygdrive/s/cygwin/home/OvermindDL1/ejg_test/other_includes -o /cygdrive/s/cygwin/home/OvermindDL1/ejg_test/ejg_uint_parser ejg_uint_parser_0_0_4_bind_1.cpp /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_uninitialized.h: In copyconstructor `std::vector<_Tp, _Alloc>::vector(const std::vector<_Tp, _Alloc>&) [with _Tp = std::string, _Alloc = std::allocator<std::string>]': /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_uninitialized.h:82: warning: '__cur' might be used uninitialized in this function So I 'guess' it worked? But I cannot find any file it created at all... Hmm, I am going to try removing -Wall. Yep! It worked! Executing it now, results: Enter buffer size: 10000 initializing input strings... Checking that the parsers are functioning correctly... atoi is behaving itself! strtol is behaving itself! qi is behaving itself! Proceeding to timing tests.Calibrating overhead......done Timer overhead (t_c) ~= : 9.67512 Jitter ~= : 7.47951 qi_parse vs atoi : 78.1417 81.6005 86.3038% faster. qi_parse vs strtol : 76.5284 85.2148 86.8329% faster. strtol vs atoi : -4.67955 -2.60676 -0.488886% faster. qi_parse vs qi_parse : -0.900454 0.465715 1.85072% faster. All done! I ran the Visual Studio build again, still got about the same as I got in my last email, so yea, Visual Studio is a lot better on templates then GCC is. I wonder why GCC performs so much worse... _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On Wed, 29 Jul 2009 18:48:49 -0600, OvermindDL1 <overminddl1@...>
wrote: > Proceeding to timing tests.Calibrating overhead......done > Timer overhead (t_c) ~= : 9.67512 > Jitter ~= : 7.47951 > qi_parse vs atoi : 78.1417 81.6005 86.3038% faster. > qi_parse vs strtol : 76.5284 85.2148 86.8329% faster. > strtol vs atoi : -4.67955 -2.60676 -0.488886% faster. > qi_parse vs qi_parse : -0.900454 0.465715 1.85072% faster. Folks, how do you get these 86% ? What is the formula? 'cause I know I am missing something, since F(78.14,81.6) somehow gives 0.86, but what is the F then... Thank you very much, Andrey _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?D'oh... after looking into the code I realized that this is actually
min%-median%-max% speedup. Somehow I thought that only the last number is meanful. Would be nice to have something like "min=78% med=81% max=86% speedup over 100000 iterations" Oh well, I guess I was the only one puzzled :) Sorry for the noise. A. On Wed, 29 Jul 2009 20:10:46 -0600, Andrey Tcherepanov <moyt63c02@...> wrote: > On Wed, 29 Jul 2009 18:48:49 -0600, OvermindDL1 <overminddl1@...> > wrote: > >> Proceeding to timing tests.Calibrating overhead......done >> Timer overhead (t_c) ~= : 9.67512 >> Jitter ~= : 7.47951 >> qi_parse vs atoi : 78.1417 81.6005 86.3038% faster. >> qi_parse vs strtol : 76.5284 85.2148 86.8329% faster. >> strtol vs atoi : -4.67955 -2.60676 -0.488886% faster. >> qi_parse vs qi_parse : -0.900454 0.465715 1.85072% faster. > > Folks, how do you get these 86% ? What is the formula? > 'cause I know I am missing something, since F(78.14,81.6) somehow gives > 0.86, but what is the F then... > > Thank you very much, > > Andrey > > _______________________________________________ > Unsubscribe & other changes: > http://lists.boost.org/mailman/listinfo.cgi/boost > _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?Edward Grace skrev:
>>> I'm clearly going to have to ponder this in some depth. While I've >>> thought about trying to get the maths (statistics) right, I've not >>> really given the nitty gritty of the machine's operation a great deal of >>> thought. >> >> Well, there you have it. I'd love to have your expertise on statistics > > I claim none! I've been trying to learn as I go - perhaps a little > knowledge is a dangerous thing! > >> plus Matthias Troyer, et al. test harness be combined in an easy to >> use benchmarking library. > > Be careful what you wish for, it might come true... ;-) > >> Benchmarking is such a black art :-) ! > > I totally agree! Like presumably many others, this appeared to me to be > a trivial problem - at first sight. In fact it's anything but > straightforward. I bet there are half a dozen or so PhDs sitting in this > particular dark recess of computing. > Well, I normally use a tool like vtune running the code on real data. Do you think such a tool is unreliable? -Thorsten _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?Thorsten Ottosen wrote:
> Edward Grace skrev: >>>> I'm clearly going to have to ponder this in some depth. While I've >>>> thought about trying to get the maths (statistics) right, I've not >>>> really given the nitty gritty of the machine's operation a great >>>> deal of >>>> thought. >>> >>> Well, there you have it. I'd love to have your expertise on statistics >> >> I claim none! I've been trying to learn as I go - perhaps a little >> knowledge is a dangerous thing! >> >>> plus Matthias Troyer, et al. test harness be combined in an easy to >>> use benchmarking library. >> >> Be careful what you wish for, it might come true... ;-) >> >>> Benchmarking is such a black art :-) ! >> >> I totally agree! Like presumably many others, this appeared to me to >> be a trivial problem - at first sight. In fact it's anything but >> straightforward. I bet there are half a dozen or so PhDs sitting in >> this particular dark recess of computing. >> > > Well, I normally use a tool like vtune running the code on real data. Do > you think such a tool is unreliable? I'd like my benchmarks to be freely verifiable by anyone. It has to be cross-platform, free and can be bundled with Boost. Regards, -- Joel de Guzman http://www.boostpro.com http://spirit.sf.net _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?>> I often forget about -DNDEBUG - this can have a significant impact
>> ~10% for >> Spirit2 over atoi. Presumably you define NDEBUG when compiling on >> Windows >> (or is that automatically assumed for 'Release' builds?). >> >>> Since I will be running it on the exact same computer >>> with the same OS, but different compilers, that will prove if it >>> really is GCC being much slower then VC, or if they are near the >>> same >>> on my computer, then it is something else. >> >> I await with trepidation.... > > Very useful info, thanks. I altered slightly to support my setup, > since boost trunk is in the /etc/usr/lib or something like that, it is > picked up globally. I tried compiling it multiple times, got the same > warning each time, never could find the output file. With -Werror any warnings will be interpreted as errors. If there are compilation errors it won't build the object / executable. > Here is my > latest attempt, I tried fully qualified names, did not help: > OvermindDL1@overmind /cygdrive/s/cygwin/home/OvermindDL1/ejg_test > $ g++ -DNDEBUG -O3 -ansi -pedantic -Wall -Wno-long-long -Werror > -I/cygdrive/s/cygwin/home/OvermindDL1/ejg_test/other_includes -o > /cygdrive/s/cygwin/home/OvermindDL1/ejg_test/ejg_uint_parser > ejg_uint_parser_0_0_4_bind_1.cpp > /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/ > stl_uninitialized.h: > In copyconstructor `std::vector<_Tp, _Alloc>::vector(const > std::vector<_Tp, _Alloc>&) [with _Tp = std::string, _Alloc = > std::allocator<std::string>]': > /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/ > stl_uninitialized.h:82: > warning: '__cur' might be used uninitialized in this function Hmm.. I've never come across that before. It looks like it's flagging an error relating to something in the setup of the iterators for parse_qi. > So I 'guess' it worked? But I cannot find any file it created at > all... > > Hmm, I am going to try removing -Wall. > Yep! It worked! As a future tip it's better to remove -Werror if needed - that way you will still get any warnings - but they won't be mapped to errors. > Proceeding to timing tests.Calibrating overhead......done > Timer overhead (t_c) ~= : 9.67512 > Jitter ~= : 7.47951 > qi_parse vs atoi : 78.1417 81.6005 86.3038% faster. > qi_parse vs strtol : 76.5284 85.2148 86.8329% faster. > strtol vs atoi : -4.67955 -2.60676 -0.488886% faster. > qi_parse vs qi_parse : -0.900454 0.465715 1.85072% faster. > > All done! Hmm. Very similar to my results - except for you strtol is slower than atoi. > I ran the Visual Studio build again, still got about the same as I got > in my last email, so yea, Visual Studio is a lot better on templates > then GCC is. I wonder why GCC performs so much worse... Indeed - an open issue. Perhaps I will play with some other options (march etc.) to see if there's something I've overlooked. Well, I have to say I'm gaining more respect for the MS compiler - I last used it a very long time ago, back then it was awful! -ed ------------------------------------------------ "No more boom and bust." -- Dr. J. G. Brown, 1997 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?Thorsten Ottosen wrote:
> Edward Grace skrev: >>>> I'm clearly going to have to ponder this in some depth. While I've >>>> thought about trying to get the maths (statistics) right, I've not >>>> really given the nitty gritty of the machine's operation a great >>>> deal of thought. >>> >>> Well, there you have it. I'd love to have your expertise on >>> statistics >> >> I claim none! I've been trying to learn as I go - perhaps a little >> knowledge is a dangerous thing! >> >>> plus Matthias Troyer, et al. test harness be combined in an easy to >>> use benchmarking library. >> >> Be careful what you wish for, it might come true... ;-) >> >>> Benchmarking is such a black art :-) ! >> >> I totally agree! Like presumably many others, this appeared to me to >> be a trivial problem - at first sight. In fact it's anything but >> straightforward. I bet there are half a dozen or so PhDs sitting in >> this particular dark recess of computing. >> > > Well, I normally use a tool like vtune running the code on real data. > Do > you think such a tool is unreliable? > Personally I perform several runs of the same benchmark and then take the minimum time as the time I report. This excludes all outliers due to OS and such. If a car company reported the average (mean or median) 0 to 60 mph for a given car when their test driver was trying to see how fast it was everyone would think they were crazy. Why should the fact that he pushed the gas too hard on some of the trials and spun out and not hard enough on others count against the car? I also typically don't have the luxury of benchmarking on an unloaded machine, so I have to make sure to do things fairly, for instance, instead of runing many trials of A then many trials of B and taking the minimum for each I run many trials of A then B back to back and take the minimum. That way A and B on average see the same environement over the course of the experiement. If I run A and B in the same process I run A first then B and then B first and then A as a separate run to ensure that the order doesn't impact their performance. Usually benchmarking implies comparision, so the key concern is that the results are fair. Regards, Luke _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: Boost.Plot? - Scalable Vector Graphics>>
Again this is a long one but I feel it's worth discussing these things in some detail --- only by truly understanding the problem can we try to solve it. (Something funny is going on regarding my access to this list - I keep getting things in strange order and with the wrong subject line. Anyhow - to add my oar in to this discussion.) >> >> Well, I normally use a tool like vtune running the code on real data. >> Do you think such a tool is unreliable? >> VTune is a sweet piece of kit, as are other profilers. I don't think they are unreliable - one just has to understand what they do and / or don't do and their limitations. For example, consider the 'fast' and 'slow' functions in my code which nominally differ in run-time by around 0.5%. Do you think the 'fire and forget' operation of a profiler would *reliably* detect this difference? [This is not a rhetorical question - I don't know and am genuinely curious]. It appears that my benchmarking does. Obviously this is a contrived example, no one is likely to care about 0.5% differences. For example, when you see that function 'abracadabra' takes 14.3323% of the run-time - how reliable is that number? When you compare it to the other function that takes 14.5231% is it actually faster? Are you sure? What happens if you profile again do you come to the same conclusion? From my point of view profilers are useless as they are for the off- line optimising of code. My original goal, as I have remarked before (somewhat obliquely) is that I wish to select the 'best' function or parameters dynamically at run-time. In other words, consider a set of functions (algorithms) which nominally do the same thing, I want to be able to do (I may be syntactically clumsy here) std::set<unary_function> sf; sf.insert(function_a); sf.insert(function_b); sf.insert(function_c)); .. .. .. function_pointer f = get_fastest_function(sf,test_argument); .. .. f(argument); // <--- This is now the fastest implementation of whatever we want. Here, the relative speed of the different functions could be 100% and differ for differing sizes of data such that it may not be possible to know a priori which is quickest. Likewise it may be run in a heterogeneous environment so that the knowledge about which function/ parameter combination works best on node "A" is exactly wrong for node "B". This may sound like a niche requirement - it is! > VTune is for tuning application performance. VTune is too big a > hammer for measuring benchmark runtime. Benchmarks are about wall > clock time of a piece of code as a metric for performance of > something else. How you measure that time and how you report those > measurements is a problem that is prone to error. Ok, I should have read the whole email before writing the above monologue -- agreed. Though I'd caution against a bare 'wall clock' times in favour of relative times (some times of course you may *have* to measure the wall-clock time). > Personally I perform several runs of the same benchmark and then > take the minimum time as the time I report. Be very cautious about that for a number of reasons. a) By reporting the minimum, without qualifying your statement, you are lying. For example if you say "My code takes 10.112 seconds to run." people are unlikely to ever observe that time - it is by construction atypical. b) If ever you do this (use the minimum or maximum) then you are unwittingly sampling an extreme value distribution: http://en.wikipedia.org/wiki/Extreme_value_distribution For example - say you 'know' that the measured times are normally distributed with mean (and median) zero and a variance of 10. If you take, say 100 measurements and then take the minimum - how is that minimum distributed? In other words if your repeat this - again and again - what does the histogram of the minimum look like? What you will find is that the dispersion (variance or otherwise) on the median is small say \sigma^2=0.15 whereas the dispersion on the minimum can be much larger \sigma^2=1.8 - *and* the distribution is highly asymmetrical. If your two times were nominally separated by around 0.5 you would not necessarily reliably determine which was fastest from the minima since the distributions of the minima could overlap. Attached are two examples, histograms of the observed minimum for many trials and the observed median for many trials of the above scenario. Notice that the histogram of the observed minimum is *far* broader than the observed median. I suggest you experiment with this - you may be (unpleasantly) surprised; it's somewhat counter-intuitive an appropriate MATLAB (ok- ok put away the pitchforks) snippet below % EV clear all; v_median=zeros(50000,1); % Lots of trials to get a decent looking histogram. v_min=v_median; for m=1:length(v_median) v=randn(100,1).*sqrt(10); % Normally distributed, try other distributions. v_median(m)=median(v); v_min(m)=min(v); end figure(1); hist(v_median,51); figure(2); hist(v_min,51); > This excludes all outliers due to OS and such. That may well be the aim, but I'm not sure it does get you what you want. This is where understanding of the maths (statistics) needs to be blended with an understanding of the machine. Your use of the minimum could make it *less* possible to infer something from the measurements. On the other hand you might be right - I don't know for sure -- but I am sceptical. > If a car company reported the average (mean or median) 0 to 60 mph > for a given car when their test driver was trying to see how fast > it was everyone would think they were crazy. That's all down to marketing. I imagine they *do* determine it, they just don't report it that way. Most people couldn't care less what these numbers actually mean - just that it's 'better' than the other one. In the infamous words of "Spinal Tap" - "it goes up to eleven!". > Why should the fact that he pushed the gas too hard on some of the > trials and spun out and not hard enough on others count against the > car? I imagine they test a number of different cars rather than just the same car a lot of times and that the reported time is some form of median. This would mean that half the cars will go a little faster - half a little slower than this 'typical' car. Almost certainly the testing criteria and their analysis is more sophisticated than that - speculation on my part -- anyone in the car industry? > I also typically don't have the luxury of benchmarking on an > unloaded machine, so I have to make sure to do things fairly, for > instance, instead of runing many trials of A then many trials of B > and taking the minimum for each I run many trials of A then B back > to back and take the minimum. That way A and B on average see the > same environement over the course of the experiement. Hopefully. As someone pointed out "It's a dark art"! > If I run A and B in the same process I run A first then B and then > B first and then A as a separate run to ensure that the order > doesn't impact their performance. Just to add to the mix, I suggest randomising the order in which you run them (that's what I do) that way you are very unlikely to accidentally synchronise yourself with some other process e.g. ABBBABAABAB etc... > Usually benchmarking implies comparision, so the key concern is > that the results are fair. Easy to write very hard to do! ;-) Regards, -ed ------------------------------------------------ "No more boom and bust." -- Dr. J. G. Brown, 1997 _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: Boost.Plot? - Scalable Vector Graphics (was [expressive] performance tuningEdward Grace wrote:
>> Personally I perform several runs of the same benchmark and then >> take the minimum time as the time I report. > > Be very cautious about that for a number of reasons. > > a) By reporting the minimum, without qualifying your statement, you > are lying. For example if you say "My code takes 10.112 seconds to > run." people are unlikely to ever observe that time - it is by > construction atypical. I try to be explicit about explaining what metric I'm reporting. If I'm reporting the minimum of ten trials I report it as the minimum of ten trials. Taking the minimum of ten trials and comparing it to the minimum of twenty trials and not qaulifying it would be even more dishonest because it isn't fair. I compare same number of trails, report that it is the minimum and the number of trails. > b) If ever you do this (use the minimum or maximum) then you are > unwittingly sampling an extreme value distribution: > > http://en.wikipedia.org/wiki/Extreme_value_distribution > > For example - say you 'know' that the measured times are normally > distributed with mean (and median) zero and a variance of 10. If you > take, say 100 measurements and then take the minimum - how is that > minimum distributed? In other words if your repeat this - again and > again - what does the histogram of the minimum look like? > > What you will find is that the dispersion (variance or otherwise) on > the median is small say \sigma^2=0.15 whereas the dispersion on the > minimum can be much larger \sigma^2=1.8 - *and* the distribution is > highly asymmetrical. > > If your two times were nominally separated by around 0.5 you would > not necessarily reliably determine which was fastest from the minima > since the distributions of the minima could overlap. > > Attached are two examples, histograms of the observed minimum for > many trials and the observed median for many trials of the above > scenario. Notice that the histogram of the observed minimum is *far* > broader than the observed median. In this particular case of runtime the distribution is usually not normal. There is some theoritical fastest the program could go if nothing like the OS or execution of other programs on the same machine cause it to run slower. This is the number we actually want. We can't measure it, but we observe that there is a "floor" under our data and the more times we run the more clear it becomes where that floor is. It just won't run faster than that no matter how often we run. The minimum value tends to be closer to the mode of the distribution than to the median because runtimes cluster around the minimum value and tail off above. The idea is that using the minimum as the metric eliminates most of the "noise" introduced into the data by stuff unrelated to what you are trying to measure. Taking the median is just taking the median of the noise. In an ideal compute environment your runtimes would be so similar from run to run you wouldn't bother to run more than once. If th ere is noise in your data set your statistic should be designed to eliminate the noise, not characterize it. In my recent boostcon presentation my benchmarks were all for a single trial of each algorithm for each benchmark because noise would be eliminated during the curve fitting of the data for analysis of how well the algorithms scaled (if not by the conversion to logarithmic axis....) If I had taken the minimum of some number of trials I would have reported that in the paper. Regards, Luke _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
|
|
Re: [xpressive] Performance Tuning?On Thu, Jul 30, 2009 at 3:24 PM, Edward Grace<ej.grace@...> wrote:
>>> I often forget about -DNDEBUG - this can have a significant impact ~10% >>> for >>> Spirit2 over atoi. Presumably you define NDEBUG when compiling on >>> Windows >>> (or is that automatically assumed for 'Release' builds?). >>> >>>> Since I will be running it on the exact same computer >>>> with the same OS, but different compilers, that will prove if it >>>> really is GCC being much slower then VC, or if they are near the same >>>> on my computer, then it is something else. >>> >>> I await with trepidation.... >> >> Very useful info, thanks. I altered slightly to support my setup, >> since boost trunk is in the /etc/usr/lib or something like that, it is >> picked up globally. I tried compiling it multiple times, got the same >> warning each time, never could find the output file. > > With -Werror any warnings will be interpreted as errors. If there are > compilation errors it won't build the object / executable. > >> Here is my >> latest attempt, I tried fully qualified names, did not help: >> OvermindDL1@overmind /cygdrive/s/cygwin/home/OvermindDL1/ejg_test >> $ g++ -DNDEBUG -O3 -ansi -pedantic -Wall -Wno-long-long -Werror >> -I/cygdrive/s/cygwin/home/OvermindDL1/ejg_test/other_includes -o >> /cygdrive/s/cygwin/home/OvermindDL1/ejg_test/ejg_uint_parser >> ejg_uint_parser_0_0_4_bind_1.cpp >> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_uninitialized.h: >> In copyconstructor `std::vector<_Tp, _Alloc>::vector(const >> std::vector<_Tp, _Alloc>&) [with _Tp = std::string, _Alloc = >> std::allocator<std::string>]': >> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_uninitialized.h:82: >> warning: '__cur' might be used uninitialized in this function > > Hmm.. I've never come across that before. It looks like it's flagging an > error relating to something in the setup of the iterators for parse_qi. > > >> So I 'guess' it worked? But I cannot find any file it created at all... >> >> Hmm, I am going to try removing -Wall. >> Yep! It worked! > > As a future tip it's better to remove -Werror if needed - that way you will > still get any warnings - but they won't be mapped to errors. > >> Proceeding to timing tests.Calibrating overhead......done >> Timer overhead (t_c) ~= : 9.67512 >> Jitter ~= : 7.47951 >> qi_parse vs atoi : 78.1417 81.6005 86.3038% faster. >> qi_parse vs strtol : 76.5284 85.2148 86.8329% faster. >> strtol vs atoi : -4.67955 -2.60676 -0.488886% faster. >> qi_parse vs qi_parse : -0.900454 0.465715 1.85072% faster. >> >> All done! > > Hmm. Very similar to my results - except for you strtol is slower than > atoi. > >> I ran the Visual Studio build again, still got about the same as I got >> in my last email, so yea, Visual Studio is a lot better on templates >> then GCC is. I wonder why GCC performs so much worse... > > Indeed - an open issue. Perhaps I will play with some other options (march > etc.) to see if there's something I've overlooked. Well, I have to say I'm > gaining more respect for the MS compiler - I last used it a very long time > ago, back then it was awful! > *scratches head* I must be missing something very obvious, but I do not see how your and OvermindDL benchmarks can support this conclusion: as far as I can tell, you two never compared the two compilers directly, what you are seeing is that on GCC the speed up of qi_parse over atoi/strtol is less than that of MSVC, but this tells nothing about the absolute performance of the two compilers on this test (i.e. you never showed any absolute times). For what we know the gcc atoi might just be faster than msvc one. And in fact a quick google search brings these two pages: http://tinyurl.com/mqa5yl [msvc8 atoi performance is 58% of that of msvc6] http://tinyurl.com/mzyw66 [thread containing a comparison of atoi functions of different languages and compilers, in particular it seems that MSVC atoi is really 2 times slower than gcc atoi] A slow atoi on MSVC would explain such a difference in the tests, assuming that the ability of both compilers to optimize qi_parse is about the same. Finally, I see from the error messages that OvermindDL is using gcc.3.4, which is now fairly old, maybe he should try with a more recent one. Also -march=native or whatever architecture he is using might help. HTH, -- gpd _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost |
| < Prev | 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 | Next > |
| Free embeddable forum powered by Nabble | Forum Help |