[xpressive] Performance Tuning?

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 | Next >

Re: [xpressive] Performance Tuning?

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

OvermindDL1 wrote:
> On Tue, Jul 28, 2009 at 8:14 PM, Joel de
> Guzman<joel@...> wrote:
>> OvermindDL1 wrote:
>>> Also, I think I might know why QI is faster then atoi/strtol.
>>> atoi/strtol handle local as I recall, QI does not...
>> Sorry, I can't parse that. What do you mean by "handle local"?
>
> Er, locale*

Sure, that's one. The reason is because spirit numerics are
customizable. And they are even more customizable than the
locales provide. For example, you can tweak the floating point
parsers to handle numbers like:

     1,234,567,890

or

     1.234567890 exp -200

with the same high performance. And, surely, you can customize
them to handle locales.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Paul Baxter wrote:
> I'm really appreciative of the testing you are all doing with parsing
> code and hope that at the end we can see both how fast and how clear and
> maintainable the various styles of parser code become (focus on
> accuracy, speed then perhaps more on the clarity).
>
> Its great to see the enthusiasm and results. Looking forward to more
> tips once we have a fair test directly pitting spirit with expressive
> and other parsing methodologies.

Keep i mind that we are testing just one small aspect of parsing/
text processing. It just so happens that spirit is designed to be
optimal in this *specific* use case. xpressive will definitely
shine in other areas (e.g. searching, replacing). So, as always,
use the right tool for the job.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Eric Niebler-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Joel de Guzman wrote:

> Paul Baxter wrote:
>> I'm really appreciative of the testing you are all doing with parsing
>> code and hope that at the end we can see both how fast and how clear
>> and maintainable the various styles of parser code become (focus on
>> accuracy, speed then perhaps more on the clarity).
>>
>> Its great to see the enthusiasm and results. Looking forward to more
>> tips once we have a fair test directly pitting spirit with expressive
>> and other parsing methodologies.
>
> Keep i mind that we are testing just one small aspect of parsing/
> text processing. It just so happens that spirit is designed to be
> optimal in this *specific* use case. xpressive will definitely
> shine in other areas (e.g. searching, replacing). So, as always,
> use the right tool for the job.

IIRC, xpressive was faster than Spirit 1. Spirit 2 has leapfrogged
xpressive in performance. That pleases me -- a regex engine with
exhaustive backtracking semantics has no business being faster than a
parser generator. :-) Kudos to Joel and Hartmut.

Sorry I haven't kept up with this discussion. Things in my day job have
taken a turn for the busy.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Edward Grace wrote:

> On 28 Jul 2009, at 18:17, Joel de Guzman wrote:
>> This time, I am using the benchmarking harness by David Abrahams,
>> Matthias Troyer, Michael Gauckler.
>
> This?
>
> http://tinyurl.com/kk858o
>
> There's some interesting trickery in there by the looks of things for
> eliminating the optimiser nastiness - that's not something I've thought
> about much I'll take a look.

The technique of using an accumulator to keep the whole system
"plugged" is one of the ideas you can use versus the need for
global variables, volatile, etc. This comment and code returning
from main says a lot:

     // This is ultimately responsible for preventing all the test code
     // from being optimized away.  Change this to return 0 and you
     // unplug the whole test's life support system.
     return test::live_code != 0;

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>> This?
>>
>> http://tinyurl.com/kk858o
>>
>> There's some interesting trickery in there by the looks of things for
>> eliminating the optimiser nastiness - that's not something I've  
>> thought
>> about much I'll take a look.
>
> The technique of using an accumulator to keep the whole system
> "plugged" is one of the ideas you can use versus the need for
> global variables, volatile, etc.

I'm clearly going to have to ponder this in some depth.  While I've  
thought about trying to get the  maths (statistics) right, I've not  
really given the nitty gritty of the machine's operation a great deal  
of thought.

Then again, that's the whole point of engaging in discussion on this  
mailing list!

> This comment and code returning
> from main says a lot:
>
>      // This is ultimately responsible for preventing all the test  
> code
>      // from being optimized away.  Change this to return 0 and you
>      // unplug the whole test's life support system.
>      return test::live_code != 0;
>

Indeed, I'd spotted that.  It's a nice comment! ;-)

-ed
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>> Proceeding to timing tests.Calibrating overhead......done
>> Timer overhead (t_c) ~= : 117.426
>> Jitter               ~= : 25.9133
>> qi_parse vs atoi    : 86.0764 86.3074 86.4471% faster.
>> qi_parse vs strtol  : 71.9253 72.1881 72.5288% faster.
>> strtol vs atoi      : 8.0502 8.26097 8.47215% faster.
>> qi_parse vs qi_parse : -0.0274542 0.0393936 0.231944% faster.
>>
>>
>> All done!
>> ====================
>>
>>
>> On my platform this is entirely consistent with the simple one-liner
>> modification you mentioned to the previous code.
>>
>> Take home message - yes Spirit really *is* faster.
>
> Enter buffer size: 10000
> initializing input strings...
>
>
>
> Checking that the parsers are functioning  correctly...
> atoi is behaving itself!
> strtol is behaving itself!
> qi is behaving itself!
>
> Proceeding to timing tests.Calibrating overhead......done
> Timer overhead (t_c) ~= : 12
> Jitter               ~= : 8.43769e-015
> qi_parse vs atoi     : 160.834 187.892 197.781% faster.
> qi_parse vs strtol  : 152.088 173.709 197.184% faster.
> strtol vs atoi       : 5.34019 7.29527 9.82952% faster.
> qi_parse vs qi_parse : -3.12862 -0.194198 1.53912% faster.
>
> All done!
>

[fixed with font required]

Interesting.  So, there is no difference between the speedup of  
strtol and atoi between your platform and mine (their confidence  
intervals overlap).


strtol vs atoi

OvermindDL1:   [5.3 ---------x-------- 9.8] %

          Ed:           [8.1 --x- 8.5] %

on the other hand qi_parse is significantly faster under Windows on  
your architecture compared to atoi than OS X.

qi_parse vs atoi

OvermindDL1:                                                    
[160.8  ---------x-- 197.8] %

          Ed:   [86.0 -----x----- 86.4] %


Likewise in both cases the timing of qi_parse against itself shows no  
difference since the confidence interval includes zero.

qi_parse vs qi_parse

OvermindDL1: [-3.13 ----------- 0 ----------- +1.54] %

          Ed:          [-0.03 -- 0 -- +0.23] %

> MSVC definitely compiles templates code better then GCC

More warnings?

> it seems (you
> said you were using GCC yes?).

Yes.... I wonder why am I starting to feel sheepish about that...

-ed
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Edward Grace wrote:

>>> This?
>>>
>>> http://tinyurl.com/kk858o
>>>
>>> There's some interesting trickery in there by the looks of things for
>>> eliminating the optimiser nastiness - that's not something I've thought
>>> about much I'll take a look.
>>
>> The technique of using an accumulator to keep the whole system
>> "plugged" is one of the ideas you can use versus the need for
>> global variables, volatile, etc.
>
> I'm clearly going to have to ponder this in some depth.  While I've
> thought about trying to get the  maths (statistics) right, I've not
> really given the nitty gritty of the machine's operation a great deal of
> thought.

Well, there you have it. I'd love to have your expertise on statistics
plus Matthias Troyer, et al. test harness be combined in an easy to
use benchmarking library. Benchmarking is such a black art :-) !

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Jul 29, 2009 at 4:20 AM, Edward Grace<ej.grace@...> wrote:

>>> Proceeding to timing tests.Calibrating overhead......done
>>> Timer overhead (t_c) ~= : 117.426
>>> Jitter               ~= : 25.9133
>>> qi_parse vs atoi    : 86.0764 86.3074 86.4471% faster.
>>> qi_parse vs strtol  : 71.9253 72.1881 72.5288% faster.
>>> strtol vs atoi      : 8.0502 8.26097 8.47215% faster.
>>> qi_parse vs qi_parse : -0.0274542 0.0393936 0.231944% faster.
>>>
>>>
>>> All done!
>>> ====================
>>>
>>>
>>> On my platform this is entirely consistent with the simple one-liner
>>> modification you mentioned to the previous code.
>>>
>>> Take home message - yes Spirit really *is* faster.
>>
>> Enter buffer size: 10000
>> initializing input strings...
>>
>>
>>
>> Checking that the parsers are functioning  correctly...
>> atoi is behaving itself!
>> strtol is behaving itself!
>> qi is behaving itself!
>>
>> Proceeding to timing tests.Calibrating overhead......done
>> Timer overhead (t_c) ~= : 12
>> Jitter               ~= : 8.43769e-015
>> qi_parse vs atoi     : 160.834 187.892 197.781% faster.
>> qi_parse vs strtol  : 152.088 173.709 197.184% faster.
>> strtol vs atoi       : 5.34019 7.29527 9.82952% faster.
>> qi_parse vs qi_parse : -3.12862 -0.194198 1.53912% faster.
>>
>> All done!
>>
>
> [fixed with font required]
>
> Interesting.  So, there is no difference between the speedup of strtol and
> atoi between your platform and mine (their confidence intervals overlap).
>
>
> strtol vs atoi
>
> OvermindDL1:   [5.3 ---------x-------- 9.8] %
>
>         Ed:           [8.1 --x- 8.5] %
>
> on the other hand qi_parse is significantly faster under Windows on your
> architecture compared to atoi than OS X.
>
> qi_parse vs atoi
>
> OvermindDL1:                                                   [160.8
>  ---------x-- 197.8] %
>
>         Ed:   [86.0 -----x----- 86.4] %
>
>
> Likewise in both cases the timing of qi_parse against itself shows no
> difference since the confidence interval includes zero.
>
> qi_parse vs qi_parse
>
> OvermindDL1: [-3.13 ----------- 0 ----------- +1.54] %
>
>         Ed:          [-0.03 -- 0 -- +0.23] %
>
>> MSVC definitely compiles templates code better then GCC
>
> More warnings?
>
>> it seems (you
>> said you were using GCC yes?).
>
> Yes.... I wonder why am I starting to feel sheepish about that...

Heh, I do have cygwin completely installed and fully updated on my
computer here (the recommended beta version that uses gcc 4.3 as I
recall) and I do have boost trunk (from a couple weeks ago anyway) in
there and compiled for it.  If you can tell me what command I need to
type to compile the file with all necessary optimizations, I will do
that here too.  Since I will be running it on the exact same computer
with the same OS, but different compilers, that will prove if it
really is GCC being much slower then VC, or if they are near the same
on my computer, then it is something else.  So, what should I type
assuming I have the ejg test file and cycle.h in the current directory
with the ejg in ./other_includes/ejg for full optimizations and
everything?
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning? - was (Re: [regex] "I want everything" should build "out of the box"(onWindows))

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>>
>>> it seems (you
>>> said you were using GCC yes?).
>>
>> Yes.... I wonder why am I starting to feel sheepish about that...
>
> Heh, I do have cygwin completely installed and fully updated on my
> computer here (the recommended beta version that uses gcc 4.3 as I
> recall) and I do have boost trunk (from a couple weeks ago anyway) in
> there and compiled for it.  If you can tell me what command I need to
> type to compile the file with all necessary optimizations, I will do
> that here too.

Hi, assuming:

   * Your homed directory is $HOME
   * Spirit2 is in $HOME/spirit21_root, which should contain boost/
spirit/actor.hpp
   * The latest Boost is in $HOME/boost_root, which should contain  
boost/any.hpp
   * cycle.h is in the same directory as the file  
ejg_uint_parser_0_0_4_bind_1.cpp
   * The ejg timer stuff is in $HOME/ejg_root, this should contain  
ejg/timer.hpp

the following stanza will work in bash (note the backslashes to break  
the line), first we define some environment variables for legibility,  
then fire up g++,

# ------ cut -----

SPIRIT2=$HOME/spirit21_root
BOOST=$HOME/boost_root
EJG=$HOME/ejg_root

g++ -DNDEBUG -O3 -ansi -pedantic -Wall -Wno-long-long -Werror \
  -I$SPIRIT2 -I$BOOST -I$EJG -o ejg_uint_parser \
  ejg_uint_parser_0_0_4_bind_1.cpp


# ----- cut -------

The following is a synopsis of what the bits mean, in case it's not  
obvious.

-DNDEBUG -> equivalent to #define NDEBUG, should switch off any debug  
parts of Boost
-O3 -> Optimisation level 3 - pretty much all in!
-ansi -> Require ANSI compliance of the language!
-pedantic -> Really really mean it!
-Wall -> Warn about everything (alegedly)
-Wno-long-long -> Do not warn about long long not being a mandated C+
+ standard type.
-Werror -> Convert warnings to errors
-I<blah> -> Include <blah> as a directory to search for include files  
along with the standard locations.
-o <blah> -> Generate the binary file <blah>

I often forget about -DNDEBUG - this can have a significant impact  
~10% for Spirit2 over atoi.  Presumably you define NDEBUG when  
compiling on Windows (or is that automatically assumed for 'Release'  
builds?).

> Since I will be running it on the exact same computer
> with the same OS, but different compilers, that will prove if it
> really is GCC being much slower then VC, or if they are near the same
> on my computer, then it is something else.

I await with trepidation....

-ed
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>> I'm clearly going to have to ponder this in some depth.  While I've
>> thought about trying to get the  maths (statistics) right, I've not
>> really given the nitty gritty of the machine's operation a great  
>> deal of
>> thought.
>
> Well, there you have it. I'd love to have your expertise on statistics

I claim none!  I've been trying to learn as I go - perhaps a little  
knowledge is a dangerous thing!

> plus Matthias Troyer, et al. test harness be combined in an easy to
> use benchmarking library.

Be careful what you wish for, it might come true... ;-)

> Benchmarking is such a black art :-) !

I totally agree! Like presumably many others, this appeared to me to  
be a trivial problem - at first sight.  In fact it's anything but  
straightforward. I bet there are half a dozen or so PhDs sitting in  
this particular dark recess of computing.

-ed
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Jul 29, 2009 at 9:17 AM, Edward Grace<ej.grace@...> wrote:

>>>
>>>> it seems (you
>>>> said you were using GCC yes?).
>>>
>>> Yes.... I wonder why am I starting to feel sheepish about that...
>>
>> Heh, I do have cygwin completely installed and fully updated on my
>> computer here (the recommended beta version that uses gcc 4.3 as I
>> recall) and I do have boost trunk (from a couple weeks ago anyway) in
>> there and compiled for it.  If you can tell me what command I need to
>> type to compile the file with all necessary optimizations, I will do
>> that here too.
>
> Hi, assuming:
>
>  * Your homed directory is $HOME
>  * Spirit2 is in $HOME/spirit21_root, which should contain
> boost/spirit/actor.hpp
>  * The latest Boost is in $HOME/boost_root, which should contain
> boost/any.hpp
>  * cycle.h is in the same directory as the file
> ejg_uint_parser_0_0_4_bind_1.cpp
>  * The ejg timer stuff is in $HOME/ejg_root, this should contain
> ejg/timer.hpp
>
> the following stanza will work in bash (note the backslashes to break the
> line), first we define some environment variables for legibility, then fire
> up g++,
>
> # ------ cut -----
>
> SPIRIT2=$HOME/spirit21_root
> BOOST=$HOME/boost_root
> EJG=$HOME/ejg_root
>
> g++ -DNDEBUG -O3 -ansi -pedantic -Wall -Wno-long-long -Werror \
>  -I$SPIRIT2 -I$BOOST -I$EJG -o ejg_uint_parser \
>  ejg_uint_parser_0_0_4_bind_1.cpp
>
>
> # ----- cut -------
>
> The following is a synopsis of what the bits mean, in case it's not obvious.
>
> -DNDEBUG -> equivalent to #define NDEBUG, should switch off any debug parts
> of Boost
> -O3 -> Optimisation level 3 - pretty much all in!
> -ansi -> Require ANSI compliance of the language!
> -pedantic -> Really really mean it!
> -Wall -> Warn about everything (alegedly)
> -Wno-long-long -> Do not warn about long long not being a mandated C++
> standard type.
> -Werror -> Convert warnings to errors
> -I<blah> -> Include <blah> as a directory to search for include files along
> with the standard locations.
> -o <blah> -> Generate the binary file <blah>
>
> I often forget about -DNDEBUG - this can have a significant impact ~10% for
> Spirit2 over atoi.  Presumably you define NDEBUG when compiling on Windows
> (or is that automatically assumed for 'Release' builds?).
>
>> Since I will be running it on the exact same computer
>> with the same OS, but different compilers, that will prove if it
>> really is GCC being much slower then VC, or if they are near the same
>> on my computer, then it is something else.
>
> I await with trepidation....

Very useful info, thanks.  I altered slightly to support my setup,
since boost trunk is in the /etc/usr/lib or something like that, it is
picked up globally.  I tried compiling it multiple times, got the same
warning each time, never could find the output file.  Here is my
latest attempt, I tried fully qualified names, did not help:
OvermindDL1@overmind /cygdrive/s/cygwin/home/OvermindDL1/ejg_test
$ g++ -DNDEBUG -O3 -ansi -pedantic -Wall -Wno-long-long -Werror
-I/cygdrive/s/cygwin/home/OvermindDL1/ejg_test/other_includes -o
/cygdrive/s/cygwin/home/OvermindDL1/ejg_test/ejg_uint_parser
ejg_uint_parser_0_0_4_bind_1.cpp
/usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_uninitialized.h:
In copyconstructor `std::vector<_Tp, _Alloc>::vector(const
std::vector<_Tp, _Alloc>&) [with _Tp = std::string, _Alloc =
std::allocator<std::string>]':
/usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_uninitialized.h:82:
warning: '__cur' might be used uninitialized in this function

So I 'guess' it worked?  But I cannot find any file it created at all...

Hmm, I am going to try removing -Wall.
Yep!  It worked!
Executing it now, results:
Enter buffer size: 10000
initializing input strings...



Checking that the parsers are functioning  correctly...
atoi is behaving itself!
strtol is behaving itself!
qi is behaving itself!

Proceeding to timing tests.Calibrating overhead......done
Timer overhead (t_c) ~= : 9.67512
Jitter               ~= : 7.47951
qi_parse vs atoi     : 78.1417 81.6005 86.3038% faster.
qi_parse vs strtol  : 76.5284 85.2148 86.8329% faster.
strtol vs atoi       : -4.67955 -2.60676 -0.488886% faster.
qi_parse vs qi_parse : -0.900454 0.465715 1.85072% faster.

All done!

I ran the Visual Studio build again, still got about the same as I got
in my last email, so yea, Visual Studio is a lot better on templates
then GCC is.  I wonder why GCC performs so much worse...
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Andrey Tcherepanov :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, 29 Jul 2009 18:48:49 -0600, OvermindDL1 <overminddl1@...>  
wrote:

> Proceeding to timing tests.Calibrating overhead......done
> Timer overhead (t_c) ~= : 9.67512
> Jitter               ~= : 7.47951
> qi_parse vs atoi     : 78.1417 81.6005 86.3038% faster.
> qi_parse vs strtol  : 76.5284 85.2148 86.8329% faster.
> strtol vs atoi       : -4.67955 -2.60676 -0.488886% faster.
> qi_parse vs qi_parse : -0.900454 0.465715 1.85072% faster.

Folks, how do you get these 86% ?  What is the formula?
'cause I know I am missing something, since F(78.14,81.6) somehow gives  
0.86, but what is the F then...

Thank you very much,

   Andrey

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Andrey Tcherepanov :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

D'oh... after looking into the code I realized that this is actually  
min%-median%-max% speedup. Somehow I thought that only the last number is  
meanful. Would be nice to have something like "min=78% med=81% max=86%  
speedup over 100000 iterations"

Oh well, I guess I was the only one puzzled :)

Sorry for the noise.

A.

On Wed, 29 Jul 2009 20:10:46 -0600, Andrey Tcherepanov  
<moyt63c02@...> wrote:

> On Wed, 29 Jul 2009 18:48:49 -0600, OvermindDL1 <overminddl1@...>  
> wrote:
>
>> Proceeding to timing tests.Calibrating overhead......done
>> Timer overhead (t_c) ~= : 9.67512
>> Jitter               ~= : 7.47951
>> qi_parse vs atoi     : 78.1417 81.6005 86.3038% faster.
>> qi_parse vs strtol  : 76.5284 85.2148 86.8329% faster.
>> strtol vs atoi       : -4.67955 -2.60676 -0.488886% faster.
>> qi_parse vs qi_parse : -0.900454 0.465715 1.85072% faster.
>
> Folks, how do you get these 86% ?  What is the formula?
> 'cause I know I am missing something, since F(78.14,81.6) somehow gives  
> 0.86, but what is the F then...
>
> Thank you very much,
>
>    Andrey
>
> _______________________________________________
> Unsubscribe & other changes:  
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Thorsten Ottosen-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Edward Grace skrev:

>>> I'm clearly going to have to ponder this in some depth.  While I've
>>> thought about trying to get the  maths (statistics) right, I've not
>>> really given the nitty gritty of the machine's operation a great deal of
>>> thought.
>>
>> Well, there you have it. I'd love to have your expertise on statistics
>
> I claim none!  I've been trying to learn as I go - perhaps a little
> knowledge is a dangerous thing!
>
>> plus Matthias Troyer, et al. test harness be combined in an easy to
>> use benchmarking library.
>
> Be careful what you wish for, it might come true... ;-)
>
>> Benchmarking is such a black art :-) !
>
> I totally agree! Like presumably many others, this appeared to me to be
> a trivial problem - at first sight.  In fact it's anything but
> straightforward. I bet there are half a dozen or so PhDs sitting in this
> particular dark recess of computing.
>

Well, I normally use a tool like vtune running the code on real data. Do
you think such a tool is unreliable?

-Thorsten
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thorsten Ottosen wrote:

> Edward Grace skrev:
>>>> I'm clearly going to have to ponder this in some depth.  While I've
>>>> thought about trying to get the  maths (statistics) right, I've not
>>>> really given the nitty gritty of the machine's operation a great
>>>> deal of
>>>> thought.
>>>
>>> Well, there you have it. I'd love to have your expertise on statistics
>>
>> I claim none!  I've been trying to learn as I go - perhaps a little
>> knowledge is a dangerous thing!
>>
>>> plus Matthias Troyer, et al. test harness be combined in an easy to
>>> use benchmarking library.
>>
>> Be careful what you wish for, it might come true... ;-)
>>
>>> Benchmarking is such a black art :-) !
>>
>> I totally agree! Like presumably many others, this appeared to me to
>> be a trivial problem - at first sight.  In fact it's anything but
>> straightforward. I bet there are half a dozen or so PhDs sitting in
>> this particular dark recess of computing.
>>
>
> Well, I normally use a tool like vtune running the code on real data. Do
> you think such a tool is unreliable?

I'd like my benchmarks to be freely verifiable by anyone.
It has to be cross-platform, free and can be bundled with
Boost.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>> I often forget about -DNDEBUG - this can have a significant impact  
>> ~10% for
>> Spirit2 over atoi.  Presumably you define NDEBUG when compiling on  
>> Windows
>> (or is that automatically assumed for 'Release' builds?).
>>
>>> Since I will be running it on the exact same computer
>>> with the same OS, but different compilers, that will prove if it
>>> really is GCC being much slower then VC, or if they are near the  
>>> same
>>> on my computer, then it is something else.
>>
>> I await with trepidation....
>
> Very useful info, thanks.  I altered slightly to support my setup,
> since boost trunk is in the /etc/usr/lib or something like that, it is
> picked up globally.  I tried compiling it multiple times, got the same
> warning each time, never could find the output file.

With -Werror any warnings will be interpreted as errors.  If there  
are compilation errors it won't build the object / executable.

> Here is my
> latest attempt, I tried fully qualified names, did not help:
> OvermindDL1@overmind /cygdrive/s/cygwin/home/OvermindDL1/ejg_test
> $ g++ -DNDEBUG -O3 -ansi -pedantic -Wall -Wno-long-long -Werror
> -I/cygdrive/s/cygwin/home/OvermindDL1/ejg_test/other_includes -o
> /cygdrive/s/cygwin/home/OvermindDL1/ejg_test/ejg_uint_parser
> ejg_uint_parser_0_0_4_bind_1.cpp
> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/
> stl_uninitialized.h:
> In copyconstructor `std::vector<_Tp, _Alloc>::vector(const
> std::vector<_Tp, _Alloc>&) [with _Tp = std::string, _Alloc =
> std::allocator<std::string>]':
> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/
> stl_uninitialized.h:82:
> warning: '__cur' might be used uninitialized in this function

Hmm..  I've never come across that before.  It looks like it's  
flagging an error relating to something in the setup of the iterators  
for parse_qi.


> So I 'guess' it worked?  But I cannot find any file it created at  
> all...
>
> Hmm, I am going to try removing -Wall.
> Yep!  It worked!

As a future tip it's better to remove -Werror if needed - that way  
you will still get any warnings - but they won't be mapped to errors.

> Proceeding to timing tests.Calibrating overhead......done
> Timer overhead (t_c) ~= : 9.67512
> Jitter               ~= : 7.47951
> qi_parse vs atoi     : 78.1417 81.6005 86.3038% faster.
> qi_parse vs strtol  : 76.5284 85.2148 86.8329% faster.
> strtol vs atoi       : -4.67955 -2.60676 -0.488886% faster.
> qi_parse vs qi_parse : -0.900454 0.465715 1.85072% faster.
>
> All done!

Hmm.  Very similar to my results - except for you strtol is slower  
than atoi.

> I ran the Visual Studio build again, still got about the same as I got
> in my last email, so yea, Visual Studio is a lot better on templates
> then GCC is.  I wonder why GCC performs so much worse...

Indeed - an open issue.  Perhaps I will play with some other options  
(march etc.) to see if there's something I've overlooked.  Well, I  
have to say I'm gaining more respect for the MS compiler - I last  
used it a very long time ago, back then it was awful!

-ed
------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Simonson, Lucanus J :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thorsten Ottosen wrote:

> Edward Grace skrev:
>>>> I'm clearly going to have to ponder this in some depth.  While I've
>>>> thought about trying to get the  maths (statistics) right, I've not
>>>> really given the nitty gritty of the machine's operation a great
>>>> deal of thought.
>>>
>>> Well, there you have it. I'd love to have your expertise on
>>> statistics
>>
>> I claim none!  I've been trying to learn as I go - perhaps a little
>> knowledge is a dangerous thing!
>>
>>> plus Matthias Troyer, et al. test harness be combined in an easy to
>>> use benchmarking library.
>>
>> Be careful what you wish for, it might come true... ;-)
>>
>>> Benchmarking is such a black art :-) !
>>
>> I totally agree! Like presumably many others, this appeared to me to
>> be a trivial problem - at first sight.  In fact it's anything but
>> straightforward. I bet there are half a dozen or so PhDs sitting in
>> this particular dark recess of computing.
>>
>
> Well, I normally use a tool like vtune running the code on real data.
> Do
> you think such a tool is unreliable?
>
VTune is for tuning application performance.   VTune is too big a hammer for measuring benchmark runtime.  Benchmarks are about wall clock time of a piece of code as a metric for performance of something else.  How you measure that time and how you report those measurements is a problem that is prone to error.

Personally I perform several runs of the same benchmark and then take the minimum time as the time I report.  This excludes all outliers due to OS and such.  If a car company reported the average (mean or median) 0 to 60 mph for a given car when their test driver was trying to see how fast it was everyone would think they were crazy.  Why should the fact that he pushed the gas too hard on some of the trials and spun out and not hard enough on others count against the car?  I also typically don't have the luxury of benchmarking on an unloaded machine, so I have to make sure to do things fairly, for instance, instead of runing many trials of A then many trials of B and taking the minimum for each I run many trials of A then B back to back and take the minimum.  That way A and B on average see the same environement over the course of the experiement.  If I run A and B in the same process I run A first then B and then B first and then A as a separate run to ensure that the order
 doesn't impact their performance.

Usually benchmarking implies comparision, so the key concern is that the results are fair.

Regards,
Luke
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Boost.Plot? - Scalable Vector Graphics

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>>

Again this is a long one but I feel it's worth discussing these  
things in some detail --- only by truly understanding the problem can  
we try to solve it.

(Something funny is going on regarding my access to this list - I  
keep getting things in strange order and with the wrong subject  
line.  Anyhow - to add my oar in to this discussion.)


>>
>> Well, I normally use a tool like vtune running the code on real data.
>> Do you think such a tool is unreliable?
>>

VTune is a sweet piece of kit, as are other profilers.  I don't think  
they are unreliable - one just has to understand what they do and /  
or don't do and their limitations.

For example, consider the 'fast' and 'slow' functions in my code  
which nominally differ in run-time by around 0.5%.  Do you think the  
'fire and forget' operation of a profiler would *reliably* detect  
this difference? [This is not a rhetorical question - I don't know  
and am genuinely curious].    It appears that my benchmarking does.  
Obviously this is a contrived example, no one is likely to care about  
0.5% differences.

For example, when you see that function 'abracadabra' takes 14.3323%  
of the run-time - how reliable is that number?  When you compare it  
to the other function that takes 14.5231% is it actually faster?  Are  
you sure?  What happens if you profile again do you come to the same  
conclusion?

 From my point of view profilers are useless as they are for the off-
line optimising of code.  My original goal, as I have remarked before  
(somewhat obliquely) is that I wish to select the 'best' function or  
parameters dynamically at run-time.  In other words, consider a set  
of functions (algorithms) which nominally do the same thing, I want  
to be able to do (I may be syntactically clumsy here)

   std::set<unary_function> sf;
   sf.insert(function_a);
   sf.insert(function_b);
   sf.insert(function_c));
   ..
   ..
   ..
   function_pointer f = get_fastest_function(sf,test_argument);
   ..
   ..
   f(argument); // <--- This is now the fastest implementation of  
whatever we want.

Here, the relative speed of the different functions could be 100% and  
differ for differing sizes of data such that it may not be possible  
to know a priori which is quickest.  Likewise it may be run in a  
heterogeneous environment so that the knowledge about which function/
parameter combination works best on node "A" is exactly wrong for  
node "B".

This may sound like a niche requirement - it is!

> VTune is for tuning application performance.   VTune is too big a  
> hammer for measuring benchmark runtime.  Benchmarks are about wall  
> clock time of a piece of code as a metric for performance of  
> something else.  How you measure that time and how you report those  
> measurements is a problem that is prone to error.

Ok, I should have read the whole email before writing the above  
monologue -- agreed.  Though I'd caution against a bare 'wall clock'  
times in favour of relative times (some times of course you may  
*have* to measure the wall-clock time).

> Personally I perform several runs of the same benchmark and then  
> take the minimum time as the time I report.

Be very cautious about that for a number of reasons.

a) By reporting the minimum, without qualifying your statement, you  
are lying.  For example if you say "My code takes 10.112 seconds to  
run." people are unlikely to ever observe that time - it is by  
construction atypical.

b) If ever you do this (use the minimum or maximum) then you are  
unwittingly sampling an extreme value distribution:

   http://en.wikipedia.org/wiki/Extreme_value_distribution

For example - say you 'know' that the measured times are normally  
distributed with mean (and median) zero and a variance of 10.  If you  
take, say 100 measurements and then take the minimum - how is that  
minimum distributed?  In other words if your repeat this - again and  
again - what does the histogram of the minimum look like?

What you will find is that the dispersion (variance or otherwise) on  
the median is small say \sigma^2=0.15 whereas the dispersion on the  
minimum can be much larger \sigma^2=1.8 - *and* the distribution is  
highly asymmetrical.

If your two times were nominally separated by around 0.5 you would  
not necessarily reliably determine which was fastest from the minima  
since the distributions of the minima could overlap.

Attached are two examples, histograms of the observed minimum for  
many trials and the observed median for many trials of the above  
scenario.  Notice that the histogram of the observed minimum is *far*  
broader than the observed median.





I suggest you experiment with this - you may be (unpleasantly)  
surprised; it's somewhat counter-intuitive an appropriate MATLAB (ok-
ok put away the pitchforks) snippet below

% EV
clear all;
v_median=zeros(50000,1); % Lots of trials to get a decent looking  
histogram.
v_min=v_median;
for m=1:length(v_median)
     v=randn(100,1).*sqrt(10); % Normally distributed, try other  
distributions.
     v_median(m)=median(v);
     v_min(m)=min(v);
end
figure(1);
hist(v_median,51);
figure(2);
hist(v_min,51);


> This excludes all outliers due to OS and such.

That may well be the aim, but I'm not sure it does get you what you  
want. This is where understanding of the maths (statistics) needs to  
be blended with an understanding of the machine.  Your use of the  
minimum could make it *less* possible to infer something from the  
measurements.  On the other hand you might be right - I don't know  
for sure -- but I am sceptical.

> If a car company reported the average (mean or median) 0 to 60 mph  
> for a given car when their test driver was trying to see how fast  
> it was everyone would think they were crazy.

That's all down to marketing.  I imagine they *do* determine it, they  
just don't report it that way. Most people couldn't care less what  
these numbers actually mean - just that it's 'better' than the other  
one.  In the infamous words of "Spinal Tap" - "it goes up to eleven!".

> Why should the fact that he pushed the gas too hard on some of the  
> trials and spun out and not hard enough on others count against the  
> car?

I imagine they test a number of different cars rather than just the  
same car a lot of times and that the reported time is some form of  
median.  This would mean that half the cars will go a little faster -  
half a little slower than this 'typical' car. Almost certainly the  
testing criteria and their analysis is more sophisticated than that -  
speculation on my part -- anyone in the car industry?

> I also typically don't have the luxury of benchmarking on an  
> unloaded machine, so I have to make sure to do things fairly, for  
> instance, instead of runing many trials of A then many trials of B  
> and taking the minimum for each I run many trials of A then B back  
> to back and take the minimum.  That way A and B on average see the  
> same environement over the course of the experiement.

Hopefully.  As someone pointed out "It's a dark art"!

> If I run A and B in the same process I run A first then B and then  
> B first and then A as a separate run to ensure that the order
>  doesn't impact their performance.

Just to add to the mix, I suggest randomising the order in which you  
run them (that's what I do) that way you are very unlikely to  
accidentally synchronise yourself with some other process e.g.

ABBBABAABAB etc...

> Usually benchmarking implies comparision, so the key concern is  
> that the results are fair.

Easy to write very hard to do! ;-)

Regards,

-ed

------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997




_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

median.pdf (6K) Download Attachment
minimum.pdf (6K) Download Attachment

Re: Boost.Plot? - Scalable Vector Graphics (was [expressive] performance tuning

by Simonson, Lucanus J :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Edward Grace wrote:
>> Personally I perform several runs of the same benchmark and then
>> take the minimum time as the time I report.
>
> Be very cautious about that for a number of reasons.
>
> a) By reporting the minimum, without qualifying your statement, you
> are lying.  For example if you say "My code takes 10.112 seconds to
> run." people are unlikely to ever observe that time - it is by
> construction atypical.

I try to be explicit about explaining what metric I'm reporting.  If I'm reporting the minimum of ten trials I report it as the minimum of ten trials.  Taking the minimum of ten trials and comparing it to the minimum of twenty trials and not qaulifying it would be even more dishonest because it isn't fair.  I compare same number of trails, report that it is the minimum and the number of trails.

> b) If ever you do this (use the minimum or maximum) then you are
> unwittingly sampling an extreme value distribution:
>
>    http://en.wikipedia.org/wiki/Extreme_value_distribution
>
> For example - say you 'know' that the measured times are normally
> distributed with mean (and median) zero and a variance of 10.  If you
> take, say 100 measurements and then take the minimum - how is that
> minimum distributed?  In other words if your repeat this - again and
> again - what does the histogram of the minimum look like?
>
> What you will find is that the dispersion (variance or otherwise) on
> the median is small say \sigma^2=0.15 whereas the dispersion on the
> minimum can be much larger \sigma^2=1.8 - *and* the distribution is
> highly asymmetrical.
>
> If your two times were nominally separated by around 0.5 you would
> not necessarily reliably determine which was fastest from the minima
> since the distributions of the minima could overlap.
>
> Attached are two examples, histograms of the observed minimum for
> many trials and the observed median for many trials of the above
> scenario.  Notice that the histogram of the observed minimum is *far*
> broader than the observed median.

In this particular case of runtime the distribution is usually not normal.  There is some theoritical fastest the program could go if nothing like the OS or execution of other programs on the same machine cause it to run slower.  This is the number we actually want.  We can't measure it, but we observe that there is a "floor" under our data and the more times we run the more clear it becomes where that floor is.  It just won't run faster than that no matter how often we run.  The minimum value tends to be closer to the mode of the distribution than to the median because runtimes cluster around the minimum value and tail off above.  The idea is that using the minimum as the metric eliminates most of the "noise" introduced into the data by stuff unrelated to what you are trying to measure.  Taking the median is just taking the median of the noise.  In an ideal compute environment your runtimes would be so similar from run to run you wouldn't bother to run more than once.  If th
 ere is noise in your data set your statistic should be designed to eliminate the noise, not characterize it.

In my recent boostcon presentation my benchmarks were all for a single trial of each algorithm for each benchmark because noise would be eliminated during the curve fitting of the data for analysis of how well the algorithms scaled (if not by the conversion to logarithmic axis....)  If I had taken the minimum of some number of trials I would have reported that in the paper.

Regards,
Luke
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Gpderetta :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Jul 30, 2009 at 3:24 PM, Edward Grace<ej.grace@...> wrote:

>>> I often forget about -DNDEBUG - this can have a significant impact ~10%
>>> for
>>> Spirit2 over atoi.  Presumably you define NDEBUG when compiling on
>>> Windows
>>> (or is that automatically assumed for 'Release' builds?).
>>>
>>>> Since I will be running it on the exact same computer
>>>> with the same OS, but different compilers, that will prove if it
>>>> really is GCC being much slower then VC, or if they are near the same
>>>> on my computer, then it is something else.
>>>
>>> I await with trepidation....
>>
>> Very useful info, thanks.  I altered slightly to support my setup,
>> since boost trunk is in the /etc/usr/lib or something like that, it is
>> picked up globally.  I tried compiling it multiple times, got the same
>> warning each time, never could find the output file.
>
> With -Werror any warnings will be interpreted as errors.  If there are
> compilation errors it won't build the object / executable.
>
>> Here is my
>> latest attempt, I tried fully qualified names, did not help:
>> OvermindDL1@overmind /cygdrive/s/cygwin/home/OvermindDL1/ejg_test
>> $ g++ -DNDEBUG -O3 -ansi -pedantic -Wall -Wno-long-long -Werror
>> -I/cygdrive/s/cygwin/home/OvermindDL1/ejg_test/other_includes -o
>> /cygdrive/s/cygwin/home/OvermindDL1/ejg_test/ejg_uint_parser
>> ejg_uint_parser_0_0_4_bind_1.cpp
>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_uninitialized.h:
>> In copyconstructor `std::vector<_Tp, _Alloc>::vector(const
>> std::vector<_Tp, _Alloc>&) [with _Tp = std::string, _Alloc =
>> std::allocator<std::string>]':
>> /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++/bits/stl_uninitialized.h:82:
>> warning: '__cur' might be used uninitialized in this function
>
> Hmm..  I've never come across that before.  It looks like it's flagging an
> error relating to something in the setup of the iterators for parse_qi.
>
>
>> So I 'guess' it worked?  But I cannot find any file it created at all...
>>
>> Hmm, I am going to try removing -Wall.
>> Yep!  It worked!
>
> As a future tip it's better to remove -Werror if needed - that way you will
> still get any warnings - but they won't be mapped to errors.
>
>> Proceeding to timing tests.Calibrating overhead......done
>> Timer overhead (t_c) ~= : 9.67512
>> Jitter               ~= : 7.47951
>> qi_parse vs atoi     : 78.1417 81.6005 86.3038% faster.
>> qi_parse vs strtol  : 76.5284 85.2148 86.8329% faster.
>> strtol vs atoi       : -4.67955 -2.60676 -0.488886% faster.
>> qi_parse vs qi_parse : -0.900454 0.465715 1.85072% faster.
>>
>> All done!
>
> Hmm.  Very similar to my results - except for you strtol is slower than
> atoi.
>
>> I ran the Visual Studio build again, still got about the same as I got
>> in my last email, so yea, Visual Studio is a lot better on templates
>> then GCC is.  I wonder why GCC performs so much worse...
>
> Indeed - an open issue.  Perhaps I will play with some other options (march
> etc.) to see if there's something I've overlooked.  Well, I have to say I'm
> gaining more respect for the MS compiler - I last used it a very long time
> ago, back then it was awful!
>

*scratches head*

I must be missing something very obvious, but I do not see how your
and OvermindDL benchmarks can support this conclusion: as far as I can
tell, you two never compared the two compilers directly, what you are
seeing is that on GCC the speed up of qi_parse over atoi/strtol is
less than that of MSVC, but this tells nothing about the absolute
performance of the two compilers on this test (i.e. you never showed
any absolute times).

For what we know the gcc atoi might just be faster than msvc one. And
in fact a quick google search brings these two pages:

  http://tinyurl.com/mqa5yl [msvc8 atoi performance is 58% of that of msvc6]
  http://tinyurl.com/mzyw66 [thread containing a comparison of atoi
functions of different languages and compilers, in particular it seems
that MSVC atoi is really 2 times slower than gcc atoi]

A slow atoi on MSVC would explain such a difference in the tests,
assuming that the ability of both compilers to optimize qi_parse is
about the same.

Finally, I see from the error messages that OvermindDL is using
gcc.3.4, which is now fairly old, maybe he should try with a more
recent one. Also -march=native or whatever architecture he is using
might help.

HTH,

--
gpd
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
< Prev | 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 | Next >