[xpressive] Performance Tuning?

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 | Next >

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>> Do you think the overhead of calling through boost::bind could be
>> comparable to the length of time it takes to run the function?
>
> I don't know -- haven't looked -- where boost::bind was added or  
> why, but any unnecessary overhead should be eliminated unless it is  
> proven to be insignificant.

It seems that it's not a problem.

[....]

>> I suggest something that simply iterates over the test data but
>> does not check for correctness of parsing.  Although it won't
>> make a fat lot of difference in this case at least it's then
>> consistent - you're timing the parsers not the tests for
>> equality.  The correctness test could then be done later once
>> the timings are complete.
>
> I should have done that from the beginning.  It would be best, I  
> think, to run through the data once for each test, to verify the  
> result.  Then, timing runs can be done without any checks.  Both  
> should run from main() each time so that any optimizations are  
> validated before considering the performance.

Sure.

>> Does the size of the test data set matter?  In other words do you
>> notice similar speedups if the test data will all fit in cache?
>
> Wouldn't that give less representative performance results?

I guess it depends what you're trying to represent when measuring the  
performance.    If you're trying to represent the [totally contrived]  
case of parsing short segments of text that all fit in cache surely  
it's better?

Or is your question more regarding the distribution of the actual  
values under test rather than the length?

My suggestion is merely the curiosity of seeing if there is a cache  
dependent effect or not rather than a 'representative' measurement.  
I don't for a minute think it's a good idea to permanently modify  
someone's hard thought through and exhaustive testing.

As a tongue-in-cheek aside - there's potentially some merit in having  
a parser that's particularly fast for numbers starting with "123".

http://en.wikipedia.org/wiki/Benford%27s_law

Regards,

-ed
------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Stewart, Robert :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Edward Grace wrote:

>
> >> Does the size of the test data set matter?  In other words do you
> >> notice similar speedups if the test data will all fit in cache?
> >
> > Wouldn't that give less representative performance results?
>
> I guess it depends what you're trying to represent when
> measuring the performance.  If you're trying to represent the
> [totally contrived] case of parsing short segments of text that
> all fit in cache surely it's better?

I was presuming to suggest that the more representative test was
that of parsing across the spectrum of possible inputs such code
likely will parse in a real finance program rather than
contriving small tests to focus on isolated characteristics.

> Or is your question more regarding the distribution of the
> actual values under test rather than the length?

I don't think the lengths will vary much beyond what appears in
the test input other than, possibly, the lack of really short
inputs.

> My suggestion is merely the curiosity of seeing if there is a
> cache dependent effect or not rather than a 'representative'
> measurement.  I don't for a minute think it's a good idea to
> permanently modify someone's hard thought through and
> exhaustive testing.

I was afraid that the test code might be permanently altered and
lose sight of what I think is a fairly representative set of
test inputs.

_____
Rob Stewart                           robert.stewart@...
Software Engineer, Core Software      using std::disclaimer;
Susquehanna International Group, LLP  http://www.sig.com

IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Jul 28, 2009 at 7:35 AM, Edward Grace<ej.grace@...> wrote:
> Perhaps you could try a more canonical test - running
> "ejg_uint_parser_0_0_4.cpp"
>
> http://tinyurl.com/lro5ok
>
> That does not make use of boost::bind - but tries to avoid the optimizer
> getting rid of void(void) functions by using globals.

I have no clue why I am still awake, brain is already shut down, code
is a blurry blob to me right now, but I still compiled and ran your
latest file.  :)

The results:

initializing input strings...
Calibrating overhead......done
Timer overhead (t_c) ~= : 12
Jitter               ~= : 8.43769e-015
qi_parse vs atoi    : 1381.05 1385.57 1389.6% faster.
qi_parse vs strtol  : 1365.82 1389.6 1597.94% faster.
strtol vs atoi      : 1.03202 1.04052 10.6428% faster.
qi_parse vs qi_parse : 0 0 0% faster.



Checking that the results are correct...
atoi is behaving itself!
strtol is behaving itself!
qi is behaving itself!


All done!
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>>> Wouldn't that give less representative performance results?
>>
>> I guess it depends what you're trying to represent when
>> measuring the performance.  If you're trying to represent the
>> [totally contrived] case of parsing short segments of text that
>> all fit in cache surely it's better?
>
> I was presuming to suggest that the more representative test was
> that of parsing across the spectrum of possible inputs such code
> likely will parse in a real finance program

Benford's law it is then... ;-)

> rather than
> contriving small tests to focus on isolated characteristics.

Sure.

> I was afraid that the test code might be permanently altered and
> lose sight of what I think is a fairly representative set of
> test inputs.

No - no not at all... I fully understand your concern -- I'm just  
being curious!

-ed
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>> That does not make use of boost::bind - but tries to avoid the  
>> optimizer
>> getting rid of void(void) functions by using globals.
>
> I have no clue why I am still awake, brain is already shut down, code
> is a blurry blob to me right now, but I still compiled and ran your
> latest file.  :)

Thanks.

> The results:
>
> initializing input strings...
> Calibrating overhead......done
> Timer overhead (t_c) ~= : 12
> Jitter               ~= : 8.43769e-015
> qi_parse vs atoi    : 1381.05 1385.57 1389.6% faster.
> qi_parse vs strtol  : 1365.82 1389.6 1597.94% faster.
> strtol vs atoi      : 1.03202 1.04052 10.6428% faster.
> qi_parse vs qi_parse : 0 0 0% faster.

Hmmm....  So you observe the same as me.  Spirit2 ~1000% faster than  
the basic libraries as opposed to 25% faster from the previous tests.

"Curiouser and curiouser" said Alice.

-ed
------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Jul 28, 2009 at 8:49 AM, Edward Grace<ej.grace@...> wrote:

>>> That does not make use of boost::bind - but tries to avoid the optimizer
>>> getting rid of void(void) functions by using globals.
>>
>> I have no clue why I am still awake, brain is already shut down, code
>> is a blurry blob to me right now, but I still compiled and ran your
>> latest file.  :)
>
> Thanks.
>
>> The results:
>>
>> initializing input strings...
>> Calibrating overhead......done
>> Timer overhead (t_c) ~= : 12
>> Jitter               ~= : 8.43769e-015
>> qi_parse vs atoi    : 1381.05 1385.57 1389.6% faster.
>> qi_parse vs strtol  : 1365.82 1389.6 1597.94% faster.
>> strtol vs atoi      : 1.03202 1.04052 10.6428% faster.
>> qi_parse vs qi_parse : 0 0 0% faster.
>
> Hmmm....  So you observe the same as me.  Spirit2 ~1000% faster than the
> basic libraries as opposed to 25% faster from the previous tests.
>
> "Curiouser and curiouser" said Alice.

Heh, when my brain is functional again, if I remember (or if I am
reminded by this email) I will run it through a profiler and see what
is going on as well as looking at the disassembly to see what kind of
code it all creates.

It is definitely odd though, if Spirit2.1 really is faster, then you
would think the compiler library programmers would do a better job
then what they are currently doing.  I am really curious to find out
what is going on, hopefully delving into assembly later will shed some
light.
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>
> Heh, when my brain is functional again, if I remember (or if I am
> reminded by this email) I will run it through a profiler and see what
> is going on as well as looking at the disassembly to see what kind of
> code it all creates.

That would be very instructive - not something I'm really much good  
at anymore....
>


Cheers,

-ed

------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Jul 28, 2009 at 9:00 AM, Edward Grace<ej.grace@...> wrote:
>>
>> Heh, when my brain is functional again, if I remember (or if I am
>> reminded by this email) I will run it through a profiler and see what
>> is going on as well as looking at the disassembly to see what kind of
>> code it all creates.
>
> That would be very instructive - not something I'm really much good at
> anymore....

I used to do it all the time, editing disassembly to make programs do
other things was always fun.  :)
Been a while though.

I do not think I have the cognitive capabilities to be able to read
assembler at the moment, but I went ahead and profiled the
application, results are attached to preserve formatting.

CS:EIP   Symbol + Offset                                                                                     64-bit Total % Timer samples
0x409dd0 ejg::statistics::ks_test<std::_Vector_iterator<double,std::allocator<double> >,double>                   12.37   11661        
0x40a400 std::_Unguarded_partition<double *>                                                                       9.74     9183          
0x40d330 strtoxl                                                                                                   9.71     9157          
0x408f40 ejg::generic_timer<_LARGE_INTEGER,long>::measure_execution_result<_LARGE_INTEGER (__cdecl*)(void)>       3.75     3533          
0x401a50 wrap_qi_parse                                                                                             3.47     3272          
0x40aa80 std::_Insertion_sort1<double *,double>                                                                   0.94     884          
0x401910 wrap_atoi                                                                                                 0.66     625          
0x40d55b strtol                                                                                                   0.6     569          
0x4019b0 wrap_strtol                                                                                               0.43     401          
0x40d2ae _LocaleUpdate::_LocaleUpdate                                                                             0.39     364          
0x40d823 atol                                                                                                     0.21     194          
0x40a7f0 std::_Median<double *>                                                                                   0.18     174          
0x4099d0 std::_Sort<double *,int>                                                                                 0.08     77            
0x413db4 TrailUpVec                                                                                               0.05     47            
0x401900 elapsed                                                                                                   0.03     27            
0x409d20 ejg::generic_timer<_LARGE_INTEGER,long>::time_raw_atom<void (__cdecl*)(void)>                             0.03     27            
0x40d834 atoi                                                                                                     0.03     26            
0x40d839 memmove_s                                                                                                 0.03     25            
0x413c40 memmove                                                                                                   0.02     19            
0x4018e0 getticks                                                                                                 0.01     14            
0x40a6d0 ejg::statistics::qks<double>                                                                             0.01     6            
0x413ee4 UnwindDownVec                                                                                             0.01     8            
0x4203fa __libm_sse2_exp                                                                                           0.01     5            
0x4068d0 std::vector<double,std::allocator<double> >::_Construct_n                                                 0       1            
0x40cc1c clock                                                                                                     0       3            
0x40cf80 ceil                                                                                                     0       3            
0x40cfc0 _ceil_pentium4                                                                                           0       2            
0x411970 _aulldiv                                                                                                 0       2            
0x4119e0 _allmul                                                                                                   0       1            
0x413d48 UnwindUpVec                                                                                               0       2            
0x413f50 TrailDownVec                                                                                             0       4            

31 functions, 718 instructions, Total: 40316 samples, 100.00% of samples in the module, 42.76% of total session samples
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Jul 28, 2009 at 9:05 AM, OvermindDL1<overminddl1@...> wrote:

> On Tue, Jul 28, 2009 at 9:00 AM, Edward Grace<ej.grace@...> wrote:
>>>
>>> Heh, when my brain is functional again, if I remember (or if I am
>>> reminded by this email) I will run it through a profiler and see what
>>> is going on as well as looking at the disassembly to see what kind of
>>> code it all creates.
>>
>> That would be very instructive - not something I'm really much good at
>> anymore....
>
> I used to do it all the time, editing disassembly to make programs do
> other things was always fun.  :)
> Been a while though.
>
> I do not think I have the cognitive capabilities to be able to read
> assembler at the moment, but I went ahead and profiled the
> application, results are attached to preserve formatting.
>

Well, atoi's assembly is a function call, no inlining there...

And strtol in assembly is a function call too, no inlining or anything fancy.

And heh, Spirit is completely inlined except for the call to
boost::spirit::qi::detail::extract_int<int,10,1,-1,etc...
However that function is only called the very first time that function
is called.  Remember that Spirit increments the passed in iterator, so
all the start iterators in that vector ended up being incremented to
be equal to the last iterator.  :p
I change the wrap_qi_parse function to this (introduced a temporary so
the temporary is incremented instead of the thing you have stored in
the vector):
void wrap_qi_parse() {
  for (int i = 0; i < BUFFER_SIZE; ++i)
  {
          char const *iter = f[i];
    qi::parse(iter, l[i], int_, v[i]);
  }

}

When I compile and run the tests now, I get this:
initializing input strings...
Calibrating overhead......done
Timer overhead (t_c) ~= : 12
Jitter               ~= : 8.43769e-015
qi_parse vs atoi    : 170.429 170.438 170.482% faster.
qi_parse vs strtol  : 167.589 167.601 167.668% faster.
strtol vs atoi      : 1.04669 1.05746 1.06165% faster.
qi_parse vs qi_parse : 0 0 0% faster.



Checking that the results are correct...
atoi is behaving itself!
strtol is behaving itself!
qi is behaving itself!


All done!

That is a *lot* more reasonable, although Spirit is still most
definitely faster then the built-in functions.  :)

Heh, what do you know, my shut down brain still had some living brain
cells.  I have actually found that I, oddly, do my best debugging when
really tired, at least when I was younger, guess I still do.  :)
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>>>
> And heh, Spirit is completely inlined except for the call to
> boost::spirit::qi::detail::extract_int<int,10,1,-1,etc...
> However that function is only called the very first time that function
> is called.  Remember that Spirit increments the passed in iterator, so
> all the start iterators in that vector ended up being incremented to
> be equal to the last iterator.  :p

When you write the above, do you mean ** -> my change

> However that function is only called the very first time *qi::parse*
> is called.  Remember that Spirit increments the passed in iterator, so
>


In other words only the first call does anything - repeated calls to  
qi::parse do nothing because the iterator is already at the end?

Good catch.

> I change the wrap_qi_parse function to this (introduced a temporary so
> the temporary is incremented instead of the thing you have stored in
> the vector):
> void wrap_qi_parse() {
>   for (int i = 0; i < BUFFER_SIZE; ++i)
>   {
>  char const *iter = f[i];
>     qi::parse(iter, l[i], int_, v[i]);
>   }
>
> }
>
> When I compile and run the tests now, I get this:
> initializing input strings...
> Calibrating overhead......done
> Timer overhead (t_c) ~= : 12
> Jitter               ~= : 8.43769e-015
> qi_parse vs atoi    : 170.429 170.438 170.482% faster.
> qi_parse vs strtol  : 167.589 167.601 167.668% faster.
> strtol vs atoi      : 1.04669 1.05746 1.06165% faster.
> qi_parse vs qi_parse : 0 0 0% faster.
>
>
>
> Checking that the results are correct...
> atoi is behaving itself!
> strtol is behaving itself!
> qi is behaving itself!

Good stuff.  Thanks.

I might just re-jig that to try and make use of boost::bind instead  
of that bad roll-your-own wrapper function.  What's needed is a 'fire  
and forget' wrapping technique that can be applied to arbitrary  
functions and still give sensible answers.

> That is a *lot* more reasonable, although Spirit is still most
> definitely faster then the built-in functions.  :)

That's good though - one up for Boost!

> Heh, what do you know, my shut down brain still had some living brain
> cells.  I have actually found that I, oddly, do my best debugging when
> really tired, at least when I was younger, guess I still do.  :)

It's amazing what the promise of imminent sleep can do to motivate a  
battered cerebral cortex.  It's only playing along because it knows  
it's the only way it'll get to rest!

-ed
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> When I compile and run the tests now, I get this:
> initializing input strings...
> Calibrating overhead......done
> Timer overhead (t_c) ~= : 12
> Jitter               ~= : 8.43769e-015
> qi_parse vs atoi    : 170.429 170.438 170.482% faster.
> qi_parse vs strtol  : 167.589 167.601 167.668% faster.
> strtol vs atoi      : 1.04669 1.05746 1.06165% faster.
> qi_parse vs qi_parse : 0 0 0% faster.

Hi OvermindDL1,

When you've woken up would you mind taking a quick squiz at the  
following (anyone else - please feel free)

   ejg_uint_parser_0_0_4_bind_1.cpp

in the following part of Boost Vault, http://tinyurl.com/lro5ok

It's an attempt to crystallise the boo-boo you pointed out.    I've  
tried to do everything without ghastly global variables - it's also a  
salient lesson on const correctness.  If that'd been observed in the  
first place the iterator cock-up wouldn't have happened.


==================
$ ./ejg_uint_parser
Enter buffer size: 10000
initializing input strings...



Checking that the parsers are functioning  correctly...
atoi is behaving itself!
strtol is behaving itself!
qi is behaving itself!

Proceeding to timing tests.Calibrating overhead......done
Timer overhead (t_c) ~= : 117.426
Jitter               ~= : 25.9133
qi_parse vs atoi    : 86.0764 86.3074 86.4471% faster.
qi_parse vs strtol  : 71.9253 72.1881 72.5288% faster.
strtol vs atoi      : 8.0502 8.26097 8.47215% faster.
qi_parse vs qi_parse : -0.0274542 0.0393936 0.231944% faster.


All done!
====================


On my platform this is entirely consistent with the simple one-liner  
modification you mentioned to the previous code.

Take home message - yes Spirit really *is* faster.

------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Edward Grace wrote:

>> That is a *lot* more reasonable, although Spirit is still most
>> definitely faster then the built-in functions.  :)
>
> That's good though - one up for Boost!

My latest benchmarks for integers and floating points reveal
a 3x speed over atoi/strtol and friend C functions. You
mentioned a need to parse small numbers very quickly? Spirit
does. The tests I have take that into consideration too. If you
guys want to take a peek, it's in the Boost trunk in libs/benchmarks.

Some numbers:

///////////////////////////////////////////////////////////////////////////
atoi_test:            0.9265067422 [s] {checksum: d5b76d60}
strtol_test:          1.0766213977 [s] {checksum: d5b76d60}
spirit_int_test:      0.3097019879 [s] {checksum: d5b76d60}

///////////////////////////////////////////////////////////////////////////
atof_test:            7.3012049917 [s] {checksum: 3b7d82b0}
strtod_test:          8.0042894122 [s] {checksum: 3b7d82b0}
spirit_double_test:   2.6729373333 [s] {checksum: 3b7d82b0}

This time, I am using the benchmarking harness by David Abrahams,
Matthias Troyer, Michael Gauckler. It's the one used in timing
the overhead of Boost.Parameter. You are welcome to try it out
with your timer, Edward.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On 28 Jul 2009, at 18:17, Joel de Guzman wrote:

> Edward Grace wrote:
>
>>> That is a *lot* more reasonable, although Spirit is still most
>>> definitely faster then the built-in functions.  :)
>>
>> That's good though - one up for Boost!
>
> My latest benchmarks for integers and floating points reveal
> a 3x speed over atoi/strtol and friend C functions. You
> mentioned a need to parse small numbers very quickly? Spirit
> does. The tests I have take that into consideration too. If you
> guys want to take a peek, it's in the Boost trunk in libs/benchmarks.

Sure.  Can you give me an exact url?  The last time I went on a code  
hunt in SVN I found the wrong thing.

> Some numbers:
>
> //////////////////////////////////////////////////////////////////////
> /////
> atoi_test:            0.9265067422 [s] {checksum: d5b76d60}
> strtol_test:          1.0766213977 [s] {checksum: d5b76d60}
> spirit_int_test:      0.3097019879 [s] {checksum: d5b76d60}
>
> //////////////////////////////////////////////////////////////////////
> /////
> atof_test:            7.3012049917 [s] {checksum: 3b7d82b0}
> strtod_test:          8.0042894122 [s] {checksum: 3b7d82b0}
> spirit_double_test:   2.6729373333 [s] {checksum: 3b7d82b0}
>
> This time, I am using the benchmarking harness by David Abrahams,
> Matthias Troyer, Michael Gauckler.

This?

http://tinyurl.com/kk858o

There's some interesting trickery in there by the looks of things for  
eliminating the optimiser nastiness - that's not something I've  
thought about much I'll take a look.

In the comments,

42 // operation to at least update the L1 cache.  *** Note: This
43 // concern is specific to the particular application at which
44 // we're targeting the test. ***

that seems quite important but a little opaque out of context.

One thing I take exception to is the (effective) use of the mean as a  
measurement of central tendency - perhaps their trickery has  
eliminated the heavy tail.  I'll have to take a look and see how it  
compares to my approach.

How do your relative timings compare if you repeat them while (say)  
watching a DVD? [*]

> It's the one used in timing
> the overhead of Boost.Parameter. You are welcome to try it out
> with your timer, Edward.

Likewise, can you be specific about where to look?

Thanks for flagging this up by the way - who knows maybe I've been  
reinventing the wheel!

Cheers,

-ed

[*] This may seem a perverse question. I'm interested in robust  
performance measurement, in other words accurately working out which  
function is fastest while the machine is under choppy loading -- not  
a sanitised testing environment - so the fastest function can be  
selected by the code itself.

------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by David Abrahams-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


on Tue Jul 28 2009, Edward Grace <ej.grace-AT-imperial.ac.uk> wrote:

> This?
>
> http://tinyurl.com/kk858o
>
> There's some interesting trickery in there by the looks of things for eliminating the
> optimiser nastiness - that's not something I've  thought about much I'll take a look.
>
> In the comments,
>
> 42 // operation to at least update the L1 cache.  *** Note: This
> 43 // concern is specific to the particular application at which
> 44 // we're targeting the test. ***
>
> that seems quite important but a little opaque out of context.

It means the application we were going to use this technique on was
going to update a long series of accumulators, which will certainly not
fit in a few registers.  We wanted to force the test to reflect that
reality.

> One thing I take exception to is the (effective) use of the mean as a
> measurement of central tendency

Why?

> - perhaps their trickery has eliminated the heavy tail.

Why would one assume there is a heavy tail in the first place?

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Paul Baxter :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'm really appreciative of the testing you are all doing with parsing code
and hope that at the end we can see both how fast and how clear and
maintainable the various styles of parser code become (focus on accuracy,
speed then perhaps more on the clarity).

Its great to see the enthusiasm and results. Looking forward to more tips
once we have a fair test directly pitting spirit with expressive and other
parsing methodologies.

I also am encouraged by what Edward's timer might add though I'm a little
wary of possible inclusion of code following FFTW's license as it may be
incompatible with Boost.

Edward, I do wonder about the statistical significance metrics you provide
(great idea by the way).
I'm wondering if your code assumes a normal timing duration distribution and
if so, can it 'measure' whether the distribution of results conform to this
assumption.

In my experience, timing benchmarks can suffer from having outliers (usually
OS-induced by not having a RTOS) that make the distribution less than
'normal'.

Would it be possible to establish that the given sample set of timings
correspond to a normal distribution (or perhaps discard a certain percentage
of outliers if necessary). I'm no statistics person, but I have seen cases
where 10,000 timing samples have been biased by 10 samples that probably
relate to a program issue or a severe OS issue. e.g. normally 1 ms per
iteration, 10 @ 1-10s each

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>>
>> 42 // operation to at least update the L1 cache.  *** Note: This
>> 43 // concern is specific to the particular application at which
>> 44 // we're targeting the test. ***
>>
>> that seems quite important but a little opaque out of context.
>
> It means the application we were going to use this technique on was
> going to update a long series of accumulators, which will certainly  
> not
> fit in a few registers.  We wanted to force the test to reflect that
> reality.

Ah, I did some hunting and found things for the iterative calculation  
of the mean - presumably this is an embodiment of the original  
application.

How various caches interact is not something I've given a great deal  
of thought to.  In practice it can presumably make or break the real  
'speed' of something.  I can see that there's a great difference in  
timing a large bunch of iterations of the same thing one after  
another and the actual typical execution characteristics of the same  
function in a different context.

This kind of thing is something I have to shrug my shoulders at  - I  
can see it's a potential issue but don't know enough to comment.

>> One thing I take exception to is the (effective) use of the mean as a
>> measurement of central tendency
>
> Why?

The mean is not a robust measure of central tendency.  Robust, in  
this context, means that a single arbitrarily large observation can  
shift the estimator (the mean in this case) arbitrarily far.

The median on the other hand is robust, a single arbitrarily large  
observation may not move the median at all.

Technically the 'breakdown point' of the mean is zero. For the  
obligatory Wikipedia overview see:

   http://tinyurl.com/l4vldr

>> - perhaps their trickery has eliminated the heavy tail.
>
> Why would one assume there is a heavy tail in the first place?

Informally speaking one may expect the OS to very-very occasionally  
interrupt a running function that nominally takes (say) 100 cycles  
and run of to do some disk IO.  That delay may will run into hundreds  
of millions of clock cycles.  Consequently that observation may  
differ from all the others by (say) six orders of magnitude.

Most probably of course that will not occur; that's a long tail -  
very rare events of extreme magnitude.  A bit like returns in stock  
market crashes only less frequent. ;-)

Less informally I've measured it!  On a log-log histogram plot of  
~10^7 measurements even trivial functions display a definite hint of  
Pareto-like [ http://tinyurl.com/ngnyk2 ] behaviour in the wings.

-ed


------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Edward Grace :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>
Paul,

[Excuse the following - perhaps it's an attack of  logorrhea ]

> I also am encouraged by what Edward's timer might add though I'm a  
> little
> wary of possible inclusion of code following FFTW's license as it  
> may be
> incompatible with Boost.

I'm pretty sure the FFTW license is Boost compatible (judging by the  
comments in the header and the BSD-like adoption of FFTW in many  
commercial and open source projects) however my aim is for a robust  
performance framework that is chronometer agnostic - there's no need  
to use FFTW's timer.

The getticks() macro just happens to be a handy cross platform  
implementation!

There are other concerns that I need to address regarding the  
licensing and provenance of this code. For now I am simply keen to  
see if it works for other people and in other real world situations -  
hence the application to the boost::spirit / xpressive debate.

After all, you can read books on nonparametric statistical inference  
until you are paralysed with inaction - the only real way to know if  
it works is to try it and get other people involved and apply it in  
real-world situations.

 From that I'm sure I will learn a lot and then be in a situation to  
weed out unnecessary cruft (e.g. bootstrap resampling) and add useful  
stuff (potentially the cache related code mentioned earlier).  After  
that it's a case of reviewing everything that's left after I've done  
my weeding and making sure it's  appropriate.

> Edward, I do wonder about the statistical significance metrics you  
> provide
> (great idea by the way).
> I'm wondering if your code assumes a normal timing duration  
> distribution and
> if so, can it 'measure' whether the distribution of results conform  
> to this
> assumption.

It makes few assumptions about any underlying probability density  
function (not even that it's strongly stationary) since it's based on  
nonparametric techniques.

Key to this is that, while the underlying distribution may not be  
normal or even known, the distribution of certain statistics  
estimated from a sample *is* well known.

For example, the median is asymptotically normal (with certain  
constraints on the PDF).  In other words if you take a large sample  
and calculate the median - then repeat this ad nauseam, the resulting  
medians will be approximately normally distributed.

For a concrete example this is put to use in the calculation of the  
confidence bounds - by using the Wilcoxon signed-rank test for which  
the W+ statistic has a well known distribution.  We can therefore  
determine if one function is faster than another in a (hopefully)  
statistically significant way.

> In my experience, timing benchmarks can suffer from having outliers

I don't view these as 'outliers' in the normal usage of 'measurement  
error' - they are measurements that are just as valid as any other -  
one just has to ask "what is this measuring - the function under test  
or the OS?".  Often it's the latter but there is (as I see it) no  
clear border-line between classifying any given measurement as one  
and not the other. Consequently I view 'throwing away' (trimming)  
measurements with great suspicion.

> (usually
> OS-induced by not having a RTOS)

RTOS? Real-time OS??  Do these eliminate this effect?

> that make the distribution less than
> 'normal'.

The distribution would not be normal even in principle.  It must have  
a well defined minimum, but (in principle) has no upper bound - the  
(gross) lack of symmetry rules out normality in a heartbeat!

> Would it be possible to establish that the given sample set of timings
> correspond to a normal distribution (or perhaps discard a certain  
> percentage
> of outliers if necessary).

It is possible to determine if the sample corresponds to a known  
distribution - the Kolmogorov-Smirnov test. However which one should  
we check?  There is no fundamental requirement for the measurements  
to belong to any family - even vaguely.  In fact it may not even be  
smooth!

> I'm no statistics person, but I have seen cases
> where 10,000 timing samples have been biased by 10 samples that  
> probably
> relate to a program issue or a severe OS issue. e.g. normally 1 ms per
> iteration, 10 @ 1-10s each

Yes - that issue has a dramatic impact on the robustness of (or lack  
thereof) of the mean.

Often of course these things are even vaguely IID (independent and  
identically distributed) - burst like behaviour can lead to strongly  
non-stationary behaviour.  For example some garbage collector might  
go nuts and skew your timing.  That's why, in my view, it's so  
important to compare relative timings and to obtain confidence bounds.

Pragmatically one might argue that when you're doing these things at  
the console during development and making decisions it's not really a  
major issue.  One can look at the numbers and say "hmm, fishy - I  
don't trust that" and tinker around until the result is in line with  
your expectation.  That's part of the art that is experimental  
science and many of us do it everyday without even realising.

How to codify that opinion into something that's autonomous and  
capable of e.g. selecting the 'correct' function in a high-load multi-
user environment in the real world is far from clear however.

That is my goal.

-ed
------------------------------------------------
"No more boom and bust." -- Dr. J. G. Brown, 1997

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Jul 28, 2009 at 11:09 AM, Edward Grace<ej.grace@...> wrote:

>> When I compile and run the tests now, I get this:
>> initializing input strings...
>> Calibrating overhead......done
>> Timer overhead (t_c) ~= : 12
>> Jitter               ~= : 8.43769e-015
>> qi_parse vs atoi    : 170.429 170.438 170.482% faster.
>> qi_parse vs strtol  : 167.589 167.601 167.668% faster.
>> strtol vs atoi      : 1.04669 1.05746 1.06165% faster.
>> qi_parse vs qi_parse : 0 0 0% faster.
>
> Hi OvermindDL1,
>
> When you've woken up would you mind taking a quick squiz at the following
> (anyone else - please feel free)
>
>  ejg_uint_parser_0_0_4_bind_1.cpp
>
> in the following part of Boost Vault, http://tinyurl.com/lro5ok
>
> It's an attempt to crystallise the boo-boo you pointed out.    I've tried to
> do everything without ghastly global variables - it's also a salient lesson
> on const correctness.  If that'd been observed in the first place the
> iterator cock-up wouldn't have happened.
>
>
> ==================
> $ ./ejg_uint_parser
> Enter buffer size: 10000
> initializing input strings...
>
>
>
> Checking that the parsers are functioning  correctly...
> atoi is behaving itself!
> strtol is behaving itself!
> qi is behaving itself!
>
> Proceeding to timing tests.Calibrating overhead......done
> Timer overhead (t_c) ~= : 117.426
> Jitter               ~= : 25.9133
> qi_parse vs atoi    : 86.0764 86.3074 86.4471% faster.
> qi_parse vs strtol  : 71.9253 72.1881 72.5288% faster.
> strtol vs atoi      : 8.0502 8.26097 8.47215% faster.
> qi_parse vs qi_parse : -0.0274542 0.0393936 0.231944% faster.
>
>
> All done!
> ====================
>
>
> On my platform this is entirely consistent with the simple one-liner
> modification you mentioned to the previous code.
>
> Take home message - yes Spirit really *is* faster.

Enter buffer size: 10000
initializing input strings...



Checking that the parsers are functioning  correctly...
atoi is behaving itself!
strtol is behaving itself!
qi is behaving itself!

Proceeding to timing tests.Calibrating overhead......done
Timer overhead (t_c) ~= : 12
Jitter               ~= : 8.43769e-015
qi_parse vs atoi     : 160.834 187.892 197.781% faster.
qi_parse vs strtol  : 152.088 173.709 197.184% faster.
strtol vs atoi       : 5.34019 7.29527 9.82952% faster.
qi_parse vs qi_parse : -3.12862 -0.194198 1.53912% faster.

All done!


MSVC definitely compiles templates code better then GCC it seems (you
said you were using GCC yes?).

Also, I think I might know why QI is faster then atoi/strtol.
atoi/strtol handle local as I recall, QI does not...
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

OvermindDL1 wrote:
>
> Also, I think I might know why QI is faster then atoi/strtol.
> atoi/strtol handle local as I recall, QI does not...

Sorry, I can't parse that. What do you mean by "handle local"?

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [xpressive] Performance Tuning?

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Jul 28, 2009 at 8:14 PM, Joel de
Guzman<joel@...> wrote:
> OvermindDL1 wrote:
>>
>> Also, I think I might know why QI is faster then atoi/strtol.
>> atoi/strtol handle local as I recall, QI does not...
>
> Sorry, I can't parse that. What do you mean by "handle local"?

Er, locale*
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
< Prev | 1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10 | Next >