Updated version of futures library

View: New views
10 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 - 3 | Next >

Re: Updated version of futures library

by Peter Dimov-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Frank Mori Hess:
> I didn't mention it, but I was thinking of a scheduler picking method
> requests
> and dispatching them using a single thread.

Hm. Are you envisaging a concurrency model in which every active_function
runs its own scheduler thread? I probably need to take a look at the source
of libpoet.

>> wait_for_any( waiting_ );
>
> A scheduler which takes O(N) to dispatch a method request is considered
> inefficient.  N being the number of pending method requests in the
> scheduler.

The pseudocode isn't even O(N); it's O(NlnN) at best, maybe worse. :-) Do
note that it doesn't necessarily dispatch a single request per wait_for_any
call though.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Updated version of futures library

by Peter Dimov-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Frank Mori Hess:

>> wait_for_any( waiting_ );
>
> A scheduler which takes O(N) to dispatch a method request is considered
> inefficient.  N being the number of pending method requests in the
> scheduler.
> The call to wait_for_any alone is O(N), thus my motivation for using
> future_selector for a scheduler.

Using algorithmic complexity to evaluate the performance of a call that
blocks because you have nothing to do is not quite correct. Let's tidy up
the pseudocode a bit:

list< future<void> > deps_;

wait_for_any( deps_ );

for each in deps_:
    if ready() erase;

for each pending task:
    if all dependencies are ready()
        execute task

I believe that the "if all dependencies are ready" part corresponds to your
method_request::ready.

One can reasonably assume that wait_for_any will return early if it finds a
ready() future, bit even if it doesn't, we can optimize the loop to only
call wait_for_any when there was no activity on the last iteration.

So we only enter wait_for_any when none of deps_ are ready() and it blocks.
It doesn't really matter what the overhead of wait_for_any is (as long as it
isn't pathological) since it's usually going to block anyway. The worse the
overhead of wait_for_any, the less often we'll enter it. What matters, in
the end, is whether the rest of the scheduler can keep up with the outside
world submitting requests; if wait_for_any turns into a bottleneck, it
simply won't be called at all.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

[futures] operator< (was Re: Updated version of futures library)

by Frank Mori Hess :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 02 June 2008 18:12 pm, Peter Dimov wrote:
> The pseudocode isn't even O(N); it's O(NlnN) at best, maybe worse. :-) Do
> note that it doesn't necessarily dispatch a single request per wait_for_any
> call though.

Your pseudocode putting the futures in a set does bring up a missing feature
in both the proposals.  They don't provide an operator<() that is a strict
weak ordering so they can be used as keys.  A future_handle should have no
problem providing an ordering where two future_handles are equivalent if and
only if they both reference the same promise.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIRIYN5vihyNWuA4URAukHAJwNo5McItLrdn28lIq2cDEZR8UITQCfdqOt
alZDXfzr+2bV77IniW4LdNU=
=jzgI
-----END PGP SIGNATURE-----
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Updated version of futures library

by Frank Mori Hess :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 02 June 2008 19:17 pm, Peter Dimov wrote:

> Frank Mori Hess:
> >> wait_for_any( waiting_ );
> >
> > A scheduler which takes O(N) to dispatch a method request is considered
> > inefficient.  N being the number of pending method requests in the
> > scheduler.
> > The call to wait_for_any alone is O(N), thus my motivation for using
> > future_selector for a scheduler.
>
> Using algorithmic complexity to evaluate the performance of a call that
> blocks because you have nothing to do is not quite correct. Let's tidy up
> the pseudocode a bit:
>
> list< future<void> > deps_;
>
> wait_for_any( deps_ );
>
> for each in deps_:
>     if ready() erase;
>
> for each pending task:
>     if all dependencies are ready()
>         execute task
>
> I believe that the "if all dependencies are ready" part corresponds to your
> method_request::ready.
>
> One can reasonably assume that wait_for_any will return early if it finds a
> ready() future, bit even if it doesn't, we can optimize the loop to only
> call wait_for_any when there was no activity on the last iteration.
>
> So we only enter wait_for_any when none of deps_ are ready() and it blocks.
> It doesn't really matter what the overhead of wait_for_any is (as long as
> it isn't pathological) since it's usually going to block anyway. The worse
> the overhead of wait_for_any, the less often we'll enter it. What matters,
> in the end, is whether the rest of the scheduler can keep up with the
> outside world submitting requests; if wait_for_any turns into a bottleneck,
> it simply won't be called at all.

Okay, lets try ignoring the wait_for_any call.  Say you have on average N
method requests just sitting in your scheduler that are not ready, and have
on average M method requests that are ready to execute, and each method
request has on average P dependencies (I'd imagine P would probably be
small).  That means each pass through your pseudocode will take O(N*P) to
iterate through all the unready dependencies of the unready method requests.  
Then it will take O(M+N) to go through all the method requests and execute
the M ready tasks.  That results in O(N*P/M+1) per method request executed.  
Having M in the denominator is nice, as you would expect it to scale with N,
and it means the scheduler probably won't break down under heavy load.  

On the downside, in a common case I'd expect M to be small (between 0 and 1),
with performance concerns about the scheduler starting to come into play as M
approaches 1.  In my case, where the scheduler thread is executing the tasks
itself (as opposed to sending them to a thread pool for instance), if I were
in optimization mode I would start looking at breaking things up more finely
once M hit 1, to maximize use of available processors.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIRJbc5vihyNWuA4URAjs4AKCnkXU1L1I16IptutBfU30IIhL19wCePg5p
OQE5Qc7GW3032ITk8Ht36e0=
=Njyz
-----END PGP SIGNATURE-----
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Updated version of futures library

by Frank Mori Hess :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 02 June 2008 18:12 pm, Peter Dimov wrote:
> Frank Mori Hess:
> > I didn't mention it, but I was thinking of a scheduler picking method
> > requests
> > and dispatching them using a single thread.
>
> Hm. Are you envisaging a concurrency model in which every active_function
> runs its own scheduler thread? I probably need to take a look at the source
> of libpoet.

By default they do.  They can also share a scheduler, so you can have an
active object consisting of a bunch of active_functions all sharing one
scheduler.  That gives some automatic thread safety since all the user code
run by the active_functions sharing the scheduler is run in the same thread.  
If each active_function corresponds to a method on an underlying ordinary
object, then methods of the underlying object can all access and modify the
object's internal data without worrying about concurrent access, since they
will all be running in the same thread.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIRJjX5vihyNWuA4URAphWAJ9HgIFSOB7m5HZM7CKsq+rUUf/mCQCdFVO8
5BrfuXTZsOc07MX7uT7d7UM=
=nBzZ
-----END PGP SIGNATURE-----
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Updated version of futures library

by Peter Dimov-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Frank Mori Hess:
...

>> for each in deps_:
>>     if ready() erase;
>>
>> for each pending task:
>>     if all dependencies are ready()
>>         execute task
...

> Okay, lets try ignoring the wait_for_any call.  Say you have on average N
> method requests just sitting in your scheduler that are not ready, and
> have
> on average M method requests that are ready to execute, and each method
> request has on average P dependencies (I'd imagine P would probably be
> small).  That means each pass through your pseudocode will take O(N*P) to
> iterate through all the unready dependencies of the unready method
> requests.

The number of unready dependencies can be lower than N*P, but let's ignore
that for now.

> Then it will take O(M+N) to go through all the method requests and execute
> the M ready tasks.  That results in O(N*P/M+1) per method request
> executed.

I think that going through the method requests takes (N+M)*P, because of the
"all dependencies are ready" part. (I think that this is also true for your
scheduler.) So we have (2*N+M)*P total, or (2*N/M+1)*P per executed request.

> Having M in the denominator is nice, as you would expect it to scale with
> N,
> and it means the scheduler probably won't break down under heavy load.
>
> On the downside, in a common case I'd expect M to be small (between 0 and
> 1),
> with performance concerns about the scheduler starting to come into play
> as M
> approaches 1.

If the scheduler performance is insufficient, it will not be able to
maintain a low M. If it does manage to maintain a low M, then its
performance is sufficient.

Stated differently, in the additional N*P time taken by the first loop, some
of the futures will become ready that wouldn't otherwise, and the second
loop will execute more tasks, automatically diluting the overhead.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Updated version of futures library

by Peter Dimov-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I wrote that

> If the scheduler performance is insufficient, it will not be able to
> maintain a low M. If it does manage to maintain a low M, then its
> performance is sufficient.

Now that I think about it, this isn't true for the typical fork-join
parallelism.

http://gee.cs.oswego.edu/dl/papers/fj.pdf

If we have for example an active function fib(n) that returns
fib(n-1)+fib(n-2), it will quickly generate a large N. In this case however,
if the task queue is actually a stack, it can be reduced in a single
execution as the older tasks will become ready as soon as the newer tasks
are done. (But it will take logN to fill it. On second thought, with a FIFO
policy it will be generated in a single pass and then reduced in logN
passes. So it's basically the same.)

Not that fork-join parallelism makes much sense for a single-threaded active
object. :-)

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Updated version of futures library

by Frank Mori Hess-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Monday 02 June 2008 21:26, Peter Dimov wrote:

> The number of unready dependencies can be lower than N*P, but let's
> ignore that for now.

The big O means I don't care about constant factors :)

> > Then it will take O(M+N) to go through all the method requests and
> > execute the M ready tasks.  That results in O(N*P/M+1) per method
> > request executed.
>
> I think that going through the method requests takes (N+M)*P, because of
> the "all dependencies are ready" part.

I think some kind of optimization could prevent having to check all P of
each method request's dependencies at that stage, like if the earlier pass
through the dependencies incremented a count in a ready dependency's
associated method request.  Then only the count's value would need to be
checked.

> (I think that this is also true
> for your scheduler.)

Oh, the scheduler I implemented in libpoet is not optimal.  But with, for
example, a completion callback on futures, you could in principle
implement an O(P) scheduler.  P future complete signals to a method
request make it generate its ready signal, then the slot connected to the
method request's ready signal (which might have had an iterator to the
method request's location in a pending requests list bound to it when it
was connected) moves the request from the pending requests list into a
ready requests list.

So my goal is to make waiting with future_selector in this scenario be
O(P).

> So we have (2*N+M)*P total, or (2*N/M+1)*P per
> executed request.
>
> > Having M in the denominator is nice, as you would expect it to scale
> > with N,
> > and it means the scheduler probably won't break down under heavy load.
> >
> > On the downside, in a common case I'd expect M to be small (between 0
> > and 1),
> > with performance concerns about the scheduler starting to come into
> > play as M
> > approaches 1.
>
> If the scheduler performance is insufficient, it will not be able to
> maintain a low M. If it does manage to maintain a low M, then its
> performance is sufficient.
>
> Stated differently, in the additional N*P time taken by the first loop,
> some of the futures will become ready that wouldn't otherwise, and the
> second loop will execute more tasks, automatically diluting the
> overhead.
I'm not quite sure what you're driving at with the above 2 paragraphs.  Are
you satisfied with your pseudocode scheduler's performance, or no?


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

attachment0 (196 bytes) Download Attachment

Re: Updated version of futures library

by Frank Mori Hess-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Monday 02 June 2008 20:56, Frank Mori Hess wrote:
> Okay, lets try ignoring the wait_for_any call.  Say you have on average
> N method requests just sitting in your scheduler that are not ready, and
> have on average M method requests that are ready to execute, and each
> method request has on average P dependencies (I'd imagine P would
> probably be small).  That means each pass through your pseudocode will
> take O(N*P) to iterate through all the unready dependencies of the
> unready method requests.

Ah, I knew there was some mistake.  At the beginning of the loop, all N+M
method requests are "not ready" as far as the scheduler knows.  So that
adds another O(M*P) dependencies, giving O((N+M)*P) to iterate through
dependencies.

> Then it will take O(M+N) to go through all the
> method requests and execute the M ready tasks.  That results in
> O(N*P/M+1) per method request executed. Having M in the denominator is
> nice, as you would expect it to scale with N, and it means the scheduler
> probably won't break down under heavy load.

So we end up with O((N/M + 1)*P).


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

attachment0 (196 bytes) Download Attachment

Re: Updated version of futures library

by Peter Dimov-5 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Frank Mori Hess:

> I'm not quite sure what you're driving at with the above 2 paragraphs.
> Are you satisfied with your pseudocode scheduler's performance, or no?

I'm not sure, as it's hard to measure the performance of a pseudocode. :-)

To see how it fares, we need a benchmark based on a workload that is
representative for your intended use cases... one that cannot be trivially
rewritten to perform better using an alternative concurrency model. ;-)

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
< Prev | 1 - 2 - 3 | Next >