STM and Spinning

View: New views
9 Messages — Rating Filter:   Alert me  

STM and Spinning

by alarmnummer :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Guys,

I'm working on a software transactional memory implementation and one
of the things I want to pick up is spinning and I'm looking for
advice/ideas.

With my main STM implementation if can happen that a load is done for
a specific version (the data loaded should have a version equal or
smaller than the load version) even though the write has not happened
yet. When the load detects that it can't determine if the version the
transaction is looking for has been written, it aborts the transaction
and the transaction is retried. If this problem is not detected, the
STM could suffer from isolation problems (not seeing changes made by
transactions completed before it) and we don't want that. With
databases this particular problem is called the lost update.

The problem of aborting is that a lot of work has been executed for
nothing. To lower the chance that a transaction aborts with this
failure, I want to add spinning: just keep repeat reading until the
load can determine if the current version is the correct one (or not).
Since this ambiguous period is very short, spinning could be a light
weight solution.

The simplest approach that comes to mind is some form of bounded
spinning. But what kind of value should be used for the number of
iterations? 10, 100, 1000?

Or should it be more flexible like adaptive spinning used on the
intrinsic lock? If yes, how to design such a component? And how to
prevent it causing overhead?

And is there no better way? What about the (deprecated) Thread.yield
method? The advantage is that other transactions have more cpu-room,
so are more likely to complete sooner. And the sooner other
transactions complete, the sooner the load can complete.

ps:
using a lock in combination with a wait set is to expensive here.
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Concurrent Puts on HashMap on different key sets.

by Rohit Reja-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello,

Can I do concurrent puts on a HashMap safely if different threads operate on different key sets? What kind of issues can I expect to encounter?

Thanks,
Rohit


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Concurrent Puts on HashMap on different key sets.

by tpeierls :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

No, you cannot. HashMap is not at all thread-safe. Examples of what might go wrong, not in any way exhaustive: Two keys from different key sets might hash to the same bucket, or the bucket array might be resized in one thread while you're putting a value in another.

Just use ConcurrentHashMap.

--tim

On Fri, Aug 28, 2009 at 9:32 AM, Rohit Reja <rreja2000@...> wrote:
Hello,

Can I do concurrent puts on a HashMap safely if different threads operate on different key sets? What kind of issues can I expect to encounter?

Thanks,
Rohit


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Parent Message unknown Re: (no subject)

by tpeierls :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Did you mean to post this question to concurrency-interest?

--tim

On Fri, Aug 28, 2009 at 11:39 AM, George Kovoor <youhumbleme@...> wrote:

Hi,
I would like to know why we create 8 times more tasks than the number of worker threads in Parallel Array, apart from exploiting work-stealing technique does it provide any other benefits. Is there any results that can prove selection of 8 times more task would be beneficial .

Thanks,
George.


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Concurrent Puts on HashMap on different key sets.

by Elias Ross :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Fri, Aug 28, 2009 at 7:07 AM, Tim Peierls <tim@...> wrote:
No, you cannot. HashMap is not at all thread-safe. Examples of what might go wrong, not in any way exhaustive: Two keys from different key sets might hash to the same bucket, or the bucket array might be resized in one thread while you're putting a value in another.

I have also experienced one thread hanging in an infinite loop on a put. An infinite loop is probably the worst possible thing to happen to your application, as it not only hangs one thread but it will use most of the CPU resources.

java.util.Hashtable is not safe to iterate over without locking. A college said that it was, since you won't see ConcurrentModificationException thrown from the Enumerator when iterating over the keys and values, but I can't imagine that it is.


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Parent Message unknown Re: (no subject)

by tpeierls :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I personally don't know the answer to your question, but I'm sure
someone in the concurrency-interest mailing list could give some kind
of answer.

Also, check the source code for ParallelArray to see if there any
implementation notes that might help.

--tim

On 8/28/09, George Kovoor <youhumbleme@...> wrote:

> I am sorry, but if you get time would you please consider answering my
> question, I will be extremely grateful to you, this is for my MSc
> dissertation I have been doing an implementation based on Parallel Array so
> need to comment on the choice of selecting the multiplier as eight.
>
> Besides I have tried to post several questions to the group none showed up.I
> will post again.
>
> I sincerely apologies for the inconvenience caused.
>
> Thanks
> George
>
>
> On 28/08/2009 19:01, "Tim Peierls" <tim@...> wrote:
>
>> You aren't posting to a group at all. You are just sending to me, Tim
>> Peierls. I'm an individual, not a mailing list.
>>
>> --tim
>>
>> On 8/28/09, George Kovoor <youhumbleme@...> wrote:
>>> I would really appreciate an answer to my question,I apologies if I had
>>> posted this question to the wrong group.  Parallel Array I am referring
>>> to
>>> is in extra 166y  package. Please advise me if I need to post my question
>>> in
>>> any other group.
>>> George.
>>>
>>>
>>> On 28/08/2009 17:59, "Tim Peierls" <tim@...> wrote:
>>>
>>>> Did you mean to post this question to concurrency-interest?
>>>>
>>>> --tim
>>>>
>>>> On Fri, Aug 28, 2009 at 11:39 AM, George Kovoor
>>>> <youhumbleme@...>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>> I would like to know why we create 8 times more tasks than the number
>>>>> of
>>>>> worker threads in Parallel Array, apart from exploiting work-stealing
>>>>> technique does it provide any other benefits. Is there any results that
>>>>> can
>>>>> prove selection of 8 times more task would be beneficial .
>>>>>
>>>>> Thanks,
>>>>> George.
>>>>
>>>>
>>>
>>>
>
>
>
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Automating FJ task granularity control

by Doug Lea :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

George Kovoor <youhumbleme@...> wrote:

[I think you may have initially sent this to the list
administration address (which discards such things).
It seems not have been posted.]

> I would like to know why we create 8 times more tasks than the number of
> worker threads in Parallel Array,

Generating approximately 8 * #workers leaf tasks is one of a few
ways to balance expected mean throughput versus its expected
variance. Since this issue is likely to come up more often
as more people use FJ, here's a rundown of some of the
tradeoffs involved in automating granularity control.

In array-based processing, if you knew that all subdivided tasks
took exactly the same time to execute, that all
CPUS/cores/workers were continuously available (and not busy in
unrelated user or system processing) and that there were no
time costs in getting subtasks to the workers or joining them
upon completion, you could statically partition, generating only
one leaf task per worker (i.e., with each processing a subrange
of #elements/#workers). But in production settings, none of
these ideal assumptions are likely to hold.

Generating more (finer-granularity) tasks enables better load
balancing across violations of ideal assumptions, at the cost of
greater task overhead -- using extra tasks decreases expected
variance but also decreases expected mean throughput.  Within
reasonable bounds, task overhead is relatively low in FJ, and
the impact on throughput is less than the impact on variance.
So, using more tasks pays off as long as you stay within small
constant multiples of ideal loads, like 4, 8, or 16.  (Using
powers of two matches up with divide-and-conquer partitioning.)
On the other hand, using large multiples, or at the extreme
creating as many tasks as there are elements, will encounter
non-linearities in overhead for large arrays, mainly because of
the resulting need for more "full" garbage collections (which
among other effects may entail stopping all threads).

An alternative approach to automating decomposition thresholds is
queue-sensing.  It requires a bit more dynamic overhead but
tends to further reduce expected variance, and applies even to
those problems for which you do not know ahead of time how much
total work will be encountered.  The queue-sensing approach
starts off with three observations:

1. All other things being equal, throughput decreases with the
total time any worker is idle.

2. Workers are idle if they have no tasks of their own and
cannot steal any.

3. In a divide-and-conquer computation, initially and again upon
completion, only one worker has any tasks.

Given only these, you'd maximize throughput by ensuring that
those workers that do have any work generate enough subtasks for
others to steal. If that were the only constraint, then you
would decompose each task as finely as possible, providing the
maximal number to steal.  However, using more tasks takes more
time (and space). At the extreme, if task overhead costs were
great enough, then each worker should maintain on average,
only 1/#workers tasks (i.e., less than one), making available
only one stealable task in the entire pool.

So the optimum number of tasks that each worker should try to
make available for others to steal must lie between 1/#workers
and "as many as possible".  To narrow down, you could try
capturing the various non-ideal progress and overhead properties
mentioned above using probabilistic models, including those
accounting for the fact that workers cannot exactly maintain any
given target number of available tasks, but can only decide to
generate a task or not at particular points of their
execution. If you try modeling this, and/or perform systematic
empirical comparisons, the best answer is normally to choose a
small constant target value. Choosing 2, 3, or 4 more tasks than
there are workers that might try to steal them balances expected
mean and variance.  You can prove this more rigorously for
particular models and mean-vs-variance criteria, but this
doesn't seem very useful. The assumptions needed to model
program contexts are themselves too varied and messy to take the
resulting models very seriously.  The main observation is that
using a small fixed constant suffices across a range of such
choices.  So, in divide-and-conquer programs where you have no
further information, choosing to subdivide if
getSurplusQueuedTaskCount is greater than a small constant value
(like 3), is as good a criterion as any I know. (Although it
does seem likely that other good approximation strategies will
emerge as well, including those that take placement and memory
locality into account, which may in turn require future
enhancements in the FJ framework to efficiently support.)

On the other hand, while ForkJoinTask.getSurplusQueuedTaskCount
is pretty cheap (in part because it internally attempts
to only approximate reality), it is a bit
more expensive than just pre-determining cutoffs.  For
array-based tasks, where you do know max total work (via
#elements), it is a close call in practice whether to use
queue-sensing vs the constant-multiple-of-ideal strategy.  The
ParallelArray support classes have at various times used both,
or combinations of them, with little observable impact.

Analogous techniques apply to non-divide-and-conquer FJ
computations. For an application to graph algorithms (spanning
trees etc), see the paper we wrote about batching/aggregation
for a version of FJ underlying X10, at
   http://gee.cs.oswego.edu/dl/papers/icpp08.pdf

-Doug

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Concurrent Puts on HashMap on different key sets.

by JamesGan :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

The worst thing in my mind is data race. If it happens, your customer might get wrong result without any prompt. It takes days to find such kind of bugs without an adequate tool.

On Sat, Aug 29, 2009 at 1:57 AM, Elias Ross <genman@...> wrote:

On Fri, Aug 28, 2009 at 7:07 AM, Tim Peierls <tim@...> wrote:
No, you cannot. HashMap is not at all thread-safe. Examples of what might go wrong, not in any way exhaustive: Two keys from different key sets might hash to the same bucket, or the bucket array might be resized in one thread while you're putting a value in another.

I have also experienced one thread hanging in an infinite loop on a put. An infinite loop is probably the worst possible thing to happen to your application, as it not only hangs one thread but it will use most of the CPU resources.

java.util.Hashtable is not safe to iterate over without locking. A college said that it was, since you won't see ConcurrentModificationException thrown from the Enumerator when iterating over the keys and values, but I can't imagine that it is.


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest




--
Best Regards
James Gan
Current Project: Concurrent Building Block at http://amino-cbbs.sourceforge.net/
Blog: http://ganzhi.blogspot.com

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: STM and Spinning

by jason marshall-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I think in one of the Peyton-Jones interview videos, (if not in one of their papers), he mentions another option, which is to track what read operations lead to the abort, and not to restart the transaction until a commit succeeds on one of those fields.



On Sun, Aug 23, 2009 at 1:50 PM, Peter Veentjer <alarmnummer@...> wrote:
Hi Guys,

I'm working on a software transactional memory implementation and one
of the things I want to pick up is spinning and I'm looking for
advice/ideas.

With my main STM implementation if can happen that a load is done for
a specific version (the data loaded should have a version equal or
smaller than the load version) even though the write has not happened
yet. When the load detects that it can't determine if the version the
transaction is looking for has been written, it aborts the transaction
and the transaction is retried. If this problem is not detected, the
STM could suffer from isolation problems (not seeing changes made by
transactions completed before it) and we don't want that. With
databases this particular problem is called the lost update.

The problem of aborting is that a lot of work has been executed for
nothing. To lower the chance that a transaction aborts with this
failure, I want to add spinning: just keep repeat reading until the
load can determine if the current version is the correct one (or not).
Since this ambiguous period is very short, spinning could be a light
weight solution.

The simplest approach that comes to mind is some form of bounded
spinning. But what kind of value should be used for the number of
iterations? 10, 100, 1000?

Or should it be more flexible like adaptive spinning used on the
intrinsic lock? If yes, how to design such a component? And how to
prevent it causing overhead?

And is there no better way? What about the (deprecated) Thread.yield
method? The advantage is that other transactions have more cpu-room,
so are more likely to complete sooner. And the sooner other
transactions complete, the sooner the load can complete.

ps:
using a lock in combination with a wait set is to expensive here.
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



--
- Jason

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest