sx locks and memory barriers

View: New views
19 Messages — Rating Filter:   Alert me  

sx locks and memory barriers

by Fabio Checconi :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi all,
  looking at sys/sx.h I have some troubles understanding this comment:

 * A note about memory barriers.  Exclusive locks need to use the same
 * memory barriers as mutexes: _acq when acquiring an exclusive lock
 * and _rel when releasing an exclusive lock.  On the other side,
 * shared lock needs to use an _acq barrier when acquiring the lock
 * but, since they don't update any locked data, no memory barrier is
 * needed when releasing a shared lock.

In particular, I'm not understanding what prevents the following sequence
from happening:

CPU A CPU B

sx_slock(&data->lock);

sx_sunlock(&data->lock);

/* reordered after the unlock
   by the cpu */
if (data->buffer)
                                        sx_xlock(&data->lock);
                                        free(data->buffer);
                                        data->buffer = NULL;
                                        sx_xunlock(&data->lock);

        a = *data->buffer;

IOW, even if readers do not modify the data protected by the lock,
without a release barrier a memory access may leak past the unlock (as
the cpu won't notice any dependency between the unlock and the fetch,
feeling free to reorder them), thus potentially racing with an exclusive
writer accessing the data.

On architectures where atomic ops serialize memory accesses this would
never happen, otherwise the sequence above seems possible; am I missing
something?
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Robert Noland :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, 2009-09-25 at 00:49 +0200, Fabio Checconi wrote:

> Hi all,
>   looking at sys/sx.h I have some troubles understanding this comment:
>
>  * A note about memory barriers.  Exclusive locks need to use the same
>  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>  * and _rel when releasing an exclusive lock.  On the other side,
>  * shared lock needs to use an _acq barrier when acquiring the lock
>  * but, since they don't update any locked data, no memory barrier is
>  * needed when releasing a shared lock.
>
> In particular, I'm not understanding what prevents the following sequence
> from happening:
>
> CPU A CPU B
>
> sx_slock(&data->lock);
>
> sx_sunlock(&data->lock);
>
> /* reordered after the unlock
>    by the cpu */
> if (data->buffer)
> sx_xlock(&data->lock);
> free(data->buffer);
> data->buffer = NULL;
> sx_xunlock(&data->lock);
>
> a = *data->buffer;
>
> IOW, even if readers do not modify the data protected by the lock,
> without a release barrier a memory access may leak past the unlock (as
> the cpu won't notice any dependency between the unlock and the fetch,
> feeling free to reorder them), thus potentially racing with an exclusive
> writer accessing the data.

Maybe I am missing something suttle, but shouldn't the shared lock be
held for all data access if you want to guarantee sanity?  Meaning, if
you are accessing data->* without any locks held, all bets are off.

robert.

> On architectures where atomic ops serialize memory accesses this would
> never happen, otherwise the sequence above seems possible; am I missing
> something?
> _______________________________________________
> freebsd-hackers@... mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."
--
Robert Noland <rnoland@...>
FreeBSD

_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Ryan Stone-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

The code that Fabio proposes looks like this:

sx_slock(&data->lock);
if (data->buffer)
    a = *data->buffer;
sx_sunlock(&data->lock);


This point is that without a memory barrier on the unlock, the CPU is
free to reorder the instructions into the order is his message.
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Robert Noland :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, 2009-09-25 at 09:30 -0400, Ryan Stone wrote:

> The code that Fabio proposes looks like this:
>
> sx_slock(&data->lock);
> if (data->buffer)
>     a = *data->buffer;
> sx_sunlock(&data->lock);
>
>
> This point is that without a memory barrier on the unlock, the CPU is
> free to reorder the instructions into the order is his message.

Ok, then I will sit back and wait for someone with more clue to
respond...

robert.

--
Robert Noland <rnoland@...>
FreeBSD

_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Attilio Rao-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/9/25 Fabio Checconi <fabio@...>:

> Hi all,
>  looking at sys/sx.h I have some troubles understanding this comment:
>
>  * A note about memory barriers.  Exclusive locks need to use the same
>  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>  * and _rel when releasing an exclusive lock.  On the other side,
>  * shared lock needs to use an _acq barrier when acquiring the lock
>  * but, since they don't update any locked data, no memory barrier is
>  * needed when releasing a shared lock.
>
> In particular, I'm not understanding what prevents the following sequence
> from happening:
>
> CPU A                                   CPU B
>
> sx_slock(&data->lock);
>
> sx_sunlock(&data->lock);
>
> /* reordered after the unlock
>   by the cpu */
> if (data->buffer)
>                                        sx_xlock(&data->lock);
>                                        free(data->buffer);
>                                        data->buffer = NULL;
>                                        sx_xunlock(&data->lock);
>
>        a = *data->buffer;
>
> IOW, even if readers do not modify the data protected by the lock,
> without a release barrier a memory access may leak past the unlock (as
> the cpu won't notice any dependency between the unlock and the fetch,
> feeling free to reorder them), thus potentially racing with an exclusive
> writer accessing the data.
>
> On architectures where atomic ops serialize memory accesses this would
> never happen, otherwise the sequence above seems possible; am I missing
> something?

I think your concerns are right, possibly we need this patch:
http://www.freebsd.org/~attilio/sxrw_unlockb.diff

However speaking with John we agreed possibly there is a more serious breakage.
Possibly, memory barriers would also require to ensure the compiler to
not reorder the operation, while right now, in FreeBSD, they just take
care of the reordering from the architecture perspective.
The only way I'm aware of GCC offers that is to clobber memory.
I will provide a patch that address this soon, hoping that GCC will be
smart enough to not overhead too much the memory clobbering but just
try to understand what's our purpose and servers it (I will try to
compare code generated before and after the patch at least for tier-1
architectures).

Attilio


--
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Max Laier :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:

> 2009/9/25 Fabio Checconi <fabio@...>:
> > Hi all,
> >  looking at sys/sx.h I have some troubles understanding this comment:
> >
> >  * A note about memory barriers.  Exclusive locks need to use the same
> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
> >  * and _rel when releasing an exclusive lock.  On the other side,
> >  * shared lock needs to use an _acq barrier when acquiring the lock
> >  * but, since they don't update any locked data, no memory barrier is
> >  * needed when releasing a shared lock.
> >
> > In particular, I'm not understanding what prevents the following sequence
> > from happening:
> >
> > CPU A                                   CPU B
> >
> > sx_slock(&data->lock);
> >
> > sx_sunlock(&data->lock);
> >
> > /* reordered after the unlock
> >   by the cpu */
> > if (data->buffer)
> >                                        sx_xlock(&data->lock);
> >                                        free(data->buffer);
> >                                        data->buffer = NULL;
> >                                        sx_xunlock(&data->lock);
> >
> >        a = *data->buffer;
> >
> > IOW, even if readers do not modify the data protected by the lock,
> > without a release barrier a memory access may leak past the unlock (as
> > the cpu won't notice any dependency between the unlock and the fetch,
> > feeling free to reorder them), thus potentially racing with an exclusive
> > writer accessing the data.
> >
> > On architectures where atomic ops serialize memory accesses this would
> > never happen, otherwise the sequence above seems possible; am I missing
> > something?
>
> I think your concerns are right, possibly we need this patch:
> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>
> However speaking with John we agreed possibly there is a more serious
>  breakage. Possibly, memory barriers would also require to ensure the
>  compiler to not reorder the operation, while right now, in FreeBSD, they
>  just take care of the reordering from the architecture perspective.
> The only way I'm aware of GCC offers that is to clobber memory.
> I will provide a patch that address this soon, hoping that GCC will be
> smart enough to not overhead too much the memory clobbering but just
> try to understand what's our purpose and servers it (I will try to
> compare code generated before and after the patch at least for tier-1
> architectures).

Does GCC really reorder accesses to volatile objects? The C Standard seems to
object:

5.1.2.3 - 2
Accessing a volatile object, modifying an object, modifying a file, or calling
a function that does any of those operations are all side effects,11) which
are changes in the state of the execution environment. Evaluation of an
expression may produce side effects. At certain specified points in the
execution sequence called sequence points, all side effects of previous
evaluations shall be complete and no side effects of subsequent evaluations
shall have taken place. (A summary of the sequence points is given in annex
C.)

I might be reading this wrong, of course.

--
/"\  Best regards,                      | mlaier@...
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by John Baldwin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:

> 2009/9/25 Fabio Checconi <fabio@...>:
> > Hi all,
> >  looking at sys/sx.h I have some troubles understanding this comment:
> >
> >  * A note about memory barriers.  Exclusive locks need to use the same
> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
> >  * and _rel when releasing an exclusive lock.  On the other side,
> >  * shared lock needs to use an _acq barrier when acquiring the lock
> >  * but, since they don't update any locked data, no memory barrier is
> >  * needed when releasing a shared lock.
> >
> > In particular, I'm not understanding what prevents the following sequence
> > from happening:
> >
> > CPU A                                   CPU B
> >
> > sx_slock(&data->lock);
> >
> > sx_sunlock(&data->lock);
> >
> > /* reordered after the unlock
> >   by the cpu */
> > if (data->buffer)
> >                                        sx_xlock(&data->lock);
> >                                        free(data->buffer);
> >                                        data->buffer = NULL;
> >                                        sx_xunlock(&data->lock);
> >
> >        a = *data->buffer;
> >
> > IOW, even if readers do not modify the data protected by the lock,
> > without a release barrier a memory access may leak past the unlock (as
> > the cpu won't notice any dependency between the unlock and the fetch,
> > feeling free to reorder them), thus potentially racing with an exclusive
> > writer accessing the data.
> >
> > On architectures where atomic ops serialize memory accesses this would
> > never happen, otherwise the sequence above seems possible; am I missing
> > something?
>
> I think your concerns are right, possibly we need this patch:
> http://www.freebsd.org/~attilio/sxrw_unlockb.diff

Actually, since you are only worried about reads, I think this should be
an "acq" barrier rather than a "rel".  In some cases "acq" is cheaper, so we
should prefer the cheapest barrier that provides what we need.  You would
still need to keep some language about the memory barriers since using "acq"
for shared unlocking is different from exclusive unlocking.

I can't recall why I thought this was ok originally, sadly my p4 logs didn't
include the reasoning either. :-/

> However speaking with John we agreed possibly there is a more serious
breakage.
> Possibly, memory barriers would also require to ensure the compiler to
> not reorder the operation, while right now, in FreeBSD, they just take
> care of the reordering from the architecture perspective.
> The only way I'm aware of GCC offers that is to clobber memory.
> I will provide a patch that address this soon, hoping that GCC will be
> smart enough to not overhead too much the memory clobbering but just
> try to understand what's our purpose and servers it (I will try to
> compare code generated before and after the patch at least for tier-1
> architectures).

The memory clobber is quite heavyweight.  It actually forces gcc to forget any
cached memory items in registers and reload everything again.  What I really
want is just a barrier to tell GCC to not reorder things.  If I read a value
in the program before acquiring a lock it is in theory fine to keep that
cached across the barrier.  However, there isn't a way to do this sort of
thing with GCC currently.

--
John Baldwin
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Attilio Rao-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/9/29 John Baldwin <jhb@...>:

> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:
>> 2009/9/25 Fabio Checconi <fabio@...>:
>> > Hi all,
>> >  looking at sys/sx.h I have some troubles understanding this comment:
>> >
>> >  * A note about memory barriers.  Exclusive locks need to use the same
>> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>> >  * and _rel when releasing an exclusive lock.  On the other side,
>> >  * shared lock needs to use an _acq barrier when acquiring the lock
>> >  * but, since they don't update any locked data, no memory barrier is
>> >  * needed when releasing a shared lock.
>> >
>> > In particular, I'm not understanding what prevents the following sequence
>> > from happening:
>> >
>> > CPU A                                   CPU B
>> >
>> > sx_slock(&data->lock);
>> >
>> > sx_sunlock(&data->lock);
>> >
>> > /* reordered after the unlock
>> >   by the cpu */
>> > if (data->buffer)
>> >                                        sx_xlock(&data->lock);
>> >                                        free(data->buffer);
>> >                                        data->buffer = NULL;
>> >                                        sx_xunlock(&data->lock);
>> >
>> >        a = *data->buffer;
>> >
>> > IOW, even if readers do not modify the data protected by the lock,
>> > without a release barrier a memory access may leak past the unlock (as
>> > the cpu won't notice any dependency between the unlock and the fetch,
>> > feeling free to reorder them), thus potentially racing with an exclusive
>> > writer accessing the data.
>> >
>> > On architectures where atomic ops serialize memory accesses this would
>> > never happen, otherwise the sequence above seems possible; am I missing
>> > something?
>>
>> I think your concerns are right, possibly we need this patch:
>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>
> Actually, since you are only worried about reads, I think this should be
> an "acq" barrier rather than a "rel".  In some cases "acq" is cheaper, so we
> should prefer the cheapest barrier that provides what we need.  You would
> still need to keep some language about the memory barriers since using "acq"
> for shared unlocking is different from exclusive unlocking.

Actually, I don't think that an acq barrier ensures enough protection
against the reordering of 'earlier' operation thus not fixing the
architecture ordering problem reported by Fabio. Also, I don't think
we just have to care about reads (or  I don't understand what you mean
here).
However, I'm not even sure that we have faster read barriers than the
write one. As long as it should be true in theory I don't think that's
what happen in practice.

> The memory clobber is quite heavyweight.  It actually forces gcc to forget any
> cached memory items in registers and reload everything again.  What I really
> want is just a barrier to tell GCC to not reorder things.  If I read a value
> in the program before acquiring a lock it is in theory fine to keep that
> cached across the barrier.  However, there isn't a way to do this sort of
> thing with GCC currently.

Yes, that's the only tool we have right now with GCC. I will try to
look for another way, but it sounds difficult to discover.

Attilio


--
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Marius Nünnerich :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Sep 29, 2009 at 21:15, Attilio Rao <attilio@...> wrote:

> 2009/9/29 John Baldwin <jhb@...>:
>> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:
>>> 2009/9/25 Fabio Checconi <fabio@...>:
>>> > Hi all,
>>> >  looking at sys/sx.h I have some troubles understanding this comment:
>>> >
>>> >  * A note about memory barriers.  Exclusive locks need to use the same
>>> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>>> >  * and _rel when releasing an exclusive lock.  On the other side,
>>> >  * shared lock needs to use an _acq barrier when acquiring the lock
>>> >  * but, since they don't update any locked data, no memory barrier is
>>> >  * needed when releasing a shared lock.
>>> >
>>> > In particular, I'm not understanding what prevents the following sequence
>>> > from happening:
>>> >
>>> > CPU A                                   CPU B
>>> >
>>> > sx_slock(&data->lock);
>>> >
>>> > sx_sunlock(&data->lock);
>>> >
>>> > /* reordered after the unlock
>>> >   by the cpu */
>>> > if (data->buffer)
>>> >                                        sx_xlock(&data->lock);
>>> >                                        free(data->buffer);
>>> >                                        data->buffer = NULL;
>>> >                                        sx_xunlock(&data->lock);
>>> >
>>> >        a = *data->buffer;
>>> >
>>> > IOW, even if readers do not modify the data protected by the lock,
>>> > without a release barrier a memory access may leak past the unlock (as
>>> > the cpu won't notice any dependency between the unlock and the fetch,
>>> > feeling free to reorder them), thus potentially racing with an exclusive
>>> > writer accessing the data.
>>> >
>>> > On architectures where atomic ops serialize memory accesses this would
>>> > never happen, otherwise the sequence above seems possible; am I missing
>>> > something?
>>>
>>> I think your concerns are right, possibly we need this patch:
>>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>>
>> Actually, since you are only worried about reads, I think this should be
>> an "acq" barrier rather than a "rel".  In some cases "acq" is cheaper, so we
>> should prefer the cheapest barrier that provides what we need.  You would
>> still need to keep some language about the memory barriers since using "acq"
>> for shared unlocking is different from exclusive unlocking.
>
> Actually, I don't think that an acq barrier ensures enough protection
> against the reordering of 'earlier' operation thus not fixing the
> architecture ordering problem reported by Fabio. Also, I don't think
> we just have to care about reads (or  I don't understand what you mean
> here).
> However, I'm not even sure that we have faster read barriers than the
> write one. As long as it should be true in theory I don't think that's
> what happen in practice.
>
>> The memory clobber is quite heavyweight.  It actually forces gcc to forget any
>> cached memory items in registers and reload everything again.  What I really
>> want is just a barrier to tell GCC to not reorder things.  If I read a value
>> in the program before acquiring a lock it is in theory fine to keep that
>> cached across the barrier.  However, there isn't a way to do this sort of
>> thing with GCC currently.
>
> Yes, that's the only tool we have right now with GCC. I will try to
> look for another way, but it sounds difficult to discover.

Even if we would have a mechanism to tell GCC to not reorder the
instructions the CPU itself would still be free to reorder if there
are no barriers. Or am I missing something?
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by John Baldwin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tuesday 29 September 2009 3:15:40 pm Attilio Rao wrote:

> 2009/9/29 John Baldwin <jhb@...>:
> > On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:
> >> 2009/9/25 Fabio Checconi <fabio@...>:
> >> > Hi all,
> >> >  looking at sys/sx.h I have some troubles understanding this comment:
> >> >
> >> >  * A note about memory barriers.  Exclusive locks need to use the same
> >> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
> >> >  * and _rel when releasing an exclusive lock.  On the other side,
> >> >  * shared lock needs to use an _acq barrier when acquiring the lock
> >> >  * but, since they don't update any locked data, no memory barrier is
> >> >  * needed when releasing a shared lock.
> >> >
> >> > In particular, I'm not understanding what prevents the following sequence
> >> > from happening:
> >> >
> >> > CPU A                                   CPU B
> >> >
> >> > sx_slock(&data->lock);
> >> >
> >> > sx_sunlock(&data->lock);
> >> >
> >> > /* reordered after the unlock
> >> >   by the cpu */
> >> > if (data->buffer)
> >> >                                        sx_xlock(&data->lock);
> >> >                                        free(data->buffer);
> >> >                                        data->buffer = NULL;
> >> >                                        sx_xunlock(&data->lock);
> >> >
> >> >        a = *data->buffer;
> >> >
> >> > IOW, even if readers do not modify the data protected by the lock,
> >> > without a release barrier a memory access may leak past the unlock (as
> >> > the cpu won't notice any dependency between the unlock and the fetch,
> >> > feeling free to reorder them), thus potentially racing with an exclusive
> >> > writer accessing the data.
> >> >
> >> > On architectures where atomic ops serialize memory accesses this would
> >> > never happen, otherwise the sequence above seems possible; am I missing
> >> > something?
> >>
> >> I think your concerns are right, possibly we need this patch:
> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
> >
> > Actually, since you are only worried about reads, I think this should be
> > an "acq" barrier rather than a "rel".  In some cases "acq" is cheaper, so we
> > should prefer the cheapest barrier that provides what we need.  You would
> > still need to keep some language about the memory barriers since using "acq"
> > for shared unlocking is different from exclusive unlocking.
>
> Actually, I don't think that an acq barrier ensures enough protection
> against the reordering of 'earlier' operation thus not fixing the
> architecture ordering problem reported by Fabio. Also, I don't think
> we just have to care about reads (or  I don't understand what you mean
> here).

Hmmm, it might not on certain archs.  It does on x86 (i.e. an lfence would
work here on x86), but probably not on ia64 and sparc64.  Also, we certainly
only care about reads.  A read/share lock cannot resolve any races where the
lock holder is writing data, it can only ensure that the lock holder can
safely read shared data without the data changing out from underneath it.

> However, I'm not even sure that we have faster read barriers than the
> write one. As long as it should be true in theory I don't think that's
> what happen in practice.

bde@ found that sfence was generally much more expensive than lfence on x86.
However, since x86 guarantees the order of all stores we don't actually need
to use sfence at all on x86 anyway.

--
John Baldwin
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Attilio Rao-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/9/29 John Baldwin <jhb@...>:

> On Tuesday 29 September 2009 3:15:40 pm Attilio Rao wrote:
>> 2009/9/29 John Baldwin <jhb@...>:
>> > On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:
>> >> 2009/9/25 Fabio Checconi <fabio@...>:
>> >> > Hi all,
>> >> >  looking at sys/sx.h I have some troubles understanding this comment:
>> >> >
>> >> >  * A note about memory barriers.  Exclusive locks need to use the same
>> >> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>> >> >  * and _rel when releasing an exclusive lock.  On the other side,
>> >> >  * shared lock needs to use an _acq barrier when acquiring the lock
>> >> >  * but, since they don't update any locked data, no memory barrier is
>> >> >  * needed when releasing a shared lock.
>> >> >
>> >> > In particular, I'm not understanding what prevents the following sequence
>> >> > from happening:
>> >> >
>> >> > CPU A                                   CPU B
>> >> >
>> >> > sx_slock(&data->lock);
>> >> >
>> >> > sx_sunlock(&data->lock);
>> >> >
>> >> > /* reordered after the unlock
>> >> >   by the cpu */
>> >> > if (data->buffer)
>> >> >                                        sx_xlock(&data->lock);
>> >> >                                        free(data->buffer);
>> >> >                                        data->buffer = NULL;
>> >> >                                        sx_xunlock(&data->lock);
>> >> >
>> >> >        a = *data->buffer;
>> >> >
>> >> > IOW, even if readers do not modify the data protected by the lock,
>> >> > without a release barrier a memory access may leak past the unlock (as
>> >> > the cpu won't notice any dependency between the unlock and the fetch,
>> >> > feeling free to reorder them), thus potentially racing with an exclusive
>> >> > writer accessing the data.
>> >> >
>> >> > On architectures where atomic ops serialize memory accesses this would
>> >> > never happen, otherwise the sequence above seems possible; am I missing
>> >> > something?
>> >>
>> >> I think your concerns are right, possibly we need this patch:
>> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>> >
>> > Actually, since you are only worried about reads, I think this should be
>> > an "acq" barrier rather than a "rel".  In some cases "acq" is cheaper, so we
>> > should prefer the cheapest barrier that provides what we need.  You would
>> > still need to keep some language about the memory barriers since using "acq"
>> > for shared unlocking is different from exclusive unlocking.
>>
>> Actually, I don't think that an acq barrier ensures enough protection
>> against the reordering of 'earlier' operation thus not fixing the
>> architecture ordering problem reported by Fabio. Also, I don't think
>> we just have to care about reads (or  I don't understand what you mean
>> here).
>
> Hmmm, it might not on certain archs.  It does on x86 (i.e. an lfence would
> work here on x86), but probably not on ia64 and sparc64.  Also, we certainly
> only care about reads.  A read/share lock cannot resolve any races where the
> lock holder is writing data, it can only ensure that the lock holder can
> safely read shared data without the data changing out from underneath it.
>
>> However, I'm not even sure that we have faster read barriers than the
>> write one. As long as it should be true in theory I don't think that's
>> what happen in practice.
>
> bde@ found that sfence was generally much more expensive than lfence on x86.
> However, since x86 guarantees the order of all stores we don't actually need
> to use sfence at all on x86 anyway.

Yes, x86 guarantees that the stores are strong ordered so I don't
think acq to be faster than rel.
Can I assume the patch I already sent as reviewed by you and commit then, right?

Attilio


--
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Attilio Rao-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/9/29 Max Laier <max@...>:

> On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
>> 2009/9/25 Fabio Checconi <fabio@...>:
>> > Hi all,
>> >  looking at sys/sx.h I have some troubles understanding this comment:
>> >
>> >  * A note about memory barriers.  Exclusive locks need to use the same
>> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>> >  * and _rel when releasing an exclusive lock.  On the other side,
>> >  * shared lock needs to use an _acq barrier when acquiring the lock
>> >  * but, since they don't update any locked data, no memory barrier is
>> >  * needed when releasing a shared lock.
>> >
>> > In particular, I'm not understanding what prevents the following sequence
>> > from happening:
>> >
>> > CPU A                                   CPU B
>> >
>> > sx_slock(&data->lock);
>> >
>> > sx_sunlock(&data->lock);
>> >
>> > /* reordered after the unlock
>> >   by the cpu */
>> > if (data->buffer)
>> >                                        sx_xlock(&data->lock);
>> >                                        free(data->buffer);
>> >                                        data->buffer = NULL;
>> >                                        sx_xunlock(&data->lock);
>> >
>> >        a = *data->buffer;
>> >
>> > IOW, even if readers do not modify the data protected by the lock,
>> > without a release barrier a memory access may leak past the unlock (as
>> > the cpu won't notice any dependency between the unlock and the fetch,
>> > feeling free to reorder them), thus potentially racing with an exclusive
>> > writer accessing the data.
>> >
>> > On architectures where atomic ops serialize memory accesses this would
>> > never happen, otherwise the sequence above seems possible; am I missing
>> > something?
>>
>> I think your concerns are right, possibly we need this patch:
>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>>
>> However speaking with John we agreed possibly there is a more serious
>>  breakage. Possibly, memory barriers would also require to ensure the
>>  compiler to not reorder the operation, while right now, in FreeBSD, they
>>  just take care of the reordering from the architecture perspective.
>> The only way I'm aware of GCC offers that is to clobber memory.
>> I will provide a patch that address this soon, hoping that GCC will be
>> smart enough to not overhead too much the memory clobbering but just
>> try to understand what's our purpose and servers it (I will try to
>> compare code generated before and after the patch at least for tier-1
>> architectures).
>
> Does GCC really reorder accesses to volatile objects? The C Standard seems to
> object:
>
> 5.1.2.3 - 2
> Accessing a volatile object, modifying an object, modifying a file, or calling
> a function that does any of those operations are all side effects,11) which
> are changes in the state of the execution environment. Evaluation of an
> expression may produce side effects. At certain specified points in the
> execution sequence called sequence points, all side effects of previous
> evaluations shall be complete and no side effects of subsequent evaluations
> shall have taken place. (A summary of the sequence points is given in annex
> C.)

Very interesting.
I was thinking about the other operating systems which basically do
'memory clobbering' for ensuring a compiler barrier, but actually they
often forsee such a barrier without the conjuction of a memory
operand.

I think I will need to speak a bit with a GCC engineer in order to see
what do they implement in regard of volatile operands.

Attilio


--
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Attilio Rao-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/9/29 Marius Nünnerich <marius@...>:

> On Tue, Sep 29, 2009 at 21:15, Attilio Rao <attilio@...> wrote:
>> 2009/9/29 John Baldwin <jhb@...>:
>>> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:
>>>> 2009/9/25 Fabio Checconi <fabio@...>:
>>>> > Hi all,
>>>> >  looking at sys/sx.h I have some troubles understanding this comment:
>>>> >
>>>> >  * A note about memory barriers.  Exclusive locks need to use the same
>>>> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>>>> >  * and _rel when releasing an exclusive lock.  On the other side,
>>>> >  * shared lock needs to use an _acq barrier when acquiring the lock
>>>> >  * but, since they don't update any locked data, no memory barrier is
>>>> >  * needed when releasing a shared lock.
>>>> >
>>>> > In particular, I'm not understanding what prevents the following sequence
>>>> > from happening:
>>>> >
>>>> > CPU A                                   CPU B
>>>> >
>>>> > sx_slock(&data->lock);
>>>> >
>>>> > sx_sunlock(&data->lock);
>>>> >
>>>> > /* reordered after the unlock
>>>> >   by the cpu */
>>>> > if (data->buffer)
>>>> >                                        sx_xlock(&data->lock);
>>>> >                                        free(data->buffer);
>>>> >                                        data->buffer = NULL;
>>>> >                                        sx_xunlock(&data->lock);
>>>> >
>>>> >        a = *data->buffer;
>>>> >
>>>> > IOW, even if readers do not modify the data protected by the lock,
>>>> > without a release barrier a memory access may leak past the unlock (as
>>>> > the cpu won't notice any dependency between the unlock and the fetch,
>>>> > feeling free to reorder them), thus potentially racing with an exclusive
>>>> > writer accessing the data.
>>>> >
>>>> > On architectures where atomic ops serialize memory accesses this would
>>>> > never happen, otherwise the sequence above seems possible; am I missing
>>>> > something?
>>>>
>>>> I think your concerns are right, possibly we need this patch:
>>>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>>>
>>> Actually, since you are only worried about reads, I think this should be
>>> an "acq" barrier rather than a "rel".  In some cases "acq" is cheaper, so we
>>> should prefer the cheapest barrier that provides what we need.  You would
>>> still need to keep some language about the memory barriers since using "acq"
>>> for shared unlocking is different from exclusive unlocking.
>>
>> Actually, I don't think that an acq barrier ensures enough protection
>> against the reordering of 'earlier' operation thus not fixing the
>> architecture ordering problem reported by Fabio. Also, I don't think
>> we just have to care about reads (or  I don't understand what you mean
>> here).
>> However, I'm not even sure that we have faster read barriers than the
>> write one. As long as it should be true in theory I don't think that's
>> what happen in practice.
>>
>>> The memory clobber is quite heavyweight.  It actually forces gcc to forget any
>>> cached memory items in registers and reload everything again.  What I really
>>> want is just a barrier to tell GCC to not reorder things.  If I read a value
>>> in the program before acquiring a lock it is in theory fine to keep that
>>> cached across the barrier.  However, there isn't a way to do this sort of
>>> thing with GCC currently.
>>
>> Yes, that's the only tool we have right now with GCC. I will try to
>> look for another way, but it sounds difficult to discover.
>
> Even if we would have a mechanism to tell GCC to not reorder the
> instructions the CPU itself would still be free to reorder if there
> are no barriers. Or am I missing something?

Our code already takes care of that case in our barriers.

Attilio


--
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by John Baldwin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tuesday 29 September 2009 4:26:56 pm Marius Nünnerich wrote:

> On Tue, Sep 29, 2009 at 21:15, Attilio Rao <attilio@...> wrote:
> > 2009/9/29 John Baldwin <jhb@...>:
> >> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:
> >>> 2009/9/25 Fabio Checconi <fabio@...>:
> >>> > Hi all,
> >>> >  looking at sys/sx.h I have some troubles understanding this comment:
> >>> >
> >>> >  * A note about memory barriers.  Exclusive locks need to use the same
> >>> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
> >>> >  * and _rel when releasing an exclusive lock.  On the other side,
> >>> >  * shared lock needs to use an _acq barrier when acquiring the lock
> >>> >  * but, since they don't update any locked data, no memory barrier is
> >>> >  * needed when releasing a shared lock.
> >>> >
> >>> > In particular, I'm not understanding what prevents the following sequence
> >>> > from happening:
> >>> >
> >>> > CPU A                                   CPU B
> >>> >
> >>> > sx_slock(&data->lock);
> >>> >
> >>> > sx_sunlock(&data->lock);
> >>> >
> >>> > /* reordered after the unlock
> >>> >   by the cpu */
> >>> > if (data->buffer)
> >>> >                                        sx_xlock(&data->lock);
> >>> >                                        free(data->buffer);
> >>> >                                        data->buffer = NULL;
> >>> >                                        sx_xunlock(&data->lock);
> >>> >
> >>> >        a = *data->buffer;
> >>> >
> >>> > IOW, even if readers do not modify the data protected by the lock,
> >>> > without a release barrier a memory access may leak past the unlock (as
> >>> > the cpu won't notice any dependency between the unlock and the fetch,
> >>> > feeling free to reorder them), thus potentially racing with an exclusive
> >>> > writer accessing the data.
> >>> >
> >>> > On architectures where atomic ops serialize memory accesses this would
> >>> > never happen, otherwise the sequence above seems possible; am I missing
> >>> > something?
> >>>
> >>> I think your concerns are right, possibly we need this patch:
> >>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
> >>
> >> Actually, since you are only worried about reads, I think this should be
> >> an "acq" barrier rather than a "rel".  In some cases "acq" is cheaper, so we
> >> should prefer the cheapest barrier that provides what we need.  You would
> >> still need to keep some language about the memory barriers since using "acq"
> >> for shared unlocking is different from exclusive unlocking.
> >
> > Actually, I don't think that an acq barrier ensures enough protection
> > against the reordering of 'earlier' operation thus not fixing the
> > architecture ordering problem reported by Fabio. Also, I don't think
> > we just have to care about reads (or  I don't understand what you mean
> > here).
> > However, I'm not even sure that we have faster read barriers than the
> > write one. As long as it should be true in theory I don't think that's
> > what happen in practice.
> >
> >> The memory clobber is quite heavyweight.  It actually forces gcc to forget any
> >> cached memory items in registers and reload everything again.  What I really
> >> want is just a barrier to tell GCC to not reorder things.  If I read a value
> >> in the program before acquiring a lock it is in theory fine to keep that
> >> cached across the barrier.  However, there isn't a way to do this sort of
> >> thing with GCC currently.
> >
> > Yes, that's the only tool we have right now with GCC. I will try to
> > look for another way, but it sounds difficult to discover.
>
> Even if we would have a mechanism to tell GCC to not reorder the
> instructions the CPU itself would still be free to reorder if there
> are no barriers. Or am I missing something?

No, the thing to do here for the second part is add "memory" clobbers to the
existing atomic ops with barriers.  It will still require barriers for them to
be enforced.

--
John Baldwin
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by John Baldwin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:

> 2009/9/29 Max Laier <max@...>:
> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
> >> 2009/9/25 Fabio Checconi <fabio@...>:
> >> > Hi all,
> >> >  looking at sys/sx.h I have some troubles understanding this comment:
> >> >
> >> >  * A note about memory barriers.  Exclusive locks need to use the same
> >> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
> >> >  * and _rel when releasing an exclusive lock.  On the other side,
> >> >  * shared lock needs to use an _acq barrier when acquiring the lock
> >> >  * but, since they don't update any locked data, no memory barrier is
> >> >  * needed when releasing a shared lock.
> >> >
> >> > In particular, I'm not understanding what prevents the following sequence
> >> > from happening:
> >> >
> >> > CPU A                                   CPU B
> >> >
> >> > sx_slock(&data->lock);
> >> >
> >> > sx_sunlock(&data->lock);
> >> >
> >> > /* reordered after the unlock
> >> >   by the cpu */
> >> > if (data->buffer)
> >> >                                        sx_xlock(&data->lock);
> >> >                                        free(data->buffer);
> >> >                                        data->buffer = NULL;
> >> >                                        sx_xunlock(&data->lock);
> >> >
> >> >        a = *data->buffer;
> >> >
> >> > IOW, even if readers do not modify the data protected by the lock,
> >> > without a release barrier a memory access may leak past the unlock (as
> >> > the cpu won't notice any dependency between the unlock and the fetch,
> >> > feeling free to reorder them), thus potentially racing with an exclusive
> >> > writer accessing the data.
> >> >
> >> > On architectures where atomic ops serialize memory accesses this would
> >> > never happen, otherwise the sequence above seems possible; am I missing
> >> > something?
> >>
> >> I think your concerns are right, possibly we need this patch:
> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
> >>
> >> However speaking with John we agreed possibly there is a more serious
> >>  breakage. Possibly, memory barriers would also require to ensure the
> >>  compiler to not reorder the operation, while right now, in FreeBSD, they
> >>  just take care of the reordering from the architecture perspective.
> >> The only way I'm aware of GCC offers that is to clobber memory.
> >> I will provide a patch that address this soon, hoping that GCC will be
> >> smart enough to not overhead too much the memory clobbering but just
> >> try to understand what's our purpose and servers it (I will try to
> >> compare code generated before and after the patch at least for tier-1
> >> architectures).
> >
> > Does GCC really reorder accesses to volatile objects? The C Standard seems to
> > object:
> >
> > 5.1.2.3 - 2
> > Accessing a volatile object, modifying an object, modifying a file, or calling
> > a function that does any of those operations are all side effects,11) which
> > are changes in the state of the execution environment. Evaluation of an
> > expression may produce side effects. At certain specified points in the
> > execution sequence called sequence points, all side effects of previous
> > evaluations shall be complete and no side effects of subsequent evaluations
> > shall have taken place. (A summary of the sequence points is given in annex
> > C.)
>
> Very interesting.
> I was thinking about the other operating systems which basically do
> 'memory clobbering' for ensuring a compiler barrier, but actually they
> often forsee such a barrier without the conjuction of a memory
> operand.
>
> I think I will need to speak a bit with a GCC engineer in order to see
> what do they implement in regard of volatile operands.

GCC can be quite aggressive with reordering even in the face of volatile.  I
was recently doing a hack to export some data from the kernel to userland
that used a spin loop to grab a snapshot of the contents of a structure
similar to the method used in the kernel with the timehands structures.  It
used a volatile structure exposed from the kernel that looked something
like:

struct foo {
        volatile int gen;
        /* other stuff */
};

volatile struct foo *p;

do {
        x = p->gen;
        /* read other stuff */
        y = p->gen;
} while (x != y && x != 0);

GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
I eventually had to add explicit "memory" clobbers to force GCC to not
move the reads of 'gen' around but do them "around" all the other
operations, so that the working code is:

do {
        x = p->gen;
        asm volatile("" ::: "memory");
        /* read other stuff */
        asm volatile("" ::: "memory");
        y = p->gen;
} while (x != y && x != 0);

--
John Baldwin
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Attilio Rao-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/9/29 John Baldwin <jhb@...>:

> On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
>> 2009/9/29 Max Laier <max@...>:
>> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
>> >> 2009/9/25 Fabio Checconi <fabio@...>:
>> >> > Hi all,
>> >> >  looking at sys/sx.h I have some troubles understanding this comment:
>> >> >
>> >> >  * A note about memory barriers.  Exclusive locks need to use the same
>> >> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>> >> >  * and _rel when releasing an exclusive lock.  On the other side,
>> >> >  * shared lock needs to use an _acq barrier when acquiring the lock
>> >> >  * but, since they don't update any locked data, no memory barrier is
>> >> >  * needed when releasing a shared lock.
>> >> >
>> >> > In particular, I'm not understanding what prevents the following sequence
>> >> > from happening:
>> >> >
>> >> > CPU A                                   CPU B
>> >> >
>> >> > sx_slock(&data->lock);
>> >> >
>> >> > sx_sunlock(&data->lock);
>> >> >
>> >> > /* reordered after the unlock
>> >> >   by the cpu */
>> >> > if (data->buffer)
>> >> >                                        sx_xlock(&data->lock);
>> >> >                                        free(data->buffer);
>> >> >                                        data->buffer = NULL;
>> >> >                                        sx_xunlock(&data->lock);
>> >> >
>> >> >        a = *data->buffer;
>> >> >
>> >> > IOW, even if readers do not modify the data protected by the lock,
>> >> > without a release barrier a memory access may leak past the unlock (as
>> >> > the cpu won't notice any dependency between the unlock and the fetch,
>> >> > feeling free to reorder them), thus potentially racing with an exclusive
>> >> > writer accessing the data.
>> >> >
>> >> > On architectures where atomic ops serialize memory accesses this would
>> >> > never happen, otherwise the sequence above seems possible; am I missing
>> >> > something?
>> >>
>> >> I think your concerns are right, possibly we need this patch:
>> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>> >>
>> >> However speaking with John we agreed possibly there is a more serious
>> >>  breakage. Possibly, memory barriers would also require to ensure the
>> >>  compiler to not reorder the operation, while right now, in FreeBSD, they
>> >>  just take care of the reordering from the architecture perspective.
>> >> The only way I'm aware of GCC offers that is to clobber memory.
>> >> I will provide a patch that address this soon, hoping that GCC will be
>> >> smart enough to not overhead too much the memory clobbering but just
>> >> try to understand what's our purpose and servers it (I will try to
>> >> compare code generated before and after the patch at least for tier-1
>> >> architectures).
>> >
>> > Does GCC really reorder accesses to volatile objects? The C Standard seems to
>> > object:
>> >
>> > 5.1.2.3 - 2
>> > Accessing a volatile object, modifying an object, modifying a file, or calling
>> > a function that does any of those operations are all side effects,11) which
>> > are changes in the state of the execution environment. Evaluation of an
>> > expression may produce side effects. At certain specified points in the
>> > execution sequence called sequence points, all side effects of previous
>> > evaluations shall be complete and no side effects of subsequent evaluations
>> > shall have taken place. (A summary of the sequence points is given in annex
>> > C.)
>>
>> Very interesting.
>> I was thinking about the other operating systems which basically do
>> 'memory clobbering' for ensuring a compiler barrier, but actually they
>> often forsee such a barrier without the conjuction of a memory
>> operand.
>>
>> I think I will need to speak a bit with a GCC engineer in order to see
>> what do they implement in regard of volatile operands.
>
> GCC can be quite aggressive with reordering even in the face of volatile.  I
> was recently doing a hack to export some data from the kernel to userland
> that used a spin loop to grab a snapshot of the contents of a structure
> similar to the method used in the kernel with the timehands structures.  It
> used a volatile structure exposed from the kernel that looked something
> like:
>
> struct foo {
>        volatile int gen;
>        /* other stuff */
> };
>
> volatile struct foo *p;
>
> do {
>        x = p->gen;
>        /* read other stuff */
>        y = p->gen;
> } while (x != y && x != 0);
>
> GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
> I eventually had to add explicit "memory" clobbers to force GCC to not
> move the reads of 'gen' around but do them "around" all the other
> operations, so that the working code is:
>
> do {
>        x = p->gen;
>        asm volatile("" ::: "memory");
>        /* read other stuff */
>        asm volatile("" ::: "memory");
>        y = p->gen;
> } while (x != y && x != 0);
>

I see.
So probabilly clobbering memory is the only choice we have right now.
I will try to make a patch which also keeps into account the
possibility to skip it (or define by hand alternative approaches) for
different compilers.
I wonder, specifically, how llvm/clang relies with it.

Attilio


--
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by John Baldwin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tuesday 29 September 2009 5:39:43 pm Attilio Rao wrote:
> 2009/9/29 John Baldwin <jhb@...>:
> > On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
> >> 2009/9/29 Max Laier <max@...>:
> >> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
> >> >> 2009/9/25 Fabio Checconi <fabio@...>:
> >> >> > Hi all,
> >> >> >  looking at sys/sx.h I have some troubles understanding this
comment:
> >> >> >
> >> >> >  * A note about memory barriers.  Exclusive locks need to use the
same
> >> >> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
> >> >> >  * and _rel when releasing an exclusive lock.  On the other side,
> >> >> >  * shared lock needs to use an _acq barrier when acquiring the lock
> >> >> >  * but, since they don't update any locked data, no memory barrier
is
> >> >> >  * needed when releasing a shared lock.
> >> >> >
> >> >> > In particular, I'm not understanding what prevents the following
sequence

> >> >> > from happening:
> >> >> >
> >> >> > CPU A                                   CPU B
> >> >> >
> >> >> > sx_slock(&data->lock);
> >> >> >
> >> >> > sx_sunlock(&data->lock);
> >> >> >
> >> >> > /* reordered after the unlock
> >> >> >   by the cpu */
> >> >> > if (data->buffer)
> >> >> >                                        sx_xlock(&data->lock);
> >> >> >                                        free(data->buffer);
> >> >> >                                        data->buffer = NULL;
> >> >> >                                        sx_xunlock(&data->lock);
> >> >> >
> >> >> >        a = *data->buffer;
> >> >> >
> >> >> > IOW, even if readers do not modify the data protected by the lock,
> >> >> > without a release barrier a memory access may leak past the unlock
(as
> >> >> > the cpu won't notice any dependency between the unlock and the
fetch,
> >> >> > feeling free to reorder them), thus potentially racing with an
exclusive
> >> >> > writer accessing the data.
> >> >> >
> >> >> > On architectures where atomic ops serialize memory accesses this
would
> >> >> > never happen, otherwise the sequence above seems possible; am I
missing
> >> >> > something?
> >> >>
> >> >> I think your concerns are right, possibly we need this patch:
> >> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
> >> >>
> >> >> However speaking with John we agreed possibly there is a more serious
> >> >>  breakage. Possibly, memory barriers would also require to ensure the
> >> >>  compiler to not reorder the operation, while right now, in FreeBSD,
they
> >> >>  just take care of the reordering from the architecture perspective.
> >> >> The only way I'm aware of GCC offers that is to clobber memory.
> >> >> I will provide a patch that address this soon, hoping that GCC will be
> >> >> smart enough to not overhead too much the memory clobbering but just
> >> >> try to understand what's our purpose and servers it (I will try to
> >> >> compare code generated before and after the patch at least for tier-1
> >> >> architectures).
> >> >
> >> > Does GCC really reorder accesses to volatile objects? The C Standard
seems to
> >> > object:
> >> >
> >> > 5.1.2.3 - 2
> >> > Accessing a volatile object, modifying an object, modifying a file, or
calling
> >> > a function that does any of those operations are all side effects,11)
which
> >> > are changes in the state of the execution environment. Evaluation of an
> >> > expression may produce side effects. At certain specified points in the
> >> > execution sequence called sequence points, all side effects of previous
> >> > evaluations shall be complete and no side effects of subsequent
evaluations
> >> > shall have taken place. (A summary of the sequence points is given in
annex

> >> > C.)
> >>
> >> Very interesting.
> >> I was thinking about the other operating systems which basically do
> >> 'memory clobbering' for ensuring a compiler barrier, but actually they
> >> often forsee such a barrier without the conjuction of a memory
> >> operand.
> >>
> >> I think I will need to speak a bit with a GCC engineer in order to see
> >> what do they implement in regard of volatile operands.
> >
> > GCC can be quite aggressive with reordering even in the face of volatile.  
I
> > was recently doing a hack to export some data from the kernel to userland
> > that used a spin loop to grab a snapshot of the contents of a structure
> > similar to the method used in the kernel with the timehands structures.  
It

> > used a volatile structure exposed from the kernel that looked something
> > like:
> >
> > struct foo {
> >        volatile int gen;
> >        /* other stuff */
> > };
> >
> > volatile struct foo *p;
> >
> > do {
> >        x = p->gen;
> >        /* read other stuff */
> >        y = p->gen;
> > } while (x != y && x != 0);
> >
> > GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
> > I eventually had to add explicit "memory" clobbers to force GCC to not
> > move the reads of 'gen' around but do them "around" all the other
> > operations, so that the working code is:
> >
> > do {
> >        x = p->gen;
> >        asm volatile("" ::: "memory");
> >        /* read other stuff */
> >        asm volatile("" ::: "memory");
> >        y = p->gen;
> > } while (x != y && x != 0);
> >
>
> I see.
> So probabilly clobbering memory is the only choice we have right now.
> I will try to make a patch which also keeps into account the
> possibility to skip it (or define by hand alternative approaches) for
> different compilers.
> I wonder, specifically, how llvm/clang relies with it.

We already allow for different compilers to defined different versions of
atomic_*().  I think all you need to do for now is just add "memory" clobbers
to all of the atomic operations that have either an _acq or _rel memory
barrier.

--
John Baldwin
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by Attilio Rao-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/9/29 John Baldwin <jhb@...>:

> On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
>> 2009/9/29 Max Laier <max@...>:
>> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
>> >> 2009/9/25 Fabio Checconi <fabio@...>:
>> >> > Hi all,
>> >> >  looking at sys/sx.h I have some troubles understanding this comment:
>> >> >
>> >> >  * A note about memory barriers.  Exclusive locks need to use the same
>> >> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>> >> >  * and _rel when releasing an exclusive lock.  On the other side,
>> >> >  * shared lock needs to use an _acq barrier when acquiring the lock
>> >> >  * but, since they don't update any locked data, no memory barrier is
>> >> >  * needed when releasing a shared lock.
>> >> >
>> >> > In particular, I'm not understanding what prevents the following sequence
>> >> > from happening:
>> >> >
>> >> > CPU A                                   CPU B
>> >> >
>> >> > sx_slock(&data->lock);
>> >> >
>> >> > sx_sunlock(&data->lock);
>> >> >
>> >> > /* reordered after the unlock
>> >> >   by the cpu */
>> >> > if (data->buffer)
>> >> >                                        sx_xlock(&data->lock);
>> >> >                                        free(data->buffer);
>> >> >                                        data->buffer = NULL;
>> >> >                                        sx_xunlock(&data->lock);
>> >> >
>> >> >        a = *data->buffer;
>> >> >
>> >> > IOW, even if readers do not modify the data protected by the lock,
>> >> > without a release barrier a memory access may leak past the unlock (as
>> >> > the cpu won't notice any dependency between the unlock and the fetch,
>> >> > feeling free to reorder them), thus potentially racing with an exclusive
>> >> > writer accessing the data.
>> >> >
>> >> > On architectures where atomic ops serialize memory accesses this would
>> >> > never happen, otherwise the sequence above seems possible; am I missing
>> >> > something?
>> >>
>> >> I think your concerns are right, possibly we need this patch:
>> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>> >>
>> >> However speaking with John we agreed possibly there is a more serious
>> >>  breakage. Possibly, memory barriers would also require to ensure the
>> >>  compiler to not reorder the operation, while right now, in FreeBSD, they
>> >>  just take care of the reordering from the architecture perspective.
>> >> The only way I'm aware of GCC offers that is to clobber memory.
>> >> I will provide a patch that address this soon, hoping that GCC will be
>> >> smart enough to not overhead too much the memory clobbering but just
>> >> try to understand what's our purpose and servers it (I will try to
>> >> compare code generated before and after the patch at least for tier-1
>> >> architectures).
>> >
>> > Does GCC really reorder accesses to volatile objects? The C Standard seems to
>> > object:
>> >
>> > 5.1.2.3 - 2
>> > Accessing a volatile object, modifying an object, modifying a file, or calling
>> > a function that does any of those operations are all side effects,11) which
>> > are changes in the state of the execution environment. Evaluation of an
>> > expression may produce side effects. At certain specified points in the
>> > execution sequence called sequence points, all side effects of previous
>> > evaluations shall be complete and no side effects of subsequent evaluations
>> > shall have taken place. (A summary of the sequence points is given in annex
>> > C.)
>>
>> Very interesting.
>> I was thinking about the other operating systems which basically do
>> 'memory clobbering' for ensuring a compiler barrier, but actually they
>> often forsee such a barrier without the conjuction of a memory
>> operand.
>>
>> I think I will need to speak a bit with a GCC engineer in order to see
>> what do they implement in regard of volatile operands.
>
> GCC can be quite aggressive with reordering even in the face of volatile.  I
> was recently doing a hack to export some data from the kernel to userland
> that used a spin loop to grab a snapshot of the contents of a structure
> similar to the method used in the kernel with the timehands structures.  It
> used a volatile structure exposed from the kernel that looked something
> like:
>
> struct foo {
>        volatile int gen;
>        /* other stuff */
> };
>
> volatile struct foo *p;
>
> do {
>        x = p->gen;
>        /* read other stuff */
>        y = p->gen;
> } while (x != y && x != 0);
>
> GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
> I eventually had to add explicit "memory" clobbers to force GCC to not
> move the reads of 'gen' around but do them "around" all the other
> operations, so that the working code is:
>
> do {
>        x = p->gen;
>        asm volatile("" ::: "memory");
>        /* read other stuff */
>        asm volatile("" ::: "memory");
>        y = p->gen;
> } while (x != y && x != 0);

The situation was not so desperate as I first thought.
Actually, only ia32 and amd64 seems affected by the missing of memory
clobbering because it is already done for all the other platform when
using all the memory barriers.
On ia32 and amd64 too, the impact is more limited than what I
expected. atomic_cmpset_* already clobbers the memory and only
atomic_store_rel_* is not adeguately protected among the atomics used
in our locking primitives, thus I would really expect a limited
performance penalty if any.
What was not really protected where the functions defined through
ATOMIC_ASM() and that was the larger part to fix.

I spoke briefly about the compiler support with Christian Lattner
(@Apple, LLVM leader) and he mentioned he was aware of the aggressive
behaviour of GCC pointing me in the direction that what the C Standard
really mandates is that read/write operations with non-volatile
operands can be reordered across a volatile operand but that the
read/write of volatile operands are strong ordered in regard of
eachother. This however means that we have to fix the 'memory clobber'
for GCC-simil compilers and also offer a support for the other that
let them specify a memory barrier.
I then wrote this patch:
http://www.freebsd.org/~attilio/atomic.h.diff

That should address all the concern raised and also forsee the
possibility for other compiler to specify memory barriers semantics
differently from normal atomic.

rdivacky@ kindly already tested the patch on LLVM/CLANG reporting no problems.

I still didn't compare pickly the produced binaries, but I see a
little increase in the binary sizes probabilly caming from the .text
growing.

Attilio


--
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."

Re: sx locks and memory barriers

by John Baldwin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Attilio Rao wrote:

> 2009/9/29 John Baldwin <jhb@...>:
>> On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
>>> 2009/9/29 Max Laier <max@...>:
>>>> On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
>>>>> 2009/9/25 Fabio Checconi <fabio@...>:
>>>>>> Hi all,
>>>>>>  looking at sys/sx.h I have some troubles understanding this comment:
>>>>>>
>>>>>>  * A note about memory barriers.  Exclusive locks need to use the same
>>>>>>  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>>>>>>  * and _rel when releasing an exclusive lock.  On the other side,
>>>>>>  * shared lock needs to use an _acq barrier when acquiring the lock
>>>>>>  * but, since they don't update any locked data, no memory barrier is
>>>>>>  * needed when releasing a shared lock.
>>>>>>
>>>>>> In particular, I'm not understanding what prevents the following sequence
>>>>>> from happening:
>>>>>>
>>>>>> CPU A                                   CPU B
>>>>>>
>>>>>> sx_slock(&data->lock);
>>>>>>
>>>>>> sx_sunlock(&data->lock);
>>>>>>
>>>>>> /* reordered after the unlock
>>>>>>   by the cpu */
>>>>>> if (data->buffer)
>>>>>>                                        sx_xlock(&data->lock);
>>>>>>                                        free(data->buffer);
>>>>>>                                        data->buffer = NULL;
>>>>>>                                        sx_xunlock(&data->lock);
>>>>>>
>>>>>>        a = *data->buffer;
>>>>>>
>>>>>> IOW, even if readers do not modify the data protected by the lock,
>>>>>> without a release barrier a memory access may leak past the unlock (as
>>>>>> the cpu won't notice any dependency between the unlock and the fetch,
>>>>>> feeling free to reorder them), thus potentially racing with an exclusive
>>>>>> writer accessing the data.
>>>>>>
>>>>>> On architectures where atomic ops serialize memory accesses this would
>>>>>> never happen, otherwise the sequence above seems possible; am I missing
>>>>>> something?
>>>>> I think your concerns are right, possibly we need this patch:
>>>>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>>>>>
>>>>> However speaking with John we agreed possibly there is a more serious
>>>>>  breakage. Possibly, memory barriers would also require to ensure the
>>>>>  compiler to not reorder the operation, while right now, in FreeBSD, they
>>>>>  just take care of the reordering from the architecture perspective.
>>>>> The only way I'm aware of GCC offers that is to clobber memory.
>>>>> I will provide a patch that address this soon, hoping that GCC will be
>>>>> smart enough to not overhead too much the memory clobbering but just
>>>>> try to understand what's our purpose and servers it (I will try to
>>>>> compare code generated before and after the patch at least for tier-1
>>>>> architectures).
>>>> Does GCC really reorder accesses to volatile objects? The C Standard seems to
>>>> object:
>>>>
>>>> 5.1.2.3 - 2
>>>> Accessing a volatile object, modifying an object, modifying a file, or calling
>>>> a function that does any of those operations are all side effects,11) which
>>>> are changes in the state of the execution environment. Evaluation of an
>>>> expression may produce side effects. At certain specified points in the
>>>> execution sequence called sequence points, all side effects of previous
>>>> evaluations shall be complete and no side effects of subsequent evaluations
>>>> shall have taken place. (A summary of the sequence points is given in annex
>>>> C.)
>>> Very interesting.
>>> I was thinking about the other operating systems which basically do
>>> 'memory clobbering' for ensuring a compiler barrier, but actually they
>>> often forsee such a barrier without the conjuction of a memory
>>> operand.
>>>
>>> I think I will need to speak a bit with a GCC engineer in order to see
>>> what do they implement in regard of volatile operands.
>> GCC can be quite aggressive with reordering even in the face of volatile.  I
>> was recently doing a hack to export some data from the kernel to userland
>> that used a spin loop to grab a snapshot of the contents of a structure
>> similar to the method used in the kernel with the timehands structures.  It
>> used a volatile structure exposed from the kernel that looked something
>> like:
>>
>> struct foo {
>>        volatile int gen;
>>        /* other stuff */
>> };
>>
>> volatile struct foo *p;
>>
>> do {
>>        x = p->gen;
>>        /* read other stuff */
>>        y = p->gen;
>> } while (x != y && x != 0);
>>
>> GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
>> I eventually had to add explicit "memory" clobbers to force GCC to not
>> move the reads of 'gen' around but do them "around" all the other
>> operations, so that the working code is:
>>
>> do {
>>        x = p->gen;
>>        asm volatile("" ::: "memory");
>>        /* read other stuff */
>>        asm volatile("" ::: "memory");
>>        y = p->gen;
>> } while (x != y && x != 0);
>
> The situation was not so desperate as I first thought.
> Actually, only ia32 and amd64 seems affected by the missing of memory
> clobbering because it is already done for all the other platform when
> using all the memory barriers.
> On ia32 and amd64 too, the impact is more limited than what I
> expected. atomic_cmpset_* already clobbers the memory and only
> atomic_store_rel_* is not adeguately protected among the atomics used
> in our locking primitives, thus I would really expect a limited
> performance penalty if any.
> What was not really protected where the functions defined through
> ATOMIC_ASM() and that was the larger part to fix.
>
> I spoke briefly about the compiler support with Christian Lattner
> (@Apple, LLVM leader) and he mentioned he was aware of the aggressive
> behaviour of GCC pointing me in the direction that what the C Standard
> really mandates is that read/write operations with non-volatile
> operands can be reordered across a volatile operand but that the
> read/write of volatile operands are strong ordered in regard of
> eachother. This however means that we have to fix the 'memory clobber'
> for GCC-simil compilers and also offer a support for the other that
> let them specify a memory barrier.
> I then wrote this patch:
> http://www.freebsd.org/~attilio/atomic.h.diff
>
> That should address all the concern raised and also forsee the
> possibility for other compiler to specify memory barriers semantics
> differently from normal atomic.
>
> rdivacky@ kindly already tested the patch on LLVM/CLANG reporting no problems.
>
> I still didn't compare pickly the produced binaries, but I see a
> little increase in the binary sizes probabilly caming from the .text
> growing.

This looks good to me.  One thing that is missing is that the UP atomic
load/store ops still need "memory" clobbers to prevent the compiler from
reordering things (you could still have bad juju if you preempted in
between the atomic op and the compiler-reordered operation on UP).

--
John Baldwin
_______________________________________________
freebsd-hackers@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..."