|
View:
New views
19 Messages
—
Rating Filter:
Alert me
|
|
|
sx locks and memory barriersHi all,
looking at sys/sx.h I have some troubles understanding this comment: * A note about memory barriers. Exclusive locks need to use the same * memory barriers as mutexes: _acq when acquiring an exclusive lock * and _rel when releasing an exclusive lock. On the other side, * shared lock needs to use an _acq barrier when acquiring the lock * but, since they don't update any locked data, no memory barrier is * needed when releasing a shared lock. In particular, I'm not understanding what prevents the following sequence from happening: CPU A CPU B sx_slock(&data->lock); sx_sunlock(&data->lock); /* reordered after the unlock by the cpu */ if (data->buffer) sx_xlock(&data->lock); free(data->buffer); data->buffer = NULL; sx_xunlock(&data->lock); a = *data->buffer; IOW, even if readers do not modify the data protected by the lock, without a release barrier a memory access may leak past the unlock (as the cpu won't notice any dependency between the unlock and the fetch, feeling free to reorder them), thus potentially racing with an exclusive writer accessing the data. On architectures where atomic ops serialize memory accesses this would never happen, otherwise the sequence above seems possible; am I missing something? _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersOn Fri, 2009-09-25 at 00:49 +0200, Fabio Checconi wrote:
> Hi all, > looking at sys/sx.h I have some troubles understanding this comment: > > * A note about memory barriers. Exclusive locks need to use the same > * memory barriers as mutexes: _acq when acquiring an exclusive lock > * and _rel when releasing an exclusive lock. On the other side, > * shared lock needs to use an _acq barrier when acquiring the lock > * but, since they don't update any locked data, no memory barrier is > * needed when releasing a shared lock. > > In particular, I'm not understanding what prevents the following sequence > from happening: > > CPU A CPU B > > sx_slock(&data->lock); > > sx_sunlock(&data->lock); > > /* reordered after the unlock > by the cpu */ > if (data->buffer) > sx_xlock(&data->lock); > free(data->buffer); > data->buffer = NULL; > sx_xunlock(&data->lock); > > a = *data->buffer; > > IOW, even if readers do not modify the data protected by the lock, > without a release barrier a memory access may leak past the unlock (as > the cpu won't notice any dependency between the unlock and the fetch, > feeling free to reorder them), thus potentially racing with an exclusive > writer accessing the data. Maybe I am missing something suttle, but shouldn't the shared lock be held for all data access if you want to guarantee sanity? Meaning, if you are accessing data->* without any locks held, all bets are off. robert. > On architectures where atomic ops serialize memory accesses this would > never happen, otherwise the sequence above seems possible; am I missing > something? > _______________________________________________ > freebsd-hackers@... mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." -- Robert Noland <rnoland@...> FreeBSD _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersThe code that Fabio proposes looks like this:
sx_slock(&data->lock); if (data->buffer) a = *data->buffer; sx_sunlock(&data->lock); This point is that without a memory barrier on the unlock, the CPU is free to reorder the instructions into the order is his message. _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersOn Fri, 2009-09-25 at 09:30 -0400, Ryan Stone wrote:
> The code that Fabio proposes looks like this: > > sx_slock(&data->lock); > if (data->buffer) > a = *data->buffer; > sx_sunlock(&data->lock); > > > This point is that without a memory barrier on the unlock, the CPU is > free to reorder the instructions into the order is his message. Ok, then I will sit back and wait for someone with more clue to respond... robert. -- Robert Noland <rnoland@...> FreeBSD _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriers2009/9/25 Fabio Checconi <fabio@...>:
> Hi all, > looking at sys/sx.h I have some troubles understanding this comment: > > * A note about memory barriers. Exclusive locks need to use the same > * memory barriers as mutexes: _acq when acquiring an exclusive lock > * and _rel when releasing an exclusive lock. On the other side, > * shared lock needs to use an _acq barrier when acquiring the lock > * but, since they don't update any locked data, no memory barrier is > * needed when releasing a shared lock. > > In particular, I'm not understanding what prevents the following sequence > from happening: > > CPU A CPU B > > sx_slock(&data->lock); > > sx_sunlock(&data->lock); > > /* reordered after the unlock > by the cpu */ > if (data->buffer) > sx_xlock(&data->lock); > free(data->buffer); > data->buffer = NULL; > sx_xunlock(&data->lock); > > a = *data->buffer; > > IOW, even if readers do not modify the data protected by the lock, > without a release barrier a memory access may leak past the unlock (as > the cpu won't notice any dependency between the unlock and the fetch, > feeling free to reorder them), thus potentially racing with an exclusive > writer accessing the data. > > On architectures where atomic ops serialize memory accesses this would > never happen, otherwise the sequence above seems possible; am I missing > something? I think your concerns are right, possibly we need this patch: http://www.freebsd.org/~attilio/sxrw_unlockb.diff However speaking with John we agreed possibly there is a more serious breakage. Possibly, memory barriers would also require to ensure the compiler to not reorder the operation, while right now, in FreeBSD, they just take care of the reordering from the architecture perspective. The only way I'm aware of GCC offers that is to clobber memory. I will provide a patch that address this soon, hoping that GCC will be smart enough to not overhead too much the memory clobbering but just try to understand what's our purpose and servers it (I will try to compare code generated before and after the patch at least for tier-1 architectures). Attilio -- Peace can only be achieved by understanding - A. Einstein _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersOn Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
> 2009/9/25 Fabio Checconi <fabio@...>: > > Hi all, > > looking at sys/sx.h I have some troubles understanding this comment: > > > > * A note about memory barriers. Exclusive locks need to use the same > > * memory barriers as mutexes: _acq when acquiring an exclusive lock > > * and _rel when releasing an exclusive lock. On the other side, > > * shared lock needs to use an _acq barrier when acquiring the lock > > * but, since they don't update any locked data, no memory barrier is > > * needed when releasing a shared lock. > > > > In particular, I'm not understanding what prevents the following sequence > > from happening: > > > > CPU A CPU B > > > > sx_slock(&data->lock); > > > > sx_sunlock(&data->lock); > > > > /* reordered after the unlock > > by the cpu */ > > if (data->buffer) > > sx_xlock(&data->lock); > > free(data->buffer); > > data->buffer = NULL; > > sx_xunlock(&data->lock); > > > > a = *data->buffer; > > > > IOW, even if readers do not modify the data protected by the lock, > > without a release barrier a memory access may leak past the unlock (as > > the cpu won't notice any dependency between the unlock and the fetch, > > feeling free to reorder them), thus potentially racing with an exclusive > > writer accessing the data. > > > > On architectures where atomic ops serialize memory accesses this would > > never happen, otherwise the sequence above seems possible; am I missing > > something? > > I think your concerns are right, possibly we need this patch: > http://www.freebsd.org/~attilio/sxrw_unlockb.diff > > However speaking with John we agreed possibly there is a more serious > breakage. Possibly, memory barriers would also require to ensure the > compiler to not reorder the operation, while right now, in FreeBSD, they > just take care of the reordering from the architecture perspective. > The only way I'm aware of GCC offers that is to clobber memory. > I will provide a patch that address this soon, hoping that GCC will be > smart enough to not overhead too much the memory clobbering but just > try to understand what's our purpose and servers it (I will try to > compare code generated before and after the patch at least for tier-1 > architectures). Does GCC really reorder accesses to volatile objects? The C Standard seems to object: 5.1.2.3 - 2 Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects,11) which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.) I might be reading this wrong, of course. -- /"\ Best regards, | mlaier@... \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersOn Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:
> 2009/9/25 Fabio Checconi <fabio@...>: > > Hi all, > > looking at sys/sx.h I have some troubles understanding this comment: > > > > * A note about memory barriers. Exclusive locks need to use the same > > * memory barriers as mutexes: _acq when acquiring an exclusive lock > > * and _rel when releasing an exclusive lock. On the other side, > > * shared lock needs to use an _acq barrier when acquiring the lock > > * but, since they don't update any locked data, no memory barrier is > > * needed when releasing a shared lock. > > > > In particular, I'm not understanding what prevents the following sequence > > from happening: > > > > CPU A CPU B > > > > sx_slock(&data->lock); > > > > sx_sunlock(&data->lock); > > > > /* reordered after the unlock > > by the cpu */ > > if (data->buffer) > > sx_xlock(&data->lock); > > free(data->buffer); > > data->buffer = NULL; > > sx_xunlock(&data->lock); > > > > a = *data->buffer; > > > > IOW, even if readers do not modify the data protected by the lock, > > without a release barrier a memory access may leak past the unlock (as > > the cpu won't notice any dependency between the unlock and the fetch, > > feeling free to reorder them), thus potentially racing with an exclusive > > writer accessing the data. > > > > On architectures where atomic ops serialize memory accesses this would > > never happen, otherwise the sequence above seems possible; am I missing > > something? > > I think your concerns are right, possibly we need this patch: > http://www.freebsd.org/~attilio/sxrw_unlockb.diff Actually, since you are only worried about reads, I think this should be an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we should prefer the cheapest barrier that provides what we need. You would still need to keep some language about the memory barriers since using "acq" for shared unlocking is different from exclusive unlocking. I can't recall why I thought this was ok originally, sadly my p4 logs didn't include the reasoning either. :-/ > However speaking with John we agreed possibly there is a more serious breakage. > Possibly, memory barriers would also require to ensure the compiler to > not reorder the operation, while right now, in FreeBSD, they just take > care of the reordering from the architecture perspective. > The only way I'm aware of GCC offers that is to clobber memory. > I will provide a patch that address this soon, hoping that GCC will be > smart enough to not overhead too much the memory clobbering but just > try to understand what's our purpose and servers it (I will try to > compare code generated before and after the patch at least for tier-1 > architectures). The memory clobber is quite heavyweight. It actually forces gcc to forget any cached memory items in registers and reload everything again. What I really want is just a barrier to tell GCC to not reorder things. If I read a value in the program before acquiring a lock it is in theory fine to keep that cached across the barrier. However, there isn't a way to do this sort of thing with GCC currently. -- John Baldwin _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriers2009/9/29 John Baldwin <jhb@...>:
> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >> 2009/9/25 Fabio Checconi <fabio@...>: >> > Hi all, >> > looking at sys/sx.h I have some troubles understanding this comment: >> > >> > * A note about memory barriers. Exclusive locks need to use the same >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> > * and _rel when releasing an exclusive lock. On the other side, >> > * shared lock needs to use an _acq barrier when acquiring the lock >> > * but, since they don't update any locked data, no memory barrier is >> > * needed when releasing a shared lock. >> > >> > In particular, I'm not understanding what prevents the following sequence >> > from happening: >> > >> > CPU A CPU B >> > >> > sx_slock(&data->lock); >> > >> > sx_sunlock(&data->lock); >> > >> > /* reordered after the unlock >> > by the cpu */ >> > if (data->buffer) >> > sx_xlock(&data->lock); >> > free(data->buffer); >> > data->buffer = NULL; >> > sx_xunlock(&data->lock); >> > >> > a = *data->buffer; >> > >> > IOW, even if readers do not modify the data protected by the lock, >> > without a release barrier a memory access may leak past the unlock (as >> > the cpu won't notice any dependency between the unlock and the fetch, >> > feeling free to reorder them), thus potentially racing with an exclusive >> > writer accessing the data. >> > >> > On architectures where atomic ops serialize memory accesses this would >> > never happen, otherwise the sequence above seems possible; am I missing >> > something? >> >> I think your concerns are right, possibly we need this patch: >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > > Actually, since you are only worried about reads, I think this should be > an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we > should prefer the cheapest barrier that provides what we need. You would > still need to keep some language about the memory barriers since using "acq" > for shared unlocking is different from exclusive unlocking. Actually, I don't think that an acq barrier ensures enough protection against the reordering of 'earlier' operation thus not fixing the architecture ordering problem reported by Fabio. Also, I don't think we just have to care about reads (or I don't understand what you mean here). However, I'm not even sure that we have faster read barriers than the write one. As long as it should be true in theory I don't think that's what happen in practice. > The memory clobber is quite heavyweight. It actually forces gcc to forget any > cached memory items in registers and reload everything again. What I really > want is just a barrier to tell GCC to not reorder things. If I read a value > in the program before acquiring a lock it is in theory fine to keep that > cached across the barrier. However, there isn't a way to do this sort of > thing with GCC currently. Yes, that's the only tool we have right now with GCC. I will try to look for another way, but it sounds difficult to discover. Attilio -- Peace can only be achieved by understanding - A. Einstein _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersOn Tue, Sep 29, 2009 at 21:15, Attilio Rao <attilio@...> wrote:
> 2009/9/29 John Baldwin <jhb@...>: >> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >>> 2009/9/25 Fabio Checconi <fabio@...>: >>> > Hi all, >>> >  looking at sys/sx.h I have some troubles understanding this comment: >>> > >>> >  * A note about memory barriers.  Exclusive locks need to use the same >>> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock >>> >  * and _rel when releasing an exclusive lock.  On the other side, >>> >  * shared lock needs to use an _acq barrier when acquiring the lock >>> >  * but, since they don't update any locked data, no memory barrier is >>> >  * needed when releasing a shared lock. >>> > >>> > In particular, I'm not understanding what prevents the following sequence >>> > from happening: >>> > >>> > CPU A                  CPU B >>> > >>> > sx_slock(&data->lock); >>> > >>> > sx_sunlock(&data->lock); >>> > >>> > /* reordered after the unlock >>> >  by the cpu */ >>> > if (data->buffer) >>> >                     sx_xlock(&data->lock); >>> >                     free(data->buffer); >>> >                     data->buffer = NULL; >>> >                     sx_xunlock(&data->lock); >>> > >>> >     a = *data->buffer; >>> > >>> > IOW, even if readers do not modify the data protected by the lock, >>> > without a release barrier a memory access may leak past the unlock (as >>> > the cpu won't notice any dependency between the unlock and the fetch, >>> > feeling free to reorder them), thus potentially racing with an exclusive >>> > writer accessing the data. >>> > >>> > On architectures where atomic ops serialize memory accesses this would >>> > never happen, otherwise the sequence above seems possible; am I missing >>> > something? >>> >>> I think your concerns are right, possibly we need this patch: >>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> >> Actually, since you are only worried about reads, I think this should be >> an "acq" barrier rather than a "rel".  In some cases "acq" is cheaper, so we >> should prefer the cheapest barrier that provides what we need.  You would >> still need to keep some language about the memory barriers since using "acq" >> for shared unlocking is different from exclusive unlocking. > > Actually, I don't think that an acq barrier ensures enough protection > against the reordering of 'earlier' operation thus not fixing the > architecture ordering problem reported by Fabio. Also, I don't think > we just have to care about reads (or  I don't understand what you mean > here). > However, I'm not even sure that we have faster read barriers than the > write one. As long as it should be true in theory I don't think that's > what happen in practice. > >> The memory clobber is quite heavyweight.  It actually forces gcc to forget any >> cached memory items in registers and reload everything again.  What I really >> want is just a barrier to tell GCC to not reorder things.  If I read a value >> in the program before acquiring a lock it is in theory fine to keep that >> cached across the barrier.  However, there isn't a way to do this sort of >> thing with GCC currently. > > Yes, that's the only tool we have right now with GCC. I will try to > look for another way, but it sounds difficult to discover. Even if we would have a mechanism to tell GCC to not reorder the instructions the CPU itself would still be free to reorder if there are no barriers. Or am I missing something? _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersOn Tuesday 29 September 2009 3:15:40 pm Attilio Rao wrote:
> 2009/9/29 John Baldwin <jhb@...>: > > On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: > >> 2009/9/25 Fabio Checconi <fabio@...>: > >> > Hi all, > >> > looking at sys/sx.h I have some troubles understanding this comment: > >> > > >> > * A note about memory barriers. Exclusive locks need to use the same > >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock > >> > * and _rel when releasing an exclusive lock. On the other side, > >> > * shared lock needs to use an _acq barrier when acquiring the lock > >> > * but, since they don't update any locked data, no memory barrier is > >> > * needed when releasing a shared lock. > >> > > >> > In particular, I'm not understanding what prevents the following sequence > >> > from happening: > >> > > >> > CPU A CPU B > >> > > >> > sx_slock(&data->lock); > >> > > >> > sx_sunlock(&data->lock); > >> > > >> > /* reordered after the unlock > >> > by the cpu */ > >> > if (data->buffer) > >> > sx_xlock(&data->lock); > >> > free(data->buffer); > >> > data->buffer = NULL; > >> > sx_xunlock(&data->lock); > >> > > >> > a = *data->buffer; > >> > > >> > IOW, even if readers do not modify the data protected by the lock, > >> > without a release barrier a memory access may leak past the unlock (as > >> > the cpu won't notice any dependency between the unlock and the fetch, > >> > feeling free to reorder them), thus potentially racing with an exclusive > >> > writer accessing the data. > >> > > >> > On architectures where atomic ops serialize memory accesses this would > >> > never happen, otherwise the sequence above seems possible; am I missing > >> > something? > >> > >> I think your concerns are right, possibly we need this patch: > >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > > > > Actually, since you are only worried about reads, I think this should be > > an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we > > should prefer the cheapest barrier that provides what we need. You would > > still need to keep some language about the memory barriers since using "acq" > > for shared unlocking is different from exclusive unlocking. > > Actually, I don't think that an acq barrier ensures enough protection > against the reordering of 'earlier' operation thus not fixing the > architecture ordering problem reported by Fabio. Also, I don't think > we just have to care about reads (or I don't understand what you mean > here). Hmmm, it might not on certain archs. It does on x86 (i.e. an lfence would work here on x86), but probably not on ia64 and sparc64. Also, we certainly only care about reads. A read/share lock cannot resolve any races where the lock holder is writing data, it can only ensure that the lock holder can safely read shared data without the data changing out from underneath it. > However, I'm not even sure that we have faster read barriers than the > write one. As long as it should be true in theory I don't think that's > what happen in practice. bde@ found that sfence was generally much more expensive than lfence on x86. However, since x86 guarantees the order of all stores we don't actually need to use sfence at all on x86 anyway. -- John Baldwin _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriers2009/9/29 John Baldwin <jhb@...>:
> On Tuesday 29 September 2009 3:15:40 pm Attilio Rao wrote: >> 2009/9/29 John Baldwin <jhb@...>: >> > On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >> >> 2009/9/25 Fabio Checconi <fabio@...>: >> >> > Hi all, >> >> > looking at sys/sx.h I have some troubles understanding this comment: >> >> > >> >> > * A note about memory barriers. Exclusive locks need to use the same >> >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> >> > * and _rel when releasing an exclusive lock. On the other side, >> >> > * shared lock needs to use an _acq barrier when acquiring the lock >> >> > * but, since they don't update any locked data, no memory barrier is >> >> > * needed when releasing a shared lock. >> >> > >> >> > In particular, I'm not understanding what prevents the following sequence >> >> > from happening: >> >> > >> >> > CPU A CPU B >> >> > >> >> > sx_slock(&data->lock); >> >> > >> >> > sx_sunlock(&data->lock); >> >> > >> >> > /* reordered after the unlock >> >> > by the cpu */ >> >> > if (data->buffer) >> >> > sx_xlock(&data->lock); >> >> > free(data->buffer); >> >> > data->buffer = NULL; >> >> > sx_xunlock(&data->lock); >> >> > >> >> > a = *data->buffer; >> >> > >> >> > IOW, even if readers do not modify the data protected by the lock, >> >> > without a release barrier a memory access may leak past the unlock (as >> >> > the cpu won't notice any dependency between the unlock and the fetch, >> >> > feeling free to reorder them), thus potentially racing with an exclusive >> >> > writer accessing the data. >> >> > >> >> > On architectures where atomic ops serialize memory accesses this would >> >> > never happen, otherwise the sequence above seems possible; am I missing >> >> > something? >> >> >> >> I think your concerns are right, possibly we need this patch: >> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> > >> > Actually, since you are only worried about reads, I think this should be >> > an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we >> > should prefer the cheapest barrier that provides what we need. You would >> > still need to keep some language about the memory barriers since using "acq" >> > for shared unlocking is different from exclusive unlocking. >> >> Actually, I don't think that an acq barrier ensures enough protection >> against the reordering of 'earlier' operation thus not fixing the >> architecture ordering problem reported by Fabio. Also, I don't think >> we just have to care about reads (or I don't understand what you mean >> here). > > Hmmm, it might not on certain archs. It does on x86 (i.e. an lfence would > work here on x86), but probably not on ia64 and sparc64. Also, we certainly > only care about reads. A read/share lock cannot resolve any races where the > lock holder is writing data, it can only ensure that the lock holder can > safely read shared data without the data changing out from underneath it. > >> However, I'm not even sure that we have faster read barriers than the >> write one. As long as it should be true in theory I don't think that's >> what happen in practice. > > bde@ found that sfence was generally much more expensive than lfence on x86. > However, since x86 guarantees the order of all stores we don't actually need > to use sfence at all on x86 anyway. Yes, x86 guarantees that the stores are strong ordered so I don't think acq to be faster than rel. Can I assume the patch I already sent as reviewed by you and commit then, right? Attilio -- Peace can only be achieved by understanding - A. Einstein _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriers2009/9/29 Max Laier <max@...>:
> On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: >> 2009/9/25 Fabio Checconi <fabio@...>: >> > Hi all, >> > looking at sys/sx.h I have some troubles understanding this comment: >> > >> > * A note about memory barriers. Exclusive locks need to use the same >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> > * and _rel when releasing an exclusive lock. On the other side, >> > * shared lock needs to use an _acq barrier when acquiring the lock >> > * but, since they don't update any locked data, no memory barrier is >> > * needed when releasing a shared lock. >> > >> > In particular, I'm not understanding what prevents the following sequence >> > from happening: >> > >> > CPU A CPU B >> > >> > sx_slock(&data->lock); >> > >> > sx_sunlock(&data->lock); >> > >> > /* reordered after the unlock >> > by the cpu */ >> > if (data->buffer) >> > sx_xlock(&data->lock); >> > free(data->buffer); >> > data->buffer = NULL; >> > sx_xunlock(&data->lock); >> > >> > a = *data->buffer; >> > >> > IOW, even if readers do not modify the data protected by the lock, >> > without a release barrier a memory access may leak past the unlock (as >> > the cpu won't notice any dependency between the unlock and the fetch, >> > feeling free to reorder them), thus potentially racing with an exclusive >> > writer accessing the data. >> > >> > On architectures where atomic ops serialize memory accesses this would >> > never happen, otherwise the sequence above seems possible; am I missing >> > something? >> >> I think your concerns are right, possibly we need this patch: >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> >> However speaking with John we agreed possibly there is a more serious >> breakage. Possibly, memory barriers would also require to ensure the >> compiler to not reorder the operation, while right now, in FreeBSD, they >> just take care of the reordering from the architecture perspective. >> The only way I'm aware of GCC offers that is to clobber memory. >> I will provide a patch that address this soon, hoping that GCC will be >> smart enough to not overhead too much the memory clobbering but just >> try to understand what's our purpose and servers it (I will try to >> compare code generated before and after the patch at least for tier-1 >> architectures). > > Does GCC really reorder accesses to volatile objects? The C Standard seems to > object: > > 5.1.2.3 - 2 > Accessing a volatile object, modifying an object, modifying a file, or calling > a function that does any of those operations are all side effects,11) which > are changes in the state of the execution environment. Evaluation of an > expression may produce side effects. At certain specified points in the > execution sequence called sequence points, all side effects of previous > evaluations shall be complete and no side effects of subsequent evaluations > shall have taken place. (A summary of the sequence points is given in annex > C.) Very interesting. I was thinking about the other operating systems which basically do 'memory clobbering' for ensuring a compiler barrier, but actually they often forsee such a barrier without the conjuction of a memory operand. I think I will need to speak a bit with a GCC engineer in order to see what do they implement in regard of volatile operands. Attilio -- Peace can only be achieved by understanding - A. Einstein _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriers2009/9/29 Marius Nünnerich <marius@...>:
> On Tue, Sep 29, 2009 at 21:15, Attilio Rao <attilio@...> wrote: >> 2009/9/29 John Baldwin <jhb@...>: >>> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >>>> 2009/9/25 Fabio Checconi <fabio@...>: >>>> > Hi all, >>>> > looking at sys/sx.h I have some troubles understanding this comment: >>>> > >>>> > * A note about memory barriers. Exclusive locks need to use the same >>>> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >>>> > * and _rel when releasing an exclusive lock. On the other side, >>>> > * shared lock needs to use an _acq barrier when acquiring the lock >>>> > * but, since they don't update any locked data, no memory barrier is >>>> > * needed when releasing a shared lock. >>>> > >>>> > In particular, I'm not understanding what prevents the following sequence >>>> > from happening: >>>> > >>>> > CPU A CPU B >>>> > >>>> > sx_slock(&data->lock); >>>> > >>>> > sx_sunlock(&data->lock); >>>> > >>>> > /* reordered after the unlock >>>> > by the cpu */ >>>> > if (data->buffer) >>>> > sx_xlock(&data->lock); >>>> > free(data->buffer); >>>> > data->buffer = NULL; >>>> > sx_xunlock(&data->lock); >>>> > >>>> > a = *data->buffer; >>>> > >>>> > IOW, even if readers do not modify the data protected by the lock, >>>> > without a release barrier a memory access may leak past the unlock (as >>>> > the cpu won't notice any dependency between the unlock and the fetch, >>>> > feeling free to reorder them), thus potentially racing with an exclusive >>>> > writer accessing the data. >>>> > >>>> > On architectures where atomic ops serialize memory accesses this would >>>> > never happen, otherwise the sequence above seems possible; am I missing >>>> > something? >>>> >>>> I think your concerns are right, possibly we need this patch: >>>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >>> >>> Actually, since you are only worried about reads, I think this should be >>> an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we >>> should prefer the cheapest barrier that provides what we need. You would >>> still need to keep some language about the memory barriers since using "acq" >>> for shared unlocking is different from exclusive unlocking. >> >> Actually, I don't think that an acq barrier ensures enough protection >> against the reordering of 'earlier' operation thus not fixing the >> architecture ordering problem reported by Fabio. Also, I don't think >> we just have to care about reads (or I don't understand what you mean >> here). >> However, I'm not even sure that we have faster read barriers than the >> write one. As long as it should be true in theory I don't think that's >> what happen in practice. >> >>> The memory clobber is quite heavyweight. It actually forces gcc to forget any >>> cached memory items in registers and reload everything again. What I really >>> want is just a barrier to tell GCC to not reorder things. If I read a value >>> in the program before acquiring a lock it is in theory fine to keep that >>> cached across the barrier. However, there isn't a way to do this sort of >>> thing with GCC currently. >> >> Yes, that's the only tool we have right now with GCC. I will try to >> look for another way, but it sounds difficult to discover. > > Even if we would have a mechanism to tell GCC to not reorder the > instructions the CPU itself would still be free to reorder if there > are no barriers. Or am I missing something? Our code already takes care of that case in our barriers. Attilio -- Peace can only be achieved by understanding - A. Einstein _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersOn Tuesday 29 September 2009 4:26:56 pm Marius Nünnerich wrote:
> On Tue, Sep 29, 2009 at 21:15, Attilio Rao <attilio@...> wrote: > > 2009/9/29 John Baldwin <jhb@...>: > >> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: > >>> 2009/9/25 Fabio Checconi <fabio@...>: > >>> > Hi all, > >>> >  looking at sys/sx.h I have some troubles understanding this comment: > >>> > > >>> >  * A note about memory barriers.  Exclusive locks need to use the same > >>> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock > >>> >  * and _rel when releasing an exclusive lock.  On the other side, > >>> >  * shared lock needs to use an _acq barrier when acquiring the lock > >>> >  * but, since they don't update any locked data, no memory barrier is > >>> >  * needed when releasing a shared lock. > >>> > > >>> > In particular, I'm not understanding what prevents the following sequence > >>> > from happening: > >>> > > >>> > CPU A                  CPU B > >>> > > >>> > sx_slock(&data->lock); > >>> > > >>> > sx_sunlock(&data->lock); > >>> > > >>> > /* reordered after the unlock > >>> >  by the cpu */ > >>> > if (data->buffer) > >>> >                     sx_xlock(&data->lock); > >>> >                     free(data->buffer); > >>> >                     data->buffer = NULL; > >>> >                     sx_xunlock(&data->lock); > >>> > > >>> >     a = *data->buffer; > >>> > > >>> > IOW, even if readers do not modify the data protected by the lock, > >>> > without a release barrier a memory access may leak past the unlock (as > >>> > the cpu won't notice any dependency between the unlock and the fetch, > >>> > feeling free to reorder them), thus potentially racing with an exclusive > >>> > writer accessing the data. > >>> > > >>> > On architectures where atomic ops serialize memory accesses this would > >>> > never happen, otherwise the sequence above seems possible; am I missing > >>> > something? > >>> > >>> I think your concerns are right, possibly we need this patch: > >>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > >> > >> Actually, since you are only worried about reads, I think this should be > >> an "acq" barrier rather than a "rel".  In some cases "acq" is cheaper, so we > >> should prefer the cheapest barrier that provides what we need.  You would > >> still need to keep some language about the memory barriers since using "acq" > >> for shared unlocking is different from exclusive unlocking. > > > > Actually, I don't think that an acq barrier ensures enough protection > > against the reordering of 'earlier' operation thus not fixing the > > architecture ordering problem reported by Fabio. Also, I don't think > > we just have to care about reads (or  I don't understand what you mean > > here). > > However, I'm not even sure that we have faster read barriers than the > > write one. As long as it should be true in theory I don't think that's > > what happen in practice. > > > >> The memory clobber is quite heavyweight.  It actually forces gcc to forget any > >> cached memory items in registers and reload everything again.  What I really > >> want is just a barrier to tell GCC to not reorder things.  If I read a value > >> in the program before acquiring a lock it is in theory fine to keep that > >> cached across the barrier.  However, there isn't a way to do this sort of > >> thing with GCC currently. > > > > Yes, that's the only tool we have right now with GCC. I will try to > > look for another way, but it sounds difficult to discover. > > Even if we would have a mechanism to tell GCC to not reorder the > instructions the CPU itself would still be free to reorder if there > are no barriers. Or am I missing something? No, the thing to do here for the second part is add "memory" clobbers to the existing atomic ops with barriers. It will still require barriers for them to be enforced. -- John Baldwin _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersOn Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
> 2009/9/29 Max Laier <max@...>: > > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: > >> 2009/9/25 Fabio Checconi <fabio@...>: > >> > Hi all, > >> > looking at sys/sx.h I have some troubles understanding this comment: > >> > > >> > * A note about memory barriers. Exclusive locks need to use the same > >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock > >> > * and _rel when releasing an exclusive lock. On the other side, > >> > * shared lock needs to use an _acq barrier when acquiring the lock > >> > * but, since they don't update any locked data, no memory barrier is > >> > * needed when releasing a shared lock. > >> > > >> > In particular, I'm not understanding what prevents the following sequence > >> > from happening: > >> > > >> > CPU A CPU B > >> > > >> > sx_slock(&data->lock); > >> > > >> > sx_sunlock(&data->lock); > >> > > >> > /* reordered after the unlock > >> > by the cpu */ > >> > if (data->buffer) > >> > sx_xlock(&data->lock); > >> > free(data->buffer); > >> > data->buffer = NULL; > >> > sx_xunlock(&data->lock); > >> > > >> > a = *data->buffer; > >> > > >> > IOW, even if readers do not modify the data protected by the lock, > >> > without a release barrier a memory access may leak past the unlock (as > >> > the cpu won't notice any dependency between the unlock and the fetch, > >> > feeling free to reorder them), thus potentially racing with an exclusive > >> > writer accessing the data. > >> > > >> > On architectures where atomic ops serialize memory accesses this would > >> > never happen, otherwise the sequence above seems possible; am I missing > >> > something? > >> > >> I think your concerns are right, possibly we need this patch: > >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > >> > >> However speaking with John we agreed possibly there is a more serious > >> breakage. Possibly, memory barriers would also require to ensure the > >> compiler to not reorder the operation, while right now, in FreeBSD, they > >> just take care of the reordering from the architecture perspective. > >> The only way I'm aware of GCC offers that is to clobber memory. > >> I will provide a patch that address this soon, hoping that GCC will be > >> smart enough to not overhead too much the memory clobbering but just > >> try to understand what's our purpose and servers it (I will try to > >> compare code generated before and after the patch at least for tier-1 > >> architectures). > > > > Does GCC really reorder accesses to volatile objects? The C Standard seems to > > object: > > > > 5.1.2.3 - 2 > > Accessing a volatile object, modifying an object, modifying a file, or calling > > a function that does any of those operations are all side effects,11) which > > are changes in the state of the execution environment. Evaluation of an > > expression may produce side effects. At certain specified points in the > > execution sequence called sequence points, all side effects of previous > > evaluations shall be complete and no side effects of subsequent evaluations > > shall have taken place. (A summary of the sequence points is given in annex > > C.) > > Very interesting. > I was thinking about the other operating systems which basically do > 'memory clobbering' for ensuring a compiler barrier, but actually they > often forsee such a barrier without the conjuction of a memory > operand. > > I think I will need to speak a bit with a GCC engineer in order to see > what do they implement in regard of volatile operands. GCC can be quite aggressive with reordering even in the face of volatile. I was recently doing a hack to export some data from the kernel to userland that used a spin loop to grab a snapshot of the contents of a structure similar to the method used in the kernel with the timehands structures. It used a volatile structure exposed from the kernel that looked something like: struct foo { volatile int gen; /* other stuff */ }; volatile struct foo *p; do { x = p->gen; /* read other stuff */ y = p->gen; } while (x != y && x != 0); GCC moved the 'y = ' up into the middle of the '/* read other stuff */'. I eventually had to add explicit "memory" clobbers to force GCC to not move the reads of 'gen' around but do them "around" all the other operations, so that the working code is: do { x = p->gen; asm volatile("" ::: "memory"); /* read other stuff */ asm volatile("" ::: "memory"); y = p->gen; } while (x != y && x != 0); -- John Baldwin _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriers2009/9/29 John Baldwin <jhb@...>:
> On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote: >> 2009/9/29 Max Laier <max@...>: >> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: >> >> 2009/9/25 Fabio Checconi <fabio@...>: >> >> > Hi all, >> >> > looking at sys/sx.h I have some troubles understanding this comment: >> >> > >> >> > * A note about memory barriers. Exclusive locks need to use the same >> >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> >> > * and _rel when releasing an exclusive lock. On the other side, >> >> > * shared lock needs to use an _acq barrier when acquiring the lock >> >> > * but, since they don't update any locked data, no memory barrier is >> >> > * needed when releasing a shared lock. >> >> > >> >> > In particular, I'm not understanding what prevents the following sequence >> >> > from happening: >> >> > >> >> > CPU A CPU B >> >> > >> >> > sx_slock(&data->lock); >> >> > >> >> > sx_sunlock(&data->lock); >> >> > >> >> > /* reordered after the unlock >> >> > by the cpu */ >> >> > if (data->buffer) >> >> > sx_xlock(&data->lock); >> >> > free(data->buffer); >> >> > data->buffer = NULL; >> >> > sx_xunlock(&data->lock); >> >> > >> >> > a = *data->buffer; >> >> > >> >> > IOW, even if readers do not modify the data protected by the lock, >> >> > without a release barrier a memory access may leak past the unlock (as >> >> > the cpu won't notice any dependency between the unlock and the fetch, >> >> > feeling free to reorder them), thus potentially racing with an exclusive >> >> > writer accessing the data. >> >> > >> >> > On architectures where atomic ops serialize memory accesses this would >> >> > never happen, otherwise the sequence above seems possible; am I missing >> >> > something? >> >> >> >> I think your concerns are right, possibly we need this patch: >> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> >> >> >> However speaking with John we agreed possibly there is a more serious >> >> breakage. Possibly, memory barriers would also require to ensure the >> >> compiler to not reorder the operation, while right now, in FreeBSD, they >> >> just take care of the reordering from the architecture perspective. >> >> The only way I'm aware of GCC offers that is to clobber memory. >> >> I will provide a patch that address this soon, hoping that GCC will be >> >> smart enough to not overhead too much the memory clobbering but just >> >> try to understand what's our purpose and servers it (I will try to >> >> compare code generated before and after the patch at least for tier-1 >> >> architectures). >> > >> > Does GCC really reorder accesses to volatile objects? The C Standard seems to >> > object: >> > >> > 5.1.2.3 - 2 >> > Accessing a volatile object, modifying an object, modifying a file, or calling >> > a function that does any of those operations are all side effects,11) which >> > are changes in the state of the execution environment. Evaluation of an >> > expression may produce side effects. At certain specified points in the >> > execution sequence called sequence points, all side effects of previous >> > evaluations shall be complete and no side effects of subsequent evaluations >> > shall have taken place. (A summary of the sequence points is given in annex >> > C.) >> >> Very interesting. >> I was thinking about the other operating systems which basically do >> 'memory clobbering' for ensuring a compiler barrier, but actually they >> often forsee such a barrier without the conjuction of a memory >> operand. >> >> I think I will need to speak a bit with a GCC engineer in order to see >> what do they implement in regard of volatile operands. > > GCC can be quite aggressive with reordering even in the face of volatile. I > was recently doing a hack to export some data from the kernel to userland > that used a spin loop to grab a snapshot of the contents of a structure > similar to the method used in the kernel with the timehands structures. It > used a volatile structure exposed from the kernel that looked something > like: > > struct foo { > volatile int gen; > /* other stuff */ > }; > > volatile struct foo *p; > > do { > x = p->gen; > /* read other stuff */ > y = p->gen; > } while (x != y && x != 0); > > GCC moved the 'y = ' up into the middle of the '/* read other stuff */'. > I eventually had to add explicit "memory" clobbers to force GCC to not > move the reads of 'gen' around but do them "around" all the other > operations, so that the working code is: > > do { > x = p->gen; > asm volatile("" ::: "memory"); > /* read other stuff */ > asm volatile("" ::: "memory"); > y = p->gen; > } while (x != y && x != 0); > I see. So probabilly clobbering memory is the only choice we have right now. I will try to make a patch which also keeps into account the possibility to skip it (or define by hand alternative approaches) for different compilers. I wonder, specifically, how llvm/clang relies with it. Attilio -- Peace can only be achieved by understanding - A. Einstein _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersOn Tuesday 29 September 2009 5:39:43 pm Attilio Rao wrote:
> 2009/9/29 John Baldwin <jhb@...>: > > On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote: > >> 2009/9/29 Max Laier <max@...>: > >> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: > >> >> 2009/9/25 Fabio Checconi <fabio@...>: > >> >> > Hi all, > >> >> > looking at sys/sx.h I have some troubles understanding this comment: > >> >> > > >> >> > * A note about memory barriers. Exclusive locks need to use the same > >> >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock > >> >> > * and _rel when releasing an exclusive lock. On the other side, > >> >> > * shared lock needs to use an _acq barrier when acquiring the lock > >> >> > * but, since they don't update any locked data, no memory barrier is > >> >> > * needed when releasing a shared lock. > >> >> > > >> >> > In particular, I'm not understanding what prevents the following sequence > >> >> > from happening: > >> >> > > >> >> > CPU A CPU B > >> >> > > >> >> > sx_slock(&data->lock); > >> >> > > >> >> > sx_sunlock(&data->lock); > >> >> > > >> >> > /* reordered after the unlock > >> >> > by the cpu */ > >> >> > if (data->buffer) > >> >> > sx_xlock(&data->lock); > >> >> > free(data->buffer); > >> >> > data->buffer = NULL; > >> >> > sx_xunlock(&data->lock); > >> >> > > >> >> > a = *data->buffer; > >> >> > > >> >> > IOW, even if readers do not modify the data protected by the lock, > >> >> > without a release barrier a memory access may leak past the unlock > >> >> > the cpu won't notice any dependency between the unlock and the fetch, > >> >> > feeling free to reorder them), thus potentially racing with an exclusive > >> >> > writer accessing the data. > >> >> > > >> >> > On architectures where atomic ops serialize memory accesses this would > >> >> > never happen, otherwise the sequence above seems possible; am I missing > >> >> > something? > >> >> > >> >> I think your concerns are right, possibly we need this patch: > >> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > >> >> > >> >> However speaking with John we agreed possibly there is a more serious > >> >> breakage. Possibly, memory barriers would also require to ensure the > >> >> compiler to not reorder the operation, while right now, in FreeBSD, they > >> >> just take care of the reordering from the architecture perspective. > >> >> The only way I'm aware of GCC offers that is to clobber memory. > >> >> I will provide a patch that address this soon, hoping that GCC will be > >> >> smart enough to not overhead too much the memory clobbering but just > >> >> try to understand what's our purpose and servers it (I will try to > >> >> compare code generated before and after the patch at least for tier-1 > >> >> architectures). > >> > > >> > Does GCC really reorder accesses to volatile objects? The C Standard seems to > >> > object: > >> > > >> > 5.1.2.3 - 2 > >> > Accessing a volatile object, modifying an object, modifying a file, or calling > >> > a function that does any of those operations are all side effects,11) which > >> > are changes in the state of the execution environment. Evaluation of an > >> > expression may produce side effects. At certain specified points in the > >> > execution sequence called sequence points, all side effects of previous > >> > evaluations shall be complete and no side effects of subsequent evaluations > >> > shall have taken place. (A summary of the sequence points is given in annex > >> > C.) > >> > >> Very interesting. > >> I was thinking about the other operating systems which basically do > >> 'memory clobbering' for ensuring a compiler barrier, but actually they > >> often forsee such a barrier without the conjuction of a memory > >> operand. > >> > >> I think I will need to speak a bit with a GCC engineer in order to see > >> what do they implement in regard of volatile operands. > > > > GCC can be quite aggressive with reordering even in the face of volatile. > > was recently doing a hack to export some data from the kernel to userland > > that used a spin loop to grab a snapshot of the contents of a structure > > similar to the method used in the kernel with the timehands structures. It > > used a volatile structure exposed from the kernel that looked something > > like: > > > > struct foo { > > volatile int gen; > > /* other stuff */ > > }; > > > > volatile struct foo *p; > > > > do { > > x = p->gen; > > /* read other stuff */ > > y = p->gen; > > } while (x != y && x != 0); > > > > GCC moved the 'y = ' up into the middle of the '/* read other stuff */'. > > I eventually had to add explicit "memory" clobbers to force GCC to not > > move the reads of 'gen' around but do them "around" all the other > > operations, so that the working code is: > > > > do { > > x = p->gen; > > asm volatile("" ::: "memory"); > > /* read other stuff */ > > asm volatile("" ::: "memory"); > > y = p->gen; > > } while (x != y && x != 0); > > > > I see. > So probabilly clobbering memory is the only choice we have right now. > I will try to make a patch which also keeps into account the > possibility to skip it (or define by hand alternative approaches) for > different compilers. > I wonder, specifically, how llvm/clang relies with it. We already allow for different compilers to defined different versions of atomic_*(). I think all you need to do for now is just add "memory" clobbers to all of the atomic operations that have either an _acq or _rel memory barrier. -- John Baldwin _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriers2009/9/29 John Baldwin <jhb@...>:
> On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote: >> 2009/9/29 Max Laier <max@...>: >> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: >> >> 2009/9/25 Fabio Checconi <fabio@...>: >> >> > Hi all, >> >> > looking at sys/sx.h I have some troubles understanding this comment: >> >> > >> >> > * A note about memory barriers. Exclusive locks need to use the same >> >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> >> > * and _rel when releasing an exclusive lock. On the other side, >> >> > * shared lock needs to use an _acq barrier when acquiring the lock >> >> > * but, since they don't update any locked data, no memory barrier is >> >> > * needed when releasing a shared lock. >> >> > >> >> > In particular, I'm not understanding what prevents the following sequence >> >> > from happening: >> >> > >> >> > CPU A CPU B >> >> > >> >> > sx_slock(&data->lock); >> >> > >> >> > sx_sunlock(&data->lock); >> >> > >> >> > /* reordered after the unlock >> >> > by the cpu */ >> >> > if (data->buffer) >> >> > sx_xlock(&data->lock); >> >> > free(data->buffer); >> >> > data->buffer = NULL; >> >> > sx_xunlock(&data->lock); >> >> > >> >> > a = *data->buffer; >> >> > >> >> > IOW, even if readers do not modify the data protected by the lock, >> >> > without a release barrier a memory access may leak past the unlock (as >> >> > the cpu won't notice any dependency between the unlock and the fetch, >> >> > feeling free to reorder them), thus potentially racing with an exclusive >> >> > writer accessing the data. >> >> > >> >> > On architectures where atomic ops serialize memory accesses this would >> >> > never happen, otherwise the sequence above seems possible; am I missing >> >> > something? >> >> >> >> I think your concerns are right, possibly we need this patch: >> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> >> >> >> However speaking with John we agreed possibly there is a more serious >> >> breakage. Possibly, memory barriers would also require to ensure the >> >> compiler to not reorder the operation, while right now, in FreeBSD, they >> >> just take care of the reordering from the architecture perspective. >> >> The only way I'm aware of GCC offers that is to clobber memory. >> >> I will provide a patch that address this soon, hoping that GCC will be >> >> smart enough to not overhead too much the memory clobbering but just >> >> try to understand what's our purpose and servers it (I will try to >> >> compare code generated before and after the patch at least for tier-1 >> >> architectures). >> > >> > Does GCC really reorder accesses to volatile objects? The C Standard seems to >> > object: >> > >> > 5.1.2.3 - 2 >> > Accessing a volatile object, modifying an object, modifying a file, or calling >> > a function that does any of those operations are all side effects,11) which >> > are changes in the state of the execution environment. Evaluation of an >> > expression may produce side effects. At certain specified points in the >> > execution sequence called sequence points, all side effects of previous >> > evaluations shall be complete and no side effects of subsequent evaluations >> > shall have taken place. (A summary of the sequence points is given in annex >> > C.) >> >> Very interesting. >> I was thinking about the other operating systems which basically do >> 'memory clobbering' for ensuring a compiler barrier, but actually they >> often forsee such a barrier without the conjuction of a memory >> operand. >> >> I think I will need to speak a bit with a GCC engineer in order to see >> what do they implement in regard of volatile operands. > > GCC can be quite aggressive with reordering even in the face of volatile. I > was recently doing a hack to export some data from the kernel to userland > that used a spin loop to grab a snapshot of the contents of a structure > similar to the method used in the kernel with the timehands structures. It > used a volatile structure exposed from the kernel that looked something > like: > > struct foo { > volatile int gen; > /* other stuff */ > }; > > volatile struct foo *p; > > do { > x = p->gen; > /* read other stuff */ > y = p->gen; > } while (x != y && x != 0); > > GCC moved the 'y = ' up into the middle of the '/* read other stuff */'. > I eventually had to add explicit "memory" clobbers to force GCC to not > move the reads of 'gen' around but do them "around" all the other > operations, so that the working code is: > > do { > x = p->gen; > asm volatile("" ::: "memory"); > /* read other stuff */ > asm volatile("" ::: "memory"); > y = p->gen; > } while (x != y && x != 0); The situation was not so desperate as I first thought. Actually, only ia32 and amd64 seems affected by the missing of memory clobbering because it is already done for all the other platform when using all the memory barriers. On ia32 and amd64 too, the impact is more limited than what I expected. atomic_cmpset_* already clobbers the memory and only atomic_store_rel_* is not adeguately protected among the atomics used in our locking primitives, thus I would really expect a limited performance penalty if any. What was not really protected where the functions defined through ATOMIC_ASM() and that was the larger part to fix. I spoke briefly about the compiler support with Christian Lattner (@Apple, LLVM leader) and he mentioned he was aware of the aggressive behaviour of GCC pointing me in the direction that what the C Standard really mandates is that read/write operations with non-volatile operands can be reordered across a volatile operand but that the read/write of volatile operands are strong ordered in regard of eachother. This however means that we have to fix the 'memory clobber' for GCC-simil compilers and also offer a support for the other that let them specify a memory barrier. I then wrote this patch: http://www.freebsd.org/~attilio/atomic.h.diff That should address all the concern raised and also forsee the possibility for other compiler to specify memory barriers semantics differently from normal atomic. rdivacky@ kindly already tested the patch on LLVM/CLANG reporting no problems. I still didn't compare pickly the produced binaries, but I see a little increase in the binary sizes probabilly caming from the .text growing. Attilio -- Peace can only be achieved by understanding - A. Einstein _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
|
|
Re: sx locks and memory barriersAttilio Rao wrote:
> 2009/9/29 John Baldwin <jhb@...>: >> On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote: >>> 2009/9/29 Max Laier <max@...>: >>>> On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: >>>>> 2009/9/25 Fabio Checconi <fabio@...>: >>>>>> Hi all, >>>>>> looking at sys/sx.h I have some troubles understanding this comment: >>>>>> >>>>>> * A note about memory barriers. Exclusive locks need to use the same >>>>>> * memory barriers as mutexes: _acq when acquiring an exclusive lock >>>>>> * and _rel when releasing an exclusive lock. On the other side, >>>>>> * shared lock needs to use an _acq barrier when acquiring the lock >>>>>> * but, since they don't update any locked data, no memory barrier is >>>>>> * needed when releasing a shared lock. >>>>>> >>>>>> In particular, I'm not understanding what prevents the following sequence >>>>>> from happening: >>>>>> >>>>>> CPU A CPU B >>>>>> >>>>>> sx_slock(&data->lock); >>>>>> >>>>>> sx_sunlock(&data->lock); >>>>>> >>>>>> /* reordered after the unlock >>>>>> by the cpu */ >>>>>> if (data->buffer) >>>>>> sx_xlock(&data->lock); >>>>>> free(data->buffer); >>>>>> data->buffer = NULL; >>>>>> sx_xunlock(&data->lock); >>>>>> >>>>>> a = *data->buffer; >>>>>> >>>>>> IOW, even if readers do not modify the data protected by the lock, >>>>>> without a release barrier a memory access may leak past the unlock (as >>>>>> the cpu won't notice any dependency between the unlock and the fetch, >>>>>> feeling free to reorder them), thus potentially racing with an exclusive >>>>>> writer accessing the data. >>>>>> >>>>>> On architectures where atomic ops serialize memory accesses this would >>>>>> never happen, otherwise the sequence above seems possible; am I missing >>>>>> something? >>>>> I think your concerns are right, possibly we need this patch: >>>>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >>>>> >>>>> However speaking with John we agreed possibly there is a more serious >>>>> breakage. Possibly, memory barriers would also require to ensure the >>>>> compiler to not reorder the operation, while right now, in FreeBSD, they >>>>> just take care of the reordering from the architecture perspective. >>>>> The only way I'm aware of GCC offers that is to clobber memory. >>>>> I will provide a patch that address this soon, hoping that GCC will be >>>>> smart enough to not overhead too much the memory clobbering but just >>>>> try to understand what's our purpose and servers it (I will try to >>>>> compare code generated before and after the patch at least for tier-1 >>>>> architectures). >>>> Does GCC really reorder accesses to volatile objects? The C Standard seems to >>>> object: >>>> >>>> 5.1.2.3 - 2 >>>> Accessing a volatile object, modifying an object, modifying a file, or calling >>>> a function that does any of those operations are all side effects,11) which >>>> are changes in the state of the execution environment. Evaluation of an >>>> expression may produce side effects. At certain specified points in the >>>> execution sequence called sequence points, all side effects of previous >>>> evaluations shall be complete and no side effects of subsequent evaluations >>>> shall have taken place. (A summary of the sequence points is given in annex >>>> C.) >>> Very interesting. >>> I was thinking about the other operating systems which basically do >>> 'memory clobbering' for ensuring a compiler barrier, but actually they >>> often forsee such a barrier without the conjuction of a memory >>> operand. >>> >>> I think I will need to speak a bit with a GCC engineer in order to see >>> what do they implement in regard of volatile operands. >> GCC can be quite aggressive with reordering even in the face of volatile. I >> was recently doing a hack to export some data from the kernel to userland >> that used a spin loop to grab a snapshot of the contents of a structure >> similar to the method used in the kernel with the timehands structures. It >> used a volatile structure exposed from the kernel that looked something >> like: >> >> struct foo { >> volatile int gen; >> /* other stuff */ >> }; >> >> volatile struct foo *p; >> >> do { >> x = p->gen; >> /* read other stuff */ >> y = p->gen; >> } while (x != y && x != 0); >> >> GCC moved the 'y = ' up into the middle of the '/* read other stuff */'. >> I eventually had to add explicit "memory" clobbers to force GCC to not >> move the reads of 'gen' around but do them "around" all the other >> operations, so that the working code is: >> >> do { >> x = p->gen; >> asm volatile("" ::: "memory"); >> /* read other stuff */ >> asm volatile("" ::: "memory"); >> y = p->gen; >> } while (x != y && x != 0); > > The situation was not so desperate as I first thought. > Actually, only ia32 and amd64 seems affected by the missing of memory > clobbering because it is already done for all the other platform when > using all the memory barriers. > On ia32 and amd64 too, the impact is more limited than what I > expected. atomic_cmpset_* already clobbers the memory and only > atomic_store_rel_* is not adeguately protected among the atomics used > in our locking primitives, thus I would really expect a limited > performance penalty if any. > What was not really protected where the functions defined through > ATOMIC_ASM() and that was the larger part to fix. > > I spoke briefly about the compiler support with Christian Lattner > (@Apple, LLVM leader) and he mentioned he was aware of the aggressive > behaviour of GCC pointing me in the direction that what the C Standard > really mandates is that read/write operations with non-volatile > operands can be reordered across a volatile operand but that the > read/write of volatile operands are strong ordered in regard of > eachother. This however means that we have to fix the 'memory clobber' > for GCC-simil compilers and also offer a support for the other that > let them specify a memory barrier. > I then wrote this patch: > http://www.freebsd.org/~attilio/atomic.h.diff > > That should address all the concern raised and also forsee the > possibility for other compiler to specify memory barriers semantics > differently from normal atomic. > > rdivacky@ kindly already tested the patch on LLVM/CLANG reporting no problems. > > I still didn't compare pickly the produced binaries, but I see a > little increase in the binary sizes probabilly caming from the .text > growing. This looks good to me. One thing that is missing is that the UP atomic load/store ops still need "memory" clobbers to prevent the compiler from reordering things (you could still have bad juju if you preempted in between the atomic op and the compiler-reordered operation on UP). -- John Baldwin _______________________________________________ freebsd-hackers@... mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@..." |
| Free embeddable forum powered by Nabble | Forum Help |