[PATCH] AMD Opteron Rev. E hack

View: New views
12 Messages — Rating Filter:   Alert me  

[PATCH] AMD Opteron Rev. E hack

by Giovanni Trematerra :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,
I have a quick and dirty patch to address the problem as discussed on
commit r198868 in svn-src-head@
I introduced BROKEN_OPTERON_E kernel option for i386/amd64 arch.
The patch isn't tested yet, I only successfully compiled on i386.
Can you let me know if the patch is on the right direction to resolve the issue?
style(9) tips are welcomed.

Thank you

--
Giovanni Trematerra


_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

opteronhack.diff (6K) Download Attachment

Re: [PATCH] AMD Opteron Rev. E hack

by Dag-Erling Smørgrav :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Giovanni Trematerra <giovanni.trematerra@...> writes:
> I have a quick and dirty patch to address the problem as discussed on
> commit r198868 in svn-src-head@
> I introduced BROKEN_OPTERON_E kernel option for i386/amd64 arch.
> The patch isn't tested yet, I only successfully compiled on i386.
> Can you let me know if the patch is on the right direction to resolve the issue?
> style(9) tips are welcomed.

We should probably call it OPTERON_E_LFENCE_HACK or something, cf.
NO_F00F_HACK (which has the opposite semantics - the f00f hack is
enabled by default, though at this point we should probably reverse it
and just print a warning if we detect a vulnerable CPU), and the wording
of those printfs needs work, but it looks OK overall.

BTW, you should use the text/x-patch MIME type for patches, instead of
application/octet-stream.

DES
--
Dag-Erling Smørgrav - des@...
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: [PATCH] AMD Opteron Rev. E hack

by Kostik Belousov :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Nov 05, 2009 at 12:02:39PM +0100, Giovanni Trematerra wrote:
> Hi,
> I have a quick and dirty patch to address the problem as discussed on
> commit r198868 in svn-src-head@
> I introduced BROKEN_OPTERON_E kernel option for i386/amd64 arch.
> The patch isn't tested yet, I only successfully compiled on i386.
> Can you let me know if the patch is on the right direction to resolve the issue?
> style(9) tips are welcomed.

I think there is no much sense in printing that hack in unused;
instead, you should print info when option is enabled and vulnerable
CPU is detected.

Aren't atomic_readandclear need the same workaround ?

It would be much easier to read the patch if you generated it
with analog of "svn diff -x -p" for your VCS.


attachment0 (203 bytes) Download Attachment

Re: [PATCH] AMD Opteron Rev. E hack

by Dag-Erling Smørgrav :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Kostik Belousov <kostikbel@...> writes:
> I think there is no much sense in printing that hack in unused;
> instead, you should print info when option is enabled and vulnerable
> CPU is detected.

We should *definitely* print a warninhg when a vulnerable CPU is
detected and the option is *not* enabled.  How do you justify not
telling the user that you know the machine will crash as soon as he runs
'make buildworld' with a high -j value?

DES
--
Dag-Erling Smørgrav - des@...
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: [PATCH] AMD Opteron Rev. E hack

by Kostik Belousov :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Nov 05, 2009 at 12:52:00PM +0100, Dag-Erling Sm??rgrav wrote:
> Kostik Belousov <kostikbel@...> writes:
> > I think there is no much sense in printing that hack in unused;
> > instead, you should print info when option is enabled and vulnerable
> > CPU is detected.
>
> We should *definitely* print a warninhg when a vulnerable CPU is
> detected and the option is *not* enabled.  How do you justify not
> telling the user that you know the machine will crash as soon as he runs
> 'make buildworld' with a high -j value?

We do not do this for other cpu bugs workarounds, why this should be
different. Besides, there were no confirmed reports of this happening
in field (I mean the bug manifestation, not make -j panicing or hanging
machine :).


attachment0 (203 bytes) Download Attachment

Re: [PATCH] AMD Opteron Rev. E hack

by Giovanni Trematerra :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Nov 5, 2009 at 12:28 PM, Kostik Belousov <kostikbel@...> wrote:

>
> Aren't atomic_readandclear need the same workaround ?
>

I understood that the bug manifests itself only when lock instruction is used.
atomic_readandclear doesn't use lock.
I think i386/linux/linux_support.s and amd64/linux32/linux32_support.s
need the work-around too.


> It would be much easier to read the patch if you generated it
> with analog of "svn diff -x -p" for your VCS.
>

I'm sorry, I'll do my best next time.

--
Giovanni Trematerra
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: [PATCH] AMD Opteron Rev. E hack

by Maxim Dounin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello!

On Thu, Nov 05, 2009 at 02:00:34PM +0200, Kostik Belousov wrote:

> On Thu, Nov 05, 2009 at 12:52:00PM +0100, Dag-Erling Sm??rgrav wrote:
> > Kostik Belousov <kostikbel@...> writes:
> > > I think there is no much sense in printing that hack in unused;
> > > instead, you should print info when option is enabled and vulnerable
> > > CPU is detected.
> >
> > We should *definitely* print a warninhg when a vulnerable CPU is
> > detected and the option is *not* enabled.  How do you justify not
> > telling the user that you know the machine will crash as soon as he runs
> > 'make buildworld' with a high -j value?
>
> We do not do this for other cpu bugs workarounds, why this should be
> different.

Well, probably is't a good idea to do so?  Something like NetBSD's
sys/arch/x86/x86/errata.c seems to be right way to go.

> Besides, there were no confirmed reports of this happening
> in field (I mean the bug manifestation, not make -j panicing or hanging
> machine :).

http://bugs.mysql.com/bug.php?id=26081

Maxim Dounin
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: [PATCH] AMD Opteron Rev. E hack

by Kostik Belousov :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Nov 05, 2009 at 01:24:53PM +0100, Giovanni Trematerra wrote:
> On Thu, Nov 5, 2009 at 12:28 PM, Kostik Belousov <kostikbel@...> wrote:
>
> >
> > Aren't atomic_readandclear need the same workaround ?
> >
>
> I understood that the bug manifests itself only when lock instruction is used.
> atomic_readandclear doesn't use lock.
xchgl uses lock implicitely:

If a memory operand is referenced, the processor's locking protocol is
automatically implemented for the duration of the exchange operation,
regardless of the presence or absence of the LOCK prefix or of the value
of the IOPL.

> I think i386/linux/linux_support.s and amd64/linux32/linux32_support.s
> need the work-around too.
What about casuword32 ?
It is actually unclear from the bug description whether we need it there.
Description states "a locked instruction doesn't act as a read-acquire
barrier if followed by a non-locked read-modify-write instruction."

casuword32 for amd64, for instance, does movq immediately after
cmpxchgl, that is a store op, not read-modify-store. So, does it need a
workaround ? Similar stores are performed in linux32_support.s.

atomic.h functions definitely need your workaround since they are inlined.

Also, I remember that bind(8) provides its own atomic implementation.



attachment0 (203 bytes) Download Attachment

Re: [PATCH] AMD Opteron Rev. E hack

by Attilio Rao-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/11/5 Giovanni Trematerra <giovanni.trematerra@...>:
> Hi,
> I have a quick and dirty patch to address the problem as discussed on
> commit r198868 in svn-src-head@
> I introduced BROKEN_OPTERON_E kernel option for i386/amd64 arch.
> The patch isn't tested yet, I only successfully compiled on i386.
> Can you let me know if the patch is on the right direction to resolve the issue?
> style(9) tips are welcomed.

> diff -r 75d35d8e7fe1 sys/amd64/amd64/identcpu.c
> --- a/sys/amd64/amd64/identcpu.c        Thu Nov 05 11:18:35 2009 +0100
> +++ b/sys/amd64/amd64/identcpu.c        Thu Nov 05 12:42:35 2009 +0100
> @@ -404,6 +404,10 @@
>
>         if (cpu_vendor_id == CPU_VENDOR_AMD)
>                 print_AMD_info();
> +#if defined(BROKEN_OPTERON_E)
> +       else
> +               printf("BROKEN_OPTERON_E option in your kernel is useless with y
> our CPU\n");
> +#endif
> }
>
>  void
> @@ -620,10 +624,17 @@
>          */
>         if (CPUID_TO_FAMILY(cpu_id) == 0xf && CPUID_TO_MODEL(cpu_id) >= 0x20 &&
>             CPUID_TO_MODEL(cpu_id) <= 0x3f) {
> +#if !defined(BROKEN_OPTERON_E)
>                 printf("WARNING: This architecture revision has known SMP "
>                     "hardware bugs which may cause random instability\n");
> -               printf("WARNING: For details see: "
> -                   "http://bugzilla.kernel.org/show_bug.cgi?id=11305\n");
> +#else
> +               printf("WARNING: options BROKEN_OPTERON_E is in your kernel. "
> +                               "Expect performance penalties\n");
> +       else
> +
> +               printf("WARNING: options BROKEN_OPTERON_E is useless with your C
> PU."
> +                               "Expect performance penalties\n");
> +#endif
>         }
>  }

I would leave the whole logic within print_AMD_info() and not pollute
external code and I would not print a string if the fix is in.

> diff -r 75d35d8e7fe1 sys/amd64/include/atomic.h
> --- a/sys/amd64/include/atomic.h        Thu Nov 05 11:18:35 2009 +0100
> +++ b/sys/amd64/include/atomic.h        Thu Nov 05 12:42:35 2009 +0100
> @@ -36,6 +36,14 @@
>  #define        wmb()   __asm __volatile("sfence;" : : : "memory")
>  #define        rmb()   __asm __volatile("lfence;" : : : "memory")
>
> +#include "opt_cpu.h"
> +
> +#if defined(BROKEN_OPTERON_E) && (defined(SMP) || !defined(_KERNEL))
> +       #define OPTERON_E_HACK() rmb()
> +#else
> +       #define OPTERON_E_HACK()
> +#endif
> +
> /*
>  * Various simple operations on memory, each of which is atomic in the
>  * presence of interrupts and multiple processors.
> @@ -147,6 +155,8 @@
>          "m" (*dst)                    /* 4 */
>         : "memory");
>
> +       OPTERON_E_HACK();
> +
>         return (res);
>  }

You need to override the whole barrier IMHO and not add this new stub.

Attilio


--
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: [PATCH] AMD Opteron Rev. E hack

by Attilio Rao-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

2009/11/5 Kostik Belousov <kostikbel@...>:

> On Thu, Nov 05, 2009 at 01:24:53PM +0100, Giovanni Trematerra wrote:
>> On Thu, Nov 5, 2009 at 12:28 PM, Kostik Belousov <kostikbel@...> wrote:
>>
>> >
>> > Aren't atomic_readandclear need the same workaround ?
>> >
>>
>> I understood that the bug manifests itself only when lock instruction is used.
>> atomic_readandclear doesn't use lock.
> xchgl uses lock implicitely:
>
> If a memory operand is referenced, the processor's locking protocol is
> automatically implemented for the duration of the exchange operation,
> regardless of the presence or absence of the LOCK prefix or of the value
> of the IOPL.
>
>> I think i386/linux/linux_support.s and amd64/linux32/linux32_support.s
>> need the work-around too.
> What about casuword32 ?
> It is actually unclear from the bug description whether we need it there.
> Description states "a locked instruction doesn't act as a read-acquire
> barrier if followed by a non-locked read-modify-write instruction."
>
> casuword32 for amd64, for instance, does movq immediately after
> cmpxchgl, that is a store op, not read-modify-store. So, does it need a
> workaround ? Similar stores are performed in linux32_support.s.
>
> atomic.h functions definitely need your workaround since they are inlined.
>
> Also, I remember that bind(8) provides its own atomic implementation.

However, that fix is not simple to got as one would think.
Our atomics implementation on ia32 and amd64 assume rel/acq barriers
aims for the same code but this problem introduces asymmetric
behaviour between them.

The good way to fix this is to offer an lfence on acq barriers and
leave the other cases untouched, but this could be meaning
re-structure out atomic.h a lot. I will try to think to a good way to
do that when I have more available time.
In the while the WARNING msg will do its job (I can still trimm the
URL path however).

Attilio


--
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: [PATCH] AMD Opteron Rev. E hack

by Kostik Belousov :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Nov 05, 2009 at 03:10:38PM +0100, Attilio Rao wrote:
> The good way to fix this is to offer an lfence on acq barriers and
> leave the other cases untouched, but this could be meaning
> re-structure out atomic.h a lot. I will try to think to a good way to
> do that when I have more available time.
> In the while the WARNING msg will do its job (I can still trimm the
> URL path however).

For 8.0, please strip URL at least.


attachment0 (203 bytes) Download Attachment

Re: [PATCH] AMD Opteron Rev. E hack

by Giovanni Trematerra :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Nov 5, 2009 at 3:10 PM, Attilio Rao <attilio@...> wrote:

>
> The good way to fix this is to offer an lfence on acq barriers and
> leave the other cases untouched, but this could be meaning
> re-structure out atomic.h a lot. I will try to think to a good way to
> do that when I have more available time.
> In the while the WARNING msg will do its job (I can still trimm the
> URL path however).
>

Well, That's not so easy to fix as I thought.
Thanks Attilio to point me out.

--
Gianni
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."