8.0RC2 amd64 - kernel panic running make buildworld

8.0RC2 amd64 - kernel panic running make buildworld

by Kai Gallasch

Hi.

I installed 8.0RC2-amd64 on an 8-core Opteron server a few days ago.

When I try to do a make buildworld or make buildkernel, the server
reboots without any message left in the logs. The same happens
when building bigger ports (for example ruby18 or perl58).

With 8.0-RC2, debug flags and witness seem to be disabled in the
standard GENERIC kernel, so unfortunately it is not possible for me to
build a debug kernel without my server crashing.

My idea was to install the old 8.0-BETA4 and upgrade to RC2 through
makeworld + buildkernel (gdb+witness). But no luck: when trying to
upgrade to RC2, 8.0-BETA4 also crashes. At least 8.0-BETA4 has debug
+ witness active in the GENERIC kernel.

Below is some debug output of 8.0-BETA4 crashing. Has a vfs/ffs LOR
problem in BETA4 already been fixed?

Does it make sense to send in a PR with the old 8.0-BETA4?

BTW, I installed 7.2-STABLE on this same server and did a "make
buildworld" and "make buildkernel", which completed without any problem.

Cheers,
--Kai


----- make buildworld -j7 crash, freebsd 8.0-amd64-beta4 -----

lock order reversal:
 1st 0xffffff00073d5ba8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:423
 2nd 0xffffff819d921558 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2559
 3rd 0xffffff00070c19d0 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:544
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
__lockmgr_args() at __lockmgr_args+0xcf3
ffs_lock() at ffs_lock+0x8c
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
_vn_lock() at _vn_lock+0x47
ffs_snapshot() at ffs_snapshot+0x1b9d
ffs_mount() at ffs_mount+0x666
vfs_donmount() at vfs_donmount+0xcde
nmount() at nmount+0x63
syscall() at syscall+0x1af
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x8007b14fc, rsp = 0x7fffffffe9b8, rbp = 0x800902530 ---
lock order reversal:
 1st 0xffffff819d921558 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2559
 2nd 0xffffff0007d9fa30 snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:793
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
__lockmgr_args() at __lockmgr_args+0xcf3
ffs_lock() at ffs_lock+0x8c
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
_vn_lock() at _vn_lock+0x47
ffs_snapshot() at ffs_snapshot+0x1a6a
ffs_mount() at ffs_mount+0x666
vfs_donmount() at vfs_donmount+0xcde
nmount() at nmount+0x63
syscall() at syscall+0x1af
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x8007b14fc, rsp = 0x7fffffffe9b8, rbp = 0x800902530 ---
lock order reversal:
 1st 0xffffff0007d9fa30 snaplk (snaplk) @ /usr/src/sys/kern/vfs_vnops.c:296
 2nd 0xffffff00073d5ba8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
__lockmgr_args() at __lockmgr_args+0xcf3
ffs_snapremove() at ffs_snapremove+0xe7
softdep_releasefile() at softdep_releasefile+0x139
ufs_inactive() at ufs_inactive+0x1a5
vinactive() at vinactive+0x72
vput() at vput+0x230
vn_close() at vn_close+0x118
vn_closefile() at vn_closefile+0x5a
_fdrop() at _fdrop+0x23
closef() at closef+0x5b
kern_close() at kern_close+0x110
syscall() at syscall+0x1af
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (6, FreeBSD ELF64, close), rip = 0x80084cf9c, rsp = 0x7fffffffe9b8, rbp = 0 ---
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Gavin Atkinson-4

On Sat, 2009-10-31 at 23:15 +0100, Kai Gallasch wrote:

> Hi.
>
> I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
>
> When I try to do a make buildworld or make buildkernel the server
> reboots without any message left in the logs. The same happens
> when building bigger ports (for example ruby18 or perl58)
>
> With 8.0-RC2 debug flags and witness seem to be disabled in the
> standard GENERIC kernel, so unfortunately it is not possible for me to
> build a debug kernel without my server crashing..

The first place I'd start is by running memtest86 on the machine
overnight.  This sounds like a possible hardware issue to me; it would be
good to see if we can confirm that that is the case.

> Now my idea was to install the old 8.0-BETA4 and upgrade to RC2 through
> makeworld + buildkernel (gdb+witness). But no luck. When trying to
> upgrade to RC2 the 8.0-BETA4 also crashes. At least 8.0-BETA4 has debug
> + witness active in the GENERIC kernel..
>
> So below some debug output of 8.0-BETA4 crashing. Has a vfs/ffs LOR
> problem with the BETA4 already been fixed?

The debug output you included was just lock order reversals, and doesn't
seem to be related to your crash.

I think 8.0-BETA4 still had the debugger compiled in (you can test by
pressing ctrl-alt-escape on the console; if you do drop to the
debugger, give the "c" command to continue).

If the debugger is compiled in, then the spontaneous reboot without
dropping to the debugger suggests even more that it may be hardware
related.  If you do get to the debugger, a copy of all of the messages
on screen and the output of the "bt" command would be very useful.  When
you do your kernel recompile, please include full debugging, including
WITNESS, INVARIANTS, KDB, DDB etc.

FWIW, don't worry about building world now, a BETA4 world should work
fine with a RC2 kernel.  You may be able to get a kernel built even
though it keeps crashing by clearing out /usr/obj to start with and then
just repeating
cd /usr/src && make buildkernel -DKERNFAST
after every crash.
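
One way to set up such a debug kernel is a small config file that includes GENERIC and adds the debugging options. The following is only a sketch: write_debug_kernconf and the DEBUG config name are hypothetical, and the option list is an assumption covering the options named above plus INVARIANT_SUPPORT, which INVARIANTS depends on. Options already present in GENERIC may be reported as duplicates.

```shell
#!/bin/sh
# write_debug_kernconf FILE
# Hypothetical helper: writes a kernel config that pulls in GENERIC and
# adds the debugging options mentioned in the message above.
write_debug_kernconf() {
    cat > "$1" <<'EOF'
include GENERIC
ident DEBUG

options KDB
options DDB
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
EOF
}
```

Assuming an amd64 source tree in /usr/src, this could be used as "write_debug_kernconf /usr/src/sys/amd64/conf/DEBUG" followed by "cd /usr/src && make buildkernel KERNCONF=DEBUG".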

> Does it make sense to send in a pr with the old 8.0-BETA4?

It depends what the bug is to be honest.  So far there isn't really
enough information to determine the cause, and therefore there isn't
really enough info for a PR.

Gavin

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Mark Atkinson-3

Kai Gallasch wrote:

> Hi.
>
> I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
>
> When I try to do a make buildworld or make buildkernel the server
> reboots without any message left in the logs. The same happens
> when building bigger ports (for example ruby18 or perl58)
>
> With 8.0-RC2 debug flags and witness seem to be disabled in the
> standard GENERIC kernel, so unfortunately it is not possible for me to
> build a debug kernel without my server crashing..
>
> Now my idea was to install the old 8.0-BETA4 and upgrade to RC2 through
> makeworld + buildkernel (gdb+witness). But no luck. When trying to
> upgrade to RC2 the 8.0-BETA4 also crashes. At least 8.0-BETA4 has debug
> + witness active in the GENERIC kernel..
>
> So below some debug output of 8.0-BETA4 crashing. Has a vfs/ffs LOR
> problem with the BETA4 already been fixed?
>
> Does it make sense to send in a pr with the old 8.0-BETA4?
>
> BTW. I installed 7.2-STABLE on this same server and did a "make
> buildworld" and "make buildkernel" which completed without any problem.
>
> Cheers,
> --Kai
>
>
> ----- make buildworld -j7 crash, freebsd 8.0-amd64-beta4 -----

Definitely try the usual memory testing, power supply testing, etc.

I had a similar problem, but with an HP DL385G5 that has some sort of
"memory issue," and it would just silently reboot (which turned out to
be a machine check exception).  I could never pin down whether the problem
was the BIOS, the actual memory, or the fact that there's only one 4-core
CPU on a two-socket board and only the associated memory bank filled.

I did various memory swaps to no avail; it would run memtest86 all day
with no errors, and in the end I just turned superpages off and it works.
Like a champ.

If vm.pmap.pg_ps_enabled is 1 in 8.0-rc2, you might try rebooting
with

vm.pmap.pg_ps_enabled="0"

in /boot/loader.conf and try another buildworld.
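
A minimal sketch of applying that setting from a script. set_tunable is a hypothetical helper, not a FreeBSD tool, and it assumes a loader.conf-style file with one NAME="VALUE" assignment per line.

```shell
#!/bin/sh
# set_tunable FILE NAME VALUE
# Hypothetical helper: removes any existing NAME= assignment from FILE
# and appends NAME="VALUE", leaving all other lines untouched.
set_tunable() {
    file=$1
    name=$2
    value=$3
    tmp="${file}.tmp.$$"

    if [ -f "$file" ]; then
        # Keep every line that does not start with NAME=
        awk -v n="$name" 'index($0, n "=") != 1' "$file" > "$tmp"
    else
        : > "$tmp"
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"

    # Write back in place so the original file's permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Run as root, "set_tunable /boot/loader.conf vm.pmap.pg_ps_enabled 0" would disable superpages at the next boot; after rebooting, "sysctl vm.pmap.pg_ps_enabled" should then report 0.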


Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Kai Gallasch

On Tue, 03 Nov 2009 10:42:40 +0000,
Gavin Atkinson <gavin@...> wrote:

> On Sat, 2009-10-31 at 23:15 +0100, Kai Gallasch wrote:
> > Hi.
> >
> > I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
> >
> > When I try to do a make buildworld or make buildkernel the server
> > reboots without any message left in the logs. The same happens
> > when building bigger ports (for example ruby18 or perl58)

> First place I think I'd start id by running memtest86 on the machine
> overnight.  This sounds like possible hardware issue to me, it would
> be good to see if we can confirm that that is the case.

I will do so tomorrow. I have already taken the following actions to
rule out a hardware problem:

- ran several passes with diagnostic software from the manufacturer
- reset BIOS settings to default
- upgraded BIOS to newest release
- booted server from 2 year old backup BIOS
- took out the only pair of RAM modules that was different from the
  rest of the modules
- installed freebsd 7.2-STABLE on the server to repeat the kernel
  panic (no panic with 7.2)
- installed 8.0-BETA4 (crash)

Besides: The server was in production with 7.2 for some time, without
showing any such problems.

> > Now my idea was to install the old 8.0-BETA4 and upgrade to RC2
> > through makeworld + buildkernel (gdb+witness). But no luck. When
> > trying to upgrade to RC2 the 8.0-BETA4 also crashes. At least
> > 8.0-BETA4 has debug
> > + witness active in the GENERIC kernel..
> >
> > So below some debug output of 8.0-BETA4 crashing. Has a vfs/ffs LOR
> > problem with the BETA4 already been fixed?
>
> The debug output you included were just lock order reversals, and
> don't seem to be related to your crash.

Sorry for causing possible confusion about this. I realized this after
my mail was already out.

> I think 8.0-BETA4 still had the debugger compiled in (you can test by
> pressing ctrl-alt-escape ion the console, if you do drop to the
> debugger, give the "c" command to continue).
>
> If the debugger is compiled in, then the spontaneous reboot without
> dropping to the debugger suggests even more that it may be hardware
> related.  If you do get to the debugger, a copy of all of the messages
> on screen and the output of the "bt" command would be very useful.
> When you do your kernel recompile, please include full debugging,
> including WITNESS, INVARIANTS, KDB, DDB etc.

In the meantime I managed to install a RELENG_8 world + GENERIC
kernel with all debug options enabled on the crashing server (mounted
/usr/src and /usr/obj on another server running 8.0RC1 through NFS and
did buildworld + buildkernel over there).

So now I have a debug kernel available with dumpdev + dumpdir defined.

Here are my latest findings on this issue:

- In about 80% of cases, running makeworld leads to a server crash
  without the server writing a crashdump to dumpdir. The server just
  reboots.
- In about 20% of cases, makeworld gets stuck in a non-terminating
  process that eats up 100% CPU. This process cannot be killed. When
  restarting makeworld, the server then reboots again.
- It makes no difference whether I run makeworld with -j1 or -j8; the
  result is the same.
 
> It depends what the bug is to be honest.  So far there isn't really
> enough information to determine the cause, and therefore there isn't
> really enough info for a PR.

Mark Atkinson also commented on my mail and gave the
hint: "If vm.pmap.pg_ps_enabled is 1 in 8.0-rc2, you might try
rebooting with vm.pmap.pg_ps_enabled="0" in /boot/loader.conf and try
another buildworld."

So I thought why not and just tried it - and surprise:

Setting vm.pmap.pg_ps_enabled=0 in loader.conf resolves my problem
with 8.0RC2 crashing when doing a makeworld.

After a successful buildworld and buildkernel I rebooted the server
again with vm.pmap.pg_ps_enabled=0 commented out, and the problem
was there again. Then I set the option to 0 again in loader.conf,
rebooted + make buildworld: no problem.

Seems to be deterministic. With vm.pmap.pg_ps_enabled=1 the server
crashes without being able to write crashdumps to dumpdev (at least on
this specific ProLiant DL385G2 server).

--Kai.


--
You need more time; and you probably always will.


Re: 8.0RC2 amd64 - kernel panic running make buildworld

by S.N.Grigoriev-2

Hi list,

I can confirm I've seen the same problem. After upgrading from 7-stable
to 8.0-RC2 my machine just reboots during 'make buildworld' without
diagnostics. But switching vm.pmap.pg_ps_enabled on/off does not
work for me. My machine reboots every time I try to build world.
I don't think I have a hardware problem: under 7-stable I can build
world/kernel for both 7-stable and 8.0-RC2 without problems.

--
Regards,
S.Grigoriev.

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Dag-Erling Smørgrav

Kai Gallasch <gallasch@...> writes:
> I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
>
> When I try to do a make buildworld or make buildkernel the server
> reboots without any message left in the logs. The same happens
> when building bigger ports (for example ruby18 or perl58)

Could it be related to this?  What's your CPUID?

Author: attilio
Date: Wed Nov  4 01:32:59 2009
New Revision: 198868
URL: http://svn.freebsd.org/changeset/base/198868

Log:
  Opteron rev E family of processor expose a bug where, in very rare
  ocassions, memory barriers semantic is not honoured by the hardware
  itself. As a result, some random breakage can happen in uninvestigable
  ways (for further explanation see at the content of the commit itself).
 
  As long as just a specific familly is bugged of an entire architecture
  is broken, a complete fix-up is impratical without harming to some
  extents the other correct cases.
  Considering that (and considering the frequency of the bug exposure)
  just print out a warning message if the affected machine is identified.
 
  Pointed out by: Samy Al Bahra <sbahra at repnop dot org>
  Help on wordings by: jeff
  MFC: 3 days

Modified:
  head/sys/amd64/amd64/identcpu.c
  head/sys/i386/i386/identcpu.c

DES
--
Dag-Erling Smørgrav - des@...

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Kai Gallasch

On Wed, 04 Nov 2009 16:24:01 +0100,
Dag-Erling Smørgrav <des@...> wrote:

> Kai Gallasch <gallasch@...> writes:
> > I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
> >
> > When I try to do a make buildworld or make buildkernel the server
> > reboots without any message left in the logs. The same happens
> > when building bigger ports (for example ruby18 or perl58)
>
> Could it be related to this?  What's your CPUID?

Found this in dmesg. Is this the CPUID? "Id = 0x100f23"

--Kai.


CPU: Quad-Core AMD Opteron(tm) Processor 2352 (2100.09-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f23  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee400800<SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
  TSC: P-state invariant
real memory  = 21474836480 (20480 MB)
avail memory = 20701110272 (19742 MB)
ACPI APIC Table: <HP     ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7


--
If it wasn't for the last minute, nothing would get done.

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Mark Atkinson-3

Kai Gallasch wrote:

> On Wed, 04 Nov 2009 16:24:01 +0100,
> Dag-Erling Smørgrav <des@...> wrote:
>
>> Kai Gallasch <gallasch@...> writes:
>>> I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
>>>
>>> When I try to do a make buildworld or make buildkernel the server
>>> reboots without any message left in the logs. The same happens
>>> when building bigger ports (for example ruby18 or perl58)
>> Could it be related to this?  What's your CPUID?
>
> Found this in dmesg. Is this the CPUID? "Id = 0x100f23"

That's generation 16 model 2 stepping 3.   This erratum only affects
generation 0xe or 15.   BTW, I have the same processor/stepping/MHz in
my system, but only a single physical processor.


> --Kai.
>
>
> CPU: Quad-Core AMD Opteron(tm) Processor 2352 (2100.09-MHz K8-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x100f23  Stepping = 3
>   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x802009<SSE3,MON,CX16,POPCNT>
>   AMD Features=0xee400800<SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
>   AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
>   TSC: P-state invariant
> real memory  = 21474836480 (20480 MB)
> avail memory = 20701110272 (19742 MB)
> ACPI APIC Table: <HP     ProLiant>
> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
> FreeBSD/SMP: 2 package(s) x 4 core(s)
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  1
>  cpu2 (AP): APIC ID:  2
>  cpu3 (AP): APIC ID:  3
>  cpu4 (AP): APIC ID:  4
>  cpu5 (AP): APIC ID:  5
>  cpu6 (AP): APIC ID:  6
>  cpu7 (AP): APIC ID:  7
>
>


Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Mark Atkinson-3

Kai Gallasch wrote:

> On Wed, 04 Nov 2009 16:24:01 +0100,
> Dag-Erling Smørgrav <des@...> wrote:
>
>> Kai Gallasch <gallasch@...> writes:
>>> I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
>>>
>>> When I try to do a make buildworld or make buildkernel the server
>>> reboots without any message left in the logs. The same happens
>>> when building bigger ports (for example ruby18 or perl58)
>> Could it be related to this?  What's your CPUID?
>

> Found this in dmesg. Is this the CPUID? "Id = 0x100f23"

That's generation 16 (0xf) model 2, stepping 3.   This erratum apparently
only affects gen 15 (0xe) and some pre-release -- never released to
public (0xf).   I have the same processor in my system btw.

>
> --Kai.
>
>
> CPU: Quad-Core AMD Opteron(tm) Processor 2352 (2100.09-MHz K8-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x100f23  Stepping = 3
>   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x802009<SSE3,MON,CX16,POPCNT>
>   AMD Features=0xee400800<SYSCALL,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
>   AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
>   TSC: P-state invariant
> real memory  = 21474836480 (20480 MB)
> avail memory = 20701110272 (19742 MB)
> ACPI APIC Table: <HP     ProLiant>
> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
> FreeBSD/SMP: 2 package(s) x 4 core(s)
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  1
>  cpu2 (AP): APIC ID:  2
>  cpu3 (AP): APIC ID:  3
>  cpu4 (AP): APIC ID:  4
>  cpu5 (AP): APIC ID:  5
>  cpu6 (AP): APIC ID:  6
>  cpu7 (AP): APIC ID:  7
>
>


Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Mark Atkinson-3

Mark Atkinson wrote:

> Kai Gallasch wrote:
>> On Wed, 04 Nov 2009 16:24:01 +0100,
>> Dag-Erling Smørgrav <des@...> wrote:
>>
>>> Kai Gallasch <gallasch@...> writes:
>>>> I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
>>>>
>>>> When I try to do a make buildworld or make buildkernel the server
>>>> reboots without any message left in the logs. The same happens
>>>> when building bigger ports (for example ruby18 or perl58)
>>> Could it be related to this?  What's your CPUID?
>
>> Found this in dmesg. Is this the CPUID? "Id = 0x100f23"
>
> That's generation 16 (0xf) model 2, stepping 3.   This errata apparently
> only effects gen 15 (0xe) and some pre-release -- never released to
> public (0xf).   I have the same processor in my system btw.

Sorry for the double wrong posting.  I see several webpages refer to 15
as f and 16 as f.  usr/ports/misc/cpuid refers to it as 15.

The pages referenced via the bugzilla entry in the commit refer to it as
0xf but between 32 and 63.   Does model 2 correctly put us in the
range in the commit, 0x20 to 0x3f (i.e. is stepping included)?





Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Alexandre "Sunny" Kovalenko-2

On Wed, 2009-11-04 at 13:44 +0300, S.N.Grigoriev wrote:
> Hi list,
>
> I can confirm I've seen the same problem. After upgrading from 7-stable
> to 8.0-RC2 my machine just reboots during 'make buildworld' without
> diagnostics. But switching vm.pmap.pg_ps_enabled on/off does not
> work for me. My machine reboots every time I try to build world.
> I don't think I have a hardware problem: under 7-stable I can build
> world/kernel for both 7-stable and 8.0-RC2 without problems.
>
Is it by any chance possible that you have 'debug.debugger_on_panic' set
to '0' and no valid dump device configured?

--
Alexandre Kovalenko (Олександр Коваленко)



Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Mark Atkinson-3

Mark Atkinson wrote:

> Mark Atkinson wrote:
>> Kai Gallasch wrote:
>>> On Wed, 04 Nov 2009 16:24:01 +0100,
>>> Dag-Erling Smørgrav <des@...> wrote:
>>>
>>>> Kai Gallasch <gallasch@...> writes:
>>>>> I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
>>>>>
>>>>> When I try to do a make buildworld or make buildkernel the server
>>>>> reboots without any message left in the logs. The same happens
>>>>> when building bigger ports (for example ruby18 or perl58)
>>>> Could it be related to this?  What's your CPUID?
>>> Found this in dmesg. Is this the CPUID? "Id = 0x100f23"
>> That's generation 16 (0xf) model 2, stepping 3.   This errata apparently
>> only effects gen 15 (0xe) and some pre-release -- never released to
>> public (0xf).   I have the same processor in my system btw.
>
> sorry for the double wrong posting.  I see several webpages refer to 15
> as f and 16 as f.  usr/ports/misc/cpuid refers to it as 15.
>
> The pages referenced via the bugzilla entry in the commit refer to it as
>  0xf but between 32 and 63.   Does the model 2 correctly put us in the
> range in the commit 0x20 and 0x3f? (i.e. stepping is included?)

I'll answer my own question, no:

http://support.amd.com/us/Processor_TechDocs/25481.pdf

Although some of the posts in

http://bugzilla.kernel.org/show_bug.cgi?id=11305

indicate any model < 0x40.  Someone must have actually narrowed the range.
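
For reference, the "Id" value from dmesg can be split into its fields with a few shifts and masks. The sketch below assumes the standard CPUID function 1 EAX layout and the AMD convention that the extended family and model fields are combined with the base fields when the base family is 0xf. decode_cpuid and cpuid_in_thread_range are hypothetical helper names, and the family 0xf, model 0x20 to 0x3f range is simply the one discussed in this thread, not verified against the commit.

```shell
#!/bin/sh
# decode_cpuid ID
# Hypothetical helper: splits a CPUID signature (e.g. the dmesg "Id" value)
# into family, model and stepping, and prints them on one line.
decode_cpuid() {
    id=$(($1))
    stepping=$(( id & 0xf ))
    base_model=$(( (id >> 4) & 0xf ))
    base_family=$(( (id >> 8) & 0xf ))
    ext_model=$(( (id >> 16) & 0xf ))
    ext_family=$(( (id >> 20) & 0xff ))

    family=$base_family
    model=$base_model
    # The extended fields only contribute when the base family is 0xf.
    if [ "$base_family" -eq 15 ]; then
        family=$(( base_family + ext_family ))
        model=$(( (ext_model << 4) | base_model ))
    fi
    printf 'family=0x%x model=0x%02x stepping=%d\n' "$family" "$model" "$stepping"
}

# cpuid_in_thread_range ID
# Succeeds if ID decodes to family 0xf with a model in 0x20..0x3f,
# the range mentioned in this thread (an assumption, see above).
cpuid_in_thread_range() {
    decode_cpuid "$1" > /dev/null
    [ "$family" -eq 15 ] && [ "$model" -ge 32 ] && [ "$model" -le 63 ]
}

decode_cpuid 0x100f23   # prints: family=0x10 model=0x02 stepping=3
```

Under these assumptions, Id = 0x100f23 decodes to family 0x10 (16), model 2, stepping 3, which is outside the family 0xf range in question.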




Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Adrian Chadd-2

2009/11/5 Mark Atkinson <atkin901@...>:

>
> I'll answer my own question, no:
>
> http://support.amd.com/us/Processor_TechDocs/25481.pdf
>
> Although the some of the posts in
>
> http://bugzilla.kernel.org/show_bug.cgi?id=11305
>
> indicate any model < 0x40.  Someone must have actually narrowed the range.

Is there a FreeBSD PR or errata URL which can be linked to instead,
complete with copies of the above in it?


Adrian

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Kai Gallasch

On Tue, 03 Nov 2009 10:42:40 +0000,
Gavin Atkinson <gavin@...> wrote:

> On Sat, 2009-10-31 at 23:15 +0100, Kai Gallasch wrote:
> > Hi.
> >
> > I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
> >
> > When I try to do a make buildworld or make buildkernel the server
> > reboots without any message left in the logs. The same happens
> > when building bigger ports (for example ruby18 or perl58)
> >
> > With 8.0-RC2 debug flags and witness seem to be disabled in the
> > standard GENERIC kernel, so unfortunately it is not possible for me
> > to build a debug kernel without my server crashing..
>
> First place I think I'd start id by running memtest86 on the machine
> overnight.  This sounds like possible hardware issue to me, it would
> be good to see if we can confirm that that is the case.

Gavin.

memtest86 ran for 18 hours and showed no problem with RAM.

--Kai.

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by S.N.Grigoriev-2


04.11.09, 16:51, "Alexandre \"Sunny\" Kovalenko" <gaijin.k@...>
wrote:

> On Wed, 2009-11-04 at 13:44 +0300, S.N.Grigoriev wrote:
> > Hi list,
> >
> > I can confirm I've seen the same problem. After upgrading from 7-stable
> > to 8.0-RC2 my machine just reboots during 'make buildworld' without
> > diagnostics. But switching vm.pmap.pg_ps_enabled on/off does not
> > work for me. My machine reboots every time I try to build world.
> > I don't think I have a hardware problem: under 7-stable I can build
> > world/kernel for both 7-stable and 8.0-RC2 without problems.
> >
> Is it by any chance possible that you have 'debug.debugger_on_panic' set
> to '0' and no valid dump device configured?

Hi Alexandre,

I've not found the 'debug.debugger_on_panic' variable in 'sysctl -a'
output. Where can I find it? All my sysctl variables are at their
defaults.
--
Regards,
S.Grigoriev.

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Mark Atkinson-3

Adrian Chadd wrote:

> 2009/11/5 Mark Atkinson <atkin901@...>:
>
>> I'll answer my own question, no:
>>
>> http://support.amd.com/us/Processor_TechDocs/25481.pdf
>>
>> Although the some of the posts in
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=11305
>>
>> indicate any model < 0x40.  Someone must have actually narrowed the range.
>
> Is there a FreeBSD PR or errata URL which can be linked to instead,
> complete with copies of the above in it?

If you read the mysql related blog post on it:

http://timetobleed.com/mysql-doesnt-always-suck-this-time-its-amd/

Someone in the comments suggests this is AMD errata 147 and quotes the
text.   I'll include a copy of the comment here below for the mail
archives (and since urls tend to disappear).

#
silverjam
The kernel bug:

http://bugzilla.kernel.org/show_bug.cgi?id=11305

Which references an AMD "errata 147" from "Revision Guide for AMD
Athlon™ 64 and AMD Opteron™ Processors."

http://support.amd.com/us/Processor_TechDocs/25759.pdf

Which says:
"""
Potential Violation of Read Ordering Rules Between Semaphore Operations
and Unlocked Read-Modify-Write Instructions

Description

Under a highly specific set of internal timing circumstances, the memory
read ordering between a semaphore operation and a subsequent
read-modify-write instruction (an instruction that uses the same memory
location as both a source and destination) may be incorrect and allow
the read-modify-write instruction to operate on the memory location
ahead of the completion of the semaphore operation. The erratum will not
occur if there is a LOCK prefix on the read-modify-write instruction.
This erratum does not apply if the read-only value in MSRC001_1023h[33]
is 1b.

Potential Effect on System

In the unlikely event that the condition described above occurs, the
read-modify-write instruction (in the critical section) may operate on
data that existed prior to the semaphore operation. This erratum can
only occur in multiprocessor or multicore configurations.

Suggested Workaround

To provide a workaround for this unlikely event, software can perform
any of the following actions for multiprocessor or multicore systems:
• Place a LFENCE instruction between the semaphore operation and any
  subsequent read-modify-write instruction(s) in the critical section.
• Use a LOCK prefix with the read-modify-write instruction.
• Decompose the read-modify-write instruction into separate instructions.

No workaround is necessary if software checks that MSRC001_1023h[33] is
set on all processors that may execute the code. The value in
MSRC001_1023h[33] may not be the same on all processors in a
multi-processor system.

Fix Planned: Yes
"""
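
To make the quoted workarounds a bit more concrete, here is a rough C
sketch. It is not taken from the FreeBSD or Linux sources. The names
erratum147_needs_workaround, any_cpu_needs_workaround, load_fence and
demo_locked_increment are made up for illustration, the MSR values are
assumed to have been read elsewhere (reading an MSR needs kernel
privilege), and the inline assembly assumes a GCC- or Clang-style
compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 33 of MSR C001_1023h, as named in the erratum text quoted above. */
#define ERRATUM147_BIT 33

/*
 * Hypothetical helper: given a raw value already read from
 * MSR C001_1023h on one CPU, report whether that CPU still needs
 * one of the workarounds (bit 33 clear means "erratum applies").
 */
static bool erratum147_needs_workaround(uint64_t msr_value)
{
    return ((msr_value >> ERRATUM147_BIT) & 1u) == 0;
}

/*
 * The bit may differ between processors, so the workaround can only be
 * skipped if every CPU that may run the code has the bit set.
 */
static bool any_cpu_needs_workaround(const uint64_t *msr_values, size_t ncpu)
{
    for (size_t i = 0; i < ncpu; i++) {
        if (erratum147_needs_workaround(msr_values[i]))
            return true;
    }
    return false;
}

/* Workaround 1: an LFENCE between the semaphore operation and the RMW. */
static inline void load_fence(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lfence" ::: "memory");
#else
    /* Portable fallback so the sketch builds on other architectures. */
    atomic_thread_fence(memory_order_acquire);
#endif
}

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static int demo_counter;

/*
 * Take a spin lock (the "semaphore operation"), fence, then update the
 * counter inside the critical section. Whether the compiler emits a
 * single memory-destination RMW instruction for the increment is up to
 * the compiler; the fence is placed where the erratum text asks for it.
 */
static int demo_locked_increment(void)
{
    while (atomic_flag_test_and_set_explicit(&demo_lock,
                                             memory_order_acquire))
        ;                       /* spin until the lock is free */
    load_fence();
    int value = ++demo_counter; /* unlocked read-modify-write */
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
    return value;
}
```

The second listed workaround (a LOCK prefix on the read-modify-write
itself) would roughly correspond to making the counter an atomic_int and
using atomic_fetch_add, which x86 compilers normally implement with a
LOCK-prefixed instruction.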

_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Gary Jennejohn-3

On Thu, 05 Nov 2009 19:40:03 +0300
S.N.Grigoriev <serguey-grigoriev@...> wrote:

>
> 04.11.09, 16:51, "Alexandre \"Sunny\" Kovalenko" <gaijin.k@...>
> wrote:
>
> > On Wed, 2009-11-04 at 13:44 +0300, S.N.Grigoriev wrote:
> > > Hi list,
> > >
> > > I can confirm I've seen the same problem. After upgrading from 7-stable
> > > to 8.0-RC2 my machine just reboots during 'make buildworld' without
> > > diagnostics. But switching vm.pmap.pg_ps_enabled on/off does not
> > > work for me. My machine reboots every time I try to build world.
> > > I don't think I have a hardware problem: under 7-stable I can build
> > > world/kernel for both 7-stable and 8.0-RC2 without problems.
> > >
> > Is it by any chance possible that you have 'debug.debugger_on_panic' set
> > to '0' and no valid dump device configured?
>
> Hi Alexandre,
>
> I've not found 'debug.debugger_on_panic' variable in 'sysctl -a'
> output. Where can I find it? All my sysctl variables are set by
> default.

Do you have "options DDB" in your kernel config file?

---
Gary Jennejohn

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by S.N.Grigoriev-2



05.11.09, 18:49, "Gary Jennejohn" <gary.jennejohn@...>:

> On Thu, 05 Nov 2009 19:40:03 +0300
> S.N.Grigoriev  wrote:
> >
> > 04.11.09, 16:51, "Alexandre \"Sunny\" Kovalenko"
> > wrote:
> >
> > > On Wed, 2009-11-04 at 13:44 +0300, S.N.Grigoriev wrote:
> > > > Hi list,
> > > >
> > > > I can confirm I've seen the same problem. After upgrading from 7-stable
> > > > to 8.0-RC2 my machine just reboots during 'make buildworld' without
> > > > diagnostics. But switching vm.pmap.pg_ps_enabled on/off does not
> > > > work for me. My machine reboots every time I try to build world.
> > > > I don't think I have a hardware problem: under 7-stable I can build
> > > > world/kernel for both 7-stable and 8.0-RC2 without problems.
> > > >
> > > Is it by any chance possible that you have 'debug.debugger_on_panic' set
> > > to '0' and no valid dump device configured?
> >
> > Hi Alexandre,
> >
> > I've not found 'debug.debugger_on_panic' variable in 'sysctl -a'
> > output. Where can I find it? All my sysctl variables are set by
> > default.
> Do you have "options DDB" in your kernel config file?
> ---
> Gary Jennejohn

Hi Gary,

my current kernel is GENERIC, so I don't have "options DDB".
--
Regards,
S.Grigoriev.

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by Etienne Robillard-3

S.N.Grigoriev wrote:

>
> 05.11.09, 18:49, "Gary Jennejohn" <gary.jennejohn@...>:
>
>> On Thu, 05 Nov 2009 19:40:03 +0300
>> S.N.Grigoriev  wrote:
>>> 04.11.09, 16:51, "Alexandre \"Sunny\" Kovalenko"
>>> wrote:
>>>
>>>> On Wed, 2009-11-04 at 13:44 +0300, S.N.Grigoriev wrote:
>>>>> Hi list,
>>>>>
>>>>> I can confirm I've seen the same problem. After upgrading from 7-stable
>>>>> to 8.0-RC2 my machine just reboots during 'make buildworld' without
>>>>> diagnostics. But switching vm.pmap.pg_ps_enabled on/off does not
>>>>> work for me. My machine reboots every time I try to build world.
>>>>> I don't think I have a hardware problem: under 7-stable I can build
>>>>> world/kernel for both 7-stable and 8.0-RC2 without problems.
>>>>>
>>>> Is it by any chance possible that you have 'debug.debugger_on_panic' set
>>>> to '0' and no valid dump device configured?
>>> Hi Alexandre,
>>>
>>> I've not found 'debug.debugger_on_panic' variable in 'sysctl -a'
>>> output. Where can I find it? All my sysctl variables are set by
>>> default.
>> Do you have "options DDB" in your kernel config file?
>> ---
>> Gary Jennejohn
>
> Hi Gary,
>
> my current kernel is GENERIC, so I don't have "options DDB".

I have RC2 with amd64 and buildworld/installworld runs fine.
Maybe you have memory (RAM) problems? I had to remove one 512mb clib
in order to boot... ;-)


Hope this helps,

Etienne


--
Etienne Robillard <robillard.etienne@...>
Green Tea Hackers Club <http://gthc.org/>
Blog: <http://gthc.org/blog/>
PGP Fingerprint: 178A BF04 23F0 2BF5 535D  4A57 FD53 FD31 98DC 4E57

Re: 8.0RC2 amd64 - kernel panic running make buildworld

by S.N.Grigoriev-2



05.11.09, 13:46, "Etienne Robillard" <robillard.etienne@...>:

> S.N.Grigoriev wrote:
> >
> > 05.11.09, 18:49, "Gary Jennejohn" :
> >
> >> On Thu, 05 Nov 2009 19:40:03 +0300
> >> S.N.Grigoriev  wrote:
> >>> 04.11.09, 16:51, "Alexandre \"Sunny\" Kovalenko"
> >>> wrote:
> >>>
> >>>> On Wed, 2009-11-04 at 13:44 +0300, S.N.Grigoriev wrote:
> >>>>> Hi list,
> >>>>>
> >>>>> I can confirm I've seen the same problem. After upgrading from 7-stable
> >>>>> to 8.0-RC2 my machine just reboots during 'make buildworld' without
> >>>>> diagnostics. But switching vm.pmap.pg_ps_enabled on/off does not
> >>>>> work for me. My machine reboots every time I try to build world.
> >>>>> I don't think I have a hardware problem: under 7-stable I can build
> >>>>> world/kernel for both 7-stable and 8.0-RC2 without problems.
> >>>>>
> >>>> Is it by any chance possible that you have 'debug.debugger_on_panic' set
> >>>> to '0' and no valid dump device configured?
> >>> Hi Alexandre,
> >>>
> >>> I've not found 'debug.debugger_on_panic' variable in 'sysctl -a'
> >>> output. Where can I find it? All my sysctl variables are set by
> >>> default.
> >> Do you have "options DDB" in your kernel config file?
> >> ---
> >> Gary Jennejohn
> >
> > Hi Gary,
> >
> > my current kernel is GENERIC, so I don't have "options DDB".
> I have RC2 with amd64 and buildworld/installworld runs fine.
> Maybe you have memory (RAM) problems? I had to remove one 512mb clib
> in order to boot... ;-)
> Hope this helps,
> Etienne

Hi Etienne,

I think that is unlikely. I've done a great many compilations on this
machine (under FreeBSD 7.1 and 7.2 and some Linux versions) without issues.
--
Regards,
S.Grigoriev.