4.6 hang

by Steve Shockley

I recently upgraded my firewall box from 4.4 to 4.6.  At first it was
running well (about a week), but yesterday I started getting occasional
hangs where the screen would be blank and it'd stop responding to ping
(and passing traffic).  Figuring it was a hardware failure, I swapped
the drive into another box.  I still seem to be getting occasional
hangs; I even turned off screen blanking, and when it hangs there's
nothing on the screen (monitor goes to power save).  The only shared
hardware between the two machines is a Compaq fiber em NIC (which I'll
replace tonight) and the hard drive (which isn't showing any errors).
Assuming it is a software problem, how can I diagnose it?  I'll paste
the dmesg below.  I'm running 4.6 with patch 001 and 002 applied, and
I've tried both the sp and mp kernels.

OpenBSD 4.6-stable (GENERIC) #1: Tue Oct  6 05:40:03 EDT 2009
     root@...:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 3.06GHz ("GenuineIntel" 686-class) 3.07 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
real mem  = 3220668416 (3071MB)
avail mem = 3120185344 (2975MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 10/14/04, BIOS32 rev. 0 @ 0xffe90, SMBIOS rev. 2.3 @ 0xfae10 (77 entries)
bios0: vendor Dell Computer Corporation version "A05" date 10/14/2004
bios0: Dell Computer Corporation PowerEdge 650
acpi0 at bios0: rev 0
acpi0: tables DSDT FACP APIC SPCR
acpi0: wakeup devices PCI0(S5) PCI1(S5) PCI2(S5)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 133MHz
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 11, 16 pins
ioapic0: misconfigured as apic 0, remapped to apid 2
ioapic1 at mainbus0: apid 3 pa 0xfec01000, version 11, 16 pins
ioapic1: misconfigured as apic 0, remapped to apid 3
ioapic2 at mainbus0: apid 4 pa 0xfec02000, version 11, 16 pins
ioapic2: misconfigured as apic 0, remapped to apid 4
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (PCI1)
acpiprt2 at acpi0: bus 2 (PCI2)
acpicpu0 at acpi0
bios0: ROM list: 0xc0000/0x8000 0xc8000/0x4800 0xec000/0x4000!
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "ServerWorks GCNB-LE Host" rev 0x32
pchb1 at pci0 dev 0 function 1 "ServerWorks GCNB-LE Host" rev 0x00
pci1 at pchb1 bus 1
em0 at pci1 dev 3 function 0 "Intel PRO/1000MT (82546EB)" rev 0x01: apic 3 int 3 (irq 7), address 00:04:23:a5:c8:6e
em1 at pci1 dev 3 function 1 "Intel PRO/1000MT (82546EB)" rev 0x01: apic 3 int 4 (irq 5), address 00:04:23:a5:c8:6f
em2 at pci0 dev 3 function 0 "Intel PRO/1000 (82542)" rev 0x03: apic 3 int 1 (irq 15), address 00:08:c7:86:39:f5
vga1 at pci0 dev 4 function 0 "ATI Rage XL" rev 0x27
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pciide0 at pci0 dev 5 function 0 "CMD Technology PCI0680" rev 0x02
pciide0: bus-master DMA support present
pciide0: channel 0 wired to native-PCI mode
pciide0: using apic 3 int 7 (irq 11) for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: <ST340014A>
wd0: 16-sector PIO, LBA48, 38166MB, 78165360 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
pciide0: channel 1 wired to native-PCI mode
piixpm0 at pci0 dev 15 function 0 "ServerWorks CSB6" rev 0xa0: SMBus disabled
pciide1 at pci0 dev 15 function 1 "ServerWorks CSB6 RAID/IDE" rev 0xa0: DMA
atapiscsi0 at pciide1 channel 0 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <TEAC, CD-224E, K.9A> ATAPI 5/cdrom removable
cd0(pciide1:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 2
ohci0 at pci0 dev 15 function 2 "ServerWorks CSB6 USB" rev 0x05: apic 2 int 10 (irq 10), version 1.0, legacy support
pcib0 at pci0 dev 15 function 3 "ServerWorks GCLE-2 Host" rev 0x00
pchb2 at pci0 dev 16 function 0 "ServerWorks CIOB-E" rev 0x12
pchb3 at pci0 dev 16 function 2 "ServerWorks CIOB-E" rev 0x12
pci2 at pchb3 bus 2
usb0 at ohci0: USB revision 1.0
uhub0 at usb0 "ServerWorks OHCI root hub" rev 1.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
mtrr: Pentium Pro MTRR support
softraid0 at root
root on wd0a swap on wd0b dump on wd0b
WARNING: / was not properly unmounted


Re: 4.6 hang

by Gregory Edigarov-2

On Tue, 27 Oct 2009 07:10:24 -0400
Steve Shockley <steve.shockley@...> wrote:

> I recently upgraded my firewall box from 4.4 to 4.6.  At first it was
> running well (about a week), but yesterday I started getting
> occasional hangs where the screen would be blank and it'd stop
> responding to ping (and passing traffic).  Figuring it was a hardware
> failure, I swapped the drive into another box.  I still seem to be
> getting occasional hangs; I even turned off screen blanking, and when
> it hangs there's nothing on the screen (monitor goes to power save).
> The only shared hardware between the two machines is a Compaq fiber
> em NIC (which I'll replace tonight) and the hard drive (which isn't
> showing any errors). Assuming it is a software problem, how can I
> diagnose it?  I'll paste the dmesg below.  I'm running 4.6 with patch
> 001 and 002 applied, and I've tried both the sp and mp kernels.

Although that may not be the problem, try turning off ACPI in the kernel.
That helps me with 90% of sporadic hangs or reboots.
I have even made it routine: when I have new hardware to test, I first
run it with ACPI on; if it hangs or shows a speed regression, I just
turn ACPI off, and 90% of the time I am happy. For the remaining 10%
I change my hardware.
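
For anyone wanting to try this, here is a minimal sketch of disabling
ACPI through the boot-time kernel configuration editor (UKC), assuming
a stock kernel installed as /bsd. This form affects only the current
boot; nothing is written to disk:

```
boot> boot -c
UKC> disable acpi
UKC> quit
```

To make the change persistent, the same `disable acpi` and `quit`
commands can be given to `config -e -f /bsd`, run as root on the live
system. Keeping a copy of the unmodified kernel first makes the change
easy to undo.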


--
With best regards,
        Gregory Edigarov


Re: 4.6 hang

by Steve Shockley

On 10/27/2009 7:44 AM, Gregory Edigarov wrote:
> Although that may not be the problem, try turning off ACPI in the kernel.
> That helps me with 90% of sporadic hangs or reboots.

Thanks for the reply.  I'm trying with ACPI disabled now, but during the
day today I did get a panic, details below.

panic: pool_do_get(mcl2k): free list modified: page 0xd99dd000; item addr 0xd99dd800; offset 0x0=0x800aabb
Stopped at          Debugger+0x4:   leave

Trace:
Debugger(d9695800,d0894098,df670e30,d99dd800,d0894020) at Debugger+0x4
panic(d0716100,d08470a0,d99dd000,d99dd800,0) at panic+0x55
pool_do_get(d0894020,0,df670ea0,df670e50,d0363faf,d0894020) at pool_do_get+0x2e3
pool_get(d0894020,0,df670ea0,d039afee,0) at pool_get+0x46
m_clget(d977c500,1,d3acb830,800) at m_clget+0x74
em_get_buf(d3acb800,d,200e0a0,d3acb830) at em_get_buf+0x64
em_rxfill(d3acb800,fffffffe,c0,0) at em_rxfill+0x3a
em_intr(d3acb800) at em_intr+0x9e
Xintr_ioapic() at Xintr_ioapic1+0x68
--- interrupt ---
cpu_idle_cycle(d09408e0) at cpu_idle_cycle+0xf
Bad frame pointer: 0xd09e9e78

ps output is available on request, since I'm typing this by hand from a
digital photo.
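
Read from the bottom up, the trace gives the call chain at the moment
of the panic: the em interrupt handler was refilling its receive ring
and requesting a 2k mbuf cluster (the mcl2k pool) when the pool code
found that an item on its free list had been modified. The sketch below
pulls that chain out of the pasted text. The helper names `call_chain`
and `frames_with_prefix` are made up for illustration and are not part
of any OpenBSD tool, and the trace was typed by hand, so treat the
offsets as approximate.

```python
import re

# Frames as posted above, with the wrapped pool_do_get line rejoined.
TRACE = """\
Debugger(d9695800,d0894098,df670e30,d99dd800,d0894020) at Debugger+0x4
panic(d0716100,d08470a0,d99dd000,d99dd800,0) at panic+0x55
pool_do_get(d0894020,0,df670ea0,df670e50,d0363faf,d0894020) at pool_do_get+0x2e3
pool_get(d0894020,0,df670ea0,d039afee,0) at pool_get+0x46
m_clget(d977c500,1,d3acb830,800) at m_clget+0x74
em_get_buf(d3acb800,d,200e0a0,d3acb830) at em_get_buf+0x64
em_rxfill(d3acb800,fffffffe,c0,0) at em_rxfill+0x3a
em_intr(d3acb800) at em_intr+0x9e
Xintr_ioapic() at Xintr_ioapic1+0x68
"""

# One ddb frame: name(args) at symbol+0xoffset
FRAME = re.compile(r"^(\w+)\(.*\) at (\w+)\+0x[0-9a-f]+$")


def call_chain(trace):
    """Return the symbols from outermost caller to innermost callee."""
    symbols = []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            symbols.append(match.group(2))
    # ddb prints the innermost frame first, so reverse for reading order.
    return list(reversed(symbols))


def frames_with_prefix(chain, prefix):
    """Return the frames whose symbol starts with a driver prefix."""
    return [name for name in chain if name.startswith(prefix)]


if __name__ == "__main__":
    chain = call_chain(TRACE)
    print(" -> ".join(chain))
    print("em frames:", frames_with_prefix(chain, "em_"))
```

The em frames only show where the damage was detected; whatever
modified the free item may have run earlier and elsewhere.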


Re: 4.6 hang

by Steve Shockley

Just as an update, I've replaced the one NIC, so the only thing carried
over from the other machine is the hard drive, and I'm still getting the
exact same issue.


Re: 4.6 hang

by Steve Shockley

Just as another update, I replaced the fiber em card with a bge, and the
problems went away.


Re: 4.6 hang

by Matthew Young-9

I am very curious about your problem; we all fear encountering
something similar in the future...

Why wasn't anybody able to help out based on your DDB trace? If I
ever hit such an event, what is the best information to
post to receive help? (provided, of course, that people are willing to help)

Just curious! I want to be prepared to troubleshoot this better in the future.


Thanks

--Matt


On Thu, Oct 29, 2009 at 9:09 PM, Steve Shockley
<steve.shockley@...> wrote:
> Just as another update, I replaced the fiber em card with a bge, and the
> problems went away.


Re: 4.6 hang

by Nicholas Marriott-2

Hi

On Fri, Oct 30, 2009 at 05:09:21PM -0500, Matthew Young wrote:
> I am very curious about your problem; we all fear encountering
> something similar in the future...
>
> Why wasn't anybody able to help out based on your DDB trace? If I
> ever hit such an event, what is the best information to
> post to receive help? (provided, of course, that people are willing to help)

http://www.openbsd.org/report.html
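
As a rough sketch of what to capture when a panic drops the machine
into the kernel debugger, the session below uses standard ddb commands.
Which outputs matter most is for the page above to say, so treat this
selection as an assumption:

```
ddb> show panic
ddb> trace
ddb> ps
ddb> show registers
ddb> boot reboot
```

Along with the full dmesg, this output is easier to capture over a
serial console than from a photo of the screen. On a machine with a
com0 port, `set tty com0` in /etc/boot.conf redirects the console
there.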

> Just curious! I want to be prepared to troubleshoot this better in the future.
>
>
> Thanks
>
> --Matt
>
>
> On Thu, Oct 29, 2009 at 9:09 PM, Steve Shockley
> <steve.shockley@...> wrote:
> > Just as another update, I replaced the fiber em card with a bge, and the
> > problems went away.