CARP node crashing reproducibly (4.3-stable)

Hello,

Here's all the data I was able to get off our crashing machine, the backup node of our CARP cluster, which has run flawlessly since 3.7.

We can reproduce the problem by (no joke) installing an openSUSE 10.3 machine in one of our labs over the network. After 40 minutes, our backup firewall crashes. Sounds preposterous, I know... We have not yet had time to examine exactly what packets this machine sends out on the network.

The crashed machine is still in ddb, so just ask if I should execute some more commands.

Should I rather file a bug report? I never know when I should just ask here or rather file one, sorry... but thanks for your help anyway!

ddb> ps
PID PPID PGRP UID S FLAGS WAIT COMMAND
15843 7703 7703 556 2 0x100 nrpe
*27717 1 7703 556 7 0x100 nrpe
26537 24561 27862 0 3 0x4082 ttyin more
24561 27862 27862 0 3 0x4082 pause sh
27862 3290 27862 0 3 0x4082 wait man
10574 1244 10574 0 3 0x4082 ttyin ksh
1244 21740 1244 0 3 0x4180 select sshd
19759 15807 19759 0 3 0x4082 ttyin ksh
15807 21740 15807 0 3 0x4180 select sshd
3290 1 3290 0 3 0x4082 pause ksh
2463 1 2463 0 3 0x4082 ttyin getty
4032 1 4032 0 3 0x4082 ttyin getty
29698 1 29698 0 3 0x4082 ttyin getty
25598 1 25598 0 3 0x4082 ttyin getty
2451 1 2451 0 3 0x4082 ttyin getty
26554 1 26554 0 3 0x80 poll ntpd
7819 1 7819 0 3 0x80 select cron
21981 1 21981 0 3 0x80 kqread apmd
7703 1 7703 556 3 0x180 wait nrpe
21908 1 7188 0 3 0x80 select snmpd
20436 1 20436 83 3 0x180 poll ntpd
10622 1 10622 0 3 0x40180 select sendmail
31903 1 31903 62 3 0x180 select spamd
21740 1 21740 0 3 0x80 select sshd
14513 1 14513 71 3 0x180 kqread ftp-proxy
22495 1 22495 77 3 0x180 poll dhcrelay
9478 1 9478 0 2 0x80 ifstated
17920 19868 19868 74 2 0x180 pflogd
19868 1 19868 0 3 0x80 netio pflogd
656 20418 20418 73 2 0x180 syslogd
20418 1 20418 0 3 0x88 netio syslogd
18 0 0 0 3 0x100200 bored crypto
17 0 0 0 3 0x100200 aiodoned aiodoned
16 0 0 0 3 0x100200 syncer update
15 0 0 0 3 0x100200 cleaner cleaner
14 0 0 0 3 0x100200 reaper reaper
13 0 0 0 3 0x100200 pgdaemon pagedaemon
12 0 0 0 3 0x100200 pftm pfpurge
11 0 0 0 3 0x100200 usbevt usb4
10 0 0 0 3 0x100200 usbevt usb3
9 0 0 0 3 0x100200 usbevt usb2
8 0 0 0 3 0x100200 usbevt usb1
7 0 0 0 3 0x100200 usbtsk usbtask
6 0 0 0 3 0x100200 usbevt usb0
5 0 0 0 3 0x100200 apmev apm0
4 0 0 0 3 0x100200 bored syswq
3 0 0 0 3 0x100200 idle0
2 0 0 0 2 0x100200 kmthread
1 0 1 0 3 0x4080 wait init
0 -1 0 0 3 0x80200 scheduler swapper

ddb> trace
pf_send_icmp(d62f3200,3,3,2,d67191b8,d115d500,2,db2a4eb8) at pf_send_icmp+0x2b
pf_test_rule(db2a4e68,db2a4e60,1,d115d500,d62f3200) at pf_test_rule+0xc66
pf_test(1,d1447800,db2a4f88,0) at pf_test+0x941
ipv4_input(d62f3200,d03c5198,50,286,0) at ipv4_input+0x11d
ipintr(27,12c0027,7db80027,cfbd0027,82ec2) at ipintr+0x70
Bad frame pointer: 0xdb2a4fa0

ddb> show registers
ds 0xd0360010 shmget_existing+0x4c
es 0xdb2a0010 end+0xaa06f1c
fs 0x58
gs 0x10
edi 0x3
esi 0x2
ebp 0xdb2a4d60 end+0xaa0bc6c
ebx 0xd67191b8 end+0x5e800c4
edx 0x4
ecx 0xd07ef600 mbpool
eax 0
eip 0xd02f56db pf_send_icmp+0x2b
cs 0x50
eflags 0x10246
esp 0xdb2a4d38 end+0xaa0bc44
ss 0xdb2a0010 end+0xaa06f1c
pf_send_icmp+0x2b: orb $0x1,0x32(%eax)

ddb> show panic
the kernel did not panic

ddb> show all callout
ticks now: 229505
ticks wheel arg func
-24 4/1024 d07c9314 nfs_timer
-22 4/1024 d07a44ec pfslowtimo
-22 4/1024 d11a0a00 uhci_poll_hub
-22 4/1024 d11a0800 uhci_poll_hub
-22 4/1024 d11a0600 uhci_poll_hub
-22 4/1024 d11a0400 uhci_poll_hub
-20 4/1024 d118f000 fxp_stats_update
-19 4/1024 d6636414 endtsleep
-19 4/1024 d670c160 endtsleep
-18 4/1024 d07a44d4 pffasttimo
719985 4/1024 d6578e14 tcp_timer_keep
-15 4/1024 d6582000 syn_cache_reaper
5 4/1024 d6578e14 tcp_delack
286 4/1024 d6582228 syn_cache_timer
-12 4/1024 d1175000 em_local_timer
719988 4/1024 d65787d4 tcp_timer_keep
-12 4/1024 d65822e0 syn_cache_reaper
8 4/1024 d65787d4 tcp_delack
-11 4/1024 d670c818 endtsleep
-8 4/1024 d117a800 em_local_timer
-7 4/1024 d65784b4 tcp_delack
93 4/1024 d145e800 pfsync_timeout
-6 4/1024 d6578644 tcp_delack
21 0/150 d663600c endtsleep
21 0/150 d07e1f40 pckbc_poll
21 0/150 d07a4528 if_slowtimo
21 0/150 0 nd6_timer
21 0/150 d07a5f90 rt_timer_timer
21 0/150 d07a4294 schedcpu
25 0/154 d6a5d004 endtsleep
298 0/185 d144a800 carp_master_down
298 0/185 d1225600 carp_master_down
298 0/185 d144b400 carp_master_down
298 0/185 d144d200 carp_master_down
60 0/189 d1179800 em_local_timer
62 0/191 d1170800 em_local_timer
171 1/385 d670c2b8 endtsleep
211 1/385 d670c970 endtsleep
225 1/385 d11a1d40 sensor_task_tick
770 1/387 d6636164 endtsleep
882 1/387 d670c160 realitexpire
2218 1/393 d6a5dd74 realitexpire
4539 1/402 d6a636b8 endtsleep
4539 1/402 d6a63968 endtsleep
4539 1/402 d6a63ac0 endtsleep
4539 1/402 d6a63c18 endtsleep
4539 1/402 d6a63d70 endtsleep
10894 1/427 0 arc4_reinit
10994 1/427 d07a63c0 arptimer
29973 1/501 d64ccc2c realitexpire
130495 2/517 0 nd6_cache_lladdr
100158 2/517 d66362bc endtsleep
131324 2/517 d663600c realitexpire
719098 2/523 d665de10 tcp_timer_keep
714623 2/526 d665dc80 tcp_timer_keep
719973 2/526 d65784b4 tcp_timer_keep
719974 2/526 d6578644 tcp_timer_keep
ddb>

dmesg from core of previous crash:

# dmesg -N bsd.0 -M bsd.0.core
OpenBSD 4.3 (GENERIC) #2: Fri May 9 20:54:13 CEST 2008
root@...:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 2.66GHz ("GenuineIntel" 686-class) 2.67 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
real mem = 535515136 (510MB)
avail mem = 509771776 (486MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 02/07/05, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.3 @ 0xfbcb0 (72 entries)
bios0: vendor Intel Corp. version "WP87510A.86B.0059.P18.0502071117" date 02/07/2005
bios0: Intel Corporation S875WP1
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
acpi at bios0 function 0x0 not configured
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf3d40/224 (12 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82801EB/ER LPC" rev 0x00)
pcibios0: PCI bus #4 is the last bus
bios0: ROM list: 0xc0000/0x8000
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel 82875P Host" rev 0x02
agp0 at pchb0: aperture at 0xf8000000, size 0x4000000
ppb0 at pci0 dev 1 function 0 "Intel 82875P AGP" rev 0x02
pci1 at ppb0 bus 1
ppb1 at pci0 dev 3 function 0 "Intel 82875P CSA" rev 0x02
pci2 at ppb1 bus 2
em0 at pci2 dev 1 function 0 "Intel PRO/1000CT (82547EI)" rev 0x00: irq 10, address 00:0c:f1:8f:a6:1d
uhci0 at pci0 dev 29 function 0 "Intel 82801EB/ER USB" rev 0x02: irq 5
uhci1 at pci0 dev 29 function 1 "Intel 82801EB/ER USB" rev 0x02: irq 9
uhci2 at pci0 dev 29 function 2 "Intel 82801EB/ER USB" rev 0x02: irq 10
uhci3 at pci0 dev 29 function 3 "Intel 82801EB/ER USB" rev 0x02: irq 5
ehci0 at pci0 dev 29 function 7 "Intel 82801EB/ER USB2" rev 0x02: irq 9
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb2 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xc2
pci3 at ppb2 bus 3
ppb3 at pci3 dev 2 function 0 "Pericom PI7C21P100 PCIX-PCIX" rev 0x01
pci4 at ppb3 bus 4
em1 at pci4 dev 4 function 0 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 11, address 00:0e:0c:c3:48:04
em2 at pci4 dev 4 function 1 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 10, address 00:0e:0c:c3:48:05
em3 at pci4 dev 6 function 0 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 9, address 00:0e:0c:c3:48:06
em4 at pci4 dev 6 function 1 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 5, address 00:0e:0c:c3:48:07
vga1 at pci3 dev 6 function 0 "ATI Rage XL" rev 0x27
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
fxp0 at pci3 dev 8 function 0 "Intel PRO/100 VE" rev 0x01, i82562: irq 11, address 00:0c:f1:8f:a6:1f
inphy0 at fxp0 phy 1: i82562ET 10/100 PHY, rev. 0
ichpcib0 at pci0 dev 31 function 0 "Intel 82801EB/ER LPC" rev 0x02: 24-bit timer at 3579545Hz
pciide0 at pci0 dev 31 function 1 "Intel 82801EB/ER IDE" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
wd0 at pciide0 channel 0 drive 0: <HDS728080PLAT20>
wd0: 16-sector PIO, LBA48, 78533MB, 160836480 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <TEAC, CD-552E, 1.00> SCSI0 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
pciide1 at pci0 dev 31 function 2 "Intel 82801EB SATA" rev 0x02: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide1: using irq 10 for native-PCI interrupt
ichiic0 at pci0 dev 31 function 3 "Intel 82801EB/ER SMBus" rev 0x02: irq 11
iic0 at ichiic0
adt0 at iic0 addr 0x2e: lm85 rev 0x62
spdmem0 at iic0 addr 0x50: 512MB DDR SDRAM non-parity PC3200CL3.0
usb1 at uhci0: USB revision 1.0
uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb2 at uhci1: USB revision 1.0
uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb3 at uhci2: USB revision 1.0
uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb4 at uhci3: USB revision 1.0
uhub4 at usb4 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at ichpcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pmsi0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pmsi0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask ef65 netmask ef65 ttymask ffe7
mtrr: Pentium Pro MTRR support
softraid0 at root
root on wd0a swap on wd0b dump on wd0b
WARNING: / was not properly unmounted
uvm_fault(0xd07e99a0, 0x0, 0, 3) -> e
kernel: page fault trap, code=0
Stopped at pf_send_icmp+0x2b: orb $0x1,0x32(%eax)

Vmstat from the core of the previous crash:

# vmstat -N /usr/crash/bsd.0 -M /usr/crash/bsd.0.core -m
Memory statistics by bucket size
Size In Use Free Requests HighWater Couldfree
16 4340 13068 659510 1280 3337
32 333 179 49068 640 0
64 2283 1365 378567 320 10951
128 748 84 14842 160 0
256 221 147 26905 80 506
512 399 17 5112 40 0
1024 2319 25 8606 20 680
2048 45 5 734 10 0
4096 32 4 26834 5 0
8192 12 0 12 5 0
16384 2 0 2 5 0
32768 5 0 5 5 0
65536 1 0 1 5 0

Memory usage type by bucket size
Size Type(s)
16 devbuf, pcb, routetbl, ifaddr, sysctl, UFS mount, dirhash, in_multi, exec, xform_data, VM swap, UVM amap, UVM aobj, USB, USB device, packet tags, temp
32 devbuf, pcb, routetbl, ifaddr, UFS mount, sem, dirhash, proc, VFS cluster, in_multi, ether_multi, xform_data, VM swap, UVM amap, USB, temp, AGP Memory
64 devbuf, pcb, routetbl, ifaddr, vnodes, sem, dirhash, ip_moptions, in_multi, pfkey data, UVM amap, USB, NDP, temp
128 devbuf, routetbl, ifaddr, vnodes, dirhash, ttys, exec, UVM amap, USB, USB device, NDP, temp, AGP Memory
256 devbuf, routetbl, ifaddr, sysctl, ioctlops, vnodes, shm, VM map, dirhash, proc, NFS srvsock, NFS daemon, newblk, UVM amap, USB, USB device, temp
512 devbuf, pcb, ifaddr, ioctlops, mount, UFS mount, shm, dirhash, ttys, exec, UVM amap, USB device, temp
1024 devbuf, ioctlops, namecache, proc, ttys, exec, UVM amap, UVM aobj, crypto data, temp
2048 devbuf, ifaddr, ioctlops, UFS mount, pagedep, VM swap, UVM amap, temp
4096 devbuf, ioctlops, UFS mount, MSDOSFS mount, UVM amap, memdesc, temp
8192 devbuf, NFS node, namecache, UFS quota, UFS mount, ISOFS mount, inodedep
16384 devbuf, namecache
32768 devbuf, VM swap
65536 VM swap

Memory statistics by type Type Kern
Type InUse MemUse HighUse Limit Requests Limit Limit Size(s)
devbuf 3808 2545K 2545K 39322K 3880 0 0 16,32,64,128,256,512,1024,2048,4096,8192,16384,32768
pcb 29 4K 4K 39322K 169 0 0 16,32,64,512
routetbl 303 28K 44K 39322K 1642 0 0 16,32,64,128,256
ifaddr 143 25K 25K 39322K 145 0 0 16,32,64,128,256,512,2048
sysctl 2 1K 1K 39322K 2 0 0 16,256
ioctlops 0 0K 4K 39322K 2737 0 0 256,512,1024,2048,4096
mount 4 2K 2K 39322K 4 0 0 512
NFS node 1 8K 8K 39322K 1 0 0 8192
vnodes 56 8K 87K 39322K 1315 0 0 64,128,256
namecache 3 25K 25K 39322K 3 0 0 1024,8192,16384
UFS quota 1 8K 8K 39322K 1 0 0 8192
UFS mount 17 35K 35K 39322K 17 0 0 16,32,512,2048,4096,8192
shm 2 1K 1K 39322K 2 0 0 256,512
VM map 4 1K 1K 39322K 4 0 0 256
sem 2 1K 1K 39322K 2 0 0 32,64
dirhash 195 37K 41K 39322K 459 0 0 16,32,64,128,256,512
proc 15 3K 3K 39322K 15 0 0 32,256,1024
VFS cluster 0 0K 1K 39322K 380 0 0 32
NFS srvsock 1 1K 1K 39322K 1 0 0 256
NFS daemon 1 1K 1K 39322K 1 0 0 256
ip_moptions 5 1K 1K 39322K 5 0 0 64
in_multi 123 5K 5K 39322K 124 0 0 16,32,64
ether_multi 64 2K 3K 39322K 65 0 0 32
ISOFS mount 1 8K 8K 39322K 1 0 0 8192
MSDOSFS mount 1 4K 4K 39322K 1 0 0 4096
ttys 420 263K 263K 39322K 420 0 0 128,512,1024
exec 0 0K 2K 39322K 4679 0 0 16,128,512,1024
pfkey data 1 1K 1K 39322K 2 0 0 64
xform_data 0 0K 1K 39322K 45 0 0 16,32
pagedep 1 2K 2K 39322K 1 0 0 2048
inodedep 1 8K 8K 39322K 1 0 0 8192
newblk 1 1K 1K 39322K 1 0 0 256
VM swap 7 75K 75K 39322K 7 0 0 16,32,2048,32768,65536
UVM amap 5289 306K 529K 39322K 760441 0 0 16,32,64,128,256,512,1024,2048,4096
UVM aobj 2 2K 2K 39322K 2 0 0 16,1024
USB 74 7K 7K 39322K 74 0 0 16,32,64,128,256
USB device 21 9K 9K 39322K 21 0 0 16,128,256,512
memdesc 1 4K 4K 39322K 1 0 0 4096
crypto data 1 1K 1K 39322K 1 0 0 1024
packet tags 0 0K 1K 39322K 83 0 0 16
NDP 24 3K 3K 39322K 28 0 0 64,128
temp 114 14K 18K 39322K 393413 0 0 16,32,64,128,256,512,1024,2048,4096
AGP Memory 2 1K 1K 39322K 2 0 0 32,128

Memory Totals: In Use Free Requests
3435K 402K 1170198

Memory resource pool statistics
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
phpool 32 1556 0 0 13 0 13 13 0 8 0
extentpl 20 248 0 197 1 0 1 1 0 8 0
pmappl 84 5679 0 5648 2 0 2 2 0 8 1
vmsppl 188 5679 0 5648 5 0 5 5 0 8 3
vmmpepl 88 517935 0 515492 252 12 240 241 0 179 179
vmmpekpl 88 298937 0 298904 2 0 2 2 0 8 1
aobjpl 52 1 0 0 1 0 1 1 0 8 0
amappl 44 249175 0 247468 70 5 65 66 0 45 44
anonpl 16 391737 0 386051 46 0 46 46 0 31 21
bufpl 124 15106 0 11816 103 0 103 103 0 8 0
mbpl 256 2533151 16215 2530831 145 0 145 145 1 384 0
mclpl 2048 772776 105 770492 1142 0 1142 1142 4 3072 0
sockpl 212 148172 0 148133 4 0 4 4 0 8 1
procpl 344 5696 0 5648 10 0 10 10 0 8 5
processpl 20 5696 0 5648 1 0 1 1 0 8 0
zombiepl 72 5648 0 5648 1 0 1 1 0 8 1
ucredpl 80 170 0 153 1 0 1 1 0 8 0
pgrppl 24 994 0 965 1 0 1 1 0 8 0
sessionpl 48 84 0 58 1 0 1 1 0 8 0
pcredpl 24 5696 0 5648 1 0 1 1 0 8 0
lockfpl 52 20 0 18 1 0 1 1 0 8 0
filepl 88 186689 0 186584 5 0 5 5 0 8 2
fdescpl 296 5697 0 5648 8 0 8 8 0 8 4
pipepl 72 6438 0 6430 2 0 2 2 0 8 1
kqueuepl 192 4 0 1 1 0 1 1 0 8 0
knotepl 64 12 0 3 1 0 1 1 0 8 0
sigapl 316 5679 0 5648 8 0 8 8 0 8 5
wqtasks 20 1153 0 1153 1 0 1 1 0 8 1
wdcspl 96 28209 0 28208 1 0 1 1 0 8 0
scxspl 132 3 0 3 1 0 1 1 0 8 1
namei 1024 57410 0 57410 3 0 3 3 0 8 3
vnodes 148 3141 0 0 117 0 117 117 0 8 0
nchpl 72 6495 0 4913 29 0 29 29 0 8 0
ffsino 184 13477 0 10343 143 0 143 143 0 8 0
dino1pl 128 13477 0 10343 102 0 102 102 0 8 0
dirhash 1024 636 0 346 77 0 77 77 0 128 3
pfrulepl 824 442 0 10 111 0 111 111 0 8 2
pfstatepl 204 114953 0 112771 241 0 241 241 0 264 110
pfstatekeypl 108 114953 0 112797 131 46 85 123 0 8 8
pfpooladdrpl 68 27 0 0 1 0 1 1 0 8 0
pfrktable 1240 146 0 74 48 0 48 48 0 334 0
pfrkentry 156 78200 0 5914 3008 0 3008 3008 0 13462 0
pfosfpen 108 1392 0 696 30 11 19 19 0 8 0
pfosfp 28 814 0 407 3 0 3 3 0 8 0
rtentpl 116 88 0 15 3 0 3 3 0 8 0
tcpcbpl 400 933 0 920 4 0 4 4 0 8 2
tcpqepl 16 27 0 27 1 0 1 1 0 13 1
synpl 184 897 0 897 1 0 1 1 0 8 1
plimitpl 152 660 0 647 1 0 1 1 0 8 0
inpcbpl 216 148012 0 147993 3 0 3 3 0 8 1

In use 20068K, total allocated 23264K; utilization 86.3%

--

Stephan A. Rickauer

-----------------------------------------------------------
Institute of Neuroinformatics Tel +41 44 635 30 50
University / ETH Zurich Sec +41 44 635 30 52
Winterthurerstrasse 190 Fax +41 44 635 30 53
CH-8057 Zurich Web www.ini.uzh.ch
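In the pool statistics above, two pools (mbpl and mclpl) show a non-zero Fail column. A minimal sketch for picking such rows out of `vmstat -m` pool output follows, assuming the column order shown above (Name, Size, Requests, Fail, ...). The names `failing_pools` and `SAMPLE` are made up for illustration, and the sample rows are copied from the output above. Whether these failures are related to the crash is not established here.

```python
def failing_pools(text):
    """Return {pool_name: fail_count} for pools whose Fail column is non-zero.

    Assumes the layout seen in the vmstat -m pool table above:
    Name Size Requests Fail Releases ...
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # too short to be a pool row
        try:
            int(fields[1])          # Size
            int(fields[2])          # Requests
            fail = int(fields[3])   # Fail
        except ValueError:
            continue  # header line or unrelated text
        if fail > 0:
            result[fields[0]] = fail
    return result


# Rows copied from the pool statistics in this message.
SAMPLE = """\
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
bufpl 124 15106 0 11816 103 0 103 103 0 8 0
mbpl 256 2533151 16215 2530831 145 0 145 145 1 384 0
mclpl 2048 772776 105 770492 1142 0 1142 1142 4 3072 0
sockpl 212 148172 0 148133 4 0 4 4 0 8 1
"""

if __name__ == "__main__":
    for name, fails in failing_pools(SAMPLE).items():
        print(f"{name}: {fails} failed allocations")
```

The sketch only filters rows; it makes no claim about why an allocation failed.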
|
|
Re: CARP node crashing reproducibly (4.3-stable)

hi stephan!
can you also show your carp configuration?

reyk

On Fri, Jul 11, 2008 at 04:55:33PM +0200, Stephan A. Rickauer wrote:
> Hello,
>
> Here's all data I was able to get off our crashing machine, the backup
> node of our CARP cluster, that used to run flawlessly since 3.7.
> [...]
|
|
Re: CARP node crashing reproducibly (4.3-stable)
On Fri, 2008-07-11 at 17:09 +0200, Reyk Floeter wrote:
> hi stephan!

That was quick! Hi Reyk.

> can you also show your carp configuration?

Sure (just x'ed out the external IPs as well as passwords). We have a
simple master/backup system:

carp0: LAN
carp1: DMZ
carp2: WLAN
carp3: Internet

# cat /etc/host*.carp*
inet 172.16.3.254 255.255.254.0 172.16.3.255 vhid 1 advskew 50 pass xxx carpdev em0
inet 130.60.230.xxx 255.255.255.224 130.60.230.xxx vhid 2 advskew 50 pass xxx carpdev em1
inet 192.168.91.254 255.255.255.0 192.168.91.255 vhid 4 advskew 50 pass xxx carpdev fxp0
inet 130.60.x.xx 255.255.255.252 130.60.x.xxx vhid 3 advskew 50 pass xxx carpdev em2

('advskew 100' on backup node)

# sysctl net.inet.carp
net.inet.carp.allow=1
net.inet.carp.preempt=1
net.inet.carp.log=0

cat /etc/hostname.carp*

# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33208
        groups: lo
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0c:f1:8f:a9:c4
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 172.16.3.252 netmask 0xfffffe00 broadcast 172.16.3.255
        inet6 fe80::20c:f1ff:fe8f:a9c4%em0 prefixlen 64 scopeid 0x1
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0e:0c:c3:39:74
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 130.60.230.xxx netmask 0xffffffe0 broadcast 130.60.230.xxx
        inet6 fe80::20e:cff:fec3:3974%em1 prefixlen 64 scopeid 0x2
em2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0e:0c:c3:39:75
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet6 fe80::20e:cff:fec3:3975%em2 prefixlen 64 scopeid 0x3
em3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0e:0c:c3:39:76
        media: Ethernet autoselect (none)
        status: no carrier
em4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0e:0c:c3:39:77
        media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
        status: active
        inet 1.1.1.252 netmask 0xff000000 broadcast 1.255.255.255
        inet6 fe80::20e:cff:fec3:3977%em4 prefixlen 64 scopeid 0x5
fxp0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0c:f1:8f:a9:c5
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 192.168.91.252 netmask 0xffffff00 broadcast 192.168.91.255
        inet6 fe80::20c:f1ff:fe8f:a9c5%fxp0 prefixlen 64 scopeid 0x6
enc0: flags=0<> mtu 1536
pfsync0: flags=41<UP,RUNNING> mtu 1460
        pfsync: syncdev: em4 syncpeer: 1.1.1.253 maxupd: 128
        groups: carp pfsync
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33208
        groups: pflog
carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:01
        carp: MASTER carpdev em0 vhid 1 advbase 1 advskew 50
        groups: carp
        inet6 fe80::200:5eff:fe00:101%carp0 prefixlen 64 scopeid 0x9
        inet 172.16.3.254 netmask 0xfffffe00 broadcast 172.16.3.255
carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:02
        carp: MASTER carpdev em1 vhid 2 advbase 1 advskew 50
        groups: carp
        inet6 fe80::200:5eff:fe00:102%carp1 prefixlen 64 scopeid 0xa
        inet 130.60.230.xxx netmask 0xffffffe0 broadcast 130.60.230.xxx
carp2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:04
        carp: MASTER carpdev fxp0 vhid 4 advbase 1 advskew 50
        groups: carp
        inet6 fe80::200:5eff:fe00:104%carp2 prefixlen 64 scopeid 0xb
        inet 192.168.91.254 netmask 0xffffff00 broadcast 192.168.91.255
carp3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:03
        carp: MASTER carpdev em2 vhid 3 advbase 1 advskew 50
        groups: carp egress
        inet6 fe80::200:5eff:fe00:103%carp3 prefixlen 64 scopeid 0xc
        inet 130.60.x.xxx netmask 0xfffffffc broadcast 130.60.x.xxx

I think this is it ;)

-- 
Stephan A. Rickauer

-----------------------------------------------------------
Institute of Neuroinformatics         Tel  +41 44 635 30 50
University / ETH Zurich               Sec  +41 44 635 30 52
Winterthurerstrasse 190               Fax  +41 44 635 30 53
CH-8057 Zurich                        Web  www.ini.uzh.ch
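A side note on the 'advskew 50' versus 'advskew 100' values in the configuration: as far as I understand the carp options of ifconfig, a node advertises every advbase seconds plus advskew/256 seconds, and with preempt enabled the node that advertises more often is preferred as master. The following is only a minimal sketch of that arithmetic; the function names are made up for illustration and are not part of any OpenBSD API.

```c
#include <stdio.h>

/*
 * Advertisement interval in seconds, assuming the rule
 * "advbase seconds plus advskew/256 seconds".
 * carp_adv_interval() is a hypothetical helper, not a kernel function.
 */
double
carp_adv_interval(unsigned int advbase, unsigned int advskew)
{
	return (double)advbase + (double)advskew / 256.0;
}

/* Print the intervals for the two advskew values quoted in this thread. */
void
carp_print_intervals(void)
{
	/* advbase 1 is taken from the ifconfig output of the carp interfaces */
	printf("advskew  50: %.3f s\n", carp_adv_interval(1, 50));
	printf("advskew 100: %.3f s\n", carp_adv_interval(1, 100));
}
```

Under that assumption the node with advskew 50 advertises slightly more often than the one with advskew 100, which would match the intended master/backup roles.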
|
|
Re: CARP node crashing reproducibly (4.3-stable)
Stephan A. Rickauer wrote:
> On Fri, 2008-07-11 at 17:09 +0200, Reyk Floeter wrote:
>> can you also show your carp configuration?
>
> Sure (just x'ed out the external IPs as well as passwords). We have a
> simple master/backup system:
>
> [...]
>
> I think this it ;)

[...] the avahi-daemon. It just started to use my gateway's IP address,
so machines on the internal net could no longer reach the internet. That
was an OpenBSD 4.0 box, NOT using carp. I suggest you check whether the
avahi-daemon is running on the SUSE machine. If it is, shut it down and
test again. Also, try capturing some packets.

My regards,

-- 
Giancarlo Razzolini
http://lock.razzolini.adm.br
Linux User 172199
Red Hat Certified Engineer no:804006389722501
Verify:https://www.redhat.com/certification/rhce/current/
Moleque Sem Conteudo Numero #002
OpenBSD Stable
Ubuntu 8.04 Hardy Heron
4386 2A6F FFD4 4D5F 5842 6EA0 7ABE BBAB 9C0E 6B85
|
|
Re: CARP node crashing reproducibly (4.3-stable)
* Stephan A. Rickauer <stephan.rickauer@...> [2008-07-11 16:59]:
> Here's all data I was able to get off our crashing machine, the backup
> node of our CARP cluster, that used to run flawlessly since 3.7.
>
> We can reproduce the problem

if you follow http://www.benzedrine.cx/crashreport.html we have a
chance to actually fix the bug...

-- 
Henning Brauer, hb@..., henning@...
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
|
|
Re: CARP node crashing reproducibly (4.3-stable)
On Fri, 2008-07-11 at 21:32 +0200, Henning Brauer wrote:
> * Stephan A. Rickauer <stephan.rickauer@...> [2008-07-11 16:59]:
> > Here's all data I was able to get off our crashing machine, the backup
> > node of our CARP cluster, that used to run flawlessly since 3.7.
> >
> > We can reproduce the problem
>
> if you follow http://www.benzedrine.cx/crashreport.html we have a
> chance to actually fix the bug...

Nice page. I'll have a look on Monday. Thanks.

-- 
Stephan A. Rickauer

-----------------------------------------------------------
Institute of Neuroinformatics         Tel  +41 44 635 30 50
University / ETH Zurich               Sec  +41 44 635 30 52
Winterthurerstrasse 190               Fax  +41 44 635 30 53
CH-8057 Zurich                        Web  www.ini.uzh.ch
|
|
Re: CARP node crashing reproducibly (4.3-stable)
Henning Brauer wrote:
| * Stephan A. Rickauer <stephan.rickauer@...> [2008-07-11 16:59]:
|> Here's all data I was able to get off our crashing machine, the backup
|> node of our CARP cluster, that used to run flawlessly since 3.7.
|>
|> We can reproduce the problem
|
| if you follow http://www.benzedrine.cx/crashreport.html we have a
| chance to actually fix the bug...

Hello,

I'm a colleague of Stephan Rickauer and I've been taking a look at this
problem.

It's a NULL pointer bug!

dmesg shows

    kernel: page fault trap, code=0
    Stopped at pf_send_icmp+0x2b: orb $0x1,0x32(%eax)

and ddb trace shows:

    pf_send_icmp(d62f3200,3,3,2,d67191b8,d115d500,2,db2a4eb8) at pf_send_icmp+0x2b

ddb registers shows (among others):

    eax 0
    eip 0xd02f56db pf_send_icmp+0x2b

and helpfully disassembles the faulting instruction thus:

    pf_send_icmp+0x2b: orb $0x1,0x32(%eax)

which is from line 1726 in pf_send_icmp() in pf.c:

    m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;

The beginning of this function (up to the line with the or) is as
follows:

    pf_send_icmp(struct mbuf *m, u_int8_t type, u_int8_t code, sa_family_t af,
        struct pf_rule *r)
    {
            struct mbuf *m0;

            m0 = m_copy(m, 0, M_COPYALL);
            m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;

Thus we have m_copy (actually m_copym, since m_copy is a macro defined in
/usr/src/sys/sys/mbuf.h in terms of m_copym, which itself is a one-line
wrapper around m_copym0) returning a NULL pointer in eax (= m0) and the
subsequent OR getting a page fault when it tries to use it.

Looking at m_copym0, it looks like it can legitimately fail and return
NULL (it even increments a global variable MCFail when it does so) and
therefore the bug is that its return value is not being checked in
pf_send_icmp.

As far as I can see, the precise nature of the packet being handled at
the time of the crash is not important. Using ddb on the crashed machine,
it looks as if the packet being handled at the time is a (relatively)
innocent UDP broadcast as follows:

IP header: 45 0 0 1d 0 0 0 0 40 11 1b a2 ac 10 3 f ac 10 3 ff
    ip header length = 5 32-bit words
    length = 29
    id = 0
    flags = 0
    fragmentation offset = 0
    TTL = 64
    Protocol = 17, UDP
    Source address = 172.16.3.15 (zynapse.lan.ini.uzh.ch)
    Dest address = 172.16.3.255

UDP header: bb b5 22 3d 0 9 a5 ba
    source port = bbb5 = 48053
    dest port = 223d = 8765 (Ultraseek HTTP ?)
    length = 9

Data: 1d

Adrian

-- 
Adrian M. Whatley
Universitaet/ETH Zuerich, Institut fuer Neuroinformatik,
Winterthurerstrasse 190, CH-8057 Zuerich, Switzerland.
Phone: +41 44 635 3067  Fax: +41 44 635 3053
Email: amw@...  WWW: http://www.ini.uzh.ch/~amw/
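For anyone who wants to double-check the hand decoding of that packet, here is a small user-space sketch that parses the same 29 bytes as an IPv4 header followed by a UDP header and also verifies the IP header checksum. The struct and function names (udp4_info, decode_udp4, ip_hdr_cksum_ok) are invented for this example and have nothing to do with the kernel code discussed above; it only assumes the standard IPv4 and UDP header layouts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The bytes quoted above: 20 bytes IP header, 8 bytes UDP header, 1 data. */
const uint8_t crash_pkt[29] = {
	0x45, 0x00, 0x00, 0x1d, 0x00, 0x00, 0x00, 0x00,
	0x40, 0x11, 0x1b, 0xa2, 0xac, 0x10, 0x03, 0x0f,
	0xac, 0x10, 0x03, 0xff,
	0xbb, 0xb5, 0x22, 0x3d, 0x00, 0x09, 0xa5, 0xba,
	0x1d
};

/* Hypothetical result structure for this example only. */
struct udp4_info {
	unsigned int ihl_words;   /* IP header length in 32-bit words */
	unsigned int total_len;   /* IP total length in bytes */
	unsigned int ttl;
	unsigned int proto;
	uint8_t src[4];
	uint8_t dst[4];
	unsigned int sport;
	unsigned int dport;
	unsigned int udp_len;     /* UDP header plus payload, in bytes */
};

/* Read a big-endian (network order) 16-bit value. */
static unsigned int
be16(const uint8_t *p)
{
	return ((unsigned int)p[0] << 8) | p[1];
}

/* Returns 1 if the one's-complement sum over the IP header is 0xffff. */
int
ip_hdr_cksum_ok(const uint8_t *p, size_t hdr_len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < hdr_len; i += 2)
		sum += be16(p + i);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return sum == 0xffff;
}

/* Decode an IPv4 packet carrying UDP; returns 0 on success, -1 otherwise. */
int
decode_udp4(const uint8_t *p, size_t len, struct udp4_info *out)
{
	size_t hdr_len;

	if (len < 20 || (p[0] >> 4) != 4)
		return -1;
	hdr_len = (size_t)(p[0] & 0x0f) * 4;
	if (hdr_len < 20 || len < hdr_len + 8 || p[9] != 17)
		return -1;

	out->ihl_words = p[0] & 0x0f;
	out->total_len = be16(p + 2);
	out->ttl = p[8];
	out->proto = p[9];
	memcpy(out->src, p + 12, 4);
	memcpy(out->dst, p + 16, 4);
	out->sport = be16(p + hdr_len);
	out->dport = be16(p + hdr_len + 2);
	out->udp_len = be16(p + hdr_len + 4);
	return 0;
}
```

Calling decode_udp4(crash_pkt, sizeof(crash_pkt), &info) should reproduce the values listed in the post (total length 29, ports 48053 and 8765, UDP length 9), which supports the reading that this is an ordinary, well-formed broadcast and not a malformed packet.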
|
|
Re: CARP node crashing reproducibly (4.3-stable)
* Adrian M. Whatley <amw@...> [2008-07-14 13:54]:
> It's a NULL pointer bug!

> which is from line 1726 in pf_send_icmp() in pf.c:
>
> m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;

> Looking at m_copym0, it looks like it can legitimately fail and return
> NULL (it even increments a global variable MCFail when it does so) and
> therefore the bug is that its return value is not being checked in
> pf_send_icmp.

perfect analysis!

looks like the only sane thing to do in that case is to bail and not
send the icmp.

Index: pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.609
diff -u -p -r1.609 pf.c
--- pf.c	10 Jul 2008 07:41:21 -0000	1.609
+++ pf.c	14 Jul 2008 12:20:27 -0000
@@ -1819,7 +1819,9 @@ pf_send_icmp(struct mbuf *m, u_int8_t ty
 {
 	struct mbuf	*m0;
 
-	m0 = m_copy(m, 0, M_COPYALL);
+	if ((m0 = m_copy(m, 0, M_COPYALL)) == NULL)
+		return;
+
 	m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;
 
 	if (r->rtableid >= 0)

-- 
Henning Brauer, hb@..., henning@...
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
|
|
Re: CARP node crashing reproducibly (4.3-stable)
On Mon, 2008-07-14 at 14:22 +0200, Henning Brauer wrote:
> perfect analysis!
>
> looks like the only sane thing to do in that case is to bail and not
> send the icmp.

I've compiled a new kernel with the patch. The machine is no longer
crashing on pf_send_icmp(). However, I now see memory leaking until the
machine locks up (it doesn't crash but its network becomes unusable).
Unfortunately, it then also puts all CARP interfaces in MASTER state,
though the other node works perfectly as master already. This will, of
course, knock down our entire network until I manually put down the carp
interfaces.

I have increased kern.maxclusters to gain more time for debugging of the
memory leak. However, all I could find out so far is that lots of mbufs
are allocated while there is no significant traffic to be handled
(remember the machine is the CARP backup). The machine crashes within 15
minutes after reboot.

(Because of the line wrapping in this email, I've also put the output of
netstat and vmstat online:)

http://www.ini.uzh.ch/~stephan/vmstat+netstat.txt

# vmstat -m
Memory statistics by bucket size
    Size   In Use   Free   Requests  HighWater  Couldfree
      16     3549  10275     304244       1280       7725
      32      303    209      51063        640          0
      64     2968    360      93244        320         89
     128      511     65       5665        160          0
     256      189    131      12817         80       1065
     512      351      9       3326         40          0
    1024     2313     11       3302         20          0
    2048       33      1       1536         10          0
    4096       28      1       6834          5          0
    8192       12      0         12          5          0
   16384        6      0          6          5          0
   32768        5      0          5          5          0
   65536        1      0          1          5          0

Memory usage type by bucket size
    Size  Type(s)
      16  devbuf, pcb, routetbl, ifaddr, sysctl, UFS mount, dirhash, in_multi, exec, xform_data, VM swap, UVM amap, UVM aobj, USB, USB device, packet tags, temp
      32  devbuf, pcb, routetbl, ifaddr, UFS mount, sem, dirhash, proc, VFS cluster, in_multi, ether_multi, xform_data, VM swap, UVM amap, USB, temp, AGP Memory
      64  devbuf, pcb, routetbl, ifaddr, vnodes, sem, dirhash, ip_moptions, in_multi, pfkey data, UVM amap, USB, NDP, temp
     128  devbuf, routetbl, ifaddr, vnodes, ttys, exec, UVM amap, USB, USB device, NDP, temp, AGP Memory
     256  devbuf, routetbl, ifaddr, sysctl, ioctlops, vnodes, shm, VM map, proc, NFS srvsock, NFS daemon, newblk, UVM amap, USB, USB device, temp
     512  devbuf, pcb, ifaddr, ioctlops, mount, UFS mount, shm, dirhash, ttys, exec, UVM amap, USB device, temp
    1024  devbuf, ioctlops, namecache, proc, ttys, exec, UVM amap, UVM aobj, crypto data, temp
    2048  devbuf, ifaddr, ioctlops, UFS mount, pagedep, VM swap, UVM amap, temp
    4096  devbuf, ioctlops, UFS mount, MSDOSFS mount, memdesc, temp
    8192  devbuf, NFS node, namecache, UFS quota, UFS mount, ISOFS mount, inodedep
   16384  devbuf, namecache, UVM amap
   32768  devbuf, VM swap
   65536  VM swap

Memory statistics by type                              Type  Kern
Type           InUse  MemUse  HighUse   Limit  Requests Limit Limit  Size(s)
devbuf          3808   2545K    2545K  39322K      3880     0     0  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768
pcb               30      4K       4K  39322K        78     0     0  16,32,64,512
routetbl         280     27K      44K  39322K      1400     0     0  16,32,64,128,256
ifaddr           143     25K      25K  39322K       145     0     0  16,32,64,128,256,512,2048
sysctl             2      1K       1K  39322K         2     0     0  16,256
ioctlops           0      0K       4K  39322K      5457     0     0  256,512,1024,2048,4096
mount              4      2K       2K  39322K         4     0     0  512
NFS node           1      8K       8K  39322K         1     0     0  8192
vnodes          1256     83K      87K  39322K      1312     0     0  64,128,256
namecache          3     25K      25K  39322K         3     0     0  1024,8192,16384
UFS quota          1      8K       8K  39322K         1     0     0  8192
UFS mount         17     35K      35K  39322K        17     0     0  16,32,512,2048,4096,8192
shm                2      1K       1K  39322K         2     0     0  256,512
VM map             4      1K       1K  39322K         4     0     0  256
sem                2      1K       1K  39322K         2     0     0  32,64
dirhash           30      6K       6K  39322K        30     0     0  16,32,64,512
proc              15      3K       3K  39322K        15     0     0  32,256,1024
VFS cluster        0      0K       1K  39322K        26     0     0  32
NFS srvsock        1      1K       1K  39322K         1     0     0  256
NFS daemon         1      1K       1K  39322K         1     0     0  256
ip_moptions        5      1K       1K  39322K         5     0     0  64
in_multi         123      5K       5K  39322K       124     0     0  16,32,64
ether_multi       64      2K       3K  39322K        65     0     0  32
ISOFS mount        1      8K       8K  39322K         1     0     0  8192
MSDOSFS mount      1      4K       4K  39322K         1     0     0  4096
ttys             420    263K     263K  39322K       420     0     0  128,512,1024
exec               0      0K       2K  39322K      3090     0     0  16,128,512,1024
pfkey data         1      1K       1K  39322K         2     0     0  64
xform_data         0      0K       1K  39322K        18     0     0  16,32
pagedep            1      2K       2K  39322K         1     0     0  2048
inodedep           1      8K       8K  39322K         1     0     0  8192
newblk             1      1K       1K  39322K         1     0     0  256
VM swap            7     75K      75K  39322K         7     0     0  16,32,2048,32768,65536
UVM amap        3819    233K     349K  39322K    349090     0     0  16,32,64,128,256,512,1024,2048,16384
UVM aobj           2      2K       2K  39322K         2     0     0  16,1024
USB               74      7K       7K  39322K        74     0     0  16,32,64,128,256
USB device        21      9K       9K  39322K        21     0     0  16,128,256,512
memdesc            1      4K       4K  39322K         1     0     0  4096
crypto data        1      1K       1K  39322K         1     0     0  1024
packet tags        0      0K       1K  39322K         8     0     0  16
NDP               24      3K       3K  39322K        28     0     0  64,128
temp             112     14K      18K  39322K    116753     0     0  16,32,64,128,256,512,1024,2048,4096
AGP Memory         2      1K       1K  39322K         2     0     0  32,128

Memory Totals:  In Use    Free    Requests
                 3405K    252K      482097

Memory resource pool statistics
Name          Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
extentpl        20      248    0      197     1     0     1     1     0     8    0
phpool          32    36864    0        0   291     0   291   291     0     8    0
pmappl          84     2469    0     2443     2     0     2     2     0     8    1
vmsppl         188     2469    0     2443     4     0     4     4     0     8    2
vmmpepl         88   207056    0   205462   106     0   106   106     0   179   71
vmmpekpl        88    59313    0    59192     3     0     3     3     0     8    0
aobjpl          52        1    0        0     1     0     1     1     0     8    0
amappl          44   113985    0   112738    50     0    50    50     0    45   30
anonpl          16   161598    0   156491    34     0    34    34     0    31   11
bufpl          124     2817    0      123    85     0    85    85     0     8    0
mbpl           256   618808    0   553629  4075     0  4075  4075     1  4096    1
mclpl         2048   185575    0   120403 32591     0 32591 32591     4 32768    4
sockpl         212     1944    0     1906     3     0     3     3     0     8    0
procpl         344     2486    0     2443     9     0     9     9     0     8    4
processpl       20     2486    0     2443     1     0     1     1     0     8    0
zombiepl        72     2443    0     2443     1     0     1     1     0     8    1
ucredpl         80      848    0      836     1     0     1     1     0     8    0
pgrppl          24     1448    0     1424     1     0     1     1     0     8    0
sessionpl       48       33    0       10     1     0     1     1     0     8    0
pcredpl         24     2486    0     2443     1     0     1     1     0     8    0
lockfpl         52       12    0       10     1     0     1     1     0     8    0
filepl          88    15335    0    15245     4     0     4     4     0     8    2
fdescpl        296     2487    0     2443     8     0     8     8     0     8    4
pipepl          72     1246    0     1242     2     0     2     2     0     8    1
kqueuepl       192        3    0        0     1     0     1     1     0     8    0
knotepl         64        9    0        0     1     0     1     1     0     8    0
sigapl         316     2469    0     2443     7     0     7     7     0     8    4
wqtasks         20      227    0      227     1     0     1     1     0     8    1
wdcspl          96     3416    0     3416     1     0     1     1     0     8    1
scxspl         132        3    0        3     1     0     1     1     0     8    1
namei         1024    26059    0    26059     2     0     2     2     0     8    2
vnodes         148     1582    0        0    59     0    59    59     0     8    0
nchpl           72     1636    0       57    29     0    29    29     0     8    0
ffsino         184     1664    0       91    72     0    72    72     0     8    0
dino1pl        128     1664    0       91    51     0    51    51     0     8    0
dirhash       1024       37    0        0    10     0    10    10     0   128    0
pfrulepl       824      442    0       10   111     0   111   111     0     8    2
pfstatepl      204    23449    0    21322   173     0   173   173     0   264   30
pfstatekeypl   108    23449    0    21374    86     5    81    86     0     8    8
pfpooladdrpl    68       27    0        0     1     0     1     1     0     8    0
pfrktable     1240      144    0       72    48     0    48    48     0   334    0
pfrkentry      156     1089    0        0    42     0    42    42     0 13462    0
pfosfpen       108     1392    0      696    30    11    19    19     0     8    0
pfosfp          28      814    0      407     3     0     3     3     0     8    0
rtentpl        116       74    0        4     2     0     2     2     0     8    0
tcpcbpl        400      190    0      179     3     0     3     3     0     8    1
tcpqepl         16        6    0        6     1     0     1     1     0    13    1
synpl          184      185    0      185     1     0     1     1     0     8    1
plimitpl       152      133    0      122     1     0     1     1     0     8    0
inpcbpl        216     1875    0     1858     2     0     2     2     0     8    1

In use 150665K, total allocated 154056K; utilization 97.8%

# netstat -m
65183 mbufs in use:
        65178 mbufs allocated to data
        1 mbuf allocated to packet headers
        4 mbufs allocated to socket names and addresses
65178/65186/65536 mbuf clusters in use (current/peak/max)
146676 Kbytes allocated to network (14% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
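To make the leak visible in the pool statistics above, one can subtract Releases from Requests for the mbuf pools and compare Npage with Maxpg. The sketch below does exactly that for the mbpl and mclpl rows of this post. It assumes that Requests and Releases count allocations and frees (which is what the column names suggest); the struct and function names are hypothetical and only serve this example.

```c
#include <stdio.h>

/* One row of the "Memory resource pool statistics" table (subset of columns). */
struct pool_row {
	const char *name;
	unsigned int size;        /* item size in bytes */
	unsigned long requests;   /* assumed: items handed out so far */
	unsigned long releases;   /* assumed: items given back so far */
	unsigned long npage;      /* pages currently held by the pool */
	unsigned long maxpg;      /* page limit of the pool */
};

/* Values copied from the vmstat -m output in this post. */
const struct pool_row mbpl  = { "mbpl",   256, 618808, 553629,  4075,  4096 };
const struct pool_row mclpl = { "mclpl", 2048, 185575, 120403, 32591, 32768 };

/* Items requested but not yet released. */
unsigned long
pool_outstanding(const struct pool_row *r)
{
	return r->requests - r->releases;
}

/* How close the pool is to its page limit, in percent. */
double
pool_page_fill(const struct pool_row *r)
{
	return 100.0 * (double)r->npage / (double)r->maxpg;
}

/* Print a one-line summary for a pool. */
void
pool_report(const struct pool_row *r)
{
	printf("%-6s outstanding %lu items, %.1f%% of page limit\n",
	    r->name, pool_outstanding(r), pool_page_fill(r));
}
```

With these numbers both pools have roughly 65000 items outstanding and sit above 99% of their page limit, which is in line with the 65178/65186/65536 clusters reported by netstat -m on a node that should be handling almost no traffic.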
|
|
Re: CARP node crashing reproducibly (4.3-stable)
* Stephan A. Rickauer <stephan.rickauer@...> [2008-07-14 17:27]:
> On Mon, 2008-07-14 at 14:22 +0200, Henning Brauer wrote:
> > perfect analysis!
> >
> > looks like the only sane thing to do in that case is to bail and not
> > send the icmp.
>
> I've compiled a new kernel with the patch. The machine is no longer
> crashing on pf_send_icmp(). However, I now see memory leaking until the
> machine locks up (it doesn't crash but its network becomes unusable).
> Unfortunately, it then also puts all CARP interfaces in MASTER state,
> though the other node works perfectly as master already. This will, of
> course, knock down our entire network until I manually put down the carp
> interfaces.
>
> I have increased kern.maxclusters to gain more time for debugging of the
> memory leak. However, all I could find out so far is that lots of mbufs
> are allocated while there is no significant traffic to be handled
> (remember the machine is the CARP backup). The machine crashes within 15
> minutes after reboot.

ok that is weird. icmp_error as called in pf_send_icmp does not m_free
anything but the passed mbuf, and we now just bail if the allocation
of it fails. so i have a hard time seeing this as related...

might be something completely different. and finding mbuf leaks tends
to be damn hard and means following a lot of code...

-- 
Henning Brauer, hb@..., henning@...
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam
|
|
Re: CARP node crashing reproducibly (4.3-stable)
On Mon, 2008-07-14 at 17:38 +0200, Henning Brauer wrote:
> > I have increased kern.maxclusters to gain more time for debugging of the
> > memory leak. However, all I could find out so far is that lots of mbufs
> > are allocated while there is no significant traffic to be handled
> > (remember the machine is the CARP backup). The machine crashes within 15
> > minutes after reboot.
>
> ok that is weird. icmp_error as called in pf_send_icmp does not m_free
> anything but the passed mbuf, and we now just bail if the allocation
> of it fails. so i have a hard time seeing this as related...

Yes, you are right. The leak we saw came from a bad kernel build, which
we must have produced from an unclean source tree. Problem solved.

However, the patch you've committed as 1.610 of pf.c does fix the
crashes we've seen before.

Thanks a lot!

-- 
Stephan A. Rickauer

-----------------------------------------------------------
Institute of Neuroinformatics         Tel  +41 44 635 30 50
University / ETH Zurich               Sec  +41 44 635 30 52
Winterthurerstrasse 190               Fax  +41 44 635 30 53
CH-8057 Zurich                        Web  www.ini.uzh.ch