CARP node crashing reproducibly (4.3-stable)

by Stephan A. Rickauer

Hello,

Here's all the data I was able to get off our crashing machine, the backup
node of our CARP cluster, which has run flawlessly since 3.7.

We can reproduce the problem by (no joke) installing an openSUSE 10.3
machine in one of our labs over the network. After 40 minutes, our
backup firewall crashes. Sounds preposterous, I know... We've not yet had
time to examine exactly what packets this machine sends out on the
network.

The crashed machine is still in ddb, so just ask if I should execute
some more commands.

Should I rather file a bug report? I never know when I should just ask
here or rather file one, sorry... but thanks for your help anyway!

ddb> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
 15843   7703   7703    556  2       0x100                nrpe
*27717      1   7703    556  7       0x100                nrpe
 26537  24561  27862      0  3      0x4082  ttyin         more
 24561  27862  27862      0  3      0x4082  pause         sh
 27862   3290  27862      0  3      0x4082  wait          man
 10574   1244  10574      0  3      0x4082  ttyin         ksh
  1244  21740   1244      0  3      0x4180  select        sshd
 19759  15807  19759      0  3      0x4082  ttyin         ksh
 15807  21740  15807      0  3      0x4180  select        sshd
  3290      1   3290      0  3      0x4082  pause         ksh
  2463      1   2463      0  3      0x4082  ttyin         getty
  4032      1   4032      0  3      0x4082  ttyin         getty
 29698      1  29698      0  3      0x4082  ttyin         getty
 25598      1  25598      0  3      0x4082  ttyin         getty
  2451      1   2451      0  3      0x4082  ttyin         getty
 26554      1  26554      0  3        0x80  poll          ntpd
  7819      1   7819      0  3        0x80  select        cron
 21981      1  21981      0  3        0x80  kqread        apmd
  7703      1   7703    556  3       0x180  wait          nrpe
 21908      1   7188      0  3        0x80  select        snmpd
 20436      1  20436     83  3       0x180  poll          ntpd
 10622      1  10622      0  3     0x40180  select        sendmail
 31903      1  31903     62  3       0x180  select        spamd
 21740      1  21740      0  3        0x80  select        sshd
 14513      1  14513     71  3       0x180  kqread        ftp-proxy
 22495      1  22495     77  3       0x180  poll          dhcrelay
  9478      1   9478      0  2        0x80                ifstated
 17920  19868  19868     74  2       0x180                pflogd
 19868      1  19868      0  3        0x80  netio         pflogd
   656  20418  20418     73  2       0x180                syslogd
 20418      1  20418      0  3        0x88  netio         syslogd
    18      0      0      0  3    0x100200  bored         crypto
    17      0      0      0  3    0x100200  aiodoned      aiodoned
    16      0      0      0  3    0x100200  syncer        update
    15      0      0      0  3    0x100200  cleaner       cleaner
    14      0      0      0  3    0x100200  reaper        reaper
    13      0      0      0  3    0x100200  pgdaemon      pagedaemon
    12      0      0      0  3    0x100200  pftm          pfpurge
    11      0      0      0  3    0x100200  usbevt        usb4
    10      0      0      0  3    0x100200  usbevt        usb3
     9      0      0      0  3    0x100200  usbevt        usb2
     8      0      0      0  3    0x100200  usbevt        usb1
     7      0      0      0  3    0x100200  usbtsk        usbtask
     6      0      0      0  3    0x100200  usbevt        usb0
     5      0      0      0  3    0x100200  apmev         apm0
     4      0      0      0  3    0x100200  bored         syswq
     3      0      0      0  3    0x100200                idle0
     2      0      0      0  2    0x100200                kmthread
     1      0      1      0  3      0x4080  wait          init
     0     -1      0      0  3     0x80200  scheduler     swapper

ddb> trace
pf_send_icmp(d62f3200,3,3,2,d67191b8,d115d500,2,db2a4eb8) at pf_send_icmp+0x2b
pf_test_rule(db2a4e68,db2a4e60,1,d115d500,d62f3200) at pf_test_rule+0xc66
pf_test(1,d1447800,db2a4f88,0) at pf_test+0x941
ipv4_input(d62f3200,d03c5198,50,286,0) at ipv4_input+0x11d
ipintr(27,12c0027,7db80027,cfbd0027,82ec2) at ipintr+0x70
Bad frame pointer: 0xdb2a4fa0

ddb> show registers
ds            0xd0360010        shmget_existing+0x4c
es            0xdb2a0010        end+0xaa06f1c
fs                  0x58
gs                  0x10
edi                  0x3
esi                  0x2
ebp           0xdb2a4d60        end+0xaa0bc6c
ebx           0xd67191b8        end+0x5e800c4
edx                  0x4
ecx           0xd07ef600        mbpool
eax                    0
eip           0xd02f56db        pf_send_icmp+0x2b
cs                  0x50
eflags           0x10246
esp           0xdb2a4d38        end+0xaa0bc44
ss            0xdb2a0010        end+0xaa06f1c
pf_send_icmp+0x2b:      orb     $0x1,0x32(%eax)
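
A note on reading the output above: the faulting instruction writes to the byte at offset 0x32 from %eax, and the register dump shows eax = 0. A minimal sketch of that arithmetic follows, assuming the usual 4096-byte i386 page size; the variable names are illustrative only and are not taken from the kernel source.

```python
# Values copied from the "show registers" output and the faulting
# instruction:  orb $0x1,0x32(%eax)
eax = 0x0
displacement = 0x32
PAGE_SIZE = 4096  # assumption: standard i386 page size

# Effective address the instruction tried to write to (32-bit wraparound).
fault_addr = (eax + displacement) & 0xFFFFFFFF
# Start of the page containing that address.
fault_page = fault_addr & ~(PAGE_SIZE - 1)

print(f"faulting address: {fault_addr:#x}")
print(f"faulting page:    {fault_page:#x}")
```

An effective address of 0x32 lies in page 0, which would be consistent with a write through a NULL pointer to a field at offset 0x32. Which pointer that is cannot be told from this output alone.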

ddb> show panic
the kernel did not panic

ddb> show all callout
ticks now: 229505
    ticks  wheel       arg  func
      -24  4/1024 d07c9314  nfs_timer
      -22  4/1024 d07a44ec  pfslowtimo
      -22  4/1024 d11a0a00  uhci_poll_hub
      -22  4/1024 d11a0800  uhci_poll_hub
      -22  4/1024 d11a0600  uhci_poll_hub
      -22  4/1024 d11a0400  uhci_poll_hub
      -20  4/1024 d118f000  fxp_stats_update
      -19  4/1024 d6636414  endtsleep
      -19  4/1024 d670c160  endtsleep
      -18  4/1024 d07a44d4  pffasttimo
   719985  4/1024 d6578e14  tcp_timer_keep
      -15  4/1024 d6582000  syn_cache_reaper
        5  4/1024 d6578e14  tcp_delack
      286  4/1024 d6582228  syn_cache_timer
      -12  4/1024 d1175000  em_local_timer
   719988  4/1024 d65787d4  tcp_timer_keep
      -12  4/1024 d65822e0  syn_cache_reaper
        8  4/1024 d65787d4  tcp_delack
      -11  4/1024 d670c818  endtsleep
       -8  4/1024 d117a800  em_local_timer
       -7  4/1024 d65784b4  tcp_delack
       93  4/1024 d145e800  pfsync_timeout
       -6  4/1024 d6578644  tcp_delack
       21  0/150  d663600c  endtsleep
       21  0/150  d07e1f40  pckbc_poll
       21  0/150  d07a4528  if_slowtimo
       21  0/150         0  nd6_timer
       21  0/150  d07a5f90  rt_timer_timer
       21  0/150  d07a4294  schedcpu
       25  0/154  d6a5d004  endtsleep
      298  0/185  d144a800  carp_master_down
      298  0/185  d1225600  carp_master_down
      298  0/185  d144b400  carp_master_down
      298  0/185  d144d200  carp_master_down
       60  0/189  d1179800  em_local_timer
       62  0/191  d1170800  em_local_timer
      171  1/385  d670c2b8  endtsleep
      211  1/385  d670c970  endtsleep
      225  1/385  d11a1d40  sensor_task_tick
      770  1/387  d6636164  endtsleep
      882  1/387  d670c160  realitexpire
     2218  1/393  d6a5dd74  realitexpire
     4539  1/402  d6a636b8  endtsleep
     4539  1/402  d6a63968  endtsleep
     4539  1/402  d6a63ac0  endtsleep
     4539  1/402  d6a63c18  endtsleep
     4539  1/402  d6a63d70  endtsleep
    10894  1/427         0  arc4_reinit
    10994  1/427  d07a63c0  arptimer
    29973  1/501  d64ccc2c  realitexpire
   130495  2/517         0  nd6_cache_lladdr
   100158  2/517  d66362bc  endtsleep
   131324  2/517  d663600c  realitexpire
   719098  2/523  d665de10  tcp_timer_keep
   714623  2/526  d665dc80  tcp_timer_keep
   719973  2/526  d65784b4  tcp_timer_keep
   719974  2/526  d6578644  tcp_timer_keep
ddb>

dmesg from core of previous crash:
                             
# dmesg -N bsd.0 -M bsd.0.core    
OpenBSD 4.3 (GENERIC) #2: Fri May  9 20:54:13 CEST 2008
    root@...:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 2.66GHz ("GenuineIntel" 686-class) 2.67 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
real mem  = 535515136 (510MB)
avail mem = 509771776 (486MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 02/07/05, BIOS32 rev. 0 @ 0xf0010, SMBIOS rev. 2.3 @ 0xfbcb0 (72 entries)
bios0: vendor Intel Corp. version "WP87510A.86B.0059.P18.0502071117" date 02/07/2005
bios0: Intel Corporation S875WP1
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
acpi at bios0 function 0x0 not configured
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf3d40/224 (12 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82801EB/ER LPC" rev 0x00)
pcibios0: PCI bus #4 is the last bus
bios0: ROM list: 0xc0000/0x8000
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel 82875P Host" rev 0x02
agp0 at pchb0: aperture at 0xf8000000, size 0x4000000
ppb0 at pci0 dev 1 function 0 "Intel 82875P AGP" rev 0x02
pci1 at ppb0 bus 1
ppb1 at pci0 dev 3 function 0 "Intel 82875P CSA" rev 0x02
pci2 at ppb1 bus 2
em0 at pci2 dev 1 function 0 "Intel PRO/1000CT (82547EI)" rev 0x00: irq 10, address 00:0c:f1:8f:a6:1d
uhci0 at pci0 dev 29 function 0 "Intel 82801EB/ER USB" rev 0x02: irq 5
uhci1 at pci0 dev 29 function 1 "Intel 82801EB/ER USB" rev 0x02: irq 9
uhci2 at pci0 dev 29 function 2 "Intel 82801EB/ER USB" rev 0x02: irq 10
uhci3 at pci0 dev 29 function 3 "Intel 82801EB/ER USB" rev 0x02: irq 5
ehci0 at pci0 dev 29 function 7 "Intel 82801EB/ER USB2" rev 0x02: irq 9
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb2 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xc2
pci3 at ppb2 bus 3
ppb3 at pci3 dev 2 function 0 "Pericom PI7C21P100 PCIX-PCIX" rev 0x01
pci4 at ppb3 bus 4
em1 at pci4 dev 4 function 0 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 11, address 00:0e:0c:c3:48:04
em2 at pci4 dev 4 function 1 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 10, address 00:0e:0c:c3:48:05
em3 at pci4 dev 6 function 0 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 9, address 00:0e:0c:c3:48:06
em4 at pci4 dev 6 function 1 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 5, address 00:0e:0c:c3:48:07
vga1 at pci3 dev 6 function 0 "ATI Rage XL" rev 0x27
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
fxp0 at pci3 dev 8 function 0 "Intel PRO/100 VE" rev 0x01, i82562: irq 11, address 00:0c:f1:8f:a6:1f
inphy0 at fxp0 phy 1: i82562ET 10/100 PHY, rev. 0
ichpcib0 at pci0 dev 31 function 0 "Intel 82801EB/ER LPC" rev 0x02: 24-bit timer at 3579545Hz
pciide0 at pci0 dev 31 function 1 "Intel 82801EB/ER IDE" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
wd0 at pciide0 channel 0 drive 0: <HDS728080PLAT20>
wd0: 16-sector PIO, LBA48, 78533MB, 160836480 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <TEAC, CD-552E, 1.00> SCSI0 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
pciide1 at pci0 dev 31 function 2 "Intel 82801EB SATA" rev 0x02: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide1: using irq 10 for native-PCI interrupt
ichiic0 at pci0 dev 31 function 3 "Intel 82801EB/ER SMBus" rev 0x02: irq 11
iic0 at ichiic0
adt0 at iic0 addr 0x2e: lm85 rev 0x62
spdmem0 at iic0 addr 0x50: 512MB DDR SDRAM non-parity PC3200CL3.0
usb1 at uhci0: USB revision 1.0
uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb2 at uhci1: USB revision 1.0
uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb3 at uhci2: USB revision 1.0
uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb4 at uhci3: USB revision 1.0
uhub4 at usb4 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at ichpcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pmsi0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pmsi0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask ef65 netmask ef65 ttymask ffe7
mtrr: Pentium Pro MTRR support
softraid0 at root
root on wd0a swap on wd0b dump on wd0b
WARNING: / was not properly unmounted
uvm_fault(0xd07e99a0, 0x0, 0, 3) -> e
kernel: page fault trap, code=0
Stopped at      pf_send_icmp+0x2b:      orb     $0x1,0x32(%eax)

Vmstat from the core of the previous crash:

# vmstat -N /usr/crash/bsd.0 -M /usr/crash/bsd.0.core -m
Memory statistics by bucket size
    Size   In Use   Free           Requests  HighWater  Couldfree
      16     4340  13068             659510    1280       3337
      32      333    179              49068     640          0
      64     2283   1365             378567     320      10951
     128      748     84              14842     160          0
     256      221    147              26905      80        506
     512      399     17               5112      40          0
    1024     2319     25               8606      20        680
    2048       45      5                734      10          0
    4096       32      4              26834       5          0
    8192       12      0                 12       5          0
   16384        2      0                  2       5          0
   32768        5      0                  5       5          0
   65536        1      0                  1       5          0

Memory usage type by bucket size
    Size  Type(s)
      16  devbuf, pcb, routetbl, ifaddr, sysctl, UFS mount, dirhash, in_multi,
          exec, xform_data, VM swap, UVM amap, UVM aobj, USB, USB device,
          packet tags, temp
      32  devbuf, pcb, routetbl, ifaddr, UFS mount, sem, dirhash, proc,
          VFS cluster, in_multi, ether_multi, xform_data, VM swap, UVM amap,
          USB, temp, AGP Memory
      64  devbuf, pcb, routetbl, ifaddr, vnodes, sem, dirhash, ip_moptions,
          in_multi, pfkey data, UVM amap, USB, NDP, temp
     128  devbuf, routetbl, ifaddr, vnodes, dirhash, ttys, exec, UVM amap, USB,
          USB device, NDP, temp, AGP Memory
     256  devbuf, routetbl, ifaddr, sysctl, ioctlops, vnodes, shm, VM map,
          dirhash, proc, NFS srvsock, NFS daemon, newblk, UVM amap, USB,
          USB device, temp
     512  devbuf, pcb, ifaddr, ioctlops, mount, UFS mount, shm, dirhash, ttys,
          exec, UVM amap, USB device, temp
    1024  devbuf, ioctlops, namecache, proc, ttys, exec, UVM amap, UVM aobj,
          crypto data, temp
    2048  devbuf, ifaddr, ioctlops, UFS mount, pagedep, VM swap, UVM amap, temp
    4096  devbuf, ioctlops, UFS mount, MSDOSFS mount, UVM amap, memdesc, temp
    8192  devbuf, NFS node, namecache, UFS quota, UFS mount, ISOFS mount,
          inodedep
   16384  devbuf, namecache
   32768  devbuf, VM swap
   65536  VM swap

Memory statistics by type                           Type  Kern
          Type InUse MemUse HighUse  Limit Requests Limit Limit Size(s)
        devbuf  3808  2545K   2545K 39322K     3880    0     0  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768
           pcb    29     4K      4K 39322K      169    0     0  16,32,64,512
      routetbl   303    28K     44K 39322K     1642    0     0  16,32,64,128,256
        ifaddr   143    25K     25K 39322K      145    0     0  16,32,64,128,256,512,2048
        sysctl     2     1K      1K 39322K        2    0     0  16,256
      ioctlops     0     0K      4K 39322K     2737    0     0  256,512,1024,2048,4096
         mount     4     2K      2K 39322K        4    0     0  512
      NFS node     1     8K      8K 39322K        1    0     0  8192
        vnodes    56     8K     87K 39322K     1315    0     0  64,128,256
     namecache     3    25K     25K 39322K        3    0     0  1024,8192,16384
     UFS quota     1     8K      8K 39322K        1    0     0  8192
     UFS mount    17    35K     35K 39322K       17    0     0  16,32,512,2048,4096,8192
           shm     2     1K      1K 39322K        2    0     0  256,512
        VM map     4     1K      1K 39322K        4    0     0  256
           sem     2     1K      1K 39322K        2    0     0  32,64
       dirhash   195    37K     41K 39322K      459    0     0  16,32,64,128,256,512
          proc    15     3K      3K 39322K       15    0     0  32,256,1024
   VFS cluster     0     0K      1K 39322K      380    0     0  32
   NFS srvsock     1     1K      1K 39322K        1    0     0  256
    NFS daemon     1     1K      1K 39322K        1    0     0  256
   ip_moptions     5     1K      1K 39322K        5    0     0  64
      in_multi   123     5K      5K 39322K      124    0     0  16,32,64
   ether_multi    64     2K      3K 39322K       65    0     0  32
   ISOFS mount     1     8K      8K 39322K        1    0     0  8192
 MSDOSFS mount     1     4K      4K 39322K        1    0     0  4096
          ttys   420   263K    263K 39322K      420    0     0  128,512,1024
          exec     0     0K      2K 39322K     4679    0     0  16,128,512,1024
    pfkey data     1     1K      1K 39322K        2    0     0  64
    xform_data     0     0K      1K 39322K       45    0     0  16,32
       pagedep     1     2K      2K 39322K        1    0     0  2048
      inodedep     1     8K      8K 39322K        1    0     0  8192
        newblk     1     1K      1K 39322K        1    0     0  256
       VM swap     7    75K     75K 39322K        7    0     0  16,32,2048,32768,65536
      UVM amap  5289   306K    529K 39322K   760441    0     0  16,32,64,128,256,512,1024,2048,4096
      UVM aobj     2     2K      2K 39322K        2    0     0  16,1024
           USB    74     7K      7K 39322K       74    0     0  16,32,64,128,256
    USB device    21     9K      9K 39322K       21    0     0  16,128,256,512
       memdesc     1     4K      4K 39322K        1    0     0  4096
   crypto data     1     1K      1K 39322K        1    0     0  1024
   packet tags     0     0K      1K 39322K       83    0     0  16
           NDP    24     3K      3K 39322K       28    0     0  64,128
          temp   114    14K     18K 39322K   393413    0     0  16,32,64,128,256,512,1024,2048,4096
    AGP Memory     2     1K      1K 39322K        2    0     0  32,128

Memory Totals:  In Use    Free    Requests
                 3435K    402K     1170198
Memory resource pool statistics
Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
phpool        32     1556    0        0    13     0    13    13     0     8    0
extentpl      20      248    0      197     1     0     1     1     0     8    0
pmappl        84     5679    0     5648     2     0     2     2     0     8    1
vmsppl       188     5679    0     5648     5     0     5     5     0     8    3
vmmpepl       88   517935    0   515492   252    12   240   241     0   179  179
vmmpekpl      88   298937    0   298904     2     0     2     2     0     8    1
aobjpl        52        1    0        0     1     0     1     1     0     8    0
amappl        44   249175    0   247468    70     5    65    66     0    45   44
anonpl        16   391737    0   386051    46     0    46    46     0    31   21
bufpl        124    15106    0    11816   103     0   103   103     0     8    0
mbpl         256  2533151 16215 2530831   145     0   145   145     1   384    0
mclpl       2048   772776  105   770492  1142     0  1142  1142     4  3072    0
sockpl       212   148172    0   148133     4     0     4     4     0     8    1
procpl       344     5696    0     5648    10     0    10    10     0     8    5
processpl     20     5696    0     5648     1     0     1     1     0     8    0
zombiepl      72     5648    0     5648     1     0     1     1     0     8    1
ucredpl       80      170    0      153     1     0     1     1     0     8    0
pgrppl        24      994    0      965     1     0     1     1     0     8    0
sessionpl     48       84    0       58     1     0     1     1     0     8    0
pcredpl       24     5696    0     5648     1     0     1     1     0     8    0
lockfpl       52       20    0       18     1     0     1     1     0     8    0
filepl        88   186689    0   186584     5     0     5     5     0     8    2
fdescpl      296     5697    0     5648     8     0     8     8     0     8    4
pipepl        72     6438    0     6430     2     0     2     2     0     8    1
kqueuepl     192        4    0        1     1     0     1     1     0     8    0
knotepl       64       12    0        3     1     0     1     1     0     8    0
sigapl       316     5679    0     5648     8     0     8     8     0     8    5
wqtasks       20     1153    0     1153     1     0     1     1     0     8    1
wdcspl        96    28209    0    28208     1     0     1     1     0     8    0
scxspl       132        3    0        3     1     0     1     1     0     8    1
namei       1024    57410    0    57410     3     0     3     3     0     8    3
vnodes       148     3141    0        0   117     0   117   117     0     8    0
nchpl         72     6495    0     4913    29     0    29    29     0     8    0
ffsino       184    13477    0    10343   143     0   143   143     0     8    0
dino1pl      128    13477    0    10343   102     0   102   102     0     8    0
dirhash     1024      636    0      346    77     0    77    77     0   128    3
pfrulepl     824      442    0       10   111     0   111   111     0     8    2
pfstatepl    204   114953    0   112771   241     0   241   241     0   264  110
pfstatekeypl 108   114953    0   112797   131    46    85   123     0     8    8
pfpooladdrpl  68       27    0        0     1     0     1     1     0     8    0
pfrktable   1240      146    0       74    48     0    48    48     0   334    0
pfrkentry    156    78200    0     5914  3008     0  3008  3008     0 13462    0
pfosfpen     108     1392    0      696    30    11    19    19     0     8    0
pfosfp        28      814    0      407     3     0     3     3     0     8    0
rtentpl      116       88    0       15     3     0     3     3     0     8    0
tcpcbpl      400      933    0      920     4     0     4     4     0     8    2
tcpqepl       16       27    0       27     1     0     1     1     0    13    1
synpl        184      897    0      897     1     0     1     1     0     8    1
plimitpl     152      660    0      647     1     0     1     1     0     8    0
inpcbpl      216   148012    0   147993     3     0     3     3     0     8    1

In use 20068K, total allocated 23264K; utilization 86.3%
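
One column worth scanning in the pool statistics above is Fail. A rough sketch of how to pull out the pools with a non-zero failure count follows. The four sample rows are copied from the output above, with each wrapped row joined back onto one line; the function name failing_pools is made up for illustration.

```python
# Sample rows from the "Memory resource pool statistics" table.
# Columns: Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
SAMPLE = """\
mbpl         256  2533151 16215 2530831   145     0   145   145     1   384    0
mclpl       2048   772776  105   770492  1142     0  1142  1142     4  3072    0
sockpl       212   148172    0   148133     4     0     4     4     0     8    1
pfstatepl    204   114953    0   112771   241     0   241   241     0   264  110
"""


def failing_pools(text):
    """Return {pool name: (fail count, request count)} for pools with Fail > 0."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 12:
            continue  # skip headers, blank lines, or still-wrapped rows
        name = fields[0]
        requests = int(fields[2])
        fail = int(fields[3])
        if fail > 0:
            result[name] = (fail, requests)
    return result


for name, (fail, requests) in failing_pools(SAMPLE).items():
    print(f"{name}: {fail} failed of {requests} requests")
```

In this dump only mbpl and mclpl (the mbuf and mbuf cluster pools) report failed allocations. Whether that is related to the fault in pf_send_icmp cannot be concluded from these numbers alone.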

--

 Stephan A. Rickauer

 -----------------------------------------------------------
 Institute of Neuroinformatics         Tel  +41 44 635 30 50
 University / ETH Zurich               Sec  +41 44 635 30 52
 Winterthurerstrasse 190               Fax  +41 44 635 30 53
 CH-8057 Zurich                        Web    www.ini.uzh.ch


Re: CARP node crashing reproducibly (4.3-stable)

by Reyk Floeter-2

hi stephan!

can you also show your carp configuration?

reyk

On Fri, Jul 11, 2008 at 04:55:33PM +0200, Stephan A. Rickauer wrote:

> Hello,
>
> Here's all the data I was able to get off our crashing machine, the backup
> node of our CARP cluster, which has run flawlessly since 3.7.
>
> We can reproduce the problem by (no joke) installing an openSUSE 10.3
> machine in one of our labs over the network. After 40 minutes, our
> backup firewall crashes. Sounds preposterous, I know... We've not yet had
> time to examine exactly what packets this machine sends out on the
> network.
>
> The crashed machine is still in ddb, so just ask if I should execute
> some more commands.
>
> Should I rather file a bug report? I never know when I should just ask
> here or rather file one, sorry... but thanks for your help anyway!
>
> ddb> ps
>    PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
>  15843   7703   7703    556  2       0x100                nrpe
> *27717      1   7703    556  7       0x100                nrpe
>  26537  24561  27862      0  3      0x4082  ttyin         more
>  24561  27862  27862      0  3      0x4082  pause         sh
>  27862   3290  27862      0  3      0x4082  wait          man
>  10574   1244  10574      0  3      0x4082  ttyin         ksh
>   1244  21740   1244      0  3      0x4180  select        sshd
>  19759  15807  19759      0  3      0x4082  ttyin         ksh
>  15807  21740  15807      0  3      0x4180  select        sshd
>   3290      1   3290      0  3      0x4082  pause         ksh
>   2463      1   2463      0  3      0x4082  ttyin         getty
>   4032      1   4032      0  3      0x4082  ttyin         getty
>  29698      1  29698      0  3      0x4082  ttyin         getty
>  25598      1  25598      0  3      0x4082  ttyin         getty
>   2451      1   2451      0  3      0x4082  ttyin         getty
>  26554      1  26554      0  3        0x80  poll          ntpd
>   7819      1   7819      0  3        0x80  select        cron
>  21981      1  21981      0  3        0x80  kqread        apmd
>   7703      1   7703    556  3       0x180  wait          nrpe
>  21908      1   7188      0  3        0x80  select        snmpd
>  20436      1  20436     83  3       0x180  poll          ntpd
>  10622      1  10622      0  3     0x40180  select        sendmail
>  31903      1  31903     62  3       0x180  select        spamd
>  21740      1  21740      0  3        0x80  select        sshd
>  14513      1  14513     71  3       0x180  kqread        ftp-proxy
>  22495      1  22495     77  3       0x180  poll          dhcrelay
>   9478      1   9478      0  2        0x80                ifstated
>  17920  19868  19868     74  2       0x180                pflogd
>  19868      1  19868      0  3        0x80  netio         pflogd
>    656  20418  20418     73  2       0x180                syslogd
>  20418      1  20418      0  3        0x88  netio         syslogd
>     18      0      0      0  3    0x100200  bored         crypto
>     17      0      0      0  3    0x100200  aiodoned      aiodoned
>     16      0      0      0  3    0x100200  syncer        update
>     15      0      0      0  3    0x100200  cleaner       cleaner
>     14      0      0      0  3    0x100200  reaper        reaper
>     13      0      0      0  3    0x100200  pgdaemon      pagedaemon
>     12      0      0      0  3    0x100200  pftm          pfpurge
>     11      0      0      0  3    0x100200  usbevt        usb4
>     10      0      0      0  3    0x100200  usbevt        usb3
>      9      0      0      0  3    0x100200  usbevt        usb2
>      8      0      0      0  3    0x100200  usbevt        usb1
>      7      0      0      0  3    0x100200  usbtsk        usbtask
>      6      0      0      0  3    0x100200  usbevt        usb0
>      5      0      0      0  3    0x100200  apmev         apm0
>      4      0      0      0  3    0x100200  bored         syswq
>      3      0      0      0  3    0x100200                idle0
>      2      0      0      0  2    0x100200                kmthread
>      1      0      1      0  3      0x4080  wait          init
>      0     -1      0      0  3     0x80200  scheduler     swapper
>
> ddb> trace
> pf_send_icmp(d62f3200,3,3,2,d67191b8,d115d500,2,db2a4eb8) at pf_send_icmp+0x2b
> pf_test_rule(db2a4e68,db2a4e60,1,d115d500,d62f3200) at pf_test_rule+0xc66
> pf_test(1,d1447800,db2a4f88,0) at pf_test+0x941
> ipv4_input(d62f3200,d03c5198,50,286,0) at ipv4_input+0x11d
> ipintr(27,12c0027,7db80027,cfbd0027,82ec2) at ipintr+0x70
> Bad frame pointer: 0xdb2a4fa0
>
> ddb> show registers
> ds            0xd0360010        shmget_existing+0x4c
> es            0xdb2a0010        end+0xaa06f1c
> fs                  0x58
> gs                  0x10
> edi                  0x3
> esi                  0x2
> ebp           0xdb2a4d60        end+0xaa0bc6c
> ebx           0xd67191b8        end+0x5e800c4
> edx                  0x4
> ecx           0xd07ef600        mbpool
> eax                    0
> eip           0xd02f56db        pf_send_icmp+0x2b
> cs                  0x50
> eflags           0x10246
> esp           0xdb2a4d38        end+0xaa0bc44
> ss            0xdb2a0010        end+0xaa06f1c
> pf_send_icmp+0x2b:      orb     $0x1,0x32(%eax)
>
> ddb> show panic
> the kernel did not panic
>
> ddb> show all callout
> ticks now: 229505
>     ticks  wheel       arg  func
>       -24  4/1024 d07c9314  nfs_timer
>       -22  4/1024 d07a44ec  pfslowtimo
>       -22  4/1024 d11a0a00  uhci_poll_hub
>       -22  4/1024 d11a0800  uhci_poll_hub
>       -22  4/1024 d11a0600  uhci_poll_hub
>       -22  4/1024 d11a0400  uhci_poll_hub
>       -20  4/1024 d118f000  fxp_stats_update
>       -19  4/1024 d6636414  endtsleep
>       -19  4/1024 d670c160  endtsleep
>       -18  4/1024 d07a44d4  pffasttimo
>    719985  4/1024 d6578e14  tcp_timer_keep
>       -15  4/1024 d6582000  syn_cache_reaper
>         5  4/1024 d6578e14  tcp_delack
>       286  4/1024 d6582228  syn_cache_timer
>       -12  4/1024 d1175000  em_local_timer
>    719988  4/1024 d65787d4  tcp_timer_keep
>       -12  4/1024 d65822e0  syn_cache_reaper
>         8  4/1024 d65787d4  tcp_delack
>       -11  4/1024 d670c818  endtsleep
>        -8  4/1024 d117a800  em_local_timer
>        -7  4/1024 d65784b4  tcp_delack
>        93  4/1024 d145e800  pfsync_timeout
>        -6  4/1024 d6578644  tcp_delack
>        21  0/150  d663600c  endtsleep
>        21  0/150  d07e1f40  pckbc_poll
>        21  0/150  d07a4528  if_slowtimo
>        21  0/150         0  nd6_timer
>        21  0/150  d07a5f90  rt_timer_timer
>        21  0/150  d07a4294  schedcpu
>        25  0/154  d6a5d004  endtsleep
>       298  0/185  d144a800  carp_master_down
>       298  0/185  d1225600  carp_master_down
>       298  0/185  d144b400  carp_master_down
>       298  0/185  d144d200  carp_master_down
>        60  0/189  d1179800  em_local_timer
>        62  0/191  d1170800  em_local_timer
>       171  1/385  d670c2b8  endtsleep
>       211  1/385  d670c970  endtsleep
>       225  1/385  d11a1d40  sensor_task_tick
>       770  1/387  d6636164  endtsleep
>       882  1/387  d670c160  realitexpire
>      2218  1/393  d6a5dd74  realitexpire
>      4539  1/402  d6a636b8  endtsleep
>      4539  1/402  d6a63968  endtsleep
>      4539  1/402  d6a63ac0  endtsleep
>      4539  1/402  d6a63c18  endtsleep
>      4539  1/402  d6a63d70  endtsleep
>     10894  1/427         0  arc4_reinit
>     10994  1/427  d07a63c0  arptimer
>     29973  1/501  d64ccc2c  realitexpire
>    130495  2/517         0  nd6_cache_lladdr
>    100158  2/517  d66362bc  endtsleep
>    131324  2/517  d663600c  realitexpire
>    719098  2/523  d665de10  tcp_timer_keep
>    714623  2/526  d665dc80  tcp_timer_keep
>    719973  2/526  d65784b4  tcp_timer_keep
>    719974  2/526  d6578644  tcp_timer_keep
> ddb>
>
> dmesg from core of previous crash:
>                              
> # dmesg -N bsd.0 -M bsd.0.core    
> OpenBSD 4.3 (GENERIC) #2: Fri May  9 20:54:13 CEST 2008
>     root@...:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: Intel(R) Pentium(R) 4 CPU 2.66GHz ("GenuineIntel" 686-class) 2.67
> GHz
> cpu0:
> FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID,xTPR
> real mem  = 535515136 (510MB)
> avail mem = 509771776 (486MB)
> mainbus0 at root
> bios0 at mainbus0: AT/286+ BIOS, date 02/07/05, BIOS32 rev. 0 @ 0xf0010,
> SMBIOS rev. 2.3 @ 0xfbcb0 (72 entries)
> bios0: vendor Intel Corp. version "WP87510A.86B.0059.P18.0502071117"
> date 02/07/2005
> bios0: Intel Corporation S875WP1
> apm0 at bios0: Power Management spec V1.2
> apm0: AC on, battery charge unknown
> acpi at bios0 function 0x0 not configured
> pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
> pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf3d40/224 (12 entries)
> pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82801EB/ER LPC" rev 0x00)
> pcibios0: PCI bus #4 is the last bus
> bios0: ROM list: 0xc0000/0x8000
> cpu0 at mainbus0
> pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
> pchb0 at pci0 dev 0 function 0 "Intel 82875P Host" rev 0x02
> agp0 at pchb0: aperture at 0xf8000000, size 0x4000000
> ppb0 at pci0 dev 1 function 0 "Intel 82875P AGP" rev 0x02
> pci1 at ppb0 bus 1
> ppb1 at pci0 dev 3 function 0 "Intel 82875P CSA" rev 0x02
> pci2 at ppb1 bus 2
> em0 at pci2 dev 1 function 0 "Intel PRO/1000CT (82547EI)" rev 0x00: irq 10, address 00:0c:f1:8f:a6:1d
> uhci0 at pci0 dev 29 function 0 "Intel 82801EB/ER USB" rev 0x02: irq 5
> uhci1 at pci0 dev 29 function 1 "Intel 82801EB/ER USB" rev 0x02: irq 9
> uhci2 at pci0 dev 29 function 2 "Intel 82801EB/ER USB" rev 0x02: irq 10
> uhci3 at pci0 dev 29 function 3 "Intel 82801EB/ER USB" rev 0x02: irq 5
> ehci0 at pci0 dev 29 function 7 "Intel 82801EB/ER USB2" rev 0x02: irq 9
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
> ppb2 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xc2
> pci3 at ppb2 bus 3
> ppb3 at pci3 dev 2 function 0 "Pericom PI7C21P100 PCIX-PCIX" rev 0x01
> pci4 at ppb3 bus 4
> em1 at pci4 dev 4 function 0 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 11, address 00:0e:0c:c3:48:04
> em2 at pci4 dev 4 function 1 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 10, address 00:0e:0c:c3:48:05
> em3 at pci4 dev 6 function 0 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 9, address 00:0e:0c:c3:48:06
> em4 at pci4 dev 6 function 1 "Intel PRO/1000MT QP (82546GB)" rev 0x03: irq 5, address 00:0e:0c:c3:48:07
> vga1 at pci3 dev 6 function 0 "ATI Rage XL" rev 0x27
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> fxp0 at pci3 dev 8 function 0 "Intel PRO/100 VE" rev 0x01, i82562: irq 11, address 00:0c:f1:8f:a6:1f
> inphy0 at fxp0 phy 1: i82562ET 10/100 PHY, rev. 0
> ichpcib0 at pci0 dev 31 function 0 "Intel 82801EB/ER LPC" rev 0x02: 24-bit timer at 3579545Hz
> pciide0 at pci0 dev 31 function 1 "Intel 82801EB/ER IDE" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
> wd0 at pciide0 channel 0 drive 0: <HDS728080PLAT20>
> wd0: 16-sector PIO, LBA48, 78533MB, 160836480 sectors
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
> atapiscsi0 at pciide0 channel 1 drive 0
> scsibus0 at atapiscsi0: 2 targets
> cd0 at scsibus0 targ 0 lun 0: <TEAC, CD-552E, 1.00> SCSI0 5/cdrom removable
> cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> pciide1 at pci0 dev 31 function 2 "Intel 82801EB SATA" rev 0x02: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
> pciide1: using irq 10 for native-PCI interrupt
> ichiic0 at pci0 dev 31 function 3 "Intel 82801EB/ER SMBus" rev 0x02: irq 11
> iic0 at ichiic0
> adt0 at iic0 addr 0x2e: lm85 rev 0x62
> spdmem0 at iic0 addr 0x50: 512MB DDR SDRAM non-parity PC3200CL3.0
> usb1 at uhci0: USB revision 1.0
> uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb2 at uhci1: USB revision 1.0
> uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb3 at uhci2: USB revision 1.0
> uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> usb4 at uhci3: USB revision 1.0
> uhub4 at usb4 "Intel UHCI root hub" rev 1.00/1.00 addr 1
> isa0 at ichpcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pmsi0 at pckbc0 (aux slot)
> pckbc0: using irq 12 for aux slot
> wsmouse0 at pmsi0 mux 0
> pcppi0 at isa0 port 0x61
> midi0 at pcppi0: <PC speaker>
> spkr0 at pcppi0
> lpt0 at isa0 port 0x378/4 irq 7
> npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
> pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
> biomask ef65 netmask ef65 ttymask ffe7
> mtrr: Pentium Pro MTRR support
> softraid0 at root
> root on wd0a swap on wd0b dump on wd0b
> WARNING: / was not properly unmounted
> uvm_fault(0xd07e99a0, 0x0, 0, 3) -> e
> kernel: page fault trap, code=0
> Stopped at      pf_send_icmp+0x2b:      orb     $0x1,0x32(%eax)
>
> Vmstat from the core of the previous crash:
>
> # vmstat -N /usr/crash/bsd.0 -M /usr/crash/bsd.0.core -m
> Memory statistics by bucket size
>     Size   In Use   Free           Requests  HighWater  Couldfree
>       16     4340  13068             659510    1280       3337
>       32      333    179              49068     640          0
>       64     2283   1365             378567     320      10951
>      128      748     84              14842     160          0
>      256      221    147              26905      80        506
>      512      399     17               5112      40          0
>     1024     2319     25               8606      20        680
>     2048       45      5                734      10          0
>     4096       32      4              26834       5          0
>     8192       12      0                 12       5          0
>    16384        2      0                  2       5          0
>    32768        5      0                  5       5          0
>    65536        1      0                  1       5          0
>
> Memory usage type by bucket size
>     Size  Type(s)
>       16  devbuf, pcb, routetbl, ifaddr, sysctl, UFS mount, dirhash, in_multi,
>           exec, xform_data, VM swap, UVM amap, UVM aobj, USB, USB device,
>           packet tags, temp
>       32  devbuf, pcb, routetbl, ifaddr, UFS mount, sem, dirhash, proc,
>           VFS cluster, in_multi, ether_multi, xform_data, VM swap, UVM amap,
>           USB, temp, AGP Memory
>       64  devbuf, pcb, routetbl, ifaddr, vnodes, sem, dirhash, ip_moptions,
>           in_multi, pfkey data, UVM amap, USB, NDP, temp
>      128  devbuf, routetbl, ifaddr, vnodes, dirhash, ttys, exec, UVM amap, USB,
>           USB device, NDP, temp, AGP Memory
>      256  devbuf, routetbl, ifaddr, sysctl, ioctlops, vnodes, shm, VM map,
>           dirhash, proc, NFS srvsock, NFS daemon, newblk, UVM amap, USB,
>           USB device, temp
>      512  devbuf, pcb, ifaddr, ioctlops, mount, UFS mount, shm, dirhash, ttys,
>           exec, UVM amap, USB device, temp
>     1024  devbuf, ioctlops, namecache, proc, ttys, exec, UVM amap, UVM aobj,
>           crypto data, temp
>     2048  devbuf, ifaddr, ioctlops, UFS mount, pagedep, VM swap, UVM amap, temp
>     4096  devbuf, ioctlops, UFS mount, MSDOSFS mount, UVM amap, memdesc, temp
>     8192  devbuf, NFS node, namecache, UFS quota, UFS mount, ISOFS mount,
>           inodedep
>    16384  devbuf, namecache
>    32768  devbuf, VM swap
>    65536  VM swap
>
> Memory statistics by type                           Type  Kern
>           Type InUse MemUse HighUse  Limit Requests Limit Limit Size(s)
>         devbuf  3808  2545K   2545K 39322K     3880    0     0  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768
>            pcb    29     4K      4K 39322K      169    0     0  16,32,64,512
>       routetbl   303    28K     44K 39322K     1642    0     0  16,32,64,128,256
>         ifaddr   143    25K     25K 39322K      145    0     0  16,32,64,128,256,512,2048
>         sysctl     2     1K      1K 39322K        2    0     0  16,256
>       ioctlops     0     0K      4K 39322K     2737    0     0  256,512,1024,2048,4096
>          mount     4     2K      2K 39322K        4    0     0  512
>       NFS node     1     8K      8K 39322K        1    0     0  8192
>         vnodes    56     8K     87K 39322K     1315    0     0  64,128,256
>      namecache     3    25K     25K 39322K        3    0     0  1024,8192,16384
>      UFS quota     1     8K      8K 39322K        1    0     0  8192
>      UFS mount    17    35K     35K 39322K       17    0     0  16,32,512,2048,4096,8192
>            shm     2     1K      1K 39322K        2    0     0  256,512
>         VM map     4     1K      1K 39322K        4    0     0  256
>            sem     2     1K      1K 39322K        2    0     0  32,64
>        dirhash   195    37K     41K 39322K      459    0     0  16,32,64,128,256,512
>           proc    15     3K      3K 39322K       15    0     0  32,256,1024
>    VFS cluster     0     0K      1K 39322K      380    0     0  32
>    NFS srvsock     1     1K      1K 39322K        1    0     0  256
>     NFS daemon     1     1K      1K 39322K        1    0     0  256
>    ip_moptions     5     1K      1K 39322K        5    0     0  64
>       in_multi   123     5K      5K 39322K      124    0     0  16,32,64
>    ether_multi    64     2K      3K 39322K       65    0     0  32
>    ISOFS mount     1     8K      8K 39322K        1    0     0  8192
>  MSDOSFS mount     1     4K      4K 39322K        1    0     0  4096
>           ttys   420   263K    263K 39322K      420    0     0  128,512,1024
>           exec     0     0K      2K 39322K     4679    0     0  16,128,512,1024
>     pfkey data     1     1K      1K 39322K        2    0     0  64
>     xform_data     0     0K      1K 39322K       45    0     0  16,32
>        pagedep     1     2K      2K 39322K        1    0     0  2048
>       inodedep     1     8K      8K 39322K        1    0     0  8192
>         newblk     1     1K      1K 39322K        1    0     0  256
>        VM swap     7    75K     75K 39322K        7    0     0  16,32,2048,32768,65536
>       UVM amap  5289   306K    529K 39322K   760441    0     0  16,32,64,128,256,512,1024,2048,4096
>       UVM aobj     2     2K      2K 39322K        2    0     0  16,1024
>            USB    74     7K      7K 39322K       74    0     0  16,32,64,128,256
>     USB device    21     9K      9K 39322K       21    0     0  16,128,256,512
>        memdesc     1     4K      4K 39322K        1    0     0  4096
>    crypto data     1     1K      1K 39322K        1    0     0  1024
>    packet tags     0     0K      1K 39322K       83    0     0  16
>            NDP    24     3K      3K 39322K       28    0     0  64,128
>           temp   114    14K     18K 39322K   393413    0     0  16,32,64,128,256,512,1024,2048,4096
>     AGP Memory     2     1K      1K 39322K        2    0     0  32,128
>
> Memory Totals:  In Use    Free    Requests
>                  3435K    402K     1170198
> Memory resource pool statistics
> Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> phpool        32     1556    0        0    13     0    13    13     0     8    0
> extentpl      20      248    0      197     1     0     1     1     0     8    0
> pmappl        84     5679    0     5648     2     0     2     2     0     8    1
> vmsppl       188     5679    0     5648     5     0     5     5     0     8    3
> vmmpepl       88   517935    0   515492   252    12   240   241     0   179  179
> vmmpekpl      88   298937    0   298904     2     0     2     2     0     8    1
> aobjpl        52        1    0        0     1     0     1     1     0     8    0
> amappl        44   249175    0   247468    70     5    65    66     0    45   44
> anonpl        16   391737    0   386051    46     0    46    46     0    31   21
> bufpl        124    15106    0    11816   103     0   103   103     0     8    0
> mbpl         256  2533151 16215 2530831   145     0   145   145     1   384    0
> mclpl       2048   772776  105   770492  1142     0  1142  1142     4  3072    0
> sockpl       212   148172    0   148133     4     0     4     4     0     8    1
> procpl       344     5696    0     5648    10     0    10    10     0     8    5
> processpl     20     5696    0     5648     1     0     1     1     0     8    0
> zombiepl      72     5648    0     5648     1     0     1     1     0     8    1
> ucredpl       80      170    0      153     1     0     1     1     0     8    0
> pgrppl        24      994    0      965     1     0     1     1     0     8    0
> sessionpl     48       84    0       58     1     0     1     1     0     8    0
> pcredpl       24     5696    0     5648     1     0     1     1     0     8    0
> lockfpl       52       20    0       18     1     0     1     1     0     8    0
> filepl        88   186689    0   186584     5     0     5     5     0     8    2
> fdescpl      296     5697    0     5648     8     0     8     8     0     8    4
> pipepl        72     6438    0     6430     2     0     2     2     0     8    1
> kqueuepl     192        4    0        1     1     0     1     1     0     8    0
> knotepl       64       12    0        3     1     0     1     1     0     8    0
> sigapl       316     5679    0     5648     8     0     8     8     0     8    5
> wqtasks       20     1153    0     1153     1     0     1     1     0     8    1
> wdcspl        96    28209    0    28208     1     0     1     1     0     8    0
> scxspl       132        3    0        3     1     0     1     1     0     8    1
> namei       1024    57410    0    57410     3     0     3     3     0     8    3
> vnodes       148     3141    0        0   117     0   117   117     0     8    0
> nchpl         72     6495    0     4913    29     0    29    29     0     8    0
> ffsino       184    13477    0    10343   143     0   143   143     0     8    0
> dino1pl      128    13477    0    10343   102     0   102   102     0     8    0
> dirhash     1024      636    0      346    77     0    77    77     0   128    3
> pfrulepl     824      442    0       10   111     0   111   111     0     8    2
> pfstatepl    204   114953    0   112771   241     0   241   241     0   264  110
> pfstatekeypl 108   114953    0   112797   131    46    85   123     0     8    8
> pfpooladdrpl  68       27    0        0     1     0     1     1     0     8    0
> pfrktable   1240      146    0       74    48     0    48    48     0   334    0
> pfrkentry    156    78200    0     5914  3008     0  3008  3008     0 13462    0
> pfosfpen     108     1392    0      696    30    11    19    19     0     8    0
> pfosfp        28      814    0      407     3     0     3     3     0     8    0
> rtentpl      116       88    0       15     3     0     3     3     0     8    0
> tcpcbpl      400      933    0      920     4     0     4     4     0     8    2
> tcpqepl       16       27    0       27     1     0     1     1     0    13    1
> synpl        184      897    0      897     1     0     1     1     0     8    1
> plimitpl     152      660    0      647     1     0     1     1     0     8    0
> inpcbpl      216   148012    0   147993     3     0     3     3     0     8    1
>
> In use 20068K, total allocated 23264K; utilization 86.3%
>
> --
>
>  Stephan A. Rickauer
>
>  -----------------------------------------------------------
>  Institute of Neuroinformatics         Tel  +41 44 635 30 50
>  University / ETH Zurich               Sec  +41 44 635 30 52
>  Winterthurerstrasse 190               Fax  +41 44 635 30 53
>  CH-8057 Zurich                        Web    www.ini.uzh.ch


Re: CARP node crashing reproducibly (4.3-stable)

by Stephan A. Rickauer

On Fri, 2008-07-11 at 17:09 +0200, Reyk Floeter wrote:
> hi stephan!

That was quick! Hi Reyk.

> can you also show your carp configuration?

Sure (just x'ed out the external IPs as well as passwords). We have a
simple master/backup system:

carp0: LAN
carp1: DMZ
carp2: WLAN
carp3: Internet

# cat /etc/host*.carp*
inet 172.16.3.254 255.255.254.0 172.16.3.255 vhid 1 advskew 50 pass xxx carpdev em0
inet 130.60.230.xxx 255.255.255.224 130.60.230.xxx vhid 2 advskew 50 pass xxx carpdev em1
inet 192.168.91.254 255.255.255.0 192.168.91.255 vhid 4 advskew 50 pass xxx carpdev fxp0
inet 130.60.x.xx 255.255.255.252 130.60.x.xxx vhid 3 advskew 50 pass xxx carpdev em2

('advskew 100' on backup node)

# sysctl net.inet.carp  
net.inet.carp.allow=1
net.inet.carp.preempt=1
net.inet.carp.log=0

cat /etc/hostname.carp*  

# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33208
        groups: lo
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0c:f1:8f:a9:c4
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 172.16.3.252 netmask 0xfffffe00 broadcast 172.16.3.255
        inet6 fe80::20c:f1ff:fe8f:a9c4%em0 prefixlen 64 scopeid 0x1
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0e:0c:c3:39:74
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 130.60.230.xxx netmask 0xffffffe0 broadcast 130.60.230.xxx
        inet6 fe80::20e:cff:fec3:3974%em1 prefixlen 64 scopeid 0x2
em2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0e:0c:c3:39:75
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet6 fe80::20e:cff:fec3:3975%em2 prefixlen 64 scopeid 0x3
em3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0e:0c:c3:39:76
        media: Ethernet autoselect (none)
        status: no carrier
em4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0e:0c:c3:39:77
        media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
        status: active
        inet 1.1.1.252 netmask 0xff000000 broadcast 1.255.255.255
        inet6 fe80::20e:cff:fec3:3977%em4 prefixlen 64 scopeid 0x5
fxp0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0c:f1:8f:a9:c5
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 192.168.91.252 netmask 0xffffff00 broadcast 192.168.91.255
        inet6 fe80::20c:f1ff:fe8f:a9c5%fxp0 prefixlen 64 scopeid 0x6
enc0: flags=0<> mtu 1536
pfsync0: flags=41<UP,RUNNING> mtu 1460
        pfsync: syncdev: em4 syncpeer: 1.1.1.253 maxupd: 128
        groups: carp pfsync
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33208
        groups: pflog
carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:01
        carp: MASTER carpdev em0 vhid 1 advbase 1 advskew 50
        groups: carp
        inet6 fe80::200:5eff:fe00:101%carp0 prefixlen 64 scopeid 0x9
        inet 172.16.3.254 netmask 0xfffffe00 broadcast 172.16.3.255
carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:02
        carp: MASTER carpdev em1 vhid 2 advbase 1 advskew 50
        groups: carp
        inet6 fe80::200:5eff:fe00:102%carp1 prefixlen 64 scopeid 0xa
        inet 130.60.230.xxx netmask 0xffffffe0 broadcast 130.60.230.xxx
carp2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:04
        carp: MASTER carpdev fxp0 vhid 4 advbase 1 advskew 50
        groups: carp
        inet6 fe80::200:5eff:fe00:104%carp2 prefixlen 64 scopeid 0xb
        inet 192.168.91.254 netmask 0xffffff00 broadcast 192.168.91.255
carp3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:03
        carp: MASTER carpdev em2 vhid 3 advbase 1 advskew 50
        groups: carp egress
        inet6 fe80::200:5eff:fe00:103%carp3 prefixlen 64 scopeid 0xc
        inet 130.60.x.xxx netmask 0xfffffffc broadcast 130.60.x.xxx


I think this is it ;)


--

 Stephan A. Rickauer

 -----------------------------------------------------------
 Institute of Neuroinformatics         Tel  +41 44 635 30 50
 University / ETH Zurich               Sec  +41 44 635 30 52
 Winterthurerstrasse 190               Fax  +41 44 635 30 53
 CH-8057 Zurich                        Web    www.ini.uzh.ch


Re: CARP node crashing reproducibly (4.3-stable)

by Giancarlo Razzolini

Stephan A. Rickauer wrote:

> On Fri, 2008-07-11 at 17:09 +0200, Reyk Floeter wrote:
>  
>> hi stephan!
>>    
>
> That was quick! Hi Reyk.
>
>  
>> can you also show your carp configuration?
>>    
>
> Sure (just x'ed out the external IPs as well as passwords). We have a
> simple master/backup system:
>
> carp0: LAN
> carp1: DMZ
> carp2: WLAN
> carp3: Internet
>
> #
> cat /etc/host*.carp*                                                          
> inet 172.16.3.254 255.255.254.0 172.16.3.255 vhid 1 advskew 50 pass xxx
> carpdev em0
> inet 130.60.230.xxx 255.255.255.224 130.60.230.xxx vhid 2 advskew 50
> pass xxx carpdev em1
> inet 192.168.91.254 255.255.255.0 192.168.91.255 vhid 4 advskew 50 pass
> xxx carpdev fxp0
> inet 130.60.x.xx 255.255.255.252 130.60.x.xxx vhid 3 advskew 50 pass xxx
> carpdev em2
>
> ('advskew 100' on backup node)
>
> # sysctl net.inet.carp  
> net.inet.carp.allow=1
> net.inet.carp.preempt=1
> net.inet.carp.log=0
>
> cat /etc/hostname.carp*  
>
> # ifconfig
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33208
>         groups: lo
>         inet 127.0.0.1 netmask 0xff000000
>         inet6 ::1 prefixlen 128
>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
> em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:0c:f1:8f:a9:c4
>         media: Ethernet autoselect (1000baseT
> full-duplex,rxpause,txpause)
>         status: active
>         inet 172.16.3.252 netmask 0xfffffe00 broadcast 172.16.3.255
>         inet6 fe80::20c:f1ff:fe8f:a9c4%em0 prefixlen 64 scopeid 0x1
> em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:0e:0c:c3:39:74
>         media: Ethernet autoselect (1000baseT
> full-duplex,rxpause,txpause)
>         status: active
>         inet 130.60.230.xxx netmask 0xffffffe0 broadcast 130.60.230.xxx
>         inet6 fe80::20e:cff:fec3:3974%em1 prefixlen 64 scopeid 0x2
> em2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:0e:0c:c3:39:75
>         media: Ethernet autoselect (1000baseT
> full-duplex,rxpause,txpause)
>         status: active
>         inet6 fe80::20e:cff:fec3:3975%em2 prefixlen 64 scopeid 0x3
> em3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:0e:0c:c3:39:76
>         media: Ethernet autoselect (none)
>         status: no carrier
> em4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:0e:0c:c3:39:77
>         media: Ethernet autoselect (1000baseT
> full-duplex,master,rxpause,txpause)
>         status: active
>         inet 1.1.1.252 netmask 0xff000000 broadcast 1.255.255.255
>         inet6 fe80::20e:cff:fec3:3977%em4 prefixlen 64 scopeid 0x5
> fxp0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu
> 1500
>         lladdr 00:0c:f1:8f:a9:c5
>         media: Ethernet autoselect (100baseTX full-duplex)
>         status: active
>         inet 192.168.91.252 netmask 0xffffff00 broadcast 192.168.91.255
>         inet6 fe80::20c:f1ff:fe8f:a9c5%fxp0 prefixlen 64 scopeid 0x6
> enc0: flags=0<> mtu 1536
> pfsync0: flags=41<UP,RUNNING> mtu 1460
>         pfsync: syncdev: em4 syncpeer: 1.1.1.253 maxupd: 128
>         groups: carp pfsync
> pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33208
>         groups: pflog
> carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:00:5e:00:01:01
>         carp: MASTER carpdev em0 vhid 1 advbase 1 advskew 50
>         groups: carp
>         inet6 fe80::200:5eff:fe00:101%carp0 prefixlen 64 scopeid 0x9
>         inet 172.16.3.254 netmask 0xfffffe00 broadcast 172.16.3.255
> carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:00:5e:00:01:02
>         carp: MASTER carpdev em1 vhid 2 advbase 1 advskew 50
>         groups: carp
>         inet6 fe80::200:5eff:fe00:102%carp1 prefixlen 64 scopeid 0xa
>         inet 130.60.230.xxx netmask 0xffffffe0 broadcast 130.60.230.xxx
> carp2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:00:5e:00:01:04
>         carp: MASTER carpdev fxp0 vhid 4 advbase 1 advskew 50
>         groups: carp
>         inet6 fe80::200:5eff:fe00:104%carp2 prefixlen 64 scopeid 0xb
>         inet 192.168.91.254 netmask 0xffffff00 broadcast 192.168.91.255
> carp3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         lladdr 00:00:5e:00:01:03
>         carp: MASTER carpdev em2 vhid 3 advbase 1 advskew 50
>         groups: carp egress
>         inet6 fe80::200:5eff:fe00:103%carp3 prefixlen 64 scopeid 0xc
>         inet 130.60.x.xxx netmask 0xfffffffc broadcast 130.60.x.xxx
>
>
> I think this is it ;)
>
>
>  
I'm just guessing, but I had some problems with CentOS 4.3 (AFAICR) and
the avahi-daemon. It simply started using my gateway's IP address, so
machines on the internal network could no longer reach the Internet.
That gateway was running OpenBSD 4.0, NOT using CARP. I suggest you
check whether the avahi-daemon is running on the SUSE machine. If it
is, shut it down and try again. Also, try capturing some packets.

My regards,

--
Giancarlo Razzolini
http://lock.razzolini.adm.br
Linux User 172199
Red Hat Certified Engineer no:804006389722501
Verify:https://www.redhat.com/certification/rhce/current/
Moleque Sem Conteudo Numero #002
OpenBSD Stable
Ubuntu 8.04 Hardy Heron
4386 2A6F FFD4 4D5F 5842  6EA0 7ABE BBAB 9C0E 6B85


Re: CARP node crashing reproducibly (4.3-stable)

by Henning Brauer

* Stephan A. Rickauer <stephan.rickauer@...> [2008-07-11 16:59]:
> Here's all data I was able to get off our crashing machine, the backup
> node of our CARP cluster, that used to run flawlessly since 3.7.
>
> We can reproduce the problem

if you follow http://www.benzedrine.cx/crashreport.html we have a
chance to actually fix the bug...

--
Henning Brauer, hb@..., henning@...
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam


Re: CARP node crashing reproducibly (4.3-stable)

by Stephan A. Rickauer

On Fri, 2008-07-11 at 21:32 +0200, Henning Brauer wrote:
> * Stephan A. Rickauer <stephan.rickauer@...> [2008-07-11 16:59]:
> > Here's all data I was able to get off our crashing machine, the backup
> > node of our CARP cluster, that used to run flawlessly since 3.7.
> >
> > We can reproduce the problem
>
> if you follow http://www.benzedrine.cx/crashreport.html we have a
> chance to actually fix the bug...

Nice page. I'll have a look on Monday. Thanks.

--

 Stephan A. Rickauer

 -----------------------------------------------------------
 Institute of Neuroinformatics         Tel  +41 44 635 30 50
 University / ETH Zurich               Sec  +41 44 635 30 52
 Winterthurerstrasse 190               Fax  +41 44 635 30 53
 CH-8057 Zurich                        Web    www.ini.uzh.ch


Re: CARP node crashing reproducibly (4.3-stable)

by Adrian M. Whatley

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Henning Brauer wrote:
| * Stephan A. Rickauer <stephan.rickauer@...> [2008-07-11 16:59]:
|> Here's all data I was able to get off our crashing machine, the backup
|> node of our CARP cluster, that used to run flawlessly since 3.7.
|>
|> We can reproduce the problem
|
| if you follow http://www.benzedrine.cx/crashreport.html we have a
| chance to actually fix the bug...
|

Hello,
I'm a colleague of Stephan Rickauer and I've been taking a look at this
problem.

It's a NULL pointer bug!

dmesg shows
kernel: page fault trap, code=0
Stopped at      pf_send_icmp+0x2b:      orb     $0x1,0x32(%eax)

and ddb trace shows:

pf_send_icmp(d62f3200,3,3,2,d67191b8,d115d500,2,db2a4eb8) at pf_send_icmp+0x2b

ddb registers shows (among others):

eax                    0
eip           0xd02f56db        pf_send_icmp+0x2b

and helpfully disassembles the faulting instruction thus:

pf_send_icmp+0x2b:      orb     $0x1,0x32(%eax)

which is from line 1726 in pf_send_icmp() in pf.c:

        m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;

The beginning of this function (up to the line with the or) is as follows:

pf_send_icmp(struct mbuf *m, u_int8_t type, u_int8_t code, sa_family_t af,
    struct pf_rule *r)
{
        struct mbuf *m0;

        m0 = m_copy(m, 0, M_COPYALL);
        m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;

Thus we have m_copy (actually m_copym, since m_copy is a macro defined
in /usr/src/sys/sys/mbuf.h in terms of m_copym, which itself is a
one-line wrapper around m_copym0) returning a NULL pointer in eax (= m0)
and the subsequent OR getting a page fault when it tries to use it.

Looking at m_copym0, it looks like it can legitimately fail and return
NULL (it even increments a global variable MCFail when it does so) and
therefore the bug is that its return value is not being checked in
pf_send_icmp.


As far as I can see, the precise nature of the packet being handled at
the time of the crash is not important. Using ddb on the crashed
machine, it looks as if the packet being handled at the time is a
(relatively) innocent UDP broadcast as follows:

IP header:
45  0   0   1d
0   0   0   0
40  11  1b  a2
ac  10  3   f
ac  10  3   ff

ip header length = 5 32-bit words
length = 29
id = 0
flags = 0
fragmentation offset = 0
TTL = 64
Protocol = 17, UDP
Source address = 172.16.3.15 (zynapse.lan.ini.uzh.ch)
Dest address = 172.16.3.255

UDP header:
bb  b5  22  3d
0   9   a5  ba

source port = bbb5 = 48053
dest port = 223d = 8765 (Ultraseek HTTP ?)
length = 9

Data:
1d



Adrian


- --
Adrian M. Whatley
Universitaet/ETH Zuerich,
Institut fuer Neuroinformatik,
Winterthurerstrasse 190,
CH-8057 Zuerich, Switzerland.
Phone: +41 44 635 3067 Fax: +41 44 635 3053
Email: amw@... WWW: http://www.ini.uzh.ch/~amw/
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFIeyy7Lgk3RqYSp9YRAlgfAJ4wYygStPwwScv9eScXXjIRtwc4oQCghkTb
rUhs3B5ZZPkyMQwXxyg9Xys=
=0Dyq
-----END PGP SIGNATURE-----


Re: CARP node crashing reproducibly (4.3-stable)

by Henning Brauer

* Adrian M. Whatley <amw@...> [2008-07-14 13:54]:
> It's a NULL pointer bug!

> which is from line 1726 in pf_send_icmp() in pf.c:
>
> m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;

> Looking at m_copym0, it looks like it can legitimately fail and return
> NULL (it even increments a global variable MCFail when it does so) and
> therefore the bug is that its return value is not being checked in
> pf_send_icmp.

perfect analysis!

looks like the only sane thing to do in that case is to bail and not
send the icmp.

Index: pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.609
diff -u -p -r1.609 pf.c
--- pf.c 10 Jul 2008 07:41:21 -0000 1.609
+++ pf.c 14 Jul 2008 12:20:27 -0000
@@ -1819,7 +1819,9 @@ pf_send_icmp(struct mbuf *m, u_int8_t ty
 {
  struct mbuf *m0;
 
- m0 = m_copy(m, 0, M_COPYALL);
+ if ((m0 = m_copy(m, 0, M_COPYALL)) == NULL)
+ return;
+
  m0->m_pkthdr.pf.flags |= PF_TAG_GENERATED;
 
  if (r->rtableid >= 0)


--
Henning Brauer, hb@..., henning@...
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam


Re: CARP node crashing reproducibly (4.3-stable)

by Stephan A. Rickauer

On Mon, 2008-07-14 at 14:22 +0200, Henning Brauer wrote:
> perfect analysis!
>
> looks like the only sane thing to do in that case is to bail and not
> send the icmp.

I've compiled a new kernel with the patch. The machine is no longer
crashing on pf_send_icmp(). However, I now see memory leaking until the
machine locks up (it doesn't crash but its network becomes unusable).
Unfortunately, it then also puts all CARP interfaces in MASTER state,
though the other node works perfectly as master already. This will, of
course, knock down our entire network until I manually put down the carp
interfaces.

I have increased kern.maxclusters to gain more time for debugging of the
memory leak. However, all I could find out so far is that lots of mbufs
are allocated while there is no significant traffic to be handled
(remember the machine is the CARP backup). The machine crashes within 15
minutes after reboot.

In case the line wrapping in this email makes the output hard to read,
I've also put the output of netstat and vmstat online:

 http://www.ini.uzh.ch/~stephan/vmstat+netstat.txt


# vmstat -m
Memory statistics by bucket size
    Size   In Use   Free           Requests  HighWater  Couldfree
      16     3549  10275             304244    1280       7725
      32      303    209              51063     640          0
      64     2968    360              93244     320         89
     128      511     65               5665     160          0
     256      189    131              12817      80       1065
     512      351      9               3326      40          0
    1024     2313     11               3302      20          0
    2048       33      1               1536      10          0
    4096       28      1               6834       5          0
    8192       12      0                 12       5          0
   16384        6      0                  6       5          0
   32768        5      0                  5       5          0
   65536        1      0                  1       5          0

Memory usage type by bucket size
    Size  Type(s)
      16  devbuf, pcb, routetbl, ifaddr, sysctl, UFS mount, dirhash, in_multi,
          exec, xform_data, VM swap, UVM amap, UVM aobj, USB, USB device,
          packet tags, temp
      32  devbuf, pcb, routetbl, ifaddr, UFS mount, sem, dirhash, proc,
          VFS cluster, in_multi, ether_multi, xform_data, VM swap, UVM amap,
          USB, temp, AGP Memory
      64  devbuf, pcb, routetbl, ifaddr, vnodes, sem, dirhash, ip_moptions,
          in_multi, pfkey data, UVM amap, USB, NDP, temp
     128  devbuf, routetbl, ifaddr, vnodes, ttys, exec, UVM amap, USB,
          USB device, NDP, temp, AGP Memory
     256  devbuf, routetbl, ifaddr, sysctl, ioctlops, vnodes, shm, VM map, proc,
          NFS srvsock, NFS daemon, newblk, UVM amap, USB, USB device, temp
     512  devbuf, pcb, ifaddr, ioctlops, mount, UFS mount, shm, dirhash, ttys,
          exec, UVM amap, USB device, temp
    1024  devbuf, ioctlops, namecache, proc, ttys, exec, UVM amap, UVM aobj,
          crypto data, temp
    2048  devbuf, ifaddr, ioctlops, UFS mount, pagedep, VM swap, UVM amap, temp
    4096  devbuf, ioctlops, UFS mount, MSDOSFS mount, memdesc, temp
    8192  devbuf, NFS node, namecache, UFS quota, UFS mount, ISOFS mount,
          inodedep
   16384  devbuf, namecache, UVM amap
   32768  devbuf, VM swap
   65536  VM swap

Memory statistics by type                           Type  Kern
          Type InUse MemUse HighUse  Limit Requests Limit Limit Size(s)
        devbuf  3808  2545K   2545K 39322K     3880    0     0  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768
           pcb    30     4K      4K 39322K       78    0     0  16,32,64,512
      routetbl   280    27K     44K 39322K     1400    0     0  16,32,64,128,256
        ifaddr   143    25K     25K 39322K      145    0     0  16,32,64,128,256,512,2048
        sysctl     2     1K      1K 39322K        2    0     0  16,256
      ioctlops     0     0K      4K 39322K     5457    0     0  256,512,1024,2048,4096
         mount     4     2K      2K 39322K        4    0     0  512
      NFS node     1     8K      8K 39322K        1    0     0  8192
        vnodes  1256    83K     87K 39322K     1312    0     0  64,128,256
     namecache     3    25K     25K 39322K        3    0     0  1024,8192,16384
     UFS quota     1     8K      8K 39322K        1    0     0  8192
     UFS mount    17    35K     35K 39322K       17    0     0  16,32,512,2048,4096,8192
           shm     2     1K      1K 39322K        2    0     0  256,512
        VM map     4     1K      1K 39322K        4    0     0  256
           sem     2     1K      1K 39322K        2    0     0  32,64
       dirhash    30     6K      6K 39322K       30    0     0  16,32,64,512
          proc    15     3K      3K 39322K       15    0     0  32,256,1024
   VFS cluster     0     0K      1K 39322K       26    0     0  32
   NFS srvsock     1     1K      1K 39322K        1    0     0  256
    NFS daemon     1     1K      1K 39322K        1    0     0  256
   ip_moptions     5     1K      1K 39322K        5    0     0  64
      in_multi   123     5K      5K 39322K      124    0     0  16,32,64
   ether_multi    64     2K      3K 39322K       65    0     0  32
   ISOFS mount     1     8K      8K 39322K        1    0     0  8192
 MSDOSFS mount     1     4K      4K 39322K        1    0     0  4096
          ttys   420   263K    263K 39322K      420    0     0  128,512,1024
          exec     0     0K      2K 39322K     3090    0     0  16,128,512,1024
    pfkey data     1     1K      1K 39322K        2    0     0  64
    xform_data     0     0K      1K 39322K       18    0     0  16,32
       pagedep     1     2K      2K 39322K        1    0     0  2048
      inodedep     1     8K      8K 39322K        1    0     0  8192
        newblk     1     1K      1K 39322K        1    0     0  256
       VM swap     7    75K     75K 39322K        7    0     0  16,32,2048,32768,65536
      UVM amap  3819   233K    349K 39322K   349090    0     0  16,32,64,128,256,512,1024,2048,16384
      UVM aobj     2     2K      2K 39322K        2    0     0  16,1024
           USB    74     7K      7K 39322K       74    0     0  16,32,64,128,256
    USB device    21     9K      9K 39322K       21    0     0  16,128,256,512
       memdesc     1     4K      4K 39322K        1    0     0  4096
   crypto data     1     1K      1K 39322K        1    0     0  1024
   packet tags     0     0K      1K 39322K        8    0     0  16
           NDP    24     3K      3K 39322K       28    0     0  64,128
          temp   112    14K     18K 39322K   116753    0     0  16,32,64,128,256,512,1024,2048,4096
    AGP Memory     2     1K      1K 39322K        2    0     0  32,128

Memory Totals:  In Use    Free    Requests
                 3405K    252K      482097

Memory resource pool statistics
Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
extentpl      20      248    0      197     1     0     1     1     0     8    0
phpool        32    36864    0        0   291     0   291   291     0     8    0
pmappl        84     2469    0     2443     2     0     2     2     0     8    1
vmsppl       188     2469    0     2443     4     0     4     4     0     8    2
vmmpepl       88   207056    0   205462   106     0   106   106     0   179   71
vmmpekpl      88    59313    0    59192     3     0     3     3     0     8    0
aobjpl        52        1    0        0     1     0     1     1     0     8    0
amappl        44   113985    0   112738    50     0    50    50     0    45   30
anonpl        16   161598    0   156491    34     0    34    34     0    31   11
bufpl        124     2817    0      123    85     0    85    85     0     8    0
mbpl         256   618808    0   553629  4075     0  4075  4075     1  4096    1
mclpl       2048   185575    0   120403 32591     0 32591 32591     4 32768    4
sockpl       212     1944    0     1906     3     0     3     3     0     8    0
procpl       344     2486    0     2443     9     0     9     9     0     8    4
processpl     20     2486    0     2443     1     0     1     1     0     8    0
zombiepl      72     2443    0     2443     1     0     1     1     0     8    1
ucredpl       80      848    0      836     1     0     1     1     0     8    0
pgrppl        24     1448    0     1424     1     0     1     1     0     8    0
sessionpl     48       33    0       10     1     0     1     1     0     8    0
pcredpl       24     2486    0     2443     1     0     1     1     0     8    0
lockfpl       52       12    0       10     1     0     1     1     0     8    0
filepl        88    15335    0    15245     4     0     4     4     0     8    2
fdescpl      296     2487    0     2443     8     0     8     8     0     8    4
pipepl        72     1246    0     1242     2     0     2     2     0     8    1
kqueuepl     192        3    0        0     1     0     1     1     0     8    0
knotepl       64        9    0        0     1     0     1     1     0     8    0
sigapl       316     2469    0     2443     7     0     7     7     0     8    4
wqtasks       20      227    0      227     1     0     1     1     0     8    1
wdcspl        96     3416    0     3416     1     0     1     1     0     8    1
scxspl       132        3    0        3     1     0     1     1     0     8    1
namei       1024    26059    0    26059     2     0     2     2     0     8    2
vnodes       148     1582    0        0    59     0    59    59     0     8    0
nchpl         72     1636    0       57    29     0    29    29     0     8    0
ffsino       184     1664    0       91    72     0    72    72     0     8    0
dino1pl      128     1664    0       91    51     0    51    51     0     8    0
dirhash     1024       37    0        0    10     0    10    10     0   128    0
pfrulepl     824      442    0       10   111     0   111   111     0     8    2
pfstatepl    204    23449    0    21322   173     0   173   173     0   264   30
pfstatekeypl 108    23449    0    21374    86     5    81    86     0     8    8
pfpooladdrpl  68       27    0        0     1     0     1     1     0     8    0
pfrktable   1240      144    0       72    48     0    48    48     0   334    0
pfrkentry    156     1089    0        0    42     0    42    42     0 13462    0
pfosfpen     108     1392    0      696    30    11    19    19     0     8    0
pfosfp        28      814    0      407     3     0     3     3     0     8    0
rtentpl      116       74    0        4     2     0     2     2     0     8    0
tcpcbpl      400      190    0      179     3     0     3     3     0     8    1
tcpqepl       16        6    0        6     1     0     1     1     0    13    1
synpl        184      185    0      185     1     0     1     1     0     8    1
plimitpl     152      133    0      122     1     0     1     1     0     8    0
inpcbpl      216     1875    0     1858     2     0     2     2     0     8    1

# netstat -m
In use 150665K, total allocated 154056K; utilization 97.8%
65183 mbufs in use:
        65178 mbufs allocated to data
        1 mbuf allocated to packet headers
        4 mbufs allocated to socket names and addresses
65178/65186/65536 mbuf clusters in use (current/peak/max)
146676 Kbytes allocated to network (14% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines


Re: CARP node crashing reproducibly (4.3-stable)

by Henning Brauer

* Stephan A. Rickauer <stephan.rickauer@...> [2008-07-14 17:27]:

> On Mon, 2008-07-14 at 14:22 +0200, Henning Brauer wrote:
> > perfect analysis!
> >
> > looks like the only sane thing to do in that case is to bail and not
> > send the icmp.
>
> I've compiled a new kernel with the patch. The machine is no longer
> crashing on pf_send_icmp(). However, I now see memory leaking until the
> machine locks up (it doesn't crash but its network becomes unusable).
> Unfortunately, it then also puts all CARP interfaces in MASTER state,
> though the other node works perfectly as master already. This will, of
> course, knock down our entire network until I manually put down the carp
> interfaces.
>
> I have increased kern.maxclusters to gain more time for debugging of the
> memory leak. However, all I could find out so far is that lots of mbufs
> are allocated while there is no significant traffic to be handled
> (remember the machine is the CARP backup). The machine crashes within 15
> minutes after reboot.

ok that is weird. icmp_error as called in pf_send_icmp does not m_free
anything but the passed mbuf, and we now just bail if the allocation
of it fails. so i have a hard time seeing this as related... might be
something completely different. and finding mbuf leaks tends to be
damn hard and following a lot of code...

--
Henning Brauer, hb@..., henning@...
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam


Re: CARP node crashing reproducibly (4.3-stable)

by Stephan A. Rickauer

On Mon, 2008-07-14 at 17:38 +0200, Henning Brauer wrote:
> > I have increased kern.maxclusters to gain more time for debugging of the
> > memory leak. However, all I could find out so far is that lots of mbufs
> > are allocated while there is no significant traffic to be handled
> > (remember the machine is the CARP backup). The machine crashes within 15
> > minutes after reboot.
>
> ok that is weird. icmp_error as called in pf_send_icmp does not m_free
> anything but the passed mbuf, and we now just bail if the allocation
> of it fails. so i have a hard time seeing this as related...

Yes, you are right. The leak we saw came from a bad kernel build, which
we must have produced by building from an unclean source tree. Problem
solved. The patch you've implemented in 1.610 of pf.c does fix the
crashes we saw before.

Thanks a lot!

--

 Stephan A. Rickauer

 -----------------------------------------------------------
 Institute of Neuroinformatics         Tel  +41 44 635 30 50
 University / ETH Zurich               Sec  +41 44 635 30 52
 Winterthurerstrasse 190               Fax  +41 44 635 30 53
 CH-8057 Zurich                        Web    www.ini.uzh.ch