FreeBSD 8.0 - network stack crashes?

FreeBSD 8.0 - network stack crashes?

by Weldon S Godfrey 3


Up until yesterday, we had been running FreeBSD-CURRENT of 12/08.  A couple
of months ago we started to see some very odd network behavior.
Something happens to the stack that causes processes accessing the network
to just hang.  After the problem happens, usually (but not always), you
can't ssh in.  Always, you can't ssh or telnet out, and nothing can access
the NFS shares on the server.  You can ping everything from the server.
You can't even do a route add, and you can't ssh out even if you use just
the IP address (although pinging hostnames that aren't cached or in the
hosts table does resolve).  When you try to ssh out or do a route add from
the box, the process just hangs.  You can't Ctrl-C it at all; it hangs
forever.  There is nothing in dmesg or messages to indicate an issue.  I
have tried bringing the interfaces down and up.  In CURRENT-12/08, that may
allow things to work for about 30s.

We upgraded to 8.0-RC2 yesterday and, at first, the problem appeared to
happen a lot more often.  We expected that was related to the increase
in network performance.  At least in 8.0-RC2, I did see a large number of
input errors with netstat -in on the heavily loaded interface before it
started the locking up behavior.  I have replaced the ethernet cable and
moved ports.  The Catalyst 3650 never records any errors.  The problem
would reoccur in about 5 minutes once our load kicked in this morning.


One change in this upgrade: we switched from NFS v2 to v3.  When we
downgraded to the previous OS, we stayed at v3.  The problem was just
about as bad with v3 on the 12/08 OS.

We went back to RC2 with NFS v2 and it appeared to stabilize to a degree.
It ran for about an hour and a half and then the issue came up again.

We are currently back on the 12/08 version using NFS v2 and watching things.

We are using a Dell PowerEdge 2950-iii.  The problem happens both with the
onboard NICs using the bce driver and with an Intel card using the em
driver.

I am hunting down any MTU/duplex/speed problems that could cause it
(haven't found any so far).  Of course, any problems on the network
shouldn't (ideally) freak out the network stack on the server.  I don't
know how to troubleshoot this further on the server since I am not getting
any problems indicated in logging, panics, cores, etc.

Any help is appreciated.

Thanks,

Weldon
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: FreeBSD 8.0 - network stack crashes?

by Gavin Atkinson-4

On Mon, 2009-11-02 at 10:52 -0500, Weldon S Godfrey 3 wrote:

> Up until yesterday, we have been running FreeBSD-CURRENT of 12/08.  We
> started to see a couple months ago some very odd network behavior.
> Something happens to the stack that causes processes accessing the network
> to just hang.  After the problem happens, usually (but not always), you
> can't ssh in.  Always, you can't ssh or telnet out, and nothing can access
> the NFS shares on the server.  You can ping everything from the server.
> You can't even do a route add, you can't ssh if you use just the IP
> address (although pinging with hostnames it doesn't have cached or in
> hosts table resolves).  When you try to ssh out, do a route add from the
> box, the process just hangs.  You can't control C it at all, it hangs
> forever.  There is nothing in dmesg or messages to indicate an issue.  I
> try to up/down the interfaces.  In CURRENT-12/08, it may allow things to
> work for like 30s.

Some things that would be useful:

- Does "arp -da" fix things?
- What's the output of "netstat -m" while the networking is broken?
- What does CTRL-T show for the hung SSH or route processes?
- What does "procstat -kk" on the same processes show?
- Does going to single user mode ("init 1" and killing off any leftover
processes) cause the machine to start working again?  If so, what's the
output of "netstat -m" afterwards?

If you look to be hitting some of the limits shown by "netstat -m", try
logging the date, "netstat -m" and "vmstat -m" to a file every 30
seconds or similar so that we can see if it is a memory leak, and what
may be leaking.

Gavin
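
The periodic logging suggested above could be scripted in several ways.
Below is a minimal Python sketch, assuming Python 3.7 or later is
installed on the server.  Only "netstat -m" and "vmstat -m" come from the
message; the log path /var/tmp/mbuf-watch.log, the helper names, and the
20-second command timeout are illustrative assumptions.

```python
import subprocess
import time
from datetime import datetime

# Commands named in the message above; everything else here (log path,
# helper names, timeout) is an illustrative assumption.
COMMANDS = (["netstat", "-m"], ["vmstat", "-m"])


def run_command(argv):
    """Run one command and return its output, or a note if it fails."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=20)
        return result.stdout + result.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"(could not run {' '.join(argv)}: {exc})\n"


def format_snapshot(timestamp, outputs):
    """Build one log entry: a dated header, then each command's output."""
    parts = [f"=== {timestamp} ===\n"]
    for argv, text in outputs:
        parts.append(f"--- {' '.join(argv)} ---\n{text}")
        if not text.endswith("\n"):
            parts.append("\n")
    return "".join(parts)


def log_forever(path="/var/tmp/mbuf-watch.log", interval=30, runner=run_command):
    """Append a snapshot to `path` every `interval` seconds until interrupted."""
    while True:
        outputs = [(argv, runner(argv)) for argv in COMMANDS]
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(path, "a") as log:
            log.write(format_snapshot(stamp, outputs))
        time.sleep(interval)
```

Calling log_forever() from a root shell starts the loop.  It writes only
to a local file, so the logger itself does not depend on the network
being usable.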

Re: FreeBSD 8.0 - network stack crashes?

by Weldon S Godfrey 3




I have swapped out the computer, switch, ethernet card, and 3ware card.  We
are running 8.0-CURRENT 12/08, which is what we were using with a lot
fewer issues.  No help.

If it happens again, I am going to try a netif restart and a routing
restart, although I believe I tried that at the beginning and it did not
help.

Re: FreeBSD 8.0 - network stack crashes?

by Weldon S Godfrey 3



BTW, doing a netif / routing restart doesn't help.

Re: FreeBSD 8.0 - network stack crashes?

by Ivan Voras-7

Weldon S Godfrey 3 wrote:

> I don't
> know how to troubleshoot this further on the server since I am not
> getting any problems indicated in logging, panics, cores, etc.

If you have console access to the system, the generic advice would be to
compile a kernel with the kernel debugger - options KDB and DDB (see
http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-online-ddb.html),
enter the debugger, force a kernel dump file to be created (by entering
"call doadump") and then proceed with post-mortem examination of the
kernel at your leisure (e.g. from a remote ssh console, etc).

See
http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-deadlocks.html 
for instructions on what information to collect.

If you can provoke your problem with using WITNESS that would probably
be great, but it will slow down your production machine noticeably. When
WITNESS is enabled you might also get more information - such as LOR
warnings, which you should also collect.

Keep the dump file, someone might ask you for more information.

Good luck!


Re: FreeBSD 8.0 - network stack crashes?

by Weldon S Godfrey 3

If memory serves me right, sometime around Tomorrow, Ivan Voras told me:

> Weldon S Godfrey 3 wrote:
>
>> I don't know how to troubleshoot this further on the server since I am not
>> getting any problems indicated in logging, panics, cores, etc.
>
> If you have console access to the system, the generic advice would be to
> compile a kernel with the kernel debugger - options KDB and DDB (see
> http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-online-ddb.html),
> enter the debugger, force a kernel dump file to be created (by entering "call
> doadump") and then proceed with post-mortem examination of the kernel at your
> leisure (e.g. from a remote ssh console, etc).
>
> See
> http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-deadlocks.html 
> for instructions on what information to collect.
>
> If you can provoke your problem with using WITNESS that would probably be
> great, but it will slow down your production machine noticeably. When WITNESS
> is enabled you might also get more information - such as LOR warnings, which
> you should also collect.
>
> Keep the dump file, someone might ask you for more information.
>

Thanks, I will work on trying to get a system with those enabled.

Another thought that came to mind is that this sounds like some sort of
network buffer exhaustion.  Is there anything to look for there?

Re: FreeBSD 8.0 - network stack crashes?

by Pieter de Goeje

On Tuesday 03 November 2009 02:02:08 Weldon S Godfrey 3 wrote:

> Thanks, I will work on trying to get a system with those enabled.
>
> Another thought that came to mind that this sounds like some sort of
> network buffer exhaustion. Is there anything to look for there?

Are you perhaps using em(4)? There was an mbuf leak in the driver, which was fixed recently.
You can check mbuf usage with netstat -m.

--
Pieter de Goeje

Re: FreeBSD 8.0 - network stack crashes?

by Weldon S Godfrey 3



If memory serves me right, sometime around Yesterday, Gavin Atkinson told me:

Gavin, thank you A LOT for helping us with this.  I have answered as much
as I can from the most recent crash below.  We did hit max mbufs.  The
cluster limit is at 25K (25600), which is the default.  I have upped it to
32K because a rather old article mentioned that as the top end, and I need
to get into work so that I am not trying to go higher over a remote
console.  I have already set it to reboot next with 64K clusters.  I
already have kmem maxed to what is bootable (or at least was at one time)
in 8.0, 4GB.  How high can I safely go?  This is an NFS server running ZFS
with sustained 5 min averages of 120-200Mb/s, running as a store for a
mail system.

> Some things that would be useful:
>
> - Does "arp -da" fix things?

no, it hangs like ssh, route add, etc

> - What's the output of "netstat -m" while the networking is broken?
Tue Nov  3 07:02:11 CST 2009
36971/2033/39004 mbufs in use (current/cache/total)
24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
24314/731 mbuf+clusters out of packet secondary zone in use (current/cache)
0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
58980K/2110K/61091K bytes allocated to network (current/cache/total)
0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
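
The two things that stand out in this output are the mbuf cluster pool
sitting at its limit (25600 of 25600) and the nonzero "denied" counters.
A minimal Python sketch of picking those out automatically follows.  It
assumes wrapped lines have been rejoined, and treats "total has reached
max" as the sign of exhaustion, which is an interpretation rather than
anything netstat itself reports.  The function and pattern names are
illustrative.

```python
import re

# A subset of the netstat -m lines from this message, with wrapped
# lines rejoined.
SAMPLE = """\
36971/2033/39004 mbufs in use (current/cache/total)
24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
"""

# "current/cache/total/max <pool name> in use (current/cache/total/max)"
POOL_RE = re.compile(
    r"^(\d+)/(\d+)/(\d+)/(\d+) (.+?) in use \(current/cache/total/max\)$"
)
# "a/b/c requests for <what> denied (label/label/label)"
DENIED_RE = re.compile(r"^([\d/]+) requests for (.+?) denied \(([^)]+)\)$")


def summarize(text):
    """Return (pools whose total has reached max, nonzero denied counters)."""
    exhausted, denied = [], {}
    for line in text.splitlines():
        line = line.strip()
        m = POOL_RE.match(line)
        if m:
            total, limit = int(m.group(3)), int(m.group(4))
            if limit and total >= limit:
                exhausted.append((m.group(5), total, limit))
            continue
        m = DENIED_RE.match(line)
        if m:
            counts = [int(n) for n in m.group(1).split("/")]
            labels = m.group(3).split("/")
            for label, count in zip(labels, counts):
                if count:
                    denied[f"{m.group(2)}: {label}"] = count
    return exhausted, denied
```

Running summarize(SAMPLE) flags only the "mbuf clusters" pool, and
reports the cluster and mbuf+cluster denial counts.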


> - What does CTRL-T show for the hung SSH or route processes?

of the arp:
load: 0.01  cmd: arp 6144 [zonelimit] 0.00u 0.00s 0% 996k

> - What does "procstat -kk" on the same processes show?
sorry I couldn't get this to run this time, remote  console issues

> - Does going to single user mode ("init 1" and killing off any leftover
> processes) cause the machine to start working again?  If so, what's the
> output of "netstat -m" afterwards?

no, mbuf was still maxed out


below is the last vmstat -m:

          Type InUse MemUse HighUse Requests  Size(s)
   ntfs_nthash     1   512K       -        1
     pfs_nodes    20     5K       -       20  256
          GEOM   262    52K       -     4551  16,32,64,128,256,512,1024,2048
        isadev     9     2K       -        9  128
          cdev    13     4K       -       13  256
         sigio     1     1K       -        1  64
      filedesc   127    64K       -     6412  512,1024
          kenv    75    11K       -       80  16,32,64,128
        kqueue     0     0K       -      188  256,2048
     proc-args    41     2K       -     5647  16,32,64,128
       scsi_cd     0     0K       -      333  16
       ithread   119    21K       -      119  32,128,256
        acpica   888    78K       -   121045  16,32,64,128,256,512,1024
        KTRACE   100    13K       -      100  128
      acpitask     0     0K       -        1  64
        linker   139   596K       -      181  16,32,64,128,256,512,1024,2048
         lockf    11     2K       -      399  64,128
CAM dev queue     4     1K       -        4  128
        ip6ndp     5     1K       -        5  64,128
          temp    48   562K       - 14544952  16,32,64,128,256,512,1024,2048,4096
        devbuf 17105 36341K       -    24988  16,32,64,128,512,1024,2048,4096
        module   420    53K       -      420  128
      mtx_pool     1     8K       -        1
           osd     2     1K       -        2  16
     CAM queue    62    52K       -     2211  16,32,64,128,256,512,1024,2048
       subproc   562   722K       -     6851  512,4096
          proc     2    16K       -        2
       session    33     5K       -      127  128
          pgrp    37     5K       -      190  128
          cred    62    16K       - 29192756  256
       uidinfo     4     3K       -       99  64,2048
        plimit    17     5K       -      910  256
       acpisem    15     1K       -       15  64
     sysctltmp     0     0K       -    13867  16,32,64,128,256,512,1024,2048,4096
     sysctloid  5400   270K       -     5782  16,32,64,128
        sysctl     0     0K       -    11423  16,32,64
       callout     7  3584K       -        7
          umtx   780    98K       -      780  128
      p1003.1b     1     1K       -        1  16
          SWAP     2  3281K       -        2  64
        kbdmux     8     9K       -        8  16,256,512,2048,4096
        bus-sc   103   188K       -     4558  16,32,64,128,256,512,1024,2048,4096
           bus  1174    93K       -    57792  16,32,64,128,256,512,1024
         clist    54     7K       -       54  128
       devstat    32    65K       -       32  32,4096
  eventhandler    64     6K       -       64  64,128
          kobj   276  1104K       -      387  4096
          rman   144    18K       -      601  16,32,128
        mfibuf     3    21K       -       12  32,256,512,2048,4096
          sbuf     0     0K       -    14350  16,32,64,128,256,512,1024,2048,4096
       scsi_da     0     0K       -      504  16
       CAM SIM     4     1K       -        4  256
         stack     0     0K       -      194  256
     taskqueue    13     2K       -       13  16,32,128
        Unitno    11     1K       -     4759  32,64
           iov     0     0K       -     1193  16,64,256,512
        select    98    13K       -       98  128
      ioctlops     0     0K       -    14716  16,32,64,128,256,512,1024,4096
           msg     4    30K       -        4  2048,4096
           sem     4     8K       -        4  512,1024,2048,4096
           shm     1    16K       -        1
           tty    25    25K       -       25  1024
           pts     3     1K       -        3  256
      mbuf_tag     0     0K       -        2  32
         shmfd     1     8K       -        1
    CAM periph    54    14K       -      371  16,32,64,128,256
           pcb    28   157K       -      148  16,32,128,1024,2048,4096
        soname     5     1K       -    18699  16,32,128
        biobuf     4     8K       -        6  2048
      vfscache     1  1024K       -        1
    cl_savebuf     0     0K       -        7  64,128
   export_host     5     3K       -        5  512
      vfs_hash     1   512K       -        1
        vnodes     2     1K       -        2  256
   vnodemarker     0     0K       -     4832  512
         mount   222    15K       -      807  16,32,64,128,256,1024
   ata_generic     1     1K       -        1  1024
           BPF     4     1K       -        4  128
   ether_multi    22     2K       -       24  16,32,64
        ifaddr    54    14K       -       54  32,64,128,256,512,4096
         ifnet     5     9K       -        5  256,2048
         clone     5    20K       -        5  4096
        arpcom     3     1K       -        3  16
      routetbl    65    11K       -      949  32,64,128,256,512
      in_multi     3     1K       -        3  64
     sctp_iter     0     0K       -        3  256
      sctp_ifn     3     1K       -        3  128
      sctp_ifa     4     1K       -        4  128
      sctp_vrf     1     1K       -        1  64
     sctp_a_it     0     0K       -        3  16
     hostcache     1    28K       -        1
    acd_driver     1     2K       -        1  2048
      syncache     1    92K       -        1
     in6_multi    19     2K       -       19  32,64,128
  ip6_moptions     1     1K       -        1  32
       NFS FHA    13     3K       - 18480347  64,2048
           rpc  1381   716K       - 82214178  32,64,128,256,512,2048
audit_evclass   168     6K       -      205  32
        newblk     1     1K       -        1  512
      inodedep     1   512K       -        1
       pagedep     1   128K       -        1
   ufs_dirhash    45     9K       -       45  16,32,64,128,512
     ufs_mount     3    11K       -        3  512,2048
       UMAHash     3   130K       -       12  512,1024,2048,4096
       acpidev    56     4K       -       56  64
     vm_pgdata     2   129K       -        2  128
       CAM XPT   589   369K       -     2047  32,64,128,256,1024
       io_apic     2     4K       -        2  2048
      pci_link    16     2K       -       16  32,128
       memdesc     1     4K       -        1  4096
           msi     3     1K       -        3  128
      nexusdev     3     1K       -        3  16
       entropy  1024    64K       -     1024  64
  twa_commands     2   104K       -      101  256
      atkbddev     2     1K       -        2  64
          UART     6     4K       -        6  16,512,1024
         USBHC     1     1K       -        1  128
        USBdev    30    11K       -       30  16,32,64,128,256,512
           USB   157    54K       -      190  16,32,64,128,256,1024
        DEVFS1   152    76K       -      153  512
        DEVFS3   165    42K       -      167  256
         DEVFS    16     1K       -       17  16,128
       solaris 822038 707024K       - 235790398  16,32,64,128,256,512,1024,2048,4096
    kstat_data     2     1K       -        2  64
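
With a table this long it is easier to rank the types by MemUse than to
scan it by eye.  A minimal Python sketch follows, assuming each type is
on one line in the column order shown above (Type, InUse, MemUse,
HighUse, Requests).  The sample rows are copied from the table; the
function and pattern names are illustrative.

```python
import re

# A few rows copied from the vmstat -m table above.
SAMPLE = """\
          Type InUse MemUse HighUse Requests  Size(s)
CAM dev queue     4     1K       -        4  128
        devbuf 17105 36341K       -    24988  16,32,64,128,512,1024,2048,4096
       callout     7  3584K       -        7
          SWAP     2  3281K       -        2  64
           rpc  1381   716K       - 82214178  32,64,128,256,512,2048
       solaris 822038 707024K       - 235790398  16,32,64,128,256,512,1024,2048,4096
"""

# Type (may contain spaces), InUse, MemUse in kilobytes, HighUse, Requests.
ROW_RE = re.compile(r"^\s*(.+?)\s+(\d+)\s+(\d+)K\s+\S+\s+(\d+)")


def top_by_memuse(text, n=3):
    """Return the n rows with the largest MemUse as (type, in_use, mem_kb)."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:  # the header and any stray continuation lines do not match
            rows.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]
```

On the rows above, top_by_memuse(SAMPLE) puts solaris first at 707024K,
followed by devbuf and callout.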


Re: FreeBSD 8.0 - network stack crashes?

by Weldon S Godfrey 3



If memory serves me right, sometime around 9:37am, Pieter de Goeje told me:

> Are you perhaps using em(4)? There was an mbuf leak in the driver, which was fixed recently.
> You can check mbuf usage with netstat -m.
>

We are using the onboard NICs on the Dell with the bce driver.  We did try
several times with an Intel PCI Express card using the em driver, and we
had the same symptoms.

Could the bce driver have the same leak?

Thanks!

Weldon

Re: FreeBSD 8.0 - network stack crashes?

by Ivan Voras-7

2009/11/3 Weldon S Godfrey 3 <weldon@...>:

>
>
> If memory serves me right, sometime around 9:37am, Pieter de Goeje told me:
>
>> Are you perhaps using em(4)? There was an mbuf leak in the driver, which
>> was fixed recently.
>> You can check mbuf usage with netstat -m.
>>
>
> we are using onboard NICs on the Dell using the bce driver.  We did try
> several times to see if using an intel PCIexpress card using the em driver,
> and we had the same symptoms.
>
> Could the bce driver have the same leak?

It would be unlikely to pass unnoticed since Dells are common hardware...

Re: FreeBSD 8.0 - network stack crashes?

by Galactic_Dominator

On Tue, Nov 3, 2009 at 7:32 AM, Weldon S Godfrey 3 <weldon@...> wrote:

>
>
> If memory serves me right, sometime around Yesterday, Gavin Atkinson told
> me:
>
> Gavin, thank you A LOT for helping us with this, I have answered as much as
> I can from the most recent crash below.  We did hit max mbufs.  It is at
> 25Kclusters, which is the default.  I have upped it to 32K because a rather
> old article mentioned that as the top end and I need to get into work so I
> am not trying to do this with a remote console to go higher.  I have already
> set it to reboot next with 64K clusters.  I already have kmem maxed to what
> is bootable (or at least at one time) in 8.0, 4GB, how high can I safely go?
>  This is a NFS server running ZFS with sustained 5 min averages of
> 120-200Mb/s running as a store for a mail system.
>
>
>  Some things that would be useful:
>>
>> - Does "arp -da" fix things?
>>
>
> no, it hangs like ssh, route add, etc
>
>
>  - What's the output of "netstat -m" while the networking is broken?
>>
> Tue Nov  3 07:02:11 CST 2009
> 36971/2033/39004 mbufs in use (current/cache/total)
> 24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
> 24314/731 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/35/35/12800 4k (page size) jumbo clusters in use
> (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 58980K/2110K/61091K bytes allocated to network (current/cache/total)
> 0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
>
>
>  - What does CTRL-T show for the hung SSH or route processes?
>>
>
> of the arp:
> load: 0.01  cmd: arp 6144 [zonelimit] 0.00u 0.00s 0% 996k
>
>
>  - What does "procstat -kk" on the same processes show?
>>
> sorry I couldn't get this to run this time, remote  console issues
>
>
>  - Does going to single user mode ("init 1" and killing off any leftover
>> processes) cause the machine to start working again?  If so, what's the
>> output of "netstat -m" afterwards?
>>
>
> no, mbuf was still maxed out
>
>
> below is the last vmstat -m         Type InUse MemUse HighUse Requests
> Size(s)
>  ntfs_nthash     1   512K       -        1
>    pfs_nodes    20     5K       -       20  256
>         GEOM   262    52K       -     4551 16,32,64,128,256,512,1024,2048
>       isadev     9     2K       -        9  128
>         cdev    13     4K       -       13  256
>        sigio     1     1K       -        1  64
>     filedesc   127    64K       -     6412  512,1024
>         kenv    75    11K       -       80  16,32,64,128
>       kqueue     0     0K       -      188  256,2048
>    proc-args    41     2K       -     5647  16,32,64,128
>      scsi_cd     0     0K       -      333  16
>      ithread   119    21K       -      119  32,128,256
>       acpica   888    78K       -   121045  16,32,64,128,256,512,1024
>       KTRACE   100    13K       -      100  128
>     acpitask     0     0K       -        1  64
>       linker   139   596K       -      181 16,32,64,128,256,512,1024,2048
>        lockf    11     2K       -      399  64,128
> CAM dev queue     4     1K       -        4  128
>       ip6ndp     5     1K       -        5  64,128
>         temp    48   562K       - 14544952
> 16,32,64,128,256,512,1024,2048,4096
>       devbuf 17105 36341K       -    24988 16,32,64,128,512,1024,2048,4096
>       module   420    53K       -      420  128
>     mtx_pool     1     8K       -        1
>          osd     2     1K       -        2  16
>    CAM queue    62    52K       -     2211 16,32,64,128,256,512,1024,2048
>      subproc   562   722K       -     6851  512,4096
>         proc     2    16K       -        2
>      session    33     5K       -      127  128
>         pgrp    37     5K       -      190  128
>         cred    62    16K       - 29192756  256
>      uidinfo     4     3K       -       99  64,2048
>       plimit    17     5K       -      910  256
>      acpisem    15     1K       -       15  64
>    sysctltmp     0     0K       -    13867 16,32,64,128,256,512,1024,2048,4096
>    sysctloid  5400   270K       -     5782  16,32,64,128
>       sysctl     0     0K       -    11423  16,32,64
>      callout     7  3584K       -        7
>         umtx   780    98K       -      780  128
>     p1003.1b     1     1K       -        1  16
>         SWAP     2  3281K       -        2  64
>       kbdmux     8     9K       -        8  16,256,512,2048,4096
>       bus-sc   103   188K       -     4558 16,32,64,128,256,512,1024,2048,4096
>          bus  1174    93K       -    57792  16,32,64,128,256,512,1024
>        clist    54     7K       -       54  128
>      devstat    32    65K       -       32  32,4096
>  eventhandler    64     6K       -       64  64,128
>         kobj   276  1104K       -      387  4096
>         rman   144    18K       -      601  16,32,128
>       mfibuf     3    21K       -       12  32,256,512,2048,4096
>         sbuf     0     0K       -    14350 16,32,64,128,256,512,1024,2048,4096
>      scsi_da     0     0K       -      504  16
>      CAM SIM     4     1K       -        4  256
>        stack     0     0K       -      194  256
>    taskqueue    13     2K       -       13  16,32,128
>       Unitno    11     1K       -     4759  32,64
>          iov     0     0K       -     1193  16,64,256,512
>       select    98    13K       -       98  128
>     ioctlops     0     0K       -    14716 16,32,64,128,256,512,1024,4096
>          msg     4    30K       -        4  2048,4096
>          sem     4     8K       -        4  512,1024,2048,4096
>          shm     1    16K       -        1
>          tty    25    25K       -       25  1024
>          pts     3     1K       -        3  256
>     mbuf_tag     0     0K       -        2  32
>        shmfd     1     8K       -        1
>   CAM periph    54    14K       -      371  16,32,64,128,256
>          pcb    28   157K       -      148  16,32,128,1024,2048,4096
>       soname     5     1K       -    18699  16,32,128
>       biobuf     4     8K       -        6  2048
>     vfscache     1  1024K       -        1
>   cl_savebuf     0     0K       -        7  64,128
>  export_host     5     3K       -        5  512
>     vfs_hash     1   512K       -        1
>       vnodes     2     1K       -        2  256
>  vnodemarker     0     0K       -     4832  512
>        mount   222    15K       -      807  16,32,64,128,256,1024
>  ata_generic     1     1K       -        1  1024
>          BPF     4     1K       -        4  128
>  ether_multi    22     2K       -       24  16,32,64
>       ifaddr    54    14K       -       54  32,64,128,256,512,4096
>        ifnet     5     9K       -        5  256,2048
>        clone     5    20K       -        5  4096
>       arpcom     3     1K       -        3  16
>     routetbl    65    11K       -      949  32,64,128,256,512
>     in_multi     3     1K       -        3  64
>    sctp_iter     0     0K       -        3  256
>     sctp_ifn     3     1K       -        3  128
>     sctp_ifa     4     1K       -        4  128
>     sctp_vrf     1     1K       -        1  64
>    sctp_a_it     0     0K       -        3  16
>    hostcache     1    28K       -        1
>   acd_driver     1     2K       -        1  2048
>     syncache     1    92K       -        1
>    in6_multi    19     2K       -       19  32,64,128
>  ip6_moptions     1     1K       -        1  32
>      NFS FHA    13     3K       - 18480347  64,2048
>          rpc  1381   716K       - 82214178  32,64,128,256,512,2048
> audit_evclass   168     6K       -      205  32
>       newblk     1     1K       -        1  512
>     inodedep     1   512K       -        1
>      pagedep     1   128K       -        1
>  ufs_dirhash    45     9K       -       45  16,32,64,128,512
>    ufs_mount     3    11K       -        3  512,2048
>      UMAHash     3   130K       -       12  512,1024,2048,4096
>      acpidev    56     4K       -       56  64
>    vm_pgdata     2   129K       -        2  128
>      CAM XPT   589   369K       -     2047  32,64,128,256,1024
>      io_apic     2     4K       -        2  2048
>     pci_link    16     2K       -       16  32,128
>      memdesc     1     4K       -        1  4096
>          msi     3     1K       -        3  128
>     nexusdev     3     1K       -        3  16
>      entropy  1024    64K       -     1024  64
>  twa_commands     2   104K       -      101  256
>     atkbddev     2     1K       -        2  64
>         UART     6     4K       -        6  16,512,1024
>        USBHC     1     1K       -        1  128
>       USBdev    30    11K       -       30  16,32,64,128,256,512
>          USB   157    54K       -      190  16,32,64,128,256,1024
>       DEVFS1   152    76K       -      153  512
>       DEVFS3   165    42K       -      167  256
>        DEVFS    16     1K       -       17  16,128
>      solaris 822038 707024K       - 235790398 16,32,64,128,256,512,1024,2048,4096
>   kstat_data     2     1K       -        2  64
>
>
> _______________________________________________
> freebsd-current@... mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."
>
From man tuning:

     kern.ipc.nmbclusters may be adjusted to increase the number of network
     mbufs the system is willing to allocate.  Each cluster represents
     approximately 2K of memory, so a value of 1024 represents 2M of kernel
     memory reserved for network buffers.  You can do a simple calculation
     to figure out how many you need.  If you have a web server which maxes
     out at 1000 simultaneous connections, and each connection eats a 16K
     receive and 16K send buffer, you need approximately 32MB worth of
     network buffers to deal with it.  A good rule of thumb is to multiply
     by 2, so 32MBx2 = 64MB/2K = 32768.  So for this case you would want to
     set kern.ipc.nmbclusters to 32768.  We recommend values between 1024
     and 4096 for machines with moderate amounts of memory, and between
     4096 and 32768 for machines with greater amounts of memory.  Under no
     circumstances should you specify an arbitrarily high value for this
     parameter, it could lead to a boot-time crash.  The -m option to
     netstat(1) may be used to observe network cluster use.  Older versions
     of FreeBSD do not have this tunable and require that the kernel
     config(8) option NMBCLUSTERS be set instead.

     More and more programs are using the sendfile(2) system call to
     transmit files over the network.  The kern.ipc.nsfbufs sysctl controls
     the number of file system buffers sendfile(2) is allowed to use to
     perform its work.  This parameter nominally scales with kern.maxusers
     so you should not need to modify this parameter except under extreme
     circumstances.  See the TUNING section in the sendfile(2) manual page
     for details.
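
The rule of thumb in the excerpt above can be worked through as a short calculation. This is only a sketch and not part of the man page: the helper name nmbclusters_estimate and its parameter names are made up for illustration, and it assumes 2K clusters, as the excerpt states.

```python
def nmbclusters_estimate(connections, recv_buf_kb=16, send_buf_kb=16,
                         safety_factor=2, cluster_kb=2):
    """Estimate kern.ipc.nmbclusters with the tuning(7) rule of thumb.

    Hypothetical helper for illustration only; the parameter names are
    not FreeBSD tunables.
    """
    # Buffer space needed if every connection fills both socket buffers.
    needed_kb = connections * (recv_buf_kb + send_buf_kb)
    # Apply the "multiply by 2" margin, then convert kilobytes to clusters.
    return needed_kb * safety_factor // cluster_kb


# The man page example: 1000 connections, 16K receive and 16K send buffers.
exact = nmbclusters_estimate(1000)
print(exact)

# The man page rounds 1000 x 32K up to 32MB before doubling, which is the
# same as sizing for 1024 connections.
rounded = nmbclusters_estimate(1024)
print(rounded)
```

Without the rounding the example works out to 32000 clusters; the 32768 quoted in the man page comes from treating 1000 x 32K as a round 32MB.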



--
Adam Vande More

Re: FreeBSD 8.0 - network stack crashes?

by Gavin Atkinson-4

On Tue, 2009-11-03 at 08:32 -0500, Weldon S Godfrey 3 wrote:

>
> If memory serves me right, sometime around Yesterday, Gavin Atkinson told me:
>
> Gavin, thank you A LOT for helping us with this.  I have answered as much
> as I can from the most recent crash below.  We did hit max mbufs.  It is
> at 25K clusters, which is the default.  I have upped it to 32K because a
> rather old article mentioned that as the top end, and I need to get into
> work so that I am not trying to go higher over a remote console.  I
> have already set it to reboot next with 64K clusters.  I already have kmem
> maxed to what is bootable (or at least was at one time) in 8.0, 4GB; how
> high can I safely go?  This is an NFS server running ZFS with sustained
> 5 min averages of 120-200Mb/s, running as a store for a mail system.
>
> > Some things that would be useful:
> >
> > - Does "arp -da" fix things?
>
> no, it hangs like ssh, route add, etc
>
> > - What's the output of "netstat -m" while the networking is broken?
> Tue Nov  3 07:02:11 CST 2009
> 36971/2033/39004 mbufs in use (current/cache/total)
> 24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
> 24314/731 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 58980K/2110K/61091K bytes allocated to network (current/cache/total)
> 0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines

OK, at least we've figured out what is going wrong then.  As a
workaround to get the machine to stay up longer, you should be able to
set kern.ipc.nmbclusters=256000 in /boot/loader.conf, but hopefully we
can resolve this soon.
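
The netstat -m figures quoted above can be checked mechanically. The following is a rough sketch rather than an established tool: the function names are invented for illustration, the field layout is assumed from the sample line in this message (other FreeBSD versions may format it differently), and the 2K cluster size is the approximate figure from tuning(7).

```python
import re

CLUSTER_BYTES = 2048  # assumption: roughly 2K per cluster, per tuning(7)


def parse_cluster_line(line):
    """Parse the 'current/cache/total/max mbuf clusters in use' line."""
    match = re.match(r"\s*(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", line)
    if match is None:
        raise ValueError("not an mbuf cluster line: %r" % line)
    current, cache, total, maximum = (int(g) for g in match.groups())
    return {"current": current, "cache": cache, "total": total, "max": maximum}


def cluster_memory_mib(nmbclusters):
    """Approximate memory used if every cluster in the pool is allocated."""
    return nmbclusters * CLUSTER_BYTES / (1024 * 1024)


# Line copied from the netstat -m output quoted in this message.
sample = "24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)"
stats = parse_cluster_line(sample)

# The pool is exhausted once the total allocated reaches the configured max.
exhausted = stats["total"] >= stats["max"]
print("pool exhausted:", exhausted)
print("in use: %.1f%%" % (100.0 * stats["current"] / stats["max"]))
print("256000 clusters: about %.0f MiB" % cluster_memory_mib(256000))
```

Here total equals max (25600), which matches the non-zero count of denied cluster requests in the same output. The last line gives a feel for the worst-case memory cost of the suggested 256000 setting.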

Firstly, what kernel was the above output from?  And what network card
are you using?  In your initial post you mentioned testing both bce(4)
and em(4) cards.  Be aware that em(4) had a bug that would cause
exactly this symptom, which was fixed with a commit on September 11th
(r197093).  Make sure your kernel is from after that date if you are
using em(4).  I guess it is also possible that bce(4) has the same
bug; I'm not aware of any recent fixes to it.

So, from here, I think the best thing would be to just use the em(4) NIC
and an up-to-date kernel, and see if you can reproduce the issue.

How important is this machine?  If em(4) works, are you able to help
debug the issues with the bce(4) driver?

Thanks,

Gavin

Re: FreeBSD 8.0 - network stack crashes?

by Weldon S Godfrey 3



If memory serves me right, sometime around 3:13pm, Gavin Atkinson told me:

> OK, at least we've figured out what is going wrong then.  As a
> workaround to get the machine to stay up longer, you should be able to
> set kern.ipc.nmbclusters=256000 in /boot/loader.conf -but hopefully we
> can resolve this soon.
>
> Firstly, what kernel was the above output from?  And what network card
> are you using?  In your initial post you mentioned testing both bce(4)
> and em(4) cards, be aware that em(4) had an issue that would cause
> exactly this issue, which was fixed with a commit on September 11th
> (r197093).  Make sure your kernel is from after that date if you are
> using em(4).  I guess it is also possible that bce(4) has the same
> issue, I'm not aware of any fixes to it recently.
>
> So, from here, I think the best thing would be to just use the em(4) NIC
> and an up-to-date kernel, and see if you can reproduce the issue.
>
> How important is this machine?  If em(4) works, are you able to help
> debug the issues with the bce(4) driver?
>
> Thanks,
>
> Gavin
>

We used the em card only a few times, but each time we used it the
problem happened, so we have been staying with the onboard NICs using the
bce driver.  Would leaving the em card in cause any issues, even if it
isn't up?

This output was from the 12/08 kernel.  The issue really came up while we
tried to swap to 8.0-RC2.  We plan to swap back sometime in the near
future.  The same symptoms happened with RC2, so I am sure it is kmem
exhaustion.  I am guessing v3 requires a lot more.  When we switch, I'll
change to using the em card.

This machine is very important.  I could set up an additional machine, but
I don't have the ability to simulate the load, nor do I have the large
drive array attached.

Thanks!

Weldon

Re: FreeBSD 8.0 - network stack crashes?

by Tom Judge

Weldon S Godfrey 3 wrote:

>
>
> If memory serves me right, sometime around 9:37am, Pieter de Goeje told me:
>
>> Are you perhaps using em(4)? There was an mbuf leak in the driver,
>> which was fixed recently.
>> You can check mbuf usage with netstat -m.
>>
>
> we are using onboard NICs on the Dell using the bce driver.  We did try
> several times to see if using an intel PCIexpress card using the em
> driver, and we had the same symptoms.
>
> Could the bce driver have the same leak?

The bce driver does not have a memory leak.  It does, however, have a bug
which causes memory fragmentation, leading to denied mbuf allocations.


There is a workaround for this in current; you can get the patch like this:

http://svn.freebsd.org/viewvc/base/head/

You need to put

options BCE_JUMBO_HDRSPLIT

in your kernel configuration to enable the workaround.

Tom


Re: FreeBSD 8.0 - network stack crashes?

by Tom Judge

Tom Judge wrote:

> Weldon S Godfrey 3 wrote:
>>
>>
>> If memory serves me right, sometime around 9:37am, Pieter de Goeje
>> told me:
>>
>>> Are you perhaps using em(4)? There was an mbuf leak in the driver,
>>> which was fixed recently.
>>> You can check mbuf usage with netstat -m.
>>>
>>
>> we are using onboard NICs on the Dell using the bce driver.  We did
>> try several times to see if using an intel PCIexpress card using the
>> em driver, and we had the same symptoms.
>>
>> Could the bce driver have the same leak?
>
> The bce driver does not have a memory leak, it does however have a bug
> which causes memory fragmentation leading to denied mbuf allocation.
>
>
> There is a work around for this in current, you can get the patch like
> this:
>
> http://svn.freebsd.org/viewvc/base/head/
>
That should be:

  svn diff -r 198319:198320 http://svn.freebsd.org/base/head


> You need to put
>
> options        BCE_JUMBO_HDRSPLIT
>
> in your kernel configuration to enable the workaround.
>
> Tom
>

Re: FreeBSD 8.0 - network stack crashes?

by Ivan Voras-9

Something else just occurred to me: do you use ipfw?

--
Sent from my p1i mobile phone

------- Original message -------

> From: Weldon S Godfrey 3 <weldon@...>
> Cc: freebsd-current@..., ivoras@...
> Sent: 3.11.'09,  14:35
>
>
>
> If memory serves me right, sometime around 9:37am, Pieter de Goeje told
> me:
>
>> Are you perhaps using em(4)? There was an mbuf leak in the driver, which
>> was fixed recently.
>> You can check mbuf usage with netstat -m.
>>
>
> we are using onboard NICs on the Dell using the bce driver.  We did try
> several times to see if using an intel PCIexpress card using the em
> driver, and we had the same symptoms.
>
> Could the bce driver have the same leak?
>
> Thanks!
>
> Weldon


Re: FreeBSD 8.0 - network stack crashes?

by Gavin Atkinson-4

On Tue, 2009-11-03 at 10:43 -0500, Weldon S Godfrey 3 wrote:
> This output was from a kernel on 12/08.  The issue really came up while we
> tried to swap to 8.0-RC2.  We plan to swap back sometime in the near
> future.  The same symptoms happened with RC2 so I am sure it is a kmem
> exhaustion.  I am guessing v3 requires a lot more.  When we switch, i'll
> change to using the em card.

Sorry, can you clarify: have you ever tested the em card with the
8.0-RC2 kernel?

> This machine is very important.  I could set up an additional machine, but
> I don't have the ability to simulate the load nor have the large drive
> array attached.

OK, thanks.

Gavin

Re: FreeBSD 8.0 - network stack crashes?

by Weldon S Godfrey 3




If memory serves me right, sometime around 5:59pm, Ivan Voras told me:

> Something else just occurred to me: do you use ipfw?
>
> --

Not on this server.

Thanks,

Weldon

Re: FreeBSD 8.0 - network stack crashes?

by Weldon S Godfrey 3



If memory serves me right, sometime around 5:02pm, Gavin Atkinson told me:

> On Tue, 2009-11-03 at 10:43 -0500, Weldon S Godfrey 3 wrote:
>> This output was from a kernel on 12/08.  The issue really came up while we
>> tried to swap to 8.0-RC2.  We plan to swap back sometime in the near
>> future.  The same symptoms happened with RC2 so I am sure it is a kmem
>> exhaustion.  I am guessing v3 requires a lot more.  When we switch, i'll
>> change to using the em card.
>
> Sorry, can you clarify: have you ever tested the em card with the
> 8.0-RC2 kernel?
>

We briefly tried the em card with RC2 but went back because it didn't help
at the time.  We are planning to go back to RC2 soon, and I plan to use
the em card instead of bce when we do.

Re: FreeBSD 8.0 - network stack crashes?

by Weldon S Godfrey 3



If memory serves me right, sometime around 10:43am, Weldon S Godfrey 3 told me:

>
>
> If memory serves me right, sometime around 3:13pm, Gavin Atkinson told me:
>
>> OK, at least we've figured out what is going wrong then.  As a
>> workaround to get the machine to stay up longer, you should be able to
>> set kern.ipc.nmbclusters=256000 in /boot/loader.conf -but hopefully we
>> can resolve this soon.
>>

I upped it to 256K.  What I am trying to wrap my head around is how it was
working somewhat for so long at 24K, yet it got to nearly 65K before I
rebooted it with the higher setting.  Or did I reboot too early?  Is
there any cleanup that isn't triggered until it reaches max nmbclusters?
I am trying to see if anything on our network has changed to cause this to
become chronic.

