|
|
FreeBSD 8.0 - network stack crashes?

Up until yesterday, we had been running FreeBSD-CURRENT of 12/08. A couple of months ago we started to see some very odd network behavior. Something happens to the stack that causes processes accessing the network to just hang. After the problem happens, usually (but not always), you can't ssh in. Always, you can't ssh or telnet out, and nothing can access the NFS shares on the server. You can ping everything from the server. You can't even do a route add, and you can't ssh even if you use just the IP address (although pinging hostnames that are not cached or in the hosts table does resolve). When you try to ssh out or do a route add from the box, the process just hangs. You can't Control-C it at all; it hangs forever. There is nothing in dmesg or messages to indicate an issue. I try to up/down the interfaces. In CURRENT-12/08, that may allow things to work for about 30s.

We upgraded to 8.0-RC2 yesterday and, at first, the problem appeared to happen a lot more often. We assumed that was related to the increase in network performance. At least in 8.0-RC2, I did see a large number of input errors with netstat -in on the heavily loaded interface before the lockups started. I have replaced the ethernet cable and moved ports. The Catalyst 3650 never records any errors. The problem would reoccur about 5 minutes after our load kicked in this morning.

One change in this upgrade: we switched from NFS v2 to v3. When we downgraded to the previous OS, we stayed at v3. The problem was just about as bad with v3 on the 12/08 OS.

We went back to RC2 with NFS v2 and things appeared to stabilize to a degree. It ran for about an hour and a half and then the issue came up again.

We are currently back on the 12/08 version using NFS v2 and watching things.

We are using a Dell PowerEdge 2950-iii. The problem happens both with the onboard NICs using the bce driver and with an Intel card using the em driver.

I am hunting down any MTU/duplex/speed problems that could cause it (haven't found any so far). Of course, any problems on the network shouldn't (ideally) freak out the network stack on the server. I don't know how to troubleshoot this further on the server since I am not getting any problems indicated in logging, panics, cores, etc.

Any help is appreciated.

Thanks,

Weldon
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."
|
|
Re: FreeBSD 8.0 - network stack crashes?

On Mon, 2009-11-02 at 10:52 -0500, Weldon S Godfrey 3 wrote:
> Up until yesterday, we have been running FreeBSD-CURRENT of 12/08. We
> started to see a couple months ago some very odd network behavior.
> Something happens to the stack that causes processes accessing the network
> to just hang. After the problem happens, usually (but not always), you
> can't ssh in. Always, you can't ssh or telnet out, and nothing can access
> the NFS shares on the server. You can ping everything from the server.
> You can't even do a route add, you can't ssh if you use just the IP
> address (although pinging with hostnames it doesn't have cached or in
> hosts table resolves). When you try to ssh out, do a route add from the
> box, the process just hangs. You can't control C it at all, it hangs
> forever. There is nothing in dmesg or messages to indicate an issue. I
> try to up/down the interfaces. In CURRENT-12/08, it may allow things to
> work for like 30s.

Some things that would be useful:

- Does "arp -da" fix things?
- What's the output of "netstat -m" while the networking is broken?
- What does CTRL-T show for the hung SSH or route processes?
- What does "procstat -kk" on the same processes show?
- Does going to single user mode ("init 1" and killing off any leftover processes) cause the machine to start working again? If so, what's the output of "netstat -m" afterwards?

If you look to be hitting some of the limits shown by "netstat -m", try logging the date, "netstat -m" and "vmstat -m" to a file every 30 seconds or similar so that we can see if it is a memory leak, and what may be leaking.

Gavin
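The periodic logging suggested above (date, "netstat -m" and "vmstat -m" every 30 seconds) could be sketched roughly as follows. This is only an illustration: the log path `/var/tmp/mbuf-watch.log` and the function names are made up for this example, and the command runner is passed in as a parameter so the formatting can be exercised without a FreeBSD box.

```python
import subprocess
import time
from datetime import datetime

# The two commands named in the thread.
COMMANDS = (["netstat", "-m"], ["vmstat", "-m"])


def run_command(argv):
    """Run one command and return its output (stderr folded into stdout)."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return result.stdout


def snapshot(runner=run_command, now=None):
    """Build one timestamped log entry holding the output of every command."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    parts = ["==== " + stamp + " ===="]
    for argv in COMMANDS:
        parts.append("$ " + " ".join(argv))
        parts.append(runner(argv).rstrip("\n"))
    return "\n".join(parts) + "\n"


def log_forever(path="/var/tmp/mbuf-watch.log", interval=30):
    """Append a snapshot to `path` (hypothetical location) every `interval` seconds."""
    while True:
        # Reopen per entry so each snapshot is flushed before the next sleep.
        with open(path, "a") as log:
            log.write(snapshot())
        time.sleep(interval)
```

Calling `log_forever()` from a root shell before the load arrives would leave a file whose last entries show the state just before the hang. Writing to local disk rather than an NFS mount is assumed here, since the network is the thing that stops working.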
|
|
Re: FreeBSD 8.0 - network stack crashes?

If memory serves me right, sometime around 10:52am, Weldon S Godfrey 3 told me:
> Up until yesterday, we have been running FreeBSD-CURRENT of 12/08. We started
> to see a couple months ago some very odd network behavior.
> [...]
> Any help is appreciated.

I have swapped out the computer, switch, ethernet card, and 3ware card. We are running 8.0-CURRENT of 12/08, which is what we were using when we had far fewer issues. No help.

If it happens again, I am going to try a netif restart and a routing restart, although I believe I tried that at the beginning and it did not help.
|
|
Re: FreeBSD 8.0 - network stack crashes?

If memory serves me right, sometime around 4:11pm, Weldon S Godfrey 3 told me:
> [...]
> If it happens again, I am going to try to do a netif restart and routing
> restart. Although I believe I tried that at the beginning and it did not help.

BTW, doing a netif / routing restart doesn't help.
|
|
Re: FreeBSD 8.0 - network stack crashes?

Weldon S Godfrey 3 wrote:
> I don't
> know how to troubleshoot this further on the server since I am not
> getting any problems indicated in logging, panics, cores, etc.

If you have console access to the system, the generic advice would be to compile a kernel with the kernel debugger - options KDB and DDB (see http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-online-ddb.html), enter the debugger, force a kernel dump file to be created (by entering "call doadump") and then proceed with post-mortem examination of the kernel at your leisure (e.g. from a remote ssh console, etc).

See http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-deadlocks.html for instructions on what information to collect.

If you can provoke your problem while using WITNESS that would probably be great, but it will slow down your production machine noticeably. When WITNESS is enabled you might also get more information - such as LOR warnings, which you should also collect.

Keep the dump file, someone might ask you for more information.

Good luck!
|
|
Re: FreeBSD 8.0 - network stack crashes?

If memory serves me right, sometime around Tomorrow, Ivan Voras told me:
> If you have console access to the system, the generic advice would be to
> compile a kernel with the kernel debugger - options KDB and DDB
> [...]
> Keep the dump file, someone might ask you for more information.

Thanks, I will work on trying to get a system with those enabled.

Another thought that came to mind is that this sounds like some sort of network buffer exhaustion. Is there anything to look for there?
|
|
Re: FreeBSD 8.0 - network stack crashes?

On Tuesday 03 November 2009 02:02:08 Weldon S Godfrey 3 wrote:
> [...]
> Thanks, I will work on trying to get a system with those enabled.
>
> Another thought that came to mind that this sounds like some sort of
> network buffer exhaustion. Is there anything to look for there?

Are you perhaps using em(4)? There was an mbuf leak in the driver, which was fixed recently. You can check mbuf usage with netstat -m.

--
Pieter de Goeje
|
|
Re: FreeBSD 8.0 - network stack crashes?

If memory serves me right, sometime around Yesterday, Gavin Atkinson told me:

Gavin, thank you A LOT for helping us with this. I have answered as much as I can from the most recent crash below. We did hit max mbufs. It is at 25K clusters, which is the default. I have upped it to 32K because a rather old article mentioned that as the top end, and I need to get into work so that I am not trying to go higher from a remote console. I have already set it to reboot next with 64K clusters. I already have kmem maxed to what is bootable (or at least was at one time) in 8.0, 4GB; how high can I safely go? This is an NFS server running ZFS with sustained 5 min averages of 120-200Mb/s, running as a store for a mail system.

> Some things that would be useful:
>
> - Does "arp -da" fix things?

no, it hangs like ssh, route add, etc

> - What's the output of "netstat -m" while the networking is broken?

Tue Nov 3 07:02:11 CST 2009
36971/2033/39004 mbufs in use (current/cache/total)
24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
24314/731 mbuf+clusters out of packet secondary zone in use (current/cache)
0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
58980K/2110K/61091K bytes allocated to network (current/cache/total)
0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

> - What does CTRL-T show for the hung SSH or route processes?

of the arp:
load: 0.01 cmd: arp 6144 [zonelimit] 0.00u 0.00s 0% 996k

> - What does "procstat -kk" on the same processes show?
sorry I couldn't get this to run this time, remote console issues

> - Does going to single user mode ("init 1" and killing off any leftover
> processes) cause the machine to start working again? If so, what's the
> output of "netstat -m" afterwards?

no, mbuf was still maxed out

below is the last vmstat -m

         Type  InUse   MemUse HighUse  Requests  Size(s)
  ntfs_nthash      1     512K       -         1
    pfs_nodes     20       5K       -        20  256
         GEOM    262      52K       -      4551  16,32,64,128,256,512,1024,2048
       isadev      9       2K       -         9  128
         cdev     13       4K       -        13  256
        sigio      1       1K       -         1  64
     filedesc    127      64K       -      6412  512,1024
         kenv     75      11K       -        80  16,32,64,128
       kqueue      0       0K       -       188  256,2048
    proc-args     41       2K       -      5647  16,32,64,128
      scsi_cd      0       0K       -       333  16
      ithread    119      21K       -       119  32,128,256
       acpica    888      78K       -    121045  16,32,64,128,256,512,1024
       KTRACE    100      13K       -       100  128
     acpitask      0       0K       -         1  64
       linker    139     596K       -       181  16,32,64,128,256,512,1024,2048
        lockf     11       2K       -       399  64,128
CAM dev queue      4       1K       -         4  128
       ip6ndp      5       1K       -         5  64,128
         temp     48     562K       -  14544952  16,32,64,128,256,512,1024,2048,4096
       devbuf  17105   36341K       -     24988  16,32,64,128,512,1024,2048,4096
       module    420      53K       -       420  128
     mtx_pool      1       8K       -         1
          osd      2       1K       -         2  16
    CAM queue     62      52K       -      2211  16,32,64,128,256,512,1024,2048
      subproc    562     722K       -      6851  512,4096
         proc      2      16K       -         2
      session     33       5K       -       127  128
         pgrp     37       5K       -       190  128
         cred     62      16K       -  29192756  256
      uidinfo      4       3K       -        99  64,2048
       plimit     17       5K       -       910  256
      acpisem     15       1K       -        15  64
    sysctltmp      0       0K       -     13867  16,32,64,128,256,512,1024,2048,4096
    sysctloid   5400     270K       -      5782  16,32,64,128
       sysctl      0       0K       -     11423  16,32,64
      callout      7    3584K       -         7
         umtx    780      98K       -       780  128
     p1003.1b      1       1K       -         1  16
         SWAP      2    3281K       -         2  64
       kbdmux      8       9K       -         8  16,256,512,2048,4096
       bus-sc    103     188K       -      4558  16,32,64,128,256,512,1024,2048,4096
          bus   1174      93K       -     57792  16,32,64,128,256,512,1024
        clist     54       7K       -        54  128
      devstat     32      65K       -        32  32,4096
 eventhandler     64       6K       -        64  64,128
         kobj    276    1104K       -       387  4096
         rman    144      18K       -       601  16,32,128
       mfibuf      3      21K       -        12  32,256,512,2048,4096
         sbuf      0       0K       -     14350  16,32,64,128,256,512,1024,2048,4096
      scsi_da      0       0K       -       504  16
      CAM SIM      4       1K       -         4  256
        stack      0       0K       -       194  256
    taskqueue     13       2K       -        13  16,32,128
       Unitno     11       1K       -      4759  32,64
          iov      0       0K       -      1193  16,64,256,512
       select     98      13K       -        98  128
     ioctlops      0       0K       -     14716  16,32,64,128,256,512,1024,4096
          msg      4      30K       -         4  2048,4096
          sem      4       8K       -         4  512,1024,2048,4096
          shm      1      16K       -         1
          tty     25      25K       -        25  1024
          pts      3       1K       -         3  256
     mbuf_tag      0       0K       -         2  32
        shmfd      1       8K       -         1
   CAM periph     54      14K       -       371  16,32,64,128,256
          pcb     28     157K       -       148  16,32,128,1024,2048,4096
       soname      5       1K       -     18699  16,32,128
       biobuf      4       8K       -         6  2048
     vfscache      1    1024K       -         1
   cl_savebuf      0       0K       -         7  64,128
  export_host      5       3K       -         5  512
     vfs_hash      1     512K       -         1
       vnodes      2       1K       -         2  256
  vnodemarker      0       0K       -      4832  512
        mount    222      15K       -       807  16,32,64,128,256,1024
  ata_generic      1       1K       -         1  1024
          BPF      4       1K       -         4  128
  ether_multi     22       2K       -        24  16,32,64
       ifaddr     54      14K       -        54  32,64,128,256,512,4096
        ifnet      5       9K       -         5  256,2048
        clone      5      20K       -         5  4096
       arpcom      3       1K       -         3  16
     routetbl     65      11K       -       949  32,64,128,256,512
     in_multi      3       1K       -         3  64
    sctp_iter      0       0K       -         3  256
     sctp_ifn      3       1K       -         3  128
     sctp_ifa      4       1K       -         4  128
     sctp_vrf      1       1K       -         1  64
    sctp_a_it      0       0K       -         3  16
    hostcache      1      28K       -         1
   acd_driver      1       2K       -         1  2048
     syncache      1      92K       -         1
    in6_multi     19       2K       -        19  32,64,128
 ip6_moptions      1       1K       -         1  32
      NFS FHA     13       3K       -  18480347  64,2048
          rpc   1381     716K       -  82214178  32,64,128,256,512,2048
audit_evclass    168       6K       -       205  32
       newblk      1       1K       -         1  512
     inodedep      1     512K       -         1
      pagedep      1     128K       -         1
  ufs_dirhash     45       9K       -        45  16,32,64,128,512
    ufs_mount      3      11K       -         3  512,2048
      UMAHash      3     130K       -        12  512,1024,2048,4096
      acpidev     56       4K       -        56  64
    vm_pgdata      2     129K       -         2  128
      CAM XPT    589     369K       -      2047  32,64,128,256,1024
      io_apic      2       4K       -         2  2048
     pci_link     16       2K       -        16  32,128
      memdesc      1       4K       -         1  4096
          msi      3       1K       -         3  128
     nexusdev      3       1K       -         3  16
      entropy   1024      64K       -      1024  64
 twa_commands      2     104K       -       101  256
     atkbddev      2       1K       -         2  64
         UART      6       4K       -         6  16,512,1024
        USBHC      1       1K       -         1  128
       USBdev     30      11K       -        30  16,32,64,128,256,512
          USB    157      54K       -       190  16,32,64,128,256,1024
       DEVFS1    152      76K       -       153  512
       DEVFS3    165      42K       -       167  256
        DEVFS     16       1K       -        17  16,128
      solaris 822038  707024K       - 235790398  16,32,64,128,256,512,1024,2048,4096
   kstat_data      2       1K       -         2  64
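The "netstat -m" figures in this message can be checked mechanically. The sketch below is only an illustration (the function names are invented here, and the regular expressions assume the exact line wording shown in this thread's output): it pulls out the cluster counters and the denied-request counters and reports whether the cluster zone is at its limit.

```python
import re

# Three lines copied from the netstat -m output posted in this thread.
NETSTAT_M = """\
36971/2033/39004 mbufs in use (current/cache/total)
24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""


def parse_clusters(text):
    """Return the current/cache/total/max counters of the mbuf cluster zone."""
    match = re.search(
        r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    current, cache, total, maximum = map(int, match.groups())
    return {"current": current, "cache": cache, "total": total, "max": maximum}


def parse_denied(text):
    """Return the denied-request counters (mbufs/clusters/mbuf+clusters)."""
    match = re.search(
        r"^(\d+)/(\d+)/(\d+) requests for mbufs denied", text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'requests for mbufs denied' line found")
    mbufs, clusters, both = map(int, match.groups())
    return {"mbufs": mbufs, "clusters": clusters, "mbuf+clusters": both}


def clusters_exhausted(text):
    """True when the zone is at its max and cluster requests have been denied."""
    clusters = parse_clusters(text)
    denied = parse_denied(text)
    return clusters["total"] >= clusters["max"] and denied["clusters"] > 0


print(parse_clusters(NETSTAT_M))
print(clusters_exhausted(NETSTAT_M))
```

With the posted numbers, total equals max (25600/25600) and 201276 cluster requests were denied, which matches the `[zonelimit]` wait state CTRL-T showed for the hung arp process.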
|
|
Re: FreeBSD 8.0 - network stack crashes?

If memory serves me right, sometime around 9:37am, Pieter de Goeje told me:
> Are you perhaps using em(4)? There was an mbuf leak in the driver, which was fixed recently.
> You can check mbuf usage with netstat -m.

We are using the onboard NICs on the Dell with the bce driver. We did try several times with an Intel PCI Express card using the em driver, and we had the same symptoms.

Could the bce driver have the same leak?

Thanks!

Weldon
|
|
Re: FreeBSD 8.0 - network stack crashes?

2009/11/3 Weldon S Godfrey 3 <weldon@...>:
> If memory serves me right, sometime around 9:37am, Pieter de Goeje told me:
>
>> Are you perhaps using em(4)? There was an mbuf leak in the driver, which
>> was fixed recently.
>> You can check mbuf usage with netstat -m.
>
> we are using onboard NICs on the Dell using the bce driver. We did try
> several times to see if using an intel PCIexpress card using the em driver,
> and we had the same symptoms.
>
> Could the bce driver have the same leak?

It would be unlikely to pass unnoticed since Dells are common hardware...
|
|
Re: FreeBSD 8.0 - network stack crashes?

On Tue, Nov 3, 2009 at 7:32 AM, Weldon S Godfrey 3 <weldon@...> wrote:
> We did hit max mbufs. It is at
> 25Kclusters, which is the default. I have upped it to 32K because a rather
> old article mentioned that as the top end and I need to get into work so I
> am not trying to do this with a remote console to go higher. I have already
> set it to reboot next with 64K clusters. I already have kmem maxed to what
> is bootable (or at least at one time) in 8.0, 4GB, how high can I safely go?
> [...]

kern.ipc.nmbclusters may be adjusted to increase the number of network mbufs the system is willing to allocate. Each cluster represents approximately 2K of memory, so a value of 1024 represents 2M of kernel memory reserved for network buffers. You can do a simple calculation to figure out how many you need. If you have a web server which maxes out at 1000 simultaneous connections, and each connection eats a 16K receive and 16K send buffer, you need approximately 32MB worth of network buffers to deal with it. A good rule of thumb is to multiply by 2, so 32MBx2 = 64MB/2K = 32768. So for this case you would want to set kern.ipc.nmbclusters to 32768. We recommend values between 1024 and 4096 for machines with moderate amounts of memory, and between 4096 and 32768 for machines with greater amounts of memory. Under no circumstances should you specify an arbitrarily high value for this parameter, it could lead to a boot-time crash. The -m option to netstat(1) may be used to observe network cluster use. Older versions of FreeBSD do not have this tunable and require that the kernel config(8) option NMBCLUSTERS be set instead.

More and more programs are using the sendfile(2) system call to transmit files over the network. The kern.ipc.nsfbufs sysctl controls the number of file system buffers sendfile(2) is allowed to use to perform its work. This parameter nominally scales with kern.maxusers so you should not need to modify this parameter except under extreme circumstances. See the TUNING section in the sendfile(2) manual page for details.

--
Adam Vande More
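The rule of thumb quoted above can be written out as a small calculation. This is a sketch of that arithmetic only, with invented function names; it takes "approximately 2K" per cluster as exactly 2048 bytes, which is an assumption made for the example.

```python
import math

CLUSTER_BYTES = 2048  # "approximately 2K" per cluster, taken as exact here


def clusters_needed(connections, recv_buf, send_buf, safety=2):
    """Clusters needed to back every connection's buffers, times a safety factor."""
    total_bytes = connections * (recv_buf + send_buf) * safety
    return math.ceil(total_bytes / CLUSTER_BYTES)


def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


exact = clusters_needed(1000, 16 * 1024, 16 * 1024)
print(exact)                     # 32000
print(next_power_of_two(exact))  # 32768
```

The exact figure for 1000 connections is 32000 clusters. The quoted text rounds 1000 x 32K up to 32MB first and so arrives at 32768; rounding the exact result up to the next power of two happens to give the same number.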
|
|
Re: FreeBSD 8.0 - network stack crashes?
On Tue, 2009-11-03 at 08:32 -0500, Weldon S Godfrey 3 wrote:
>
> If memory serves me right, sometime around Yesterday, Gavin Atkinson told me:
>
> Gavin, thank you A LOT for helping us with this, I have answered as much
> as I can from the most recent crash below. We did hit max mbufs. It is
> at 25Kclusters, which is the default. I have upped it to 32K because a
> rather old article mentioned that as the top end and I need to get into
> work so I am not trying to do this with a remote console to go higher. I
> have already set it to reboot next with 64K clusters. I already have kmem
> maxed to what is bootable (or at least at one time) in 8.0, 4GB, how high
> can I safely go? This is a NFS server running ZFS with sustained 5 min
> averages of 120-200Mb/s running as a store for a mail system.
>
> > Some things that would be useful:
> >
> > - Does "arp -da" fix things?
>
> no, it hangs like ssh, route add, etc
>
> > - What's the output of "netstat -m" while the networking is broken?
> Tue Nov 3 07:02:11 CST 2009
> 36971/2033/39004 mbufs in use (current/cache/total)
> 24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
> 24314/731 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 58980K/2110K/61091K bytes allocated to network (current/cache/total)
> 0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines

OK, at least we've figured out what is going wrong then. As a
workaround to get the machine to stay up longer, you should be able to
set kern.ipc.nmbclusters=256000 in /boot/loader.conf - but hopefully we
can resolve this soon.

Firstly, what kernel was the above output from? And what network card
are you using? In your initial post you mentioned testing both bce(4)
and em(4) cards, be aware that em(4) had an issue that would cause
exactly this issue, which was fixed with a commit on September 11th
(r197093). Make sure your kernel is from after that date if you are
using em(4). I guess it is also possible that bce(4) has the same
issue, I'm not aware of any fixes to it recently.

So, from here, I think the best thing would be to just use the em(4) NIC
and an up-to-date kernel, and see if you can reproduce the issue.

How important is this machine? If em(4) works, are you able to help
debug the issues with the bce(4) driver?

Thanks,

Gavin
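For anyone wanting to watch for the same condition, here is a minimal Python sketch of the check Gavin is making by eye on the "netstat -m" output quoted above: the cluster total has reached the configured maximum and the "denied" counter for clusters is non-zero. The function names and the regular expressions are illustrative assumptions; they match only the two line formats shown in this thread, and other FreeBSD versions may print these lines differently.

```python
import re

# Two lines copied from the "netstat -m" output quoted in this thread.
SAMPLE = """\
24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""


def cluster_status(netstat_m_text):
    """Return (total, max, denied) for standard mbuf clusters.

    Hypothetical helper: parses only the two line formats seen above.
    """
    use = re.search(
        r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use",
        netstat_m_text,
        re.M,
    )
    denied = re.search(
        r"^(\d+)/(\d+)/(\d+) requests for mbufs denied",
        netstat_m_text,
        re.M,
    )
    if use is None or denied is None:
        raise ValueError("unrecognised netstat -m output")
    total, limit = int(use.group(3)), int(use.group(4))
    # Second field of the "denied" line is the cluster counter.
    return total, limit, int(denied.group(2))


def is_exhausted(netstat_m_text):
    """True when the cluster pool is at its limit and requests were denied."""
    total, limit, denied = cluster_status(netstat_m_text)
    return total >= limit and denied > 0


if __name__ == "__main__":
    print(cluster_status(SAMPLE))
    print(is_exhausted(SAMPLE))
```

Run against the sample, this reports a total equal to the maximum (25600) together with 201276 denied cluster requests, which is the exhaustion described in the reply.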
Re: FreeBSD 8.0 - network stack crashes?
If memory serves me right, sometime around 3:13pm, Gavin Atkinson told me:

> OK, at least we've figured out what is going wrong then. As a
> workaround to get the machine to stay up longer, you should be able to
> set kern.ipc.nmbclusters=256000 in /boot/loader.conf - but hopefully we
> can resolve this soon.
>
> Firstly, what kernel was the above output from? And what network card
> are you using? In your initial post you mentioned testing both bce(4)
> and em(4) cards, be aware that em(4) had an issue that would cause
> exactly this issue, which was fixed with a commit on September 11th
> (r197093). Make sure your kernel is from after that date if you are
> using em(4). I guess it is also possible that bce(4) has the same
> issue, I'm not aware of any fixes to it recently.
>
> So, from here, I think the best thing would be to just use the em(4) NIC
> and an up-to-date kernel, and see if you can reproduce the issue.
>
> How important is this machine? If em(4) works, are you able to help
> debug the issues with the bce(4) driver?
>
> Thanks,
>
> Gavin
>

We used the em card only a few times, but each time we used it, the problem happened, so we have been staying with the onboard NICs using the bce driver. Would leaving the em card in cause any issues, even if it isn't up?

This output was from a kernel on 12/08. The issue really came up while we tried to swap to 8.0-RC2. We plan to swap back sometime in the near future. The same symptoms happened with RC2, so I am sure it is kmem exhaustion. I am guessing v3 requires a lot more. When we switch, I'll change to using the em card.

This machine is very important. I could set up an additional machine, but I don't have the ability to simulate the load, nor do I have the large drive array attached.

Thanks!

Weldon
Re: FreeBSD 8.0 - network stack crashes?
Weldon S Godfrey 3 wrote:
>
>
> If memory serves me right, sometime around 9:37am, Pieter de Goeje told me:
>
>> Are you perhaps using em(4)? There was an mbuf leak in the driver,
>> which was fixed recently.
>> You can check mbuf usage with netstat -m.
>>
>
> we are using onboard NICs on the Dell using the bce driver. We did try
> several times to see if using an intel PCIexpress card using the em
> driver, and we had the same symptoms.
>
> Could the bce driver have the same leak?

The bce driver does not have a memory leak, it does however have a bug
which causes memory fragmentation leading to denied mbuf allocation.

There is a workaround for this in current, you can get the patch like
this:

http://svn.freebsd.org/viewvc/base/head/

You need to put

options BCE_JUMBO_HDRSPLIT

in your kernel to enable the workaround.

Tom
Re: FreeBSD 8.0 - network stack crashes?
Tom Judge wrote:
> Weldon S Godfrey 3 wrote:
>>
>>
>> If memory serves me right, sometime around 9:37am, Pieter de Goeje
>> told me:
>>
>>> Are you perhaps using em(4)? There was an mbuf leak in the driver,
>>> which was fixed recently.
>>> You can check mbuf usage with netstat -m.
>>>
>>
>> we are using onboard NICs on the Dell using the bce driver. We did
>> try several times to see if using an intel PCIexpress card using the
>> em driver, and we had the same symptoms.
>>
>> Could the bce driver have the same leak?
>
> The bce driver does not have a memory leak, it does however have a bug
> which causes memory fragmentation leading to denied mbuf allocation.
>
>
> There is a workaround for this in current, you can get the patch like
> this:
>
> http://svn.freebsd.org/viewvc/base/head/
>
svn diff -r 198319:198320 http://svn.freebsd.org/base/head
> You need to put
>
> options BCE_JUMBO_HDRSPLIT
>
> in your kernel to enable the workaround.
>
> Tom
Re: FreeBSD 8.0 - network stack crashes?
Something else just occurred to me - do you use ipfw?

--
Sent from my p1i mobile phone

------- Original message -------
> From: Weldon S Godfrey 3 <weldon@...>
> Cc: freebsd-current@..., ivoras@...
> Sent: 3.11.'09, 14:35
>
>
>
> If memory serves me right, sometime around 9:37am, Pieter de Goeje told
> me:
>
>> Are you perhaps using em(4)? There was an mbuf leak in the driver, which
>> was fixed recently.
>> You can check mbuf usage with netstat -m.
>>
>
> we are using onboard NICs on the Dell using the bce driver. We did try
> several times to see if using an intel PCIexpress card using the em
> driver, and we had the same symptoms.
>
> Could the bce driver have the same leak?
>
> Thanks!
>
> Weldon
Re: FreeBSD 8.0 - network stack crashes?
On Tue, 2009-11-03 at 10:43 -0500, Weldon S Godfrey 3 wrote:
> This output was from a kernel on 12/08. The issue really came up while we
> tried to swap to 8.0-RC2. We plan to swap back sometime in the near
> future. The same symptoms happened with RC2 so I am sure it is kmem
> exhaustion. I am guessing v3 requires a lot more. When we switch, I'll
> change to using the em card.

Sorry, can you clarify: have you ever tested the em card with the
8.0-RC2 kernel?

> This machine is very important. I could set up an additional machine, but
> I don't have the ability to simulate the load nor have the large drive
> array attached.

OK, thanks.

Gavin
Re: FreeBSD 8.0 - network stack crashes?
If memory serves me right, sometime around 5:59pm, Ivan Voras told me:

> Something else just occurred to me - do you use ipfw?
>
> --

Not on this server.

Thanks,

Weldon
Re: FreeBSD 8.0 - network stack crashes?
If memory serves me right, sometime around 5:02pm, Gavin Atkinson told me:

> On Tue, 2009-11-03 at 10:43 -0500, Weldon S Godfrey 3 wrote:
>> This output was from a kernel on 12/08. The issue really came up while we
>> tried to swap to 8.0-RC2. We plan to swap back sometime in the near
>> future. The same symptoms happened with RC2 so I am sure it is kmem
>> exhaustion. I am guessing v3 requires a lot more. When we switch, I'll
>> change to using the em card.
>
> Sorry, can you clarify: have you ever tested the em card with the
> 8.0-RC2 kernel?
>

We briefly tried the em card with RC2 but went back because it didn't help at the time. We are planning to go back to RC2 soon, and I also plan to use the em card instead.
Re: FreeBSD 8.0 - network stack crashes?
If memory serves me right, sometime around 10:43am, Weldon S Godfrey 3 told me:

>
>
> If memory serves me right, sometime around 3:13pm, Gavin Atkinson told me:
>
>> OK, at least we've figured out what is going wrong then. As a
>> workaround to get the machine to stay up longer, you should be able to
>> set kern.ipc.nmbclusters=256000 in /boot/loader.conf - but hopefully we
>> can resolve this soon.
>>

I upped it to 256K. What I am trying to wrap my head around is how it worked, more or less, for so long at 24K, yet it got to nearly 65K before I rebooted it with the higher setting. Or did I reboot too early? Is there any cleanup that isn't triggered until it reaches max nmbclusters? I am trying to see if anything on our network has changed to cause this to become chronic.
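As a rough guide to what these limits cost in memory, the arithmetic can be sketched in Python. The sizes below are assumptions, not something stated in the thread: 256 bytes per mbuf (MSIZE) and 2048 bytes per standard cluster (MCLBYTES). They do reproduce the "58980K bytes allocated to network" figure from the "netstat -m" output earlier in the thread, which had 36971 mbufs and 24869 clusters in use. The function names are hypothetical.

```python
# Assumed FreeBSD defaults; neither value is given in the thread.
MSIZE = 256       # bytes per mbuf
MCLBYTES = 2048   # bytes per standard mbuf cluster


def network_kib(mbufs, clusters):
    """KiB held by this many mbufs and standard clusters, rounded down."""
    return (mbufs * MSIZE + clusters * MCLBYTES) // 1024


def cluster_pool_mib(nmbclusters):
    """MiB needed if every cluster allowed by the limit were allocated."""
    return nmbclusters * MCLBYTES / (1024 * 1024)


if __name__ == "__main__":
    # Cross-check against the thread's netstat -m figure of 58980K in use.
    print(network_kib(36971, 24869))
    # The old limit from the netstat output and the suggested new one.
    for limit in (25600, 256000):
        print(limit, cluster_pool_mib(limit))
```

Under these assumptions the old 25600 limit caps the cluster pool at 50 MiB and kern.ipc.nmbclusters=256000 raises the cap to 500 MiB. If clusters are being leaked or fragmented, the higher limit only postpones the point at which allocations are denied.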