82573 xfers pause, no watchdog timeouts, DCGDIS ineffective (7.2-R)

We have servers with dual 82573 NICs that work well during low-throughput activity, but during high-volume activity, they pause shortly after transfers start and do not recover. Other sessions to the system are not affected.

These systems are being repurposed, jumping from 6.3 to 7.2. The same system and its kin do not exhibit the symptom under 6.3-RELEASE-p13. The symptoms appear under freebsd-updated 7.2-RELEASE GENERIC kernel with no tuning.

Previously, we've been using DCGDIS.EXE (from Jack Vogel) for this symptom. The first system to be repurposed accepts DCGDIS with 'Updated' and subsequent 'update not needed', with no relief.

Notably, there are no watchdog timeout errors - unlike our various Supermicro models still running FreeBSD 6.x. All of our other 7.x Supermicro flavors had already received the flash update and haven't shown the symptom.

Details follow.

Kernel:

rand# uname -a
FreeBSD rand.acsalaska.net 7.2-RELEASE-p4 FreeBSD 7.2-RELEASE-p4 #0: Fri Oct 2 12:21:39 UTC 2009 root@...:/usr/obj/usr/src/sys/GENERIC i386

sysctls:

rand# sysctl dev.em
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 6.9.6
dev.em.0.%driver: em
dev.em.0.%location: slot=0 function=0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x108c subvendor=0x15d9 subdevice=0x108c class=0x020000
dev.em.0.%parent: pci13
dev.em.0.debug: -1
dev.em.0.stats: -1
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_processing_limit: 100
dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 6.9.6
dev.em.1.%driver: em
dev.em.1.%location: slot=0 function=0
dev.em.1.%pnpinfo: vendor=0x8086 device=0x108c subvendor=0x15d9 subdevice=0x108c class=0x020000
dev.em.1.%parent: pci14
dev.em.1.debug: -1
dev.em.1.stats: -1
dev.em.1.rx_int_delay: 0
dev.em.1.tx_int_delay: 66
dev.em.1.rx_abs_int_delay: 66
dev.em.1.tx_abs_int_delay: 66
dev.em.1.rx_processing_limit: 100

kenv:

rand# kenv | grep smbios | egrep -v 'socket|serial|uuid|tag|0123456789'
smbios.bios.reldate="03/05/2008"
smbios.bios.vendor="Phoenix Technologies LTD"
smbios.bios.version="6.00"
smbios.chassis.maker="Supermicro"
smbios.planar.maker="Supermicro"
smbios.planar.product="PDSMi "
smbios.planar.version="PCB Version"
smbios.system.maker="Supermicro"
smbios.system.product="PDSMi"

The system is not yet production, so I can invasively abuse it if needed. The other systems are in production under 6.3-RELEASE-p13 and can also be inspected.

Any pointers appreciated.

Royce
_______________________________________________
freebsd-stable@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@..."
|
|
Re: 82573 xfers pause, no watchdog timeouts, DCGDIS ineffective (7.2-R)

On Thu, Nov 12, 2009 at 10:36:16AM -0900, Royce Williams wrote:
> We have servers with dual 82573 NICs that work well during low-throughput activity, but during high-volume activity, they pause shortly after transfers start and do not recover. Other sessions to the system are not affected.

Please define "low-throughput" and "high-volume" if you could; it might help folks determine where the threshold is for problems.

> These systems are being repurposed, jumping from 6.3 to 7.2. The same system and its kin do not exhibit the symptom under 6.3-RELEASE-p13. The symptoms appear under freebsd-updated 7.2-RELEASE GENERIC kernel with no tuning.
>
> Previously, we've been using DCGDIS.EXE (from Jack Vogel) for this symptom. The first system to be repurposed accepts DCGDIS with 'Updated' and subsequent 'update not needed', with no relief.
>
> Notably, there are no watchdog timeout errors - unlike our various Supermicro models still running FreeBSD 6.x. All of our other 7.x Supermicro flavors had already received the flash update and haven't show the symptom.
>
> Details follow.
>
> Royce

For what it's worth as a comparison base:

We use the following Supermicro SuperServers, and can confirm that no such issues occur for us using RELENG_6 nor RELENG_7 on the following hardware:

Supermicro SuperServer 5015B-MTB - amd64 - Intel 82573V + Intel 82573L
Supermicro SuperServer 5015M-T+B - amd64 - Intel 82573V + Intel 82573L
Supermicro SuperServer 5015M-T+B - amd64 - Intel 82573V + Intel 82573L
Supermicro SuperServer 5015M-T+B - i386 - Intel 82573V + Intel 82573L
Supermicro SuperServer 5015M-T+B - i386 - Intel 82573V + Intel 82573L

The 5015B-MTB system presently runs RELENG_8 -- no issues there either.

Relevant server configuration and network setup details:

- All machines use pf(4).
- All emX devices are configured for autoneg.
- All emX devices use RXCSUM, TXCSUM, and TSO4.
- We do not use polling.
- All machines use both NICs simultaneously at all times.
- All machines connected to an HP ProCurve 2626 switch (100mbit, full-duplex ports, all autoneg).
- We do not use Jumbo frames.
- No add-in cards (PCI, PCI-X, nor PCIe) are used in the systems.
- All of the systems had DCGDIS.EXE run on them; no EEPROM settings were changed, indicating the from-the-Intel-factory MANC register in question was set properly.

Relevant throughput details per box:

- em0 pushes ~600-1000kbit/sec at all times.
- em1 pushes ~100-200kbit/sec at all times.
- During nightly maintenance (backups), em1 pushes ~2-3mbit/sec for a variable amount of time.
- For a full level 0 backup (which I've done numerous times), em1 pushes 60-70mbit/sec without issues.

I've compared your sysctl dev.em output to that of our 5015M-T+B systems (which use the PDSMi+, not the PDSMi, but whatever), and ours is 100% identical.

All of our 5015M-T+B systems are using BIOS 1.3, and the 5015B-MTB system is using BIOS 1.30.

If you'd like, I can provide the exact BIOS settings we use on the machines in question; they do deviate from the factory defaults a slight bit, but none of the adjustments are "tweaks" for performance or otherwise (just disabling things which we don't use, etc.).

--
| Jeremy Chadwick jdc@... |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
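
For anyone who wants to repeat the sysctl comparison described above across several boxes, here is a minimal sketch. It assumes each host's `sysctl dev.em` output has been saved as text; the function names `parse_sysctl` and `diff_sysctl` and the second host's values are hypothetical, made up for illustration and not taken from the thread.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by `sysctl dev.em`, into a dict."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip blank lines and anything without a 'name: value' shape
            settings[name.strip()] = value.strip()
    return settings


def diff_sysctl(ours, theirs):
    """Return {name: (our_value, their_value)} for every OID that differs."""
    names = sorted(set(ours) | set(theirs))
    return {n: (ours.get(n), theirs.get(n))
            for n in names if ours.get(n) != theirs.get(n)}


# Sample values copied from the dev.em.0 output earlier in the thread.
problem_box = parse_sysctl("""\
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_processing_limit: 100
""")

# Hypothetical second host with one tunable changed, for illustration only.
other_box = {**problem_box, "dev.em.0.tx_int_delay": "64"}

print(diff_sysctl(problem_box, problem_box))  # identical hosts: empty dict
print(diff_sysctl(problem_box, other_box))    # only the changed OID is listed
```

An empty result corresponds to the "100% identical" outcome reported above; it only says the driver tunables match, not that the link itself negotiated the same way.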
|
|
Re: 82573 xfers pause, no watchdog timeouts, DCGDIS ineffective (7.2-R)

It is critically important on these systems that you get the latest BIOS on them, so maybe that's the difference between you two. I am going to be putting out a new em driver to CURRENT soon; it might be an option to try that as well. It sounds like a hang; a management/OS race in the driver is a possibility.

Jack

On Thu, Nov 12, 2009 at 12:47 PM, Jeremy Chadwick <freebsd@...> wrote:

> On Thu, Nov 12, 2009 at 10:36:16AM -0900, Royce Williams wrote:
> > We have servers with dual 82573 NICs that work well during low-throughput activity, but during high-volume activity, they pause shortly after transfers start and do not recover. Other sessions to the system are not affected.
>
> Please define "low-throughput" and "high-volume" if you could; it might help folks determine where the threshold is for problems.
>
> > These systems are being repurposed, jumping from 6.3 to 7.2. The same system and its kin do not exhibit the symptom under 6.3-RELEASE-p13. The symptoms appear under freebsd-updated 7.2-RELEASE GENERIC kernel with no tuning.
> >
> > Previously, we've been using DCGDIS.EXE (from Jack Vogel) for this symptom. The first system to be repurposed accepts DCGDIS with 'Updated' and subsequent 'update not needed', with no relief.
> >
> > Notably, there are no watchdog timeout errors - unlike our various Supermicro models still running FreeBSD 6.x. All of our other 7.x Supermicro flavors had already received the flash update and haven't show the symptom.
> >
> > Details follow.
|
|
Re: 82573 xfers pause, no watchdog timeouts, DCGDIS ineffective (7.2-R)

On Thu, Nov 12, 2009 at 11:47 AM, Jeremy Chadwick <freebsd@...> wrote:
> Please define "low-throughput" and "high-volume" if you could; it might
> help folks determine where the threshold is for problems.

My definitions are pretty subjective/operational, but for what it's worth:

- "low" is interactive SSH, DNS lookups, and pings;
- "high" is a single unthrottled rsync session.

>> rand# sysctl dev.em
>> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 6.9.6
>> dev.em.0.%pnpinfo: vendor=0x8086 device=0x108c subvendor=0x15d9 subdevice=0x108c class=0x020000

>> kenv:
>>
>> rand# kenv | grep smbios | egrep -v 'socket|serial|uuid|tag|0123456789'
>> smbios.bios.reldate="03/05/2008"

> For what it's worth as a comparison base:
>
> We use the following Supermicro SuperServers, and can confirm that no
> such issues occur for us using RELENG_6 nor RELENG_7 on the following
> hardware:

[good cross-check list snipped]

The problem system is a 5015M-MF. We are running 5015M-MT+ and 5015T-PR on RELENG_6 and 7, both without the symptom.

> Relevant server configuration and network setup details:
>
> - All machines use pf(4).
> - All emX devices are configured for autoneg.
> - All emX devices use RXCSUM, TXCSUM, and TSO4.
> - We do not use polling.
> - All machines use both NICs simultaneously at all times.
> - All machines connected to an HP ProCurve 2626 switch (100mbit,
>   full-duplex ports, all autoneg).
> - We do not use Jumbo frames.
> - No add-in cards (PCI, PCI-X, nor PCIe) are used in the systems.
> - All of the systems had DCGDIS.EXE run on them; no EEPROM settings
>   were changed, indicating the from-the-Intel-factory MANC register
>   in question was set properly.

No firewall is active on the problem system, and none of this back have been DCGDIS-ified, but otherwise, our setup is identical.

> I've compared your sysctl dev.em output to that of our 5015M-T+B systems
> (which use the PDSMi+, not the PDSMi, but whatever), and ours is 100%
> identical.
>
> All of our 5015M-T+B systems are using BIOS 1.3, and the 5015B-MTB
> system is using BIOS 1.30.

The repurposed system is at 1.3 (03/05/2008) - flashed prior to install. The production 6.3 systems are using 1.1 (or 1.1A, would have to reboot to check, but the date is 10/27/2005).

> If you'd like, I can provide the exact BIOS settings we use on the
> machines in question; they do deviate from the factory defaults a slight
> bit, but none of the adjustments are "tweaks" for performance or
> otherwise (just disabling things which we don't use, etc.).

We're running similarly as well.

I might be able to retire another system of this batch and install 7.2, but leave the BIOS update off, to see if it makes a difference.

Royce
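
Since "low" and "high" are defined operationally here, one way to put numbers on them is to sample an interface's byte counter twice and convert the difference to Mbit/sec. The sketch below shows only that arithmetic; the function name `mbit_per_sec` and the sample figures are assumptions for illustration, and the counters themselves would have to come from whatever tool is at hand (for example the per-interface byte counts that netstat can report).

```python
def mbit_per_sec(bytes_start, bytes_end, interval_s):
    """Average throughput in Mbit/sec between two byte-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    bits = (bytes_end - bytes_start) * 8   # counter delta, bytes -> bits
    return bits / interval_s / 1_000_000   # bits/sec -> Mbit/sec


# Hypothetical samples: 75,000,000 bytes transferred over a 10 second window.
rate = mbit_per_sec(0, 75_000_000, 10)
print(rate)  # 60.0
```

A rate like this falls in the 60-70mbit/sec range quoted earlier for a full level 0 backup, which is well beyond what a 10 Mbit/sec link can carry.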
|
|
Re: 82573 xfers pause, no watchdog timeouts, DCGDIS ineffective (7.2-R)

On Thu, Nov 12, 2009 at 2:18 PM, Royce Williams <royce.williams@...> wrote:
> On Thu, Nov 12, 2009 at 11:47 AM, Jeremy Chadwick
>> - All machines connected to an HP ProCurve 2626 switch (100mbit,
>> full-duplex ports, all autoneg).

> No firewall is active on the problem system, and none of this back
> have been DCGDIS-ified, but otherwise, our setup is identical.

Er, s/back/batch/g, and it's not a ProCurve. ;-) But we are also usually full-duplex and autoneg on both sides.

Based on new (embarrassing) information, I'll leave it to Jack to decide whether or not he wants to pursue this further.

The problem box is sitting in my grotty mini-lab, with a subnet partially serviced by a 10M hub. Guess which Ethernet cable I picked up. Guess what happens when I move the system to a 100M/full connection.

As my cow-orker put it, "You and the other four people on Earth using that NIC on 10M hubs" can probably find workarounds. My apologies for the noise, though it's theoretically possible that the root cause might still need addressing.

Jack, let me know if you want me to do any testing for you. Or I can always send you my hub. ;-)

Royce
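
The negotiated link is worth checking before chasing driver or firmware causes. As a rough sketch, the check below scans ifconfig output for the active media and flags a 10 Mbit or non-full-duplex link. The sample `media:` lines are illustrative, not captured from the systems in this thread; they assume the usual FreeBSD ifconfig(8) layout of `media: Ethernet autoselect (<type> <options>)`. The name `link_warnings` is hypothetical.

```python
import re

# Matches the negotiated media shown in parentheses on an ifconfig "media:" line,
# e.g. "media: Ethernet autoselect (100baseTX <full-duplex>)".
ACTIVE_MEDIA = re.compile(
    r"media:.*\((?P<speed>[^\s<)]+)(?:\s*<(?P<opts>[^>]*)>)?\)"
)


def link_warnings(ifconfig_text):
    """Return a list of human-readable warnings about the negotiated link."""
    match = ACTIVE_MEDIA.search(ifconfig_text)
    if match is None:
        return ["no negotiated media found"]
    warnings = []
    speed = match.group("speed")
    opts = (match.group("opts") or "").split(",")
    if speed.startswith("10base"):        # 10baseT/UTP, but not 100baseTX
        warnings.append("link negotiated at 10 Mbit: " + speed)
    if "full-duplex" not in opts:
        warnings.append("link is not full-duplex")
    return warnings


# Illustrative lines only (assumed format, not output from the thread's hosts).
print(link_warnings("media: Ethernet autoselect (100baseTX <full-duplex>)"))
print(link_warnings("media: Ethernet autoselect (10baseT/UTP <half-duplex>)"))
```

A hub connection like the one described above would be expected to trip both warnings, since hubs are half-duplex by nature.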
|
|
Re: 82573 xfers pause, no watchdog timeouts, DCGDIS ineffective (7.2-R)

LOL, glad the problem has been resolved, and no thanks, I do not need to pursue this any further. I also want to thank Jeremy for his help and data!!

Thanks guys and good evening,

Jack

On Thu, Nov 12, 2009 at 6:56 PM, Royce Williams <royce.williams@...> wrote:

> On Thu, Nov 12, 2009 at 2:18 PM, Royce Williams
> <royce.williams@...> wrote:
> > On Thu, Nov 12, 2009 at 11:47 AM, Jeremy Chadwick
> >> - All machines connected to an HP ProCurve 2626 switch (100mbit,
> >> full-duplex ports, all autoneg).
>
> > No firewall is active on the problem system, and none of this back
> > have been DCGDIS-ified, but otherwise, our setup is identical.
>
> Er, s/back/batch/g, and it's not a ProCurve. ;-) But we are also
> usually full-duplex and autoneg on both sides.
>
> Based on new (embarrassing) information, I'll leave it to Jack to
> decide whether or not he wants to pursue this further.
>
> The problem box is sitting in my grotty mini-lab, with a subnet
> partially serviced by a 10M hub. Guess which Ethernet cable I picked
> up. Guess what happens when I move the system to a 100M/full
> connection.
>
> As my cow-orker put it, "You and the other four people on Earth using
> that NIC on 10M hubs" can probably find workarounds. My apologies for
> the noise, though it's theoretically possible that the root cause
> might still need addressing.
>
> Jack, let me know if you want me to do any testing for you. Or I can
> always send you my hub. ;-)
>
> Royce