RELENG_7 and HEAD: bge causes system hang

Good time of the day.

I would still like to report a real and consistently reproducible problem with the bge driver on machines with a Broadcom NetLink Gigabit Ethernet Controller (BCM5787 chipset).

To reproduce the problem, it is enough to try to install RELENG_7 or CURRENT on a machine with a Broadcom NetLink Gigabit Ethernet Controller (BCM5787 chipset). The system hangs as soon as it tries to configure the bge device driver.

I've tried to work around this problem in many different ways. Unfortunately, I still haven't managed to get it working.

Since RELENG_6 handles bge properly on that machine, I even decided to try to "roll back" bge from RELENG_7 to RELENG_6. I don't think it was a good idea, but I couldn't see another way out. So I removed the miibus and bge devices from the kernel config and recompiled the kernel (to be free to experiment with bge as a module). Then I obtained the RELENG_6 sources for bge, miibus and their dependencies. After that I managed to compile miibus.ko and if_bge.ko (I had to overcome several problems, but at least they compiled).

# cd /usr/src/sys/modules/mii
# make cleandepend && make clean
# make obj && make depend && make && make install
# cd /usr/src/sys/modules/bge
# make cleandepend && make clean
# make obj && make depend && make && make install

After the steps above I had a new miibus.ko and if_bge.ko under /boot/kernel/, and I even managed to kldload miibus successfully. But with if_bge the old story repeated itself :(. It completely froze the system.

Of course, I doubt that I'm using the right means to solve the problem. But it seems that nobody is interested even in confirming or denying the existence of the described problem.

I've tried to reproduce the problem on two different machines with BCM5787 and the results were the same: the bge driver causes a system hang on RELENG_7 and CURRENT and works properly on RELENG_6.

So I'd appreciate any comments on this.

Thanks.

# uname -a
FreeBSD 7.0-BETA2 FreeBSD 7.0-BETA2 #0: Thu Nov 8 23:58:43 EET 2007 root@:/usr/obj/usr/src/sys/FREE-SMP-ULE-08112007-v1 i386

# pciconf -lv
hostb0 at pci0:0:0:0: class=0x060000 card=0x30c0103c chip=0x2a008086 rev=0x0c hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Mobile PM965/GM965/GL960 Express Processor to DRAM Controller'
    class = bridge
    subclass = HOST-PCI
vgapci0 at pci0:0:2:0: class=0x030000 card=0x30c0103c chip=0x2a028086 rev=0x0c hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Mobile 965 Express Integrated Graphics Controller'
    class = display
    subclass = VGA
vgapci1 at pci0:0:2:1: class=0x038000 card=0x30c0103c chip=0x2a038086 rev=0x0c hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Mobile 965 Express Integrated Graphics Controller'
    class = display
uhci0 at pci0:0:26:0: class=0x0c0300 card=0x30c0103c chip=0x28348086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) USB UHCI'
    class = serial bus
    subclass = USB
uhci1 at pci0:0:26:1: class=0x0c0300 card=0x30c0103c chip=0x28358086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) USB UHCI'
    class = serial bus
    subclass = USB
ehci0 at pci0:0:26:7: class=0x0c0320 card=0x30c0103c chip=0x283a8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '81EC1043 (?) ICH8 Enhanced USB2 Enhanced Host Controller'
    class = serial bus
    subclass = USB
none0 at pci0:0:27:0: class=0x040300 card=0x30c0103c chip=0x284b8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801H &SUBSYS_81EC1043&REV_02\3&11583659&0&D8'
    class = multimedia
pcib1 at pci0:0:28:0: class=0x060400 card=0x30c0103c chip=0x283f8086 rev=0x03 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) PCIe Port 1'
    class = bridge
    subclass = PCI-PCI
pcib2 at pci0:0:28:1: class=0x060400 card=0x30c0103c chip=0x28418086 rev=0x03 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) PCIe Port 2'
    class = bridge
    subclass = PCI-PCI
pcib3 at pci0:0:28:2: class=0x060400 card=0x30c0103c chip=0x28438086 rev=0x03 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) PCIe Port 3'
    class = bridge
    subclass = PCI-PCI
pcib4 at pci0:0:28:4: class=0x060400 card=0x30c0103c chip=0x28478086 rev=0x03 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) PCIe Port 5'
    class = bridge
    subclass = PCI-PCI
uhci2 at pci0:0:29:0: class=0x0c0300 card=0x30c0103c chip=0x28308086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) USB UHCI'
    class = serial bus
    subclass = USB
uhci3 at pci0:0:29:1: class=0x0c0300 card=0x30c0103c chip=0x28318086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) USB UHCI'
    class = serial bus
    subclass = USB
uhci4 at pci0:0:29:2: class=0x0c0300 card=0x30c0103c chip=0x28328086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) USB UHCI'
    class = serial bus
    subclass = USB
ehci1 at pci0:0:29:7: class=0x0c0320 card=0x30c0103c chip=0x28368086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) USB2 EHCI'
    class = serial bus
    subclass = USB
pcib5 at pci0:0:30:0: class=0x060401 card=0x30c0103c chip=0x24488086 rev=0xf3 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801BAM/CAM/DBM (ICH2-M/3-M/4-M) Hub Interface to PCI Bridge'
    class = bridge
    subclass = PCI-PCI
isab0 at pci0:0:31:0: class=0x060100 card=0x30c0103c chip=0x28158086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'ICH8M-E (ICH8 Family) LPC Interface Controller'
    class = bridge
    subclass = PCI-ISA
atapci0 at pci0:0:31:1: class=0x01018a card=0x30c0103c chip=0x28508086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801H (ICH8 Family) Ultra ATA Storage Controllers'
    class = mass storage
    subclass = ATA
atapci1 at pci0:0:31:2: class=0x010601 card=0x30c0103c chip=0x28298086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82801 Intel(R) 82801HEM/HBM SATA AHCI Controller'
    class = mass storage
none1 at pci0:16:0:0: class=0x028000 card=0x135c103c chip=0x42228086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = '10418086 Intel 3945ABG Wireless LAN controller'
    class = network
none2 at pci0:24:0:0: class=0x020000 card=0x30c0103c chip=0x169314e4 rev=0x02 hdr=0x00
    vendor = 'Broadcom Corporation'
    device = 'BCM 5787A Ethernet Controller Broadcom Netlink Gigabit'
    class = network
    subclass = ethernet
cbb0 at pci0:2:4:0: class=0x060700 card=0x30c0103c chip=0x04761180 rev=0xb6 hdr=0x02
    vendor = 'Ricoh Company, Ltd.'
    device = 'unknown Ricoh R/RL/5C476(II)'
    class = bridge
    subclass = PCI-CardBus

--
Sincerely,
Andrey
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."
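
The module rebuild sequence described in the message above can be sketched as a small dry-run helper. This is only an illustration: the function name rebuild_module and the MODBUILD_RUN variable are hypothetical, and by default the helper only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: rebuild_module and MODBUILD_RUN are hypothetical names.
# Prints (or, if MODBUILD_RUN is set to an empty string, runs) the same
# sequence of make steps that the message lists for one module directory.

rebuild_module() {
    moddir=$1
    # Unset MODBUILD_RUN means "echo": a dry run that only prints commands.
    run=${MODBUILD_RUN-echo}
    (
        # Subshell so that a real "cd" does not change the caller's directory.
        $run cd "$moddir" &&
        $run make cleandepend && $run make clean &&
        $run make obj && $run make depend && $run make && $run make install
    )
}

# Dry run for the two module directories named in the message.
rebuild_module /usr/src/sys/modules/mii
rebuild_module /usr/src/sys/modules/bge
```

Running it with MODBUILD_RUN unset changes nothing on the system, which makes it easy to review the steps before executing them for real.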

Re: RELENG_7 and HEAD: bge causes system hang

On Wed, Nov 14, 2007 at 01:18:34PM +0200, Andrey wrote:
> I still would like to report the real and constantly reproducible problem
> connected with bge driver on machines with BroadCom Netlink Gigabit
> Ethernet Controller (BCM5787 chipset).
>
> In order to reproduce the problem it is enough try to install RELENG_7 or
> CURRENT system on a machine with BroadCom Netlink Gigabit Ethernet
> Controller (with BCM5787 chipset). System hangs as soon as it tries to
> adjust bge device driver.
>
> I've tried to overcome mentioned problem in many different ways.
> Unfortunately I still didn't manage to get it working.

This is a shot in the dark, but try putting the following in /boot/loader.conf:

hw.bge.allow_asf=1

If /boot/loader.conf already has that in it, try a value of 0 instead.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
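
Toggling the tunable between 1 and 0, as suggested above, can be sketched with a small helper that rewrites one name=value line in a loader.conf-style file. The function name set_tunable is hypothetical, the helper writes the value in the double-quoted form that loader.conf files commonly use, and the example deliberately works on a scratch file rather than on /boot/loader.conf.

```shell
#!/bin/sh
# Sketch only: set_tunable is a hypothetical helper, not a FreeBSD tool.
# It replaces (or appends) a NAME="VALUE" line in a loader.conf-style file.

set_tunable() {
    file=$1
    name=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Keep every existing line whose variable name differs from NAME.
    if [ -f "$file" ]; then
        awk -F= -v n="$name" '$1 != n' "$file" > "$tmp"
    fi
    # Append the new assignment so NAME appears exactly once.
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example on a scratch copy: try 1 first, then flip to 0.
scratch=$(mktemp)
set_tunable "$scratch" hw.bge.allow_asf 1
set_tunable "$scratch" hw.bge.allow_asf 0
cat "$scratch"
rm -f "$scratch"
```

A loader tunable is read at boot, so after changing the real file a reboot is needed before the new value can have any effect.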

Re: RELENG_7 and HEAD: bge causes system hang

Thanks for your willingness to help!

I've tried the hw.bge.allow_asf option many times on many snapshots of the 7 branch (200708, 200709, 200710, 7.0-BETA1.5, 7.0-BETA2). Moreover, each time I noticed changes in CVS that seemed related to the problem, I csup'ed the sources and tried again. Even changing the corresponding code in sys/dev/bge/if_bge.c (I mean the line that currently reads "static int bge_allow_asf = 0;") didn't help.

I know it helps other people who had similar problems on their HP machines with other Broadcom NICs. But unfortunately that solution does not work in my case :o(.

Jeremy Chadwick wrote:
>
> This is a shot in the dark, but try putting the following in
> /boot/loader.conf:
>
> hw.bge.allow_asf=1
>
> If /boot/loader.conf already has that in it, try a value of 0 instead.
>

--
Sincerely,
Andrey

Re: RELENG_7 and HEAD: bge causes system hang

On Wednesday 14 November 2007 07:49 am, Andrey wrote:
> Thanks for willingness to help!
>
> I've tried to play with hw.bge.allow_asf sysctl option many times
> on many snapshots of 7-th branch (200708, 200709, 200710,
> 7.0-BETA1.5, 7.0-BETA2). More that each time I tracked changes in
> CVS assumed to be related with the problem I csup'ed sources and
> tried again and again. Even changing corresponding code inside of
> sys/dev/bge/if_bge.c (I mean string that currently looks like
> "static int bge_allow_asf = 0;") didn't help.
>
> I know it helps other people who had similar problems on their HP
> machines with other Broadcom's NIC's. But unfortunately such a
> solution does not fit my case :o(.
>
> Jeremy Chadwick wrote:
> > This is a shot in the dark, but try putting the following in
> > /boot/loader.conf:
> >
> > hw.bge.allow_asf=1
> >
> > If /boot/loader.conf already has that in it, try a value of 0
> > instead.

Please try:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pci/pci.c.diff?r1=1.355;r2=1.356

Jung-uk Kim

Re: RELENG_7 and HEAD: bge causes system hang

Good time of the day, Jung-uk Kim!

Thank you for the good news!

I've just tried the fresh revision of sys/dev/pci/pci.c you pointed me to, and it solved the problem for me.

I can now state that the "bge causes system hang" problem no longer occurs on RELENG_7 (on an HP machine with Broadcom NetLink Gigabit Ethernet, BCM5787 chipset).

Several excerpts are below (I think you may be interested in them):

------------
# dmesg
...
pcib2: <ACPI PCI-PCI bridge> irq 18 at device 28.2 on pci0
pci24: <ACPI PCI bus> on pcib2
pci0:24:0:0: failed to read VPD data.
bge0: <Broadcom BCM5754/5787 A2, ASIC rev. 0xb002> mem 0xe4000000-0xe400ffff irq 18 at device 0.0 on pci24
miibus0: <MII bus> on bge0
brgphy0: <BCM5787 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: Ethernet address: 00:1a:4b:66:1b:0e
bge0: [ITHREAD]
...
------------
# pciconf -lv
...
bge0@pci0:24:0:0: class=0x020000 card=0x30c0103c chip=0x169314e4 rev=0x02 hdr=0x00
    vendor = 'Broadcom Corporation'
    device = 'BCM 5787A Ethernet Controller Broadcom Netlink Gigabit'
    class = network
    subclass = ethernet
...
------------
# ifconfig bge0
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    ether 00:1a:4b:66:1b:0e
    inet 10.10.0.2 netmask 0xffffff00 broadcast 10.10.0.255
    media: Ethernet autoselect (100baseTX <full-duplex>)
    status: active
------------

So, great work :o). Thank you.

Jung-uk Kim wrote:
>
> Please try:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pci/pci.c.diff?r1=1.355;r2=1.356
>
> Jung-uk Kim

--
Sincerely,
Andrey
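
The check done by eye in the dmesg excerpt above (did a bge device attach, and with which Ethernet address) can be sketched as two small filters over dmesg-style text. The function names bge_attached and bge_mac are hypothetical, and the patterns are only assumed from the line formats shown in this message.

```shell
#!/bin/sh
# Sketch only: bge_attached and bge_mac are hypothetical helper names.
# Both read dmesg-style text on standard input.

# Succeeds if a bge probe line such as "bge0: <Broadcom ...>" is present.
bge_attached() {
    grep -Eq '^bge[0-9]+: <'
}

# Prints the address from a "bgeN: Ethernet address: ..." line, if any.
bge_mac() {
    sed -n 's/^bge[0-9][0-9]*: Ethernet address: //p'
}

# Example with two lines taken from the excerpt above.
sample='bge0: <Broadcom BCM5754/5787 A2, ASIC rev. 0xb002> mem 0xe4000000-0xe400ffff irq 18 at device 0.0 on pci24
bge0: Ethernet address: 00:1a:4b:66:1b:0e'

if printf '%s\n' "$sample" | bge_attached; then
    echo "bge attached, address: $(printf '%s\n' "$sample" | bge_mac)"
fi
```

On a live system the same filters could be fed from dmesg, for example `dmesg | bge_attached`.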

Re: RELENG_7 and HEAD: bge causes system hang

Jung-uk Kim wrote:
> On Wednesday 14 November 2007 07:49 am, Andrey wrote:
>> Thanks for willingness to help!
>>
>> I've tried to play with hw.bge.allow_asf sysctl option many times
>> on many snapshots of 7-th branch (200708, 200709, 200710,
>> 7.0-BETA1.5, 7.0-BETA2). More that each time I tracked changes in
>> CVS assumed to be related with the problem I csup'ed sources and
>> tried again and again. Even changing corresponding code inside of
>> sys/dev/bge/if_bge.c (I mean string that currently looks like
>> "static int bge_allow_asf = 0;") didn't help.
>>
>> I know it helps other people who had similar problems on their HP
>> machines with other Broadcom's NIC's. But unfortunately such a
>> solution does not fit my case :o(.
>>
>> Jeremy Chadwick wrote:
>>> This is a shot in the dark, but try putting the following in
>>> /boot/loader.conf:
>>>
>>> hw.bge.allow_asf=1
>>>
>>> If /boot/loader.conf already has that in it, try a value of 0
>>> instead.
>
> Please try:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pci/pci.c.diff?r1=1.355;r2=1.356

Great to hear this problem was solved. I still have one big fat question. Why did the system hang and not allow the kernel debugger to show up? I strongly believe that this bug would have been easily spotted had KDB responded. Is it perhaps possible to "harden" KDB, so that such issues are easier to find and fix in the future?

Re: RELENG_7 and HEAD: bge causes system hang

On Mon, 26 Nov 2007, Cristian KLEIN wrote:

> Great to hear this problem was solved. I still have one big fat question.
> Why did the system hang and not allow the kernel debugger show up? I
> strongly believe that this bug would have been easily spotted suppose KDB
> would have responded. Is it perhaps possible to "harden" KDB, so that such
> issues are easier to find and fix in future?

I don't know the details of this particular situation, but I can speak to at least one known issue in DDB: right now, getting into DDB from a serial console is a very quick and straightforward path, requiring only the delivery of the serial interrupt and execution of its fast handler. Keypresses on the regular video console take a much more circuitous route: syscons isn't MPSAFE, so they involve the scheduling of an ithread and acquisition of Giant. As such, I've found breaking into the debugger much easier from a serial console for several years. As Giant has been pushed off larger and larger parts of the kernel, the syscons break path has gotten a lot more reliable.

There will always be certain cases where a console break (serial or video) will not work, and those include cases where interrupts are disabled on all CPUs (such as if spinlocks are held on all CPUs, perhaps due to one being leaked and then a cascading deadlock). In that situation, there's nothing like a nice NMI button or IPMI NMI to get into the debugger :-).

We have a feature on i386 and amd64 called MP_WATCHDOG, which allows one CPU to be dedicated to being a watchdog for the others. On lower-end hardware this isn't so useful, as CPUs aren't plentiful, but as the number of cores increases, it becomes more and more possible to run this without disrupting normal operation of the machine. When it notices the kernel is no longer running callouts, it delivers an NMI to the other CPUs and kicks (hopefully) one of them into DDB.

There are a number of issues with the implementation, not least that we do actually run some other code on the watchdog CPU sometimes, as our interrupt routing and scheduler need a bit more adaptation, but it can be quite useful nonetheless.

Robert N M Watson
Computer Laboratory
University of Cambridge

Re: RELENG_7 and HEAD: bge causes system hang

Robert Watson wrote:
>
> On Mon, 26 Nov 2007, Cristian KLEIN wrote:
>
>> Great to hear this problem was solved. I still have one big fat
>> question. Why did the system hang and not allow the kernel debugger
>> show up? I strongly believe that this bug would have been easily
>> spotted suppose KDB would have responded. Is it perhaps possible to
>> "harden" KDB, so that such issues are easier to find and fix in future?
>
> I don't know the details of this particular situation, but I can speak
> to at least one known issue in DDB: right now, getting into DDB from a
> serial console is a very quick and straightforward path, requiring only
> the delivery of the serial interrupt and execution of its fast handler.
> The regular video console keypresses take a much more circuitous route,
> as syscons isn't MPSAFE, so include the scheduling of an ithread and
> acquisition of Giant. As such, I've found breaking into the debugger
> much easier from a serial console for several years. As Giant has been
> pushed off larger and larger parts of the kernel, the syscons break path
> has gotten a lot more reliable.

That is very unfortunate. Newer laptops don't come with a serial port anymore. As far as I know, using USB-to-serial converters won't work.

> There will always be certain cases
> where a console break (serial or video) will not work, and those include
> cases where interrupts are disabled on all CPUs (such as if spinlocks
> are held on all CPUs, perhaps due to one being leaked and then a
> cascading deadlock). In that situation, there's nothing like a nice NMI
> button or IPMI NMI to get into the debugger :-).

IIRC, spinlocks are not an issue anymore. The kernel will throw a message like "spinlock held too long in file, line", and the issue can easily be spotted.

Is there any way to forcibly enter DDB on a serial-less laptop, so that future problems like this will be spotted faster? Perhaps MPSAFEing syscons should get more attention?

Re: RELENG_7 and HEAD: bge causes system hang

On Mon, 26 Nov 2007, Cristian KLEIN wrote:

>> I don't know the details of this particular situation, but I can speak to
>> at least one known issue in DDB: right now, getting into DDB from a serial
>> console is a very quick and straightforward path, requiring only the
>> delivery of the serial interrupt and execution of its fast handler. The
>> regular video console keypresses take a much more circuitous route, as
>> syscons isn't MPSAFE, so include the scheduling of an ithread and
>> acquisition of Giant. As such, I've found breaking into the debugger much
>> easier from a serial console for several years. As Giant has been pushed
>> off larger and larger parts of the kernel, the syscons break path has
>> gotten a lot more reliable.
>
> That is very unfortunate. Newer laptops don't come with a serial port
> anymore. As far as I know, using USB-to-serial converters won't work.

Many notebooks do, however, have firewire. I've not read the firewire code or used firewire for debugging, so I can't comment on how effective breaks are, but I can say that one of the neatest things about firewire is that you can inspect the kernel memory of a host remotely even when it's frozen solid, which is pretty cool. So if you have a notebook that is also without firewire, you may indeed be out of luck, but with firewire, you have a nice new option.

>> There will always be certain cases where a console break (serial or video)
>> will not work, and those include cases where interrupts are disabled on all
>> CPUs (such as if spinlocks are held on all CPUs, perhaps due to one being
>> leaked and then a cascading deadlock). In that situation, there's nothing
>> like a nice NMI button or IPMI NMI to get into the debugger :-).
>
> IIRC, spinlocks are not an issue anymore. The kernel will throw a message
> like "spinlock held too long in file, line", and the issue can easily be
> spotted.

Only on an SMP box: the test is in the spin loop waiting for a spinlock, so it will fire only when a second CPU has to hang around for a long time waiting for the lock. If you have a single-CPU box, it's just a hard wedge with interrupts disabled.

> Is there any way to forcibly enter the DDB on a serialless laptop, so future
> problems like this will be spotted faster? Perhaps, should MPSAFEing syscons
> get more attention?

I think getting an MPSAFE syscons would be desirable, but it's a non-trivial piece of work, especially if you take into account that it's tangled up in the tty code. If you have firewire, that may be a useful option. However, I would agree with an assertion that notebooks are becoming less useful as a development platform because of the omission of a real serial port. One of the nice things about true serial ports is that you can run them in purely polled operation quite easily, and so use them from within a debugger while interrupts are disabled. Unfortunately, USB controllers are very complex beasts, and do not lend themselves to low-level operation of this sort.

Robert N M Watson
Computer Laboratory
University of Cambridge

Re: RELENG_7 and HEAD: bge causes system hang

Robert Watson wrote:
> On Mon, 26 Nov 2007, Cristian KLEIN wrote:
>

*snip*

(this probably justifies a new thread ... but..)

>>
>> That is very unfortunate. Newer laptops don't come with a serial port
>> anymore. As far as I know, using USB-to-serial converters won't work.
>
> Many notebooks do, however, have firewire. I've not read the firewire
> code or used firewire for debugging, so I can't comment on how effective
> breaks are, but I can say that one of the neatest things about firewire
> is that you can inspect the kernel memory of a host remotely even when
> it's frozen solid, which is pretty cool. So if you have a notebook that
> is also without firewire, you may indeed be out of luck, but with
> firewire, you have a nice new option.
>
>
> I think getting an MPSAFE syscons would be desirable, but it's a
> non-trivial piece of work, especially if you take into account that it's
> tangled up in the tty code. If you have firewire, that may be a useful
> option. However, I would agree with an assertion that notebooks are
> becoming less useful as a development platform because of the omission
> of a real serial port.

*snip*

A) Notebooks are at over 50% of new sales and climbing, so it isn't just as devel platforms - but as sources of 'field reports' of tester / user encountered problems that will become an ever-growing challenge. Worse, unlike a conventional MB, one cannot just plug in a bus card and emulate (or substitute for) the key subsystem involved, so at some point those laptops need to be accommodated.

B) It isn't just laptops. Mac Mini-like, small-format packaging is taking another chunk out of the field. These, too, are legacy I/O challenged as well as limited in bus sockets.

C) Even full-ATX size MBs have long since begun shedding (external) serial ports as well as PS2 mouse & keyboard ports, and may not even ship with the cables or connectors to attach to such 'legacy' serial connectors as remain - usually well hidden somewhere on the MB.

D) Much as I like FW, I haven't seen any indication that it has a guarantee of survival or universality any greater than once-common IRDA did. Too many price-driven decisions favor 'good enough' and far more common USB 2.

Something will be needed soon/already to cover the general gap of missing SIO.

We probably *can* count on audio I/O not going away, so perhaps ASCII to fsk - or even text to speech.

:-(

Bill