Random hangs on 5.0 on R5000 Challenge S

by George Harvey

Hi,

I have a R5000 Challenge S which has been running NetBSD 4.0 quite
happily. Today I upgraded it to 5.0 and now I'm seeing random hangs.

The first one happened during the upgrade process itself when
postinstall hung while performing the cleanups after installing base and
etc. Hitting ctrl-C broke out of postinstall and allowed the rest of the
upgrade to continue. On rebooting from local disk, what I now see are
hangs at various places while running the rc script. For example, I've
seen it hang in fsck, mount, rm, sshd and getty. Sometimes I can use
ctrl-C to abort the hung process and it will continue on for a few steps
then hang somewhere else. Kernel boot messages are included below, if
there is any other info I can provide that would help to find the
problem, let me know.

George

# NetBSD 5.0 on R5000/180 Challenge S

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 5.0 (GENERIC32_IP2x) #0: Mon Apr 27 06:08:08 UTC 2009
        builds@...:/homebuilds/ab/netbsd-5-0-RELEASE/sgimips/200904260229Z-obj/home/buildsab/netbsd-5-0-RELEASE/src/sys/arch/sgimips/compile/GENERIC32_IP2x
total memory = 256 MB
(768 KB reserved for ARCS)
avail memory = 245 MB
mainbus0 (root): SGI-IP22 [SGI, 690ac9fb], 1 processor
cpu0 at mainbus0: MIPS R5000 CPU (0x2310) Rev. 1.0 with built-in FPU Rev. 1.0
cpu0: 32KB/32B 2-way set-associative L1 Instruction cache, 48 TLB entries
cpu0: 32KB/32B 2-way set-associative write-back L1 Data cache
cpu0: 512KB/32B direct-mapped write-through L2 Data cache
ioc0 at mainbus0 addr 0x1fbd9800: rev 0, machine Idy (Guinness), board rev 0
int0 at mainbus0 addr 0x1fbd9880
int0: bus 90MHz, CPU 180MHz
imc0 at mainbus0 addr 0x1fa00000: revision 3
gio0 at imc0
giopci0 at gio0 slot 1 addr 0x1f400000: Phobos G130 10/100 Ethernet
pci0 at giopci0 bus 0
tlp0 at pci0 dev 0 function 0: DECchip 21143 Ethernet, pass 4.1
tlp0: interrupting at slot EXP0
tlp0: Ethernet address 00:60:f5:08:23:07
lxtphy0 at tlp0 phy 1: LXT970 10/100 media interface, rev. 3
lxtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
hpc0 at gio0 addr 0x1fb80000: SGI HPC3 (onboard)
zsc0 at hpc0 offset 0x59830
zstty0 at zsc0 channel 1 (console i/o)
zstty1 at zsc0 channel 0
pckbc0 at hpc0 offset 0x59840
sq0 at hpc0 offset 0x54000: SGI Seeq 80c03
sq0: Ethernet address 08:00:69:0a:c9:fb
wdsc0 at hpc0 offset 0x44000: WD33C93B (20.0 MHz clock, BURST DMA, SCSI ID 0)
wdsc0: microcode revision 0x0d, Fast SCSI
scsibus0 at wdsc0: 8 targets, 8 luns per target
dsclock0 at hpc0 offset 0x60000
pi1ppc0 at hpc0 offset 0x58000
pi1ppc0: capabilities=8<PS2>
ppbus0 at pi1ppc0
ppbus0: No IEEE1284 device found.
pi1ppc at hpc0 offset 0x59800 not configured
hpc1 at gio0 addr 0x1fb00000: SGI HPC3 (IOPLUS mezzanine)
hpc1: using EXP1's DMA channel
sq1 at hpc1 offset 0x54000: SGI Seeq 80c03
sq1: Ethernet address 08:00:69:02:9d:a2
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 2 lun 0: <IBM, DNES-309170, SAH0> disk fixed
sd0: 8748 MB, 11474 cyl, 5 head, 312 sec, 512 bytes/sect x 17916240 sectors
sd0: sync (100.00ns offset 12), 8-bit (10.000MB/s) transfers, tagged queueing
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
Tue Jul  7 14:39:12 GMT 2009


Re: Random hangs on 5.0 on R5000 Challenge S

by Christos Zoulas-2

In article <20090707165703.4880d5d4.fr30@...>,
George Harvey  <fr30@...> wrote:

>Hi,
>
>I have a R5000 Challenge S which has been running NetBSD 4.0 quite
>happily. Today I upgraded it to 5.0 and now I'm seeing random hangs.
>
>The first one happened during the upgrade process itself when
>postinstall hung while performing the cleanups after installing base and
>etc. Hitting ctrl-C broke out of postinstall and allowed the rest of the
>upgrade to continue. On rebooting from local disk, what I now see are
>hangs at various places while running the rc script. For example, I've
>seen it hang in fsck, mount, rm, sshd and getty. Sometimes I can use
>ctrl-C to abort the hung process and it will continue on for a few steps
>then hang somewhere else. Kernel boot messages are included below, if
>there is any other info I can provide that would help to find the
>problem, let me know.

Can you cause processes to hang? Can you ktrace them?

christos


Re: Random hangs on 5.0 on R5000 Challenge S

by George Harvey

On Tue, 7 Jul 2009 19:08:49 +0000 (UTC)
christos@... (Christos Zoulas) wrote:

> In article <20090707165703.4880d5d4.fr30@...>,
> George Harvey  <fr30@...> wrote:
> >Hi,
> >
> >I have a R5000 Challenge S which has been running NetBSD 4.0 quite
> >happily. Today I upgraded it to 5.0 and now I'm seeing random hangs.
> >
> >The first one happened during the upgrade process itself when
> >postinstall hung while performing the cleanups after installing base
> >and etc. Hitting ctrl-C broke out of postinstall and allowed the rest
> >of the upgrade to continue. On rebooting from local disk, what I now
> >see are hangs at various places while running the rc script. For
> >example, I've seen it hang in fsck, mount, rm, sshd and getty.
> >Sometimes I can use ctrl-C to abort the hung process and it will
> >continue on for a few steps then hang somewhere else. Kernel boot
> >messages are included below, if there is any other info I can provide
> >that would help to find the problem, let me know.
>
> Can you cause processes to hang? Can you ktrace them?

No, the system hangs completely before it gets to the login prompt.
However, I can drop into ddb on the console.

As an experiment today, I moved the disk into another Challenge S in my
collection, this time with a R4400 CPU. So far, I haven't had any hangs
on the R4400 system; I can log in, compile programs, access the network
and everything works normally. That suggests to me that the problem may be
R5000 specific, maybe a cache issue?

George

 

Re: Random hangs on 5.0 on R5000 Challenge S

by Izumi Tsutsui

fr30@... wrote:

> christos@... (Christos Zoulas) wrote:
> > >
> > >I have a R5000 Challenge S which has been running NetBSD 4.0 quite
> > >happily. Today I upgraded it to 5.0 and now I'm seeing random hangs.
 :
> No, the system hangs completely before it gets to the login prompt.
> However, I can drop into ddb on the console.
>
> As an experiment today, I moved the disk into another Challenge S in my
> collection, this time with a R4400 CPU. So far, I haven't had any hangs
> on the R4400 system; I can log in, compile programs, access the network
> and everything works normally. That suggests to me that the problem may be
> R5000 specific, maybe a cache issue?

Last time I tried 5.0 on R5000 Indy, I saw some hangs.
I have not tracked down the problem, but R5000 O2 works fine
so maybe there are some IP22/R5000 specific problems.
(probably around wdsc SCSI?)

---
Izumi Tsutsui

Re: Random hangs on 5.0 on R5000 Challenge S

by Michael Lorenz

Hello,

On Jul 8, 2009, at 12:21 PM, George Harvey wrote:

> No, the system hangs completely before it gets to the login prompt.
> However, I can drop into ddb on the console.


So it hangs when starting userland. Strange.
My R5k/150MHz/512kB Indy works just fine, but I don't have the right
cables to try booting it with a serial console. That said, I
dimly remember issues with serial console on some SGI machines, not
sure if it was IP2x or something else though.

have fun
Michael

Re: Random hangs on 5.0 on R5000 Challenge S

by Stephen M. Rumble-4

On Thu, Jul 09, 2009 at 01:57:27AM +0900, Izumi Tsutsui wrote:
> Last time I tried 5.0 on R5000 Indy, I saw some hangs.
> I have not tracked down the problem, but R5000 O2 works fine
> so maybe there are some IP22/R5000 specific problems.
> (probably around wdsc SCSI?)

The only differences between 4.0 and 5.0 in wdsc are that the clock
is now correctly specified as 20MHz (was 10MHz), Burst DMA mode
is enabled, and Fast SCSI support was added.

Setting sc_clkfreq to 100 and sc_dmamode to SBIC_CTL_DMA in
sys/arch/sgimips/hpc/wdsc.c should essentially revert all of the above
and return to near NetBSD-4 behaviour.

There were also a fair number of Challenge-S related changes to
hpc/hpc.c, but it all used to work, at least.

Steve

Re: Random hangs on 5.0 on R5000 Challenge S

by George Harvey

On Wed, 8 Jul 2009 13:21:02 -0400
Michael <macallan@...> wrote:

> Hello,
>
> On Jul 8, 2009, at 12:21 PM, George Harvey wrote:
>
> > No, the system hangs completely before it gets to the login prompt.
> > However, I can drop into ddb on the console.
>
> So it hangs when starting userland. Strange.
> My R5k/150MHz/512kB Indy works just fine, but I don't have the right
> cables to try booting it with a serial console. That said, I
> dimly remember issues with serial console on some SGI machines, not
> sure if it was IP2x or something else though.

I've got a R5000/150 Indy here as well so I tried a fresh install of 5.0
on that. The first attempt completed installing all the sets and got to
the stage of setting the root password then hung. On rebooting, the root
and usr partitions were hopelessly corrupt so I did a re-install and
this time skipped setting the password. The second install completed
normally but when I rebooted it hung while trying to start the network.
On the next reboot it hung after the 'root file system type' line.

From the boot messages, my R5000/150 box has:

MIPS R5000 CPU (0x2310) Rev. 1.0 with built-in FPU Rev. 1.0
32KB/32B 2-way set-associative L1 Instruction cache, 48 TLB entries
32KB/32B 2-way set-associative write-back L1 Data cache
512KB/32B direct-mapped write-through L2 data cache

How does that compare with yours?

George


> have fun
> Michael

Re: Random hangs on 5.0 on R5000 Challenge S

by Izumi Tsutsui

Steve wrote:

> The only differences between 4.0 and 5.0 in wdsc are that the clock
> is now correctly specified as 20MHz (was 10MHz), Burst DMA mode
> is enabled, and Fast SCSI support was added.
>
> Setting sc_clkfreq to 100 and sc_dmamode to SBIC_CTL_DMA in
> sys/arch/sgimips/hpc/wdsc.c should essentially revert all of the above
> and return to near NetBSD-4 behaviour.

Unfortunately it doesn't help.

IIRC the 4.99.72 kernel (around August 2008, when I updated bootinfo handling)
worked, so maybe we should check changes after that point.
(though there are no particularly notable changes under arch/sgimips..)

I'll check which older versions actually work.
---
Izumi Tsutsui