pbulk hang in 5.99.21

Hi!

I've just upgraded my 5.99.21 from Oct 22 to Nov 8 (kernel and
userland, packages not rebuilt), and now a pbulk (set up in a tmpfs)
hangs during the scan phase. Output including some ctrl-t:

Scanning...
................................load: 1.00 cmd: make 29443 [layerfs] 0.13u 0.02s 0% 2196k
make: Working in: /usr/pkgsrc/chat/finch
make: Working in: /usr/pkgsrc/chat/libpurple
make: Working in: /usr/pkgsrc/chat/finch
make: Working in: /usr/pkgsrc/net/avahi
make: Working in: /usr/pkgsrc/x11/gtk2
load: 1.00 cmd: make 29443 [layerfs] 0.13u 0.02s 0% 2196k
make: Working in: /usr/pkgsrc/chat/libpurple
make: Working in: /usr/pkgsrc/chat/finch
make: Working in: /usr/pkgsrc/chat/finch
make: Working in: /usr/pkgsrc/net/avahi
make: Working in: /usr/pkgsrc/x11/gtk2
load: 1.08 cmd: make 29443 [layerfs] 0.14u 0.02s 0% 2196k
make: Working in: /usr/pkgsrc/chat/libpurple
make: Working in: /usr/pkgsrc/x11/gtk2
make: Working in: /usr/pkgsrc/chat/finch
make: Working in: /usr/pkgsrc/net/avahi
make: Working in: /usr/pkgsrc/chat/finch

Second try:

^C
# /usr/pkg_bulk/bin/bulkbuild
Warning: All log files of the previous pbulk run will be removed in 5 seconds.
If you want to abort, press Ctrl-C.
Scanning...
........load: 1.06 cmd: make 27628 [tstile] 0.00u 0.00s 0% 1360k
load: 1.06 cmd: make 27628 [tstile] 0.00u 0.00s 0% 1360k
load: 1.03 cmd: make 27628 [tstile] 0.00u 0.00s 0% 1360k

Any ideas?
 Thomas
|
|
Re: pbulk hang in 5.99.21

On Sun, Nov 08, 2009 at 11:38:15AM +0100, Thomas Klausner wrote:
> I've just upgraded my 5.99.21 from Oct 22 to Nov 8 (kernel and
> userland, packages not rebuilt), and now a pbulk (set up in a tmpfs)
> hangs during the scan phase. Output including some ctrl-t:

Still happens with today's 5.99.22.

# /usr/pkg_bulk/bin/bulkbuild
Scanning...
........ .......................................... 50/386
.................................................. 100/386
................ .................................. 150/386
.. load: 1.00 cmd: sh 28152 [wait] 0.00u 0.00s 0% 808k
make: Working in: /usr/pkgsrc/misc/tellico
make: Working in: /usr/pkgsrc/misc/tellico

It's been staying there for minutes, the machine is idle.

In case it's a file system locking problem, here's the relevant mount
information:

tmpfs on /home/wiz/sandbox type tmpfs (local)
/bin on /home/wiz/sandbox/bin type null (read-only, local)
/sbin on /home/wiz/sandbox/sbin type null (read-only, local)
/lib on /home/wiz/sandbox/lib type null (read-only, local)
/libexec on /home/wiz/sandbox/libexec type null (read-only, local)
/usr/X11R7 on /home/wiz/sandbox/usr/X11R7 type null (read-only, local)
/usr/bin on /home/wiz/sandbox/usr/bin type null (read-only, local)
/usr/games on /home/wiz/sandbox/usr/games type null (read-only, local)
/usr/include on /home/wiz/sandbox/usr/include type null (read-only, local)
/usr/lib on /home/wiz/sandbox/usr/lib type null (read-only, local)
/usr/libdata on /home/wiz/sandbox/usr/libdata type null (read-only, local)
/usr/libexec on /home/wiz/sandbox/usr/libexec type null (read-only, local)
/usr/share on /home/wiz/sandbox/usr/share type null (read-only, local)
/usr/sbin on /home/wiz/sandbox/usr/sbin type null (read-only, local)
/var/mail on /home/wiz/sandbox/var/mail type null (read-only, local)
/archive/cvs/src on /home/wiz/sandbox/usr/src type null (read-only, local)
/archive/cvs/pkgsrc on /home/wiz/sandbox/usr/pkgsrc type null (local)
/archive/cvs/xsrc on /home/wiz/sandbox/usr/xsrc type null (read-only, local)
/disk/1/archive/packages/5.99.22 on /home/wiz/sandbox/packages type null (local)
/disk/1/archive/distfiles on /home/wiz/sandbox/distfiles type null (local)

 Thomas
|
|
Re: pbulk hang in 5.99.21

Another data point:

I've just tried removing the sandbox, umount hangs:

# ./tmpfs-sandbox umount
load: 1.00 cmd: sh 26021 [wait] 0.00u 0.00s 0% 1292k

mount now says:

tmpfs on /home/wiz/sandbox type tmpfs (local)
/bin on /home/wiz/sandbox/bin type null (read-only, local)
/sbin on /home/wiz/sandbox/sbin type null (read-only, local)
/lib on /home/wiz/sandbox/lib type null (read-only, local)
/libexec on /home/wiz/sandbox/libexec type null (read-only, local)
/usr/X11R7 on /home/wiz/sandbox/usr/X11R7 type null (read-only, local)
/usr/bin on /home/wiz/sandbox/usr/bin type null (read-only, local)
/usr/games on /home/wiz/sandbox/usr/games type null (read-only, local)
/usr/include on /home/wiz/sandbox/usr/include type null (read-only, local)
/usr/lib on /home/wiz/sandbox/usr/lib type null (read-only, local)
/usr/libdata on /home/wiz/sandbox/usr/libdata type null (read-only, local)
/usr/libexec on /home/wiz/sandbox/usr/libexec type null (read-only, local)
/usr/share on /home/wiz/sandbox/usr/share type null (read-only, local)
/usr/sbin on /home/wiz/sandbox/usr/sbin type null (read-only, local)
/var/mail on /home/wiz/sandbox/var/mail type null (read-only, local)
/archive/cvs/src on /home/wiz/sandbox/usr/src type null (read-only, local)
/archive/cvs/pkgsrc on /home/wiz/sandbox/usr/pkgsrc type null (local)
/archive/cvs/xsrc on /home/wiz/sandbox/usr/xsrc type null (read-only, local)
/disk/1/archive/packages/5.99.22 on /home/wiz/sandbox/packages type null (local)

 Thomas
|
|
Re: pbulk hang in 5.99.21

> On Sun, Nov 08, 2009 at 11:38:15AM +0100, Thomas Klausner wrote:
> > I've just upgraded my 5.99.21 from Oct 22 to Nov 8 (kernel and
> > userland, packages not rebuilt), and now a pbulk (set up in a tmpfs)
> > hangs during the scan phase. Output including some ctrl-t:
>
> Still happens with today's 5.99.22.

Here is a workaround I'm trying now.

enami.

Index: sys/kern/vfs_subr.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
retrieving revision 1.386
diff -u -r1.386 vfs_subr.c
--- sys/kern/vfs_subr.c	5 Nov 2009 08:18:02 -0000	1.386
+++ sys/kern/vfs_subr.c	11 Nov 2009 06:02:33 -0000
@@ -1386,7 +1386,7 @@
 vrelel(vnode_t *vp, int flags)
 {
 	bool recycle, defer;
-	int error;
+	int error, islayer_vnode;
 
 	KASSERT(mutex_owned(&vp->v_interlock));
 	KASSERT((vp->v_iflag & VI_MARKER) == 0);
@@ -1425,6 +1425,7 @@
 		 * XXX This ugly block can be largely eliminated if
 		 * locking is pushed down into the file systems.
 		 */
+		islayer_vnode = (vp->v_iflag & VI_LAYER) != 0;
 		if (curlwp == uvm.pagedaemon_lwp) {
 			/* The pagedaemon can't wait around; defer. */
 			defer = true;
@@ -1432,13 +1433,18 @@
 			/* We have to try harder. */
 			vp->v_iflag &= ~VI_INACTREDO;
 			error = vn_lock(vp, LK_EXCLUSIVE | LK_INTERLOCK |
-			    LK_RETRY);
+			    (islayer_vnode ? LK_NOWAIT : LK_RETRY));
 			if (error != 0) {
-				/* XXX */
-				vpanic(vp, "vrele: unable to lock %p");
-			}
-			defer = false;
-		} else if ((vp->v_iflag & VI_LAYER) != 0) {
+				if (islayer_vnode) {
+					defer = true;
+					mutex_enter(&vp->v_interlock);
+				} else {
+					/* XXX */
+					vpanic(vp, "vrele: unable to lock %p");
+				}
+			} else
+				defer = false;
+		} else if (islayer_vnode) {
 			/*
 			 * Acquiring the stack's lock in vclean() even
 			 * for an honest vput/vrele is dangerous because
|
|
Re: pbulk hang in 5.99.21

> > Still happens with today's 5.99.22.
>
> Here is a workaround I'm trying now.

... if the symptom you saw is the same as what I saw (layer_node_find()
is trying to vget() a vnode while vrele_thread is trying to vn_lock()
the same vnode).

enami.
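The workaround in the patch earlier in the thread boils down to "try the lock without waiting, and defer the release if it is busy". A minimal userland sketch of that pattern follows. It is not kernel code: `toy_node`, `toy_trylock`, `toy_unlock` and `toy_release` are hypothetical names made up for illustration, and a C11 `atomic_flag` stands in for the vnode lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a layered vnode; not NetBSD's struct vnode. */
struct toy_node {
	atomic_flag lock;	/* toy lock: set means "held" */
	int deferred;		/* releases put off for a later retry */
	int inactivated;	/* releases that ran to completion */
};

void
toy_node_init(struct toy_node *n)
{
	atomic_flag_clear(&n->lock);
	n->deferred = 0;
	n->inactivated = 0;
}

/* Non-blocking acquire, in the spirit of vn_lock(..., LK_NOWAIT). */
bool
toy_trylock(struct toy_node *n)
{
	return !atomic_flag_test_and_set(&n->lock);
}

void
toy_unlock(struct toy_node *n)
{
	atomic_flag_clear(&n->lock);
}

/*
 * Release path that never sleeps on the lock: if someone else holds
 * it, count the node as deferred and return instead of waiting.
 * Returns true if the release completed, false if it was deferred.
 */
bool
toy_release(struct toy_node *n)
{
	assert(n != NULL);
	if (!toy_trylock(n)) {
		n->deferred++;
		return false;
	}
	n->inactivated++;
	toy_unlock(n);
	return true;
}
```

If another party holds the lock when `toy_release()` runs, the call returns false and bumps `deferred` rather than blocking, so whoever holds the lock can make progress and the release can be retried later.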
|
|
Re: pbulk hang in 5.99.21

On Wed, Nov 11, 2009 at 03:04:49PM +0900, enami tsugutomo wrote:
> Here is a workaround I'm trying now.

With this workaround I haven't seen the problem again -- usually the
pbulk stopped while scanning the first 200-300 packages. With the
patch it has now finished the scanning stage and started building.

Thanks!
 Thomas
|
|
Re: pbulk hang in 5.99.21

On Wed, Nov 11, 2009 at 03:04:49PM +0900, enami tsugutomo wrote:
> Here is a workaround I'm trying now.

Do you think bouyer's fix addresses this issue?

Module Name:	src
Committed By:	bouyer
Date:		Sat Nov 28 10:10:18 UTC 2009

Modified Files:
	src/sys/kern: vfs_subr.c

Log Message:
Previous did cause a deadlock with layered FS: the vrele thread can sleep on
the vnode lock, while vget is sleeping on the VI_INACTNOW flag (or the vget
caller is looping on vget returning failure because of the VI_INACTNOW flag).
With layered FSes, the upper and lower vnodes share the same lock, so the
vget() caller above can be already holding the vnode lock.
Fix by dropping VI_INACTNOW before sleeping on the vnode lock in vrelel(), and
check the ref count again once we have the lock. If the vnode has more than
one reference, don't VOP_INACTIVE it.
Fix PR kern/42318 and PR kern/42377
patch tested by Hisashi T Fujinaka, Joachim König, Stephen Borrill and
Matthias Scheler.

To generate a diff of this commit:
cvs rdiff -u -r1.391 -r1.392 src/sys/kern/vfs_subr.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

 Thomas
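The ordering described in the log message (drop the flag before sleeping on the lock, then re-check the reference count once the lock is held) can be modelled roughly as below. This is a loose single-threaded sketch, not the committed code: `toy_vnode`, `toy_vrelel`, `toy_grab_ref` and the `while_asleep` hook are hypothetical, and the hook merely stands in for whatever another thread does while the releasing thread is blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a vnode; field names are illustrative only. */
struct toy_vnode {
	int usecount;		/* reference count */
	bool inactnow;		/* models the VI_INACTNOW flag */
	int inactive_calls;	/* times the VOP_INACTIVE stand-in ran */
};

/* Called at the point where the real code would sleep on the lock. */
typedef void (*toy_sleep_hook)(struct toy_vnode *);

/* Example hook: a vget()-like caller gains a reference meanwhile. */
void
toy_grab_ref(struct toy_vnode *vp)
{
	/* The flag is already clear, so this caller is not held up by it. */
	assert(!vp->inactnow);
	vp->usecount++;
}

/*
 * Drop one reference. Returns true if the "inactive" step ran,
 * false if it was skipped because the node gained another user.
 */
bool
toy_vrelel(struct toy_vnode *vp, toy_sleep_hook while_asleep)
{
	/* Earlier in the release path the flag has been set. */
	vp->inactnow = true;

	/* Clear it before blocking on the lock. */
	vp->inactnow = false;
	if (while_asleep != NULL)
		while_asleep(vp);	/* stands in for the sleep */

	/* Lock held from here on: look at the reference count again. */
	if (vp->usecount > 1) {
		vp->usecount--;		/* someone else still uses the node */
		return false;
	}
	vp->inactive_calls++;		/* stand-in for VOP_INACTIVE */
	vp->usecount--;
	return true;
}
```

Calling `toy_vrelel(&vp, toy_grab_ref)` on a node with one reference leaves it with one reference and skips the inactive step, which is the "more than one reference" case from the log message. Passing `NULL` as the hook models the uncontended case.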
|
|
|
|
|
Re: pbulk hang in 5.99.21

On Wed, Dec 02, 2009 at 03:46:46PM +0900, enami tsugutomo wrote:
> Yes, almost same effect. Didn't work for you?

Good. Seems to work fine for me as well so far (bulk build processed
more than 1000 packages).

Thanks!
 Thomas