[ ssic-linux-Bugs-1811510 ] deadlock on loop mounted fs

Bugs item #1811510, was opened at 2007-10-11 08:22
Message generated for change (Settings changed) made by rogertsang
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=1811510&group_id=32541

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Filesystem
Group: default
Status: Open
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Roger Tsang (rogertsang)
Summary: deadlock on loop mounted fs

Initial Comment:
1. Make a sparse file

   perl -e 'open BIGFILE, ">BIGFILE"; seek BIGFILE, 1024 * 1024 * 1024, 0; print BIGFILE "big"'

2. make a filesystem on it

   losetup /dev/loop/0 BIGFILE
   mkfs -t ext3 /dev/loop/0

3. mount it

   mount -t ext3 /dev/loop/0 /mnt

4. write a lot of files to it

   cd /mnt
   dump 0f - / | restore rf -

Eventually the node that is writing to the loopback-mounted fs deadlocks.  It's still up as far as the cluster is concerned, but any attempt to start a process on it blocks.
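
For anyone without perl at hand, step 1 can be sketched in Python. This is only an illustration of the same seek-then-write trick: the function name make_sparse_file and its parameters are assumptions made for this example, not part of the original report. Whether the file is really stored sparsely depends on the underlying filesystem.

```python
import os


def make_sparse_file(path, offset=1024 * 1024 * 1024, tail=b"big"):
    """Create a file with a hole: seek to `offset`, then write `tail`.

    Mirrors the perl one-liner in step 1 (seek 1 GiB, print "big").
    Returns the apparent size of the file in bytes.
    """
    with open(path, "wb") as f:
        f.seek(offset)   # move past the end of the (empty) file
        f.write(tail)    # the bytes before this point are never written
    return os.path.getsize(path)


if __name__ == "__main__":
    # Same file name as in the report; apparent size is offset + len(tail).
    print(make_sparse_file("BIGFILE"))
```

The apparent size is the offset plus the three trailing bytes; on a filesystem that supports holes, the blocks before the offset are not allocated until something (here, mkfs and the restore) writes to them.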



----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2009-03-24 01:06

Message:
checked-in but needs verification

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2009-03-22 17:35

Message:
Testing new code to associate separate BDI per mount.  This should allow us
to support recursively stacked CFS.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-10-10 10:52

Message:
checked-in

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-06-19 02:32

Message:
Logged In: YES
user_id=1246761
Originator: NO

Try the attached patch.

More work would be needed to pass a flag to kernel space so that CFS uses
a different congestion bit when it runs on loopback.  However, that
approach only works if you are not going to CFS mount another loopback on
top of a CFS mount on loopback on CFS.  So the simple fix is this patch:
loopback becomes a standard mount.
File Added: util-linux.1811510.patch

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2008-03-16 20:35

Message:
Logged In: YES
user_id=1246761
Originator: NO

Should be fixed in 2.0.0pre3...

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-10-20 21:58

Message:
Logged In: YES
user_id=1246761
Originator: NO

It looks like CFS ran out of memory.  Try the latest checkin of
kernel/cluster/ssi/cfs code that re-enables commit for soft mounts.

----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-10-20 14:42

Message:
Logged In: YES
user_id=1246761
Originator: NO

Does 2.6.10-ssi run into this bug?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2007-10-16 10:33

Message:
Logged In: NO

Still looks the same as the old bug... This time it is stacked
generic_file_writev().

cfs_async (has i_sem)
  loop0
    pdflush
      kjournald
        cfs_async (waiting for i_sem)
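
One way to read the chain above is as a wait-for graph, where each thread is blocked on the next one down and the last cfs_async is blocked on the i_sem held by the first. The sketch below encodes that reading and walks it to show the cycle. The edge list is an interpretation of the indented list, not something taken from the kernel, and find_wait_cycle is a hypothetical helper written for this example.

```python
def find_wait_cycle(waits_on, start):
    """Follow 'waits on' edges from `start`; return the cycle or None.

    `waits_on` maps each thread to the single thread it is blocked on.
    """
    path = []
    seen = {}
    node = start
    while node is not None and node not in seen:
        seen[node] = len(path)   # remember where this node sits in the path
        path.append(node)
        node = waits_on.get(node)
    if node is None:
        return None              # chain ended without looping back
    return path[seen[node]:]     # slice from the first repeated node


# Assumed reading of the chain in the comment above.
waits_on = {
    "cfs_async (has i_sem)": "loop0",
    "loop0": "pdflush",
    "pdflush": "kjournald",
    "kjournald": "cfs_async (waiting for i_sem)",
    "cfs_async (waiting for i_sem)": "cfs_async (has i_sem)",
}

cycle = find_wait_cycle(waits_on, "cfs_async (has i_sem)")
```

Under this reading every thread in the list is part of the cycle, which would be consistent with all of them sitting in state D in the kdb output.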

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2007-10-12 07:36

Message:
Logged In: YES
user_id=166336
Originator: YES

Here's some debugging.  I've got to the point where the "restore" process
on node 1 seems hung.  On node 2 I try an "onnode 1 pwd".  It hangs.

On node 1:

Entering kdb (current=0xc0502bc0, pid 0) on processor 0 due to Keyboard
Entry
[0]kdb> ps
1 idle process (state I) and 50 sleeping system daemon (state M) processes
suppressed
Task Addr       Pid   Parent [*] cpu State Thread     Command

0xcf82a5d0        5        2  0    0   R  0xcf82a7b0  events/0
0xcf68b990      117       11  0    0   D  0xcf68bb70  pdflush
0xcf68a310      121        2  0    0   D  0xcf68a4f0  cfs_async
0xcf6b99b0      122        2  0    0   D  0xcf6b9b90  cfs_async
0xcf6b9410      123        2  0    0   D  0xcf6b95f0  cfs_async
0xcf6b8e70      124        2  0    0   D  0xcf6b9050  cfs_async
0xcf6b88d0      125        2  0    0   D  0xcf6b8ab0  cfs_async
0xcf6b8330      126        2  0    0   D  0xcf6b8510  cfs_async
0xcf6c99d0      127        2  0    0   D  0xcf6c9bb0  cfs_async
0xcf6c9430      128        2  0    0   D  0xcf6c9610  cfs_async
0xce92b730        1        0  0    0   D  0xce92b910  init
[...]
0xce90b150    67763        2  0    0   D  0xce90b330  loop0
0xce90d170    67820        2  0    0   D  0xce90d350  kjournald
0xce8f96d0    67822    67636  0    0   S  0xce8f98b0  dump
0xce8f8b90    67823    67636  0    0   D  0xce8f8d70  restore
0xcf13f970    67824    67822  0    0   S  0xcf13fb50  dump
0xcf13f3d0    67825    67824  0    0   S  0xcf13f5b0  dump
0xcf7861f0    67826    67824  0    0   S  0xcf7863d0  dump
0xcf786790    67827    67824  0    0   S  0xcf786970  dump
0xcf47d9b0   132773        2  0    0   D  0xcf47db90  onnode
[0]kdb> btp 132773
Stack traceback for pid 132773
0xcf47d9b0   132773        2  0    0   D  0xcf47db90  onnode
EBP        EIP        Function (args)
0xce879ba8 0xc046c2e6 schedule+0x3a6 (0xce879c10)
0xce879bb4 0xc046d348 io_schedule+0x28 (0xc1271c70)
0xce879bc0 0xc014aed5 sync_page+0x45 (0xc10c37f8, 0x0, 0xc014ae90,
0xcf47d9b0, 0xce879c10)
0xce879be0 0xc046d6fe __wait_on_bit_lock+0x5e (0x2, 0xc10c37f8,
0xc10c37f8, 0x0, 0x0)
0xce879c3c 0xc014b744 __lock_page+0x84 (0xc049efb5, 0xa7, 0xce7c31a0, 0x0,
0x1)
0xce879cc4 0xc014beeb do_generic_mapping_read+0x3db (0xce88ca00,
0xce7c31f0, 0xce7c31a0, 0xce879e00, 0xce879d00)
0xce879d1c 0xc014c3ed __generic_file_aio_read+0x1ed (0xce879dc4,
0xce879d34, 0x1, 0xce879e00, 0xcf06d600)
0xce879d48 0xc014c473 generic_file_aio_read+0x53 (0xce879dc4, 0xcf06d600,
0x80, 0x0, 0x0)
0xce879d84 0xc028375a __cfs_file_read+0xaa (0xce879dc4, 0x0, 0xcf06d600,
0x80, 0xce879da0)
0xce879da8 0xc0283828 cfs_file_aio_read+0x38 (0xce879dc4, 0xcf06d600,
0x80, 0x0, 0x0)
0xce879e50 0xc016c3b3 do_sync_read+0xa3 (0xce7c31a0, 0xcf06d600, 0x80,
0xce879e8c, 0xce879000)
0xce879e74 0xc016c490 vfs_read+0xb0 (0xce7c31a0, 0xcf06d600, 0x80,
0xce879e8c, 0x0)
0xce879e9c 0xc017895a kernel_read+0x4a (0xce7c31a0, 0x0, 0xcf06d600, 0x80,
0xcf06d600)
0xce879ec0 0xc017946a prepare_binprm+0xca (0xcf06d600, 0x7fff, 0xc13b4080,
0x0, 0x0)
0xce879eec 0xc0179a16 ssi_do_execve+0x1a6 (0xcf012920, 0xce6f8800,
0xcf6aa400, 0xce879fa0, 0x0)
0xce879f78 0xc0245c3a rexecve_server+0xea (0xcf50e000, 0xcf47d9b0,
0xcf012920, 0xce6f8800, 0xcf6aa400)
0xce879fec 0xc02454f5 rexecve_server_setup+0x55
           0xc01023a5 kernel_thread_helper+0x5
[0]kdb> btp 67823
Stack traceback for pid 67823
0xce8f8b90    67823    67636  0    0   D  0xce8f8d70  restore
EBP        EIP        Function (args)
0xca2fcea0 0xc046c2e6 schedule+0x3a6 (0x0, 0xce8f8b90, 0xc013f0a0,
0xca2fced4, 0xca2fced4)
0xca2fcef4 0xc029f3ba cfs_wait_on_request+0x7a (0xc9a8c200, 0xca2fcf14,
0x0, 0x1, 0x0)
0xca2fcf24 0xc0285a9e cfs_wait_on_requests+0x8e (0xccb63be4, 0x0, 0x0,
0x0, 0xce7c3600)
0xca2fcf48 0xc0286f66 cfs_sync_inode+0x76 (0xccb63be4, 0x0, 0x0, 0x2,
0x0)
0xca2fcf80 0xc0283653 cfs_file_flush+0x93 (0xce7c3600, 0x81a4, 0xccdef200,
0x5, 0xccdef204)
0xca2fcf9c 0xc016bb3c filp_close+0x6c (0xce7c3600, 0xccdef200, 0xce7c3600,
0x5, 0x0)
0xca2fcfbc 0xc016bbce sys_close+0x6e
           0xc0105a3b syscall_call+0x7
[0]kdb>
[0]kdb> btp 67763
Stack traceback for pid 67763
0xce90b150    67763        2  0    0   D  0xce90b330  loop0
EBP        EIP        Function (args)
0xca488db8 0xc046c2e6 schedule+0x3a6 (0xca488e20)
0xca488dc4 0xc046d348 io_schedule+0x28 (0xc12711e0)
0xca488dd0 0xc014aed5 sync_page+0x45 (0xc11d6be0, 0x0, 0xc014ae90,
0xce90b150, 0xca488e20)
0xca488df0 0xc046d6fe __wait_on_bit_lock+0x5e (0x2, 0xc11d6be0,
0xc11d6be0, 0x0, 0x0)
0xca488e4c 0xc014b744 __lock_page+0x84 (0xc049efb5, 0xa7, 0xcd6ca600,
0x38002, 0x1)
0xca488ed4 0xc014beeb do_generic_mapping_read+0x3db (0xcb632f40,
0xcd6ca650, 0xcd6ca600, 0xca488f58, 0xca488ef4)
0xca488f04 0xc014c61b generic_file_sendfile+0x5b (0xcd6ca600, 0xca488f58,
0x1000, 0xd08f15d0, 0xca488f60)
0xca488f3c 0xc02838bd cfs_file_sendfile+0x8d (0xcd6ca600, 0xca488f58,
0x1000, 0xd08f15d0, 0xca488f60)
0xca488f74 0xd08f16fc [loop]do_lo_receive+0x5c (0xc9353000, 0xc4279630,
0x1000, 0x38002000, 0x0)
0xca488fa4 0xd08f176e [loop]lo_receive+0x5e (0xc9353000, 0xc1ed33e0,
0x1000, 0x38002000, 0x0)
0xca488fc8 0xd08f17eb [loop]do_bio_filebacked+0x4b (0xc9353000,
0xc1ed33e0, 0x0, 0xc9353138, 0xd08f1a60)
0xca488fec 0xd08f1b3b [loop]loop_thread+0xdb
           0xc01023a5 kernel_thread_helper+0x5




----------------------------------------------------------------------

Comment By: Roger Tsang (rogertsang)
Date: 2007-10-11 22:21

Message:
Logged In: YES
user_id=1246761
Originator: NO

Sounds like [ 686748 ] Filesystem stacking deadlock.

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2007-10-11 08:22

Message:
Logged In: YES
user_id=166336
Originator: YES

This is with the 2.6.11 kernel

----------------------------------------------------------------------
