2.6.31 xfs_fs_destroy_inode: cannot reclaim

Hello All,

We had this error reported on the list about 1 or 2 months ago. During that time a lot of fixes were applied. However, we still experience this problem with the recent 2.6.31 tree. We've also applied an extra log entry to aid in debugging.

printk("XFS: inode_init_always failed to re-initialize inode\n");

However, we didn't see this logging!

Here's a screenshot of our latest crash:
http://www.news-service.com/tmp/sb06-20090916.jpg

Here's the config used just in case:
http://www.news-service.com/tmp/config-2.6.31.txt

For now we've downgraded to 2.6.28 again. Please let me know if we can do something to better troubleshoot this. We have a set of 8 servers which can easily reproduce this. It mostly happens within a few days after a clean reboot.

Kind Regards,
Tommy van Leeuwen

--
**Warning** New Address from May 25th, 2009!
News-Service.com - European Usenet Provider
Pobox 12026, 1100 AA Amsterdam, Netherlands
http://www.news-service.com - +3120-3981111
_______________________________________________
xfs mailing list
xfs@...
http://oss.sgi.com/mailman/listinfo/xfs
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Wed, Sep 16, 2009 at 12:27:21PM +0200, Tommy van Leeuwen wrote:
> Hello All,
>
> We had this error reported on the list about 1 or 2 months ago. During
> that time a lot of fixes were applied. However, we still experience
> this problem with the recent 2.6.31 tree. We've also applied an extra
> log entry to aid in debugging.
>
> printk("XFS: inode_init_always failed to re-initialize inode\n");
>
> However, we didn't see this logging!

Can you try the patch below? It does two things:

 - remove all that reclaimable flagging if we reclaim the inode
   directly. This removes any possibility of racing with the reclaiming
   thread.
 - add asserts if one of the reclaim-related flags is already set.

Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c	2009-09-17 14:39:37.799003843 -0300
+++ xfs/fs/xfs/xfs_vnodeops.c	2009-09-17 14:50:14.987005862 -0300
@@ -2460,39 +2460,35 @@ int
 xfs_reclaim(
 	xfs_inode_t	*ip)
 {
-
 	xfs_itrace_entry(ip);
 
 	ASSERT(!VN_MAPPED(VFS_I(ip)));
 
 	/* bad inode, get out here ASAP */
-	if (is_bad_inode(VFS_I(ip))) {
-		xfs_ireclaim(ip);
-		return 0;
-	}
+	if (is_bad_inode(VFS_I(ip)))
+		goto out_reclaim;
 
 	xfs_ioend_wait(ip);
 
 	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0);
 
 	/*
+	 * We should never get here with one of the reclaim flags already set.
+	 */
+	BUG_ON(xfs_iflags_test(ip, XFS_IRECLAIMABLE));
+	BUG_ON(xfs_iflags_test(ip, XFS_IRECLAIM));
+
+	/*
 	 * If we have nothing to flush with this inode then complete the
-	 * teardown now, otherwise break the link between the xfs inode and the
-	 * linux inode and clean up the xfs inode later. This avoids flushing
-	 * the inode to disk during the delete operation itself.
-	 *
-	 * When breaking the link, we need to set the XFS_IRECLAIMABLE flag
-	 * first to ensure that xfs_iunpin() will never see an xfs inode
-	 * that has a linux inode being reclaimed. Synchronisation is provided
-	 * by the i_flags_lock.
+	 * teardown now, otherwise delay the flush operation.
 	 */
-	if (!ip->i_update_core && (ip->i_itemp == NULL)) {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		xfs_iflock(ip);
-		xfs_iflags_set(ip, XFS_IRECLAIMABLE);
-		return xfs_reclaim_inode(ip, 1, XFS_IFLUSH_DELWRI_ELSE_SYNC);
+	if (ip->i_update_core || ip->i_itemp) {
+		xfs_inode_set_reclaim_tag(ip);
+		return 0;
 	}
-	xfs_inode_set_reclaim_tag(ip);
+
+out_reclaim:
+	xfs_ireclaim(ip);
 	return 0;
 }
 
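The control flow that the patch leaves in xfs_reclaim() can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not kernel code: `struct toy_inode`, `toy_reclaim()` and the `reclaim_result` enum are hypothetical stand-ins, the kernel's BUG_ON() is replaced by assert(), and setting the `reclaimable` field is assumed to be the visible effect of xfs_inode_set_reclaim_tag().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; the real function always returns 0. */
enum reclaim_result { RECLAIMED_NOW, DEFERRED };

/* Hypothetical stand-in for the XFS inode; not the real layout. */
struct toy_inode {
	bool bad;          /* models is_bad_inode() */
	bool update_core;  /* models ip->i_update_core */
	void *itemp;       /* models ip->i_itemp, NULL if no log item */
	bool reclaimable;  /* models the XFS_IRECLAIMABLE flag */
	bool reclaim;      /* models the XFS_IRECLAIM flag */
};

enum reclaim_result toy_reclaim(struct toy_inode *ip)
{
	if (!ip->bad) {
		/* The patch insists that neither reclaim flag is set on entry. */
		assert(!ip->reclaimable);
		assert(!ip->reclaim);

		/* Something still needs flushing: tag the inode and defer. */
		if (ip->update_core || ip->itemp != NULL) {
			ip->reclaimable = true;
			return DEFERRED;
		}
	}

	/* out_reclaim: nothing to flush (or bad inode), tear down directly. */
	return RECLAIMED_NOW;
}
```

In this model a clean inode never has a reclaim flag set before it is torn down, which is the property the patch relies on to rule out a race with the reclaiming thread on that path.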
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

Christoph Hellwig wrote:
> On Wed, Sep 16, 2009 at 12:27:21PM +0200, Tommy van Leeuwen wrote:
>> Hello All,
>>
>> We had this error reported on the list about 1 or 2 months ago. During
>> that time a lot of fixes were applied. However, we still experience
>> this problem with the recent 2.6.31 tree. We've also applied an extra
>> log entry to aid in debugging.
>>
>> printk("XFS: inode_init_always failed to re-initialize inode\n");
>>
>> However, we didn't see this logging!
>
> Can you try the patch below? It does two things:
>
> - remove all that reclaimable flagging if we reclaim the inode
>   directly. This removes any possibility of racing with the reclaiming
>   thread.
> - add asserts if one of the reclaim-related flags is already set.

Update: We've applied this patch on 2 servers. They haven't crashed so far. Today we've applied the patch on 6 other servers. We'll keep you posted.

-Patrick
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Tue, Sep 29, 2009 at 12:15:42PM +0200, Patrick Schreurs wrote:
> Update: We've applied this patch on 2 servers. They didn't crash until
> now. Today we've applied the patch on 6 other servers.

Thanks. I'll prepare a patch for upstream as the patch is extremely useful by itself. If other issues show up I'll fix them on top of it.
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

Christoph Hellwig wrote:
> On Tue, Sep 29, 2009 at 12:15:42PM +0200, Patrick Schreurs wrote:
>> Update: We've applied this patch on 2 servers. They didn't crash until
>> now. Today we've applied the patch on 6 other servers.
>
> Thanks. I'll prepare a patch for upstream as the patch is extremely
> useful by itself. If other issues show up I'll fix them on top of it.

Unfortunately we had a crashing server last night. Please see attachment. Hope it helps. Please advise if there is anything we could do to assist you.

Thanks,
-Patrick
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Wed, Sep 30, 2009 at 12:48:55PM +0200, Patrick Schreurs wrote:
> Christoph Hellwig wrote:
>> On Tue, Sep 29, 2009 at 12:15:42PM +0200, Patrick Schreurs wrote:
>>> Update: We've applied this patch on 2 servers. They didn't crash
>>> until now. Today we've applied the patch on 6 other servers.
>>
>> Thanks. I'll prepare a patch for upstream as the patch is extremely
>> useful by itself. If other issues show up I'll fix them on top of it.
>
> Unfortunately we had a crashing server last night. Please see
> attachment. Hope it helps. Please advise if there is anything we could
> do to assist you.

Can't really see much there except some common code. Can you boot the machine with a larger console resolution (vga= kernel parameter) so a full backtrace can be captured?
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

Dear Christoph,
Yesterday two of our servers (2.6.31.1 + your patch) crashed again, this time we have a bigger console, but not the full backtrace unfortunately. I did manage to get some more calltrace info from the logs, which I have attached together with the screenshots of the crashscreens. I hope this info helps you. Kind Regards, Bas Couwenberg -- News-Service.com - European Usenet Provider Luttenbergweg 4, 1101 EC Amsterdam P.O BOX: 12026 1100 AA, Netherlands http://www.news-service.com +31(0)20 398 1111 Oct 1 22:44:01 sb06 kernel: Oct 1 22:44:01 sb06 kernel: Call Trace: Oct 1 22:44:01 sb06 kernel: [<ffffffff810e69e4>] ? xfs_bmap_read_extents+0x274/0x30c Oct 1 22:44:01 sb06 kernel: [<ffffffff810e7f44>] ? xfs_bmapi+0x25d/0xea8 Oct 1 22:44:01 sb06 kernel: [<ffffffff8113de7c>] ? swiotlb_map_page+0x73/0xe1 Oct 1 22:44:01 sb06 kernel: [<ffffffff81055e8c>] ? find_get_page+0x1a/0x77 Oct 1 22:44:01 sb06 kernel: [<ffffffff8105689d>] ? find_or_create_page+0x2d/0x88 Oct 1 22:44:01 sb06 kernel: [<ffffffff81103e59>] ? xfs_iomap+0x145/0x284 Oct 1 22:44:01 sb06 kernel: [<ffffffff811174cb>] ? __xfs_get_blocks+0x6c/0x15c Oct 1 22:44:01 sb06 kernel: [<ffffffff811175cc>] ? xfs_get_blocks+0x0/0xe Oct 1 22:44:01 sb06 kernel: [<ffffffff811175cc>] ? xfs_get_blocks+0x0/0xe Oct 1 22:44:01 sb06 kernel: [<ffffffff810a0fd6>] ? mpage_readpages+0xbd/0xff Oct 1 22:44:01 sb06 kernel: [<ffffffff81102664>] ? xfs_iread+0x152/0x166 Oct 1 22:44:01 sb06 kernel: [<ffffffff8103e723>] ? bit_waitqueue+0x10/0x8b Oct 1 22:44:01 sb06 kernel: [<ffffffff8105cb1c>] ? __do_page_cache_readahead+0x125/0x1b1 Oct 1 22:44:01 sb06 kernel: [<ffffffff8105cdb6>] ? ondemand_readahead+0x11f/0x1a7 Oct 1 22:44:01 sb06 kernel: [<ffffffff810a0b84>] ? do_mpage_readpage+0x163/0x486 Oct 1 22:44:01 sb06 kernel: [<ffffffff81136dd2>] ? radix_tree_insert+0xd7/0x19f Oct 1 22:44:01 sb06 kernel: [<ffffffff8105626b>] ? add_to_page_cache_locked+0x72/0x98 Oct 1 22:44:01 sb06 kernel: [<ffffffff811175cc>] ? 
xfs_get_blocks+0x0/0xe Oct 1 22:44:01 sb06 kernel: [<ffffffff8111e0c8>] ? xfs_read+0x16e/0x1de Oct 1 22:44:01 sb06 kernel: [<ffffffff8107c889>] ? do_sync_read+0xce/0x113 Oct 1 22:44:01 sb06 kernel: [<ffffffff8107d3ec>] ? sys_read+0x45/0x6e Oct 1 22:44:01 sb06 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 Oct 1 22:44:01 sb06 kernel: IP: [<ffffffff810f9516>] xfs_dir2_sf_lookup+0xe3/0x219 Oct 1 22:44:01 sb06 kernel: Oops: 0000 [#1] SMP Oct 1 22:44:01 sb06 kernel: CPU 2 Oct 1 22:44:01 sb06 kernel: Pid: 6804, comm: diablo Not tainted 2.6.31.1xfspatch #4 PowerEdge 1950 Oct 1 22:44:01 sb06 kernel: RSP: 0018:ffff88017ce8db68 EFLAGS: 00010202 Oct 1 22:44:01 sb06 kernel: RAX: 0000000000000006 RBX: 0000000000000000 RCX: 00000000e62cdb77 Oct 1 22:44:01 sb06 kernel: RDX: 00000000e62cc212 RSI: 0000000000000002 RDI: ffff88017ce8dbb8 Oct 1 22:44:01 sb06 kernel: [<ffffffff811175cc>] ? xfs_get_blocks+0x0/0xe Oct 1 22:44:01 sb06 kernel: [<ffffffff8105ad0c>] ? __alloc_pages_nodemask+0xf8/0x524 Oct 1 22:44:01 sb06 kernel: FS: 0000000001369860(0063) GS:ffff880028066000(0000) knlGS:0000000000000000 Oct 1 22:44:01 sb06 kernel: [<ffffffff810574e4>] ? generic_file_aio_read+0x1ff/0x548 Oct 1 22:44:01 sb06 kernel: [<ffffffff8103e7ed>] ? autoremove_wake_function+0x0/0x2e Oct 1 22:44:01 sb06 kernel: [<ffffffff8107d294>] ? vfs_read+0xaa/0x146 Oct 1 22:44:01 sb06 kernel: [<ffffffff8100adab>] ? 
system_call_fastpath+0x16/0x1b Oct 1 22:44:01 sb06 kernel: PGD 17ce81067 PUD 17ce82067 PMD 0 Oct 1 22:44:01 sb06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map Oct 1 22:44:01 sb06 kernel: Modules linked in: acpi_cpufreq cpufreq_ondemand ipmi_si ipmi_devintf ipmi_msghandler bonding serio_raw mptspi rng_core scsi_transport_spi bnx2 processor thermal 8250_pnp 8250 serial_core thermal_sys Oct 1 22:44:01 sb06 kernel: RIP: 0010:[<ffffffff810f9516>] [<ffffffff810f9516>] xfs_dir2_sf_lookup+0xe3/0x219 Oct 1 22:44:01 sb06 kernel: CR2: 0000000000000001 CR3: 000000017ce80000 CR4: 00000000000006a0 Oct 1 22:44:01 sb06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Oct 1 22:44:01 sb06 kernel: RBP: 0000000000000000 R08: ffff880005cc3c00 R09: ffff88022d867080 Oct 1 22:44:01 sb06 kernel: R10: ffffffff813457b0 R11: ffff88017f661cd0 R12: ffff88017ce8dbb8 Oct 1 22:44:01 sb06 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff88017ce8dc98 Oct 1 22:44:01 sb06 kernel: Process diablo (pid: 6804, threadinfo ffff88017ce8c000, task ffff88022d867080) Oct 1 22:44:01 sb06 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Oct 1 22:44:01 sb06 kernel: <0> 0000000000000000 0000000000000000 ffff88017ce8dc98 ffffffff810f2754 Oct 1 22:44:01 sb06 kernel: <0> ffff8800a540b900 ffff88022d8672f8 ffff880154034e20 0000000000000006 Oct 1 22:44:01 sb06 kernel: [<ffffffff810f2754>] ? xfs_dir_lookup+0xa5/0x147 Oct 1 22:44:01 sb06 kernel: [<ffffffff81083b3d>] ? do_lookup+0xd5/0x1b3 Oct 1 22:44:01 sb06 kernel: [<ffffffff810858f0>] ? __link_path_walk+0x966/0xe0d Oct 1 22:44:01 sb06 kernel: [<ffffffff8107db53>] ? get_empty_filp+0x70/0x119 Oct 1 22:44:01 sb06 kernel: [<ffffffff81085fc5>] ? path_walk+0x66/0xca Oct 1 22:44:01 sb06 kernel: [<ffffffff8108ee1c>] ? alloc_fd+0x67/0x10b Oct 1 22:44:01 sb06 kernel: [<ffffffff8100adab>] ? 
system_call_fastpath+0x16/0x1b Oct 1 22:44:01 sb06 kernel: RIP [<ffffffff810f9516>] xfs_dir2_sf_lookup+0xe3/0x219 Oct 1 22:44:01 sb06 kernel: CR2: 0000000000000001 Oct 1 22:44:01 sb06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Oct 1 22:44:01 sb06 kernel: Stack: Oct 1 22:44:01 sb06 kernel: 00000000000107c0 ffff880005cc3c00 0000000000000000 ffff88017ce8dbb8 Oct 1 22:44:01 sb06 kernel: Call Trace: Oct 1 22:44:01 sb06 kernel: [<ffffffff81114cc0>] ? xfs_lookup+0x47/0xa3 Oct 1 22:44:01 sb06 kernel: [<ffffffff8111c885>] ? xfs_vn_lookup+0x3c/0x7b Oct 1 22:44:01 sb06 kernel: [<ffffffff81083acb>] ? do_lookup+0x63/0x1b3 Oct 1 22:44:01 sb06 kernel: [<ffffffff8108ad14>] ? dput+0x23/0x13d Oct 1 22:44:01 sb06 kernel: [<ffffffff810860f7>] ? do_path_lookup+0x20/0x41 Oct 1 22:44:01 sb06 kernel: [<ffffffff81086c68>] ? do_filp_open+0xe3/0x92a Oct 1 22:44:01 sb06 kernel: [<ffffffff8107b24d>] ? do_sys_open+0x55/0x103 Oct 1 22:44:01 sb06 kernel: Code: 18 09 c2 0f b6 43 07 c1 e0 10 09 c2 0f b6 43 08 c1 e0 08 09 c2 48 09 d1 49 89 4c 24 28 41 c7 44 24 7c 01 00 00 00 e9 d2 00 00 00 <80> 7b 01 01 19 c0 45 31 ff 83 e0 fc 45 31 ed 83 c0 0a 48 98 48 Oct 1 22:44:01 sb06 kernel: RSP <ffff88017ce8db68> Oct 1 22:44:01 sb06 kernel: ---[ end trace 6e14835b29b5648a ]--- Oct 1 22:44:01 sb06 kernel: Filesystem "sdt": XFS internal error xfs_bmap_read_extents(1) at line 4648 of file fs/xfs/xfs_bmap.c. Caller 0xffffffff81101202 Oct 1 22:44:01 sb06 kernel: Pid: 6771, comm: diablo Not tainted 2.6.31.1xfspatch #4 Oct 1 22:44:01 sb06 kernel: [<ffffffff81101202>] ? xfs_iread_extents+0xac/0xc8 Oct 1 22:44:01 sb06 kernel: [<ffffffff81101202>] ? xfs_iread_extents+0xac/0xc8 Oct 1 22:44:01 sb06 kernel: [<ffffffff810fe917>] ? xfs_iext_bno_to_ext+0xba/0x140 Oct 1 22:44:01 sb06 kernel: [<ffffffffa0042973>] ? bnx2_start_xmit+0x19a/0x3db [bnx2] Oct 1 22:44:01 sb06 kernel: [<ffffffff81056104>] ? find_lock_page+0x15/0x50 Oct 1 22:44:01 sb06 kernel: [<ffffffff812409c8>] ? 
__down_write_nested+0x15/0x9d Oct 1 22:44:01 sb06 kernel: [<ffffffff81116bda>] ? kmem_zone_alloc+0x5e/0xa4 Oct 1 22:44:01 sb06 kernel: [<ffffffff8105689d>] ? find_or_create_page+0x2d/0x88 Oct 1 22:44:01 sb06 kernel: Filesystem "sdt": corrupt dinode 1208050920, (btree extents). Unmount and run xfs_repair. Oct 1 22:44:01 sb06 kernel: Oct 1 22:44:01 sb06 kernel: Call Trace: Oct 1 22:44:01 sb06 kernel: [<ffffffff810e69e4>] ? xfs_bmap_read_extents+0x274/0x30c Oct 1 22:44:01 sb06 kernel: [<ffffffff810e7f44>] ? xfs_bmapi+0x25d/0xea8 Oct 1 22:44:01 sb06 kernel: [<ffffffff8113de7c>] ? swiotlb_map_page+0x73/0xe1 Oct 1 22:44:01 sb06 kernel: [<ffffffff81055e8c>] ? find_get_page+0x1a/0x77 Oct 1 22:44:01 sb06 kernel: [<ffffffff8105689d>] ? find_or_create_page+0x2d/0x88 Oct 1 22:44:01 sb06 kernel: [<ffffffff81103e59>] ? xfs_iomap+0x145/0x284 Oct 1 22:44:01 sb06 kernel: [<ffffffff811174cb>] ? __xfs_get_blocks+0x6c/0x15c Oct 1 22:44:01 sb06 kernel: [<ffffffff811175cc>] ? xfs_get_blocks+0x0/0xe Oct 1 22:44:01 sb06 kernel: [<ffffffff811175cc>] ? xfs_get_blocks+0x0/0xe Oct 1 22:44:01 sb06 kernel: [<ffffffff810a0fd6>] ? mpage_readpages+0xbd/0xff Oct 1 22:44:01 sb06 kernel: [<ffffffff81102664>] ? xfs_iread+0x152/0x166 Oct 1 22:44:01 sb06 kernel: [<ffffffff8103e723>] ? bit_waitqueue+0x10/0x8b Oct 1 22:44:01 sb06 kernel: [<ffffffff8105cb1c>] ? __do_page_cache_readahead+0x125/0x1b1 Oct 1 22:44:01 sb06 kernel: [<ffffffff8105cdb6>] ? ondemand_readahead+0x11f/0x1a7 Oct 1 22:44:01 sb06 kernel: [<ffffffff810a0b84>] ? do_mpage_readpage+0x163/0x486 Oct 1 22:44:01 sb06 kernel: [<ffffffff81136dd2>] ? radix_tree_insert+0xd7/0x19f Oct 1 22:44:01 sb06 kernel: [<ffffffff8105626b>] ? add_to_page_cache_locked+0x72/0x98 Oct 1 22:44:01 sb06 kernel: [<ffffffff811175cc>] ? xfs_get_blocks+0x0/0xe Oct 1 22:44:01 sb06 kernel: [<ffffffff8111e0c8>] ? xfs_read+0x16e/0x1de Oct 1 22:44:01 sb06 kernel: [<ffffffff8107c889>] ? do_sync_read+0xce/0x113 Oct 1 22:44:01 sb06 kernel: [<ffffffff8107d3ec>] ? 
sys_read+0x45/0x6e Oct 1 22:44:01 sb06 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 Oct 1 22:44:01 sb06 kernel: IP: [<ffffffff810f9516>] xfs_dir2_sf_lookup+0xe3/0x219 Oct 1 22:44:01 sb06 kernel: Oops: 0000 [#1] SMP Oct 1 22:44:01 sb06 kernel: CPU 2 Oct 1 22:44:01 sb06 kernel: Pid: 6804, comm: diablo Not tainted 2.6.31.1xfspatch #4 PowerEdge 1950 Oct 1 22:44:01 sb06 kernel: RSP: 0018:ffff88017ce8db68 EFLAGS: 00010202 Oct 1 22:44:01 sb06 kernel: RAX: 0000000000000006 RBX: 0000000000000000 RCX: 00000000e62cdb77 Oct 1 22:44:01 sb06 kernel: RDX: 00000000e62cc212 RSI: 0000000000000002 RDI: ffff88017ce8dbb8 Oct 1 22:44:01 sb06 kernel: [<ffffffff811175cc>] ? xfs_get_blocks+0x0/0xe Oct 1 22:44:01 sb06 kernel: [<ffffffff8105ad0c>] ? __alloc_pages_nodemask+0xf8/0x524 Oct 1 22:44:01 sb06 kernel: FS: 0000000001369860(0063) GS:ffff880028066000(0000) knlGS:0000000000000000 Oct 1 22:44:01 sb06 kernel: [<ffffffff810574e4>] ? generic_file_aio_read+0x1ff/0x548 Oct 1 22:44:01 sb06 kernel: [<ffffffff8103e7ed>] ? autoremove_wake_function+0x0/0x2e Oct 1 22:44:01 sb06 kernel: [<ffffffff8107d294>] ? vfs_read+0xaa/0x146 Oct 1 22:44:01 sb06 kernel: [<ffffffff8100adab>] ? 
system_call_fastpath+0x16/0x1b Oct 1 22:44:01 sb06 kernel: PGD 17ce81067 PUD 17ce82067 PMD 0 Oct 1 22:44:01 sb06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map Oct 1 22:44:01 sb06 kernel: Modules linked in: acpi_cpufreq cpufreq_ondemand ipmi_si ipmi_devintf ipmi_msghandler bonding serio_raw mptspi rng_core scsi_transport_spi bnx2 processor thermal 8250_pnp 8250 serial_core thermal_sys Oct 1 22:44:01 sb06 kernel: RIP: 0010:[<ffffffff810f9516>] [<ffffffff810f9516>] xfs_dir2_sf_lookup+0xe3/0x219 Oct 1 22:44:01 sb06 kernel: CR2: 0000000000000001 CR3: 000000017ce80000 CR4: 00000000000006a0 Oct 1 22:44:01 sb06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Oct 1 22:44:01 sb06 kernel: RBP: 0000000000000000 R08: ffff880005cc3c00 R09: ffff88022d867080 Oct 1 22:44:01 sb06 kernel: R10: ffffffff813457b0 R11: ffff88017f661cd0 R12: ffff88017ce8dbb8 Oct 1 22:44:01 sb06 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff88017ce8dc98 Oct 1 22:44:01 sb06 kernel: Process diablo (pid: 6804, threadinfo ffff88017ce8c000, task ffff88022d867080) Oct 1 22:44:01 sb06 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Oct 1 22:44:01 sb06 kernel: <0> 0000000000000000 0000000000000000 ffff88017ce8dc98 ffffffff810f2754 Oct 1 22:44:01 sb06 kernel: <0> ffff8800a540b900 ffff88022d8672f8 ffff880154034e20 0000000000000006 Oct 1 22:44:01 sb06 kernel: [<ffffffff810f2754>] ? xfs_dir_lookup+0xa5/0x147 Oct 1 22:44:01 sb06 kernel: [<ffffffff81083b3d>] ? do_lookup+0xd5/0x1b3 Oct 1 22:44:01 sb06 kernel: [<ffffffff810858f0>] ? __link_path_walk+0x966/0xe0d Oct 1 22:44:01 sb06 kernel: [<ffffffff8107db53>] ? get_empty_filp+0x70/0x119 Oct 1 22:44:01 sb06 kernel: [<ffffffff81085fc5>] ? path_walk+0x66/0xca Oct 1 22:44:01 sb06 kernel: [<ffffffff8108ee1c>] ? alloc_fd+0x67/0x10b Oct 1 22:44:01 sb06 kernel: [<ffffffff8100adab>] ? 
system_call_fastpath+0x16/0x1b Oct 1 22:44:01 sb06 kernel: RIP [<ffffffff810f9516>] xfs_dir2_sf_lookup+0xe3/0x219 Oct 1 22:44:01 sb06 kernel: CR2: 0000000000000001 Oct 1 22:44:01 sb06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Oct 1 22:44:01 sb06 kernel: Stack: Oct 1 22:44:01 sb06 kernel: 00000000000107c0 ffff880005cc3c00 0000000000000000 ffff88017ce8dbb8 Oct 1 22:44:01 sb06 kernel: Call Trace: Oct 1 22:44:01 sb06 kernel: [<ffffffff81114cc0>] ? xfs_lookup+0x47/0xa3 Oct 1 22:44:01 sb06 kernel: [<ffffffff8111c885>] ? xfs_vn_lookup+0x3c/0x7b Oct 1 22:44:01 sb06 kernel: [<ffffffff81083acb>] ? do_lookup+0x63/0x1b3 Oct 1 22:44:01 sb06 kernel: [<ffffffff8108ad14>] ? dput+0x23/0x13d Oct 1 22:44:01 sb06 kernel: [<ffffffff810860f7>] ? do_path_lookup+0x20/0x41 Oct 1 22:44:01 sb06 kernel: [<ffffffff81086c68>] ? do_filp_open+0xe3/0x92a Oct 1 22:44:01 sb06 kernel: [<ffffffff8107b24d>] ? do_sys_open+0x55/0x103 Oct 1 22:44:01 sb06 kernel: Code: 18 09 c2 0f b6 43 07 c1 e0 10 09 c2 0f b6 43 08 c1 e0 08 09 c2 48 09 d1 49 89 4c 24 28 41 c7 44 24 7c 01 00 00 00 e9 d2 00 00 00 <80> 7b 01 01 19 c0 45 31 ff 83 e0 fc 45 31 ed 83 c0 0a 48 98 48 Oct 1 22:44:01 sb06 kernel: RSP <ffff88017ce8db68> Oct 1 22:44:01 sb06 kernel: ---[ end trace 6e14835b29b5648a ]--- Oct 1 22:45:04 sb06 kernel: ------------[ cut here ]------------ Oct 1 22:45:04 sb06 kernel: invalid opcode: 0000 [#2] SMP Oct 1 22:45:04 sb06 kernel: CPU 2 Oct 1 22:45:04 sb06 kernel: kernel BUG at fs/xfs/xfs_iget.c:334! 
Oct 1 22:45:04 sb06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map Oct 1 22:45:04 sb06 kernel: Modules linked in: acpi_cpufreq cpufreq_ondemand ipmi_si ipmi_devintf ipmi_msghandler bonding serio_raw mptspi rng_core scsi_transport_spi bnx2 processor thermal 8250_pnp 8250 serial_core thermal_sys Oct 1 22:45:04 sb06 kernel: RIP: 0010:[<ffffffff810fe33f>] [<ffffffff810fe33f>] xfs_iget+0x2e3/0x424 Oct 1 22:45:04 sb06 kernel: RDX: ffff880119c19080 RSI: 0000000000000296 RDI: ffff880005cc3c8c Oct 1 22:45:04 sb06 kernel: R10: 0000000000000002 R11: 0001400100014004 R12: ffff88022d0c783c Oct 1 22:45:04 sb06 kernel: FS: 0000000001369860(0063) GS:ffff880028066000(0000) knlGS:0000000000000000 Oct 1 22:45:04 sb06 kernel: CR2: 00007faaff8f2000 CR3: 00000001f54b3000 CR4: 00000000000006a0 Oct 1 22:45:04 sb06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Oct 1 22:45:04 sb06 kernel: Stack: Oct 1 22:45:04 sb06 kernel: <0> 000000000000dd70 00000000000001bb ffff8800642bdb70 0000000100000004 Oct 1 22:45:04 sb06 kernel: Pid: 17264, comm: diablo Tainted: G D 2.6.31.1xfspatch #4 PowerEdge 1950 Oct 1 22:45:04 sb06 kernel: RSP: 0018:ffff8800642bdab8 EFLAGS: 00010246 Oct 1 22:45:04 sb06 kernel: RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffffffff81102664 Oct 1 22:45:04 sb06 kernel: RBP: ffff880005cc3c00 R08: 0000000000000001 R09: ffff88022c415400 Oct 1 22:45:04 sb06 kernel: R13: ffff88022d0c7800 R14: 000000000000001b R15: 0000000000000001 Oct 1 22:45:04 sb06 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 1 22:45:04 sb06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Oct 1 22:45:04 sb06 kernel: Process diablo (pid: 17264, threadinfo ffff8800642bc000, task ffff88010002ad00) Oct 1 22:45:04 sb06 kernel: ffff8800022623c0 000000000000dd70 00000001015315f8 000000000000dd70 Oct 1 22:45:04 sb06 kernel: <0> 00000000000001bb ffff880001e692c0 ffff88022c415400 000001bb2d3d5400 Oct 1 22:45:04 sb06 
kernel: Call Trace: Oct 1 22:45:04 sb06 kernel: [<ffffffff811125f6>] ? xfs_trans_iget+0xa5/0xd3 Oct 1 22:45:04 sb06 kernel: [<ffffffff81100c9a>] ? xfs_ialloc+0xac/0x568 Oct 1 22:45:04 sb06 kernel: [<ffffffff81112eba>] ? xfs_dir_ialloc+0x84/0x2a2 Oct 1 22:45:04 sb06 kernel: [<ffffffff811111a4>] ? xfs_trans_reserve+0xda/0x1af Oct 1 22:45:04 sb06 kernel: [<ffffffff812409c8>] ? __down_write_nested+0x15/0x9d Oct 1 22:45:04 sb06 kernel: [<ffffffff81114aaf>] ? xfs_create+0x27e/0x448 Oct 1 22:45:04 sb06 kernel: [<ffffffff81114ccc>] ? xfs_lookup+0x53/0xa3 Oct 1 22:45:04 sb06 kernel: [<ffffffff8111ca06>] ? xfs_vn_mknod+0x9c/0xf2 Oct 1 22:45:04 sb06 kernel: [<ffffffff810844a3>] ? vfs_create+0x6e/0xb7 Oct 1 22:45:04 sb06 kernel: [<ffffffff81086e53>] ? do_filp_open+0x2ce/0x92a Oct 1 22:45:04 sb06 kernel: [<ffffffff8107b24d>] ? do_sys_open+0x55/0x103 Oct 1 22:45:04 sb06 kernel: [<ffffffff8100adab>] ? system_call_fastpath+0x16/0x1b Oct 1 22:45:04 sb06 kernel: Code: 00 00 bf d0 00 00 00 e8 7a 8b 03 00 85 c0 0f 85 cd 00 00 00 83 7c 24 38 00 74 14 8b 74 24 38 48 89 ef e8 ff f8 ff ff 85 c0 75 04 <0f> 0b eb fe 4c 89 e7 e8 92 28 14 00 44 88 f1 8b 74 24 5c b8 01 Oct 1 22:45:04 sb06 kernel: RIP [<ffffffff810fe33f>] xfs_iget+0x2e3/0x424 Oct 1 22:45:04 sb06 kernel: RSP <ffff8800642bdab8> Oct 1 22:45:04 sb06 kernel: ---[ end trace 6e14835b29b5648b ]--- Oct 1 22:45:04 sb06 kernel: ------------[ cut here ]------------ Oct 1 22:45:04 sb06 kernel: kernel BUG at fs/xfs/xfs_iget.c:334! 
Oct 1 22:45:04 sb06 kernel: invalid opcode: 0000 [#3] SMP Oct 1 22:45:04 sb06 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map Oct 1 22:45:04 sb06 kernel: CPU 2 Oct 1 22:45:04 sb06 kernel: Modules linked in: acpi_cpufreq cpufreq_ondemand ipmi_si ipmi_devintf ipmi_msghandler bonding serio_raw mptspi rng_core scsi_transport_spi bnx2 processor thermal 8250_pnp 8250 serial_core thermal_sys Oct 1 22:45:04 sb06 kernel: Pid: 17326, comm: diablo Tainted: G D 2.6.31.1xfspatch #4 PowerEdge 1950 Oct 1 22:45:04 sb06 kernel: RIP: 0010:[<ffffffff810fe33f>] [<ffffffff810fe33f>] xfs_iget+0x2e3/0x424 Oct 1 22:45:04 sb06 kernel: RSP: 0018:ffff88000fa79ab8 EFLAGS: 00010246 Oct 1 22:45:04 sb06 kernel: RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffffffff81102664 Oct 1 22:45:04 sb06 kernel: RDX: ffff880119c18780 RSI: 0000000000000296 RDI: ffff880005cc3c8c Oct 1 22:45:04 sb06 kernel: RBP: ffff880005cc3c00 R08: 0000000000000001 R09: ffff88022f21dc00 Oct 1 22:45:04 sb06 kernel: R10: 0000000000000002 R11: 0001400100014004 R12: ffff88022ebc383c Oct 1 22:45:04 sb06 kernel: R13: ffff88022ebc3800 R14: 000000000000001b R15: 0000000000000001 Oct 1 22:45:04 sb06 kernel: FS: 0000000001369860(0063) GS:ffff880028066000(0000) knlGS:0000000000000000 Oct 1 22:45:04 sb06 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 1 22:45:04 sb06 kernel: CR2: 00007fffd3a2ce18 CR3: 0000000135a9d000 CR4: 00000000000006a0 Oct 1 22:45:04 sb06 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Oct 1 22:45:04 sb06 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Oct 1 22:45:04 sb06 kernel: Process diablo (pid: 17326, threadinfo ffff88000fa78000, task ffff8801000ccec0) Oct 1 22:45:04 sb06 kernel: Stack: Oct 1 22:45:04 sb06 kernel: ffff880101276180 000000000000dd70 0000000107e5b000 000000000000dd70 Oct 1 22:45:04 sb06 kernel: <0> 000000000000dd70 00000000000001a6 ffff88000fa79b70 0000000100000004 Oct 1 22:45:04 sb06 kernel: <0> 
00000000000001a6 ffff8800c27d55e0 ffff88022f21dc00 000001a62fa9b400 Oct 1 22:45:04 sb06 kernel: Call Trace: Oct 1 22:45:04 sb06 kernel: [<ffffffff811125f6>] ? xfs_trans_iget+0xa5/0xd3 Oct 1 22:45:04 sb06 kernel: [<ffffffff81100c9a>] ? xfs_ialloc+0xac/0x568 Oct 1 22:45:04 sb06 kernel: [<ffffffff81112eba>] ? xfs_dir_ialloc+0x84/0x2a2 Oct 1 22:45:04 sb06 kernel: [<ffffffff811111a4>] ? xfs_trans_reserve+0xda/0x1af Oct 1 22:45:04 sb06 kernel: [<ffffffff812409c8>] ? __down_write_nested+0x15/0x9d Oct 1 22:45:04 sb06 kernel: [<ffffffff81114aaf>] ? xfs_create+0x27e/0x448 Oct 1 22:45:04 sb06 kernel: [<ffffffff81114ccc>] ? xfs_lookup+0x53/0xa3 Oct 1 22:45:04 sb06 kernel: [<ffffffff8111ca06>] ? xfs_vn_mknod+0x9c/0xf2 Oct 1 22:45:04 sb06 kernel: [<ffffffff810844a3>] ? vfs_create+0x6e/0xb7 Oct 1 22:45:04 sb06 kernel: [<ffffffff81086e53>] ? do_filp_open+0x2ce/0x92a Oct 1 22:45:04 sb06 kernel: [<ffffffff8107b24d>] ? do_sys_open+0x55/0x103 Oct 1 22:45:04 sb06 kernel: [<ffffffff8100adab>] ? system_call_fastpath+0x16/0x1b Oct 1 22:45:04 sb06 kernel: Code: 00 00 bf d0 00 00 00 e8 7a 8b 03 00 85 c0 0f 85 cd 00 00 00 83 7c 24 38 00 74 14 8b 74 24 38 48 89 ef e8 ff f8 ff ff 85 c0 75 04 <0f> 0b eb fe 4c 89 e7 e8 92 28 14 00 44 88 f1 8b 74 24 5c b8 01 Oct 1 22:45:04 sb06 kernel: RIP [<ffffffff810fe33f>] xfs_iget+0x2e3/0x424 Oct 1 22:45:04 sb06 kernel: RSP <ffff88000fa79ab8> Oct 1 22:45:04 sb06 kernel: ---[ end trace 6e14835b29b5648c ]--- _______________________________________________ xfs mailing list xfs@... http://oss.sgi.com/mailman/listinfo/xfs |
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Fri, Oct 02, 2009 at 04:24:39PM +0200, Bas Couwenberg wrote:
> Dear Christoph,
>
> Yesterday two of our servers (2.6.31.1 + your patch) crashed again, this
> time we have a bigger console, but not the full backtrace unfortunately.
>
> I did manage to get some more calltrace info from the logs, which I have
> attached together with the screenshots of the crashscreens.
>
> I hope this info helps you.

It helps a bit, but not so much. I suspect it could be a double free of an inode, and I have identified a possible race window that could explain it. But all the traces are really weird and I think they only show later symptoms of something that happened earlier. I'll come up with a patch for the race window ASAP, but could you in the meantime turn on CONFIG_XFS_DEBUG for the test kernel to see if it triggers somewhere, and additionally apply the tiny patch below for additional debugging?

Subject: xfs: check for not fully initialized inodes in xfs_ireclaim
From: Christoph Hellwig <hch@...>

Add an assert for inodes not added to the inode cache in xfs_ireclaim, to make sure we're not going to introduce something like the famous nfsd inode cache bug again.

Signed-off-by: Christoph Hellwig <hch@...>

Index: linux-2.6/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_iget.c	2009-08-10 11:30:55.729724742 -0300
+++ linux-2.6/fs/xfs/xfs_iget.c	2009-08-10 11:40:15.271748324 -0300
@@ -535,17 +535,21 @@ xfs_ireclaim(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_perag	*pag;
+	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
 
 	XFS_STATS_INC(xs_ig_reclaims);
 
 	/*
-	 * Remove the inode from the per-AG radix tree. It doesn't matter
-	 * if it was never added to it because radix_tree_delete can deal
-	 * with that case just fine.
+	 * Remove the inode from the per-AG radix tree.
+	 *
+	 * Because radix_tree_delete won't complain even if the item was never
+	 * added to the tree assert that it's been there before to catch
+	 * problems with the inode life time early on.
 	 */
 	pag = xfs_get_perag(mp, ip->i_ino);
 	write_lock(&pag->pag_ici_lock);
-	radix_tree_delete(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino));
+	ASSERT(radix_tree_lookup(&pag->pag_ici_root, agino));
+	radix_tree_delete(&pag->pag_ici_root, agino);
 	write_unlock(&pag->pag_ici_lock);
 	xfs_put_perag(mp, pag);
 
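The debugging idea in this patch, looking an entry up before deleting it because the delete itself stays silent on a missing key, can be illustrated outside the kernel. The sketch below is an assumption-laden toy: `struct toy_index` is a fixed array standing in for the per-AG radix tree, and `toy_lookup()`, `toy_delete()` and `toy_checked_delete()` are hypothetical names. Where the kernel patch uses ASSERT(), the toy returns an error code so the failure can be observed without aborting.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 64

/* Hypothetical index: a fixed array standing in for the radix tree. */
struct toy_index {
	void *slot[TOY_SLOTS];
};

void *toy_lookup(const struct toy_index *idx, unsigned int key)
{
	return idx->slot[key % TOY_SLOTS];
}

/* Deleting a key that is not present is silent and returns NULL. */
void *toy_delete(struct toy_index *idx, unsigned int key)
{
	void *old = idx->slot[key % TOY_SLOTS];

	idx->slot[key % TOY_SLOTS] = NULL;
	return old;
}

/*
 * Returns 0 if the entry was present and has been removed, -1 if it was
 * never indexed or has already been removed (a possible double free).
 */
int toy_checked_delete(struct toy_index *idx, unsigned int key)
{
	if (toy_lookup(idx, key) == NULL)
		return -1;
	toy_delete(idx, key);
	return 0;
}
```

Calling toy_checked_delete() twice for the same key reports the second call as an error, which is the kind of early signal the added assertion is meant to give for inode lifetime problems.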
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

Christoph Hellwig wrote:
> It helps a bit, but not so much. I suspect it could be a double free
> of an inode, and I have identified a possible race window that could
> explain it. But all the traces are really weird and I think only show
> later symptoms of something that happened earlier. I'll come up with
> a patch for the race window ASAP, but could you in the meantime turn on
> CONFIG_XFS_DEBUG for the test kernel to see if it triggers somewhere
> and additionally apply the tiny patch below for additional debugging?

Will try this.

Could this by any chance be related (from 2.6.32.2)?

commit 2f0ffb7ef75a9ad6140899f6d4df45e8a73a013e
Author: Jan Kara <jack@...>
Date: Mon Sep 21 17:01:06 2009 -0700

    fs: make sure data stored into inode is properly seen before unlocking new inode

    commit 580be0837a7a59b207c3d5c661d044d8dd0a6a30 upstream.

    In theory it could happen that on one CPU we initialize a new inode but
    clearing of I_NEW | I_LOCK gets reordered before some of the
    initialization. Thus on another CPU we return not fully uptodate inode
    from iget_locked().

    This seems to fix a corruption issue on ext3 mounted over NFS.

Thanks,
Patrick Schreurs
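The reordering problem described in the quoted commit message (a "new" flag being cleared before the initialization it guards is visible to other CPUs) can be sketched with C11 atomics. This is a swapped-in user-space technique, not how the kernel commit is implemented: `struct toy_pub_inode`, `toy_pub_init()`, `toy_unlock_new()` and `toy_try_read()` are hypothetical names, and the single `is_new` flag only loosely models I_NEW.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object published from one thread to another. */
struct toy_pub_inode {
	int payload;         /* data that must be initialised first */
	atomic_bool is_new;  /* loosely models the I_NEW state bit */
};

void toy_pub_init(struct toy_pub_inode *ip)
{
	ip->payload = 0;
	atomic_init(&ip->is_new, true);
}

/*
 * Writer: finish initialisation, then clear the flag. The release store
 * keeps the payload write from being reordered after the flag update.
 */
void toy_unlock_new(struct toy_pub_inode *ip, int value)
{
	ip->payload = value;
	atomic_store_explicit(&ip->is_new, false, memory_order_release);
}

/*
 * Reader: only touch the payload once the flag is observed clear. The
 * acquire load pairs with the release store above.
 */
bool toy_try_read(struct toy_pub_inode *ip, int *out)
{
	if (atomic_load_explicit(&ip->is_new, memory_order_acquire))
		return false;
	*out = ip->payload;
	return true;
}
```

With relaxed ordering on both sides, a reader on another CPU could in principle see the flag cleared while still reading a stale payload, which is the failure mode the commit message describes.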
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Tue, Oct 06, 2009 at 11:04:13AM +0200, Patrick Schreurs wrote:
> Christoph Hellwig wrote:
>> It helps a bit, but not so much. I suspect it could be a double free
>> of an inode, and I have identified a possible race window that could
>> explain it. But all the traces are really weird and I think only show
>> later symptoms of something that happened earlier. I'll come up with
>> a patch for the race window ASAP, but could you in the meantime turn on
>> CONFIG_XFS_DEBUG for the test kernel to see if it triggers somewhere
>> and additionally apply the tiny patch below for additional debugging?
>
> Will try this.
>
> Could this by any chance be related (from 2.6.32.2)?

I doubt it, but it's loosely in the same area.
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

Attached is a screendump from 2.6.32.2 with your patches (including the
last one) applied, but without XFS_DEBUG. We will turn on XFS_DEBUG and
see if that helps.

Patrick Schreurs
News-Service.com
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

Hello Christoph,

Attached you'll find a screenshot from a 2.6.31.3 server, which includes
your patches and has XFS_DEBUG turned on. I truly hope this is useful to
you.

Thanks again,

-Patrick
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Sun, Oct 11, 2009 at 09:43:09AM +0200, Patrick Schreurs wrote:
> Hello Christoph,
>
> Attached you'll find a screenshot from a 2.6.31.3 server, which includes
> your patches and has XFS_DEBUG turned on. I truly hope this is useful to
> you.

This is very helpful, as the assertion that I put in gets hit.  Thanks a
lot Patrick, I'll have another patch for you real soon.
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Sun, Oct 11, 2009 at 09:43:09AM +0200, Patrick Schreurs wrote:
> Hello Christoph,
>
> Attached you'll find a screenshot from a 2.6.31.3 server, which includes
> your patches and has XFS_DEBUG turned on. I truly hope this is useful to
> you.

Thanks.  The patch below should fix the inode reclaim race that could
lead to the double free you're seeing.  To be applied on top of all
the other patches I sent you.

Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-10-11 19:09:43.828254119 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2009-10-12 13:48:14.886006087 +0200
@@ -670,22 +670,22 @@ xfs_reclaim_inode(
 {
 	xfs_perag_t	*pag = xfs_get_perag(ip->i_mount, ip->i_ino);
 
-	/* The hash lock here protects a thread in xfs_iget_core from
-	 * racing with us on linking the inode back with a vnode.
-	 * Once we have the XFS_IRECLAIM flag set it will not touch
-	 * us.
+	/*
+	 * The hash lock here protects a thread in xfs_iget from racing with
+	 * us on recycling the inode.  Once we have the XFS_IRECLAIM flag set
+	 * it will not touch it.
 	 */
-	write_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
-	if (__xfs_iflags_test(ip, XFS_IRECLAIM) ||
-	    !__xfs_iflags_test(ip, XFS_IRECLAIMABLE)) {
+	ASSERT_ALWAYS(__xfs_iflags_test(ip, XFS_IRECLAIMABLE));
+	if (__xfs_iflags_test(ip, XFS_IRECLAIM)) {
 		spin_unlock(&ip->i_flags_lock);
 		write_unlock(&pag->pag_ici_lock);
-		return -EAGAIN;
+		return 0;
 	}
 	__xfs_iflags_set(ip, XFS_IRECLAIM);
 	spin_unlock(&ip->i_flags_lock);
 	write_unlock(&pag->pag_ici_lock);
+	xfs_put_perag(ip->i_mount, pag);
 
 	/*
@@ -758,27 +758,107 @@ __xfs_inode_clear_reclaim_tag(
 			XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
 }
 
-STATIC int
-xfs_reclaim_inode_now(
-	struct xfs_inode	*ip,
+STATIC xfs_inode_t *
+xfs_reclaim_ag_lookup(
+	struct xfs_mount	*mp,
 	struct xfs_perag	*pag,
+	uint32_t		*first_index)
+{
+	int			nr_found;
+	struct xfs_inode	*ip;
+
+	/*
+	 * use a gang lookup to find the next inode in the tree
+	 * as the tree is sparse and a gang lookup walks to find
+	 * the number of objects requested.
+	 */
+	write_lock(&pag->pag_ici_lock);
+	nr_found = radix_tree_gang_lookup_tag(&pag->pag_ici_root,
+			(void **)&ip, *first_index, 1, XFS_ICI_RECLAIM_TAG);
+	if (!nr_found)
+		goto unlock;
+
+	/*
+	 * Update the index for the next lookup. Catch overflows
+	 * into the next AG range which can occur if we have inodes
+	 * in the last block of the AG and we are currently
+	 * pointing to the last inode.
+	 */
+	*first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
+	if (*first_index < XFS_INO_TO_AGINO(mp, ip->i_ino))
+		goto unlock;
+
+	return ip;
+
+unlock:
+	write_unlock(&pag->pag_ici_lock);
+	return NULL;
+}
+
+STATIC int
+xfs_reclaim_ag_walk(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		ag,
 	int			flags)
 {
-	/* ignore if already under reclaim */
-	if (xfs_iflags_test(ip, XFS_IRECLAIM)) {
-		read_unlock(&pag->pag_ici_lock);
-		return 0;
+	struct xfs_perag	*pag = &mp->m_perag[ag];
+	uint32_t		first_index;
+	int			last_error = 0;
+	int			skipped;
+
+restart:
+	skipped = 0;
+	first_index = 0;
+	do {
+		int		error = 0;
+		xfs_inode_t	*ip;
+
+		ip = xfs_reclaim_ag_lookup(mp, pag, &first_index);
+		if (!ip)
+			break;
+
+		error = xfs_reclaim_inode(ip, flags);
+		if (error == EAGAIN) {
+			skipped++;
+			continue;
+		}
+		if (error)
+			last_error = error;
+		/*
+		 * bail out if the filesystem is corrupted.
+		 */
+		if (error == EFSCORRUPTED)
+			break;
+
+	} while (1);
+
+	if (skipped) {
+		delay(1);
+		goto restart;
 	}
-	read_unlock(&pag->pag_ici_lock);
-	return xfs_reclaim_inode(ip, flags);
+	xfs_put_perag(mp, pag);
+	return last_error;
 }
 
 int
 xfs_reclaim_inodes(
-	xfs_mount_t	*mp,
-	int		mode)
+	xfs_mount_t		*mp,
+	int			mode)
 {
-	return xfs_inode_ag_iterator(mp, xfs_reclaim_inode_now, mode,
-					XFS_ICI_RECLAIM_TAG);
+	int		error = 0;
+	int		last_error = 0;
+	xfs_agnumber_t	ag;
+
+	for (ag = 0; ag < mp->m_sb.sb_agcount; ag++) {
+		if (!mp->m_perag[ag].pag_ici_init)
+			continue;
+		error = xfs_reclaim_ag_walk(mp, ag, mode);
+		if (error) {
+			last_error = error;
+			if (error == EFSCORRUPTED)
+				break;
+		}
+	}
+	return XFS_ERROR(last_error);
 }
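The index update in xfs_reclaim_ag_lookup() above relies on a wrap check: if the AG-relative number of "inode + 1" is smaller than that of the inode just found, the lookup has run off the end of the AG and must stop. A minimal user-space sketch of that check, assuming a toy 8-bit AG-relative inode field (`AGINO_BITS`, `ino_to_agino()` and `advance_index()` are hypothetical names for illustration, not kernel definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the low AGINO_BITS bits hold the AG-relative inode number. */
#define AGINO_BITS 8u
#define AGINO_MASK ((1u << AGINO_BITS) - 1u)

/* Toy stand-in for XFS_INO_TO_AGINO(): keep only the AG-relative bits. */
static uint32_t ino_to_agino(uint64_t ino)
{
	return (uint32_t)(ino & AGINO_MASK);
}

/*
 * Store the index for the next lookup in *first_index.
 * Returns false when the index wrapped around, i.e. the inode just
 * found was the last one in its AG and the walk has to stop here.
 */
static bool advance_index(uint64_t ino, uint32_t *first_index)
{
	assert(first_index != NULL);
	*first_index = ino_to_agino(ino + 1);
	return *first_index >= ino_to_agino(ino);
}
```

With this toy layout, an inode in the middle of an AG advances the index by one, while the last inode of an AG makes the index wrap to 0, which the caller treats as "no more inodes in this AG".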
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Tue, Oct 13, 2009 at 1:38 AM, Christoph Hellwig <hch@...> wrote:
> On Sun, Oct 11, 2009 at 09:43:09AM +0200, Patrick Schreurs wrote:
>> Hello Christoph,
>>
>> Attached you'll find a screenshot from a 2.6.31.3 server, which includes
>> your patches and has XFS_DEBUG turned on. I truly hope this is useful to
>> you.
>
> Thanks.  The patch below should fix the inode reclaim race that could
> lead to the double free you're seeing.  To be applied on top of all
> the other patches I sent you.

Here are 2 more crashes with this patch applied, both having XFS_DEBUG
on and showing different traces (not inode reclaim related?). Hope
it's useful.

Cheers,
Tommy
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Thu, Oct 15, 2009 at 05:06:57PM +0200, Tommy van Leeuwen wrote:
> > Thanks.  The patch below should fix the inode reclaim race that could
> > lead to the double free you're seeing.  To be applied on top of all
> > the other patches I sent you.
>
> Hi Christoph,
>
> Here are 2 more crashes with this patch applied, both having xfs_debug
> on and showing different traces (not inode reclaim related?). Hope
> it's useful.

Can't make too much sense of it, but the dir2 is something you reported
earlier already.  We must be stomping over inodes somewhere, but I'm
not too sure where exactly.  Can you try throwing the patch below on top
of your stack?  It fixes an area where we could theoretically corrupt
inode state.

Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-10-16 22:54:41.513254291 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2009-10-16 22:57:10.451256293 +0200
@@ -180,6 +180,11 @@ xfs_sync_inode_valid(
 		return EFSCORRUPTED;
 	}
 
+	if (xfs_iflags_test(ip, XFS_INEW | XFS_IRECLAIMABLE | XFS_IRECLAIM)) {
+		read_unlock(&pag->pag_ici_lock);
+		return ENOENT;
+	}
+
 	/*
 	 * If we can't get a reference on the inode, it must be in reclaim.
 	 * Leave it for the reclaim code to flush. Also avoid inodes that
@@ -191,7 +196,7 @@ xfs_sync_inode_valid(
 	}
 	read_unlock(&pag->pag_ici_lock);
 
-	if (is_bad_inode(inode) || xfs_iflags_test(ip, XFS_INEW)) {
+	if (is_bad_inode(inode)) {
 		IRELE(ip);
 		return ENOENT;
 	}
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Mon, Oct 12, 2009 at 07:38:54PM -0400, Christoph Hellwig wrote:
> On Sun, Oct 11, 2009 at 09:43:09AM +0200, Patrick Schreurs wrote:
> > Hello Christoph,
> >
> > Attached you'll find a screenshot from a 2.6.31.3 server, which includes
> > your patches and has XFS_DEBUG turned on. I truly hope this is useful to
> > you.
>
> Thanks.  The patch below should fix the inode reclaim race that could
> lead to the double free you're seeing.  To be applied on top of all
> the other patches I sent you.
>
> Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-10-11 19:09:43.828254119 +0200
> +++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2009-10-12 13:48:14.886006087 +0200
> @@ -670,22 +670,22 @@ xfs_reclaim_inode(
>  {
>  	xfs_perag_t	*pag = xfs_get_perag(ip->i_mount, ip->i_ino);
>
> -	/* The hash lock here protects a thread in xfs_iget_core from
> -	 * racing with us on linking the inode back with a vnode.
> -	 * Once we have the XFS_IRECLAIM flag set it will not touch
> -	 * us.
> +	/*
> +	 * The hash lock here protects a thread in xfs_iget from racing with
> +	 * us on recycling the inode.  Once we have the XFS_IRECLAIM flag set
> +	 * it will not touch it.
>  	 */
> -	write_lock(&pag->pag_ici_lock);

Did you mean to remove this write_lock? The patch does not remove
the unlocks....

Cheers,

Dave.
-- 
Dave Chinner
david@...
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Sun, Oct 18, 2009 at 07:59:10PM -0400, Christoph Hellwig wrote:
> On Thu, Oct 15, 2009 at 05:06:57PM +0200, Tommy van Leeuwen wrote:
> > > Thanks.  The patch below should fix the inode reclaim race that could
> > > lead to the double free you're seeing.  To be applied on top of all
> > > the other patches I sent you.
> >
> > Hi Christoph,
> >
> > Here are 2 more crashes with this patch applied, both having xfs_debug
> > on and showing different traces (not inode reclaim related?). Hope
> > it's useful.
>
> Can't make too much sense of it, but the dir2 is something you reported
> earlier already.  We must be stomping over inodes somewhere, but I'm
> not too sure where exactly.  Can you try throwing the patch below on top
> of your stack?  It fixes an area where we could theoretically corrupt
> inode state.
>
> Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2009-10-16 22:54:41.513254291 +0200
> +++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2009-10-16 22:57:10.451256293 +0200
> @@ -180,6 +180,11 @@ xfs_sync_inode_valid(
>  		return EFSCORRUPTED;
>  	}
>
> +	if (xfs_iflags_test(ip, XFS_INEW | XFS_IRECLAIMABLE | XFS_IRECLAIM)) {
> +		read_unlock(&pag->pag_ici_lock);
> +		return ENOENT;
> +	}

This needs an IRELE(ip) here, doesn't it?

Cheers,

Dave.
-- 
Dave Chinner
david@...
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Mon, Oct 19, 2009 at 12:17:10PM +1100, Dave Chinner wrote:
> >
> > +	if (xfs_iflags_test(ip, XFS_INEW | XFS_IRECLAIMABLE | XFS_IRECLAIM)) {
> > +		read_unlock(&pag->pag_ici_lock);
> > +		return ENOENT;
> > +	}
>
> This needs an IRELE(ip) here, doesn't it?

No, the check is before the igrab now.  That was kinda the point as I
suspect that the igrab might be corrupting state of a reclaimable or
in reclaim inode.
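The point of the reply above is ordering: because the flag test now runs before any reference is taken, the early return has nothing to release. A minimal user-space sketch of that ordering, assuming toy types (`struct toy_inode`, `toy_igrab()` and the `TOY_*` flag values are hypothetical stand-ins, not the kernel definitions, and the toy grab cannot fail the way the real igrab() can):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits; the real values live in the XFS headers. */
#define TOY_INEW         0x1u
#define TOY_IRECLAIMABLE 0x2u
#define TOY_IRECLAIM     0x4u

struct toy_inode {
	unsigned int flags;    /* lifecycle flags, in the spirit of ip->i_flags */
	int          refcount; /* stands in for the VFS inode reference count */
};

/* Toy igrab(): always succeeds and takes one reference. */
static void toy_igrab(struct toy_inode *ip)
{
	ip->refcount++;
}

/*
 * Returns ENOENT (positive, as in the patch) for inodes that are new,
 * reclaimable or already in reclaim, and 0 after taking a reference.
 */
static int toy_sync_inode_valid(struct toy_inode *ip)
{
	assert(ip != NULL);

	/* Reject first: no reference was taken yet, so nothing to undo. */
	if (ip->flags & (TOY_INEW | TOY_IRECLAIMABLE | TOY_IRECLAIM))
		return ENOENT;

	/* Only inodes that passed the check are ever touched by the grab. */
	toy_igrab(ip);
	return 0;
}
```

A caller that gets 0 back owns one reference and must drop it later; a caller that gets ENOENT back owns nothing, which is why no IRELE-style release belongs on that path in this sketch.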
|
|
Re: 2.6.31 xfs_fs_destroy_inode: cannot reclaim

On Mon, Oct 19, 2009 at 12:16:00PM +1100, Dave Chinner wrote:
> > +	 * The hash lock here protects a thread in xfs_iget from racing with
> > +	 * us on recycling the inode.  Once we have the XFS_IRECLAIM flag set
> > +	 * it will not touch it.
> >  	 */
> > -	write_lock(&pag->pag_ici_lock);
>
> Did you mean to remove this write_lock? The patch does not remove
> the unlocks....

It's taken by the caller.
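"Taken by the caller" describes a lock hand-off: the lookup returns with the lock still held, and the reclaim function is the one that drops it. A minimal user-space sketch of that contract, assuming a toy lock that only tracks whether it is held (`struct toy_lock`, `toy_lookup()` and `toy_reclaim()` are hypothetical names for illustration, not the XFS functions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: records ownership so the hand-off can be checked. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_inode {
	bool reclaimable; /* tagged for reclaim */
	bool in_reclaim;  /* someone already claimed it */
};

/*
 * Lookup side: on success the lock stays HELD and ownership passes to
 * the caller; on failure the lock is dropped before returning NULL.
 */
static struct toy_inode *toy_lookup(struct toy_lock *l, struct toy_inode *candidate)
{
	toy_lock_acquire(l);
	if (candidate == NULL || !candidate->reclaimable) {
		toy_lock_release(l);
		return NULL;
	}
	return candidate;
}

/*
 * Reclaim side: expects the lock to be held on entry (taken by the
 * lookup) and releases it on every path, without acquiring it itself.
 */
static int toy_reclaim(struct toy_lock *l, struct toy_inode *ip)
{
	assert(l->held);
	if (ip->in_reclaim) {
		toy_lock_release(l);
		return 0;
	}
	ip->in_reclaim = true;
	toy_lock_release(l);
	return 0;
}
```

Seen this way, a function that unlocks without locking is balanced as long as every successful lookup is followed by exactly one reclaim call.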