|
View:
New views
20 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 | Next > |
|
|
[PATCH 0/4] Fix fsync on ext3 and ext4 (v2)Hi, this is a second try for a patchset which makes ext3 and ext4 properly force a transaction commit when needed. We now do not rely on buffer dirty bits (which does not work as pdflush can just clear them without forcing a transaction commit) but rather keep transaction ids that need to be committed for each inode. Since last version, I've implemented Aneesh's and Curt's comments and also fixed a missing initialization of the fields. I've tested that now the patch works correctly for uninitialized extents as well as for standard writes. If noone objects, would you merge the ext4 part Ted? I'll take care of the ext3 patch. Honza -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
[PATCH 1/4] ext3: Wait for proper transaction commit on fsyncWe cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@...> --- fs/ext3/fsync.c | 36 ++++++++++++++++-------------------- fs/ext3/inode.c | 32 +++++++++++++++++++++++++++++++- fs/ext3/super.c | 2 ++ include/linux/ext3_fs_i.h | 8 ++++++++ 4 files changed, 57 insertions(+), 21 deletions(-) diff --git a/fs/ext3/fsync.c b/fs/ext3/fsync.c index 451d166..8209f26 100644 --- a/fs/ext3/fsync.c +++ b/fs/ext3/fsync.c @@ -46,19 +46,21 @@ int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync) { struct inode *inode = dentry->d_inode; + struct ext3_inode_info *ei = EXT3_I(inode); + journal_t *journal = EXT3_SB(inode->i_sb)->s_journal; int ret = 0; + tid_t commit_tid; + + if (inode->i_sb->s_flags & MS_RDONLY) + return 0; J_ASSERT(ext3_journal_current_handle() == NULL); /* - * data=writeback: + * data=writeback,ordered: * The caller's filemap_fdatawrite()/wait will sync the data. - * sync_inode() will sync the metadata - * - * data=ordered: - * The caller's filemap_fdatawrite() will write the data and - * sync_inode() will write the inode if it is dirty. Then the caller's - * filemap_fdatawait() will wait on the pages. + * Metadata is in the journal, we wait for a proper transaction + * to commit here. * * data=journal: * filemap_fdatawrite won't do anything (the buffers are clean). @@ -73,22 +75,16 @@ int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync) goto out; } - if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) - goto flush; + if (datasync) + commit_tid = atomic_read(&ei->i_datasync_tid); + else + commit_tid = atomic_read(&ei->i_sync_tid); - /* - * The VFS has written the file data. If the inode is unaltered - * then we need not start a commit. - */ - if (inode->i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) { - struct writeback_control wbc = { - .sync_mode = WB_SYNC_ALL, - .nr_to_write = 0, /* sys_fsync did this */ - }; - ret = sync_inode(inode, &wbc); + if (log_start_commit(journal, commit_tid)) { + log_wait_commit(journal, commit_tid); goto out; } -flush: + /* * In case we didn't commit a transaction, we have to flush * disk caches manually so that data really is on persistent diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index acf1b14..74b6146 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -699,8 +699,9 @@ static int ext3_splice_branch(handle_t *handle, struct inode *inode, int err = 0; struct ext3_block_alloc_info *block_i; ext3_fsblk_t current_block; + struct ext3_inode_info *ei = EXT3_I(inode); - block_i = EXT3_I(inode)->i_block_alloc_info; + block_i = ei->i_block_alloc_info; /* * If we're splicing into a [td]indirect block (as opposed to the * inode) then we need to get write access to the [td]indirect block @@ -741,6 +742,8 @@ static int ext3_splice_branch(handle_t *handle, struct inode *inode, inode->i_ctime = CURRENT_TIME_SEC; ext3_mark_inode_dirty(handle, inode); + /* ext3_mark_inode_dirty already updated i_sync_tid */ + atomic_set(&ei->i_datasync_tid, handle->h_transaction->t_tid); /* had we spliced it onto indirect block? */ if (where->bh) { @@ -2750,6 +2753,8 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino) struct ext3_inode_info *ei; struct buffer_head *bh; struct inode *inode; + journal_t *journal = EXT3_SB(sb)->s_journal; + transaction_t *transaction; long ret; int block; @@ -2827,6 +2832,30 @@ struct inode *ext3_iget(struct super_block *sb, unsigned long ino) ei->i_data[block] = raw_inode->i_block[block]; INIT_LIST_HEAD(&ei->i_orphan); + /* + * Set transaction id's of transactions that have to be committed + * to finish f[data]sync. We set them to currently running transaction + * as we cannot be sure that the inode or some of its metadata isn't + * part of the transaction - the inode could have been reclaimed and + * now it is reread from disk. + */ + if (journal) { + tid_t tid; + + spin_lock(&journal->j_state_lock); + if (journal->j_running_transaction) + transaction = journal->j_running_transaction; + else + transaction = journal->j_committing_transaction; + if (transaction) + tid = transaction->t_tid; + else + tid = journal->j_commit_sequence; + spin_unlock(&journal->j_state_lock); + atomic_set(&ei->i_sync_tid, tid); + atomic_set(&ei->i_datasync_tid, tid); + } + if (inode->i_ino >= EXT3_FIRST_INO(inode->i_sb) + 1 && EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) { /* @@ -3011,6 +3040,7 @@ again: err = rc; ei->i_state &= ~EXT3_STATE_NEW; + atomic_set(&ei->i_sync_tid, handle->h_transaction->t_tid); out_brelse: brelse (bh); ext3_std_error(inode->i_sb, err); diff --git a/fs/ext3/super.c b/fs/ext3/super.c index 7a520a8..427496c 100644 --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -466,6 +466,8 @@ static struct inode *ext3_alloc_inode(struct super_block *sb) return NULL; ei->i_block_alloc_info = NULL; ei->vfs_inode.i_version = 1; + atomic_set(&ei->i_datasync_tid, 0); + atomic_set(&ei->i_sync_tid, 0); return &ei->vfs_inode; } diff --git a/include/linux/ext3_fs_i.h b/include/linux/ext3_fs_i.h index ca1bfe9..93e7428 100644 --- a/include/linux/ext3_fs_i.h +++ b/include/linux/ext3_fs_i.h @@ -137,6 +137,14 @@ struct ext3_inode_info { * by other means, so we have truncate_mutex. */ struct mutex truncate_mutex; + + /* + * Transactions that contain inode's metadata needed to complete + * fsync and fdatasync, respectively. + */ + atomic_t i_sync_tid; + atomic_t i_datasync_tid; + struct inode vfs_inode; }; -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
[PATCH 2/4] ext4: Avoid issuing barriers on error recovery pathThere is no reason why we should issue IO barriers when we
just hit an error in ext4_sync_file. So move out: label to just return and introduce flush: label to those places which really need it. Signed-off-by: Jan Kara <jack@...> --- fs/ext4/fsync.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c index 2b15312..2bf9413 100644 --- a/fs/ext4/fsync.c +++ b/fs/ext4/fsync.c @@ -88,7 +88,7 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) ret = sync_mapping_buffers(inode->i_mapping); if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) - goto out; + goto flush; /* * The VFS has written the file data. If the inode is unaltered @@ -103,8 +103,9 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) if (ret == 0) ret = err; } -out: +flush: if (journal && (journal->j_flags & JBD2_BARRIER)) blkdev_issue_flush(inode->i_sb->s_bdev, NULL); +out: return ret; } -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
[PATCH 3/4] ext4: Fix error handling ext4_ind_get_blocksWhen an error happened in ext4_splice_branch we failed to notice that
in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem by checking for error properly. Signed-off-by: Jan Kara <jack@...> --- fs/ext4/inode.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 5c5bc5d..9105f40 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1020,7 +1020,7 @@ static int ext4_ind_get_blocks(handle_t *handle, struct inode *inode, if (!err) err = ext4_splice_branch(handle, inode, iblock, partial, indirect_blks, count); - else + if (err) goto cleanup; set_buffer_new(bh_result); -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
[PATCH 4/4] ext4: Wait for proper transaction commit on fsyncWe cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@...> --- fs/ext4/ext4.h | 7 +++++++ fs/ext4/ext4_jbd2.h | 14 ++++++++++++++ fs/ext4/extents.c | 14 ++++++++++++-- fs/ext4/fsync.c | 40 +++++++++++++++++----------------------- fs/ext4/inode.c | 29 +++++++++++++++++++++++++++++ fs/ext4/super.c | 2 ++ 6 files changed, 81 insertions(+), 25 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 984ca0c..5639f30 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -702,6 +702,13 @@ struct ext4_inode_info { struct list_head i_aio_dio_complete_list; /* current io_end structure for async DIO write*/ ext4_io_end_t *cur_aio_dio; + + /* + * Transactions that contain inode's metadata needed to complete + * fsync and fdatasync, respectively. + */ + atomic_t i_sync_tid; + atomic_t i_datasync_tid; }; /* diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index a286598..4389941 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -254,6 +254,20 @@ static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode) return 0; } +static inline void ext4_update_inode_fsync_trans(handle_t *handle, + struct inode *inode, + int datasync) +{ + if (ext4_handle_valid(handle)) { + atomic_set(&EXT4_I(inode)->i_sync_tid, + handle->h_transaction->t_tid); + if (datasync) { + atomic_set(&EXT4_I(inode)->i_datasync_tid, + handle->h_transaction->t_tid); + } + } +} + /* super.c */ int ext4_force_commit(struct super_block *sb); diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 10539e3..d3ba4a9 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3057,6 +3057,8 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode, if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) { ret = ext4_convert_unwritten_extents_dio(handle, inode, path); + if (ret >= 0) + ext4_update_inode_fsync_trans(handle, inode, 1); goto out2; } /* buffered IO case */ @@ -3084,6 +3086,8 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode, ret = ext4_ext_convert_to_initialized(handle, inode, path, iblock, max_blocks); + if (ret >= 0) + ext4_update_inode_fsync_trans(handle, inode, 1); out: if (ret <= 0) { err = ret; @@ -3316,10 +3320,16 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, allocated = ext4_ext_get_actual_len(&newex); set_buffer_new(bh_result); - /* Cache only when it is _not_ an uninitialized extent */ - if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) + /* + * Cache the extent and update transaction to commit on fdatasync only + * when it is _not_ an uninitialized extent. + */ + if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) { ext4_ext_put_in_cache(inode, iblock, allocated, newblock, EXT4_EXT_CACHE_EXTENT); + ext4_update_inode_fsync_trans(handle, inode, 1); + } else + ext4_update_inode_fsync_trans(handle, inode, 0); out: if (allocated > max_blocks) allocated = max_blocks; diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c index 2bf9413..03e0fe9 100644 --- a/fs/ext4/fsync.c +++ b/fs/ext4/fsync.c @@ -51,25 +51,26 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) { struct inode *inode = dentry->d_inode; + struct ext4_inode_info *ei = EXT4_I(inode); journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; - int err, ret = 0; + int ret = 0; + tid_t commit_tid; J_ASSERT(ext4_journal_current_handle() == NULL); trace_ext4_sync_file(file, dentry, datasync); + if (inode->i_sb->s_flags & MS_RDONLY) + goto out; + ret = flush_aio_dio_completed_IO(inode); if (ret < 0) goto out; /* - * data=writeback: + * data=writeback,ordered: * The caller's filemap_fdatawrite()/wait will sync the data. - * sync_inode() will sync the metadata - * - * data=ordered: - * The caller's filemap_fdatawrite() will write the data and - * sync_inode() will write the inode if it is dirty. Then the caller's - * filemap_fdatawait() will wait on the pages. + * Metadata is in the journal, we wait for proper transaction to + * commit here. * * data=journal: * filemap_fdatawrite won't do anything (the buffers are clean). @@ -87,23 +88,16 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) if (!journal) ret = sync_mapping_buffers(inode->i_mapping); - if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) - goto flush; + if (datasync) + commit_tid = atomic_read(&ei->i_datasync_tid); + else + commit_tid = atomic_read(&ei->i_sync_tid); - /* - * The VFS has written the file data. If the inode is unaltered - * then we need not start a commit. - */ - if (inode->i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) { - struct writeback_control wbc = { - .sync_mode = WB_SYNC_ALL, - .nr_to_write = 0, /* sys_fsync did this */ - }; - err = sync_inode(inode, &wbc); - if (ret == 0) - ret = err; + if (jbd2_log_start_commit(journal, commit_tid)) { + jbd2_log_wait_commit(journal, commit_tid); + goto out; } -flush: + if (journal && (journal->j_flags & JBD2_BARRIER)) blkdev_issue_flush(inode->i_sb->s_bdev, NULL); out: diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9105f40..546f175 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1024,6 +1024,8 @@ static int ext4_ind_get_blocks(handle_t *handle, struct inode *inode, goto cleanup; set_buffer_new(bh_result); + + ext4_update_inode_fsync_trans(handle, inode, 1); got_it: map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key)); if (count > blocks_to_boundary) @@ -4777,6 +4779,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino) struct ext4_inode_info *ei; struct buffer_head *bh; struct inode *inode; + journal_t *journal = EXT4_SB(sb)->s_journal; long ret; int block; @@ -4842,6 +4845,31 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino) ei->i_data[block] = raw_inode->i_block[block]; INIT_LIST_HEAD(&ei->i_orphan); + /* + * Set transaction id's of transactions that have to be committed + * to finish f[data]sync. We set them to currently running transaction + * as we cannot be sure that the inode or some of its metadata isn't + * part of the transaction - the inode could have been reclaimed and + * now it is reread from disk. + */ + if (journal) { + transaction_t *transaction; + tid_t tid; + + spin_lock(&journal->j_state_lock); + if (journal->j_running_transaction) + transaction = journal->j_running_transaction; + else + transaction = journal->j_committing_transaction; + if (transaction) + tid = transaction->t_tid; + else + tid = journal->j_commit_sequence; + spin_unlock(&journal->j_state_lock); + atomic_set(&ei->i_sync_tid, tid); + atomic_set(&ei->i_datasync_tid, tid); + } + if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) { ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize); if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize > @@ -5102,6 +5130,7 @@ static int ext4_do_update_inode(handle_t *handle, err = rc; ei->i_state &= ~EXT4_STATE_NEW; + ext4_update_inode_fsync_trans(handle, inode, 0); out_brelse: brelse(bh); ext4_std_error(inode->i_sb, err); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 312211e..9a6716f 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -704,6 +704,8 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) spin_lock_init(&(ei->i_block_reservation_lock)); INIT_LIST_HEAD(&ei->i_aio_dio_complete_list); ei->cur_aio_dio = NULL; + atomic_set(&ei->i_datasync_tid, 0); + atomic_set(&ei->i_sync_tid, 0); return &ei->vfs_inode; } -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 4/4] ext4: Wait for proper transaction commit on fsyncJan Kara <jack@...> writes:
> ext4_io_end_t *cur_aio_dio; > + > + /* > + * Transactions that contain inode's metadata needed to complete > + * fsync and fdatasync, respectively. > + */ > + atomic_t i_sync_tid; > + atomic_t i_datasync_tid; This might be a stupid question, but the atomic implies you don't hold any kind of reference to the transaction. So what prevents these IDs from wrapping while in there? Given it would be probably take a long time today, but might be not fully future proof to very fast future IO devices. -Andi -- ak@... -- Speaking for myself only. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 4/4] ext4: Wait for proper transaction commit on fsyncOn Wed 04-11-09 15:57:22, Andi Kleen wrote:
> Jan Kara <jack@...> writes: > > ext4_io_end_t *cur_aio_dio; > > + > > + /* > > + * Transactions that contain inode's metadata needed to complete > > + * fsync and fdatasync, respectively. > > + */ > > + atomic_t i_sync_tid; > > + atomic_t i_datasync_tid; > > This might be a stupid question, but the atomic implies you don't hold > any kind of reference to the transaction. So what prevents these IDs > from wrapping while in there? Given it would be probably take a long > time today, but might be not fully future proof to very fast future IO > devices. wrapping does not matter unless you manage to do 2^31 transactions inbetween. So we are still a few orders of magnitude safe with current hw. Honza -- Jan Kara <jack@...> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 0/4] Fix fsync on ext3 and ext4 (v2) Hi,
> this is a second try for a patchset which makes ext3 and ext4 properly force > a transaction commit when needed. We now do not rely on buffer dirty bits > (which does not work as pdflush can just clear them without forcing a > transaction commit) but rather keep transaction ids that need to be committed > for each inode. > Since last version, I've implemented Aneesh's and Curt's comments and also > fixed a missing initialization of the fields. I've tested that now the patch > works correctly for uninitialized extents as well as for standard writes. If > noone objects, would you merge the ext4 part Ted? I'll take care of the ext3 > patch. Honza -- Jan Kara <jack@...> SuSE CR Labs -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 0/4] Fix fsync on ext3 and ext4 (v2)On Thu, Nov 05, 2009 at 02:02:50PM +0100, Jan Kara wrote:
> Hi, > > > this is a second try for a patchset which makes ext3 and ext4 properly force > > a transaction commit when needed. We now do not rely on buffer dirty bits > > (which does not work as pdflush can just clear them without forcing a > > transaction commit) but rather keep transaction ids that need to be committed > > for each inode. > > Since last version, I've implemented Aneesh's and Curt's comments and also > > fixed a missing initialization of the fields. I've tested that now the patch > > works correctly for uninitialized extents as well as for standard writes. If > > noone objects, would you merge the ext4 part Ted? I'll take care of the ext3 > > patch. > Aneesh, Ted, is the second version of the patchset fine with you? The patches looks good. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@...> -aneesh -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 0/4] Fix fsync on ext3 and ext4 (v2)On Thu 05-11-09 22:05:10, Aneesh Kumar K.V wrote:
> On Thu, Nov 05, 2009 at 02:02:50PM +0100, Jan Kara wrote: > > Hi, > > > > > this is a second try for a patchset which makes ext3 and ext4 properly force > > > a transaction commit when needed. We now do not rely on buffer dirty bits > > > (which does not work as pdflush can just clear them without forcing a > > > transaction commit) but rather keep transaction ids that need to be committed > > > for each inode. > > > Since last version, I've implemented Aneesh's and Curt's comments and also > > > fixed a missing initialization of the fields. I've tested that now the patch > > > works correctly for uninitialized extents as well as for standard writes. If > > > noone objects, would you merge the ext4 part Ted? I'll take care of the ext3 > > > patch. > > Aneesh, Ted, is the second version of the patchset fine with you? > > The patches looks good. > > Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@...> Linus soon. Ted, will you take care of ext4 changes? Thanks. Honza -- Jan Kara <jack@...> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 4/4] ext4: Wait for proper transaction commit on fsyncOn Tue, Oct 27, 2009 at 01:48:49PM +0100, Jan Kara wrote:
> + /* > + * Transactions that contain inode's metadata needed to complete > + * fsync and fdatasync, respectively. > + */ > + atomic_t i_sync_tid; > + atomic_t i_datasync_tid; Why do we need an atomic_t here at all? It's not clear it's necessary. What specific race are you worried about? - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 2/4] ext4: Avoid issuing barriers on error recovery pathOn Tue, Oct 27, 2009 at 01:48:47PM +0100, Jan Kara wrote:
> There is no reason why we should issue IO barriers when we > just hit an error in ext4_sync_file. So move out: label to > just return and introduce flush: label to those places which > really need it. > > Signed-off-by: Jan Kara <jack@...> Once the out: label is moved to just before the "return ret;", it's actually cleaner to simply replace all of the "goto out" with "return ret;" statements..... - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
[PATCH] ext4: Avoid issuing unnecessary barriersWe don't to issue an I/O barrier on an error or if we force commit
because we are doing data journaling. Signed-off-by: "Theodore Ts'o" <tytso@...> Cc: Jan Kara <jack@...> --- This patch should be equivalent to Jan's "ext4: Avoid issuing barriers on error recovery path", but it removes more lines than it adds. :-) fs/ext4/fsync.c | 8 +++----- 1 files changed, 3 insertions(+), 5 deletions(-) diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c index 2b15312..a3c2507 100644 --- a/fs/ext4/fsync.c +++ b/fs/ext4/fsync.c @@ -60,7 +60,7 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) ret = flush_aio_dio_completed_IO(inode); if (ret < 0) - goto out; + return ret; /* * data=writeback: * The caller's filemap_fdatawrite()/wait will sync the data. @@ -79,10 +79,8 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) * (they were dirtied by commit). But that's OK - the blocks are * safe in-journal, which is all fsync() needs to ensure. */ - if (ext4_should_journal_data(inode)) { - ret = ext4_force_commit(inode->i_sb); - goto out; - } + if (ext4_should_journal_data(inode)) + return ext4_force_commit(inode->i_sb); if (!journal) ret = sync_mapping_buffers(inode->i_mapping); -- 1.6.5.216.g5288a.dirty -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 3/4] ext4: Fix error handling ext4_ind_get_blocksOn Tue, Oct 27, 2009 at 01:48:48PM +0100, Jan Kara wrote:
> When an error happened in ext4_splice_branch we failed to notice that > in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem > by checking for error properly. > > Signed-off-by: Jan Kara <jack@...> Added to the ext4 patch queue, thanks. - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH] ext4: Avoid issuing unnecessary barriersOn Sun 15-11-09 21:40:42, Theodore Ts'o wrote:
> We don't to issue an I/O barrier on an error or if we force commit > because we are doing data journaling. > > Signed-off-by: "Theodore Ts'o" <tytso@...> > Cc: Jan Kara <jack@...> > --- > This patch should be equivalent to Jan's "ext4: Avoid issuing barriers > on error recovery path", but it removes more lines than it adds. :-) Yeah, it looks fine. Acked-by: Jan Kara <jack@...> Honza > > fs/ext4/fsync.c | 8 +++----- > 1 files changed, 3 insertions(+), 5 deletions(-) > > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c > index 2b15312..a3c2507 100644 > --- a/fs/ext4/fsync.c > +++ b/fs/ext4/fsync.c > @@ -60,7 +60,7 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) > > ret = flush_aio_dio_completed_IO(inode); > if (ret < 0) > - goto out; > + return ret; > /* > * data=writeback: > * The caller's filemap_fdatawrite()/wait will sync the data. > @@ -79,10 +79,8 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) > * (they were dirtied by commit). But that's OK - the blocks are > * safe in-journal, which is all fsync() needs to ensure. > */ > - if (ext4_should_journal_data(inode)) { > - ret = ext4_force_commit(inode->i_sb); > - goto out; > - } > + if (ext4_should_journal_data(inode)) > + return ext4_force_commit(inode->i_sb); > > if (!journal) > ret = sync_mapping_buffers(inode->i_mapping); > -- > 1.6.5.216.g5288a.dirty > Jan Kara <jack@...> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 4/4] ext4: Wait for proper transaction commit on fsyncOn Sun 15-11-09 19:46:41, Theodore Tso wrote:
> On Tue, Oct 27, 2009 at 01:48:49PM +0100, Jan Kara wrote: > > + /* > > + * Transactions that contain inode's metadata needed to complete > > + * fsync and fdatasync, respectively. > > + */ > > + atomic_t i_sync_tid; > > + atomic_t i_datasync_tid; > > Why do we need an atomic_t here at all? It's not clear it's > necessary. What specific race are you worried about? currently synchronized against each other and I din't want to introduce any synchronization. So atomic_t seemed like a clean way of making clear that loads / stores from those fields are atomic, regardless of architecture, memory alignment or whatever insanity that might make us see just half overwritten field. On all archs where this means just plain stores / loads (such as x86), there's no performance hit since the operations are implemented as such. Honza -- Jan Kara <jack@...> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 4/4] ext4: Wait for proper transaction commit on fsyncOn Mon, Nov 16, 2009 at 11:43:31AM +0100, Jan Kara wrote:
> > Why do we need an atomic_t here at all? It's not clear it's > > necessary. What specific race are you worried about? > Well, i_[data]sync_tid are set and read from several places which aren't > currently synchronized against each other and I din't want to introduce any > synchronization. So atomic_t seemed like a clean way of making clear that > loads / stores from those fields are atomic, regardless of architecture, > memory alignment or whatever insanity that might make us see just half > overwritten field. On all archs where this means just plain stores / loads > (such as x86), there's no performance hit since the operations are > implemented as such. Sorry for not responding to this one sooner, but see this URL: http://digitalvampire.org/blog/index.php/2007/05/13/atomic-cargo-cults/ - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 4/4] ext4: Wait for proper transaction commit on fsyncHere's a revised version of your patch that I've included in the ext4
patch queue. I've removed the use of the atomic_t data type. I've also cleaned up the fsync() function some more to make it easier to follow, and I've fixed the handling in no-journal mode (we can't depend on the forcing a commit if there is no journal, so we can't drop the use of sync_inode() --- we can use simple_fsync(), though, which does everything we need if there is no journal). - Ted commit b436b9bef84de6893e86346d8fbf7104bc520645 Author: Jan Kara <jack@...> Date: Tue Dec 8 23:51:10 2009 -0500 ext4: Wait for proper transaction commit on fsync We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@...> Signed-off-by: "Theodore Ts'o" <tytso@...> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 4cfc2f0..ab31e65 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -709,6 +709,13 @@ struct ext4_inode_info { struct list_head i_aio_dio_complete_list; /* current io_end structure for async DIO write*/ ext4_io_end_t *cur_aio_dio; + + /* + * Transactions that contain inode's metadata needed to complete + * fsync and fdatasync, respectively. + */ + tid_t i_sync_tid; + tid_t i_datasync_tid; }; /* diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h index 2c2b262..05eca81 100644 --- a/fs/ext4/ext4_jbd2.h +++ b/fs/ext4/ext4_jbd2.h @@ -249,6 +249,19 @@ static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode) return 0; } +static inline void ext4_update_inode_fsync_trans(handle_t *handle, + struct inode *inode, + int datasync) +{ + struct ext4_inode_info *ei = EXT4_I(inode); + + if (ext4_handle_valid(handle)) { + ei->i_sync_tid = handle->h_transaction->t_tid; + if (datasync) + ei->i_datasync_tid = handle->h_transaction->t_tid; + } +} + /* super.c */ int ext4_force_commit(struct super_block *sb); diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 5967f18..700206e 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3058,6 +3058,8 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode, if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) { ret = ext4_convert_unwritten_extents_dio(handle, inode, path); + if (ret >= 0) + ext4_update_inode_fsync_trans(handle, inode, 1); goto out2; } /* buffered IO case */ @@ -3085,6 +3087,8 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode, ret = ext4_ext_convert_to_initialized(handle, inode, path, iblock, max_blocks); + if (ret >= 0) + ext4_update_inode_fsync_trans(handle, inode, 1); out: if (ret <= 0) { err = ret; @@ -3323,10 +3327,16 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, allocated = ext4_ext_get_actual_len(&newex); set_buffer_new(bh_result); - /* Cache only when it is _not_ an uninitialized extent */ - if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) + /* + * Cache the extent and update transaction to commit on fdatasync only + * when it is _not_ an uninitialized extent. + */ + if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) { ext4_ext_put_in_cache(inode, iblock, allocated, newblock, EXT4_EXT_CACHE_EXTENT); + ext4_update_inode_fsync_trans(handle, inode, 1); + } else + ext4_update_inode_fsync_trans(handle, inode, 0); out: if (allocated > max_blocks) allocated = max_blocks; diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c index a3c2507..0b22497 100644 --- a/fs/ext4/fsync.c +++ b/fs/ext4/fsync.c @@ -51,25 +51,30 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) { struct inode *inode = dentry->d_inode; + struct ext4_inode_info *ei = EXT4_I(inode); journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; - int err, ret = 0; + int ret; + tid_t commit_tid; J_ASSERT(ext4_journal_current_handle() == NULL); trace_ext4_sync_file(file, dentry, datasync); + if (inode->i_sb->s_flags & MS_RDONLY) + return 0; + ret = flush_aio_dio_completed_IO(inode); if (ret < 0) return ret; + + if (!journal) + return simple_fsync(file, dentry, datasync); + /* - * data=writeback: + * data=writeback,ordered: * The caller's filemap_fdatawrite()/wait will sync the data. - * sync_inode() will sync the metadata - * - * data=ordered: - * The caller's filemap_fdatawrite() will write the data and - * sync_inode() will write the inode if it is dirty. Then the caller's - * filemap_fdatawait() will wait on the pages. + * Metadata is in the journal, we wait for proper transaction to + * commit here. * * data=journal: * filemap_fdatawrite won't do anything (the buffers are clean). @@ -82,27 +87,10 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) if (ext4_should_journal_data(inode)) return ext4_force_commit(inode->i_sb); - if (!journal) - ret = sync_mapping_buffers(inode->i_mapping); - - if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) - goto out; - - /* - * The VFS has written the file data. If the inode is unaltered - * then we need not start a commit. - */ - if (inode->i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) { - struct writeback_control wbc = { - .sync_mode = WB_SYNC_ALL, - .nr_to_write = 0, /* sys_fsync did this */ - }; - err = sync_inode(inode, &wbc); - if (ret == 0) - ret = err; - } -out: - if (journal && (journal->j_flags & JBD2_BARRIER)) + commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid; + if (jbd2_log_start_commit(journal, commit_tid)) + jbd2_log_wait_commit(journal, commit_tid); + else if (journal->j_flags & JBD2_BARRIER) blkdev_issue_flush(inode->i_sb->s_bdev, NULL); return ret; } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 958c3ff..f1bc1e3 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -983,6 +983,8 @@ static int ext4_ind_get_blocks(handle_t *handle, struct inode *inode, goto cleanup; set_buffer_new(bh_result); + + ext4_update_inode_fsync_trans(handle, inode, 1); got_it: map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key)); if (count > blocks_to_boundary) @@ -4738,6 +4740,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino) struct ext4_inode *raw_inode; struct ext4_inode_info *ei; struct inode *inode; + journal_t *journal = EXT4_SB(sb)->s_journal; long ret; int block; @@ -4802,6 +4805,31 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino) ei->i_data[block] = raw_inode->i_block[block]; INIT_LIST_HEAD(&ei->i_orphan); + /* + * Set transaction id's of transactions that have to be committed + * to finish f[data]sync. We set them to currently running transaction + * as we cannot be sure that the inode or some of its metadata isn't + * part of the transaction - the inode could have been reclaimed and + * now it is reread from disk. + */ + if (journal) { + transaction_t *transaction; + tid_t tid; + + spin_lock(&journal->j_state_lock); + if (journal->j_running_transaction) + transaction = journal->j_running_transaction; + else + transaction = journal->j_committing_transaction; + if (transaction) + tid = transaction->t_tid; + else + tid = journal->j_commit_sequence; + spin_unlock(&journal->j_state_lock); + ei->i_sync_tid = tid; + ei->i_datasync_tid = tid; + } + if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) { ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize); if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize > @@ -5056,6 +5084,7 @@ static int ext4_do_update_inode(handle_t *handle, err = rc; ei->i_state &= ~EXT4_STATE_NEW; + ext4_update_inode_fsync_trans(handle, inode, 0); out_brelse: brelse(bh); ext4_std_error(inode->i_sb, err); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 8ab0c95..2b13dcf 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -706,6 +706,8 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) spin_lock_init(&(ei->i_block_reservation_lock)); INIT_LIST_HEAD(&ei->i_aio_dio_complete_list); ei->cur_aio_dio = NULL; + ei->i_sync_tid = 0; + ei->i_datasync_tid = 0; return &ei->vfs_inode; } -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 4/4] ext4: Wait for proper transaction commit on fsyncOn Tue 08-12-09 23:54:24, tytso@... wrote:
> Here's a revised version of your patch that I've included in the ext4 > patch queue. I've removed the use of the atomic_t data type. I've OK, I'm just unsure: Isn't even 32-bit read non-atomic if it is not properly aligned e.g. on powerPC? So shouldn't we make sure those inode fields are aligned to 32-bit boundary (or at least add a comment about that)? > also cleaned up the fsync() function some more to make it easier to > follow, and I've fixed the handling in no-journal mode (we can't > depend on the forcing a commit if there is no journal, so we can't > drop the use of sync_inode() --- we can use simple_fsync(), though, > which does everything we need if there is no journal). Good catch. Thanks. Honza > commit b436b9bef84de6893e86346d8fbf7104bc520645 > Author: Jan Kara <jack@...> > Date: Tue Dec 8 23:51:10 2009 -0500 > > ext4: Wait for proper transaction commit on fsync > > We cannot rely on buffer dirty bits during fsync because pdflush can come > before fsync is called and clear dirty bits without forcing a transaction > commit. What we do is that we track which transaction has last changed > the inode and which transaction last changed allocation and force it to > disk on fsync. > > Signed-off-by: Jan Kara <jack@...> > Signed-off-by: "Theodore Ts'o" <tytso@...> > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index 4cfc2f0..ab31e65 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -709,6 +709,13 @@ struct ext4_inode_info { > struct list_head i_aio_dio_complete_list; > /* current io_end structure for async DIO write*/ > ext4_io_end_t *cur_aio_dio; > + > + /* > + * Transactions that contain inode's metadata needed to complete > + * fsync and fdatasync, respectively. > + */ > + tid_t i_sync_tid; > + tid_t i_datasync_tid; > }; > > /* > diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h > index 2c2b262..05eca81 100644 > --- a/fs/ext4/ext4_jbd2.h > +++ b/fs/ext4/ext4_jbd2.h > @@ -249,6 +249,19 @@ static inline int ext4_jbd2_file_inode(handle_t *handle, struct inode *inode) > return 0; > } > > +static inline void ext4_update_inode_fsync_trans(handle_t *handle, > + struct inode *inode, > + int datasync) > +{ > + struct ext4_inode_info *ei = EXT4_I(inode); > + > + if (ext4_handle_valid(handle)) { > + ei->i_sync_tid = handle->h_transaction->t_tid; > + if (datasync) > + ei->i_datasync_tid = handle->h_transaction->t_tid; > + } > +} > + > /* super.c */ > int ext4_force_commit(struct super_block *sb); > > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index 5967f18..700206e 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -3058,6 +3058,8 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode, > if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) { > ret = ext4_convert_unwritten_extents_dio(handle, inode, > path); > + if (ret >= 0) > + ext4_update_inode_fsync_trans(handle, inode, 1); > goto out2; > } > /* buffered IO case */ > @@ -3085,6 +3087,8 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode, > ret = ext4_ext_convert_to_initialized(handle, inode, > path, iblock, > max_blocks); > + if (ret >= 0) > + ext4_update_inode_fsync_trans(handle, inode, 1); > out: > if (ret <= 0) { > err = ret; > @@ -3323,10 +3327,16 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, > allocated = ext4_ext_get_actual_len(&newex); > set_buffer_new(bh_result); > > - /* Cache only when it is _not_ an uninitialized extent */ > - if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) > + /* > + * Cache the extent and update transaction to commit on fdatasync only > + * when it is _not_ an uninitialized extent. > + */ > + if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) { > ext4_ext_put_in_cache(inode, iblock, allocated, newblock, > EXT4_EXT_CACHE_EXTENT); > + ext4_update_inode_fsync_trans(handle, inode, 1); > + } else > + ext4_update_inode_fsync_trans(handle, inode, 0); > out: > if (allocated > max_blocks) > allocated = max_blocks; > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c > index a3c2507..0b22497 100644 > --- a/fs/ext4/fsync.c > +++ b/fs/ext4/fsync.c > @@ -51,25 +51,30 @@ > int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) > { > struct inode *inode = dentry->d_inode; > + struct ext4_inode_info *ei = EXT4_I(inode); > journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; > - int err, ret = 0; > + int ret; > + tid_t commit_tid; > > J_ASSERT(ext4_journal_current_handle() == NULL); > > trace_ext4_sync_file(file, dentry, datasync); > > + if (inode->i_sb->s_flags & MS_RDONLY) > + return 0; > + > ret = flush_aio_dio_completed_IO(inode); > if (ret < 0) > return ret; > + > + if (!journal) > + return simple_fsync(file, dentry, datasync); > + > /* > - * data=writeback: > + * data=writeback,ordered: > * The caller's filemap_fdatawrite()/wait will sync the data. > - * sync_inode() will sync the metadata > - * > - * data=ordered: > - * The caller's filemap_fdatawrite() will write the data and > - * sync_inode() will write the inode if it is dirty. Then the caller's > - * filemap_fdatawait() will wait on the pages. > + * Metadata is in the journal, we wait for proper transaction to > + * commit here. > * > * data=journal: > * filemap_fdatawrite won't do anything (the buffers are clean). > @@ -82,27 +87,10 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) > if (ext4_should_journal_data(inode)) > return ext4_force_commit(inode->i_sb); > > - if (!journal) > - ret = sync_mapping_buffers(inode->i_mapping); > - > - if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) > - goto out; > - > - /* > - * The VFS has written the file data. If the inode is unaltered > - * then we need not start a commit. > - */ > - if (inode->i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) { > - struct writeback_control wbc = { > - .sync_mode = WB_SYNC_ALL, > - .nr_to_write = 0, /* sys_fsync did this */ > - }; > - err = sync_inode(inode, &wbc); > - if (ret == 0) > - ret = err; > - } > -out: > - if (journal && (journal->j_flags & JBD2_BARRIER)) > + commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid; > + if (jbd2_log_start_commit(journal, commit_tid)) > + jbd2_log_wait_commit(journal, commit_tid); > + else if (journal->j_flags & JBD2_BARRIER) > blkdev_issue_flush(inode->i_sb->s_bdev, NULL); > return ret; > } > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 958c3ff..f1bc1e3 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -983,6 +983,8 @@ static int ext4_ind_get_blocks(handle_t *handle, struct inode *inode, > goto cleanup; > > set_buffer_new(bh_result); > + > + ext4_update_inode_fsync_trans(handle, inode, 1); > got_it: > map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key)); > if (count > blocks_to_boundary) > @@ -4738,6 +4740,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino) > struct ext4_inode *raw_inode; > struct ext4_inode_info *ei; > struct inode *inode; > + journal_t *journal = EXT4_SB(sb)->s_journal; > long ret; > int block; > > @@ -4802,6 +4805,31 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino) > ei->i_data[block] = raw_inode->i_block[block]; > INIT_LIST_HEAD(&ei->i_orphan); > > + /* > + * Set transaction id's of transactions that have to be committed > + * to finish f[data]sync. We set them to currently running transaction > + * as we cannot be sure that the inode or some of its metadata isn't > + * part of the transaction - the inode could have been reclaimed and > + * now it is reread from disk. > + */ > + if (journal) { > + transaction_t *transaction; > + tid_t tid; > + > + spin_lock(&journal->j_state_lock); > + if (journal->j_running_transaction) > + transaction = journal->j_running_transaction; > + else > + transaction = journal->j_committing_transaction; > + if (transaction) > + tid = transaction->t_tid; > + else > + tid = journal->j_commit_sequence; > + spin_unlock(&journal->j_state_lock); > + ei->i_sync_tid = tid; > + ei->i_datasync_tid = tid; > + } > + > if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) { > ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize); > if (EXT4_GOOD_OLD_INODE_SIZE + ei->i_extra_isize > > @@ -5056,6 +5084,7 @@ static int ext4_do_update_inode(handle_t *handle, > err = rc; > ei->i_state &= ~EXT4_STATE_NEW; > > + ext4_update_inode_fsync_trans(handle, inode, 0); > out_brelse: > brelse(bh); > ext4_std_error(inode->i_sb, err); > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index 8ab0c95..2b13dcf 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -706,6 +706,8 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) > spin_lock_init(&(ei->i_block_reservation_lock)); > INIT_LIST_HEAD(&ei->i_aio_dio_complete_list); > ei->cur_aio_dio = NULL; > + ei->i_sync_tid = 0; > + ei->i_datasync_tid = 0; > > return &ei->vfs_inode; > } Jan Kara <jack@...> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
|
|
Re: [PATCH 4/4] ext4: Wait for proper transaction commit on fsyncOn Tue 08-12-09 22:51:20, tytso@... wrote:
> On Mon, Nov 16, 2009 at 11:43:31AM +0100, Jan Kara wrote: > > > Why do we need an atomic_t here at all? It's not clear it's > > > necessary. What specific race are you worried about? > > Well, i_[data]sync_tid are set and read from several places which aren't > > currently synchronized against each other and I din't want to introduce any > > synchronization. So atomic_t seemed like a clean way of making clear that > > loads / stores from those fields are atomic, regardless of architecture, > > memory alignment or whatever insanity that might make us see just half > > overwritten field. On all archs where this means just plain stores / loads > > (such as x86), there's no performance hit since the operations are > > implemented as such. > > Sorry for not responding to this one sooner, but see this URL: > > http://digitalvampire.org/blog/index.php/2007/05/13/atomic-cargo-cults/ Honza -- Jan Kara <jack@...> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@... More majordomo info at http://vger.kernel.org/majordomo-info.html |
| < Prev | 1 - 2 | Next > |
| Free embeddable forum powered by Nabble | Forum Help |