Re: [PATCH v4 07/23] ext4: do not use data=ordered mode for inodes using buffered iomap path

From: Ojaswin Mujoo

Date: Tue May 19 2026 - 09:40:18 EST

On Tue, May 19, 2026 at 04:11:30PM +0530, Ojaswin Mujoo wrote:
> On Mon, May 11, 2026 at 03:23:27PM +0800, Zhang Yi wrote:
> > From: Zhang Yi <yi.zhang@xxxxxxxxxx>
> >
> > The data=ordered mode introduces two fundamental conflicts with the
> > iomap buffered write path, leading to potential deadlocks.
> >
> > 1) Lock ordering conflict
> > In the iomap writeback path, each folio is processed sequentially:
> > the folio lock is acquired first, followed by starting a transaction
> > to create block mappings. In data=ordered mode, writeback triggered
> > by the journal commit process may attempt to acquire a folio lock
> > that is already held by iomap. Meanwhile, iomap, under that same
> > folio lock, may start a new transaction and wait for the currently
> > committing transaction to finish, resulting in a deadlock.
>
> Right, makes sense.
> >
> > 2) Partial folio submission not supported
> > When block size is smaller than folio size, a folio may contain both
> > mapped and unmapped blocks. In data=ordered mode, if the journal
> > waits for such a folio to be written back while the regular writeback
> > process has already started committing it (with the writeback flag
> > set), mapping the remaining unmapped blocks can deadlock. This is
> > because the writeback flag is cleared only after the entire folio is
> > processed and committed.
>
> Okay so IIUC, if we do end up using iomap with ordered data, there are 2
> codepaths with issues here:
>
> txn_commit
> ordered data writeback (say it goes via iomap)
> folio_lock
> iomap_writeback_folio
> folio_start_writeback
> iomap_writeback_range
> ext4_map_block
> txn_start
> wait for tnx commit - DEADLOCK
>
> Currently we avoid this by having ext4_normal_submit_inode_buffers()
> pass can_map = 0 so journal flush makese sure not to start any txn.
>
> Then we have
>
> txn_commit background writeback (via iomap)
>
> folio_lock()
> ordered data writeback
> folio_lock
>
> iomap_writeback_folio
> folio_start_writeback
> iomap_writeback_range
> ext4_map_block
> txn_start
> wait for txn commit - DEADLOCK

Sorry I forget to remove tabs

this is what I meant:

txn_commit
ordered data writeback (say it goes via iomap)
folio_lock
iomap_writeback_folio
folio_start_writeback
iomap_writeback_range
ext4_map_block
txn_start
wait for tnx commit - DEADLOCK

Currently we avoid this by having ext4_normal_submit_inode_buffers()
pass can_map = 0 so journal flush makese sure not to start any txn.

Then we have

txn_commit background writeback (via iomap)

folio_lock()
ordered data writeback
folio_lock

iomap_writeback_folio
folio_start_writeback
iomap_writeback_range
ext4_map_block
txn_start
wait for txn commit - DEADLOCK

>
> Currently, this is taken care because we try to start the txn before
> taking any folio locks/starting writeback, and hence we cannot deadlock.
>
> If the above description makes sense, do you think it'd be good to add
> them to the commit message. The reason is that although these paths seem
> obvious when we look at them a lot, it took me a good bit of time to
> understand what deadlocks you are talking about here :p
>
> Having the code traces like above makes it very clear.
> >
> > To support data=ordered mode, the iomap core would need two invasive
> > changes:
> > - Acquire the transaction handle before locking any folio for
> > writeback.
> > - Support partial folio submission.
> >
> > Both changes are complicated and risk performance regressions.
> > Therefore, we must avoid using data=ordered mode when converting to the
> > iomap path.
> >
> > Currently, data=ordered mode is used in three scenarios:
> > - Append write
> > - Post-EOF partial block truncate-up followed by append write
> > - Online defragmentation
> >
> > We can address the first two without data=ordered mode:
> > - For append write: always allocate unwritten blocks (i.e. always
> > enable dioread_nolock), preserving the behavior of current
> > extent-type inodes.
> > - For post-EOF truncate-up + append write: postpone updating i_disksize
> > until after the zeroed partial block has been written back.
>
> I'm still going through how we are addressing no data=ordered so will
> get back on this in some time.
>
> Thanks,
> Ojaswin
>
> >
> > Online defragmentation does not yet support iomap; this can be resolved
> > separately in the future.
> >
> > Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
> > ---
> > fs/ext4/ext4_jbd2.h | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> > index 63d17c5201b5..26999f173870 100644
> > --- a/fs/ext4/ext4_jbd2.h
> > +++ b/fs/ext4/ext4_jbd2.h
> > @@ -383,7 +383,12 @@ static inline int ext4_should_journal_data(struct inode *inode)
> >
> > static inline int ext4_should_order_data(struct inode *inode)
> > {
> > - return ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE;
> > + /*
> > + * inodes using the iomap buffered I/O path do not use the
> > + * data=ordered mode.
> > + */
> > + return !ext4_inode_buffered_iomap(inode) &&
> > + (ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE);
> > }
> >
> > static inline int ext4_should_writeback_data(struct inode *inode)
> > --
> > 2.52.0
> >