[PATCH v4 07/23] ext4: do not use data=ordered mode for inodes using buffered iomap path
From: Zhang Yi
Date: Mon May 11 2026 - 03:36:19 EST
From: Zhang Yi <yi.zhang@xxxxxxxxxx>
The data=ordered mode introduces two fundamental conflicts with the
iomap buffered write path, leading to potential deadlocks.
1) Lock ordering conflict
In the iomap writeback path, each folio is processed sequentially:
the folio lock is acquired first, followed by starting a transaction
to create block mappings. In data=ordered mode, writeback triggered
by the journal commit process may attempt to acquire a folio lock
that is already held by iomap. Meanwhile, iomap, under that same
folio lock, may start a new transaction and wait for the currently
committing transaction to finish, resulting in a deadlock.
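The cycle described above can be sketched as a tiny userspace wait-for
graph. This is only an illustrative model, not kernel code: the actor
names, ordered_mode_deadlocks() and has_cycle() are hypothetical and do
not correspond to any ext4, jbd2 or iomap symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors; these are not kernel symbols. */
enum actor { IOMAP_WRITEBACK, JOURNAL_COMMIT, NR_ACTORS };

/*
 * waits_for[a] == b means actor a is blocked until actor b makes
 * progress; -1 means actor a is not blocked.
 */
bool has_cycle(const int waits_for[NR_ACTORS])
{
	for (int start = 0; start < NR_ACTORS; start++) {
		int cur = start;

		/* Follow at most NR_ACTORS edges looking for a way back. */
		for (int steps = 0; steps < NR_ACTORS; steps++) {
			cur = waits_for[cur];
			if (cur < 0)
				break;
			assert(cur < NR_ACTORS);
			if (cur == start)
				return true;
		}
	}
	return false;
}

/* Assumption: commit_flushes_data models data=ordered behaviour. */
bool ordered_mode_deadlocks(bool commit_flushes_data)
{
	int waits_for[NR_ACTORS];

	/*
	 * Writeback holds the folio lock and waits for the committing
	 * transaction before it can start a new handle.
	 */
	waits_for[IOMAP_WRITEBACK] = JOURNAL_COMMIT;

	/*
	 * In data=ordered mode the commit writes back the inode's dirty
	 * data first, so it waits for the folio lock held by writeback.
	 */
	waits_for[JOURNAL_COMMIT] = commit_flushes_data ? IOMAP_WRITEBACK : -1;

	return has_cycle(waits_for);
}
```

In this model, ordered_mode_deadlocks(true) finds the cycle, while
ordered_mode_deadlocks(false) does not, because the commit no longer
waits on the folio lock.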
2) Partial folio submission not supported
When the block size is smaller than the folio size, a folio may contain both
mapped and unmapped blocks. In data=ordered mode, if the journal
waits for such a folio to be written back while the regular writeback
process has already started committing it (with the writeback flag
set), mapping the remaining unmapped blocks can deadlock. This is
because the writeback flag is cleared only after the entire folio is
processed and committed.
To support data=ordered mode, the iomap core would need two invasive
changes:
- Acquire the transaction handle before locking any folio for
writeback.
- Support partial folio submission.
Both changes are complicated and risk performance regressions.
Therefore, avoid data=ordered mode for inodes that are converted to the
iomap path.
Currently, data=ordered mode is used in three scenarios:
- Append write
- Post-EOF partial block truncate-up followed by append write
- Online defragmentation
We can address the first two without data=ordered mode:
- For append write: always allocate unwritten blocks (i.e. always
enable dioread_nolock), preserving the behavior of current
extent-type inodes.
- For post-EOF truncate-up + append write: postpone updating i_disksize
until after the zeroed partial block has been written back.
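The second point can be illustrated with a small userspace model of
deferred size publication. It is a sketch under stated assumptions, not
the implementation in this series: struct model_inode and the model_*()
helpers are hypothetical, and only loosely mirror ext4's i_size and
i_disksize fields.

```c
#include <assert.h>

/* Hypothetical model; not the kernel's struct inode or ext4_inode_info. */
struct model_inode {
	long i_size;		/* in-memory file size */
	long i_disksize;	/* size a journal commit would persist */
	long pending_disksize;	/* size to publish after writeback, or -1 */
};

/* Truncate up: i_size grows at once, the on-disk size is deferred. */
void model_truncate_up(struct model_inode *inode, long new_size)
{
	assert(new_size >= inode->i_size);
	inode->i_size = new_size;
	inode->pending_disksize = new_size;
}

/* Called once the zeroed partial block has reached the disk. */
void model_zeroed_block_written(struct model_inode *inode)
{
	if (inode->pending_disksize >= 0) {
		inode->i_disksize = inode->pending_disksize;
		inode->pending_disksize = -1;
	}
}

/* Size that would be visible after a crash at this moment. */
long model_size_after_crash(const struct model_inode *inode)
{
	return inode->i_disksize;
}
```

In this model, a crash between model_truncate_up() and
model_zeroed_block_written() still reports the old size, so the
not-yet-zeroed tail of the partial block is never exposed.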
Online defragmentation does not yet support iomap; this can be resolved
separately in the future.
Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
---
fs/ext4/ext4_jbd2.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
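For reviewers, the effect of the new ext4_should_order_data() condition
can be checked with a minimal userspace model. The struct, enum and
function names here are hypothetical stand-ins, and the flag values are
illustrative rather than taken from the kernel headers.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the journal mode flags. */
enum model_journal_mode {
	MODEL_JOURNAL_DATA_MODE   = 0x01,
	MODEL_ORDERED_DATA_MODE   = 0x02,
	MODEL_WRITEBACK_DATA_MODE = 0x04,
};

/* Hypothetical model of the two inputs the predicate looks at. */
struct model_mode_inode {
	int journal_mode;	/* one of enum model_journal_mode */
	bool buffered_iomap;	/* inode uses the iomap buffered path */
};

/* Mirrors the shape of the patched predicate: iomap inodes never order data. */
int model_should_order_data(const struct model_mode_inode *inode)
{
	return !inode->buffered_iomap &&
	       (inode->journal_mode & MODEL_ORDERED_DATA_MODE);
}
```

Only the combination of ordered mode and a non-iomap inode yields a
non-zero result in this model; every iomap inode returns 0 regardless
of the journal mode.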
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 63d17c5201b5..26999f173870 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -383,7 +383,12 @@ static inline int ext4_should_journal_data(struct inode *inode)
 
 static inline int ext4_should_order_data(struct inode *inode)
 {
-	return ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE;
+	/*
+	 * inodes using the iomap buffered I/O path do not use the
+	 * data=ordered mode.
+	 */
+	return !ext4_inode_buffered_iomap(inode) &&
+	       (ext4_inode_journal_mode(inode) & EXT4_INODE_ORDERED_DATA_MODE);
 }
 
 static inline int ext4_should_writeback_data(struct inode *inode)
--
2.52.0