Re: [PATCH v4 04/23] ext4: add iomap address space operations for buffered I/O
From: Zhang Yi
Date: Tue May 19 2026 - 08:41:49 EST
On 5/19/2026 5:28 PM, Ojaswin Mujoo wrote:
> On Mon, May 11, 2026 at 03:23:24PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>
>> Introduce initial support for iomap in the buffered I/O path for regular
>> files on ext4.
>>
>> - Add a new inode state flag EXT4_STATE_BUFFERED_IOMAP to indicate the
>> inode uses iomap instead of buffer_head for buffered I/O
>> - Add helper ext4_inode_buffered_iomap() to check the flag
>> - Add new address space operations ext4_iomap_aops with callbacks that
>> will use generic iomap implementations
>> - Add ext4_iomap_aops to ext4_set_aops() when the flag is set
>>
>> The following callbacks(read_folio(), readahead(), writepages()) are
>> provided as placeholders and will be implemented in later patches.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
>> Reviewed-by: Jan Kara <jack@xxxxxxx>
>
> Hi Zhang, looks good to me. Just a questions below:
Hi, Ojaswin! Thank you for the review of this series.
>> ---
>> fs/ext4/ext4.h | 7 +++++++
>> fs/ext4/inode.c | 32 ++++++++++++++++++++++++++++++++
>> 2 files changed, 39 insertions(+)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 94283a991e5c..1e27d73d7427 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1972,6 +1972,7 @@ enum {
>> EXT4_STATE_FC_COMMITTING, /* Fast commit ongoing */
>> EXT4_STATE_FC_FLUSHING_DATA, /* Fast commit flushing data */
>> EXT4_STATE_ORPHAN_FILE, /* Inode orphaned in orphan file */
>> + EXT4_STATE_BUFFERED_IOMAP, /* Inode use iomap for buffered IO */
>> };
>>
>> #define EXT4_INODE_BIT_FNS(name, field, offset) \
>> @@ -2040,6 +2041,12 @@ static inline bool ext4_inode_orphan_tracked(struct inode *inode)
>> !list_empty(&EXT4_I(inode)->i_orphan);
>> }
>>
>> +/* Whether the inode pass through the iomap infrastructure for buffered I/O */
>> +static inline bool ext4_inode_buffered_iomap(struct inode *inode)
>> +{
>> + return ext4_test_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP);
>> +}
>> +
>> /*
>> * Codes for operating systems
>> */
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index b1ef706987c3..178ac2be37b7 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -3908,6 +3908,22 @@ const struct iomap_ops ext4_iomap_report_ops = {
>> .iomap_begin = ext4_iomap_begin_report,
>> };
>>
>> +static int ext4_iomap_read_folio(struct file *file, struct folio *folio)
>> +{
>> + return 0;
>> +}
>> +
>> +static void ext4_iomap_readahead(struct readahead_control *rac)
>> +{
>> +
>> +}
>> +
>> +static int ext4_iomap_writepages(struct address_space *mapping,
>> + struct writeback_control *wbc)
>> +{
>> + return 0;
>> +}
>> +
>> /*
>> * For data=journal mode, folio should be marked dirty only when it was
>> * writeably mapped. When that happens, it was already attached to the
>> @@ -3994,6 +4010,20 @@ static const struct address_space_operations ext4_da_aops = {
>> .swap_activate = ext4_iomap_swap_activate,
>> };
>>
>> +static const struct address_space_operations ext4_iomap_aops = {
>> + .read_folio = ext4_iomap_read_folio,
>> + .readahead = ext4_iomap_readahead,
>> + .writepages = ext4_iomap_writepages,
>> + .dirty_folio = iomap_dirty_folio,
>> + .bmap = ext4_bmap,
>> + .invalidate_folio = iomap_invalidate_folio,
>> + .release_folio = iomap_release_folio,
>> + .migrate_folio = filemap_migrate_folio,
>> + .is_partially_uptodate = iomap_is_partially_uptodate,
>> + .error_remove_folio = generic_error_remove_folio,
>> + .swap_activate = ext4_iomap_swap_activate,
>> +};
>
> So one question, for ->release_folio() we are using
> iomap_release_folio() instead of ext4_release_folio() here which doesnt
> make the jbd2_journal_try_to_free_bufferes() call. IIUC this function
> seems to be trying to clean up already checkpointed buffers.
>
> I wanted to check if ->release_folio() can be called for folios with
> ext4 metadata buffers? (from my limited understanding of
> shrink_folio_list() -> filemap_release_folio() it seems we can) And if
> it can be called, is it okay to skip the
> jbd2_journal_try_to_free_buffers call?
Here, in ->release_folio(), folio->mapping points to inode->i_data (the
file's pagecache), not the block device's pagecache. ext4 metadata
resides in the block device's pagecache, which is at a different layer
than this release_folio callback. So we don't need to call
jbd2_journal_try_to_free_buffers() in the iomap path here.
Thanks,
Yi.
>
> Regards,
> ojaswin
>
>> +
>> static const struct address_space_operations ext4_dax_aops = {
>> .writepages = ext4_dax_writepages,
>> .dirty_folio = noop_dirty_folio,
>> @@ -4015,6 +4045,8 @@ void ext4_set_aops(struct inode *inode)
>> }
>> if (IS_DAX(inode))
>> inode->i_mapping->a_ops = &ext4_dax_aops;
>> + else if (ext4_inode_buffered_iomap(inode))
>> + inode->i_mapping->a_ops = &ext4_iomap_aops;
>> else if (test_opt(inode->i_sb, DELALLOC))
>> inode->i_mapping->a_ops = &ext4_da_aops;
>> else
>> --
>> 2.52.0
>>