Re: [PATCH v6 3/4] buffer: add dropbehind writeback support

From: Tal Zussman

Date: Fri May 22 2026 - 19:19:41 EST


On 5/14/26 5:51 PM, Tal Zussman wrote:
> Add block_write_begin_iocb() which threads the kiocb through to
> __filemap_get_folio() so that buffer_head-based I/O can use DONTCACHE
> behavior. When the iocb has IOCB_DONTCACHE set, FGP_DONTCACHE is
> passed to mark the folio for dropbehind. The existing
> block_write_begin() is preserved as a wrapper that passes a NULL iocb.
>
> Set BIO_COMPLETE_IN_TASK in submit_bh_wbc() when the folio has
> dropbehind set, so that buffer_head writeback completions get deferred
> to task context.
>
> Signed-off-by: Tal Zussman <tz2294@xxxxxxxxxxxx>

Responding to the Sashiko review inline:

Link: https://sashiko.dev/#/patchset/20260514-blk-dontcache-v6-0-782e2fa7477b%40columbia.edu

> ---
> fs/buffer.c | 19 +++++++++++++++++--
> include/linux/buffer_head.h | 3 +++
> 2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index b0b3792b1496..d0abaf44d782 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2138,14 +2138,19 @@ EXPORT_SYMBOL(block_commit_write);
> *
> * The filesystem needs to handle block truncation upon failure.
> */
> -int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
> +int block_write_begin_iocb(const struct kiocb *iocb,
> + struct address_space *mapping, loff_t pos, unsigned len,
> struct folio **foliop, get_block_t *get_block)
> {
> pgoff_t index = pos >> PAGE_SHIFT;
> + fgf_t fgp_flags = FGP_WRITEBEGIN;
> struct folio *folio;
> int status;
>
> - folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
> + if (iocb && iocb->ki_flags & IOCB_DONTCACHE)
> + fgp_flags |= FGP_DONTCACHE;
> +
> + folio = __filemap_get_folio(mapping, index, fgp_flags,
> mapping_gfp_mask(mapping));
> if (IS_ERR(folio))
> return PTR_ERR(folio);
> @@ -2160,6 +2165,13 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
> *foliop = folio;
> return status;
> }

Q: "Is an EXPORT_SYMBOL() or EXPORT_SYMBOL_GPL() needed for
block_write_begin_iocb()?
Since the legacy wrapper block_write_begin() remains exported, it appears
filesystems compiled as loadable modules (like ext4 or fat) will encounter
linker errors if they attempt to transition to the new API."

A: An export can be added once a filesystem wants to use
block_write_begin_iocb().

> +
> +int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
> + struct folio **foliop, get_block_t *get_block)
> +{
> + return block_write_begin_iocb(NULL, mapping, pos, len, foliop,
> + get_block);
> +}
> EXPORT_SYMBOL(block_write_begin);
>
> int block_write_end(loff_t pos, unsigned len, unsigned copied,
> @@ -2715,6 +2727,9 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
>
> bio = bio_alloc(bh->b_bdev, 1, opf, GFP_NOIO);
>
> + if (folio_test_dropbehind(bh->b_folio))
> + bio_set_flag(bio, BIO_COMPLETE_IN_TASK);
> +

Q: "Does this add unnecessary workqueue overhead for read operations?
While task-context deferral is required for asynchronous writeback completions
to allow folio_end_writeback() to sleep and invalidate the folio, read
dropbehind invalidation is handled synchronously in the reader's task context
via filemap_end_dropbehind_read().
Could this be gated by an op == REQ_OP_WRITE or op_is_write(opf) check so
that read bio completions aren't forced into a workqueue?"

A: Seems worth adding an op_is_write(opf) check so that read completions
on dropbehind folios are not deferred to task context.
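
For illustration, the gating could look roughly like the sketch below. This
is a userspace model under stated assumptions, not the actual hunk: the
struct definitions, the boolean fields, and the helper name
bh_mark_complete_in_task() are hypothetical stand-ins so the logic can be
compiled and checked outside the kernel. Only the shape of the condition is
meant to carry over to submit_bh_wbc().

```c
#include <stdbool.h>

/* Stand-in types: these are NOT the kernel definitions. */
typedef unsigned int blk_opf_t;

enum { REQ_OP_READ = 0, REQ_OP_WRITE = 1 };

struct folio { bool dropbehind; };        /* models the dropbehind folio flag */
struct bio   { bool complete_in_task; };  /* models BIO_COMPLETE_IN_TASK */

/* Modeled on the kernel helper: write-direction ops have the low bit set. */
static inline bool op_is_write(blk_opf_t opf)
{
	return opf & 1;
}

static inline bool folio_test_dropbehind(const struct folio *folio)
{
	return folio->dropbehind;
}

/*
 * Hypothetical helper: defer completion to task context only for
 * writes to dropbehind folios; reads are left untouched.
 */
static void bh_mark_complete_in_task(struct bio *bio, blk_opf_t opf,
				     const struct folio *folio)
{
	if (op_is_write(opf) && folio_test_dropbehind(folio))
		bio->complete_in_task = true;
}
```

In the real function this would presumably reduce to adding
op_is_write(opf) in front of the existing folio_test_dropbehind() test.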

> if (IS_ENABLED(CONFIG_FS_ENCRYPTION))
> buffer_set_crypto_ctx(bio, bh, GFP_NOIO);
>
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index e4939e33b4b5..4ce50882d621 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -260,6 +260,9 @@ int block_read_full_folio(struct folio *, get_block_t *);
> bool block_is_partially_uptodate(struct folio *, size_t from, size_t count);
> int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
> struct folio **foliop, get_block_t *get_block);
> +int block_write_begin_iocb(const struct kiocb *iocb,
> + struct address_space *mapping, loff_t pos, unsigned len,
> + struct folio **foliop, get_block_t *get_block);
> int __block_write_begin(struct folio *folio, loff_t pos, unsigned len,
> get_block_t *get_block);
> int block_write_end(loff_t pos, unsigned len, unsigned copied, struct folio *);
>