Re: [PATCH] xfs: coalesce contiguous unwritten suffix into iomap for NOWAIT writes
From: Christoph Hellwig
Date: Fri Jun 05 2026 - 05:11:57 EST
On Fri, May 29, 2026 at 03:11:00PM -0400, Brandon Allard wrote:
> In append-only direct-IO workloads with fallocate(with KEEP_SIZE +
> ZERO_RANGE) writes can span the written/unwritten boundary of the
> preallocated extent. This is from callers needing to overwrite the
> previously written block if the last write didn't completely fill it.
>
> Prior to commit 883a790a8440 ("xfs: don't allow NOWAIT DIO across extent
> boundaries") these writes succeeded by issuing two bios: one against the
> NORM head, one against the UNWRITTEN tail. That commit closed off the
> two-bio path to prevent NOWAIT callers from observing short writes when
> one bio succeeded and another returned -EAGAIN. The side effect is that
> the workload above now returns -EAGAIN on the spanning write.
>
> This patch restores the success path by coalescing the (NORM, UNWRITTEN)
> pair into a single in-memory iomap when the two records are physically
> contiguous on disk. The merged range is reported as UNWRITTEN so
> iomap_dio submits a single contiguous bio and the completion handler
> runs xfs_iomap_write_unwritten across the merged range.
>
> Short writes are still prevented. The merged iomap describes a single
> contiguous physical extent, so iomap_dio still submits a single bio
> for the full range. The caller observes the same all-or-nothing
> behaviour as a write into a single UNWRITTEN extent.
>
> The completion of the write also remains correct.
> xfs_bmapi_convert_unwritten returns early on the already-NORM prefix
> without dirtying the transaction, while the UNWRITTEN suffix flips to
> NORM and the two adjacent same-state records coalesce in the bmbt as a
> side effect of xfs_bmap_add_extent_unwritten_real.
I think the better way to do this would be a new XFS_BMAPI_ flag that
allows xfs_bmapi_read to merge maps with mismatching unwritten/written
state and just return your a single map. The big benefits are that we
only need to do one search for the extent and can then iterate the
cursor, and I think it will also end up with less and more readable code.
We actually had such a flag until v4.19, so you might be able to just
look into a manual revert of commit c3a2f9fff1bb ("xfs: remove the now
unused XFS_BMAPI_IGSTATE flag") for a start, although for use case
you'd need to ensure that the unwritten flag is set on the output
map if it is set on any of the input maps.