Re: [PATCH] btrfs: fix subpage state mismatch in cow_fixup writeback path

From: Qu Wenruo

Date: Mon May 25 2026 - 18:11:00 EST




On 2026/5/25 23:43, David Sterba wrote:
On Mon, Mar 16, 2026 at 10:56:56AM +0000, Werner Kasselman wrote:
writepage_delalloc() marks all dirty sectors as locked via
btrfs_folio_set_lock(), setting bits in the subpage locked bitmap and
incrementing nr_locked. These are cleaned up by
btrfs_folio_end_lock_bitmap() at the end of extent_writepage().

However, when btrfs_writepage_cow_fixup() returns -EAGAIN inside
extent_writepage_io(), the code calls folio_unlock() directly and
returns 1, causing extent_writepage() to skip the bitmap cleanup:

	ret = btrfs_writepage_cow_fixup(folio);
	if (ret == -EAGAIN) {
		folio_redirty_for_writepage(bio_ctrl->wbc, folio);
		folio_unlock(folio);	// doesn't clear locked bitmap
		return 1;		// caller skips end_lock_bitmap()
	}

This leaves the subpage locked bitmap out of sync with the folio lock
state: the folio is unlocked but its subpage locked bitmap still has
bits set and nr_locked is elevated. When writeback retries the folio,
btrfs_folio_set_lock() hits the ASSERT at subpage.c:746 because the
bits are still set from the previous attempt.

The cow_fixup path is largely a legacy path -- the GUP dirty-without-
informing-fs issue that triggered it has been fixed on the GUP side,
and experimental builds already catch this case with -EUCLEAN before
reaching the -EAGAIN return. However the subpage state mismatch is
still a correctness issue for non-experimental builds under error
injection or memory pressure (kzalloc failure in
btrfs_writepage_cow_fixup()).

Fix this by replacing folio_unlock() with btrfs_folio_end_lock_bitmap(),
which properly clears the locked bitmap bits before unlocking. For
non-subpage or when nr_locked is 0 (e.g. called from
extent_write_locked_range()), btrfs_folio_end_lock_bitmap() falls
through to plain folio_unlock(), so existing behavior is preserved.
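
For illustration only, the mismatch and the effect of the fix can be
modelled in a small user-space sketch. This is not kernel code: struct
toy_folio and every toy_* helper are hypothetical stand-ins that mimic
only the behavior described above (set_lock refuses already-set bits,
end_lock_bitmap clears bits and falls back to a plain unlock when
nr_locked is 0).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a folio plus btrfs subpage state (all names hypothetical). */
struct toy_folio {
	bool locked;			/* models the folio lock */
	unsigned long locked_bitmap;	/* models the subpage locked bitmap */
	int nr_locked;			/* models the nr_locked counter */
};

/* Count the set bits in a bitmap. */
static int toy_count_bits(unsigned long bits)
{
	int n = 0;

	for (; bits; bits &= bits - 1)
		n++;
	return n;
}

/* Models btrfs_folio_set_lock(): refuses ranges that are already locked. */
static int toy_set_lock(struct toy_folio *f, unsigned long bits)
{
	if (f->locked_bitmap & bits)
		return -1;	/* the kernel would hit the ASSERT here */
	f->locked_bitmap |= bits;
	f->nr_locked += toy_count_bits(bits);
	return 0;
}

/* Models btrfs_folio_end_lock_bitmap() as described in the commit message. */
static void toy_end_lock_bitmap(struct toy_folio *f, unsigned long bits)
{
	unsigned long cleared = f->locked_bitmap & bits;

	if (f->nr_locked == 0) {
		f->locked = false;	/* plain unlock fallback */
		return;
	}
	f->locked_bitmap &= ~cleared;
	f->nr_locked -= toy_count_bits(cleared);
	if (f->nr_locked == 0)
		f->locked = false;
}

/*
 * One writeback attempt that bails out as if cow fixup returned -EAGAIN,
 * followed by a retry. Returns the result of the retry's toy_set_lock().
 */
static int toy_retry_after_eagain(bool use_fix)
{
	struct toy_folio f = { .locked = true };
	const unsigned long dirty = 0xfUL;	/* four dirty sectors */
	int ret;

	ret = toy_set_lock(&f, dirty);		/* writepage_delalloc() step */
	assert(ret == 0);
	(void)ret;

	if (use_fix)
		toy_end_lock_bitmap(&f, dirty);	/* proposed error path */
	else
		f.locked = false;		/* old path: bare unlock */

	f.locked = true;			/* writeback locks the folio again */
	return toy_set_lock(&f, dirty);
}
```

In this model toy_retry_after_eagain(false) returns -1, because the
stale bits from the first attempt are still set, while
toy_retry_after_eagain(true) returns 0.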

Fixes: d034cdb4cc8a ("btrfs: lock subpage ranges in one go for writepage_delalloc()")
CC: stable@xxxxxxxxxxxxxxx
Signed-off-by: Werner Kasselman <werner@xxxxxxxxxxx>

I'm going through patch backlog, this patch has some relevance. We're
going to remove the fixup worker code in 7.2 completely so it cannot be
applied to the development branch anymore.

The problems are hard to hit or need error injection, so I don't know if
it's worth backporting to stable.

I think the root fix, "btrfs: check and set EXTENT_DELALLOC_NEW before
clearing EXTENT_DELALLOC", is more relevant to backport.

That fix is already CCed to stable, although only for 6.1+.
Older kernels may be harder to backport to.

With that fix backported, the cow fixup path becomes dead code, so it
makes no difference whether the error path is fixed or not.

Thanks,
Qu

We've provided a long grace period to
the fixup worker before removal and I'm glad we can delete it and forget
about it. If somebody wants one last fix then I'm OK with that.