Re: [PATCH v4] btrfs: validate data reloc tree file extent item members

From: David Sterba

Date: Mon May 25 2026 - 11:13:28 EST


On Wed, May 13, 2026 at 01:35:44PM +0200, Teng Liu wrote:
> get_new_location() uses BUG_ON() to crash the kernel if the file extent
> item it looks up has any of offset, compression, encryption, or
> other_encoding set non-zero. The data reloc inode is only written by
> relocation's own paths and the four fields are always 0 in what the
> kernel writes:
>
> - insert_prealloc_file_extent() memsets the stack item to zero and
>   only fills in type, disk_bytenr, disk_num_bytes and num_bytes, so
>   offset/compression/encryption/other_encoding stay 0.
> - insert_ordered_extent_file_extent() copies oe->compress_type into
>   the file extent's compression field, but the data reloc inode is
>   created with BTRFS_INODE_NOCOMPRESS so compress_type is always 0;
>   encryption and other_encoding are reserved-and-zero in btrfs.
>
> A non-zero value here means the leaf decoded from disk does not match
> what the kernel wrote, i.e. on-disk corruption. A malformed image
> reaches this code via balance and panics the kernel.
>
> A previous attempt to enforce all four constraints in tree-checker's
> check_extent_data_item() was merged as commit 7d0ee95979e9 ("btrfs:
> validate data reloc tree file extent item members in tree-checker")
> and then reverted by commit 1c034697fcaa after btrfs/061 produced
> false positives on arm64 with 64K pages. The reason: relocation
> writeback legitimately produces REG file_extent_items with offset != 0
> in the data reloc tree. When an ordered extent covers only the back
> portion of an underlying PREALLOC (num_bytes < ram_bytes on the input
> file_extent), insert_ordered_extent_file_extent() inserts a REG with
>
>     offset = oe->offset
>     num_bytes = oe->num_bytes
>     ram_bytes preserved from the original PREALLOC,
>
> and this item can reach disk if a transaction commit fires while it
> is present in the leaf.
>
> The four fields belong in different layers:
>
> - compression, encryption and other_encoding are universal
>   invariants for every item in the data reloc tree, regardless of
>   cluster geometry. Enforce them in tree-checker's
>   check_extent_data_item() so a corrupt leaf is rejected at read
>   time.
>
> - offset is only an invariant at the cluster-boundary keys that
>   get_new_location() searches (the key is computed as
>   src_disk_bytenr - reloc_block_group_start). Partial-PREALLOC
>   writebacks legitimately place REG items at non-boundary keys with
>   offset != 0; tree-checker cannot reject these. The
>   cluster-boundary item is always written by either
>   insert_prealloc_file_extent() (offset=0 by memset) or by the
>   front portion of a partial writeback (offset=0 by construction),
>   so a non-zero offset there is corruption.
>
> Enforce the universal invariants in check_extent_data_item() with a
> file_extent_err() rejection. Convert the BUG_ON() in
> get_new_location() to a -EUCLEAN return paired with btrfs_print_leaf()
> and btrfs_err() so the offending leaf is logged. The caller in
> replace_file_extents() already handles non-zero returns from
> get_new_location() by breaking out of the loop without aborting the
> transaction.
>
> Suggested-by: Qu Wenruo <wqu@xxxxxxxx>
> Suggested-by: David Sterba <dsterba@xxxxxxxx>
> Reported-by: syzbot+3e20d8f3d41bac5dc9a2@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://syzkaller.appspot.com/bug?extid=3e20d8f3d41bac5dc9a2
> Signed-off-by: Teng Liu <27rabbitlt@xxxxxxxxx>

Added to for-next, thanks.
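
For anyone reading along, the two-layer split described in the quoted
changelog can be sketched in user-space C roughly as below. This is an
illustration only and not the code from the patch: struct fe_item and
both helper names are hypothetical stand-ins for
struct btrfs_file_extent_item, check_extent_data_item() and
get_new_location(), and the errno fallback is an assumption for
non-Linux builds.

```c
#include <errno.h>
#include <stdint.h>

/* EUCLEAN is Linux-specific; assume the Linux value where it is missing. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

/* Hypothetical, simplified stand-in for struct btrfs_file_extent_item. */
struct fe_item {
	uint64_t offset;
	uint8_t compression;
	uint8_t encryption;
	uint16_t other_encoding;
};

/*
 * Layer 1 (tree-checker style): these three fields must be zero for
 * every file extent item in the data reloc tree, so they can be
 * rejected as soon as the leaf is read.
 */
static int check_reloc_item_universal(const struct fe_item *fi)
{
	if (fi->compression || fi->encryption || fi->other_encoding)
		return -EUCLEAN;
	return 0;
}

/*
 * Layer 2 (lookup time): offset must be zero only for the item found
 * at a cluster-boundary key. Items at other keys may carry a non-zero
 * offset after a partial-PREALLOC writeback, so this check cannot be
 * applied to every item. An error is returned instead of crashing.
 */
static int check_reloc_item_at_boundary(const struct fe_item *fi)
{
	int ret = check_reloc_item_universal(fi);

	if (ret)
		return ret;
	if (fi->offset)
		return -EUCLEAN;
	return 0;
}
```

An item with offset = 4096 and the other fields zero passes the first
helper but fails the second, which is the case the reverted
tree-checker-only approach tripped over.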