[BUG] ext4: delayed-free buddy load error reaches BUG_ON in ext4_process_freed_data

From: Yifei Chu

Date: Sun May 24 2026 - 11:15:03 EST


Hello,

Short version: I am reporting an ext4 delayed-free error-path bug found with targeted fault injection. The injected -EIO is within ext4_mb_load_buddy()’s normal error-return domain, and it is injected at the helper’s return boundary, only for the delayed-free caller. With that rare lower-layer failure made deterministic, ext4_free_data_in_buddy() hits BUG_ON(err != 0) and crashes the kernel. The trace below reports the BUG in ext4_process_freed_data(), presumably because the helper is inlined there in this build.

Tested kernel:

v7.1-rc4-640-g79bd2dded182
79bd2dded182b1d458b18e62684b7f82ffc682e5
x86_64 QEMU, KASAN config

The relevant code shape in fs/ext4/mballoc.c is:

    err = ext4_mb_load_buddy(sb, entry->efd_group, &e4b);
    /* we expect to find existing buddy because it's pinned */
    BUG_ON(err != 0);

The point of the injection is not to corrupt ext4 state. It only makes a plausible buddy/bitmap load failure deterministic at this caller, so the caller’s error handling can be tested. ext4_mb_load_buddy() already has normal negative error returns from metadata loading paths.

Reproducer shape:

  1. Mount a fresh ext4 filesystem.
  2. Create and fsync a 256 KiB file.
  3. Unlink the file.
  4. Call sync() to force delayed-free processing.
  5. The instrumentation forces ext4_mb_load_buddy() to return -EIO at the delayed-free callsite.
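
Steps 2 to 4 can be sketched in userspace roughly as follows. This is a minimal illustration, not the repro_init.c from the tarball: the function name run_delayed_free_repro(), the file name, and the 4 KiB write chunk are assumptions made for the example, and the directory passed in is assumed to live on the freshly mounted ext4 filesystem. On an uninstrumented kernel it just creates, syncs, and removes a file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define REPRO_FILE_SIZE (256 * 1024)	/* 256 KiB, as in step 2 */

/*
 * Hypothetical helper: create and fsync a 256 KiB file under 'dir',
 * unlink it, then call sync(). Returns 0 on success, -1 on any failure.
 */
int run_delayed_free_repro(const char *dir)
{
	static char buf[4096];
	char path[4096];
	size_t written = 0;
	int len, fd;

	len = snprintf(path, sizeof(path), "%s/delayed_free_victim", dir);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
	if (fd < 0)
		return -1;

	/* Step 2: write exactly 256 KiB, tolerating short writes. */
	memset(buf, 0xa5, sizeof(buf));
	while (written < REPRO_FILE_SIZE) {
		size_t want = REPRO_FILE_SIZE - written;
		ssize_t n;

		if (want > sizeof(buf))
			want = sizeof(buf);
		n = write(fd, buf, want);
		if (n <= 0)
			goto fail;
		written += (size_t)n;
	}

	/* Step 2 (continued): make sure the blocks are really allocated. */
	if (fsync(fd) != 0)
		goto fail;
	if (close(fd) != 0) {
		unlink(path);
		return -1;
	}

	/* Step 3: unlink, which queues the blocks for delayed free. */
	if (unlink(path) != 0)
		return -1;

	/* Step 4: sync() to force delayed-free processing. */
	sync();
	return 0;

fail:
	close(fd);
	unlink(path);
	return -1;
}
```

A caller would pass the mount point of the test filesystem, for example run_delayed_free_repro("/mnt/test"), where the path is again only a placeholder.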

Two runs, each on a fresh image, reproduced the same crash:

AGENT_INIT: unlink ret=0 errno=0 (Success)
AGENT_INIT: calling sync to force delayed free processing
EXT4-fs: AGENT_EXT4_FREE_DATA_BUDDY_BUGON: forcing ext4_mb_load_buddy EIO before BUG_ON
kernel BUG at fs/ext4/mballoc.c:3990!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
RIP: 0010:ext4_process_freed_data+0x1fe/0x510

I searched locally for duplicates and found related older ext4_mb_load_buddy()/mballoc fixes, but I did not find a direct current-upstream fix for this delayed-free BUG_ON(err != 0) path.

Expected behavior:

A metadata load failure during delayed-free processing should go through ext4 error handling / transaction abort / filesystem error propagation, rather than treating the error as an impossible invariant and BUGing the kernel.
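
To make the expected control flow concrete, here is a small userspace model, not a kernel patch. Every name in it (struct model_fs, model_load_buddy(), model_free_data_in_buddy()) is hypothetical and stands in for the real ext4 structures and helpers. It only illustrates the shape I have in mind: the caller records the error and marks the filesystem as aborted instead of asserting that the load cannot fail. How the real fix should report and propagate the error is for the ext4 maintainers to decide.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model state; none of these are kernel types. */
struct model_fs {
	bool inject_load_error;	/* fault-injection switch for the load helper */
	bool aborted;		/* models "filesystem/journal marked in error" */
	int last_error;		/* last error recorded by the caller */
	unsigned int freed_extents;
};

/* Models a buddy load that can fail with a normal negative errno. */
static int model_load_buddy(const struct model_fs *fs)
{
	return fs->inject_load_error ? -EIO : 0;
}

/*
 * Models the delayed-free caller. On a load failure it records the
 * error and returns it, rather than treating the failure as an
 * impossible invariant violation.
 */
int model_free_data_in_buddy(struct model_fs *fs)
{
	int err = model_load_buddy(fs);

	if (err) {
		fs->aborted = true;
		fs->last_error = err;
		fprintf(stderr, "model: buddy load failed (%d), aborting\n", err);
		return err;
	}

	/* Normal path: the extent is released. */
	fs->freed_extents++;
	return 0;
}
```

With inject_load_error set, model_free_data_in_buddy() returns -EIO, leaves freed_extents untouched, and sets aborted; without it, the extent is counted as freed and no error is recorded.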

The attached tarball includes README.md, repro_init.c, instrumentation.patch, and both full serial logs.

Thanks,
Chuyifei

Attachment: ext4_free_data_buddy_load_error_bugon_20260524.tar.gz
Description: Unix tar archive