Re: [PATCH] btrfs: wait for in-flight readahead BIOs on open_ctree() error

From: Qu Wenruo

Date: Sun Mar 29 2026 - 18:06:49 EST




On 2026/3/30 03:53, Teng Liu wrote:
Thanks for your review!
On 2026-03-29 17:33, Qu Wenruo wrote:


This doesn't make any sense to me.
It confused me as well when I tried to reproduce the bug. The report
claimed that btrfs_bio_counter_sub triggered a use-after-free, but this
function lives in `dev-replace.c`, which, judging by the name, should
have nothing to do with this scenario.

However, when I checked the function call chain:

open_ctree()
  → btrfs_read_sys_array()          # OK: sys_chunk_array in superblock is intact
  → load_super_root(chunk_root)     # OK: reads root node, passes validation
  → btrfs_read_chunk_tree()
    → btrfs_for_each_slot()
      → readahead_tree_node_children(node)
        → for each child pointer in the internal node:
          btrfs_readahead_node_child()
          → btrfs_readahead_tree_block()
            → read_extent_buffer_pages_nowait()
              → btrfs_submit_bbio()
                → btrfs_submit_chunk()
                  → btrfs_bio_counter_inc_blocked()   ← bio_counter++
                  → btrfs_map_block()
                  → submit_bio()                      ← sent to USB drive

Even if you wait for all bios, it can still cause problems.

As the bio counter is only for the btrfs bio layer, btrfs_bio::end_io is still called after btrfs_bio_counter_dec().

And if the whole fs_info has been freed, then at end_bbio_meta_read() we can still have problems, as btrfs_validate_extent_buffer() will access eb (bbio->private) and fs_info (eb->fs_info), which triggers a use-after-free.

So using that bio counter is not going to solve all the problems; it only narrows the race window, thus masking the problem.
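
To make the ordering concrete, here is a minimal single-threaded userspace sketch of that window. It is not kernel code: every `model_*` name is hypothetical, and "freeing" fs_info is modelled by setting a flag so the example stays well-defined. It only illustrates that a waiter which checks the counter can run between the counter drop and the end_io callback.

```c
/* Hypothetical stand-ins for illustration only; not the kernel's structures. */
struct model_fs_info {
	int bio_counter;	/* models the dev-replace bio counter */
	int freed;		/* set once the error path has "freed" fs_info */
};

struct model_bbio {
	struct model_fs_info *fs_info;
	int touched_freed_fs_info;	/* records a would-be use-after-free */
};

/* Models the end_io callback, which still dereferences fs_info. */
static void model_end_io(struct model_bbio *bbio)
{
	if (bbio->fs_info->freed)
		bbio->touched_freed_fs_info = 1;
}

/* Models an error path that only waits for the counter to reach zero. */
static int model_error_path_try_free(struct model_fs_info *fs_info)
{
	if (fs_info->bio_counter != 0)
		return 0;	/* would keep waiting */
	fs_info->freed = 1;	/* counter is zero, so fs_info gets freed */
	return 1;
}

/*
 * Models bio completion: the counter is dropped first and end_io runs
 * afterwards. When error_path_runs_in_window is nonzero, the error path
 * is scheduled exactly in between.
 */
static void model_complete_bio(struct model_bbio *bbio,
			       int error_path_runs_in_window)
{
	bbio->fs_info->bio_counter--;
	if (error_path_runs_in_window)
		model_error_path_try_free(bbio->fs_info);
	model_end_io(bbio);
}

/* Returns 1 if end_io ended up touching an already "freed" fs_info. */
static int model_race(int error_path_runs_in_window)
{
	struct model_fs_info fs_info = { .bio_counter = 0, .freed = 0 };
	struct model_bbio bbio = { .fs_info = &fs_info,
				   .touched_freed_fs_info = 0 };

	fs_info.bio_counter++;	/* submission */
	model_complete_bio(&bbio, error_path_runs_in_window);
	return bbio.touched_freed_fs_info;
}
```

In this model, `model_race(1)` reports the bad access even though the waiter correctly saw the counter at zero, while `model_race(0)` does not. That is the sense in which waiting on the counter narrows the window without closing it.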


After submit_bio() sends the BIO to the USB drive, we continue on to
read_one_dev():

open_ctree()
  → btrfs_read_sys_array()          # OK: sys_chunk_array in superblock is intact
  → load_super_root(chunk_root)     # OK: reads root node, passes validation
  → btrfs_read_chunk_tree()
    → btrfs_for_each_slot()
      → readahead_tree_node_children(node)
        → bio_counter++ and submit_bio() sends the BIO to the USB drive
      → read_one_dev()

This read_one_dev() call will return an error since the leaf block is
actually corrupted. Then open_ctree() will enter its error path and try
to free fs_info.

After the USB device finishes the BIO, it will try to decrement the
counter, but the fs_info has already been freed.

Any suggestions on this?

The following ideas come to mind, but none of them seems as simple as your current one:

1) Introduce a dedicated counter for metadata readahead/reads
   This seems to be the simplest of all.
   But its only use would be in error handling, so it may not be
   worth it.

2) Disable metadata readahead during open_ctree()
   This will delay the mount, especially for a large extent tree without
   the bgt feature.

3) Use the buffer_tree xarray to iterate through all ebs
   Since this is only for the error handling of open_ctree(), we're fine
   to do the full xarray iteration, and wait for any eb that has the
   EXTENT_BUFFER_READING flag.

   The problem is, we do not have a dedicated tag for reads, like the
   PAGECACHE_TAG_(TOWRITE|DIRTY) tags that easily catch all
   dirty/writeback ebs.
   So the only option is to go through each eb and check its flags.

   I think this is the one with minimal impact, but it may cause a much
   longer runtime in this error handling path.

My personal preference is option 3).
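
A rough userspace sketch of the shape of option 3), just to illustrate the idea. It is not kernel code: a plain array stands in for the buffer_tree xarray, `MODEL_EB_READING` stands in for EXTENT_BUFFER_READING, and the wait is passed in as a callback. All `model_*` names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the EXTENT_BUFFER_READING flag. */
#define MODEL_EB_READING	(1UL << 0)

/* Hypothetical stand-in for an extent buffer; only the flags matter here. */
struct model_eb {
	unsigned long bflags;
};

/* Callback that blocks until the read on @eb has finished. */
typedef void (*model_wait_fn)(struct model_eb *eb);

/*
 * Walk every eb (no tag to narrow the search, so this is a full scan)
 * and wait on each one that still has a read in flight.
 * Returns how many ebs had to be waited on.
 */
static size_t model_wait_all_reading(struct model_eb *ebs, size_t nr,
				     model_wait_fn wait)
{
	size_t waited = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!(ebs[i].bflags & MODEL_EB_READING))
			continue;	/* no read in flight, nothing to wait for */
		wait(&ebs[i]);
		waited++;
	}
	return waited;
}

/* Trivial wait for the model: read completion clears the flag. */
static void model_wait_clear(struct model_eb *eb)
{
	eb->bflags &= ~MODEL_EB_READING;
}
```

The cost is visible in the loop: every eb is visited even when only a handful are being read, which is the longer runtime mentioned above. Since it would only run in the open_ctree() error path, that trade-off may be acceptable.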




The wait and counter are all for dev-replace, not matching your description
of the generic metadata readahead.

If you want to wait for all existing metadata reads, I didn't find a good
helper, so you will need to go through all extent buffers and wait on
their EXTENT_BUFFER_READING flags.