Re: [PATCH] btrfs: wait for in-flight readahead BIOs on open_ctree() error
From: Qu Wenruo
Date: Sun Mar 29 2026 - 18:23:04 EST
On 2026/3/30 08:36, Qu Wenruo wrote:
On 2026/3/30 03:53, Teng Liu wrote:
Thanks for your review!
On 2026-03-29 17:33, Qu Wenruo wrote:
This doesn't make any sense to me.
It confused me as well when I tried to reproduce the bug. The report
claimed that btrfs_bio_counter_sub triggered a use-after-free, but this
function lives in `dev-replace.c`, which, judging from the name, should
have nothing to do with this setting.
However when I checked the function call chain:
open_ctree()
  → btrfs_read_sys_array()        # OK: sys_chunk_array in superblock is intact
  → load_super_root(chunk_root)   # OK: reads root node, passes validation
  → btrfs_read_chunk_tree()
    → btrfs_for_each_slot()
      → readahead_tree_node_children(node)
        → for each child pointer in the internal node:
          btrfs_readahead_node_child()
          → btrfs_readahead_tree_block()
            → read_extent_buffer_pages_nowait()
              → btrfs_submit_bbio()
                → btrfs_submit_chunk()
                  → btrfs_bio_counter_inc_blocked()  ← bio_counter++
                  → btrfs_map_block()
                  → submit_bio()                     ← sent to USB drive
Even if you wait for all bios, it can still cause problems.
As the bio counter only covers the btrfs bio layer, btrfs_bio::end_io is still called after btrfs_bio_counter_dec().
And if the full fs_info has been freed, then at end_bbio_meta_read() we can still have problems, as btrfs_validate_extent_buffer() will access eb (bbio->private) and fs_info (eb->fs_info), which triggers a use-after-free.
So using that bio counter is not going to solve all problems; it only reduces the race window and thus masks the problem.
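To illustrate the ordering problem described above, here is a minimal userspace model (not kernel code). All names prefixed with model_ are hypothetical stand-ins; the only thing taken from the discussion is the order of events: the counter is dropped first, and end_io touches fs_info afterwards.

```c
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
struct model_fs_info {
	int bio_counter;	/* stands in for the btrfs bio counter */
	bool freed;		/* set once the error path has "freed" fs_info */
};

struct model_bbio {
	struct model_fs_info *fs_info;
	bool end_io_saw_freed_fs_info;
};

/* Submission: the counter is raised before the bio is sent out. */
static void model_submit(struct model_bbio *bbio)
{
	bbio->fs_info->bio_counter++;
}

/* Completion, step 1: the bio layer drops the counter. */
static void model_counter_dec(struct model_bbio *bbio)
{
	bbio->fs_info->bio_counter--;
}

/* Completion, step 2: end_io still looks at fs_info (via the eb). */
static void model_end_io(struct model_bbio *bbio)
{
	bbio->end_io_saw_freed_fs_info = bbio->fs_info->freed;
}

/* Error path: only "frees" fs_info once the counter has reached zero. */
static bool model_error_path(struct model_fs_info *fs_info)
{
	if (fs_info->bio_counter != 0)
		return false;
	fs_info->freed = true;
	return true;
}

/*
 * Replay the problematic interleaving. Returns true if end_io ran
 * against an already freed fs_info even though the error path waited
 * for the counter.
 */
bool model_race(void)
{
	struct model_fs_info fs_info = { 0, false };
	struct model_bbio bbio = { &fs_info, false };

	model_submit(&bbio);
	model_counter_dec(&bbio);	/* counter is 0 from here on */
	model_error_path(&fs_info);	/* sees 0, frees fs_info */
	model_end_io(&bbio);		/* runs after the free */
	return bbio.end_io_saw_freed_fs_info;
}
```

In this model the error path does wait for the counter, and end_io still runs against the freed structure, which is the window the counter cannot close.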
After submit_bio() sends the BIO to the USB drive, we continue on to
read_one_dev():
open_ctree()
  → btrfs_read_sys_array()        # OK: sys_chunk_array in superblock is intact
  → load_super_root(chunk_root)   # OK: reads root node, passes validation
  → btrfs_read_chunk_tree()
    → btrfs_for_each_slot()
      → readahead_tree_node_children(node)
        → bio_counter++ and submit_bio() sends the BIO to the USB drive
      → read_one_dev()
This read_one_dev() will return an error since the leaf block is actually
corrupted. Then open_ctree() will get into the error path and try to free
fs_info.
After the USB device finishes the BIO, it will try to decrement the counter,
but fs_info is already freed.
Any suggestions on this?
The following ideas come to mind, but none seems as simple as your current one:
1) Introduce a dedicated counter for metadata readahead/reads
   This seems to be the simplest one of all.
   But its only use would be in error handling, so it may not be
   worth it.
2) Disable metadata readahead during open_ctree()
   This will slow down the mount, especially for a large extent tree
   without the bgt (block group tree) feature.
3) Use buffer_tree xarray to iterate through all ebs
   Since this is only for error handling of open_ctree(), we're fine to
   do the full xarray iteration, and wait for any eb that has the
   EXTENT_BUFFER_READING flag.
   The problem is, we do not have a dedicated tag for ebs under read,
   the way PAGECACHE_TAG_(TOWRITE|DIRTY) easily catch all
   dirty/writeback ebs.
   So the only option is to go through each eb and check its flags.
   I think this is the one with minimal impact, but it may cause a much
   longer runtime in this error handling path.
My personal preference is option 3).
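As a rough illustration of option 3), here is a userspace model (not kernel code). A plain array stands in for the buffer_tree xarray, MODEL_EB_READING stands in for EXTENT_BUFFER_READING, and model_wait_read() is a hypothetical placeholder for whatever blocks until the read completes. The point is only the shape of the loop: with no tag available, every eb is visited and its flags are checked.

```c
#include <stddef.h>

/* Hypothetical stand-in for the EXTENT_BUFFER_READING flag. */
#define MODEL_EB_READING (1UL << 0)

/* Hypothetical stand-in for an extent buffer; only the flags matter here. */
struct model_eb {
	unsigned long bflags;
};

/*
 * Placeholder for blocking until the read finishes. In this model the
 * read is simply treated as complete, so the flag is cleared.
 */
static void model_wait_read(struct model_eb *eb)
{
	eb->bflags &= ~MODEL_EB_READING;
}

/*
 * Walk every eb (no tag to narrow the search) and wait on each one that
 * still has a read in flight. Returns how many ebs had to be waited on.
 */
size_t model_wait_all_reads(struct model_eb *ebs, size_t nr)
{
	size_t waited = 0;

	for (size_t i = 0; i < nr; i++) {
		if (ebs[i].bflags & MODEL_EB_READING) {
			model_wait_read(&ebs[i]);
			waited++;
		}
	}
	return waited;
}
```

The cost is linear in the number of ebs, which matches the concern about a longer runtime in the error path.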
Or a 4th one, which is only an idea that I haven't verified yet:
4) Handle error from invalidate_inode_pages2()
   Currently we just call invalidate_inode_pages2() on the btree inode
   and expect it to return 0.
   But if there is still an eb read pending, that function will return
   -EBUSY, as try_release_extent_buffer() will find an eb whose refs is
   not 0 and refuse to release that eb, which belongs to a folio.
   That should be a good indicator of any pending metadata reads.
   So if invalidate_inode_pages2() returns -EBUSY, we should wait and
   retry until it returns 0.
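The retry idea in option 4) could look roughly like the following userspace model (not kernel code, and as unverified as the idea itself). model_invalidate() is a hypothetical stand-in for invalidate_inode_pages2() on the btree inode; the assumption that it returns -EBUSY exactly while reads are pending is the unverified part.

```c
#include <errno.h>

/* Hypothetical stand-in for the btree inode; only pending reads matter. */
struct model_btree_inode {
	int pending_reads;
};

/*
 * Hypothetical stand-in for invalidate_inode_pages2(): returns -EBUSY
 * while a read is still pending, 0 once none are left. To keep the model
 * self-contained, one pending read is treated as finishing on each
 * failed attempt (in reality this would happen while we wait).
 */
static int model_invalidate(struct model_btree_inode *inode)
{
	if (inode->pending_reads > 0) {
		inode->pending_reads--;
		return -EBUSY;
	}
	return 0;
}

/*
 * Retry until invalidation succeeds. Returns the number of retries that
 * were needed, or a negative errno for any error other than -EBUSY.
 */
int model_invalidate_retry(struct model_btree_inode *inode)
{
	int retries = 0;
	int ret;

	while ((ret = model_invalidate(inode)) == -EBUSY)
		retries++;	/* a real version would wait here */
	return ret < 0 ? ret : retries;
}
```

A real version would need some form of waiting between attempts rather than a busy loop.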
The wait and counter are all for dev-replace, not matching your description
of the generic metadata readahead.
If you want to wait for all existing metadata reads, I didn't find a good
helper, thus you will need to go through all extent buffers and wait for
EXTENT_BUFFER_READING flags.