Re: [PATCH] md/raid5: skip 2-failure compute when other disk is R5_LOCKED

From: Yu Kuai

Date: Fri Mar 20 2026 - 00:04:50 EST

On 2026/3/19 13:33, FengWei Shih wrote:

> When skip_copy is enabled on a doubly-degraded RAID6, a device that is
> being written to will be in R5_LOCKED state with R5_UPTODATE cleared.
> If a new read triggers fetch_block() while the write is still in
> flight, the 2-failure compute path may select this locked device as a
> compute target because it is not R5_UPTODATE.
>
> Because skip_copy makes the device page point directly to the bio page,
> reconstructing data into it might be risky. Also, since the compute
> marks the device R5_UPTODATE, it triggers WARN_ON in ops_run_io()
> which checks that R5_SkipCopy and R5_UPTODATE are not both set.
>
> This can be reproduced by running small-range concurrent read/write on
> a doubly-degraded RAID6 with skip_copy enabled, for example:
>
> mdadm -C /dev/md0 -l6 -n6 -R -f /dev/loop[0-3] missing missing
> echo 1 > /sys/block/md0/md/skip_copy
> fio --filename=/dev/md0 --rw=randrw --bs=4k --numjobs=8 \
> --iodepth=32 --size=4M --runtime=30 --time_based --direct=1
>
> Fix by checking R5_LOCKED before proceeding with the compute. The
> compute will be retried once the lock is cleared on IO completion.
>
> Signed-off-by: FengWei Shih <dannyshih@xxxxxxxxxxxx>
> ---
> drivers/md/raid5.c | 2 ++
> 1 file changed, 2 insertions(+)

Reviewed-by: Yu Kuai <yukuai@xxxxxxxxx>

--
Thanks,
Kuai