Re: [PATCH v6 1/4] block: add task-context bio completion infrastructure
From: Tal Zussman
Date: Fri May 22 2026 - 18:48:39 EST
On 5/18/26 2:48 AM, Christoph Hellwig wrote:
> On Thu, May 14, 2026 at 05:51:14PM -0400, Tal Zussman wrote:
>> Some bio completion handlers need to run from preemptible task context,
>> but bio_endio() may be called from IRQ context (e.g., buffer_head
>> writeback). Callers need a way to ensure their callback eventually runs
>> from a sleepable context. Add infrastructure for that, in two forms:
>>
>> 1. BIO_COMPLETE_IN_TASK, a bio flag the submitter sets when it knows
>> in advance that its callback needs task context (e.g., dropbehind
>> writeback). bio_endio() sees the flag and offloads completion to a
>> worker automatically.
>>
>> 2. bio_complete_in_task(), a helper that completion callbacks can
>> invoke from within bi_end_io() when the deferral decision is
>> dynamic (e.g., fserror reporting).
>
> Note that method 2 is unused as of this series. I do plan to add users
> ASAP, and at one or two could even land through the block layer in this
> merge window.
>
>> Both share a per-CPU batch list drained by a delayed work item on a
>> WQ_PERCPU workqueue. Producers push the bio onto the local CPU's batch
>> and schedule the work item, which then dispatches each bio's bi_end_io()
>> from task context. The delayed work item uses a 1-jiffie delay to allow
>> batches of completions to accumulate before processing.
>
> But this 1-jiffie delay also means we unconditionally increase
> completion latency, which feels like a bad idea. Do you have any
> measurements that show where it does benefit? Note that queing work
> already often has very measurable latency on it's own. This also
> directly contradics the erofs experience that even went to a RT
> thread to reduce the latency.
I added this per Dave's feedback on v4, where he noted that XFS inodegc
uses a delayed work item to avoid context switch storms. There's only a
delay for the first bio in a batch to complete, as we only delay when the
list is empty. I'll run some experiments and measure context switches,
completion latency, etc. to see if this is necessary.
>> Both methods are gated on bio_in_atomic(), which returns true in any
>> context where a sleeping bi_end_io() is unsafe, including
>> non-preemptible task context. This logic is copied from commit
>> c99fab6e80b7 ("erofs: fix atomic context detection when
>> !CONFIG_DEBUG_LOCK_ALLOC").
>
> Let's not copy it, but have a prep patch that moves the erofs logic
> into the block layer under the new bio_in_atomic name.
Will do.
>> + while ((bio = bio_list_pop(&list)))
>> + bio->bi_end_io(bio);
>> +
>> + if (need_resched()) {
>> + bool is_empty;
>> +
>> + local_lock_irq(&bio_complete_batch.lock);
>> + is_empty = bio_list_empty(&batch->list);
>> + local_unlock_irq(&bio_complete_batch.lock);
>> + if (!is_empty)
>> + mod_delayed_work_on(batch->cpu,
>> + bio_complete_wq,
>> + &batch->work, 0);
>> + break;
>> + }
>> + }
>
> Ån all mainstream architetures we now default to lazy preempt, which
> should remove the need for need_resched() calls. Do you have evidence
> that we actually need this handling on recent kernels?
No evidence - I added this per feedback on v3, but agreed that it can be
simplified.
> Otherwise this looks good to me.
>
Thanks - AI review found a couple more small things, which I'll respond to
in a separate message.