Re: [PATCH v6 1/4] block: add task-context bio completion infrastructure
From: Christoph Hellwig
Date: Mon May 18 2026 - 02:48:33 EST
On Thu, May 14, 2026 at 05:51:14PM -0400, Tal Zussman wrote:
> Some bio completion handlers need to run from preemptible task context,
> but bio_endio() may be called from IRQ context (e.g., buffer_head
> writeback). Callers need a way to ensure their callback eventually runs
> from a sleepable context. Add infrastructure for that, in two forms:
>
> 1. BIO_COMPLETE_IN_TASK, a bio flag the submitter sets when it knows
> in advance that its callback needs task context (e.g., dropbehind
> writeback). bio_endio() sees the flag and offloads completion to a
> worker automatically.
>
> 2. bio_complete_in_task(), a helper that completion callbacks can
> invoke from within bi_end_io() when the deferral decision is
> dynamic (e.g., fserror reporting).
Note that method 2 is unused as of this series. I do plan to add users
ASAP, and at one or two could even land through the block layer in this
merge window.
> Both share a per-CPU batch list drained by a delayed work item on a
> WQ_PERCPU workqueue. Producers push the bio onto the local CPU's batch
> and schedule the work item, which then dispatches each bio's bi_end_io()
> from task context. The delayed work item uses a 1-jiffie delay to allow
> batches of completions to accumulate before processing.
But this 1-jiffie delay also means we unconditionally increase
completion latency, which feels like a bad idea. Do you have any
measurements that show where it does benefit? Note that queing work
already often has very measurable latency on it's own. This also
directly contradics the erofs experience that even went to a RT
thread to reduce the latency.
> Both methods are gated on bio_in_atomic(), which returns true in any
> context where a sleeping bi_end_io() is unsafe, including
> non-preemptible task context. This logic is copied from commit
> c99fab6e80b7 ("erofs: fix atomic context detection when
> !CONFIG_DEBUG_LOCK_ALLOC").
Let's not copy it, but have a prep patch that moves the erofs logic
into the block layer under the new bio_in_atomic name.
> + while ((bio = bio_list_pop(&list)))
> + bio->bi_end_io(bio);
> +
> + if (need_resched()) {
> + bool is_empty;
> +
> + local_lock_irq(&bio_complete_batch.lock);
> + is_empty = bio_list_empty(&batch->list);
> + local_unlock_irq(&bio_complete_batch.lock);
> + if (!is_empty)
> + mod_delayed_work_on(batch->cpu,
> + bio_complete_wq,
> + &batch->work, 0);
> + break;
> + }
> + }
Ån all mainstream architetures we now default to lazy preempt, which
should remove the need for need_resched() calls. Do you have evidence
that we actually need this handling on recent kernels?
Otherwise this looks good to me.