Re: [PATCH v6 1/4] block: add task-context bio completion infrastructure
From: Christoph Hellwig
Date: Mon May 25 2026 - 01:25:00 EST
[adding the PREEMPT-RT maintainers and list for one and a half questions
for them a bit below]
On Fri, May 22, 2026 at 07:09:59PM -0400, Tal Zussman wrote:
> > + while ((bio = bio_list_pop(&list)))
> > + bio->bi_end_io(bio);
>
> Q: "Does this code handle bios that do not have a completion callback?
> If a bio is submitted with the BIO_COMPLETE_IN_TASK flag but has a NULL
> bi_end_io pointer (like some flush or discard bios), executing
> bio->bi_end_io(bio) directly will cause a NULL pointer dereference.
> The synchronous completion path in bio_endio() checks if bio->bi_end_io is
> non-NULL before calling it. Should a similar check be added here?"
>
> A: Every bio that reaches this point should have bi_end_io set. There's no
> point in deferring a bio without a completion handler.
Yes. The only bios without bi_end_io that can end up in bio_endio
are those attached to passthrough requests, and deferring them of
course doesn't make sense.
> Q: "Could this loop also monopolize the CPU and cause latency spikes?
> Because the work uses a 1-jiffie delay, tens of thousands of bios might
> accumulate under heavy I/O load. Processing the entire batch sequentially
> without yielding or calling cond_resched() inside the inner loop could
> lead to scheduler latency or watchdog warnings on non-preemptible kernels."
>
> A: With PREEMPT_LAZY as the default, the worker is preemptible, as per
> Hillf and Christoph, so this is unnecessary.
Yes. Sashiko tends to spew this a lot, which is odd as there were clear
recommendations to only do this absolutely if needed even before
PREEMPT_LAZY.
> > +
> > + local_lock_irqsave(&bio_complete_batch.lock, flags);
>
> Q: "Is it safe to use local_lock_irqsave() here when called from an atomic
> context?
> On CONFIG_PREEMPT_RT kernels, local_lock_t maps to a per-CPU spinlock_t,
> which is a sleepable rt_mutex. Since __bio_complete_in_task() is specifically
> called when bio_in_atomic() is true (which includes hardware interrupts or
> execution under a raw_spinlock_t), attempting to acquire a sleepable lock
> here would trigger an "Invalid wait context" lockdep warning.
> Would a lockless list (llist) be more appropriate here to avoid sleeping
> in atomic contexts?"
>
> A: This seems legit, but I'm not super familiar with PREEMPT_RT. I don't want
> to switch to raw_spinlock_t, as it seems like that would add unnecessary
> overhead on non-PREEMPT_RT kernels. I think switching to use local_irq_save()
> (as is done for the per-CPU bio allocation cache) should work.
Adding the PREEMPT_RT maintainers for this as it is above my pay grade.
> Q: "Does creating this workqueue with WQ_MEM_RECLAIM break the local_lock_irq()
> protection in bio_complete_work_fn()?
> When a workqueue has WQ_MEM_RECLAIM, it spawns a global rescuer thread that
> can execute per-CPU work items during worker pool congestion. This rescuer
> thread executes unbound, meaning it could run on CPU B while processing
> CPU A's work item.
> Since local_lock operates strictly on the currently executing CPU, the
> rescuer thread on CPU B would acquire CPU B's lock, while popping elements
> from CPU A's list (derived via container_of()).
> If an interrupt on CPU A concurrently calls __bio_complete_in_task(),
> it will acquire CPU A's lock and modify the same list without mutual
> exclusion, potentially causing list corruption."
>
> A: The rescuer should run on the same CPU, not unbound, so this is not an
> issue.
This is another area where the PREEMPT_RT/scheduler folks might be able
to help.
> static inline bool bio_complete_in_task(struct bio *bio)
> {
> if (bio_flagged(bio, BIO_COMPLETE_IN_TASK))
> return false;
> if (!bio_in_atomic())
> return false;
> bio_set_flag(bio, BIO_COMPLETE_IN_TASK);
> __bio_complete_in_task(bio);
> return true;
> }
>
> We can use the BIO_COMPLETE_IN_TASK flag to indicate that it's already
> been deferred to the workqueue as is safe to run.
Would be nice to avoid this, but yes.