Re: [PATCH RFC v4 1/3] block: add BIO_COMPLETE_IN_TASK for task-context completion

From: Christoph Hellwig

Date: Fri Mar 27 2026 - 02:02:52 EST


On Wed, Mar 25, 2026 at 02:43:00PM -0400, Tal Zussman wrote:
> Some bio completion handlers need to run in task context but bio_endio()
> can be called from IRQ context (e.g. buffer_head writeback). Add a
> BIO_COMPLETE_IN_TASK flag that bio submitters can set to request
> task-context completion of their bi_end_io callback.
>
> When bio_endio() sees this flag and is running in non-task context, it
> queues the bio to a per-cpu list and schedules a work item to call
> bi_end_io() from task context. A CPU hotplug dead callback drains any
> remaining bios from the departing CPU's batch.
>
> This will be used to enable RWF_DONTCACHE for block devices, and could
> be used for other subsystems like fscrypt that need task-context bio
> completion.
>
> Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Signed-off-by: Tal Zussman <tz2294@xxxxxxxxxxxx>
> ---
> block/bio.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/blk_types.h | 1 +
> 2 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 8203bb7455a9..69ee0d93041f 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -18,6 +18,7 @@
> #include <linux/highmem.h>
> #include <linux/blk-crypto.h>
> #include <linux/xarray.h>
> +#include <linux/local_lock.h>
>
> #include <trace/events/block.h>
> #include "blk.h"
> @@ -1714,6 +1715,60 @@ void bio_check_pages_dirty(struct bio *bio)
> }
> EXPORT_SYMBOL_GPL(bio_check_pages_dirty);
>
> +struct bio_complete_batch {
> + local_lock_t lock;
> + struct bio_list list;
> + struct work_struct work;
> +};
> +
> +static DEFINE_PER_CPU(struct bio_complete_batch, bio_complete_batch) = {
> + .lock = INIT_LOCAL_LOCK(lock),
> +};
> +
> +static void bio_complete_work_fn(struct work_struct *w)
> +{
> + struct bio_complete_batch *batch;
> + struct bio_list list;
> +
> +again:
> + local_lock_irq(&bio_complete_batch.lock);
> + batch = this_cpu_ptr(&bio_complete_batch);
> + list = batch->list;
> + bio_list_init(&batch->list);
> + local_unlock_irq(&bio_complete_batch.lock);
> +
> + while (!bio_list_empty(&list)) {
> + struct bio *bio = bio_list_pop(&list);
> + bio->bi_end_io(bio);
> + }

bio_list_pop already returns NULL when the list is empty, so this could be:

	struct bio *bio;

	while ((bio = bio_list_pop(&list)))
		bio->bi_end_io(bio);

(popping the detached local list, note - batch->list was just reinitialized
and must only be touched under the lock).

In fact that same pattern is repeated later, so maybe just add a helper
for it? But I think Dave's idea of just using an llist (and adding a
new llist member to the bio for this) seems sensible. Just don't forget
the llist_reverse_order call: llist_del_all hands the entries back
newest-first, so without it completions run in reverse submission order.

> +
> + local_lock_irq(&bio_complete_batch.lock);
> + batch = this_cpu_ptr(&bio_complete_batch);
> + if (!bio_list_empty(&batch->list)) {
> + local_unlock_irq(&bio_complete_batch.lock);
> +
> + if (!need_resched())
> + goto again;
> +
> + schedule_work_on(smp_processor_id(), &batch->work);
> + return;
> + }
> + local_unlock_irq(&bio_complete_batch.lock);

I don't really understand this requeue logic. Can you explain it?

> + schedule_work_on(smp_processor_id(), &batch->work);

We'll probably want a dedicated workqueue here to avoid deadlocks
against other users of the system workqueue.
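Something along these lines (sketch only - the "bio_complete" name and the exact flags are just a suggestion, not a requirement):

	static struct workqueue_struct *bio_complete_wq;

	/*
	 * WQ_MEM_RECLAIM because bio completion can sit in the memory
	 * reclaim path, and without it there is no rescuer thread to
	 * guarantee forward progress under memory pressure.
	 */
	static int __init bio_complete_wq_init(void)
	{
		bio_complete_wq = alloc_workqueue("bio_complete",
						  WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
		if (!bio_complete_wq)
			panic("Failed to create bio_complete workqueue\n");
		return 0;
	}
	subsys_initcall(bio_complete_wq_init);

and then queue_work_on(smp_processor_id(), bio_complete_wq, &batch->work)
instead of schedule_work_on().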

> +static int bio_complete_batch_cpu_dead(unsigned int cpu)
> +{
> + struct bio_complete_batch *batch = per_cpu_ptr(&bio_complete_batch, cpu);

Overly long line.