Re: [PATCH] bpf: Always defer local storage free
From: Kumar Kartikeya Dwivedi
Date: Mon Mar 16 2026 - 19:41:19 EST
On Mon, 16 Mar 2026 at 23:28, Andrea Righi <arighi@xxxxxxxxxx> wrote:
>
> bpf_task_storage_delete() can be invoked from contexts that hold a raw
> spinlock, such as sched_ext's ops.exit_task() callback, that is running
> with the rq lock held.
>
> The delete path eventually calls bpf_selem_unlink(), which frees the
> element via bpf_selem_free_list() -> bpf_selem_free(). For task storage
> with use_kmalloc_nolock, call_rcu_tasks_trace() is used, which is not
> safe from raw spinlock context, triggering the following:
>
Paul posted [0] to fix this in SRCU. Calling call_rcu_tasks_trace()
under a raw spinlock was always safe, but it became problematic on RT
with the recent conversion that uses SRCU underneath. Please give [0]
a spin. While I couldn't reproduce the warning using scx_cosmos, I
verified that it goes away for me when calling the path from atomic
context.
[0]: https://lore.kernel.org/rcu/841c8a0b-0f50-4617-98b2-76523e13b910@paulmck-laptop
> =============================
> [ BUG: Invalid wait context ]
> 7.0.0-rc1-virtme #1 Not tainted
> -----------------------------
> (udev-worker)/115 is trying to lock:
> ffffffffa6970dd0 (rcu_tasks_trace_srcu_struct_srcu_usage.lock){....}-{3:3}, at: spin_lock_irqsave_ssp_contention+0x54/0x90
> other info that might help us debug this:
> context-{5:5}
> 3 locks held by (udev-worker)/115:
> #0: ffff8e16c634ce58 (&p->pi_lock){-.-.}-{2:2}, at: _task_rq_lock+0x2c/0x100
> #1: ffff8e16fbdbdae0 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x24/0xb0
> #2: ffffffffa6971b60 (rcu_read_lock){....}-{1:3}, at: __bpf_prog_enter+0x64/0x110
> ...
> Sched_ext: cosmos_1.0.7_g780e898fc_dirty_x86_64_unknown_linux_gnu (enabled+all), task: runnable_at=-2ms
> Call Trace:
> dump_stack_lvl+0x6f/0xb0
> __lock_acquire+0xf86/0x1de0
> lock_acquire+0xcf/0x310
> _raw_spin_lock_irqsave+0x39/0x60
> spin_lock_irqsave_ssp_contention+0x54/0x90
> srcu_gp_start_if_needed+0x2a7/0x490
> bpf_selem_unlink+0x24b/0x590
> bpf_task_storage_delete+0x3a/0x90
> bpf_prog_3b623b4be76cfb86_scx_pmu_task_fini+0x26/0x2a
> bpf_prog_4b1530d9d9852432_cosmos_exit_task+0x1d/0x1f
> bpf__sched_ext_ops_exit_task+0x4b/0xa7
> __scx_disable_and_exit_task+0x10a/0x200
> scx_disable_and_exit_task+0xe/0x60
>
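For anyone reading the splat above: the {2:2} and {3:3} annotations are
lockdep wait types, and the report fires because a {3:3} lock
(spinlock_t, which can sleep on RT) is being taken while {2:2} raw
spinlocks are held. Below is a rough userspace sketch of that check. It
is a simplified model, not the kernel's code: the enum only mirrors the
ordering I'd expect with CONFIG_PROVE_RAW_LOCK_NESTING enabled, and the
names wait_type, held_lock_model, wait_context_ok() and
splat_from_report_ok() are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of lockdep wait types (illustrative names). */
enum wait_type {
	WAIT_INV = 0,	/* not checked */
	WAIT_FREE,	/* 1: wait free, e.g. RCU read side */
	WAIT_SPIN,	/* 2: raw_spinlock_t */
	WAIT_CONFIG,	/* 3: spinlock_t, preemptible on PREEMPT_RT */
	WAIT_SLEEP,	/* 4: sleeping locks */
	WAIT_MAX,	/* 5: plain task context */
};

/* Hypothetical stand-in for a held lock: the {outer:inner} pair. */
struct held_lock_model {
	const char *name;
	int outer;
	int inner;
};

/*
 * The most restrictive (smallest) inner wait type among the context
 * and the held locks bounds what may be acquired next.
 */
static bool wait_context_ok(const struct held_lock_model *held, size_t n,
			    int ctx_inner, int next_outer)
{
	int curr_inner = ctx_inner;
	size_t i;

	for (i = 0; i < n; i++) {
		if (held[i].inner != WAIT_INV && held[i].inner < curr_inner)
			curr_inner = held[i].inner;
	}
	return next_outer <= curr_inner;
}

/* The three held locks from the report, then the {3:3} SRCU lock. */
static bool splat_from_report_ok(void)
{
	const struct held_lock_model held[] = {
		{ "&p->pi_lock",   WAIT_SPIN, WAIT_SPIN },
		{ "&rq->__lock",   WAIT_SPIN, WAIT_SPIN },
		{ "rcu_read_lock", WAIT_FREE, WAIT_CONFIG },
	};

	return wait_context_ok(held, 3, WAIT_MAX, WAIT_CONFIG);
}
```

In this model the rq lock caps the allowed wait type at 2, so taking
the {3:3} lock is rejected, while the same acquisition with only
rcu_read_lock held would pass.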
> Fix by deferring memory deallocation to ensure it occurs outside the raw
> spinlock context.
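To make the deferral idea in the description concrete, here is a
minimal userspace sketch of the general pattern: under the lock, only
queue the element, and do the actual free once the lock is dropped. It
assumes a single thread and models the raw spinlock with a flag. None
of this is the patch's code; selem_model, raw_lock(), raw_unlock(),
selem_defer_free() and selem_drain_deferred() are hypothetical names.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a storage element. */
struct selem_model {
	struct selem_model *next;
};

static bool raw_lock_held;		/* models the rq lock being held */
static struct selem_model *deferred;	/* elements waiting to be freed */

static void raw_lock(void)
{
	raw_lock_held = true;
}

static void raw_unlock(void)
{
	raw_lock_held = false;
}

/* Safe under the lock: only links the element onto the pending list. */
static void selem_defer_free(struct selem_model *e)
{
	e->next = deferred;
	deferred = e;
}

/*
 * Must run after the lock is dropped. Returns the number of elements
 * freed, or -1 if it was (incorrectly) called with the lock held.
 */
static int selem_drain_deferred(void)
{
	int n = 0;

	if (raw_lock_held)
		return -1;

	while (deferred) {
		struct selem_model *next = deferred->next;

		free(deferred);
		deferred = next;
		n++;
	}
	return n;
}
```

In the kernel the drain step would have to be kicked from a context
that is allowed to take the locks involved; how the patch does that is
not shown in the quoted text.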
>
> Fixes: f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage")
> Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>
> ---
> [...]