Re: [PATCH bpf v3] bpf: do not use kmalloc_nolock when !HAVE_CMPXCHG_DOUBLE
From: Amery Hung
Date: Tue Mar 17 2026 - 13:36:07 EST
On Mon, Mar 16, 2026 at 5:40 PM Levi Zim <rsworktech@xxxxxxxxxxx> wrote:
>
>
>
> On 3/17/26 3:53 AM, Amery Hung wrote:
> > On Sat, Mar 14, 2026 at 9:02 AM Levi Zim via B4 Relay
> > <devnull+rsworktech.outlook.com@xxxxxxxxxx> wrote:
> >>
> >> From: Levi Zim <rsworktech@xxxxxxxxxxx>
> >>
> >> kmalloc_nolock always fails for architectures that lack cmpxchg16b.
> >> For example, this causes bpf_task_storage_get with flag
> >> BPF_LOCAL_STORAGE_GET_F_CREATE to fail on riscv64 6.19 kernels.
> >>
> >> Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE.
> >> But leave the PREEMPT_RT case as is because it requires kmalloc_nolock
> >> for correctness. Add a comment noting this limitation: an architecture's
> >> lack of CMPXCHG_DOUBLE combined with PREEMPT_RT makes
> >> bpf_local_storage_alloc always fail.
> >
> > Let's not do this.
> >
> > This re-introduces deadlock to local storage. In addition, local
> > storage will switch to using kmalloc_nolock() entirely.
>
> I noticed the PREEMPT_RT case needs kmalloc_nolock for correctness and didn't
> disable kmalloc_nolock in that code path when !HAVE_CMPXCHG_DOUBLE.
>
It is also not correct to use kmalloc() on !PREEMPT_RT. A BPF program
running in NMI context, or one tracing the kmalloc internals, that
tries to allocate memory for local storage can deadlock.
> But in the original series [1], it appears that switching to kmalloc_nolock
> was purely for performance benefits, not for fixing deadlocks in local storage.
> And I didn't see any "Fixes" tag in [1].
>
There is no deadlock issue with BPF memory allocator and
kmalloc_nolock(). For context, the patch you are referencing is part
of a series of refactoring to
- Move from BPF memory allocator to kmalloc_nolock() [1]
- Remove percpu busy lock [2]
Please refer to the cover letters for the motivation and the changes made:
[1] https://lore.kernel.org/bpf/20251112175939.2365295-1-ameryhung@xxxxxxxxx/
[2] https://lore.kernel.org/bpf/20260205222916.1788211-1-ameryhung@xxxxxxxxx/
> Could you provide a more detailed explanation? Thanks!
>
> [1]: https://lore.kernel.org/all/20251114201329.3275875-1-ameryhung@xxxxxxxxx/
>
> > For riscv hardware without zacas extension, I think a workaround with
> > some performance overhead is to enable CONFIG_SLUB_DEBUG and
> > slub_debug options.
>
> There is a patch [2] that enables HAVE_CMPXCHG_DOUBLE for riscv, so I think
> it will not be a problem for riscv in the future, even for hardware without the
> zacas extension, because the kernel has a fallback implementation for zacas.
>
> However, this would still be an issue for other architectures. Currently only
> x86, arm64, s390 and loongarch have HAVE_CMPXCHG_DOUBLE in 7.0-rc4.
> I don't think letting users enable slub_debug would be a reasonable workaround.
>
> [2]: https://patchew.org/linux/20260220074449.8526-1-mssola@xxxxxxxxxx/
>
> Best regards,
> Levi
>
> >>
> >> Fixes: f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage")
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> Signed-off-by: Levi Zim <rsworktech@xxxxxxxxxxx>
> >> ---
> >> I find that bpf_task_storage_get with flag BPF_LOCAL_STORAGE_GET_F_CREATE
> >> always fails for me on 6.19 kernel on riscv64 and bisected it.
> >>
> >> In f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock()
> >> in local storage"), bpf memory allocator is replaced with kmalloc_nolock.
> >> This approach is problematic for architectures that lack CMPXCHG_DOUBLE
> >> because kmalloc_nolock always fails in this case:
> >>
> >> In function kmalloc_nolock (kmalloc_nolock_noprof):
> >>
> >> if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
> >> /*
> >> * kmalloc_nolock() is not supported on architectures that
> >> * don't implement cmpxchg16b, but debug caches don't use
> >> * per-cpu slab and per-cpu partial slabs. They rely on
> >> * kmem_cache_node->list_lock, so kmalloc_nolock() can
> >> * attempt to allocate from debug caches by
> >> * spin_trylock_irqsave(&n->list_lock, ...)
> >> */
> >> return NULL;
> >>
> >> Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE.
> >> (But not for the PREEMPT_RT case, as explained in the comment and commit message.)
> >>
> >> Note for stable: this only needs to be picked into v6.19 if the patch
> >> makes it into 7.0.
> >> ---
> >> Changes in v3:
> >> - Use macro instead of const static variable to avoid triggering
> >> warnings.
> >> - Wrap lines at 80 columns
> >> - Link to v2: https://lore.kernel.org/r/20260314-bpf-kmalloc-nolock-v2-1-576e33e4fa67@xxxxxxxxxxx
> >>
> >> Changes in v2:
> >> - Drop the modification to the PREEMPT_RT case as it requires
> >> kmalloc_nolock for correctness.
> >> - Add a comment to the PREEMPT_RT case about the limitation when
> >> not HAVE_CMPXCHG_DOUBLE but enables PREEMPT_RT.
> >> - Link to v1: https://lore.kernel.org/r/20260314-bpf-kmalloc-nolock-v1-1-24abf3f75a9f@xxxxxxxxxxx
> >> ---
> >> include/linux/bpf_local_storage.h | 1 +
> >> kernel/bpf/bpf_cgrp_storage.c | 3 ++-
> >> kernel/bpf/bpf_local_storage.c | 4 ++++
> >> kernel/bpf/bpf_task_storage.c | 3 ++-
> >> 4 files changed, 9 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
> >> index 8157e8da61d40..d8f2c5d63a80e 100644
> >> --- a/include/linux/bpf_local_storage.h
> >> +++ b/include/linux/bpf_local_storage.h
> >> @@ -18,6 +18,7 @@
> >> #include <asm/rqspinlock.h>
> >>
> >> #define BPF_LOCAL_STORAGE_CACHE_SIZE 16
> >> +#define KMALLOC_NOLOCK_SUPPORTED IS_ENABLED(CONFIG_HAVE_CMPXCHG_DOUBLE)
> >>
> >> struct bpf_local_storage_map_bucket {
> >> struct hlist_head list;
> >> diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
> >> index c2a2ead1f466d..cd18193c44058 100644
> >> --- a/kernel/bpf/bpf_cgrp_storage.c
> >> +++ b/kernel/bpf/bpf_cgrp_storage.c
> >> @@ -114,7 +114,8 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
> >>
> >> static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
> >> {
> >> - return bpf_local_storage_map_alloc(attr, &cgroup_cache, true);
> >> + return bpf_local_storage_map_alloc(attr, &cgroup_cache,
> >> + KMALLOC_NOLOCK_SUPPORTED);
> >> }
> >>
> >> static void cgroup_storage_map_free(struct bpf_map *map)
> >> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> >> index 9c96a4477f81a..a6c240da87668 100644
> >> --- a/kernel/bpf/bpf_local_storage.c
> >> +++ b/kernel/bpf/bpf_local_storage.c
> >> @@ -893,6 +893,10 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
> >> /* In PREEMPT_RT, kmalloc(GFP_ATOMIC) is still not safe in non
> >> * preemptible context. Thus, enforce all storages to use
> >> * kmalloc_nolock() when CONFIG_PREEMPT_RT is enabled.
> >> + *
> >> + * However, kmalloc_nolock would fail on architectures that do not
> >> + * have CMPXCHG_DOUBLE. On such architectures with PREEMPT_RT,
> >> + * bpf_local_storage_alloc would always fail.
> >> */
> >> smap->use_kmalloc_nolock = IS_ENABLED(CONFIG_PREEMPT_RT) ? true : use_kmalloc_nolock;
> >>
> >> diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
> >> index 605506792b5b4..6e8597edea314 100644
> >> --- a/kernel/bpf/bpf_task_storage.c
> >> +++ b/kernel/bpf/bpf_task_storage.c
> >> @@ -212,7 +212,8 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
> >>
> >> static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
> >> {
> >> - return bpf_local_storage_map_alloc(attr, &task_cache, true);
> >> + return bpf_local_storage_map_alloc(attr, &task_cache,
> >> + KMALLOC_NOLOCK_SUPPORTED);
> >> }
> >>
> >> static void task_storage_map_free(struct bpf_map *map)
> >>
> >> ---
> >> base-commit: e06e6b8001233241eb5b2e2791162f0585f50f4b
> >> change-id: 20260314-bpf-kmalloc-nolock-60da80e613de
> >>
> >> Best regards,
> >> --
> >> Levi Zim <rsworktech@xxxxxxxxxxx>
> >>
> >>
>