Re: [PATCH bpf v3] bpf: do not use kmalloc_nolock when !HAVE_CMPXCHG_DOUBLE
From: Levi Zim
Date: Mon Mar 16 2026 - 20:41:28 EST
On 3/17/26 3:53 AM, Amery Hung wrote:
> On Sat, Mar 14, 2026 at 9:02 AM Levi Zim via B4 Relay
> <devnull+rsworktech.outlook.com@xxxxxxxxxx> wrote:
>>
>> From: Levi Zim <rsworktech@xxxxxxxxxxx>
>>
>> kmalloc_nolock always fails for architectures that lack cmpxchg16b.
>> For example, this causes bpf_task_storage_get with flag
>> BPF_LOCAL_STORAGE_GET_F_CREATE to fail on a riscv64 6.19 kernel.
>>
>> Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE.
>> But leave the PREEMPT_RT case as is because it requires kmalloc_nolock
>> for correctness. Add a comment noting this limitation: an architecture's
>> lack of CMPXCHG_DOUBLE combined with PREEMPT_RT makes
>> bpf_local_storage_alloc always fail.
>
> Let's not do this.
>
> This re-introduces deadlock to local storage. In addition, local
> storage will switch to using kmalloc_nolock() entirely.
I noticed the PREEMPT_RT case needs kmalloc_nolock for correctness and didn't
disable kmalloc_nolock in that code path when !HAVE_CMPXCHG_DOUBLE.
But in the original series [1], it appears that switching to kmalloc_nolock
was purely for performance benefits, not for fixing deadlocks in local storage.
I also didn't see any "Fixes" tag in [1].
Could you provide a more detailed explanation? Thanks!
[1]: https://lore.kernel.org/all/20251114201329.3275875-1-ameryhung@xxxxxxxxx/
> For riscv hardware without zacas extension, I think a workaround with
> some performance overhead is to enable CONFIG_SLUB_DEBUG and
> slub_debug options.
There is a patch [2] that enables HAVE_CMPXCHG_DOUBLE for riscv, so I think
this will not be a problem for riscv in the future, even on hardware without the
zacas extension, because the kernel has a fallback implementation for zacas.
However, this would still be an issue for other architectures: as of 7.0-rc4,
only x86, arm64, s390, and loongarch select HAVE_CMPXCHG_DOUBLE.
I don't think letting users enable slub_debug would be a reasonable workaround.
[2]: https://patchew.org/linux/20260220074449.8526-1-mssola@xxxxxxxxxx/
Best regards,
Levi
>>
>> Fixes: f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage")
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Levi Zim <rsworktech@xxxxxxxxxxx>
>> ---
>> I find that bpf_task_storage_get with flag BPF_LOCAL_STORAGE_GET_F_CREATE
>> always fails for me on a 6.19 riscv64 kernel, and bisected it.
>>
>> In f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock()
>> in local storage"), bpf memory allocator is replaced with kmalloc_nolock.
>> This approach is problematic for architectures that lack CMPXCHG_DOUBLE
>> because kmalloc_nolock always fails in this case:
>>
>> In function kmalloc_nolock (kmalloc_nolock_noprof):
>>
>>         if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
>>                 /*
>>                  * kmalloc_nolock() is not supported on architectures that
>>                  * don't implement cmpxchg16b, but debug caches don't use
>>                  * per-cpu slab and per-cpu partial slabs. They rely on
>>                  * kmem_cache_node->list_lock, so kmalloc_nolock() can
>>                  * attempt to allocate from debug caches by
>>                  * spin_trylock_irqsave(&n->list_lock, ...)
>>                  */
>>                 return NULL;
>>
>> Fix it by enabling use_kmalloc_nolock only when HAVE_CMPXCHG_DOUBLE.
>> (But not for the PREEMPT_RT case, as explained in the comment and commit
>> message.)
>>
>> Note for stable: this only needs to be picked into v6.19 if the patch
>> makes it into 7.0.
>> ---
>> Changes in v3:
>> - Use macro instead of const static variable to avoid triggering
>> warnings.
>> - Wrap lines at 80 columns
>> - Link to v2: https://lore.kernel.org/r/20260314-bpf-kmalloc-nolock-v2-1-576e33e4fa67@xxxxxxxxxxx
>>
>> Changes in v2:
>> - Drop the modification to the PREEMPT_RT case as it requires
>> kmalloc_nolock for correctness.
>> - Add a comment to the PREEMPT_RT case about the limitation when
>> PREEMPT_RT is enabled but HAVE_CMPXCHG_DOUBLE is not.
>> - Link to v1: https://lore.kernel.org/r/20260314-bpf-kmalloc-nolock-v1-1-24abf3f75a9f@xxxxxxxxxxx
>> ---
>> include/linux/bpf_local_storage.h | 1 +
>> kernel/bpf/bpf_cgrp_storage.c | 3 ++-
>> kernel/bpf/bpf_local_storage.c | 4 ++++
>> kernel/bpf/bpf_task_storage.c | 3 ++-
>> 4 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
>> index 8157e8da61d40..d8f2c5d63a80e 100644
>> --- a/include/linux/bpf_local_storage.h
>> +++ b/include/linux/bpf_local_storage.h
>> @@ -18,6 +18,7 @@
>> #include <asm/rqspinlock.h>
>>
>> #define BPF_LOCAL_STORAGE_CACHE_SIZE 16
>> +#define KMALLOC_NOLOCK_SUPPORTED IS_ENABLED(CONFIG_HAVE_CMPXCHG_DOUBLE)
>>
>> struct bpf_local_storage_map_bucket {
>> struct hlist_head list;
>> diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
>> index c2a2ead1f466d..cd18193c44058 100644
>> --- a/kernel/bpf/bpf_cgrp_storage.c
>> +++ b/kernel/bpf/bpf_cgrp_storage.c
>> @@ -114,7 +114,8 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
>>
>> static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>> {
>> - return bpf_local_storage_map_alloc(attr, &cgroup_cache, true);
>> + return bpf_local_storage_map_alloc(attr, &cgroup_cache,
>> + KMALLOC_NOLOCK_SUPPORTED);
>> }
>>
>> static void cgroup_storage_map_free(struct bpf_map *map)
>> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
>> index 9c96a4477f81a..a6c240da87668 100644
>> --- a/kernel/bpf/bpf_local_storage.c
>> +++ b/kernel/bpf/bpf_local_storage.c
>> @@ -893,6 +893,10 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
>> /* In PREEMPT_RT, kmalloc(GFP_ATOMIC) is still not safe in non
>> * preemptible context. Thus, enforce all storages to use
>> * kmalloc_nolock() when CONFIG_PREEMPT_RT is enabled.
>> + *
>> + * However, kmalloc_nolock would fail on architectures that do not
>> + * have CMPXCHG_DOUBLE. On such architectures with PREEMPT_RT,
>> + * bpf_local_storage_alloc would always fail.
>> */
>> smap->use_kmalloc_nolock = IS_ENABLED(CONFIG_PREEMPT_RT) ? true : use_kmalloc_nolock;
>>
>> diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
>> index 605506792b5b4..6e8597edea314 100644
>> --- a/kernel/bpf/bpf_task_storage.c
>> +++ b/kernel/bpf/bpf_task_storage.c
>> @@ -212,7 +212,8 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
>>
>> static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
>> {
>> - return bpf_local_storage_map_alloc(attr, &task_cache, true);
>> + return bpf_local_storage_map_alloc(attr, &task_cache,
>> + KMALLOC_NOLOCK_SUPPORTED);
>> }
>>
>> static void task_storage_map_free(struct bpf_map *map)
>>
>> ---
>> base-commit: e06e6b8001233241eb5b2e2791162f0585f50f4b
>> change-id: 20260314-bpf-kmalloc-nolock-60da80e613de
>>
>> Best regards,
>> --
>> Levi Zim <rsworktech@xxxxxxxxxxx>
>>
>>