Re: [PATCH v3] libbpf: fix UAF in strset__add_str()

From: Andrii Nakryiko

Date: Fri May 22 2026 - 14:36:27 EST


On Mon, May 18, 2026 at 10:36 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Sun, May 17, 2026 at 10:05 PM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote:
> >
> > strset_add_str_mem() might reallocate the strset data buffer in order to
> > accommodate the provided string 's'. However, if 's' points to a string
> > already present in the buffer, it becomes dangling after the realloc.
> > This leads to a use-after-free when attempting to memcpy() the string
> > into the new buffer.
> >
> > One scenario that triggers this problematic path is when resolve_btfids
> > attempts to patch kfunc prototypes using existing BTF parameter names:
> >
> > | resolve_btfids: function bpf_list_push_back_impl already exists in BTF
> > | Segmentation fault (core dumped)
> >
> > Compiling resolve_btfids with fsanitize=address generates a detailed
> > report of the UAF:
> >
> > | =================================================================
> > | ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
> > | ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
> > | READ of size 5 at 0x7f4c4a500bd4 thread T0
> > | #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
> > | #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
> > | #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
> > | #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
> > | #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
> > | #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
> > | #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
> > | #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
> > | #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
> > | #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
> > |
> > | 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
> > | freed by thread T0 here:
> > | #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
> > | #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
> > | #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
> > |
> > | previously allocated by thread T0 here:
> > | #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
> > | #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20
> >
> > While resolve_btfids could be refactored to avoid this call path, let's
> > instead fix this issue at the source in strset__add_str() and avoid
> > similar scenarios.
> >
> > Let's check if set->strs_data was reallocated and whether 's' points to
> > an internal string within the old strset buffer. In such case, 's' is
> > reconstructed to point to the new buffer.
> >
> > While already here, also fix strset__find_str() which suffers from the
> > same problem by factoring out the common operations into a new helper
> > function strset_str_append().
> >
> > Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
> > Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > Suggested-by: Mykyta Yatsenko <yatsenko@xxxxxxxx>
> > Signed-off-by: Carlos Llamas <cmllamas@xxxxxxxxxx>
> > ---
> > v3:
> > Switch to 's' reconstruction approach suggested by Andrii.
> > Adjusted names and commit log accordingly.
> >
> > v2:
> > Implemented the fix in strset__offset() helper as suggested by Mykyta.
> > Added support to handle "substrings" of existing ones.
> > Used 90d76d3ececc as Fixes tag as suggested by Sashiko.
> > https://lore.kernel.org/all/20260515044759.2863546-1-cmllamas@xxxxxxxxxx/
> >
> > v1:
> > https://lore.kernel.org/all/20260513232055.1681859-1-cmllamas@xxxxxxxxxx/
> > tools/lib/bpf/strset.c | 58 +++++++++++++++++++++++++++---------------
> > 1 file changed, 37 insertions(+), 21 deletions(-)
> >
> > diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
> > index 2464bcbd04e0..d229961ff2fc 100644
> > --- a/tools/lib/bpf/strset.c
> > +++ b/tools/lib/bpf/strset.c
> > @@ -107,6 +107,37 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> > set->strs_data_len, set->strs_data_max_len, add_sz);
> > }
> >
> > +static long strset_str_append(struct strset *set, const char *s)
> > +{
> > + const char *old_data = strset__data(set);
>
> sashiko suggests to use uintptr_t for this detection to avoid
> undefined behavior, let's do that then)?
>
> but other than that and a nit below, looks good, thanks!
>
> pw-bot: cr

Ping. Carlos, are you planning to send a follow up?

>
> > + long len = strlen(s) + 1;
> > + void *p;
> > +
> > + /* Hashmap keys are always offsets within set->strs_data, so to even
> > + * look up some string from the "outside", we need to first append it
> > + * at the end, so that it can be addressed with an offset. Luckily,
> > + * until set->strs_data_len is incremented, that string is just a piece
> > + * of garbage for the rest of the code, so no harm, no foul. On the
> > + * other hand, if the string is unique, it's already appended and
> > + * ready to be used, only a simple set->strs_data_len increment away.
> > + */
> > + p = strset_add_str_mem(set, len);
> > + if (!p)
> > + return -ENOMEM;
> > +
> > + /* The set->strs_data might have reallocated and if 's' pointed
> > + * to an internal string within the old buffer, then it became
> > + * dangling and needs to be reconstructed before the copy.
> > + */
> > + if (old_data && old_data != strset__data(set) &&
> > + s >= old_data && s < old_data + set->strs_data_len)
> > + s = strset__data(set) + (s - old_data);
>
> nit: strset__data() is more of an "external api", note how we access
> strs_data_len directly through field. So let's use s->strs_data access
> consistently.
>
> > +
> > + memcpy(p, s, len);
> > +
> > + return len;
> > +}
> > +
> > /* Find string offset that corresponds to a given string *s*.
> > * Returns:
> > * - >0 offset into string data, if string is found;
> > @@ -116,16 +147,12 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> > int strset__find_str(struct strset *set, const char *s)
> > {
> > long old_off, new_off, len;
> > - void *p;
> >
> > - /* see strset__add_str() for why we do this */
> > - len = strlen(s) + 1;
> > - p = strset_add_str_mem(set, len);
> > - if (!p)
> > - return -ENOMEM;
> > + len = strset_str_append(set, s);
> > + if (len < 0)
> > + return len;
> >
> > new_off = set->strs_data_len;
> > - memcpy(p, s, len);
> >
> > if (hashmap__find(set->strs_hash, new_off, &old_off))
> > return old_off;
> > @@ -142,24 +169,13 @@ int strset__find_str(struct strset *set, const char *s)
> > int strset__add_str(struct strset *set, const char *s)
> > {
> > long old_off, new_off, len;
> > - void *p;
> > int err;
> >
> > - /* Hashmap keys are always offsets within set->strs_data, so to even
> > - * look up some string from the "outside", we need to first append it
> > - * at the end, so that it can be addressed with an offset. Luckily,
> > - * until set->strs_data_len is incremented, that string is just a piece
> > - * of garbage for the rest of the code, so no harm, no foul. On the
> > - * other hand, if the string is unique, it's already appended and
> > - * ready to be used, only a simple set->strs_data_len increment away.
> > - */
> > - len = strlen(s) + 1;
> > - p = strset_add_str_mem(set, len);
> > - if (!p)
> > - return -ENOMEM;
> > + len = strset_str_append(set, s);
> > + if (len < 0)
> > + return len;
> >
> > new_off = set->strs_data_len;
> > - memcpy(p, s, len);
> >
> > /* Now attempt to add the string, but only if the string with the same
> > * contents doesn't exist already (HASHMAP_ADD strategy). If such
> > --
> > 2.54.0.563.g4f69b47b94-goog
> >