Re: kalloc_objs() may not be as safe as it seems

From: Alejandro Colomar

Date: Tue Mar 17 2026 - 17:41:54 EST

Hi Kees,

On 2026-03-17T14:14:11-0700, Kees Cook wrote:
> On Mon, Mar 16, 2026 at 07:33:34PM +0100, Alejandro Colomar wrote:
> > Hi Kees,
> >
> > I just learnt about kalloc_objs() et al. from
> > <https://lwn.net/Articles/1062856/>.
> >
> > ptr = kmalloc_obj(*ptr);
> > ptr = kmalloc_objs(*ptr, n);
> >
> > This resembles a lot the macro we have in shadow-utils, malloc_T(),
> > which would be used as (to resemble the above):
> >
> > ptr = malloc_T(1, typeof(*ptr)); // But we'd really pass the type
> > ptr = malloc_T(n, typeof(*ptr)); // But we'd really pass the type
> >
> > But I've noticed some design mistakes that make it not as safe as it
> > seems.
> >
> > Default arguments
> > ~~~~~~~~~~~~~~~~~
> >
> > I tend to think it's simpler to have a single API that works for both
> > 1 element and multiple elements. The special-casing of 1 element seems
> > unnecessary, and having a literal 1 works just fine.
>
> This is a reasonable opinion, but not one I wanted to try to fight for
> with the Linux developer community. Linus has tended to frown on adding
> any new burden when making these kinds of changes, and expecting
> everyone to add a (seemingly) redundant "1" to all API calls (or an
> empty comma) seems unlikely to fly.

Should we add Linus to CC? I think the safety arguments, especially
after his patch adding the default argument, talk strongly about using
the literal 1, and dropping the _obj() macros. I hope the "1" is
something people could consider.

I understand the empty comma might be more controversial; although
I hope we can eventually discuss it too.

> > I think the combination of having the macros be variadic (for gfp) with
> > having two very similar APIs that differ in number of arguments, and
> > all those arguments being integer types, is prone to errors. Consider
> > the case where one would accidentally write
> >
> > ptr = kmalloc_obj(*ptr, n); // Bogus
> > instead of
> > ptr = kmalloc_objs(*ptr, n);
>
> This loss of GFP flags wasn't part of my original design, and I would
> agree that given the lack of type checking for GFP flags, this does look
> like a potential foot-gun.

Indeed; I agree Linus's change was the worst part.

Having just one of these two issues separately would be relatively fine,
and it's the combination of both that makes it such a foot-gun. From
the two issues, the variadic argument list is probably the most
dangerous part.

> > The compiler wouldn't realize at all. That's a strong argument in
> > favour of having default arguments be required to be explicit, with an
> > empty argument:
> >
> > ptr = kmalloc_obj(*ptr,);
> > ptr = kmalloc_objs(*ptr, n,);
> >
> > I know you (and Linus too, FWIW) have previously claimed that it looks
> > weird to the eye. But I'm pretty sure you could get used to it. That's
> > certainly going to be safer.
> >
> > With mandatory empty arguments, the compiler would easily distinguish
> > mistakes like the one above.
>
> I'd rather we get something like the __strict typedef so "gfp_t" would
> be a true separate type, not just a silent alias of "int".

Yup, that would help too. Although in general I still think that
variadic argument lists are very dangerous: there's no difference
between "the programmer forgot to specify the argument, and the compiler
didn't remind them" and "the programmer wanted the default value". This
is a source of long-term issues.

I understand the extra comma might be difficult to sell at the moment,
but we should try to figure out a way to sell it in the long term.

> > Type safety
> > ~~~~~~~~~~~
> >
> > Apart from the issues with the above, the ability to pass a variable
> > instead of a type name is also a bad choice. In shadow-utils, we
>
> This was a first-order requirement, or we'd never be able to refactor
> the codebase to use the new API. There was already a heavy mixture of
> types and variables used within sizeof(), and trying to take that part
> and find types exceeded Coccinelle's abilities.
>
> Making the refactor "trivial" was important; but see below.

Sounds reasonable. This imperfect API is still (or was, until Linus
turned it into a double foot-gun) better than what was before, and if it
serves as a step to eventually reach the better APIs, it's great.

> > require a type name, and a variable is rejected. We implement that with
> > the typeas() macro:
> >
> > #define typeas(T) typeof((T){0})
> >
> > This macro works exactly like typeof(), but it requires that the input
> > is also a type. Passing a variable is a syntax error. We implement
> > malloc_T() with it:
> >
> > // malloc_T - malloc type-safe
> > #define malloc_T_(n, T) \
> > ({ \
> > (typeas(T) *){reallocarray(n, sizeof(T))}; \
> > })
> >
> > which is used as (taking some arbitrary examples from shadow-utils):
> >
> > lp = xmalloc_T(1, struct link_name);
> > targs = xmalloc_T(n_args + 3, char *);
> >
> > Some reasons for passing a type name instead of a variable are:
> >
> > - It allows grepping for all allocations of a given type.
> > - It adds readability. It's similar to declaring variables with some
> > explicit type, vs. using 'auto' (__auto_type) everywhere.
>
> Sure, and that's why many places were already using type names, and
> the use of "auto" was even one of Linus's driving examples for why the
> new API could be very easily used.

Nice.

> > But there's also a safety aspect. Consider we want to allocate an array
> > of 42 ints. And consider the programmer accidentally swaps arguments.
> >
> > int *p = malloc_T(int, 42); // syntax error
> > int *p = malloc_T(42, int);
> > vs
> > int *p = kmalloc_objs(*p, 42);
> > int *p = kmalloc_objs(42, *p); // Bogus
> >
> > The latter is dereferencing an uninitialized pointer. If for some
> > reason the pointer had a value before this call, you'd be allocating as
> > many elements as *p says, which would be bogus, and since typeof(42) is
> > the same as typeof(*p), the return type would be valid, so this would
> > still compile.
>
> Yeah, this is a foot-gun too. I'm open to requiring a type, but it's a
> significant amount of careful refactoring needed to accomplish it. It
> was a weakness of the existing API, though it was more "obvious" since
> it was visually contained by "sizeof()".

Agree. The good part is that the refactor doesn't need to be as
careful, because incorrect types will result in build errors, and not
bugs. But yes, it needs significantly more work than what the current
API required.

Maybe you could diagnose with coccinelle where expressions are used?
Then people could slowly look at their diagnostics and replace them.

Have a lovely night!
Alex

--
<https://www.alejandro-colomar.es>

Attachment: signature.asc
Description: PGP signature