Re: [PATCH v2] netlink, audit: prevent false ENOBUFS on timeout expiry

From: Ricardo Robaina

Date: Wed May 27 2026 - 15:29:52 EST


On Mon, May 18, 2026 at 8:04 AM Simon Horman <horms@xxxxxxxxxx> wrote:
>
> From: 'Simon Horman' <horms@xxxxxxxxxx>
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> netlink, audit: prevent false ENOBUFS on timeout expiry
>
> This commit modifies netlink_unicast() to return -EAGAIN when a blocking
> sender's timeout expires. This bypasses the netlink_overrun() call to
> allow the audit subsystem to gracefully enqueue the pending event into its
> internal backlog without falsely returning ENOBUFS to user-space.
>
> > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> > --- a/net/netlink/af_netlink.c
> > +++ b/net/netlink/af_netlink.c
> > @@ -1351,8 +1351,18 @@ int netlink_unicast(struct sock *ssk, struct sk_buff *skb,
> >  	}
> >
> >  	err = netlink_attachskb(sk, skb, &timeo, ssk);
> > -	if (err == 1)
> > +	if (err == 1) {
> > +		/* timeo may have been zeroed by schedule_timeout inside
> > +		 * netlink_attachskb. If the caller is a timed-blocking sender
> > +		 * (not genuinely nonblocking), don't re-enter with timeo=0 as
> > +		 * that would misfire netlink_overrun on the next iteration.
> > +		 */
> > +		if (timeo == 0 && !nonblock) {
> > +			kfree_skb(skb);
> > +			return -EAGAIN;
>
> Could this regression cause silent packet drops in non-audit subsystems?
>
> While this change prevents false ENOBUFS for the audit subsystem, other
> subsystems like InfiniBand's IWPM use rdma_nl_unicast_wait(), which sets
> nonblock=0 and operates on a kernel socket with a timeout.
>
> If the timeout expires because the receiving user-space daemon is
> bottlenecked, this new path will free the skb and return -EAGAIN without
> ever invoking netlink_overrun().
>
> Without netlink_overrun(), the receiver's socket error is never set to
> ENOBUFS. Would this prevent user-space from being notified of the dropped
> message, leading to permanent state desynchronization between the kernel
> and user-space?
>
> > +		}
> >  		goto retry;
> > +	}
> >  	if (err)
> >  		return err;
>

Hi Simon,

Thanks for reviewing this patch!

You are correct that the timeo == 0 && !nonblock heuristic in v2 relies on
an implicit assumption about finite sk_sndtimeo. While RDMA/IWPM with
MAX_SCHEDULE_TIMEOUT would never reach this path in practice (timeo does
not drop to 0 with an infinite timeout), your concern correctly identifies
that the heuristic is not narrow enough.

I've submitted v3 [1] with an explicit NETLINK_UNICAST_TIMED constant
(value 2). Callers must explicitly opt into this contract, leaving IWPM and
all other subsystems completely untouched:

	if (timeo == 0 && nonblock == NETLINK_UNICAST_TIMED)

This ensures zero risk of silent drops or state desynchronization in other
subsystems. Does this address your concern?
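
To make the difference between the two checks concrete, here is a small
userspace sketch. It is not the v3 code: the enum and the v2_decide() /
v3_decide() helpers are hypothetical names used only for illustration, and
the value 2 for NETLINK_UNICAST_TIMED is the one quoted above. It models
only the decision taken when netlink_attachskb() returns 1.

```c
/* Userspace model only; none of this is kernel code. */
#define NETLINK_UNICAST_TIMED 2	/* value as described for v3 above */

/* Hypothetical names for the two outcomes after netlink_attachskb() == 1. */
enum send_action {
	SEND_RETRY,		/* goto retry; a later pass may reach netlink_overrun() */
	SEND_RETURN_EAGAIN,	/* free the skb, return -EAGAIN, skip netlink_overrun() */
};

/* v2 heuristic: any blocking sender whose timeout has expired. */
enum send_action v2_decide(long timeo, int nonblock)
{
	if (timeo == 0 && !nonblock)
		return SEND_RETURN_EAGAIN;
	return SEND_RETRY;
}

/* v3 as described: only callers that explicitly pass NETLINK_UNICAST_TIMED. */
enum send_action v3_decide(long timeo, int nonblock)
{
	if (timeo == 0 && nonblock == NETLINK_UNICAST_TIMED)
		return SEND_RETURN_EAGAIN;
	return SEND_RETRY;
}
```

In this model, a blocking caller with nonblock=0 and an expired timeout gets
SEND_RETURN_EAGAIN from v2_decide(0, 0) but SEND_RETRY from v3_decide(0, 0),
so the overrun path stays reachable for it. Only a caller passing
NETLINK_UNICAST_TIMED takes the -EAGAIN path under v3.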

[1] https://lore.kernel.org/audit/20260527192150.949400-1-rrobaina@xxxxxxxxxx/T/#u

Best regards,
Ricardo