Re: [PATCH net 2/4] net/mlx5e: Fix HV VHCA stats agent registration race

From: Jacob Keller

Date: Thu Jun 04 2026 - 15:26:11 EST


On 6/4/2026 6:50 AM, Tariq Toukan wrote:
> From: Feng Liu <feliu@xxxxxxxxxx>
>
> mlx5e_hv_vhca_stats_create() registers the stats agent through
> mlx5_hv_vhca_agent_create(). The helper publishes the agent in
> hv_vhca->agents[type] under agents_lock and immediately schedules an
> asynchronous control invalidation on the HV VHCA workqueue before
> returning to mlx5e.
>
> The asynchronous invalidation invokes the control agent's invalidate
> callback, which reads the hypervisor control block and forwards the
> command to mlx5e_hv_vhca_stats_control(). That callback may either:
>
> - call cancel_delayed_work_sync(&priv->stats_agent.work), or
> - call queue_delayed_work(priv->wq, &sagent->work, sagent->delay).
>
> However, the delayed_work and priv->stats_agent.agent are only
> initialized after mlx5_hv_vhca_agent_create() returns to mlx5e:
>
> agent = mlx5_hv_vhca_agent_create(...); /* publish + invalidate */
> ...
> priv->stats_agent.agent = agent; /* too late */
> INIT_DELAYED_WORK(&priv->stats_agent.work, ...); /* too late */
>
> If the asynchronous control path runs before the two assignments
> above, it can:
>
> - Operate on an uninitialized delayed_work whose timer.function is
> NULL. queue_delayed_work() calls add_timer() unconditionally, so
> when the timer expires the timer softirq invokes a NULL function
> pointer.
> - Re-initialize the timer later through INIT_DELAYED_WORK() while
> the timer is already enqueued in the timer wheel, corrupting the
> hlist (entry.pprev cleared while the previous bucket node still
> points at this entry).
> - When the worker eventually runs, mlx5e_hv_vhca_stats_work() reads
> sagent->agent (NULL) and dereferences it inside
> mlx5_hv_vhca_agent_write().
>
> Fix this by:
>
> - Initializing priv->stats_agent.work before invoking
> mlx5_hv_vhca_agent_create(), so the work is always in a valid
> state when the control callback observes it.
> - Adding a struct mlx5_hv_vhca_agent **ctx_update out-parameter
> to mlx5_hv_vhca_agent_create(). The helper writes the agent
> pointer to *ctx_update before publishing into hv_vhca->agents[]
> and triggering the agents_update flow, so any callback
> subsequently invoked from that flow already sees a valid
> priv->stats_agent.agent. This avoids having the control
> callback participate in agent initialization.
>
> While at it, clear priv->stats_agent.{agent,buf} after teardown and
> on the agent_create() failure path. Without this, an enable/disable
> cycle hitting an early-return in create can lead to a UAF or
> double-destroy of stale pointers from the previous cycle.
>
> Fixes: cef35af34d6d ("net/mlx5e: Add mlx5e HV VHCA stats agent")
> Signed-off-by: Feng Liu <feliu@xxxxxxxxxx>
> Reviewed-by: Eran Ben Elisha <eranbe@xxxxxxxxxx>
> Signed-off-by: Tariq Toukan <tariqt@xxxxxxxxxx>
> ---

Reviewed-by: Jacob Keller <jacob.e.keller@xxxxxxxxx>