[PATCH net V4 0/4] net/mlx5: Fixes for Socket-Direct

From: Tariq Toukan

Date: Tue Apr 28 2026 - 02:02:42 EST


Hi,

This series fixes several race conditions and bugs in the mlx5
Socket-Direct (SD) single netdev flow.

Patch 1 serializes mlx5_sd_init()/mlx5_sd_cleanup() with
mlx5_devcom_comp_lock() and tracks the SD group state on the primary
device, preventing concurrent or duplicate bring-up/tear-down.
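
As a rough user-space illustration of the idea (not the driver code), the
sketch below models one lock plus one state field kept in a single place, so
that a second init or a second cleanup becomes a no-op. All names here
(sd_group, sd_group_init, sd_group_cleanup, the bringups counter) are
hypothetical, and a pthread mutex stands in for the lock taken with
mlx5_devcom_comp_lock().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the mlx5 structures. */
enum sd_state { SD_STATE_DOWN, SD_STATE_UP };

struct sd_group {
	pthread_mutex_t lock;	/* plays the role of the devcom component lock */
	enum sd_state state;	/* tracked in one place, as on the primary device */
	int bringups;		/* counts real bring-ups, for illustration only */
};

#define SD_GROUP_INIT { PTHREAD_MUTEX_INITIALIZER, SD_STATE_DOWN, 0 }

/* Returns true only for the caller that actually brought the group up. */
static bool sd_group_init(struct sd_group *g)
{
	bool did_work = false;

	pthread_mutex_lock(&g->lock);
	if (g->state == SD_STATE_DOWN) {
		g->bringups++;
		g->state = SD_STATE_UP;
		did_work = true;
	}
	pthread_mutex_unlock(&g->lock);
	return did_work;
}

/* Returns true only for the caller that actually tore the group down. */
static bool sd_group_cleanup(struct sd_group *g)
{
	bool did_work = false;

	pthread_mutex_lock(&g->lock);
	if (g->state == SD_STATE_UP) {
		g->state = SD_STATE_DOWN;
		did_work = true;
	}
	pthread_mutex_unlock(&g->lock);
	return did_work;
}
```

Because the state check and the state change happen under the same lock, two
callers racing into init (or cleanup) cannot both perform the transition.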

Patch 2 fixes the debugfs "multi-pf" directory being stored on the
calling device's sd struct instead of the primary's, which caused
memory leaks and recreation errors when cleanup ran from a different PF.

Patch 3 fixes missing cleanup on ETH probe errors. The analogous gap on
the resume path requires introducing sd_suspend/resume APIs that only
destroy FW resources and is left for a follow-up series.

Patch 4 fixes a race where a secondary PF could access the primary's
auxiliary device after it had been unbound, by holding the primary's
device lock while operating on its auxiliary device.
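
The device-lock fix for the secondary/primary race can be pictured with a
small user-space model (a sketch under assumptions, not the driver code).
Here aux_dev, aux_unbind, with_bound_primary and count_use are hypothetical
names, a pthread mutex stands in for the device lock, and a NULL drvdata
pointer stands in for "no driver bound".

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an auxiliary device; not the kernel structure. */
struct aux_dev {
	pthread_mutex_t lock;	/* stands in for the device lock */
	void *drvdata;		/* non-NULL only while a driver is bound */
};

/* Unbind path: clears the driver data while holding the device lock. */
static void aux_unbind(struct aux_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->drvdata = NULL;
	pthread_mutex_unlock(&d->lock);
}

/*
 * Secondary path: run fn on the primary's driver data only if the primary
 * is still bound, holding the primary's lock for the whole operation so an
 * unbind cannot slip in between the check and the use.
 */
static bool with_bound_primary(struct aux_dev *primary, void (*fn)(void *priv))
{
	bool ran = false;

	pthread_mutex_lock(&primary->lock);
	if (primary->drvdata) {
		fn(primary->drvdata);
		ran = true;
	}
	pthread_mutex_unlock(&primary->lock);
	return ran;
}

/* Example operation: bump an int counter reached through the driver data. */
static void count_use(void *priv)
{
	(*(int *)priv)++;
}
```

The point of the model is that the bound check and the use share one critical
section with the unbind path; checking first and locking later would leave
the original window open.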

Regards,
Tariq

V4:
- Link to V3:
https://lore.kernel.org/all/20260423123104.201552-1-tariqt@xxxxxxxxxx/
- Adjust "net/mlx5e: SD, Fix missing cleanup on probe/resume error" to
clean up SD only on probe; the resume gap is deferred to a follow-up
series that will introduce sd_suspend/resume APIs.
- Fix concurrent cleanup vs. init race in
"net/mlx5: SD: Serialize init/cleanup".
- Remove leftover sentence in commit message of
"net/mlx5: SD: Serialize init/cleanup".

Shay Drory (4):
  net/mlx5: SD: Serialize init/cleanup
  net/mlx5: SD, Keep multi-pf debugfs entries on primary
  net/mlx5e: SD, Fix missing cleanup on probe error
  net/mlx5e: SD, Fix race condition in secondary device probe/remove

 .../net/ethernet/mellanox/mlx5/core/en_main.c | 26 +++++--
 .../net/ethernet/mellanox/mlx5/core/lib/sd.c  | 76 ++++++++++++++++---
 .../net/ethernet/mellanox/mlx5/core/lib/sd.h  |  2 +
 3 files changed, 87 insertions(+), 17 deletions(-)


base-commit: 3bc179bc7146c26c9dff75d2943d10528274e301
--
2.44.0