[PATCH net V4 0/4] net/mlx5: Fixes for Socket-Direct
From: Tariq Toukan
Date: Tue Apr 28 2026 - 02:02:42 EST
Hi,
This series fixes several race conditions and bugs in the mlx5
Socket-Direct (SD) single netdev flow.
Patch 1 serializes mlx5_sd_init()/mlx5_sd_cleanup() with
mlx5_devcom_comp_lock() and tracks the SD group state on the primary
device, preventing concurrent or duplicate bring-up/tear-down.
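As a rough illustration of the serialization idea in patch 1 (not the actual mlx5 code), the userspace sketch below guards a per-group state with a single lock so that repeated or concurrent init/cleanup calls become no-ops. The names struct sd_group, sd_group_init() and sd_group_cleanup() are hypothetical, and a pthread mutex stands in for mlx5_devcom_comp_lock().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are not the mlx5 data structures. */
enum sd_state { SD_STATE_DOWN, SD_STATE_UP };

struct sd_group {
	pthread_mutex_t lock;	/* stands in for mlx5_devcom_comp_lock() */
	enum sd_state state;	/* tracked once per group, on the primary */
	int bringup_count;	/* counts real bring-ups, for illustration */
};

#define SD_GROUP_INITIALIZER { PTHREAD_MUTEX_INITIALIZER, SD_STATE_DOWN, 0 }

/* Returns true only if this call actually performed the bring-up. */
static bool sd_group_init(struct sd_group *g)
{
	bool did_work = false;

	pthread_mutex_lock(&g->lock);
	if (g->state == SD_STATE_DOWN) {
		g->bringup_count++;
		g->state = SD_STATE_UP;
		did_work = true;
	}
	pthread_mutex_unlock(&g->lock);
	return did_work;
}

/* Returns true only if this call actually performed the tear-down. */
static bool sd_group_cleanup(struct sd_group *g)
{
	bool did_work = false;

	pthread_mutex_lock(&g->lock);
	if (g->state == SD_STATE_UP) {
		g->state = SD_STATE_DOWN;
		did_work = true;
	}
	pthread_mutex_unlock(&g->lock);
	return did_work;
}
```

Because the state check and the state change happen under the same lock, a second caller sees the updated state and skips the work instead of repeating it.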
Patch 2 fixes the debugfs "multi-pf" directory being stored on the
calling device's sd struct instead of the primary's, which caused
memory leaks and recreation errors when cleanup ran from a different PF.
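The ownership rule behind patch 2 can be sketched in userspace as follows. This is only a model under stated assumptions: struct pf, multi_pf_dir_create() and multi_pf_dir_remove() are hypothetical names, and a heap allocation stands in for the debugfs directory. The point is that the handle always lives on the primary, whichever PF makes the call.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a PF in an SD group; not the mlx5 structures. */
struct pf {
	struct pf *primary;	/* points to itself on the primary PF */
	char *multi_pf_dir;	/* stand-in for the debugfs dentry; set on the primary only */
};

/* Create the directory on the primary, regardless of which PF calls. */
static int multi_pf_dir_create(struct pf *caller)
{
	struct pf *p = caller->primary;

	if (p->multi_pf_dir)
		return -1;	/* already exists */
	p->multi_pf_dir = malloc(1);
	return p->multi_pf_dir ? 0 : -1;
}

/* Remove the directory via the primary, so any PF can run the cleanup. */
static void multi_pf_dir_remove(struct pf *caller)
{
	struct pf *p = caller->primary;

	free(p->multi_pf_dir);
	p->multi_pf_dir = NULL;
}
```

With the handle in one place, creating from one PF and removing from another neither leaks the entry nor leaves a stale one that would make the next creation fail.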
Patch 3 fixes missing cleanup on ETH probe errors. The analogous gap on
the resume path requires introducing sd_suspend/resume APIs that only
destroy FW resources and is left for a follow-up series.
Patch 4 fixes a race where a secondary PF could access the primary's
auxiliary device after it had been unbound, by holding the primary's
device lock while operating on its auxiliary device.
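For readers less familiar with the probe-error fix in this series, the shape of the change is the usual unwind pattern: if a later probe step fails, the SD state brought up earlier must be torn down before returning. The sketch below is a simplified userspace model, not the en_main.c code; struct probe_ctx, eth_probe() and the step helpers are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical probe context; tracks which resources are currently held. */
struct probe_ctx {
	bool sd_up;
	bool netdev_up;
};

static int sd_init_step(struct probe_ctx *c)
{
	c->sd_up = true;
	return 0;
}

static void sd_cleanup_step(struct probe_ctx *c)
{
	c->sd_up = false;
}

/* The fail flag simulates an error in a later probe stage. */
static int netdev_setup_step(struct probe_ctx *c, bool fail)
{
	if (fail)
		return -1;
	c->netdev_up = true;
	return 0;
}

static int eth_probe(struct probe_ctx *c, bool fail_netdev)
{
	int err;

	err = sd_init_step(c);
	if (err)
		return err;

	err = netdev_setup_step(c, fail_netdev);
	if (err)
		goto err_sd_cleanup;

	return 0;

err_sd_cleanup:
	/* Undo the SD bring-up so a failed probe leaves nothing behind. */
	sd_cleanup_step(c);
	return err;
}
```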
Regards,
Tariq
V4:
- Link to V3:
https://lore.kernel.org/all/20260423123104.201552-1-tariqt@xxxxxxxxxx/
- Adjust "net/mlx5e: SD, Fix missing cleanup on probe/resume error" to
clean up SD only on probe; the resume gap is deferred to a follow-up
series that will introduce sd_suspend/resume APIs.
- Fix concurrent cleanup vs. init race in
"net/mlx5: SD: Serialize init/cleanup".
- Remove leftover sentence in commit message of
"net/mlx5: SD: Serialize init/cleanup".
Shay Drory (4):
  net/mlx5: SD: Serialize init/cleanup
  net/mlx5: SD, Keep multi-pf debugfs entries on primary
  net/mlx5e: SD, Fix missing cleanup on probe error
  net/mlx5e: SD, Fix race condition in secondary device probe/remove

 .../net/ethernet/mellanox/mlx5/core/en_main.c | 26 +++++--
 .../net/ethernet/mellanox/mlx5/core/lib/sd.c  | 76 ++++++++++++++++---
 .../net/ethernet/mellanox/mlx5/core/lib/sd.h  |  2 +
 3 files changed, 87 insertions(+), 17 deletions(-)
base-commit: 3bc179bc7146c26c9dff75d2943d10528274e301
--
2.44.0