Re: [PATCH net-next V2 12/13] net/mlx5e: TC, enable steering for SD LAG
From: Shay Drori
Date: Wed Jun 03 2026 - 02:44:17 EST
On 03/06/2026 5:26, Jakub Kicinski wrote:
External email: Use caution opening links or attachments
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5e: TC, enable steering for SD LAG
Extends TC flow steering to support the SD LAG mode by adding
mlx5_lag_is_sd()/mlx5_sd_is_primary() helpers, redirecting uplink
priv/proto_dev queries to the primary eswitch in SD setups, and adding
peer-flow handling for secondary devices.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 2a16368a948e..910492eb51f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
[ ... ]
@@ -4196,9 +4197,26 @@ static bool is_lag_dev(struct mlx5e_priv *priv,
same_hw_reps(priv, peer_netdev));
}
+static bool is_sd_eligible(struct mlx5e_priv *priv,
+ struct net_device *peer_netdev)
+{
+ struct mlx5e_priv *peer_priv;
+
+ peer_priv = netdev_priv(peer_netdev);
+ return same_hw_reps(priv, peer_netdev) &&
+ mlx5_lag_is_sd(priv->mdev) &&
+ (mlx5_sd_get_primary(priv->mdev) ==
+ mlx5_sd_get_primary(peer_priv->mdev));
+}
[Medium]
With the redefinition of mlx5_sd_get_primary() to return NULL when
sd->devcom is not ready, can both calls here legitimately return NULL
during a teardown race and make the equality test succeed?
no, mlx5_lag_is_sd() is the gateway. e.g.: if mlx5_lag_is_sd() returns
true, mlx5_sd_get_primary() returns a valid primary, never NULL. The
two-NULLs-and-equal scenario isn't reachable.
In that situation is_sd_eligible() would report two unrelated devices
as belonging to the same SD primary.
Would something like this be more robust:
struct mlx5_core_dev *primary = mlx5_sd_get_primary(priv->mdev);
return same_hw_reps(priv, peer_netdev) &&
mlx5_lag_is_sd(priv->mdev) &&
primary &&
primary == mlx5_sd_get_primary(peer_priv->mdev);
+
static bool is_multiport_eligible(struct mlx5e_priv *priv, struct net_device *out_dev)
{
- return same_hw_reps(priv, out_dev) && mlx5_lag_is_mpesw(priv->mdev);
+ struct mlx5_core_dev *primary = mlx5_sd_get_primary(priv->mdev);
+
+ if (!primary)
+ return false;
+
+ return same_hw_reps(priv, out_dev) && mlx5_lag_is_mpesw(primary);
}
[ ... ]
@@ -4351,7 +4372,7 @@ static struct rhashtable *get_tc_ht(struct mlx5e_priv *priv,
return &tc->ht;
}
-static bool is_peer_flow_needed(struct mlx5e_tc_flow *flow)
+static bool is_peer_flow_needed(struct mlx5e_tc_flow *flow, bool *is_sd)
{
struct mlx5_esw_flow_attr *esw_attr = flow->attr->esw_attr;
struct mlx5_flow_attr *attr = flow->attr;
@@ -4372,6 +4393,13 @@ static bool is_peer_flow_needed(struct mlx5e_tc_flow *flow)
if (mlx5_lag_is_mpesw(esw_attr->in_mdev))
return true;
+ if (mlx5_lag_is_sd(esw_attr->in_mdev) &&
+ !mlx5_sd_is_primary(esw_attr->in_mdev)) {
+ if (!mlx5_lag_is_mpesw(mlx5_sd_get_primary(esw_attr->in_mdev)))
+ *is_sd = true;
+ return true;
+ }
+
return false;
}
[High]
Can mlx5_sd_get_primary(esw_attr->in_mdev) return NULL here?
no, same as above.
[Medium]
On the topic of unsynchronized SD-state reads in this branch, the
writer (mlx5_sd_cleanup) updates sd->primary and devcom->ready under
the devcom rwsem, but the readers introduced by this patch take none
of those locks:
- mlx5_sd_is_primary() reads sd->primary with no lock and no
READ_ONCE.
- mlx5_sd_get_primary() pairs READ_ONCE(ready) with a plain read of
sd->primary and sd->primary_dev.
- mlx5_lag_is_sd() reads pf->group_id under lag_lock, which does
not order against the devcom rwsem at all.
Since sd->primary also selects which arm of the union in struct
mlx5_sd is valid (secondaries[] vs primary_dev), can a torn or stale
read of sd->primary make these helpers read the wrong union member?
These helpers are intentionally lockless; it's the caller's
responsibility to call them in a context where the returned state is
guaranteed to remain valid. In this patch, the TC/rep callers only run
while SD is valid.
@@ -4628,19 +4657,26 @@ mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
struct mlx5e_tc_flow **__flow)
{
struct mlx5_devcom_comp_dev *devcom = priv->mdev->priv.eswitch->devcom, *pos;
+ struct netlink_ext_ack *extack = f->common.extack;
struct mlx5e_rep_priv *rpriv = priv->ppriv;
struct mlx5_eswitch_rep *in_rep = rpriv->rep;
struct mlx5_core_dev *in_mdev = priv->mdev;
struct mlx5_eswitch *peer_esw;
struct mlx5e_tc_flow *flow;
+ bool is_sd = false;
int err;
+ if (mlx5_lag_is_sd(in_mdev) && !mlx5_lag_is_active(in_mdev)) {
+ NL_SET_ERR_MSG_MOD(extack, "SD shared FDB not yet active");
+ return -EOPNOTSUPP;
+ }
+
flow = __mlx5e_add_fdb_flow(priv, f, flow_flags, filter_dev, in_rep,
in_mdev);
if (IS_ERR(flow))
return PTR_ERR(flow);
- if (!is_peer_flow_needed(flow)) {
+ if (!is_peer_flow_needed(flow, &is_sd)) {
*__flow = flow;
return 0;
}
@@ -4651,6 +4687,15 @@ mlx5e_add_fdb_flow(struct mlx5e_priv *priv,
}
mlx5_devcom_for_each_peer_entry(devcom, peer_esw, pos) {
+ if (is_sd) {
+ /* SD shared FDB: only the matching SD primary. */
+ if (mlx5_sd_get_primary(in_mdev) !=
+ mlx5_sd_get_primary(peer_esw->dev))
+ continue;
+ } else {
+ if (!mlx5_sd_is_primary(peer_esw->dev))
+ continue;
+ }
err = mlx5e_tc_add_fdb_peer_flow(f, flow, flow_flags, peer_esw);
if (err)
goto peer_clean;
[Medium]
Same NULL == NULL question applies here. The devcom being iterated is
priv->mdev->priv.eswitch->devcom, i.e. the eswitch devcom, so peer_esw
entries are not necessarily SD peers and may belong to different SD
groups (or to none).
The is_sd branch is only entered when is_peer_flow_needed() returns
is_sd=true, which requires the caller's mdev to be SD. As mentioned
above, mlx5_sd_get_primary() is gated by mlx5_lag_is_sd(), so
mlx5_sd_get_primary(in_mdev) can't be NULL here. Different SD groups
have different primaries, so the is_sd branch explicitly filters to
the matching group.
If mlx5_sd_get_primary() can transiently return NULL on either side
during an SD teardown race, both sides may evaluate to NULL and the
!= test will not skip the unrelated peer:
if (mlx5_sd_get_primary(in_mdev) !=
mlx5_sd_get_primary(peer_esw->dev))
continue;
Would explicitly requiring a non-NULL primary on at least one side, or
caching mlx5_sd_get_primary(in_mdev) once and rejecting NULL up front,
be more correct?
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
index ec606851feb8..25286ecd724e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -49,13 +49,16 @@ static int mlx5_sd_get_host_buses(struct mlx5_core_dev *dev)
return sd->host_buses;
}
-static struct mlx5_core_dev *mlx5_sd_get_primary(struct mlx5_core_dev *dev)
+struct mlx5_core_dev *mlx5_sd_get_primary(struct mlx5_core_dev *dev)
{
struct mlx5_sd *sd = mlx5_get_sd(dev);
if (!sd)
return dev;
+ if (!mlx5_devcom_comp_is_ready(sd->devcom))
+ return NULL;
+
return sd->primary ? dev : sd->primary_dev;
}
[ ... ]