Re: [PATCH net-next] mlxsw: spectrum_dcb: Reject writes owned by an offloaded qdisc

From: Petr Machata

Date: Tue May 19 2026 - 12:33:41 EST



Keno Fischer <keno@xxxxxxxxxxxx> writes:

> Some bits of state in the mlxsw driver can be controlled by two
> mechanisms. The first is the DCB netlink command set (RTM_SETDCB
> and e.g. DCB_ATTR_IEEE_ETS/DCB_ATTR_IEEE_MAXRATE), reachable e.g. from
> the `dcb` userspace utility. The second is via the qdisc framework
> (reachable e.g. from the `tc` userspace utility). Because they touch
> the same bit of state, it is currently possible for the ETS config
> to get out of sync with what the qdisc expects. One hard-to-debug
> consequence is that a config script becomes non-idempotent. E.g.,
> consider
>
> ```
> dcb ets set dev "${port}" prio-tc 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7
> tc qdisc replace dev "${port}" root handle 1: prio bands 8 \
> priomap 0 1 2 3 4 5 6 7
> ```
>
> Here the user made a mistake and put the priomap in the wrong order.
> The first time this code is run, the `qdisc` specification overwrites
> the `prio-tc` specification. However, the second time this code is run,
> the subsystem sees that the qdisc is identical and does nothing, causing
> the `prio-tc` setting to stick and the final mapping to be different.
> This can be very confusing, since the same config script creates two
> different states.
>
> I think there's two reasonable ways to fix this. One would be by
> reflecting the ets state changes back into the qdisc, so that the
> qdisc subsystem notices that it's been changed. However, I think it
> is clearer to just forbid the non-qdisc API from trying to make
> changes while the qdisc is offloaded.
>
> This is similar to the existing check in mlxsw_sp_dcbnl_setbuffer,
> which does essentially the same thing in reverse (buffers are only
> explicitly managed when there is an offloaded qdisc, otherwise
> they're managed explicitly).
>
> I hope this saves the next person a few hours of debugging wondering
> why their tc_no_buffer_discard_uc_tc_ counters are going up.
>
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Keno Fischer <keno@xxxxxxxxxxxx>

We do have this overwriting behavior documented on the wiki, but I don't
think it was ever really a deliberate consideration. I think it makes
sense to distance the DCB and qdisc configurations that way. I ran the
mlxsw-specific qos / sch tests that we have have, and everything passes,
which indicates that at least we didn't rely on the behavior in any way.

I'll let it pass through a full regression tonight. Overall the patch
looks OK to me, but Sashiko found something:

> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> index 73519301b744..c2ce4f3f562e 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
> @@ -2316,6 +2316,49 @@ int mlxsw_sp_setup_tc_block_qevent_mark(struct mlxsw_sp_port *mlxsw_sp_port,
> action_mask);
> }
>
> +bool mlxsw_sp_qdisc_has_prio_ets(struct mlxsw_sp_port *mlxsw_sp_port)
> +{
> + struct mlxsw_sp_qdisc_state *qdisc_state = mlxsw_sp_port->qdisc;
> + struct mlxsw_sp_qdisc *root_qdisc;
> + bool found;
> +
> + mutex_lock(&qdisc_state->lock);
> + root_qdisc = &qdisc_state->root_qdisc;
> + found = root_qdisc->ops &&
> + (root_qdisc->ops->type == MLXSW_SP_QDISC_PRIO ||
> + root_qdisc->ops->type == MLXSW_SP_QDISC_ETS);
> + mutex_unlock(&qdisc_state->lock);
> +
> + return found;
> +}

>From Sashiko review:

The check here only examines the root qdisc type. However, a prio or ets
qdisc can be a valid child of a root tbf qdisc in the offloaded
configuration.

This is correct, ETS/PRIO are offloaded when under root TBF as well.