Re: [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control

From: Ratheesh Kannoth

Date: Sun Jun 07 2026 - 22:28:52 EST

On 2026-06-05 at 12:02:41, Ratheesh Kannoth (rkannoth@xxxxxxxxxxx) wrote:
> CN20K NPC MCAM is split into 32 subbanks that are searched in a
> predefined order during allocation. Lower-numbered subbanks have
> higher priority than higher-numbered ones.
>
> Add a runtime "srch_order" to control the order in which
> subbanks are searched during MCAM allocation.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@xxxxxxxxxxx>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>> + USED = 1,
>> + };
>> +
>> + if (cnt != npc_priv.num_subbanks) {
>> + dev_err(rvu->dev, "Number of entries(%u) != %u\n",
>> + cnt, npc_priv.num_subbanks);
>> + return -EINVAL;
>> + }
>> +
>> + mutex_lock(&mcam->lock);
>> + npc_lock_all_subbank();
>Can the encoding of subbank_srch_order[] be reconciled between
>initialization and this setter?
>The boot-time fill done by npc_create_srch_order() and npc_subbank_init()
>populates the array as subbank_srch_order[subbank_idx] = priority (i.e.
>indexed by subbank, value is priority). For cnt = 8 it produces
>[3, 5, 2, 6, 1, 7, 0, 4], whose actual hardware search order is
>sb6, sb4, sb2, sb0, sb7, sb1, sb3, sb5.
>The new setter writes the same array back as
>subbank_srch_order[priority] = subbank_idx (indexed by priority, value is
>subbank):
> for (int i = 0; i < cnt; i++)
> subbank_srch_order[i] = (u32)narr[i];
>If a user runs devlink dev param show (which calls the GET path returning
>the boot-time encoding) and then re-applies the same array via
>devlink dev param set, the SET path treats it as a permutation of subbank
>indices and produces search order sb3, sb5, sb2, sb6, sb1, sb7, sb0, sb4
>instead of the previous order.
>The validator also checks that arr[i] in 0..sz-1 occurs exactly once each,
>which matches the SET form but is also true of the boot-time content
>(subbank_srch_order[7] == 4 in the example), so the validator does not
>catch the round-trip mismatch.
>Should both writers agree on a single encoding so that show + set
>round-trips are stable?

Agree, but npc_create_srch_order() and npc_subbank_init() are not added by this function
and is preexisting issue. WIll send patch to "net" tree soon after this series
is merged.

> + xa_erase(&npc_priv.xa_sb_used, prio);
>> + }
>> +
>> + for (int i = 0; i < cnt; i++)
>> + subbank_srch_order[i] = (u32)narr[i];
>> +
>> + restrict_valid = false;
>Is the unconditional restrict_valid = false here intentional, and is
>there a way to undo it?

No.
This is intentional.

>The cleanup loop directly above does not only erase newly-added entries.
>The check rsrc[FREE][prio] == 1 (and the USED counterpart) also fires for
>slots whose successful iteration replaced a pre-existing entry whose
>paired decrementing iteration never ran due to the failure.
>For example, with sb 0 FREE arr_idx = 0 and sb 1 FREE arr_idx = 1 and
>narr = [1, 0]:
> - iter prio = 0 succeeds: xa_sb_free[0] is replaced from 0 to 1,
> rsrc[FREE][0] becomes +1, rsrc[FREE][1] becomes -1.
> - iter prio = 1 fails.
> - cleanup loop sees rsrc[FREE][0] == 1 and erases xa_sb_free[0].
>The rollback loop then calls xa_store(xa_sb_free, save[0] = 0, sb 0)
>against an empty slot, which does need a fresh xarray node under
>GFP_KERNEL. If that allocation fails, the only response is WARN(!!err, ...)
>and the function returns the original error.
>After such a failure, sb->arr_idx is restored but the corresponding
>xa_sb_used / xa_sb_free entry is gone. Would future MCAM allocations via
>npc_subbank_noref_alloc() then silently skip those subbanks? Could
>xa_reserve() before the destructive phase, or not pre-erasing entries in
>the cleanup loop, make rollback unconditionally safe?

Suppose there are only two subbnaks and both are free.

When first loop runs.

rsrc[FREE][0] = 1 - 1 = 0
rsrc[FREE][1] = 1 - 1 = 0

xa_erse check if any value is positive, in this case nothing is positive, nothing
will be removed. So rollback will be safe.

>> +
>> + npc_unlock_all_subbank();
>> + mutex_unlock(&mcam->lock);
>> +
>> + return rc;
>> +}
>Is the locking on the allocator side sufficient to make this setter
>atomic with respect to MCAM allocation?

Yes. allocation is done in mbox handler; which is protected by mcam->lock.

>Is the locking on the allocator side sufficient to make this setter
>atomic with respect to MCAM allocation?
>The setter takes mcam->lock plus all per-subbank locks and rewrites
>xa_sb_used / xa_sb_free, modifies every sb->arr_idx, and toggles
>restrict_valid.
>The allocation hot path
>rvu_mbox_handler_npc_mcam_alloc_entry()
> -> npc_mcam_alloc_entries()
> -> npc_cn20k_ref_idx_alloc()
> -> npc_subbank_noref_alloc()
>iterates xa_for_each(&npc_priv.xa_sb_used, ...) and
>xa_for_each(&npc_priv.xa_sb_free, ...) without mcam->lock held
>(npc_mcam_alloc_entries() takes mcam->lock only after
>npc_cn20k_ref_idx_alloc() returns) and without holding any per-subbank
>lock between iterations (the per-subbank lock is taken inside
>npc_subbank_alloc() / npc_subbank_suits() and released before the next
>iteration step).
>xa_for_each is RCU-safe so this is not a use-after-free, but a concurrent
>SET can re-key xarray entries while the allocator is mid-walk, causing
>the allocator to visit some subbanks twice and skip others entirely.
>restrict_valid is also a plain bool read by npc_subbank_restrict_usage()
>and npc_defrag_skip_restricted_sb() and a plain bool write by the setter,
>with no READ_ONCE / WRITE_ONCE.
>Should the allocator iteration also hold mcam->lock (or the relevant
>subbank locks) for the duration of the walk, and should restrict_valid
>use READ_ONCE / WRITE_ONCE?
mcam_allocation() is done under mutex_unlock(&mcam->lock); in mbox handler.
Mbox are handled serially by AF. So this case wont' happen.