Re: [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem

From: Ratheesh Kannoth

Date: Sun Jun 07 2026 - 22:33:12 EST


On 2026-06-05 at 12:02:43, Ratheesh Kannoth (rkannoth@xxxxxxxxxxx) wrote:
> Flashing updated firmware on deployed devices is cumbersome. Provide a
> mechanism to load a custom KPU (Key Parse Unit) profile directly from
> the filesystem at module load time.
>
> When the rvu_af module is loaded with the kpu_profile parameter, the
> specified profile is read from /lib/firmware/kpu and programmed into
> the KPU registers. Add npc_kpu_profile_cam2 for the extended cam format
> used by filesystem-loaded profiles and support ptype/ptype_mask in
> npc_config_kpucam when profile->from_fs is set.
>
> Usage:
> 1. Copy the KPU profile file to /lib/firmware/kpu.
> 2. Build OCTEONTX2_AF as a module.
> 3. Load: insmod rvu_af.ko kpu_profile=<profile_name>
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@xxxxxxxxxxx>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>> + struct npc_kpu_profile_action *action2;
>> + struct npc_kpu_profile_cam2 *cam2;
>> };
>The on-disk difference between npc_kpu_profile_cam and
>npc_kpu_profile_cam2 is two extra trailing bytes (ptype, ptype_mask).
>Could a single normalized in-memory representation work, with the FS
>loader filling ptype/ptype_mask to zero when reading the legacy layout?
>Every consumer now has to dispatch through accessor helpers
>(npc_get_kpu_cam_nth_entry(), npc_get_num_kpu_cam_entries(),
>npc_get_ikpu_nth_entry(), ...) gated on profile->from_fs, and any new
>consumer that forgets the accessor reads the wrong table silently.
>The names cam2 / action2 / ikpu2 also do not carry semantic meaning on
>their own — would something tied to the format (for example
>cam_with_ptype) read more clearly?
>The from_fs flag also conflates two orthogonal concerns: where the bytes
>came from, and which on-disk schema they use. Is it intentional that the
>new extended format cannot be shipped through the FW-DB path, and that a
>legacy-format profile cannot be loaded from /lib/firmware/kpu/?
This is a new feature request for customer, prior to this we did not have support
for filesystem loading

> + u8 ptype = kpucam2->ptype;
>> + u8 pmask = kpucam2->ptype_mask;
>> +
>> + *val |= FIELD_PREP(GENMASK_ULL(57, 56), ptype & pmask);
>> + *mask |= FIELD_PREP(GENMASK_ULL(57, 56), ~ptype & pmask);
>> + }
>The encoding writes ptype into bits [57:56] of the 64-bit CAM word, but
>struct npc_kpu_cam declares those bits as a reserved 8-bit field:
> struct npc_kpu_cam {
> #if defined(__BIG_ENDIAN_BITFIELD)
> u64 rsvd_63_56 : 8;
> u64 state : 8;
> ...
>Should the bitfield in npc_kpu_cam be extended with a named ptype member
>so the encoding is visible in the struct layout (and checkable by sparse)?
>ptype and ptype_mask in npc_kpu_profile_cam2 are u8 (8-bit), but only the
>low 2 bits land in the CAM word via GENMASK_ULL(57, 56); any user passing
>ptype with bits above bit 1 silently has those bits dropped. Is that
>intentional, and if so could the field width be enforced (range-checked
>on load) to surface bad firmware data?
>There is also no #define for the bit positions or for the constant 0x03
>that npc_parser_profile_init() pairs with NPC_AF_PKINDX_TYPE later in
>this patch — a symbolic name would make the relationship between the
>two writes clear.
This is customer specific and all these are intentional.

>that npc_parser_profile_init() pairs with NPC_AF_PKINDX_TYPE later in
>this patch — a symbolic name would make the relationship between the
>two writes clear.
>[ ... ]
>> +static int npc_apply_custom_kpu_from_fs(struct rvu *rvu,
>> + struct npc_kpu_profile_adapter *profile)
>> +{
>> + size_t hdr_sz = sizeof(struct npc_kpu_profile_fwdata), offset = 0;
>> + const struct npc_kpu_profile_fwdata *fw;
>> + struct npc_kpu_profile_action *action;
>> + struct npc_kpu_profile_cam2 *cam2;
>> + struct npc_kpu_fwdata *fw_kpu;
>> + int entries, ret, entry, kpu;
>>
>> fw = rvu->kpu_fwdata;
>>
>> + /* Binary blob contains ikpu actions entries at start of data[0] */
>> + profile->ikpu2 = devm_kcalloc(rvu->dev, 1,
>> + sizeof(ikpu_action_entries),
>> + GFP_KERNEL);
>> + if (!profile->ikpu2)
>> + return -ENOMEM;
>> +
>> + action = (struct npc_kpu_profile_action *)(fw->data + offset);
>> +
>> + if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))
>> + return -EINVAL;
>Does the order here leave profile->ikpu2 charged to the device with
>uninitialized contents on the -EINVAL path?
>The devm_kcalloc() runs before the size check, and on error
>profile->from_fs is not set, so npc_get_ikpu_nth_entry() will not
>dereference ikpu2 today — but any future caller that consults
>profile->ikpu2 directly would see uninitialized memory at the right size.
>Could the validation move ahead of the allocation, with memcpy following
>both?

>
>> + /* The firmware layout does dependent on the internal size of
>> + * ikpu_action_entries.
>> + */
>> + memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries));
>> + offset += sizeof(ikpu_action_entries);
>The fwdata header declares the KPU count via fw->kpus but does not
>declare an ikpu/pkind count for this leading region — its size is
>implicitly hardcoded to ARRAY_SIZE(ikpu_action_entries), an in-tree
>kernel array. The comment acknowledges this hidden ABI dependency.
>If ikpu_action_entries ever grows or shrinks in-tree, every previously
>distributed FS blob silently misaligns: the kernel reads the wrong
>number of bytes for ikpu actions, and all subsequent KPU CAM/action
>offsets are off. The size check
>"if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))" only
>ensures there are enough bytes — it does not ensure the bytes actually
>represent ikpu actions of that exact count.
>Could the on-disk format carry an explicit length / pkind count for the
>ikpu region so the kernel can validate and reject mismatched blobs?
>
if npc_apply_custom_kpu_from_fs() fails with above mentioned -EINVAL,
npc_prepare_default_kpu(rvu, profile) again intialize profile->ikpu2 to
static memory in npc_load_kpu_profile() function.

>profile->ikpu2 directly would see uninitialized memory at the right size.
>Could the validation move ahead of the allocation, with memcpy following
>both?
>> + /* The firmware layout does dependent on the internal size of
>> + * ikpu_action_entries.
>> + */
>> + memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries));
>> + offset += sizeof(ikpu_action_entries);
>The fwdata header declares the KPU count via fw->kpus but does not
>declare an ikpu/pkind count for this leading region — its size is
>implicitly hardcoded to ARRAY_SIZE(ikpu_action_entries), an in-tree
>kernel array. The comment acknowledges this hidden ABI dependency.
>If ikpu_action_entries ever grows or shrinks in-tree, every previously
>distributed FS blob silently misaligns: the kernel reads the wrong
>number of bytes for ikpu actions, and all subsequent KPU CAM/action
>offsets are off. The size check
>"if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))" only
>ensures there are enough bytes — it does not ensure the bytes actually
>represent ikpu actions of that exact count.
>Could the on-disk format carry an explicit length / pkind count for the
>ikpu region so the kernel can validate and reject mismatched blobs?

There is no field (currently) in structure to indicate number of ikpu_action_entries
in binary blob. So we keep the number same in all binary blob. We can't add the size
as it will break backward compatability.

> + for (kpu = 0; kpu < fw->kpus; kpu++) {
>[ ... ]
>> + fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
>> + if (fw_kpu->entries <= 0) {
>> + dev_warn(rvu->dev,
>> + "Invalid kpu entries on KPU%d\n", kpu);
>> + return -EINVAL;
>> + }
>> +
>> + entries = min_t(int, fw_kpu->entries, rvu->hw->npc_kpu_entries);
>[ ... ]
>> + cam2 = (struct npc_kpu_profile_cam2 *)fw_kpu->data;
>> + offset += sizeof(*fw_kpu) + fw_kpu->entries * sizeof(*cam2);
>> + action = (struct npc_kpu_profile_action *)(fw->data + offset);
>> + offset += fw_kpu->entries * sizeof(*action);
>> + if (rvu->kpu_fwdata_sz < hdr_sz + offset) {
>Could fw_kpu->entries get an upper bound check before being used in the
>offset arithmetic? The FW-DB sibling npc_apply_custom_kpu_from_fw()
>clamps via min_t(int, fw_kpu->entries, KPU_MAX_CST_ENT) before doing the
>arithmetic, while this path only clamps after. On a 64-bit size_t the
>post-arithmetic check correctly catches malformed blobs, but defensive
>symmetry with the FW-DB path would be nice.
OK. will add to net-next tree after this series is merged.

>[ ... ]
>> + fw = rvu->kpu_fwdata;
>> if (le64_to_cpu(fw->signature) != KPU_SIGN) {
>> dev_warn(rvu->dev, "Invalid KPU profile signature %llx\n",
>> fw->signature);
>Two distinct on-disk binary formats now share the same KPU_SIGN
>("kpuprof\0") signature and the same npc_kpu_profile_fwdata header.
>The FW-DB layout expects fwdata->data[0] to start directly with KPU
>entries; the new FS layout expects data[0] to start with
>sizeof(ikpu_action_entries) bytes of ikpu actions, then KPU entries
>encoded as npc_kpu_profile_cam2 (with ptype/ptype_mask) instead of
>npc_kpu_profile_cam.
>Nothing in npc_apply_custom_kpu() distinguishes the two; the choice is
>made by which load path is invoked (the from_fs argument). Is that a
>deliberate decision?
Yes. from_fs binary blob has npc_kpu_profile_cam2 format.

>If a user accidentally drops a FW-DB-format blob into /lib/firmware/kpu,
>the signature passes, npc_apply_custom_kpu_from_fs() memcpys the first
>sizeof(ikpu_action_entries) bytes (which are KPU CAM/action data in that
>layout) into profile->ikpu2, and npc_parser_profile_init() then programs
>those bytes into NPC pkind action registers via
>npc_config_kpuaction(rvu, blkaddr, npc_get_ikpu_nth_entry(rvu, idx),
>0, idx, true). The remaining KPU entries are read with the wrong layout
>(npc_kpu_profile_cam2 vs npc_kpu_profile_cam) on top.
>Could the blob carry a layout magic or version field independent of
>load source, and the loader reject mismatches?
User has to load correct FW to get requested feature. We will suggest customer
your suggestion to have different signatures.

> + *fw_kpus = fw->kpus;
>The cn20k early return at the top of this function returns without
>writing *fw_kpus. Should that path also set *fw_kpus, or should the
>parameter be removed from cn20k call sites?
>Today npc_load_kpu_profile_from_fw() declares int fw_kpus = 0 and never
>reads it back, so the dead local hides the asymmetry. If cn20k is ever
>wired into the FS path the trailing
>"rvu->kpu.kpus = fw_kpus; profile->kpus = fw_kpus;" in
>npc_load_kpu_profile_from_fs() will silently zero kpus and disable KPU
>programming because *fw_kpus was never written.

Currently only cn10k has support for custom binary support from filesystem.

> + int ret, fw_kpus = 0;
>> + char path[512] = "kpu/";
>> +
>> + if (strlen(kpu_profile) > sizeof(path) - strlen("kpu/") - 1) {
>> + dev_err(rvu->dev, "kpu profile name is too big\n");
>> + return -ENOSPC;
>> + }
>> +
>> + strcat(path, kpu_profile);
>rvu->kpu_pfl_name is char[KPU_NAME_LEN] (32 bytes), so strlen() of that
>buffer can never exceed sizeof(path) - 5 = 507. Is the strlen() check
>intended as defense-in-depth, or is a smaller path[KPU_NAME_LEN +
>sizeof("kpu/")] buffer (or snprintf(path, sizeof(path), "kpu/%.32s",
>kpu_profile)) preferable so the bound and intent line up?
We will add it in defensive pathset to net-next following this patchseries
is merged.

>> + rvu->kpu.kpus = fw_kpus;
>> + profile->kpus = fw_kpus;
>> + profile->from_fs = true;
>> + return 0;
>> +}
>The success path leaves rvu->kpu_fwdata_sz non-zero while
>rvu->kpu_fwdata has been set to NULL. The error path correctly zeroes
>both.
>npc_load_mkex_profile() and npc_cn20k_load_mkex_profile() consult
>kpu_fwdata_sz to decide whether to skip the FW-DB mkex lookup, so the
>asymmetry is harmless today, but any future consumer that uses
>kpu_fwdata_sz to gate a kpu_fwdata dereference will hit a NULL deref.
>Could kpu_fwdata_sz be cleared symmetrically on success?
This is intentional. We dont need to keep this symmetry.

>> +void npc_load_kpu_profile(struct rvu *rvu)
>> +{
>[ ... ]
>> + /* Filesystem-based KPU loading is not supported on cn20k.
>> + * npc_prepare_default_kpu() was invoked earlier, but control
>> + * reached this point because the default profile was not selected.
>> + * No need to call it again.
>> + */
>> + if (!is_cn20k(rvu->pdev)) {
>> + if (!npc_load_kpu_profile_from_fs(rvu))
>> + return;
>> + }
>> +
>> + /* First prepare default KPU, then we'll customize top entries. */
>> + npc_prepare_default_kpu(rvu, profile);
>> + if (!npc_load_kpu_profile_from_fw(rvu))
>> + return;
>>
>> npc_prepare_default_kpu(rvu, profile);
>> }
>The comment says "No need to call it again" but the very next executable
>line after the if (!is_cn20k(...)) block is exactly
>npc_prepare_default_kpu(rvu, profile). The second call is in fact needed
>on non-cn20k to reset state corrupted by a partial FS attempt before the
>FW path runs. Could the comment be reworded to reflect that?
>
This indicate that npc_prepare_default_kpu() called above, so we dont have to callit before
calling npc_load_kpu_profile_from_fs() in cn10k.

>[ ... ]
>> for (idx = 0; idx < num_kpus; idx++)
>> npc_program_kpu_profile(rvu, blkaddr, idx, &rvu->kpu.kpu[idx]);
>> +
>> + if (profile->from_fs) {
>> + rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(54), 0x03);
>> + rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(58), 0x03);
>> + }
>> }
>A few questions about these two writes:
>The pkind indices 54 and 58 match npc.h enum values
>NPC_RX_CPT_HDR_PTP_PKIND = 54 and NPC_RX_CPT_HDR_PKIND = 58 — could
>those named constants be used here so the relationship is explicit?
>The constant 0x03 has no symbolic name in rvu_reg.h, and there is no
>#define for the bits in NPC_AF_PKINDX_TYPE. Could a named macro be added
>that documents what value is being programmed?
>The same function carefully bounds pkind iteration earlier with
>"num_pkinds = min_t(int, hw->npc_pkinds, num_pkinds);" but these two
>writes target indices 54 and 58 unconditionally. hw->npc_pkinds comes
>from NPC_AF_CONST1[19:12] and is silicon-dependent — should these be
>gated on hw->npc_pkinds >= 59 (or on the corresponding pkind being
>defined) to avoid writing undefined CSRs on a future variant?
>Finally, is it intentional that this hardware register configuration is
>chosen based on the load mechanism (from_fs) rather than profile
>content? Any future FS profile that does not expect pkinds 54 and 58 to
>be configured this way would silently misbehave. Could the writes be
>driven from the profile data itself?

This is very customer specific and intentional. I did not find any speific
names for these values as there are many values in npc_profie.h as well.