Re: [PATCH v18 net-next 6/8] octeontx2-af: npc: Support for custom KPU profile from filesystem

From: Ratheesh Kannoth

Date: Wed Jun 03 2026 - 23:09:03 EST


On 2026-06-02 at 11:33:57, Ratheesh Kannoth (rkannoth@xxxxxxxxxxx) wrote:
> Flashing updated firmware on deployed devices is cumbersome. Provide a
> mechanism to load a custom KPU (Key Parse Unit) profile directly from
> the filesystem at module load time.
>
> When the rvu_af module is loaded with the kpu_profile parameter, the
> specified profile is read from /lib/firmware/kpu and programmed into
> the KPU registers. Add npc_kpu_profile_cam2 for the extended cam format
> used by filesystem-loaded profiles and support ptype/ptype_mask in
> npc_config_kpucam when profile->from_fs is set.
>
> Usage:
> 1. Copy the KPU profile file to /lib/firmware/kpu.
> 2. Build OCTEONTX2_AF as a module.
> 3. Load: insmod rvu_af.ko kpu_profile=<profile_name>
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@xxxxxxxxxxx>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260602060359.1894952-1-rkannoth%40marvell.com says

>> - for (idx = 0; idx < num_pkinds; idx++)
>> - npc_config_kpmaction(rvu, blkaddr, &rvu->kpu.ikpu[idx],
>> + /* Cn20k does not support Custom profile from filesystem */
>> + for (idx = 0; idx < num_pkinds; idx++) {
>> + act = npc_get_ikpu_nth_entry(rvu, idx);
>> + if (!act)
>> + continue;
>> +
>> + npc_config_kpmaction(rvu, blkaddr, act,
>> 0, idx, true);
>> + }
>Can act ever be NULL on this path? npc_get_ikpu_nth_entry() returns
>either &profile->ikpu2[n] (when from_fs is true) or &profile->ikpu[n],
>neither of which can be NULL. And as the comment above notes, cn20k
>does not take the from_fs path at all (it is gated by !is_cn20k() in
>npc_load_kpu_profile()), so ikpu always points at the static
>ikpu_action_entries[].
>If that is the case, the if (!act) continue check is dead.
I agree, but there is a plan to add it in future, so would like to keep this check

>> + if (profile->from_fs) {
>> + u8 ptype = kpucam2->ptype;
>> + u8 pmask = kpucam2->ptype_mask;
>> +
>> + *val |= FIELD_PREP(GENMASK_ULL(57, 56), ptype & pmask);
>> + *mask |= FIELD_PREP(GENMASK_ULL(57, 56), ~ptype & pmask);
>> + }
>The cast const struct npc_kpu_profile_cam2 *kpucam2 = (void *)kpucam is
>unconditional, but the function's parameter is typed as
>struct npc_kpu_profile_cam *. The two structures differ in size
>(12 vs 14 bytes packed).
>The reads of kpucam2->ptype and kpucam2->ptype_mask are gated by
>profile->from_fs, but the gate relies on a cross-function invariant:
>the from_fs flag must imply that npc_get_kpu_cam_nth_entry() was used
>and returned a cam2-typed element. There is no static assertion or
>runtime check tying these two together.
>Could a BUILD_BUG_ON ensuring
>offsetof(struct npc_kpu_profile_cam2, ptype) == sizeof(struct npc_kpu_profile_cam)
>be added so that a future layout change is caught at compile time?
>The bits that ptype/ptype_mask are written into are bits 57:56 of cam0
>and cam1, but struct npc_kpu_cam in npc.h still declares those bits as
>part of rsvd_63_56 : 8 in both endianness branches. Should the cam
>struct be updated with an explicit ptype : 2 / rsvd_63_58 : 6 split (or
>similar) so the contract that those two reserved bits are now live is
>visible at the struct definition?
>Is there also a way to assert that the silicon variants on which the
>filesystem profile path runs (cn9k, cn10k) actually decode bits 57:56
>as ptype-match?

The rsvd_63_56 field in struct npc_kpu_cam remains intentionally generic because this shared structure header
is consumed across multiple legacy silicon generations where those bits are strictly reserved.
Modifying the base bitfield definition would break backward compatibility metrics for older targets.
The localized use of FIELD_PREP(GENMASK_ULL(57, 56), ...) inside the from_fs block serves as the explicit.

>> + return -ENOMEM;
>> +
>> + kpu->action2 = devm_kcalloc(rvu->dev, num_entries,
>> + sizeof(*kpu->action2), GFP_KERNEL);
>> + if (!kpu->action2)
>> + return -ENOMEM;
>> +
>> + return 0;
>> +}
>Could this function be storing per-device devm allocations in
>module-global static state?
>adapter->kpu is set by npc_prepare_default_kpu() to npc_kpu_profiles
>from npc_profile.h, which is a static module-global array.
>kpu->cam2 and kpu->action2 are devm_kcalloc(rvu->dev, ...) so the
>underlying memory is freed at device release, but the pointers (and
>the cam_entries2/action_entries2 counters set in
>npc_apply_custom_kpu_from_fs()) remain in the static array.
>After unbind, would a subsequent rebind that uses the default profile
>or the from_fw path leave dangling pointers and non-zero
There is only one AF device per system.

> +static int npc_apply_custom_kpu_from_fw(struct rvu *rvu,
>> + struct npc_kpu_profile_adapter *profile)
>> {
>> size_t hdr_sz = sizeof(struct npc_kpu_profile_fwdata), offset = 0;
>> + const struct npc_kpu_profile_fwdata *fw;
>> struct npc_kpu_profile_action *action;
>> - struct npc_kpu_profile_fwdata *fw;
>> struct npc_kpu_profile_cam *cam;
>> struct npc_kpu_fwdata *fw_kpu;
>> - int entries;
>> - u16 kpu, entry;
>> + int entries, entry, kpu;
>>
>> - if (is_cn20k(rvu->pdev))
>> - return npc_cn20k_apply_custom_kpu(rvu, profile);
>> + fw = rvu->kpu_fwdata;
>> +
>> + for (kpu = 0; kpu < fw->kpus; kpu++) {
>> + if (rvu->kpu_fwdata_sz < hdr_sz + offset) {
>> + dev_warn(rvu->dev,
>> + "Profile size mismatch on KPU%i parsing\n",
>> + kpu + 1);
>> + return -EINVAL;
>> + }
>> +
>> + fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
>> + if (fw_kpu->entries < 0) {
>Is the size check above missing sizeof(*fw_kpu)? The check guards
>that offset is reachable, but the next statement reads fw_kpu->entries
>which is at fw->data + offset + 0..3. When kpu_fwdata_sz equals
>hdr_sz + offset exactly, the read of fw_kpu->entries is past the end
>of the firmware buffer.
This is a prexisintg issue in the code, will fix it as part of patch to "net" tree
once this series is merged.

>>
>> fw = rvu->kpu_fwdata;
>>
>> + /* Binary blob contains ikpu actions entries at start of data[0] */
>> + profile->ikpu2 = devm_kcalloc(rvu->dev, 1,
>> + sizeof(ikpu_action_entries),
>> + GFP_KERNEL);
>> + if (!profile->ikpu2)
>> + return -ENOMEM;
>> +
>> + action = (struct npc_kpu_profile_action *)(fw->data + offset);
>> +
>> + if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))
>> + return -EINVAL;
>Should the size validation happen before the devm allocation? When
>kpu_fwdata_sz is too small the function returns -EINVAL with
>profile->ikpu2 already allocated and stored. Devm reclaims at device
>release, but on every malformed-firmware probe a small buffer is left
>orphaned and reachable through profile->ikpu2 (with from_fs reset to
>false on the fallback path, so it is not currently dereferenced).

There is only one AF device per system and this loading of profiles is done during driver
init() time.

>> +
>> + /* The firmware layout does dependent on the internal size of
>> + * ikpu_action_entries.
>> + */
>> + memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries));
>> + offset += sizeof(ikpu_action_entries);
>Is the firmware ABI here tied to a kernel-internal sizeof?
>ikpu_action_entries[] is a static array in npc_profile.h. The
>filesystem blob has no field declaring how many ikpu entries it
>contains, so the offset advance depends on the kernel build. If a
>future kernel adds a pkind to ikpu_action_entries, every existing
>filesystem blob will be silently misparsed (offset advances by the
>wrong amount, then cam entries are read from the wrong location)
>even though the major version check still passes.
We have support only for little endian. And ikpu_action_entries structure does not have a fieled
to hold "size" currently. we cant add it now as it would break backward compatability.

>> return npc_cn20k_apply_custom_kpu(rvu, profile);
>>
>> if (rvu->kpu_fwdata_sz < hdr_sz) {
>> dev_warn(rvu->dev, "Invalid KPU profile size\n");
>> return -EINVAL;
>> }
>> +
>> + fw = rvu->kpu_fwdata;
>> if (le64_to_cpu(fw->signature) != KPU_SIGN) {
>> dev_warn(rvu->dev, "Invalid KPU profile signature %llx\n",
>> fw->signature);
>The new filesystem-loaded blob and the existing fwdb blob share the
>same KPU_SIGN signature and the same struct npc_kpu_profile_fwdata
>header, but data[] has a different binary layout: the filesystem
>format prepends a sizeof(ikpu_action_entries) block of ikpu actions
>and uses npc_kpu_profile_cam2 (16 bytes packed) instead of
>npc_kpu_profile_cam (14 bytes packed).
>Dispatch is decided entirely by the loader path:
> return from_fs ? npc_apply_custom_kpu_from_fs(rvu, profile) :
> npc_apply_custom_kpu_from_fw(rvu, profile);
>with nothing in the file itself indicating which layout applies.
>If a fwdb-formatted blob is dropped into /lib/firmware/kpu/<name>,
>signature and version checks pass, then the parser silently
>misinterprets the bytes, programming arbitrary values plus enabled
>ptype-match bits into the KPU CAM and PKIND ACTION CSRs.
>Should the on-disk header carry a format/cam_format_version field, or
>should the cam2 layout use a distinct signature?
This is intentional as per customer requirement.

>> +
>> profile->custom = 1;
>> - profile->name = fw->name;
>> + profile->name = sfw->name;
>> profile->version = le64_to_cpu(fw->version);
>> - profile->mcam_kex_prfl.mkex = &fw->mkex;
>> - profile->lt_def = &fw->lt_def;
>If npc_load_kpu_profile_from_fs() runs first and fails, then
>npc_load_kpu_profile_from_fw() retries via npc_apply_custom_kpu()
>again, a second sfw is allocated and overwrites profile->name,
>profile->mcam_kex_prfl.mkex and profile->lt_def. The first sfw is
>still tracked by devm but no longer referenced.
>Is one orphaned sfw per probe acceptable, or should the helper reuse
>the existing sfw on retry?
There is only one AF device per system

>> + release_firmware(fw);
>> + rvu->kpu_fwdata = NULL;
>> +
>> + if (ret) {
>> + rvu->kpu_fwdata_sz = 0;
>> + dev_err(rvu->dev,
>> + "Loading KPU profile from filesystem failed\n");
>> + return ret;
>> + }
>On the success path, kpu_fwdata is cleared but kpu_fwdata_sz is left
>non-zero. On the failure path both are cleared. Is the asymmetry
>intentional?
Yes.

>
>> for (idx = 0; idx < num_kpus; idx++)
>> npc_program_kpu_profile(rvu, blkaddr, idx, &rvu->kpu.kpu[idx]);
>> +
>> + if (profile->from_fs) {
>> + rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(54), 0x03);
>> + rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(58), 0x03);
>> + }
>> }
>A few questions about these unconditional writes:
>PKIND indices 54 and 58 are hardcoded. Other sites in the file clamp
>via min_t(int, hw->npc_pkinds, num_pkinds) before programming PKIND
>CSRs. Is there a guarantee that hw->npc_pkinds is greater than 58 on
>all silicon variants that take this path?
>The value 0x03 has no comment explaining what it programs. Could a
>brief comment or named constant be added?
>These writes are profile-specific (they encode knowledge of two
>particular PKINDs the customer's profile uses) but apply to any
>filesystem-loaded blob. Should these settings be expressed in the
>firmware blob itself rather than hardcoded in the kernel?
>The commit message describes the change as programming the profile
>read from the file and does not mention these register writes.
This very much hw specific and intentional.