Re: [PATCH v6 36/40] arm_mpam: Add workaround for T241-MPAM-1

From: James Morse

Date: Fri Mar 27 2026 - 11:52:58 EST

Hi Gavin,

On 24/03/2026 04:16, Gavin Shan wrote:
> On 3/14/26 12:46 AM, Ben Horgan wrote:
>> From: Shanker Donthineni <sdonthineni@xxxxxxxxxx>
>>
>> The MPAM bandwidth partitioning controls will not be correctly configured,
>> and hardware will retain default configuration register values, meaning
>> generally that bandwidth will remain unprovisioned.
>>
>> To address the issue, follow the below steps after updating the MBW_MIN
>> and/or MBW_MAX registers.
>>
>> - Perform 64b reads from all 12 bridge MPAM shadow registers at offsets
>>     (0x360048 + slice*0x10000 + partid*8). These registers are read-only.
>> - Continue iterating until all 12 shadow register values match in a loop.
>>     pr_warn_once if the values fail to match within the loop count 1000.
>> - Perform 64b writes with the value 0x0 to the two spare registers at
>>     offsets 0x1b0000 and 0x1c0000.
>>
>> In the hardware, writes to the MPAMCFG_MBW_MAX MPAMCFG_MBW_MIN registers
>> are transformed into broadcast writes to the 12 shadow registers. The
>> final two writes to the spare registers cause a final rank of downstream
>> micro-architectural MPAM registers to be updated from the shadow copies.
>> The intervening loop to read the 12 shadow registers helps avoid a race
>> condition where writes to the spare registers occur before all shadow
>> registers have been updated.
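
For readers following along, here is a minimal userspace sketch of the sequence
the commit message describes. It is not the code from the patch: the helper
names (t241_shadow_offset, t241_shadows_match, t241_scrub_shadow_regs) and the
macro names are hypothetical, and a plain array of 64-bit words stands in for
the ioremap()ed window. Only the offsets, the slice count of 12 and the loop
count of 1000 come from the text above. Writing the spare registers even when
the shadows never match is my reading of the commit message, so treat it as an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the commit message; the macro names are made up. */
#define T241_SLICES       12u
#define T241_SHADOW_BASE  0x360048u
#define T241_SLICE_STRIDE 0x10000u
#define T241_SPARE_0      0x1b0000u
#define T241_SPARE_1      0x1c0000u
#define T241_MAX_POLLS    1000

/* Hypothetical helper: byte offset of one slice's shadow register. */
static uint32_t t241_shadow_offset(unsigned int slice, unsigned int partid)
{
	assert(slice < T241_SLICES);
	return T241_SHADOW_BASE + slice * T241_SLICE_STRIDE + partid * 8u;
}

/* 'regs' models the mapped window, indexed in 64-bit words. */
static bool t241_shadows_match(const volatile uint64_t *regs,
			       unsigned int partid)
{
	uint64_t first = regs[t241_shadow_offset(0, partid) / 8];

	for (unsigned int s = 1; s < T241_SLICES; s++) {
		if (regs[t241_shadow_offset(s, partid) / 8] != first)
			return false;
	}
	return true;
}

/* Returns true if all 12 shadows agreed within the poll budget. */
static bool t241_scrub_shadow_regs(volatile uint64_t *regs,
				   unsigned int partid)
{
	bool matched = false;

	for (int i = 0; i < T241_MAX_POLLS && !matched; i++)
		matched = t241_shadows_match(regs, partid);

	/* A kernel implementation would pr_warn_once() here if !matched. */

	/* Assumption: the spare writes happen whether or not we matched. */
	regs[T241_SPARE_0 / 8] = 0;
	regs[T241_SPARE_1 / 8] = 0;

	return matched;
}
```

In the kernel the array accesses would be 64-bit MMIO accessors on the mapped
base, and the call would follow the MBW_MIN/MBW_MAX register writes.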

> One question below.
>
> Reviewed-by: Gavin Shan <gshan@xxxxxxxxxx>

>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index e66631f3f732..b1753498f07f 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -630,7 +640,45 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
>>       return ERR_PTR(-ENOENT);
>> }
>> +static int mpam_enable_quirk_nvidia_t241_1(struct mpam_msc *msc,
>> +                       const struct mpam_quirk *quirk)
>> +{
>> +    s32 soc_id = arm_smccc_get_soc_id_version();
>> +    struct resource *r;
>> +    phys_addr_t phys;
>> +
>> +    /*
>> +     * A mapping to a device other than the MSC is needed, check
>> +     * SOC_ID is NVIDIA T241 chip (036b:0241)
>> +     */
>> +    if (soc_id < 0 || soc_id != SMCCC_SOC_ID_T241)
>> +        return -EINVAL;
>> +
>> +    r = platform_get_resource(msc->pdev, IORESOURCE_MEM, 0);
>> +    if (!r)
>> +        return -EINVAL;
>> +
>> +    /* Find the internal registers base addr from the CHIP ID */
>> +    msc->t241_id = T241_CHIP_ID(r->start);
>> +    phys = FIELD_PREP(GENMASK_ULL(45, 44), msc->t241_id) | 0x19000000ULL;
>> +
>> +    t241_scratch_regs[msc->t241_id] = ioremap(phys, SZ_8M);
>> +    if (WARN_ON_ONCE(!t241_scratch_regs[msc->t241_id]))
>> +        return -EINVAL;
>
> Those IO regions aren't unmapped when the MSCs are removed. I guess it would be
> something to be improved? :-)

It's just leaking some VA space in the unlikely event the error interrupt goes off.
That is never expected to happen - all the errors indicate a software bug, so it's
not a case of being unlucky. (This assumes T241 supports the error interrupt!)

Adding some teardown would be just for this erratum; I expect it to be the only one
that needs to map some other device to poke at. I'm not sure it's worth it.

I'm also very nervous about changing this quirk as it's difficult for me to test!
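
For anyone checking the address arithmetic in the hunk above, this is a small
userspace sketch of what the FIELD_PREP(GENMASK_ULL(45, 44), ...) | 0x19000000ULL
expression computes. The function name t241_scratch_phys and the
GENMASK_ULL_SKETCH macro are hypothetical stand-ins for the kernel helpers, and
the chip ID is taken as a plain argument because T241_CHIP_ID() is not shown in
this hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l): bits l..h set. */
#define GENMASK_ULL_SKETCH(h, l) \
	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/*
 * Hypothetical helper: place the two-bit chip ID in bits 45:44 and OR in
 * the fixed 0x19000000 offset, as the quoted expression does.
 */
static uint64_t t241_scratch_phys(unsigned int chip_id)
{
	uint64_t mask = GENMASK_ULL_SKETCH(45, 44);

	assert(chip_id <= 3);	/* only two bits are available */
	return (((uint64_t)chip_id << 44) & mask) | 0x19000000ULL;
}
```

With this sketch, chip 0 gives 0x19000000 and chip 1 gives 0x100019000000, so
each chip gets its own 8M window to map.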

>> +
>> +    pr_info_once("Enabled workaround for NVIDIA T241 erratum T241-MPAM-1\n");
>> +
>> +    return 0;
>> +}
>> +
>> static const struct mpam_quirk mpam_quirks[] = {
>> +    {
>> +    /* NVIDIA t241 erratum T241-MPAM-1 */
>> +    .init       = mpam_enable_quirk_nvidia_t241_1,
>> +    .iidr       = MPAM_IIDR_NVIDIA_T241,
>> +    .iidr_mask = MPAM_IIDR_MATCH_ONE,
>> +    .workaround = T241_SCRUB_SHADOW_REGS,
>
> Perhaps we need a more leading space for every line in the above block.

Sure, done locally.

>> +    },
>>       { NULL } /* Sentinel */
>> };

Thanks,

James