Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
From: Ben Horgan
Date: Fri Jun 05 2026 - 12:45:05 EST
Hi Reinette,
On 6/5/26 16:39, Reinette Chatre wrote:
> Hi Ben,
>
> On 6/5/26 7:53 AM, Ben Horgan wrote:
>> On 6/4/26 18:43, Reinette Chatre wrote:
>>> On 6/3/26 8:15 AM, Ben Horgan wrote:
>>>> On 5/29/26 19:06, Reinette Chatre wrote:
>
> ...
>
>>>
>>>> I plumbed in support for the MB_MIN resource schema which also works under light
>>>> testing. The only fs resctrl code change I needed was:
>>>>
>>>> --- a/include/linux/resctrl.h
>>>> +++ b/include/linux/resctrl.h
>>>> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct resctrl_ctrl *ctrl)
>>>>  	case RESCTRL_CTRL_BITMAP:
>>>>  		return BIT_MASK(ctrl->cache.cbm_len) - 1;
>>>>  	case RESCTRL_CTRL_SCALAR:
>>>> +		if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
>>>> +			return ctrl->membw.min_bw;
>>>> +
>>>>  		return ctrl->membw.max_bw;
>>>>  	}
>>>>
>>>>
>>>> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
>>>> as the maximum bandwidth controls only take effect if their value is higher than
>>>> the minimum bandwidth value. I have specialised this on the ctrl->name which
>>>> breaks your ctrl->type based classification but that's fixable by just adding a
>>>> default field to membw.
>>>
>>> This I am not sure about. In my understanding a typical "default" value means
>>> "no throttling" and, at least on Intel, this default hardware state has been
>>> summarized as "min" == "max" == "optimal".
>>
>> Ok, this sounds odd to me but that is probably because I don't know what Intel
>> systems do. On MPAM systems a MIN control is a boost rather than a throttling
>> control. Although, you can always think of that as throttling the traffic with
>> the other PARTIDs.
>>
>>>
>>> Are you saying that on MPAM systems if "min" == "max" then max bandwidth controls
>>> do not take effect? Could you please elaborate what happens if "min" == "max"?
>>
>> Table 5-4 from section 5.2.8 of IHI0099B.b shows the interaction between the
>> minimum and maximum controls.
>>
>> If used bandwidth is    The preference is   Description
>>
>> Below the minimum       High                Only high requests compete with
>>                                             this request.
>>
>> Above the minimum:      Medium              High requests are serviced first
>> Below the maximum                           then this request competes with
>>                                             other medium requests.
>>
>> Above the maximum,      Low                 Requests are not serviced if any
>> when HARDLIM is 0                           high or medium requests are
>>                                             available.
>>
>> Above the maximum,      None                Requests are not serviced
>> when HARDLIM is 1
>>
>> So if we keep the minimum and the maximum control values always the same then
>> all traffic will be given "high" preference until the target bandwidth is
>> reached. For some MPAM systems it is recommended to set the minimum value as 5%
>> less than the maximum value to get a reliable target bandwidth. As 5% seems
>> implementation specific and some systems don't have min controls it seemed
>> better to just match the MB control with a maximum bandwidth control and let the
>> user have freedom to choose the minimum bandwidth control when MB_MIN support is
>> added.
>>
>> If the default for the minimum is the maximum possible bandwidth (100%), then
>> changing the maximum won't have any effect, as it's never above the minimum
>> (if that's unchanged), and so all traffic is high preference. I now see
>> from your reply below that you are planning on not allowing this kind of
>> configuration.
>>
>> If the minimum always tracks the maximum then we lose the distinction between
>> medium and high preference traffic and so to reserve some high preference
>> bandwidth for one control group we'd have to change the configuration in the
>> other control groups so that their bandwidth preference is medium (minimum
>> value at 0).
>
> I do not think we are talking about the same thing here. I am *not* saying
> that minimum and maximum controls should always be the same.
>
> The discussion is about a proposed change to resctrl_get_default_ctrlval(). resctrl
> uses this function in two places:
> - When creating a new resource group:
> The intention here is that when user space creates a new resource group it should
> be created with maximum allocations possible. For MBA this means "unthrottled".
I would contend that, for minimum controls, a policy of 'maximum allocation
possible' isn't a useful default. I try to explain a bit more below.
> After creating the resource group user space can adjust allocations to match
> workload requirements.
> - When unmounting the resctrl fs.
> The intention here is that all controls are set to unthrottled to stop any possible
> impact to system when user space stops using resctrl.
>
> resctrl_get_default_ctrlval() is thus intended to support an unthrottled baseline from
> where user space can make configuration changes as supported by hardware and required
> by workloads.
The baseline that I think makes the most sense for a minimum control is a
default of 0. This just means that there is no "guaranteed"/high preference
bandwidth reserved for the control group. I would say this is still unthrottled,
just not giving a boost. With this default the user can use MB (backed by max
bandwidth) without having to know about MB_MIN (keeping it constant). If the
default is 100% for min bandwidth then the user needs to know to set MB_MIN to
be able to use MB. Correspondingly, having a default of 100% for max bandwidth
means a user can change MB_MIN and see guaranteed bandwidth effects without
having to know about MB/MB_MAX.
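
To make the interaction concrete, here is a rough userspace model of the
preference table quoted above. It is only a sketch, not MPAM driver or resctrl
code: bw_preference(), the PREF_* names and the DEFAULT_* macros are all made up
for illustration, values are plain percentages, and the treatment of usage
exactly equal to a limit is my assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Preference levels, named after the rows of the quoted Table 5-4. */
enum bw_pref { PREF_HIGH, PREF_MEDIUM, PREF_LOW, PREF_NONE };

/* Hypothetical defaults under discussion, in percent of total bandwidth. */
#define DEFAULT_MIN_BW	0u
#define DEFAULT_MAX_BW	100u

/*
 * Hypothetical model only: classify a request from the bandwidth currently
 * used by its control group and the group's min/max control values.
 * Usage exactly at a limit is treated as "above" it (an assumption).
 */
static inline enum bw_pref bw_preference(unsigned int used,
					  unsigned int min_bw,
					  unsigned int max_bw, bool hardlim)
{
	if (used < min_bw)
		return PREF_HIGH;	/* below the minimum */
	if (used < max_bw)
		return PREF_MEDIUM;	/* above the minimum, below the maximum */

	/* above the maximum: HARDLIM decides between "low" and "none" */
	return hardlim ? PREF_NONE : PREF_LOW;
}
```

In this model, with the minimum left at 0, lowering the maximum from 100 to 50
moves a group using 60% from medium to low preference. With the minimum at 100
the same change does nothing, as the group stays at high preference.
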
Does this make sense?
>
> I see that the MPAM driver internally uses resctrl_get_default_ctrlval() in a couple
> of places and I am not considering this usage here. If internally MPAM has other
> usages for this function where it does not mean "unthrottled" then perhaps
> it would be better to create a new function that matches the usage?
I don't think the internal usage makes a difference here.
One process question, so that I know how to structure my patches: in the series
you have a few patches which touch all architectures; these have the prefix
mpam,x86,fs/resctrl. Is this how you would like cross-architecture patches to
look, or is it just for convenience in the RFC and a patch per architecture is
preferable?
Thanks,
Ben