Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Babu Moger
Date: Wed Apr 08 2026 - 16:46:14 EST
Hi Reinette,
On 4/7/26 23:45, Reinette Chatre wrote:
Hi Babu,
On 4/7/26 6:01 PM, Babu Moger wrote:
Hi Reinette,
On 4/7/26 12:48, Reinette Chatre wrote:
Hi Babu,
On 4/6/26 3:45 PM, Babu Moger wrote:
Hi Reinette,
Sorry for the late response. I was trying to get confirmation about the use case.
No problem. I appreciate that you did this so that we can make sure resctrl supports
needed use cases.
On 3/31/26 17:24, Reinette Chatre wrote:
On 3/30/26 11:46 AM, Babu Moger wrote:
On 3/27/26 17:11, Reinette Chatre wrote:
On 3/26/26 10:12 AM, Babu Moger wrote:
On 3/24/26 17:51, Reinette Chatre wrote:
On 3/12/26 1:36 PM, Babu Moger wrote:
can have domains that span different CPUs. There thus seems to be a built-in assumption of what a "domain"
means for PQR_PLZA_ASSOC so it sounds to me as though, instead of saying that "PQR_PLZA_ASSOC needs
to be the same in QoS domain" it may be more accurate to, for example, say that "PQR_PLZA_ASSOC has L3 scope"?
Yes.
Above is about L3 scope ...
Yes. The scope for PQR_PLZA_ASSOC is L3.
Is that what you are asking here?
I was trying to point out that there appears to be a mismatch between the actual scope and
the planned implementation. As highlighted below during the discussion about "global" this is
fine with me and I just wanted to confirm that this matches your intentions.
Ack.
This seems to be what this implementation does since it hardcodes PQR_PLZA_ASSOC scope to the L3
resource but that creates dependency to the L3 resource that would make PLZA unusable if, for example,
the user boots with "rdt=!l3cat" while wanting to use PLZA to manage MBA allocations when in kernel?
Yes. That is correct. It should not be attached to one resource. We need to change it to global scope.
Can I interpret "global scope" as "all online CPUs"? Doing so will simplify
Yes. That is correct.
supporting this feature. It does not sound practical for a user wanting to assign
different resource groups to kernel work done in different domains ... the guidance should
instead be to just set the allocations of one resource group to what is needed in the different
domains? There may be more flexibility when supporting per-domain RMIDs, but so far
it sounds as though the focus is global. We can consider what needs to be done to support
some type of "per-domain" assignment as an exercise in whether the current interface could support it
in the future.
Yes. Makes sense.
...
The PLZA MSR is updated when user changes the association to the
file. No context switch code changes are needed. This will be
dedicated group. The current resctrl group files, "cpus, cpus_list
Why does this have to be a dedicated group? One of the conclusions from v1
discussion was that the "PLZA group" need *not* be a dedicated group. I repeated that
in my earlier response that I left quoted above. You did not respond to these
conclusions and statements in this regard while you keep coming back to this
needing to be a dedicated group without providing a motivation to do so.
Could you please elaborate why a dedicated group is required?
If the same group applies identical limits to both user and kernel
space, it essentially behaves like a current resctrl group. In that
sense, it’s not really a PLZA group. PLZA’s key value is the ability
to separate allocations between user space and kernel space. A
The plan has never been to force identical allocations for user and kernel
space since that would go against this feature entirely. Even so, just as
user and kernel space cannot be forced to have identical allocations they
also cannot be forced to have different allocations. Specifically,
a task *can* use the same CLOSID for user and kernel space work just as easily
as it can use *different* CLOSID for user and kernel space work. There
should not be any CLOSID reserved just for kernel work. Or am I missing something?
No. You are not missing anything.
single CPU can belong to two groups: one group manages the user-
space allocation for that CPU, while another manages the kernel-mode
allocation.
Exactly. This is why it is important to have two files for this CPU association
within a resource group. The cpus/cpus_list file continues to be used as today
while the new kernel_mode_cpus/kernel_mode_cpus_list is used for kernel work.
With this a task can be associated with any resource group for its user space
allocations but when it runs on one of the CPUs within kernel_mode_cpus then
its kernel work will be done with allocations of the resource group the
kernel_mode_cpus file belongs to, which may or may not be the same
resource group that the user space task belongs to.
Yes. Exactly.
This approach also simplifies file handling, which is another reason
I prefer it.
I *think* we have different interpretations of "dedicated group":
It sounds as though you interpret "dedicated group" as a way that enforces
the same allocations to user space and kernel work.
I interpret "dedicated group" essentially as a CLOSID reserved for kernel
work. Since I do not see that resctrl should dedicate a CLOSID/resource group
for kernel work I have been pushing against such "dedicated group".
Actually, our understanding is the same. Probably, I am not explaining it right. Hope we get there soon.
That said, I’m open to not having a dedicated group if we can still support all the features that PLZA provides without it.
I find that enabling user space to share CLOSID/RMID between user space
and kernel space does indeed support what PLZA provides. I think I am missing
something here since below proposal again attempts to isolate a resource group
(CLOSID) for kernel work.
No. I don't want to isolate a group just for PLZA. All I am saying is, we should provide an option to create a dedicated group if the user wants to do it.
Add a file, "info/kmode_monitor", to describe how kmode is monitored.
# cat info/kmode_monitor
[inherit_ctrl_and_mon] <- Kernel uses the same CLOSID/RMID as user. Default option for the "global"
assign_ctrl_inherit_mon <- One CLOSID for all kernel work; RMID inherited from user.
assign_ctrl_assign_mon <- One resource group (CLOSID+RMID) for all kernel work. Default option for "cpu" type.
My first thought is that the naming is confusing. resctrl has a very strong relationship between
"RMID" and "monitoring" so naming a file "monitor" that deals with allocation/ctrl/CLOSID is
potentially confusing.
Apart from that, while I think I understand where you are going by separating the mode into
two files I am concerned about future complications needing to accommodate all different
combinations of the (now) essentially two modes. My preference is thus to keep this simple by
keeping the mode within one file.
Even so, when stepping back, it does not really look like we need to separate the "global"
and "per CPU" modes. We could just have a single "per CPU" mode and the "global" is just
its default of "all CPUs", no?
Yes. That is correct.
Consider, for example, the implementation just consisting of:
# cat info/kernel_mode
[inherit_ctrl_and_mon]
global_assign_ctrl_inherit_mon_per_cpu
global_assign_ctrl_assign_mon_per_cpu
Rename “kernel_mode_assignment” to “kmode_groups” to assign the specific group to kmode. The usage of this file is the same as before.
#cat info/kmode_groups (Renamed "kernel_mode_assignment")
//
Please consider the intent of this file when thinking about names. The idea is that "info/kernel_mode"
specifies the "mode" of how kernel work is handled and it determines the configuration files used in that
mode as well as the syntax when interacting with those files. By renaming "kernel_mode_assignment" to
"kmode_groups" it implicitly requires all future kernel mode enhancements to need some data related to "groups".
In summary, I think this can be simplified by introducing just two new files in info/ that enables the
user to (a) select and (b) configure the "kernel mode". To start there can be just two modes,
global_assign_ctrl_inherit_mon_per_cpu and global_assign_ctrl_assign_mon_per_cpu.
global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in kernel_mode_assignment while
global_assign_ctrl_assign_mon_per_cpu requires a control and monitoring group.
The resource group in info/kernel_mode_assignment gets two additional files "kernel_mode_cpus" and
"kernel_mode_cpus_list" that contains the CPUs enabled with the kernel mode configuration, by default
it will be all online CPUs. The resource group can continue to be used to manage allocations of and
monitor user space tasks. Specifically, the "cpus", "cpus_list", and "tasks" files remain.
A user wanting just "global" settings will get just that when writing the group to
info/kernel_mode_assignment. A user wanting "per CPU" settings can follow the
info/kernel_mode_assignment setting with changes to that resource group's kernel_mode_cpus/kernel_mode_cpus_list
files. Any task running on a CPU that is *not* in kernel_mode_cpus/kernel_mode_cpus_list can be
expected to inherit both CLOSID and RMID from user space for all kernel work.
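The per-CPU behavior summarized above can be sketched as a small model. This is only an illustration of the semantics discussed in this thread, not kernel code: the `Group` class and the `kernel_ids()` helper are hypothetical names, and only the mode strings and the kernel_mode_cpus concept come from the proposal.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Mode names as proposed for info/kernel_mode in this thread.
MODES = (
    "inherit_ctrl_and_mon",
    "global_assign_ctrl_inherit_mon_per_cpu",
    "global_assign_ctrl_assign_mon_per_cpu",
)


@dataclass
class Group:
    """Hypothetical stand-in for the group named in info/kernel_mode_assignment."""
    name: str
    closid: int
    rmid: int
    kernel_mode_cpus: Set[int] = field(default_factory=set)


def kernel_ids(mode: str, task_closid: int, task_rmid: int,
               cpu: int, assigned: Optional[Group]) -> Tuple[int, int]:
    """Return the (CLOSID, RMID) used for kernel work of a task running on cpu."""
    if mode not in MODES:
        raise ValueError(f"unknown kernel mode: {mode}")
    # No kernel-mode assignment, or CPU not in kernel_mode_cpus:
    # kernel work inherits both CLOSID and RMID from user space.
    if (mode == "inherit_ctrl_and_mon" or assigned is None
            or cpu not in assigned.kernel_mode_cpus):
        return task_closid, task_rmid
    if mode == "global_assign_ctrl_inherit_mon_per_cpu":
        # CLOSID comes from the assigned group, RMID stays with the task.
        return assigned.closid, task_rmid
    # global_assign_ctrl_assign_mon_per_cpu: both come from the assigned group.
    return assigned.closid, assigned.rmid
```

For example, with a group whose kernel_mode_cpus holds CPUs 0 and 1, a task running on CPU 2 keeps its own CLOSID and RMID for kernel work in every mode.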
After further consideration, I don’t think the info/kernel_mode file
is necessary. There’s no need to enforce a specific mode for all the
PLZA groups. Avoiding this constraint makes the design more
flexible, particularly as we move toward supporting multiple PLZA
groups in the future. MPAM already appears capable of handling more
than one group—for example, one group could use
inherit_ctrl_and_mon, while another could use
global_assign_ctrl_inherit_mon_per_cpu.
You are looking ahead at future capabilities for which we do not know all requirements
at this time. I think it is very good to consider how things may progress and your example
of MPAM is of course on point. I believe the current design does consider this progression.
Please see https://lore.kernel.org/lkml/2ab556af-095b-422b-9396-f845c6fd0342@xxxxxxxxx/
(search for "per_group_assign_ctrl_assign_mon"). In that exploration per-group assignment
is actually accomplished with global files. I thus think we should not make such a big
architectural decision that does not benefit the immediate feature using partial information.
As it is, a "info/kernel_mode" gives the flexibility to expand to, if needed, configuration
files within a resource group. That is why the intention is to associate the mode within
info/kernel_mode with the presence/absence of info/kernel_mode_assignment (search for
"Visibility depends on active mode in info/kernel_mode" in linked email) since in the
future resctrl may need to enable a mode that needs configuration files within each
resource group and when enabling such mode the per-resource group files will appear
instead of the global info/kernel_mode_assignment.
The mode can simply be determined on a per-group basis. We can introduce two new files—kernel_mode_cpus and kernel_mode_cpus_list—within each resctrl group when kmode (or PLZA) is supported.
I think having these files in every resource group is confusing since user can only interact
with these files in one resource group for current PLZA. Why not *just* have the files in the
resource group that matches the group in info/kernel_mode_assignment?
The default group can also serve as the PLZA group.
#cat info/kernel_mode_assignment
//
At this point, the (kmode_cpus / kmode_cpus_list) files will exist in the default group:
Then user changes the PLZA group to "test".
#echo "test//" > info/kernel_mode_assignment
At this point, we expect the files "(kmode_cpus/kmode_cpus_list)" to be visible in "test//" group.
One open question is whether we should remove the visibility of these files from the default group. It’s unclear if we can safely do this dynamically.
An alternative approach would be to always keep the files present, but allow access to them only for groups that are listed in "info/kernel_mode_assignment".
The info/kernel_mode_assignment file would indicate which resctrl
group (or groups) is used for PLZA. The files kernel_mode_cpus and
kernel_mode_cpus_list would indicate how PLZA is applied within
each group.
The "how PLZA is applied" should be learned from info/kernel_mode where user
space learns whether RMID is inherited or not. I find kernel_mode_cpus
and kernel_mode_cpus_list to be just for configuration and found only in the
resource group listed in info/kernel_mode_assignment.
ok.
Files and behavior:
- cpus / cpus_list:
CPUs listed here use the same allocation for both user and kernel space.
Both user and kernel space?
As it stands today, the CPU list is written to MSR_PQR_ASSOC, resulting in the same allocation for both user and kernel within a given CLOS.
Kernel-mode allocation changes only if specific CPUs are included in the kmode_cpus list.
Monitoring would depend on info/kernel_mode_assignment ("inherit_mon")
and kernel space allocation would depend on whether the CPU on which the task runs
can be found in kernel_mode_cpus, no?
Yes. That is correct.
There is no change to the current semantics of these files.
If these files are empty, the group effectively becomes a PLZA-dedicated group.
I do not see it this way. If the cpus/cpus_list files are empty then it means that the
tasks in the group will use their own CLOSID/RMID for user space allocation and
monitoring. What allocations/monitoring is used by tasks when in kernel mode depends
on whether the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpus_list
file. If the CPU the task is running on can be found in a kernel_mode_cpus/kernel_mode_cpus_list
file then it will inherit the PQR_PLZA setting of that CPU, which is the allocation
associated with the resource group to which that kernel_mode_cpus/kernel_mode_cpus_list belongs.
If the CPU the task is running on cannot be found in kernel_mode_cpus/kernel_mode_cpus_list
then its kernel work will inherit its user space allocations and monitoring.
Yes. That is correct. I think our understanding is the same, but our implementation ideas seem to differ.
- kernel_mode_cpus / kernel_mode_cpus_list:
These files determine whether a separate kernel allocation is applied.
If empty, user and kernel share the same allocation.
If non-empty, the kernel uses a separate allocation.
The group can be a CTRL_MON or MON group. Based on the type of the group, the CLOSID and RMID will be used to enable PLZA. If it is MON, then rmid_en = 1 when writing the PLZA MSR.
This will be difficult to get right since CTRL_MON groups also have RMID assigned.
Here’s the proposed flow:
# mount -t resctrl resctrl /sys/fs/resctrl/
# cd /sys/fs/resctrl/
# cat info/kernel_mode_assignment
//
By default, the root (default) group is PLZA-enabled when resctrl is mounted. All CPUs use CLOSID 0 for both user and kernel-mode allocation.
# cat cpus_list
1-64
# cat kmode_cpus_list
1-64
Next, create a new group for PLZA:
# mkdir plza_group
# echo "plza_group//" > info/kernel_mode_assignment
At this point, plza_group becomes the new PLZA-enabled group, and the PLZA-related MSRs are updated accordingly.
It really looks like you are getting back to trying to dedicate a resource group to
kernel work and that is not something that resctrl should enforce.
# cat plza_group/cpus_list
<empty>
# cat plza_group/kmode_cpus_list
1-64
The user can then update kmode_cpus_list to apply PLZA only to a specific subset of CPUs, if desired.
What do you think of this approach?
It is difficult to predict what the "next" PLZA will actually end up looking like and I find resctrl creating a complicated
interface to support this to be risky. Instead I would prefer to focus on efficiently supporting what PLZA can do today
and make it extensible. Apart from that I find the implicit interface, "If it is MON, then rmid_en = 1" to be too
architecture specific for a generic interface while also not able to accurately capture user's intent (i.e. user may
indeed, for example, want "a CTRL_MON group to have rmid_en = 1"). Finally, I am just so confused about why the implementations
keep needing to dedicate a resource group/CLOSID to kernel work.
Let me make sure I understand what you mentioned earlier. Copied the text below from the thread for the context:
https://lore.kernel.org/lkml/3305c18e-9e50-4df0-b9f1-c61028628967@xxxxxxxxx/
=====================================================================
Please consider the intent of this file when thinking about names. The idea is that "info/kernel_mode"
specifies the "mode" of how kernel work is handled and it determines the configuration files used in that
mode as well as the syntax when interacting with those files. By renaming "kernel_mode_assignment" to
"kmode_groups" it implicitly requires all future kernel mode enhancements to need some data related to "groups".
In summary, I think this can be simplified by introducing just two new files in info/ that enables the
user to (a) select and (b) configure the "kernel mode". To start there can be just two modes,
global_assign_ctrl_inherit_mon_per_cpu and global_assign_ctrl_assign_mon_per_cpu.
global_assign_ctrl_inherit_mon_per_cpu mode requires a control group in kernel_mode_assignment while
global_assign_ctrl_assign_mon_per_cpu requires a control and monitoring group.
The resource group in info/kernel_mode_assignment gets two additional files "kernel_mode_cpus" and
"kernel_mode_cpus_list" that contains the CPUs enabled with the kernel mode configuration, by default
it will be all online CPUs. The resource group can continue to be used to manage allocations of and
monitor user space tasks. Specifically, the "cpus", "cpus_list", and "tasks" files remain.
A user wanting just "global" settings will get just that when writing the group to
info/kernel_mode_assignment. A user wanting "per CPU" settings can follow the
info/kernel_mode_assignment setting with changes to that resource group's kernel_mode_cpus/kernel_mode_cpus_list
files. Any task running on a CPU that is *not* in kernel_mode_cpus/kernel_mode_cpus_list can be
expected to inherit both CLOSID and RMID from user space for all kernel work.
======================================================================
Let me try to get a few clarifications on things here.
# cat info/kernel_mode
[inherit_ctrl_and_mon]
global_assign_ctrl_inherit_mon_per_cpu
global_assign_ctrl_assign_mon_per_cpu
My understanding of "inherit_ctrl_and_mon" is that the kernel inherits both the CLOSID and the RMID from user space. Basically, both user and kernel use the same CLOSID and RMID. This reflects the current behavior (without PLZA), correct? This would correspond to the default group when resctrl is mounted.
The modes "global_assign_ctrl_inherit_mon_per_cpu" and "global_assign_ctrl_assign_mon_per_cpu" represent the actual PLZA modes.
Both of these modes introduce new files, kernel_mode_cpus and kernel_mode_cpus_list, in the resctrl group.
When the user echoes a group name into info/kernel_mode_assignment, PLZA is applied globally across all CPUs. This is the default behavior.
If the user wants PLZA to apply only to a specific subset of CPUs, then the kernel_mode_cpus or kernel_mode_cpus_list files need to be updated accordingly.
global_assign_ctrl_inherit_mon_per_cpu: The group needs to be a CTRL_MON group. This mode uses rmid_en=0 when writing the PLZA MSR.
global_assign_ctrl_assign_mon_per_cpu: The group needs to be a CTRL_MON or MON group. This mode uses rmid_en=1 when writing the PLZA MSR.
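The two mode descriptions above can be written down as a small lookup, purely to make the question concrete. This is a sketch of my understanding as stated in this message, not a confirmed design: the `MODE_RULES` table and the `plza_rmid_en()` helper are hypothetical names, and the allowed group types are exactly the ones listed above.

```python
# Hypothetical table: mode -> (allowed group types, rmid_en value for the PLZA MSR).
MODE_RULES = {
    "global_assign_ctrl_inherit_mon_per_cpu": ({"CTRL_MON"}, 0),
    "global_assign_ctrl_assign_mon_per_cpu": ({"CTRL_MON", "MON"}, 1),
}


def plza_rmid_en(mode, group_type):
    """Return the rmid_en value for a mode, or None when PLZA is not used."""
    if mode == "inherit_ctrl_and_mon":
        # Kernel inherits CLOSID and RMID from user space; no PLZA MSR write.
        return None
    if mode not in MODE_RULES:
        raise ValueError(f"unknown kernel mode: {mode}")
    allowed, rmid_en = MODE_RULES[mode]
    if group_type not in allowed:
        raise ValueError(f"{group_type} group not valid for mode {mode}")
    return rmid_en
```

With this reading, writing a MON group to info/kernel_mode_assignment while in global_assign_ctrl_inherit_mon_per_cpu mode would be rejected.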
Did I get it right?
Thanks
Babu