Re: [PATCH v3 1/9] fs/resctrl: Fix MPAM Partid parsing errors by preserving CDP state during umount
From: Zeng Heng
Date: Sat Mar 21 2026 - 02:39:52 EST
On 2026/3/21 12:11, Zeng Heng wrote:
Hi Ben,
On 2026/3/21 1:07, Ben Horgan wrote:
Hi Zeng,
On 3/17/26 13:21, Zeng Heng wrote:
This patch fixes a pre-existing issue in the resctrl filesystem teardown
sequence where premature clearing of cdp_enabled could lead to MPAM Partid
parsing errors.
The closid to partid conversion logic inherently depends on the global
cdp_enabled state. However, rdt_disable_ctx() clears this flag early in
the umount path, while free_rmid() operations still reference it afterwards.
This creates a window where partid parsing operates with inconsistent CDP
state, potentially causing monitor reads to use the wrong partid mapping.
Additionally, rmid_entry structures remaining in limbo between mount sessions
may trigger partid out-of-range errors, leading to MPAM fault interrupts
and subsequent MPAM disablement.
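To make the two failure modes concrete, here is a minimal sketch of how a CDP-dependent closid to partid mapping goes wrong when the CDP flag changes underneath it. It assumes a hypothetical layout where, with CDP, closid N uses partid 2N for data and 2N+1 for code (so only half the closids are usable); NUM_PARTID, closid_to_partid() and the other helpers are illustrative names, not the MPAM driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the platform exposes 16 partids. */
#define NUM_PARTID 16u

enum conf_type { CDP_NONE, CDP_CODE, CDP_DATA };

/*
 * Hypothetical closid -> partid mapping. Without CDP it is 1:1; with CDP
 * each closid owns a data/code pair (2N and 2N+1).
 */
static uint32_t closid_to_partid(bool cdp_enabled, uint32_t closid,
				 enum conf_type type)
{
	if (!cdp_enabled)
		return closid;
	return closid * 2 + (type == CDP_CODE ? 1 : 0);
}

/* With CDP only half as many closids fit into the partid space. */
static uint32_t num_closid(bool cdp_enabled)
{
	return cdp_enabled ? NUM_PARTID / 2 : NUM_PARTID;
}

static bool partid_in_range(uint32_t partid)
{
	return partid < NUM_PARTID;
}
```

Under this model, closid 3 allocated with CDP on maps to partid 6, but the same closid freed after cdp_enabled is cleared maps to partid 3 (wrong mapping). A closid of 10, valid in a non-CDP session, maps to partid 20 if a limbo entry is processed after remounting with CDP, which is out of range.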
Reorder rdt_kill_sb() to delay rdt_disable_ctx() until after
rmdir_all_sub() and resctrl_fs_teardown() complete. This ensures
all rmid-related operations finish with the correct CDP state.
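The effect of the reordering can be modelled in a few lines. This is a simplified sketch, not the real rdt_kill_sb(): disable_ctx_model() stands in for rdt_disable_ctx(), free_rmid_model() stands in for the free_rmid() calls reached via rmdir_all_sub() and resctrl_fs_teardown(), and the partid mapping only covers the data side for brevity. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global mirroring the CDP flag the mapping depends on. */
static bool cdp_enabled;

/* Partid recorded by the most recent (modelled) free_rmid() call. */
static uint32_t freed_partid;

/* Data-side partid only, assuming the 2N layout when CDP is on. */
static uint32_t closid_to_partid(uint32_t closid)
{
	return cdp_enabled ? closid * 2 : closid;
}

static void free_rmid_model(uint32_t closid)
{
	freed_partid = closid_to_partid(closid);
}

static void disable_ctx_model(void)
{
	cdp_enabled = false;
}

/* Old order: CDP is cleared before the rmids are released. */
static void kill_sb_old(uint32_t closid)
{
	disable_ctx_model();
	free_rmid_model(closid);
}

/* Order proposed by the patch: release rmids first, then clear CDP. */
static void kill_sb_new(uint32_t closid)
{
	free_rmid_model(closid);
	disable_ctx_model();
}
```

With CDP enabled at mount time and closid 3, the old order frees partid 3 while the group actually used partid 6; the new order frees partid 6 and only then clears the flag.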
Introduce rdt_flush_limbo() to flush and cancel limbo work before the
filesystem teardown completes. An alternative approach would be to cancel
limbo work on umount and restart it on remount with a rebuilt bitmap.
However, this would require substantial changes in the resctrl layer to
handle CDP state transitions across mount sessions, which is beyond the
scope of the reqpartid feature work this patchset focuses on. The current
[...]
The code looks correct, but it does introduce a subtle change of behaviour
which may or may not be acceptable. A busy rmid may now be allocated after
remount. Clean rmids were never guaranteed, e.g. when a domain goes offline,
but this weakens the guarantee.
Yes, this would indeed weaken MPAM's guarantee of clean RMIDs.
Hopefully, no one is doing this in production: repeatedly switching
resctrl mount modes while monitoring workloads (which sounds more like
testing to me) and still expecting strict guarantees of clean RMID
allocation.
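The behavioural change Ben describes can be illustrated with a toy model of limbo. This is a hypothetical sketch, not the proposed rdt_flush_limbo(): struct rmid_state, flush_limbo() and alloc_rmid() are illustrative, and clearing limbo_work_pending merely stands in for cancelling the delayed limbo worker. Flushing releases every limbo entry regardless of its remaining occupancy, so an rmid that is still dirty becomes allocatable again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RMID 8u

/* Hypothetical per-rmid bookkeeping. */
struct rmid_state {
	bool in_use;        /* allocated to a monitoring group */
	bool in_limbo;      /* freed, but occupancy not yet drained */
	uint64_t occupancy; /* modelled cache-occupancy reading */
};

static struct rmid_state rmids[NUM_RMID];
static bool limbo_work_pending;

/* Freeing a dirty rmid parks it in limbo until it drains. */
static void free_rmid(uint32_t idx)
{
	assert(idx < NUM_RMID);
	rmids[idx].in_use = false;
	if (rmids[idx].occupancy) {
		rmids[idx].in_limbo = true;
		limbo_work_pending = true;
	}
}

/* Model of flushing limbo at umount: stop the worker and release all
 * limbo entries, even those that still report occupancy. */
static void flush_limbo(void)
{
	limbo_work_pending = false;
	for (uint32_t i = 0; i < NUM_RMID; i++)
		rmids[i].in_limbo = false;
}

/* Allocate the first rmid that is neither in use nor in limbo. */
static int alloc_rmid(void)
{
	for (uint32_t i = 0; i < NUM_RMID; i++) {
		if (!rmids[i].in_use && !rmids[i].in_limbo) {
			rmids[i].in_use = true;
			return (int)i;
		}
	}
	return -1;
}
```

In this model a dirty rmid is unavailable while in limbo, but after flush_limbo() the next allocation hands it out with its old occupancy still present, which is exactly the weakened clean-rmid guarantee discussed above.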
Another option to consider is whether limbo could be replaced by checking whether
an rmid is busy at allocation.
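One way to picture that option: instead of tracking freed rmids in limbo, the allocator reads each candidate's occupancy and skips any that are still above a threshold. This is a hypothetical sketch; occupancy[] stands in for a real counter read and busy_threshold for a tunable such as resctrl's limbo threshold, and neither is an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RMID 8u

static bool in_use[NUM_RMID];
/* Hypothetical stand-in for reading each rmid's occupancy counter. */
static uint64_t occupancy[NUM_RMID];
/* Hypothetical threshold above which an rmid is considered busy. */
static uint64_t busy_threshold = 64;

/* Allocate the first free rmid whose occupancy is below the threshold,
 * so no limbo state has to survive across mount sessions. */
static int alloc_rmid_checked(void)
{
	for (uint32_t i = 0; i < NUM_RMID; i++) {
		if (in_use[i] || occupancy[i] >= busy_threshold)
			continue;
		in_use[i] = true;
		return (int)i;
	}
	return -1; /* no clean rmid available right now */
}
```

The trade-off is that allocation may fail (or would need a fallback policy) when every free rmid is still dirty, and each allocation pays for a counter read; in exchange, there is no per-mount limbo state whose CDP-dependent encoding can go stale.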
Do your changes here to resctrl_arch_rmid_idx_encode() have an impact on how
limbo works?
In follow-up patches, resctrl_arch_rmid_idx_encode() also depends on
the CDP state, because it needs to look up the intpartid and reqpartid.
Across remount sessions, RMIDs residing in limbo are therefore also
subject to the parsing error.
For this reason, I had to make this patch a prerequisite fix in the
patch series.
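A small model shows why a CDP-dependent index encoding breaks limbo across remounts. It loosely mirrors the shape of the resctrl_arch_rmid_idx_encode()/resctrl_arch_rmid_idx_decode() pair, but idx_encode(), idx_decode(), NUM_PMG and the partid layout are hypothetical and do not reflect the actual reqpartid/intpartid scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical number of PMG values per partid. */
#define NUM_PMG 4u

/* Hypothetical encode: index = data-side partid * NUM_PMG + pmg. */
static uint32_t idx_encode(bool cdp, uint32_t closid, uint32_t pmg)
{
	uint32_t partid = cdp ? closid * 2 : closid;

	assert(pmg < NUM_PMG);
	return partid * NUM_PMG + pmg;
}

/* Hypothetical decode: the inverse, but only under the same CDP state. */
static void idx_decode(bool cdp, uint32_t idx, uint32_t *closid,
		       uint32_t *pmg)
{
	uint32_t partid = idx / NUM_PMG;

	*pmg = idx % NUM_PMG;
	*closid = cdp ? partid / 2 : partid;
}
```

An index stored in limbo while CDP was on (closid 3, pmg 1, giving index 25) decodes back to closid 3 under CDP, but to closid 6 if it is processed after remounting without CDP.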
Best Regards,
Zeng Heng