[PATCH v3 1/9] fs/resctrl: Fix MPAM Partid parsing errors by preserving CDP state during umount
From: Zeng Heng
Date: Tue Mar 17 2026 - 09:23:30 EST
This patch fixes a pre-existing issue in the resctrl filesystem teardown
sequence where premature clearing of cdp_enabled could lead to MPAM Partid
parsing errors.
The closid-to-partid conversion logic depends on the global cdp_enabled
state. However, rdt_disable_ctx() clears this flag early in the umount
path, while the free_rmid() operations that run afterwards still
reference it. This creates a window where partid parsing operates with
inconsistent CDP state, potentially causing monitor reads to use the
wrong partid mapping.
Additionally, rmid_entry items left in limbo between mount sessions may
trigger partid out-of-range errors, leading to MPAM fault interrupts and
subsequent MPAM disablement.
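As a rough illustration of why the conversion is sensitive to the flag,
the following standalone C model may help. It is only a sketch: the toy_*
names, the "closid * 2" mapping and the PARTID limit of 7 are assumptions
made for this example and are not taken from the resctrl or MPAM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real resctrl/MPAM definitions. */
enum toy_conf_type { TOY_CDP_NONE, TOY_CDP_CODE, TOY_CDP_DATA };

#define TOY_PARTID_MAX 7u	/* assumed hardware limit: PARTIDs 0..7 */

static bool toy_cdp_enabled;	/* models the global CDP flag */

/*
 * Assumed mapping: without CDP a closid is used as the partid directly;
 * with CDP each closid owns two adjacent partids (data, then code).
 */
static uint32_t toy_closid_to_partid(uint32_t closid, enum toy_conf_type t)
{
	if (!toy_cdp_enabled)
		return closid;

	return closid * 2u + (t == TOY_CDP_CODE ? 1u : 0u);
}

/* True if the partid fits within the assumed hardware limit. */
static bool toy_partid_in_range(uint32_t partid)
{
	return partid <= TOY_PARTID_MAX;
}
```

In this model the same closid resolves to different partids depending on
the flag value at the time of the call, and a closid that was valid while
CDP was off can map past TOY_PARTID_MAX if it is resolved again with CDP
on.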
Reorder rdt_kill_sb() to delay rdt_disable_ctx() until after
rmdir_all_sub() and resctrl_fs_teardown() complete. This ensures
all rmid-related operations finish with correct CDP state.
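The effect of the ordering can be modeled with a small standalone C
sketch. All model_* names and the "closid * 2" mapping are hypothetical
and exist only for this illustration; the real functions take different
arguments and do considerably more work.

```c
#include <assert.h>
#include <stdbool.h>

static bool model_cdp_enabled;	/* models the global CDP flag */

struct model_rmid_entry {
	unsigned int closid;
	unsigned int partid_seen;	/* partid computed when the entry was freed */
};

/* Assumed mapping: CDP doubles the partid index. */
static unsigned int model_partid(unsigned int closid)
{
	return model_cdp_enabled ? closid * 2u : closid;
}

/* Stand-in for an rmid free operation that must resolve a partid. */
static void model_free_rmid(struct model_rmid_entry *e)
{
	e->partid_seen = model_partid(e->closid);
}

/* Stand-in for the step that clears the CDP flag. */
static void model_disable_ctx(void)
{
	model_cdp_enabled = false;
}

/* Old order: the flag is cleared before the rmid is freed. */
static void model_kill_sb_old(struct model_rmid_entry *e)
{
	model_disable_ctx();
	model_free_rmid(e);
}

/* New order: the rmid is freed while the flag is still valid. */
static void model_kill_sb_new(struct model_rmid_entry *e)
{
	model_free_rmid(e);
	model_disable_ctx();
}
```

Starting from a CDP-enabled state, the old order resolves the entry with
the flag already cleared, while the new order resolves it with the state
it was allocated under and only then clears the flag.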
Introduce rdt_flush_limbo() to flush and cancel limbo work before the
filesystem teardown completes. An alternative approach would be to cancel
limbo work on umount and restart it on remount with a rebuilt bitmap.
However, this would require substantial changes in the resctrl layer to
handle CDP state transitions across mount sessions, which is beyond the
scope of the reqpartid feature work this patchset focuses on. The current
fix addresses the immediate correctness issue with minimal churn.
Signed-off-by: Zeng Heng <zengheng4@xxxxxxxxxx>
---
fs/resctrl/rdtgroup.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5da305bd36c9..bc0735eef92a 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3165,6 +3165,25 @@ static void resctrl_fs_teardown(void)
rdtgroup_destroy_root();
}
+static void rdt_flush_limbo(void)
+{
+ struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+ struct rdt_l3_mon_domain *d;
+
+ if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
+ return;
+
+ if (!resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
+ return;
+
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ if (has_busy_rmid(d)) {
+ __check_limbo(d, true);
+ cancel_delayed_work(&d->cqm_limbo);
+ }
+ }
+}
+
static void rdt_kill_sb(struct super_block *sb)
{
struct rdt_resource *r;
@@ -3172,13 +3191,14 @@ static void rdt_kill_sb(struct super_block *sb)
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
- rdt_disable_ctx();
-
/* Put everything back to default values. */
for_each_alloc_capable_rdt_resource(r)
resctrl_arch_reset_all_ctrls(r);
resctrl_fs_teardown();
+ rdt_flush_limbo();
+ rdt_disable_ctx();
+
if (resctrl_arch_alloc_capable())
resctrl_arch_disable_alloc();
if (resctrl_arch_mon_capable())
--
2.25.1