Re: [PATCH v3 9/9] fs/resctrl: Fix UAF from worker threads when domains are removed
From: Reinette Chatre
Date: Tue May 26 2026 - 13:55:11 EST
Hi Tony,
On 5/26/26 8:32 AM, Luck, Tony wrote:
Instead of deleting the patch without any comments, could you please provide some
insight to what problems it has and why your solution is better?
> On Fri, May 22, 2026 at 12:15:13PM -0700, Reinette Chatre wrote:
>> - Adding a reference count to the domain structure to avoid the worker
>> needing to take CPU hotplug lock. This ended up being very complicated
>> with the architecture needing new APIs to manage the reference count
>> which cannot cleanly integrate into MPAM since it uses a single
>> architecture domain structure to contain both the control and monitoring
>> domain structures. Managing the references across mount, unmount,
>> online, offline, as well as worker self exit resulted in several
>> asymmetrical and complicated paths that were error prone. Locking also
>> proved to be complicated since architecture would need to initiate
>> domain free that will need to call back into resctrl that will take
>> rdtgroup_mutex which means that references need to be taken/released
>> without locking.
>
> I'd been working on a reference count approach too. The MPAM combined
> domain for control and monitoring doesn't seem insurmountable. Mostly
While technically possible I do not think it is a clean solution to have
the lifetime of the control domain be controlled with a reference in the
monitoring domain.
> because it seems unlikely that the problem with worker threads would
> ever apply to control domains. Maybe I missed something, but just adding
> an architecture *release() function that can be used by file system code
> to drop reference counts on the domain when worker threads exit seems
> enough.
Did you consider the locking implications that I mention in the description
you quoted? More below ...
>
> My patch below.
heh
>
> -Tony
>
>
> From 611fd8ad816abd37ef9a65b39175ce05907a1d41 Mon Sep 17 00:00:00 2001
> From: Tony Luck <tony.luck@xxxxxxxxx>
> Date: Thu, 21 May 2026 15:14:27 -0700
> Subject: [PATCH] fs,mpam,x86/resctrl: Track reference count for L3 monitor
> domains
>
> There are race conditions[1] when the last CPU of a domain is taken offline
> and a worker thread may access the domain structure after it is freed.
>
> Add a rdt_l3_mon_domain::kref to track users of the domain. Don't try
> to cancel worker threads when CPUs are taken offline. Just set the
> target CPU for the thread to nr_cpu_ids to indicate the worker needs
> to take action next time it runs.
One test I have found to be useful when digging into this is to offline all CPUs
of a domain starting with lowest number. Since overflow worker runs on lowest number
and then is moved to next CPU when it goes offline this stresses this new mechanics.
Have you tried something similar or could you try this test with this solution?
...
> @@ -680,18 +692,16 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
>
> switch (r->rid) {
> case RDT_RESOURCE_L3: {
> - struct rdt_hw_l3_mon_domain *hw_dom;
> struct rdt_l3_mon_domain *d;
>
> if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
> return;
>
> d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
> - hw_dom = resctrl_to_arch_mon_dom(d);
> resctrl_offline_mon_domain(r, hdr);
> list_del_rcu(&hdr->list);
> synchronize_rcu();
> - l3_mon_domain_free(hw_dom);
> + kref_put(&d->kref, resctrl_arch_l3_mon_domain_release);
> break;
> }
To me the idea behind a "domain reference count" is to provide guarantee to any holder of
a reference that the domain *and* its data remains accessible while it holds the reference.
There is the domain structure itself and then the architecture specific, for example,
rdt_hw_l3_mon_domain::arch_mbm_states, and the fs state, for example rdt_l3_mon_domain::mbm_states.
Above snippet moves the freeing of the *architecture* state to be called on kref_put()
while *always* freeing the fs state (resctrl_offline_mon_domain()->domain_destroy_l3_mon_state()).
A worker may thus have a reference to the domain but when it runs it runs without
fs state which is just a new use-after-free.
As I mentioned in the description the release managed by architecture implies that
reference needs to be dropped without rdtgroup_mutex held since the architecture
should also call resctrl_offline_mon_domain() as part of release. Locking needs
to be reworked and needs to adhere to kref rules on kref_get()/kref_put() without
locking.
Reinette