Re: [PATCH v4 10/10] fs/resctrl: Fix UAF from worker threads when domains are removed

From: Reinette Chatre

Date: Thu Jun 04 2026 - 11:55:00 EST

On 6/2/26 8:27 PM, Reinette Chatre wrote:
> The mbm_handle_overflow() and cqm_handle_limbo() workers read event
> counters and may sleep while doing so. They are scheduled via
> delayed_work embedded in struct rdt_l3_mon_domain. Architecture allocates
> and frees these domains from CPU hotplug callbacks under cpus_write_lock(),
> and the workers acquire cpus_read_lock() to keep the domain alive across
> their access.
>
> A use-after-free can occur when a worker is blocked waiting for
> cpus_read_lock() while the hotplug core holds cpus_write_lock():
> the architecture frees the rdt_l3_mon_domain that contains the worker's
> work_struct. When the worker unblocks, the container_of() it performs on
> the embedded work pointer dereferences freed memory.
>
> Drop cpus_read_lock() from the workers and instead drain pending and
> in-flight work synchronously before the architecture can free the domain.
> Since architecture offlines the domain under cpus_write_lock() after it has
> been unlinked from the RCU list and a grace period has elapsed, no new work
> can be scheduled. The cancel only needs to wait out existing work.
> Drop rdtgroup_mutex during CPU offline around cancel_delayed_work_sync()
> so that a worker waiting on the mutex can complete before re-pinning the
> work on a different CPU.
>
> When offlining a CPU the architecture may iterate over resources in any
> order. For example, the MBA control domain may be offlined before or
> after a corresponding L3 monitor domain. Ensure that resctrl fs cancels
> the workers no matter what order the architecture offlines the domains.
>
> Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
> Reported-by: Sashiko <sashiko-bot@xxxxxxxxxx>
> Closes: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com # [1]
> Co-developed-by: Tony Luck <tony.luck@xxxxxxxxx>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> ---

Sashiko highlighted [1] that, since this patch ensures the workers can complete
as part of CPU offline, a worker is more likely to run on a domain while that
domain's cpu_mask is empty. There are places where the workers peek into the
domain's cpu_mask. For example:
- If SNC is enabled, the limbo handler needs to know the NUMA node ID of the
  domain and uses a CPU from cpu_mask to determine it.
- Both workers always look into the cpu_mask to determine where to reschedule next.

To protect against this I plan to add a check for an empty cpu_mask at the start
of both workers and just exit if the cpu_mask is empty.
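
For illustration only, here is a minimal userspace sketch of that early-exit
check. It is not the planned kernel change: struct mon_domain, the 64-bit mask,
and the helpers domain_has_cpus(), first_cpu() and overflow_worker() are all
hypothetical stand-ins for struct rdt_l3_mon_domain, its cpu_mask and the real
workers, and the counter reads and delayed_work rescheduling are reduced to a
counter.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct rdt_l3_mon_domain. Only the pieces
 * needed to show the empty-mask check are modeled.
 */
struct mon_domain {
	uint64_t cpu_mask;	/* bit n set => CPU n belongs to the domain */
	int reschedules;	/* how many times the worker re-armed itself */
};

/* Assumed helper: true while at least one CPU remains in the domain. */
static bool domain_has_cpus(const struct mon_domain *d)
{
	return d->cpu_mask != 0;
}

/* Assumed helper: lowest-numbered CPU in the mask, or -1 if it is empty. */
static int first_cpu(const struct mon_domain *d)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (d->cpu_mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * Models a worker body. Returns the CPU the work would be rescheduled
 * on, or -1 if the worker exited early because the domain has no CPUs.
 */
static int overflow_worker(struct mon_domain *d)
{
	/* The check described above: bail out before touching cpu_mask users. */
	if (!domain_has_cpus(d))
		return -1;

	/* Counter reads would happen here in the real worker. */

	d->reschedules++;
	return first_cpu(d);
}
```

With a mask of 0x0c the sketch picks CPU 2 and re-arms once; with an empty mask
it returns -1 without rescheduling, so nothing downstream consumes a CPU taken
from an empty mask.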

Reinette

[1] https://sashiko.dev/#/patchset/cover.1780456704.git.reinette.chatre%40intel.com?part=10