Re: [Patch v4 15/16] sched/cache: Fix possible overflow when invalidating the preferred CPU

From: Peter Zijlstra

Date: Mon May 18 2026 - 11:12:34 EST


On Wed, May 13, 2026 at 01:39:26PM -0700, Tim Chen wrote:
> From: Chen Yu <yu.c.chen@xxxxxxxxx>
>
> epoch comes from the local rq->cpu_epoch, but mm->sc_stat.epoch is written
> by task_tick_cache() running on any CPU - potentially a different CPU whose
> rq->cpu_epoch is further ahead. The unsigned underflow wraps to a huge number,
> so the condition fires incorrectly.
>
> Fix this by converting the result to long.
>
> Fixes: df0d98475954 ("sched/cache: Introduce infrastructure for cache-aware load balancing")
> Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
> Co-developed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8617cd3642c7..7e64cd18727e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1688,7 +1688,7 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
> * If this process hasn't hit task_cache_work() for a while invalidate
> * its preferred state.
> */
> - if (epoch - READ_ONCE(mm->sc_stat.epoch) > llc_epoch_affinity_timeout ||
> + if ((long)(epoch - READ_ONCE(mm->sc_stat.epoch)) > (long)llc_epoch_affinity_timeout ||

I think your Fixes: tag is wrong; afaict this was broken by patch 6.
Before that, llc_epoch_affinity_timeout was EPOCH_LLC_AFFINITY_TIMEOUT,
which is a literal 5, and thus a signed type.

Anyway, a single (long) cast should be sufficient; the other side will
get promoted along IIRC.
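
For illustration only, here is a minimal userspace sketch of the three
forms of the comparison. The function names and the fixed-width types
are hypothetical stand-ins; the real declarations of the epoch fields
and of llc_epoch_affinity_timeout are not shown in this thread. The
single-cast variant assumes the timeout is an unsigned type narrower
than the signed type of the cast difference.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the epoch counter type (assumed unsigned). */
typedef uint64_t epoch_t;

/*
 * Original form: the subtraction and the comparison are both unsigned.
 * If remote is ahead of local, the difference wraps to a huge value
 * and the timeout check fires.
 */
static inline bool expired_unsigned(epoch_t local, epoch_t remote,
				    uint32_t timeout)
{
	return local - remote > timeout;
}

/*
 * Single cast, assuming a 32-bit unsigned timeout: int64_t can hold
 * every uint32_t value, so the usual arithmetic conversions turn the
 * timeout into int64_t and the comparison is signed.
 * The out-of-range unsigned-to-signed conversion is implementation
 * defined before C23; gcc and clang wrap modulo 2^64.
 */
static inline bool expired_single_cast(epoch_t local, epoch_t remote,
				       uint32_t timeout)
{
	return (int64_t)(local - remote) > timeout;
}

/*
 * Both sides cast, as in the posted patch: signed comparison no matter
 * how wide the unsigned timeout is.
 */
static inline bool expired_both_cast(epoch_t local, epoch_t remote,
				     uint64_t timeout)
{
	return (int64_t)(local - remote) > (int64_t)timeout;
}
```

With local = 5, remote = 7 and timeout = 5, expired_unsigned() returns
true while the two cast variants return false. If the timeout had the
same width as the cast difference and were unsigned, the single cast
would be converted back to unsigned by the usual arithmetic conversions,
so whether one cast is enough depends on the declared type of the
timeout.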