Re: [PATCH v2 1/7] sched/fair: Fix zero_vruntime tracking

From: Peter Zijlstra

Date: Mon Mar 30 2026 - 15:11:35 EST


On Mon, Mar 30, 2026 at 09:20:01PM +0530, K Prateek Nayak wrote:
> Hello Peter,
>
> On 3/30/2026 8:10 PM, Peter Zijlstra wrote:
> > On Mon, Mar 30, 2026 at 08:07:06PM +0530, K Prateek Nayak wrote:
> >> Hello Peter,
> >>
> >> On 3/30/2026 3:40 PM, Peter Zijlstra wrote:
> >>> This means that if the two tasks playing leapfrog can reach the
> >>> critical speed to hit the overflow point inside one tick's worth of
> >>> time, we're up a creek.
> >>>
> >>> If this is indeed the case, then the below should cure things.
> >>
> >> I have been running with this for four hours now and haven't seen
> >> any splats or crashes on my setup. I could reliably trigger the
> >> warning from __sum_w_vruntime_add() within an hour previously so
> >> it is safe to say I was hitting exactly this.
> >>
> >> Feel free to include:
> >>
> >> Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
> >
> > Ha!, excellent. Thanks!
>
> Turns out I spoke too soon: it eventually ran into that problem
> again, and later crashed in pick_task_fair(), so there is definitely
> something amiss still :-(
>
> I'll throw in some debug traces and get back tomorrow.

Are there cgroups involved?

I'm thinking that if you have two groups, and the tick always hits one
group, the other group can go a while without ever being updated.

But if there are no cgroups, this can't be it.
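As a toy illustration of why a stale zero_vruntime would matter (this is not kernel code; the function name and shape are made up for the example): entity positions are kept as signed offsets from a per-runqueue zero point, which is safe under u64 wraparound only while the offset fits in s64. If one group's zero point is never pulled along while its entities keep running, that offset eventually overflows and sign-flips:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, not kernel code: a key is the signed offset of an
 * entity's vruntime from the runqueue's zero_vruntime.  The cast
 * to int64_t makes comparisons immune to u64 wraparound -- but
 * only while the true distance stays below INT64_MAX.  A zero
 * point that is never advanced lets the distance grow unbounded.
 */
static int64_t entity_key(uint64_t vruntime, uint64_t zero_vruntime)
{
	return (int64_t)(vruntime - zero_vruntime);
}
```

With wraparound, a vruntime of 5 just past UINT64_MAX still yields a small positive key relative to a zero point of UINT64_MAX; but once the real distance exceeds INT64_MAX, the key goes negative and ordering breaks.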

Anyway, something like the below would rule this out, I suppose.


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..19b75af31a5a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1304,6 +1304,8 @@ static void update_curr(struct cfs_rq *cfs_rq)
 
 	curr->vruntime += calc_delta_fair(delta_exec, curr);
 	resched = update_deadline(cfs_rq, curr);
+	if (resched)
+		avg_vruntime(cfs_rq);
 
 	if (entity_is_task(curr)) {
 		/*
@@ -5593,11 +5595,6 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
 	update_load_avg(cfs_rq, curr, UPDATE_TG);
 	update_cfs_group(curr);
 
-	/*
-	 * Pulls along cfs_rq::zero_vruntime.
-	 */
-	avg_vruntime(cfs_rq);
-
 #ifdef CONFIG_SCHED_HRTICK
 	/*
 	 * queued ticks are scheduled to match the slice, so don't bother
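For what "pulls along" means here, a minimal sketch of the assumed semantics (illustrative only, not the kernel implementation): the zero point advances monotonically toward the current minimum vruntime, so doing this from update_curr() on every resched bounds how far the offsets can drift between updates, regardless of which cgroup the tick happens to land on:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, illustrative only: advance zero_vruntime monotonically
 * toward the runqueue's minimum vruntime.  Doing this on every tick
 * keeps each entity's key (vruntime - zero_vruntime) small, so the
 * signed arithmetic cannot overflow even when two tasks leapfrog
 * each other quickly.
 */
static uint64_t pull_zero_vruntime(uint64_t zero_vruntime, uint64_t min_vruntime)
{
	/* signed comparison handles u64 wraparound */
	if ((int64_t)(min_vruntime - zero_vruntime) > 0)
		return min_vruntime;
	return zero_vruntime;	/* never move backwards */
}
```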