[PATCH] sched/fair: Update zero_vruntime after clearing on_rq in dequeue_entity()

From: Zicheng Qu

Date: Thu Mar 19 2026 - 07:50:01 EST


When dequeue_entity() dequeues the current entity (cfs_rq->curr),
cfs_rq->zero_vruntime is updated via update_entity_lag() ->
avg_vruntime() -> update_zero_vruntime() while curr->on_rq is still 1,
so the current entity is still included in the zero_vruntime
calculation.

Immediately afterwards, curr->on_rq is set to 0, which changes the
result of avg_vruntime() because curr is no longer counted. Unless
zero_vruntime is updated again, the stale value may be used in the
subsequent task selection path:

schedule() -> ... -> pick_task_fair() -> pick_next_entity() ->
pick_eevdf() -> vruntime_eligible()

If entity_tick() -> avg_vruntime() -> update_zero_vruntime() does not
run between the dequeue and the next pick, vruntime_eligible() may use
a stale cfs_rq->zero_vruntime. This can make every task appear
ineligible, leading to a NULL pointer dereference.

Add an explicit avg_vruntime(cfs_rq) call after clearing curr->on_rq to
ensure cfs_rq->zero_vruntime is properly updated before the next pick.
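
For illustration only, the ordering problem can be sketched as a small
user-space model. This is not kernel code: every toy_* name is
hypothetical, and the model deliberately collapses zero_vruntime and the
weighted average into a single cached value, which is a simplification
of what fair.c actually tracks. It only shows why a value computed while
curr->on_rq is still 1 no longer matches the queue once curr is gone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sched_entity; not the kernel structure. */
struct toy_entity {
	int64_t vruntime;
	int64_t weight;
	int on_rq;
};

/* Hypothetical stand-in for a cfs_rq with one cached reference value. */
struct toy_rq {
	struct toy_entity *ent;
	size_t nr;
	int64_t cached_avg;
};

/* Weighted average vruntime over entities that are still on_rq. */
static int64_t toy_avg_vruntime(const struct toy_entity *e, size_t n)
{
	int64_t sum = 0, load = 0;

	assert(e != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!e[i].on_rq)
			continue;
		sum += e[i].vruntime * e[i].weight;
		load += e[i].weight;
	}
	return load ? sum / load : 0;
}

static void toy_refresh(struct toy_rq *rq)
{
	rq->cached_avg = toy_avg_vruntime(rq->ent, rq->nr);
}

/*
 * Dequeue entity idx. The first refresh models the update that runs
 * while on_rq is still 1; the optional second refresh models the extra
 * call added after on_rq is cleared.
 */
static void toy_dequeue(struct toy_rq *rq, size_t idx, int refresh_after)
{
	toy_refresh(rq);
	rq->ent[idx].on_rq = 0;
	if (refresh_after)
		toy_refresh(rq);
}

/* Count queued entities whose vruntime does not exceed the cached value. */
static size_t toy_count_eligible(const struct toy_rq *rq)
{
	size_t count = 0;

	for (size_t i = 0; i < rq->nr; i++)
		if (rq->ent[i].on_rq && rq->ent[i].vruntime <= rq->cached_avg)
			count++;
	return count;
}
```

With three equal-weight entities at vruntime 1000 (curr), 4000 and 4000,
dequeuing curr without the second refresh leaves the cached value at
3000, so neither remaining entity passes the toy eligibility check.
With the second refresh the cached value becomes 4000 and both pass.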

Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Zicheng Qu <quzicheng@xxxxxxxxxx>
Signed-off-by: Zhang Qiao <zhangqiao22@xxxxxxxxxx>
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..f8070767c2f4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5461,6 +5461,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (se != cfs_rq->curr)
 		__dequeue_entity(cfs_rq, se);
 	se->on_rq = 0;
+	/* update the cfs_rq->zero_vruntime again after curr->on_rq = 0 */
+	if (se == cfs_rq->curr)
+		avg_vruntime(cfs_rq);
 	account_entity_dequeue(cfs_rq, se);

 	/* return excess runtime on last dequeue */
--
2.34.1