Re: [PATCH RESEND] sched/fair: Fix overflow in vruntime_eligible()
From: K Prateek Nayak
Date: Tue Apr 28 2026 - 12:34:36 EST
(+ scheduler folks)
Hello Zhan,
On 4/28/2026 8:19 PM, Zhan Xusheng wrote:
> After commit 556146ce5e94 ("sched/fair: Avoid overflow in
> enqueue_entity()"), place_entity() can shift cfs_rq->zero_vruntime
> towards a newly enqueued heavy entity. This can make (vruntime -
> zero_vruntime) very large for other entities and cause key * load in
> vruntime_eligible() to overflow s64, flipping the eligibility result.
So the commit in question moves the zero_vruntime only when the
load > sum_weight.
You seem to have found a case where the entity_key() is already large
enough that moving the zero_vruntime farther makes the eligibility
check overflow, which we were hoping would not be the case.
Do you have a reproducer that fails pick_eevdf() after the introduction
of commit 556146ce5e94? Also, do you see any splats in dmesg, since we
have a defensive WARN_ON() to catch an overflow?
>
> Use check_mul_overflow() for the multiplication and fall back to a
> sign-based result on overflow.
Don't we have PARANOID_AVG for that? Can you do:

    echo PARANOID_AVG > /sys/kernel/debug/sched/features
    # run your workload
    grep "sum_shift" /sys/kernel/debug/sched/debug

and check if the "sum_shift" turns non-zero.
>
> Fixes: 556146ce5e94 ("sched/fair: Avoid overflow in enqueue_entity()")
So we avoid a warning with that optimization, but I didn't see a crash
anywhere without it during my testing.
If you have a workload that crashes from those changes, perhaps we can
see if there is a cheap enough way to move the zero_vruntime closer to
the true average.
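For reference, the failure mode under discussion can be sketched in
userspace. This is only an illustration, not the kernel code: it assumes
the eligibility check has the shape "avg >= key * load" with load > 0,
and the helper names eligible_naive() and eligible_checked() are
hypothetical. __builtin_mul_overflow() is the GCC/Clang builtin that
check_mul_overflow() wraps.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an unchecked eligibility test.
 * The multiply is done in unsigned arithmetic so the wraparound is
 * well defined in this demo; the result is then reinterpreted as s64.
 */
static bool eligible_naive(int64_t avg, int64_t key, int64_t load)
{
	int64_t prod = (int64_t)((uint64_t)key * (uint64_t)load);

	return avg >= prod;
}

/*
 * Hypothetical checked variant: on overflow, fall back to the sign of
 * the key. Assumes load > 0, so the true product has the sign of key:
 * a huge positive product can never be <= avg, a huge negative one
 * always is.
 */
static bool eligible_checked(int64_t avg, int64_t key, int64_t load)
{
	int64_t prod;

	if (__builtin_mul_overflow(key, load, &prod))
		return key < 0;

	return avg >= prod;
}
```

With key = 2^62 and load = 4 the true product is 2^64, which wraps to 0
in 64 bits, so the naive check with avg = 0 reports the entity as
eligible while the checked variant does not.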
--
Thanks and Regards,
Prateek