Re: [PATCH] [QUESTION] sched/fair: Potential vruntime underflow and unconstrained vlag scaling in rescale_entity()

From: chenjinghuang

Date: Mon May 18 2026 - 10:16:30 EST


Hello Prateek,

On 5/14/2026 11:19 PM, K Prateek Nayak wrote:
> Hello Chen,
>
> On 5/14/2026 6:55 PM, Chen Jinghuang wrote:
>> Hi all,
>>
>> While analyzing cgroup weight adjustment scenarios in EEVDF, I observed a
>> potential vruntime underflow issue caused by unconstrained vlag scaling in
>> rescale_entity(). I would like to consult the community on whether this
>> behavior is expected or if it represents a bug in the current implementation.
>>
>> I noticed this in my trace from a multi-level cgroup environment:
>>
>> CPU 3
>> CURRENT: PID: 12485 TASK: ffff003027f49440 COMMAND: "cpu_sim"
>> ROOT_TASK_GROUP: ffffd714095439c0 CFS_RQ: ffff002fbfa3d140
>> TASK_GROUP: ffff00211fdfe800 CFS_RQ: ffff00213f2c9800 <throttle_test>
>> TASK_GROUP: ffff20300f2e7000 CFS_RQ: ffff0021190ca000 <case_cpu_idle>
>> TASK_GROUP: ffff203016880c00 CFS_RQ: ffff00211ece4c00 <child_1>
>> [120] PID: 12485 TASK: ffff003027f49440 COMMAND: "cpu_sim" [CURRENT]
>> TASK_GROUP: ffff203016884000 CFS_RQ: ffff002156835c00 <child_2>
>> [120] PID: 12649 TASK: ffff003027f4e540 COMMAND: "cpu_sim"
>>
>> Trace Metrics (Before/After rescale_entity and update_load_set):
>>
>> Before: weight: 209715, avruntime: 7738112562, vlag: 3638691801, vruntime: 4099420761
>> After: weight: 614, avruntime: 7738112562, vlag: 1242814741118, vruntime: 4099420761, limit: 724001488558
>> (vruntime/avruntime stay unchanged; the scaling only touches vlag, deadline, and vprot)
>>
>> The weight drop (209715->614) during __sched_group_set_shares() causes se->vlag
>> to explode in rescale_entity(), surging from 3638691801 to 1242814741118.
>>
>> When the entity's vruntime is subsequently updated via se->vruntime =
>> avruntime - se->vlag, the massive vlag value leads to an underflow of
>> se->vruntime.
>>
>> Furthermore, I noticed that while entity_lag() typically applies a limit (calculated
>> as 724001488558 in this instance) to constrain se->vlag, rescale_entity() performs
>> the scaling without any such boundary checks. This allows se->vlag to exceed the
>> theoretical limits expected by the EEVDF algorithm.
>
> The vlag, to begin with, would be within theoretical limits right?
> Rescaling it should also put it under theoretical limits if I'm not
> wrong.
>
>>
>> Questions:
>> 1. Is this se->vruntime underflow during drastic weight reduction considered acceptable
>> within the current EEVDF design?
>
> All the vruntime considerations use signed delta from the zero_vruntime
> so based on my understanding, it should be fine.
>
>> 2. Should rescale_entity() apply a limit check (similar to entity_lag()) immediately
>> after scaling the vlag to prevent it from escaping reasonable bounds?
>>
>> Something like this?
>>
>> Signed-off-by: Chen Jinghuang <chenjinghuang2@xxxxxxxxxx>
>> ---
>> kernel/sched/fair.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 3ebec186f982..351e2f7b4b28 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4046,6 +4046,15 @@ rescale_entity(struct sched_entity *se, unsigned long weight, bool rel_vprot)
>> */
>> se->vlag = div64_long(se->vlag * old_weight, weight);
>
> This vlag is already calculated using entity_lag() which is within the
> theoretical limits and is being scaled proportionally.
>
>>
>> + {
>> + u64 max_slice = cfs_rq_max_slice(cfs_rq_of(se)) + TICK_NSEC;
>
> This should also consider if "se" has the largest slice since it is
> outside of the cfs_rq at this point.
>
>> + s64 limit;
>> +
>> + limit = calc_delta_fair(max_slice, se);
>
> update_load_set() is done after rescale_entity(). The above should
> actually scale the limits based on the new weight.
>
In the trace data I recorded above, limit = 724001488558 was computed by calc_delta_fair() after
the weight of se had been updated by update_load_set(), like this:

+ rescale_entity(se, weight, rel_vprot);
+
+ update_load_set(&se->load, weight);
+
+ dl_record(se, cfs_rq, 1, _RET_IP_);

dl_record() captures the values of vlag, vruntime, limit and other data at the time of recording.
The scaled se->vlag after rescale_entity() is 1242814741118, which exceeds the limit of 724001488558
computed for that se with the new weight.
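
For reference, the traced numbers can be reproduced with plain 64-bit arithmetic in userspace.
The sketch below is not kernel code: rescale_vlag() and place_vruntime() are hypothetical helper
names, the plain division stands in for div64_long(), and the inputs are the values from the trace.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's div64_long(): signed 64-bit division. */
static int64_t div64_long_sketch(int64_t dividend, int64_t divisor)
{
	return dividend / divisor;
}

/* Mirrors the vlag step quoted from rescale_entity(): vlag * old_weight / weight. */
static int64_t rescale_vlag(int64_t vlag, uint64_t old_weight, uint64_t weight)
{
	return div64_long_sketch(vlag * (int64_t)old_weight, (int64_t)weight);
}

/* se->vruntime = avruntime - se->vlag, done in unsigned 64-bit arithmetic (wraps). */
static uint64_t place_vruntime(uint64_t avruntime, int64_t vlag)
{
	return avruntime - (uint64_t)vlag;
}
```

With the traced inputs, rescale_vlag(3638691801, 209715, 614) gives 1242814741118, which is above
the recorded limit of 724001488558, and place_vruntime(7738112562, 1242814741118) wraps past zero
(it reads as -1235076628556 when interpreted as s64).
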
>> +
>> + se->vlag = clamp(se->vlag, -limit, limit);
>
> If we use the updated weight, se->vlag should already be within
> those limits.
>
The limit calculation in my original patch is misplaced. It should move after update_load_set() so
that the limit is computed from the updated weight of se.

The corrected code is as follows:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 351e2f7b4b28..eb273a9047f1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4104,6 +4104,11 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,

update_load_set(&se->load, weight);

+ u64 max_slice = cfs_rq_max_slice(cfs_rq) + TICK_NSEC;
+ s64 limit = calc_delta_fair(max_slice, se);
+
+ se->vlag = clamp(se->vlag, -limit, limit);
+
do {
u32 divider = get_pelt_divider(&se->avg);
se->avg.load_avg = div_u64(se_weight(se) * se->avg.load_sum, divider);
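
To illustrate what the added clamp would do to the traced values, here is a minimal userspace
sketch. clamp_s64() and clamp_vlag() are hypothetical helpers standing in for the kernel's clamp()
macro, and the limit is taken from the trace rather than recomputed.

```c
#include <stdint.h>

/* Simplified stand-in for clamp(val, lo, hi) on signed 64-bit values. */
static int64_t clamp_s64(int64_t val, int64_t lo, int64_t hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/* Bound vlag to [-limit, limit], as the proposed hunk does after update_load_set(). */
static int64_t clamp_vlag(int64_t vlag, int64_t limit)
{
	return clamp_s64(vlag, -limit, limit);
}
```

With the traced values, clamp_vlag(1242814741118, 724001488558) returns 724001488558. Note that
this is still larger than the traced avruntime of 7738112562, so avruntime - vlag would still wrap
in this particular case; the clamp only bounds how far vlag can move from the limit.
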
>> + }
>> +
>> /*
>> * DEADLINE
>> * --------
>