Re: [PATCH 5/5] sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime()

From: K Prateek Nayak

Date: Tue Jun 02 2026 - 05:00:01 EST


Hello Peter,

On 6/2/2026 2:02 PM, Peter Zijlstra wrote:
> On Tue, Jun 02, 2026 at 12:31:36PM +0530, K Prateek Nayak wrote:
>
>> My mind is taking a while to grasp the ->pick_next_task() removal.
>>
>> [1] https://lore.kernel.org/lkml/20260602050005.11160-1-kprateek.nayak@xxxxxxx/
>
> Yes, I'm familiar with that struggle. If you can manage to write a
> comment that clarifies it somewhat that would be awesome.
>
> I've tried, but every time I read it back after a few days, I'm just
> left more confused than I was at the beginning :-(

Here is an attempt on top of v2.1:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fa8c0b1a1cf1..9a14d75ff671 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9889,7 +9889,13 @@ struct task_struct *pick_task_fair(struct rq *rq, struct rq_flags *rf)
 	throttled = false;
 
 	do {
-		/* Might not have done put_prev_entity() */
+		/*
+		 * Might not have done put_prev_entity().
+		 * The cfs_rq is throttled either here or via
+		 * pick_task() -> set_next_task(), where a
+		 * sched_cfs_bandwidth_slice() worth of
+		 * runtime is requested for cfs_rq->curr.
+		 */
 		if (cfs_rq->curr && cfs_rq->curr->on_rq)
 			update_curr(cfs_rq);

@@ -15003,7 +15009,12 @@ static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
 			break;
 
 		set_next_entity(cfs_rq, se, first);
-		/* ensure bandwidth has been allocated on our new cfs_rq */
+		/*
+		 * Ensure bandwidth has been allocated on our new cfs_rq.
+		 * If this hierarchy was freshly picked, update_curr()
+		 * was skipped for this cfs_rq. Request the correct
+		 * bandwidth slice now that cfs_rq->curr is updated.
+		 */
 		throttled |= account_cfs_rq_runtime(cfs_rq, 0);
 	}

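To make the two comments above easier to check, here is a rough user-space model of the flow. It is a hypothetical sketch, not the kernel code. The *_model names, the flat 5 ms slice (the default of sched_cfs_bandwidth_slice_us), and folding the throttle decision directly into the accounting helper (as this series appears to do with account_cfs_rq_runtime()) are all assumptions. It shows why the zero-delta account_cfs_rq_runtime(cfs_rq, 0) call only has teeth once cfs_rq->curr is set: an exhausted cfs_rq with no current entity is left without runtime but not throttled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of CFS bandwidth accounting. None of these
 * names are kernel API; times are in nanoseconds.
 */
#define SLICE_NS 5000000LL /* assumed: default 5 ms bandwidth slice */

struct cfs_bandwidth_model {
	int64_t runtime;            /* global quota left in this period */
};

struct cfs_rq_model {
	struct cfs_bandwidth_model *tg_b;
	int64_t runtime_remaining;  /* local runtime taken from the pool */
	bool has_curr;              /* models cfs_rq->curr != NULL */
	bool throttled;
};

/* Top the local runtime back up to one slice, limited by the global pool. */
static bool assign_runtime_model(struct cfs_rq_model *cfs_rq)
{
	int64_t want = SLICE_NS - cfs_rq->runtime_remaining;
	int64_t pool = cfs_rq->tg_b->runtime;
	int64_t amount = want < pool ? want : pool;

	cfs_rq->tg_b->runtime -= amount;
	cfs_rq->runtime_remaining += amount;
	return cfs_rq->runtime_remaining > 0;
}

/*
 * Charge delta_ns to the cfs_rq. When local runtime is exhausted, try to
 * request a new slice; if that fails and there is a current entity, mark
 * the cfs_rq throttled. Returns true if the cfs_rq is throttled.
 */
static bool account_runtime_model(struct cfs_rq_model *cfs_rq, int64_t delta_ns)
{
	assert(delta_ns >= 0);
	if (cfs_rq->throttled)
		return true;

	cfs_rq->runtime_remaining -= delta_ns;
	if (cfs_rq->runtime_remaining > 0)
		return false;

	if (!assign_runtime_model(cfs_rq) && cfs_rq->has_curr)
		cfs_rq->throttled = true;
	return cfs_rq->throttled;
}
```

In this model, a freshly picked cfs_rq with an empty pool returns false from account_runtime_model(rq, 0) while has_curr is false; once set_next_entity() is modelled by setting has_curr to true, the same zero-delta call returns true.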
--
Thanks and Regards,
Prateek