Re: [BULK] Re: [RFC PATCH v5 20/29] sched/deadline: Allow deeper hierarchies of RT cgroups

From: luca abeni

Date: Tue May 19 2026 - 17:06:01 EST


Hi,

I think we are converging... But I still have some doubts (probably
because I do not know the cgroup v2 API well):


On Mon, 18 May 2026 08:47:37 -1000
Tejun Heo <tj@xxxxxxxxxx> wrote:
[...]
> I wonder whether it can be generalized more. Would something like the
> following work? I'm going to ignore period for the sake of simplicity
> as it doesn't seem to affect admission decisions.
>
> - There is no root cgroup.rt.max in line with other control knobs.

Well, the reason we had "rt.{runtime,period}_us" (now "rt.max") in the
root cgroup is that RT cgroups are scheduled by dl entities (one dl
entity per cpu), and these dl entities must be accounted for in the
SCHED_DEADLINE admission test... The easiest way to do this is to
reserve a fixed fraction of the CPU time for RT cgroups, leaving the
remaining fraction to SCHED_DEADLINE tasks. And we used rt.max to
configure the fraction of CPU time reserved for RT cgroups; do you have
suggestions about alternative interfaces for this configuration?


> - max means running in the nearest ancestor that has budget
> configuration. Obviously, if no one has budget configured, run in
> root.

Uh... OK; I understand this now, but I suspect this will increase the
complexity of the admission control... Yuri, what do you think?


> - Setting a budget is subject to admission control in both directions
> - the budget source (the nearest budgeted ancestor, or the root pool
> if none) should have enough to give out and the target budget should
> be big enough to contain the actual usages and !max descendants in
> the subtree. Going to max is always fine - the source previously gave
> the budget out, so it has room to take everything back.

OK... Just to understand: if we consider this situation:

    root cgroup -> G1 (50, 100) -> G2 (10, 100)

and G1 switches to "max", what happens to G2? Does it stay (10, 100),
or is it forced to switch to "max", too?


I was thinking about enforcing that a cgroup can have runtime > 0 only
if it is a direct child of the root cgroup, or if its parent has
runtime > 0 and is not "max" (so, in the previous example G1 can switch
to "max" only if G2 sets its runtime to 0). Could this be acceptable?
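
To make the rule above concrete, here is a minimal sketch in plain C.
It is only a toy model, not code from the patchset: struct rt_group,
rt_can_set_runtime() and rt_can_set_max() are hypothetical names, and
the bandwidth admission test itself (children fitting in the parent's
budget) is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a cgroup's rt.max state; not kernel code. */
struct rt_group {
	struct rt_group *parent;      /* NULL for the root cgroup */
	struct rt_group *children[4]; /* fixed fan-out keeps the sketch short */
	size_t nr_children;
	bool is_max;                  /* "max": no budget of its own */
	unsigned long runtime;        /* only meaningful when !is_max */
};

/* May the non-root group @g be given a runtime of @runtime? */
static bool rt_can_set_runtime(const struct rt_group *g, unsigned long runtime)
{
	const struct rt_group *p = g->parent;

	if (runtime == 0)
		return true;    /* a zero runtime is never restricted by the rule */
	if (!p)
		return false;   /* the root is outside the scope of this rule */
	if (!p->parent)
		return true;    /* direct child of the root cgroup */

	/* Otherwise the parent must have a real, non-"max" budget. */
	return !p->is_max && p->runtime > 0;
}

/* May @g switch to "max"? Only if no child still holds runtime > 0. */
static bool rt_can_set_max(const struct rt_group *g)
{
	for (size_t i = 0; i < g->nr_children; i++) {
		const struct rt_group *c = g->children[i];

		if (!c->is_max && c->runtime > 0)
			return false;
	}
	return true;
}
```

With root -> G1 (50, 100) -> G2 (10, 100), rt_can_set_max(G1) fails
until G2's runtime is set to 0; once G1 is "max", G2 can no longer be
given a runtime > 0.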


Thanks,
Luca


>
> It seems like the above would give fairly generic behavior without
> abrupt system-wide switches while staying relatively close to the
> behaviors of other resource knobs. I could be missing something tho.
> Would something like this work?
>
> Thanks.
>