Re: [PATCH 04/10] sched_ext: Fix ops.cgroup_move() invocation kf_mask and rq tracking

From: Andrea Righi

Date: Fri Apr 10 2026 - 12:29:06 EST


On Thu, Apr 09, 2026 at 08:30:40PM -1000, Tejun Heo wrote:
> sched_move_task() invokes ops.cgroup_move() inside task_rq_lock(tsk), so
> @p's rq lock is held. The SCX_CALL_OP_TASK invocation mislabels this:
>
> - kf_mask = SCX_KF_UNLOCKED (== 0), claiming no lock is held.
> - rq = NULL, so update_locked_rq() doesn't run and scx_locked_rq()
>   returns NULL.
>
> Switch to SCX_KF_REST and pass task_rq(p), matching ops.set_cpumask()
> from set_cpus_allowed_scx().
>
> Three effects:
>
> - scx_bpf_task_cgroup() becomes callable (was rejected by
>   scx_kf_allowed(__SCX_KF_RQ_LOCKED)). Safe; rq lock is held.
>
> - scx_bpf_dsq_move() is now rejected (was allowed via the unlocked
>   branch). Calling it while holding an unrelated task's rq lock is
>   risky; rejection is correct.
>
> - scx_bpf_select_cpu_*() previously took the unlocked branch in
>   select_cpu_from_kfunc() and called task_rq_lock(p, &rf), which
>   would deadlock against the already-held pi_lock. Now it takes the
>   locked-rq branch and is rejected with -EPERM via the existing
>   kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE) check. Latent
>   deadlock fix.
>
> No in-tree scheduler is known to call any of these from ops.cgroup_move().

As with the ops.set_cpumask() fix, maybe add:

Fixes: 18853ba782be ("sched_ext: Track currently locked rq")

With that:
Reviewed-by: Andrea Righi <arighi@xxxxxxxxxx>

Thanks,
-Andrea

>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> ---
> kernel/sched/ext.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 6ca0085903e0..f7db8822a544 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -4397,7 +4397,7 @@ void scx_cgroup_move_task(struct task_struct *p)
>  	 */
>  	if (SCX_HAS_OP(sch, cgroup_move) &&
>  	    !WARN_ON_ONCE(!p->scx.cgrp_moving_from))
> -		SCX_CALL_OP_TASK(sch, SCX_KF_UNLOCKED, cgroup_move, NULL,
> +		SCX_CALL_OP_TASK(sch, SCX_KF_REST, cgroup_move, task_rq(p),
>  				 p, p->scx.cgrp_moving_from,
>  				 tg_cgrp(task_group(p)));
>  	p->scx.cgrp_moving_from = NULL;
> --
> 2.53.0
>
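
For readers following the three effects listed in the quoted commit message,
here is a rough user-space model of the kf_mask gating. It is only a sketch
and not the kernel code: every toy_* name is hypothetical, the bit values
other than SCX_KF_UNLOCKED == 0 are assumptions, and the branch conditions
are simplified from the description above.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Toy model of the kf_mask checks described in the commit message.
 * Only "UNLOCKED == 0" comes from the message; the other bit values
 * are illustrative assumptions.
 */
enum toy_kf_mask {
	TOY_KF_UNLOCKED    = 0,
	TOY_KF_CPU_RELEASE = 1 << 0,
	TOY_KF_DISPATCH    = 1 << 1,
	TOY_KF_ENQUEUE     = 1 << 2,
	TOY_KF_SELECT_CPU  = 1 << 3,
	TOY_KF_REST        = 1 << 4,
	/* union of all contexts assumed to run with an rq lock held */
	TOY_KF_RQ_LOCKED   = TOY_KF_CPU_RELEASE | TOY_KF_DISPATCH |
			     TOY_KF_ENQUEUE | TOY_KF_SELECT_CPU |
			     TOY_KF_REST,
};

/* Marker: the modelled helper would go on to call task_rq_lock(). */
#define TOY_TAKES_RQ_LOCK	1

/* Allowed if the calling context has at least one of the wanted bits. */
static bool toy_kf_allowed(unsigned int cur, unsigned int wanted)
{
	return (cur & wanted) != 0;
}

/* Models scx_bpf_task_cgroup(): requires some rq-locked context. */
static int toy_task_cgroup(unsigned int cur)
{
	return toy_kf_allowed(cur, TOY_KF_RQ_LOCKED) ? 0 : -EPERM;
}

/* Models scx_bpf_dsq_move(): unlocked branch, else dispatch only (assumed). */
static int toy_dsq_move(unsigned int cur)
{
	if (cur == TOY_KF_UNLOCKED || toy_kf_allowed(cur, TOY_KF_DISPATCH))
		return 0;
	return -EPERM;
}

/*
 * Models scx_bpf_select_cpu_*(): the unlocked branch takes the rq lock
 * itself (the deadlock when the caller already holds it); the locked
 * branch only accepts select_cpu/enqueue contexts.
 */
static int toy_select_cpu(unsigned int cur)
{
	if (cur == TOY_KF_UNLOCKED)
		return TOY_TAKES_RQ_LOCK;
	if (!toy_kf_allowed(cur, TOY_KF_SELECT_CPU | TOY_KF_ENQUEUE))
		return -EPERM;
	return 0;
}
```

In this model, passing TOY_KF_UNLOCKED corresponds to the old
ops.cgroup_move() invocation and TOY_KF_REST to the patched one:
toy_task_cgroup() goes from -EPERM to 0, toy_dsq_move() goes from 0 to
-EPERM, and toy_select_cpu() goes from taking the rq lock to -EPERM.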