Re: [PATCH] sched: restore timer_slack_ns when resetting RT policy on fork

From: Peter Zijlstra

Date: Thu May 21 2026 - 03:05:40 EST

On Thu, May 21, 2026 at 10:52:50AM +0800, Guanyou.Chen wrote:
> diff --git a/init/init_task.c b/init/init_task.c
> index 5c838757fc10..57ff8dae9bfb 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -170,6 +170,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
> INIT_CPU_TIMERS(init_task)
> .pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
> .timer_slack_ns = 50000, /* 50 usec default slack */
> + .default_timer_slack_ns = 50000, /* 50 usec default slack */
> .thread_pid = &init_struct_pid,
> .thread_node = LIST_HEAD_INIT(init_signals.thread_head),
> #ifdef CONFIG_AUDIT
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 65113a304518..8358df80e11d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2133,8 +2133,6 @@ __latent_entropy struct task_struct *copy_process(
> retval = -EAGAIN;
> #endif
>
> - p->default_timer_slack_ns = current->timer_slack_ns;
> -
> #ifdef CONFIG_PSI
> p->psi_flags = 0;
> #endif

Cunlong makes a good point in that this changes behaviour. That said, I
do find the current behaviour 'odd'.

*IF* we want to change this (and changing behaviour is always dodgy),
then it should be a separate patch with a separate justification.
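
For reference, the behaviour in question can be observed from userspace.
The sketch below is only an illustration, not part of the patch: the
helper name child_default_slack_ns() is made up, and it relies on
prctl(2) documenting that PR_SET_TIMERSLACK with an argument of 0 resets
the current slack to the thread's default slack. With the assignment in
copy_process() in place, the child's default should be whatever the
parent's current slack was at fork time.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the caller's timer slack to parent_slack_ns,
 * fork, and let the child reset its slack to its own default.
 * Returns the slack the child ends up with (in ns), or -1 on error.
 */
long child_default_slack_ns(unsigned long parent_slack_ns)
{
	int fds[2];
	long slack = -1;
	pid_t pid;

	if (prctl(PR_SET_TIMERSLACK, parent_slack_ns) != 0)
		return -1;
	if (pipe(fds) != 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		long v = -1;

		close(fds[0]);
		/* An argument of 0 means "reset current slack to the default". */
		if (prctl(PR_SET_TIMERSLACK, 0UL) == 0)
			v = prctl(PR_GET_TIMERSLACK);
		if (write(fds[1], &v, sizeof(v)) != (ssize_t)sizeof(v))
			_exit(1);
		_exit(0);
	}

	close(fds[1]);
	if (read(fds[0], &slack, sizeof(slack)) != (ssize_t)sizeof(slack))
		slack = -1;
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return slack;
}
```

Calling child_default_slack_ns(123456) from a non-RT task should report
the parent's value back, i.e. the child's "default" is not the 50 usec
from init_task but whatever the parent happened to be running with.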

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b7f77c165a6e..b1a241810ce0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4649,6 +4649,7 @@ int sched_fork(u64 clone_flags, struct task_struct *p)
> p->policy = SCHED_NORMAL;
> p->static_prio = NICE_TO_PRIO(0);
> p->rt_priority = 0;
> + p->timer_slack_ns = p->default_timer_slack_ns;
> } else if (PRIO_TO_NICE(p->static_prio) < 0)
> p->static_prio = NICE_TO_PRIO(0);

Yes, this matches __setscheduler_param(). And yes, this wants to be
done.
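
To spell out what "matches" means here, below is a small userspace
model, not kernel code. All names (model_task, model_setscheduler,
model_sched_fork_reset) are hypothetical stand-ins. It assumes, as I
read the policy-change path, that a task switched to an RT policy gets
zero slack and gets default_timer_slack_ns back when it leaves RT, and
it assumes the child's default_timer_slack_ns holds a non-zero value by
the time the reset-on-fork branch runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
enum model_policy { MODEL_NORMAL, MODEL_FIFO };

struct model_task {
	enum model_policy policy;
	bool reset_on_fork;
	uint64_t timer_slack_ns;
	uint64_t default_timer_slack_ns;
};

/* Model of a policy change: RT runs without slack, leaving RT restores it. */
void model_setscheduler(struct model_task *t, enum model_policy policy)
{
	t->policy = policy;
	if (policy == MODEL_FIFO)
		t->timer_slack_ns = 0;
	else if (t->timer_slack_ns == 0)
		t->timer_slack_ns = t->default_timer_slack_ns;
}

/*
 * Model of the reset-on-fork branch applied to an already copied child.
 * with_patch selects whether the slack is restored along with the policy.
 */
void model_sched_fork_reset(struct model_task *child, bool with_patch)
{
	if (!child->reset_on_fork)
		return;

	if (child->policy == MODEL_FIFO) {
		child->policy = MODEL_NORMAL;
		if (with_patch)
			child->timer_slack_ns = child->default_timer_slack_ns;
	}
	child->reset_on_fork = false;
}
```

In this model, a FIFO parent with a 50000 ns default yields a
SCHED_NORMAL child that still has zero slack when with_patch is false,
and a child with 50000 ns when it is true.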

Anyway, while looking at all this I found that the manpages specify
RESET_ON_FORK to apply to CAP_SYS properties, which is a tad awkward,
especially if we end up allowing unprivileged access to DL (or even
FIFO/RR when isolated in a bandwidth group).

Additionally, it doesn't look like PR_SET_TIMERSLACK is CAP_SYS guarded
itself, so this is all a bit of a mess.
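
A quick way to check the unguarded part from userspace is something
like the sketch below. The helper name set_and_read_timer_slack() is
made up for illustration; the prctl() options are the documented
PR_SET_TIMERSLACK and PR_GET_TIMERSLACK. This assumes the caller runs
under a non-RT policy.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: set the calling thread's timer slack and read it
 * back. Returns the slack in ns as reported by the kernel, or -1 if the
 * set operation failed.
 */
long set_and_read_timer_slack(unsigned long slack_ns)
{
	if (prctl(PR_SET_TIMERSLACK, slack_ns) != 0)
		return -1;

	return prctl(PR_GET_TIMERSLACK);
}
```

Run as an ordinary user without any capabilities, a call such as
set_and_read_timer_slack(200000) is expected to succeed and return the
value that was just set.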