Re: [PATCH] sched: restore timer_slack_ns when resetting RT policy on fork

From: Guanyou Chen

Date: Thu May 21 2026 - 02:38:17 EST


Hi Cunlong,

Thanks for looking at this.

You're right that this changes the behavior for non-RT tasks: the
child's default_timer_slack_ns would be whatever dup_task_struct()
copies (the parent's default_timer_slack_ns) rather than the parent's
timer_slack_ns.
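
To make the difference concrete, here is a rough userspace model of the
three variants under discussion (current mainline, this patch, and the
narrowed version you suggested). It is only a sketch: struct task_model,
model_fork() and the rt_or_dl flag are hypothetical stand-ins, not kernel
code, and I am assuming the narrowed variant keeps the sched_fork() hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two task_struct fields under discussion. */
struct task_model {
	uint64_t timer_slack_ns;
	uint64_t default_timer_slack_ns;
	bool rt_or_dl;	/* assumption: true if the task has an RT/DL policy */
};

enum fork_variant {
	FORK_CURRENT,	/* mainline: default = parent's effective slack */
	FORK_PATCHED,	/* this patch: keep what dup_task_struct() copied */
	FORK_NARROWED,	/* suggestion: only RT/DL parents keep the copy */
};

static struct task_model model_fork(const struct task_model *parent,
				    enum fork_variant v, bool reset_on_fork)
{
	/* dup_task_struct() copies both fields from the parent. */
	struct task_model child = *parent;

	/* copy_process() step. */
	switch (v) {
	case FORK_CURRENT:
		child.default_timer_slack_ns = parent->timer_slack_ns;
		break;
	case FORK_PATCHED:
		break;
	case FORK_NARROWED:
		if (!parent->rt_or_dl)
			child.default_timer_slack_ns = parent->timer_slack_ns;
		break;
	}

	/* sched_fork() step: policy reset for SCHED_RESET_ON_FORK. */
	if (parent->rt_or_dl && reset_on_fork) {
		child.rt_or_dl = false;
		/* Only the patched variants restore the effective slack. */
		if (v != FORK_CURRENT)
			child.timer_slack_ns = child.default_timer_slack_ns;
	}
	return child;
}
```

With a non-RT parent at slack=200000/default=50000, the model gives the
child default=200000 for the current and narrowed variants and
default=50000 for this patch. With an RT parent at slack=0/default=50000
and RESET_ON_FORK, both patched variants give the child slack=50000,
while the current code leaves it at 0.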

However, looking at the original commit 6976675d9404 that introduced
default_timer_slack_ns, its purpose is described as a "reset target" for
prctl(PR_SET_TIMERSLACK, 0). There's no documented requirement that
children should inherit the parent's effective slack as their default.
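
For anyone who wants to observe the reset target from userspace, a small
sketch along these lines should work. The helper names are made up for
illustration; the only real interfaces used are prctl(PR_SET_TIMERSLACK),
prctl(PR_GET_TIMERSLACK), fork() and a pipe. It assumes the caller runs
under a non-RT policy, since RT tasks report a slack of 0.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set the caller's timer slack and read it back; -1 on error. */
static long set_and_get_slack(unsigned long ns)
{
	if (prctl(PR_SET_TIMERSLACK, ns, 0UL, 0UL, 0UL) != 0)
		return -1;
	return prctl(PR_GET_TIMERSLACK, 0UL, 0UL, 0UL, 0UL);
}

/*
 * PR_SET_TIMERSLACK with 0 resets the slack to the task's default, so
 * reading it back reveals default_timer_slack_ns. Returns -1 on error.
 */
static long slack_reset_to_default(void)
{
	return set_and_get_slack(0UL);
}

/*
 * Fork a child, let it reset its slack to the default, and return the
 * value it reports through a pipe. Returns -1 on any failure.
 */
static long child_default_slack(void)
{
	int fds[2];
	long value = -1;
	pid_t pid;

	if (pipe(fds) != 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		long v = slack_reset_to_default();

		close(fds[0]);
		if (write(fds[1], &v, sizeof(v)) != (ssize_t)sizeof(v))
			_exit(1);
		_exit(0);
	}

	close(fds[1]);
	if (read(fds[0], &value, sizeof(value)) != (ssize_t)sizeof(value))
		value = -1;
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return value;
}
```

Calling set_and_get_slack(200000) in the parent and then
child_default_slack() shows which value the child resets to: the
parent's effective slack with the current fork code, or the parent's
default if the copy_process() override is removed.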

That said, I'm happy to narrow the fix to only handle the RT parent
case if maintainers prefer preserving the existing behavior. Will wait
for their input.

Thanks
Guanyou


On Thu, May 21, 2026 at 11:18, Cunlong Li <shenxiaogll@xxxxxxxxx> wrote:
>
> On Thu, May 21, 2026 at 10:52:50AM +0800, Guanyou.Chen wrote:
> > Commit ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values
> > for realtime tasks") sets timer_slack_ns to 0 for RT tasks in
> > __setscheduler_params(). However, when an RT task with SCHED_RESET_ON_FORK
> > creates child threads, the children inherit timer_slack_ns=0 from the
> > parent. sched_fork() resets the child's policy to SCHED_NORMAL but does
> > not restore timer_slack_ns, leaving the child permanently running with
> > zero slack.
> >
> > Additionally, init_task never initialized default_timer_slack_ns, so all
> > processes in the system have default_timer_slack_ns=0 inherited from init.
> > The original fork code masked this by using timer_slack_ns (50000) as the
> > source for default_timer_slack_ns. After ed4fb6d7ef68, RT tasks have
> > timer_slack_ns=0, exposing this latent bug.
> >
> > This causes unnecessary timer interrupts and increased power consumption,
> > as NORMAL threads with slack=0 prevent timer coalescing.
> >
> > Fix this by:
> > 1. Initializing default_timer_slack_ns=50000 in init_task.
> > 2. In copy_process(), removing the incorrect default_timer_slack_ns
> > override (dup_task_struct already copies both timer_slack_ns and
> > default_timer_slack_ns correctly from the parent).
> > 3. In sched_fork(), restoring timer_slack_ns from default_timer_slack_ns
> > when resetting from RT/DL to NORMAL policy.
> >
> > Before this fix (RT parent, RESET_ON_FORK, 32 child threads usleep(1)):
> > child slack=0, avg_sleep=38us, ~832K interrupts/s
> >
> > After this fix:
> > child slack=50000, avg_sleep=88us, ~363K interrupts/s
> >
> > Fixes: 6976675d9404 ("hrtimer: create a "timer_slack" field in the task struct")
> > Fixes: ed4fb6d7ef68 ("hrtimer: Use and report correct timerslack values for realtime tasks")
> > Reported-by: Qiaoting.Lin <linqiaoting@xxxxxxxxxx>
> > Signed-off-by: Guanyou.Chen <chenguanyou@xxxxxxxxxx>
> > Signed-off-by: Chunhui.Li <chunhui.li@xxxxxxxxxxxx>
> > ---
> > init/init_task.c | 1 +
> > kernel/fork.c | 2 --
> > kernel/sched/core.c | 1 +
> > 3 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/init/init_task.c b/init/init_task.c
> > index 5c838757fc10..57ff8dae9bfb 100644
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -170,6 +170,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
> > INIT_CPU_TIMERS(init_task)
> > .pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
> > .timer_slack_ns = 50000, /* 50 usec default slack */
> > + .default_timer_slack_ns = 50000, /* 50 usec default slack */
> > .thread_pid = &init_struct_pid,
> > .thread_node = LIST_HEAD_INIT(init_signals.thread_head),
> > #ifdef CONFIG_AUDIT
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 65113a304518..8358df80e11d 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -2133,8 +2133,6 @@ __latent_entropy struct task_struct *copy_process(
> > retval = -EAGAIN;
> > #endif
> >
> > - p->default_timer_slack_ns = current->timer_slack_ns;
> Hi Guanyou,
>
> This changes behavior for normal (non-RT) tasks. If a process calls
> prctl(PR_SET_TIMERSLACK, 200000) and then forks, the child currently
> gets default_timer_slack_ns=200000 (the parent's effective slack).
> With this removal, the child would get default_timer_slack_ns=50000
> (the parent's original default), so a subsequent PR_SET_TIMERSLACK(0)
> in the child would reset to a different value than before.
>
> I think the fix should be narrowed to only handle the RT parent case:
>
>     if (rt_or_dl_task_policy(current))
>             p->default_timer_slack_ns = current->default_timer_slack_ns;
>     else
>             p->default_timer_slack_ns = current->timer_slack_ns;
>
> Thanks
> > -
> > #ifdef CONFIG_PSI
> > p->psi_flags = 0;
> > #endif
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b7f77c165a6e..b1a241810ce0 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4649,6 +4649,7 @@ int sched_fork(u64 clone_flags, struct task_struct *p)
> > p->policy = SCHED_NORMAL;
> > p->static_prio = NICE_TO_PRIO(0);
> > p->rt_priority = 0;
> > + p->timer_slack_ns = p->default_timer_slack_ns;
> > } else if (PRIO_TO_NICE(p->static_prio) < 0)
> > p->static_prio = NICE_TO_PRIO(0);
> >
> > --
> > 2.34.1
> >