Re: [Question] Possible scheduler regression: occasional ~3ms wake-up to sched-switch delay in cyclictest on kernel 6.6 and 6.12
From: K Prateek Nayak
Date: Wed Apr 29 2026 - 23:21:32 EST
Hello Jiping,
On 4/30/2026 8:26 AM, Jiping Ma wrote:
> [You don't often get email from jiping.ma2@xxxxxxxxxxxxx. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> Hi,
>
> We are running cyclictest on x86 boards with standard (non-RT) kernels and observed an occasional but significant latency spike on kernel version 6.6 and 6.12.
>
> Issue:
> During cyclictest under CPU contention, we occasionally see a ~3ms delay from wake-up to sched-switch ion kernel 6.6 and kernel 6.12. Most of the time the latency is comparable to kernel 5.15, but these sporadic spikes significantly impact the max latency. This behavior is not observed on kernel 5.15 under the same conditions.
That seems equivalent of a base_slice so the cyclictest is perhaps
waking up with positive lag (time owed to the scheduler) and that is
the reason for that large tail latency.
>
> Test setup:
>
> Platform: x86
> Kernel type: std (non-RT)
> Cyclictest command: ./cyclictest -m -N --policy=SCHED_OTHER -t 1 -D 30m -a 3
> CPU load: 6 instances of a pure CPU calculation shell script (cal.sh) bound to CPU 3, running in background to create contention
> cal.sh (CPU stress script):
>
> #!/bin/sh
> x=1
> while :; do
> i=0
> while [ $i -lt 100000 ]; do
> x=$(( (x * 1103515245 + 12345) & 0x7fffffff ))
> i=$((i + 1))
> done
> done
>
> Results (max latency, ns):
> Kernel cyclictest max (ns)
> 5.15 53,461
> 6.6 251,926
Can you try running with:
echo NO_RUN_TO_PARITY > /sys/kernel/debug/sched/features
That should make preemption faster on v6.6.
> 6.12 110,861
At v6.12, the DELAY_DEQUEUE feature seems to have improved the tail
by decaying lag while keeping the task on the runqueue. You can
still try NO_RUN_TO_PARITY but the delay might still be close to the
granularity of the 1ms tick.
In v6.12, there is also PREEMPT_SHORT which would selectively disable
RUN_TO_PARITY if a task of shorter slice wakes up - you can also try
playing with that where you align the slice of cyclictest close to
its actual runtime.
Vincent was working on this exact same problem and there should be
a bunch of fixes lined up in v7.1-rc1 + few more being discussed in
the mailing list. In no particular order:
https://lore.kernel.org/lkml/177583163838.801717.18173795801708799547.tip-bot2@tip-bot2/
https://lore.kernel.org/lkml/177521940443.1647592.2249539659928731910.tip-bot2@tip-bot2/
https://lore.kernel.org/lkml/177736116339.3521451.16868557879769343251.tip-bot2@tip-bot2/
https://lore.kernel.org/lkml/20260429164102.1388139-1-vincent.guittot@xxxxxxxxxx/
https://lore.kernel.org/lkml/20260417171642.3539914-1-vincent.guittot@xxxxxxxxxx/
The commit log from first few patches should have the slice values
that Vincent was testing with.
>
> Key observation:
> The majority of cyclictest samples show latency similar to kernel 5.15. However, there are occasional spikes where the wake-up to sched-switch gap reaches ~3ms. This suggests the scheduler is intermittently failing to preempt the running tasks promptly when the cyclictest thread wakes up. Since all tasks (cyclictest + 6x cal.sh) are SCHED_OTHER on the same CPU, this may be related to changes in the EEVDF scheduler introduced in kernel 6.6.
>
> Questions:
> 1. Is this occasional preemption delay a known side effect of the EEVDF scheduler replacing CFS in kernel 6.6+?
> 2. Is this by design, or should it be considered a scheduler bug?
This is by design but there are workarounds to get around this by
setting a slice value and disabling RUN_TO_PARITY.
>
> 3ms delay kernel trace log:
> cal.sh-917 [003] d.h2. 1667.138647: sched_waking: comm=cyclictest pid=943 prio=120 target_cpu=003
> cal.sh-917 [003] d.h3. 1667.138648: sched_stat_runtime: comm=cal.sh pid=917 runtime=101740 [ns]
> cal.sh-917 [003] d.h3. 1667.138649: sched_wakeup: comm=cyclictest pid=943 prio=120 target_cpu=003
> cal.sh-917 [003] d.h.. 1667.138649: hrtimer_expire_exit: hrtimer=00000000c56c84b4
> cal.sh-917 [003] d.h.. 1667.138649: write_msr: 6e0, value 42dd8d2cf32
> cal.sh-917 [003] d.h.. 1667.138649: local_timer_exit: vector=236
> cal.sh-917 [003] d.h.. 1667.139540: local_timer_entry: vector=236
> cal.sh-917 [003] d.h.. 1667.139540: hrtimer_expire_entry: hrtimer=000000007fd337f4 function=tick_sched_timer now=1665335000684
> cal.sh-917 [003] d.h1. 1667.139541: notifier_run: pvclock_gtod_notify [kvm]
> cal.sh-917 [003] d.h.. 1667.139541: rcu_utilization: Start scheduler-tick
> cal.sh-917 [003] d.h.. 1667.139541: rcu_utilization: End scheduler-tick
> cal.sh-917 [003] d.h.. 1667.139542: read_msr: e8, value 4892b7c3d38
> cal.sh-917 [003] d.h.. 1667.139542: read_msr: e7, value 38088ff481e
> cal.sh-917 [003] d.h1. 1667.139542: sched_stat_runtime: comm=cal.sh pid=917 runtime=894430 [ns]
> cal.sh-917 [003] d.h.. 1667.139543: hrtimer_expire_exit: hrtimer=000000007fd337f4
> cal.sh-917 [003] d.h1. 1667.139543: hrtimer_start: hrtimer=000000007fd337f4 function=tick_sched_timer expires=1665336000000 softexpires=1665336000000 mode=ABS
> cal.sh-917 [003] d.h.. 1667.139543: write_msr: 6e0, value 42dd8fc30e4
> cal.sh-917 [003] d.h.. 1667.139543: local_timer_exit: vector=236
> cal.sh-917 [003] d.h.. 1667.140540: local_timer_entry: vector=236
> cal.sh-917 [003] d.h.. 1667.140540: hrtimer_expire_entry: hrtimer=000000007fd337f4 function=tick_sched_timer now=1665336000925
> cal.sh-917 [003] d.h1. 1667.140541: notifier_run: pvclock_gtod_notify [kvm]
> cal.sh-917 [003] d.h.. 1667.140541: rcu_utilization: Start scheduler-tick
> cal.sh-917 [003] d.h.. 1667.140541: rcu_utilization: End scheduler-tick
> cal.sh-917 [003] d.h.. 1667.140542: read_msr: e8, value 4892bb18743
> cal.sh-917 [003] d.h.. 1667.140542: read_msr: e7, value 380892863df
> cal.sh-917 [003] d.h1. 1667.140542: sched_stat_runtime: comm=cal.sh pid=917 runtime=1000134 [ns]
> cal.sh-917 [003] d.h.. 1667.140543: hrtimer_expire_exit: hrtimer=000000007fd337f4
> cal.sh-917 [003] d.h1. 1667.140543: hrtimer_start: hrtimer=000000007fd337f4 function=tick_sched_timer expires=1665337000000 softexpires=1665337000000 mode=ABS
> cal.sh-917 [003] d.h.. 1667.140543: write_msr: 6e0, value 42dd92592a6
> cal.sh-917 [003] d.h.. 1667.140544: local_timer_exit: vector=236
> cal.sh-917 [003] d.h.. 1667.141540: local_timer_entry: vector=236
> cal.sh-917 [003] d.h.. 1667.141540: hrtimer_expire_entry: hrtimer=000000007fd337f4 function=tick_sched_timer now=1665337000748
> cal.sh-917 [003] d.h1. 1667.141541: notifier_run: pvclock_gtod_notify [kvm]
> cal.sh-917 [003] d.h.. 1667.141541: rcu_utilization: Start scheduler-tick
> cal.sh-917 [003] d.h.. 1667.141541: softirq_raise: vec=9 [action=RCU]
> cal.sh-917 [003] d.h.. 1667.141541: rcu_utilization: End scheduler-tick
> cal.sh-917 [003] d.h.. 1667.141542: read_msr: e8, value 4892be6cf1e
> cal.sh-917 [003] d.h.. 1667.141542: read_msr: e7, value 38089517df9
> cal.sh-917 [003] dNh1. 1667.141542: sched_stat_runtime: comm=cal.sh pid=917 runtime=999991 [ns]
^
Since NEED_RESCHED is only set here, it is safe to assume cal.sh was
likely kept running by RUN_TO_PARITY for 3ms after it was selected which
corresponds to the default base_slice.
> cal.sh-917 [003] dNh.. 1667.141543: hrtimer_expire_exit: hrtimer=000000007fd337f4
> cal.sh-917 [003] dNh1. 1667.141543: hrtimer_start: hrtimer=000000007fd337f4 function=tick_sched_timer expires=1665338000000 softexpires=1665338000000 mode=ABS
> cal.sh-917 [003] dNh.. 1667.141543: write_msr: 6e0, value 42dd94ef44f
> cal.sh-917 [003] dNh.. 1667.141543: local_timer_exit: vector=236
> cal.sh-917 [003] .Ns.. 1667.141544: softirq_entry: vec=9 [action=RCU]
> cal.sh-917 [003] .Ns.. 1667.141544: rcu_utilization: Start RCU core
> cal.sh-917 [003] dNs1. 1667.141544: rcu_grace_period: rcu_preempt 589121 cpuend
> cal.sh-917 [003] dNs.. 1667.141544: rcu_batch_start: rcu_preempt CBs=1 bl=10
> cal.sh-917 [003] dNs.. 1667.141545: rcu_segcb_stats: SegCbDequeued seglen: (DONE=0, WAIT=0, NEXT_READY=0, NEXT=0) gp_seq: (DONE=0, WAIT=589124, NEXT_READY=589124, NEXT=0)
> cal.sh-917 [003] .Ns.. 1667.141545: rcu_invoke_callback: rcu_preempt rhp=00000000be314949 func=file_free_rcu
> cal.sh-917 [003] .Ns.. 1667.141545: kmem_cache_free: call_site=file_free_rcu+0x35/0x60 ptr=00000000be314949 name=kmalloc-cg-256
> cal.sh-917 [003] dNs.. 1667.141545: rcu_batch_end: rcu_preempt CBs-invoked=1 idle=.S..
> cal.sh-917 [003] .Ns.. 1667.141545: rcu_utilization: End RCU core
> cal.sh-917 [003] .Ns.. 1667.141546: softirq_exit: vec=9 [action=RCU]
> cal.sh-917 [003] dN.1. 1667.141546: rcu_utilization: Start context switch
> cal.sh-917 [003] dN.1. 1667.141546: rcu_utilization: End context switch
> cal.sh-917 [003] dN.2. 1667.141546: sched_stat_runtime: comm=cal.sh pid=917 runtime=4083 [ns]
> cal.sh-917 [003] d..2. 1667.141547: sched_switch: prev_comm=cal.sh prev_pid=917 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=943 next_prio=120
--
Thanks and Regards,
Prateek