Re: [Question] Possible scheduler regression: occasional ~3ms wake-up to sched-switch delay in cyclictest on kernel 6.6 and 6.12
From: Jiping Ma
Date: Thu Apr 30 2026 - 00:43:01 EST
Thanks quickly reply. The test results get well after the up command.
./cyclictest -m -N --policy=SCHED_OTHER -t 1 -D 30m -a 3
T: 0 ( 736) P: 0 I:1000 C:1776085 Min: 3135 Act: 53535 Avg: 77421 Max: 6946557
Thanks,
Jiping
>
>That should make preemption faster on v6.6.
>
>> 6.12 110,861
>
>At v6.12, the DELAY_DEQUEUE feature seems to have improved the tail
>by decaying lag while keeping the task on the runqueue. You can
>still try NO_RUN_TO_PARITY but the delay might still be close to the
>granularity of the 1ms tick.
>
>In v6.12, there is also PREEMPT_SHORT which would selectively disable
>RUN_TO_PARITY if a task of shorter slice wakes up - you can also try
>playing with that where you align the slice of cyclictest close to
>its actual runtime.
>
>Vincent was working on this exact same problem and there should be
>a bunch of fixes lined up in v7.1-rc1 + few more being discussed in
>the mailing list. In no particular order:
>
>https://lore.kernel.org/lkml/177583163838.801717.18173795801708799547.tip-bot2@tip-bot2/
>https://lore.kernel.org/lkml/177521940443.1647592.2249539659928731910.tip-bot2@tip-bot2/
>https://lore.kernel.org/lkml/177736116339.3521451.16868557879769343251.tip-bot2@tip-bot2/
>https://lore.kernel.org/lkml/20260429164102.1388139-1-vincent.guittot@xxxxxxxxxx/
>https://lore.kernel.org/lkml/20260417171642.3539914-1-vincent.guittot@xxxxxxxxxx/
>
>The commit log from first few patches should have the slice values
>that Vincent was testing with.
>
>>
>> Key observation:
>> The majority of cyclictest samples show latency similar to kernel 5.15. However, there are occasional spikes where the wake-up to sched-switch gap reaches ~3ms. This suggests the scheduler is intermittently failing to preempt the running tasks promptly when the cyclictest thread wakes up. Since all tasks (cyclictest + 6x cal.sh) are SCHED_OTHER on the same CPU, this may be related to changes in the EEVDF scheduler introduced in kernel 6.6.
>>
>> Questions:
>> 1. Is this occasional preemption delay a known side effect of the EEVDF scheduler replacing CFS in kernel 6.6+?
>> 2. Is this by design, or should it be considered a scheduler bug?
>
>This is by design but there are workarounds to get around this by
>setting a slice value and disabling RUN_TO_PARITY.
>
>>
>> 3ms delay kernel trace log:
>> cal.sh-917 [003] d.h2. 1667.138647: sched_waking: comm=cyclictest pid=943 prio=120 target_cpu=003
>> cal.sh-917 [003] d.h3. 1667.138648: sched_stat_runtime: comm=cal.sh pid=917 runtime=101740 [ns]
>> cal.sh-917 [003] d.h3. 1667.138649: sched_wakeup: comm=cyclictest pid=943 prio=120 target_cpu=003
>> cal.sh-917 [003] d.h.. 1667.138649: hrtimer_expire_exit: hrtimer=00000000c56c84b4
>> cal.sh-917 [003] d.h.. 1667.138649: write_msr: 6e0, value 42dd8d2cf32
>> cal.sh-917 [003] d.h.. 1667.138649: local_timer_exit: vector=236
>> cal.sh-917 [003] d.h.. 1667.139540: local_timer_entry: vector=236
>> cal.sh-917 [003] d.h.. 1667.139540: hrtimer_expire_entry: hrtimer=000000007fd337f4 function=tick_sched_timer now=1665335000684
>> cal.sh-917 [003] d.h1. 1667.139541: notifier_run: pvclock_gtod_notify [kvm]
>> cal.sh-917 [003] d.h.. 1667.139541: rcu_utilization: Start scheduler-tick
>> cal.sh-917 [003] d.h.. 1667.139541: rcu_utilization: End scheduler-tick
>> cal.sh-917 [003] d.h.. 1667.139542: read_msr: e8, value 4892b7c3d38
>> cal.sh-917 [003] d.h.. 1667.139542: read_msr: e7, value 38088ff481e
>> cal.sh-917 [003] d.h1. 1667.139542: sched_stat_runtime: comm=cal.sh pid=917 runtime=894430 [ns]
>> cal.sh-917 [003] d.h.. 1667.139543: hrtimer_expire_exit: hrtimer=000000007fd337f4
>> cal.sh-917 [003] d.h1. 1667.139543: hrtimer_start: hrtimer=000000007fd337f4 function=tick_sched_timer expires=1665336000000 softexpires=1665336000000 mode=ABS
>> cal.sh-917 [003] d.h.. 1667.139543: write_msr: 6e0, value 42dd8fc30e4
>> cal.sh-917 [003] d.h.. 1667.139543: local_timer_exit: vector=236
>> cal.sh-917 [003] d.h.. 1667.140540: local_timer_entry: vector=236
>> cal.sh-917 [003] d.h.. 1667.140540: hrtimer_expire_entry: hrtimer=000000007fd337f4 function=tick_sched_timer now=1665336000925
>> cal.sh-917 [003] d.h1. 1667.140541: notifier_run: pvclock_gtod_notify [kvm]
>> cal.sh-917 [003] d.h.. 1667.140541: rcu_utilization: Start scheduler-tick
>> cal.sh-917 [003] d.h.. 1667.140541: rcu_utilization: End scheduler-tick
>> cal.sh-917 [003] d.h.. 1667.140542: read_msr: e8, value 4892bb18743
>> cal.sh-917 [003] d.h.. 1667.140542: read_msr: e7, value 380892863df
>> cal.sh-917 [003] d.h1. 1667.140542: sched_stat_runtime: comm=cal.sh pid=917 runtime=1000134 [ns]
>> cal.sh-917 [003] d.h.. 1667.140543: hrtimer_expire_exit: hrtimer=000000007fd337f4
>> cal.sh-917 [003] d.h1. 1667.140543: hrtimer_start: hrtimer=000000007fd337f4 function=tick_sched_timer expires=1665337000000 softexpires=1665337000000 mode=ABS
>> cal.sh-917 [003] d.h.. 1667.140543: write_msr: 6e0, value 42dd92592a6
>> cal.sh-917 [003] d.h.. 1667.140544: local_timer_exit: vector=236
>> cal.sh-917 [003] d.h.. 1667.141540: local_timer_entry: vector=236
>> cal.sh-917 [003] d.h.. 1667.141540: hrtimer_expire_entry: hrtimer=000000007fd337f4 function=tick_sched_timer now=1665337000748
>> cal.sh-917 [003] d.h1. 1667.141541: notifier_run: pvclock_gtod_notify [kvm]
>> cal.sh-917 [003] d.h.. 1667.141541: rcu_utilization: Start scheduler-tick
>> cal.sh-917 [003] d.h.. 1667.141541: softirq_raise: vec=9 [action=RCU]
>> cal.sh-917 [003] d.h.. 1667.141541: rcu_utilization: End scheduler-tick
>> cal.sh-917 [003] d.h.. 1667.141542: read_msr: e8, value 4892be6cf1e
>> cal.sh-917 [003] d.h.. 1667.141542: read_msr: e7, value 38089517df9
>> cal.sh-917 [003] dNh1. 1667.141542: sched_stat_runtime: comm=cal.sh pid=917 runtime=999991 [ns]
> ^
>Since NEED_RESCHED is only set here, it is safe to assume cal.sh was
>likely kept running by RUN_TO_PARITY for 3ms after it was selected which
>corresponds to the default base_slice.
>
>> cal.sh-917 [003] dNh.. 1667.141543: hrtimer_expire_exit: hrtimer=000000007fd337f4
>> cal.sh-917 [003] dNh1. 1667.141543: hrtimer_start: hrtimer=000000007fd337f4 function=tick_sched_timer expires=1665338000000 softexpires=1665338000000 mode=ABS
>> cal.sh-917 [003] dNh.. 1667.141543: write_msr: 6e0, value 42dd94ef44f
>> cal.sh-917 [003] dNh.. 1667.141543: local_timer_exit: vector=236
>> cal.sh-917 [003] .Ns.. 1667.141544: softirq_entry: vec=9 [action=RCU]
>> cal.sh-917 [003] .Ns.. 1667.141544: rcu_utilization: Start RCU core
>> cal.sh-917 [003] dNs1. 1667.141544: rcu_grace_period: rcu_preempt 589121 cpuend
>> cal.sh-917 [003] dNs.. 1667.141544: rcu_batch_start: rcu_preempt CBs=1 bl=10
>> cal.sh-917 [003] dNs.. 1667.141545: rcu_segcb_stats: SegCbDequeued seglen: (DONE=0, WAIT=0, NEXT_READY=0, NEXT=0) gp_seq: (DONE=0, WAIT=589124, NEXT_READY=589124, NEXT=0)
>> cal.sh-917 [003] .Ns.. 1667.141545: rcu_invoke_callback: rcu_preempt rhp=00000000be314949 func=file_free_rcu
>> cal.sh-917 [003] .Ns.. 1667.141545: kmem_cache_free: call_site=file_free_rcu+0x35/0x60 ptr=00000000be314949 name=kmalloc-cg-256
>> cal.sh-917 [003] dNs.. 1667.141545: rcu_batch_end: rcu_preempt CBs-invoked=1 idle=.S..
>> cal.sh-917 [003] .Ns.. 1667.141545: rcu_utilization: End RCU core
>> cal.sh-917 [003] .Ns.. 1667.141546: softirq_exit: vec=9 [action=RCU]
>> cal.sh-917 [003] dN.1. 1667.141546: rcu_utilization: Start context switch
>> cal.sh-917 [003] dN.1. 1667.141546: rcu_utilization: End context switch
>> cal.sh-917 [003] dN.2. 1667.141546: sched_stat_runtime: comm=cal.sh pid=917 runtime=4083 [ns]
>> cal.sh-917 [003] d..2. 1667.141547: sched_switch: prev_comm=cal.sh prev_pid=917 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=943 next_prio=120
>--
>Thanks and Regards,
>Prateek
>