Re: [PATCH V3 1/4] Sched: Scheduler time slice extension
From: Prakash Sangappa
Date: Sun May 04 2025 - 21:43:21 EST
> On May 2, 2025, at 2:05 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, May 02, 2025 at 01:59:52AM +0000, Prakash Sangappa wrote:
>> Add support for a thread to request extending its execution time slice on
>> the cpu. The extra cpu time granted would help in allowing the thread to
>> complete executing the critical section and drop any locks without getting
>> preempted. The thread would request this cpu time extension, by setting a
>> bit in the restartable sequences(rseq) structure registered with the kernel.
>>
>> Kernel will grant a 50us extension on the cpu, when it sees the bit set.
>> With the help of a timer, kernel force preempts the thread if it is still
>> running on the cpu when the 50us timer expires. The thread should yield
>> the cpu by making a system call after completing the critical section.
>>
>> Suggested-by: Peter Ziljstra <peterz@xxxxxxxxxxxxx>
>> Signed-off-by: Prakash Sangappa <prakash.sangappa@xxxxxxxxxx>
>> ---
>
>> diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
>> index c233aae5eac9..900cb75f6a88 100644
>> --- a/include/uapi/linux/rseq.h
>> +++ b/include/uapi/linux/rseq.h
>> @@ -26,6 +26,7 @@ enum rseq_cs_flags_bit {
>> RSEQ_CS_FLAG_NO_RESTART_ON_PREEMPT_BIT = 0,
>> RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT = 1,
>> RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT = 2,
>> + RSEQ_CS_FLAG_DELAY_RESCHED_BIT = 3,
>> };
>>
>> enum rseq_cs_flags {
>> @@ -35,6 +36,8 @@ enum rseq_cs_flags {
>> (1U << RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL_BIT),
>> RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE =
>> (1U << RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE_BIT),
>> + RSEQ_CS_FLAG_DELAY_RESCHED =
>> + (1U << RSEQ_CS_FLAG_DELAY_RESCHED_BIT),
>> };
>>
>> /*
>> @@ -128,6 +131,10 @@ struct rseq {
>> * - RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE
>> * Inhibit instruction sequence block restart on migration for
>> * this thread.
>> + * - RSEQ_CS_DELAY_RESCHED
>> + * Request by user task to try delaying preemption. With
>> + * use of a timer, extra cpu time upto 50us is granted for this
>> + * thread before being rescheduled.
>> */
>> __u32 flags;
>
> So while it works for testing, this really is a rather crap interface
> for real because userspace cannot up front tell if its going to work or
> not.
>
>> +void rseq_delay_resched_fini(void)
>> +{
>> +#ifdef CONFIG_SCHED_HRTICK
>> + extern void hrtick_local_start(u64 delay);
>> + struct task_struct *t = current;
>> + /*
>> + * IRQs off, guaranteed to return to userspace, start timer on this CPU
>> + * to limit the resched-overdraft.
>> + *
>> + * If your critical section is longer than 50 us you get to keep the
>> + * pieces.
>> + */
>> + if (t->sched_time_delay)
>> + hrtick_local_start(50 * NSEC_PER_USEC);
>> +#endif
>> +}
>
> Should not the interface at least reflect this SCHED_HRTICK status? One,
> slightly hacky way of doing this might to be invert the bit. Have the
> system write a 1 when the feature is present, and have userspace flip it
> to 0 to activate.
>
> A better way might be to add a second bit.
Sure, the second bit would be set when the rseq structure is registered.
Or, could there be an API thru sys_rseq system call to query if this feature is
available? Something Mathieu seems to suggest in his response.
>
> Also, didn't we all agree 50us was overly optimistic and this number
> should be lower?
>
>> diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
>> index cd38f4e9899d..1b2b64fe0fb1 100644
>> --- a/kernel/sched/syscalls.c
>> +++ b/kernel/sched/syscalls.c
>> @@ -1378,6 +1378,11 @@ static void do_sched_yield(void)
>> */
>> SYSCALL_DEFINE0(sched_yield)
>> {
>> + if (IS_ENABLED(CONFIG_RSEQ) && current->sched_time_delay) {
>> + schedule();
>> + return 0;
>> + }
>> +
>> do_sched_yield();
>> return 0;
>> }
>
> Multiple people, very much including Linus, have already said this
> 'cute' hack isn't going to fly. Why is it still here?
I can remove it.
Which system call should be suggested for the application to
yield the cpu at the end of the critical section?
Thanks,
-Prakash