Re: the stuttering regression in 7.0: should I have done something different

From: Thomas Gleixner

Date: Thu May 14 2026 - 06:24:56 EST


On Thu, May 14 2026 at 00:24, Tony Rodriguez wrote:
> Initial validation of the test patches for v7.0.6 and 7.1-rc3 on the
> S7-2 looks promising: I have not observed panics, timer delays, or other
> timer-related issues so far. I will pause broader validation on the S7-2
> and T7-1 until I receive your recommendation or any requested revisions
> (see inline comments below).
>
> Note: I did see an intermittent error on the S7-2 running 7.1-rc3,
> usually when the system is under heavy load during a kernel build. I’m
> not sure whether it is a separate problem?
>
> "[676.464681] BUG: Bad rss-counter state mm:000000008d9f1cf2
> type:MM_FILEPAGES val:-4096 Comm:cc1 Pid:78165".

That's unrelated and an accounting issue in the MM code. Please report
it separately to the MM people.

> On 5/13/26 1:28 PM, Thomas Gleixner wrote:
>> I'm willing to bet a round of beers at the next conference that this is
>> the problem and that it will magically disappear when you change that
>> condition to:
>>
>> return (read_cnt() - exp) >= 0 ? -ETIME : 0;
>
> Attempted to locate "return (read_cnt() - exp) >= 0 ? -ETIME : 0;" but
> could not find an exact match. After additional inspection I updated the
> functions "tick_add_compare()" and "stick_add_compare()" in
> arch/sparc/kernel/time_64.c from "> 0L" to ">= 0L". This appears to
> have resolved the lost-timer behavior.

I condensed the logic for illustration and rightly assumed that you
would figure it out. :)

> --- time_64.c.orig
> +++ time_64.c
> @@ -146,7 +146,7 @@
>                              : "=r" (new_tick));
>         new_tick &= ~TICKCMP_IRQ_BIT;
>
> -       return ((long)(new_tick - (orig_tick+adj))) > 0L;
> +       return ((long)(new_tick - (orig_tick+adj))) >= 0L;
>  }
>
>  static unsigned long tick_add_tick(unsigned long adj)
> @@ -277,7 +277,7 @@
>                              : "=r" (new_tick));
>         new_tick &= ~TICKCMP_IRQ_BIT;
>
> -       return ((long)(new_tick - (orig_tick+adj))) > 0L;
> +       return ((long)(new_tick - (orig_tick+adj))) >= 0L;
>  }

Looks correct, but you missed the one in hbtick_add_compare() which has
the same issue.
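
The difference between the two conditions can be illustrated with a
small user-space model. This is only a sketch, not the sparc code: the
helper names are made up for illustration, and it assumes that a
readback equal to the programmed compare value has to be treated as
"already expired" because the match interrupt may not fire anymore.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the check at the end of tick_add_compare().
 * new_tick is the counter value read back after programming the
 * compare register, target is orig_tick + adj. The subtraction is
 * done unsigned and the result interpreted as signed, so the check
 * stays correct across a counter wraparound (two's complement
 * conversion assumed, as in the kernel).
 */
static int missed_strict(uint64_t new_tick, uint64_t target)
{
	/* Old condition: a readback equal to target counts as "not missed" */
	return (int64_t)(new_tick - target) > 0;
}

static int missed_inclusive(uint64_t new_tick, uint64_t target)
{
	/* New condition: a readback equal to target counts as "missed" */
	return (int64_t)(new_tick - target) >= 0;
}

/* Small self-check which can be called from any main() */
static void tick_compare_selftest(void)
{
	/* Counter still before the target: both variants agree */
	assert(!missed_strict(99, 100) && !missed_inclusive(99, 100));
	/* Counter already past the target: both variants agree */
	assert(missed_strict(101, 100) && missed_inclusive(101, 100));
	/* Counter equals the target: only the inclusive check reports it */
	assert(!missed_strict(100, 100) && missed_inclusive(100, 100));
	/* Target wrapped around to a small value, counter not there yet */
	assert(!missed_inclusive(UINT64_MAX - 2, 5));
}
```

The two variants differ only in the equality case, which is exactly
the case the "> 0L" condition lets slip through.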

>> --- a/kernel/time/clockevents.c
>> +++ b/kernel/time/clockevents.c
>> @@ -381,6 +381,8 @@ int clockevents_program_event(struct clo
>>          if (dev->set_next_event(dev->min_delta_ticks, dev)) {
>>                  if (!force || clockevents_program_min_delta(dev))
>>                          return -ETIME;
>> +        } else if (delta <= 0) {
>> +                dev->next_event = ktime_add_ns(ktime_get(), dev->min_delta_ns);
>>          }
>>          dev->next_event_forced = 1;
>>          return 0;
>>
> You mentioned this kernel/time/clockevents.c patch is optional, but I
> propose revising clockevents_program_event(). If the requested event
> time is already at or before now, record a sane next_event (now +
> min_delta) so core code sees a future expected time and can behave
> correctly. Does this seem reasonable?

The related core code only cares what the last programmed expiry value
in clock monotonic (i.e. the @expires argument) was. And the only
interesting information is whether it's in the future or not. If it's in
the past then it does not matter how much in the past it is.

Whatever we fake into it is never going to reflect anything related to
reality anyway and there is no guarantee that the code which reads it
will see a future expected time depending on the time elapsed between
faking it and reading it. So it's truly a cosmetic exercise for no real
value.
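
To put that in code, here is a minimal user-space sketch. The type and
function names are invented for illustration and are not the
clockevents API; the only assumption is that the reader of next_event
asks nothing more than "is it in the future?".

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ktime_ns;	/* hypothetical stand-in for ktime_t */

/* The only question the reader of next_event asks */
static int expiry_in_future(ktime_ns next_event, ktime_ns now)
{
	return next_event > now;
}

/* The proposed fixup: pretend the event expires min_delta from now */
static ktime_ns fake_next_event(ktime_ns now, ktime_ns min_delta_ns)
{
	return now + min_delta_ns;
}

/* Small self-check which can be called from any main() */
static void next_event_selftest(void)
{
	ktime_ns faked = fake_next_event(1000, 50);

	/* Read shortly after faking: the value looks like a future expiry */
	assert(expiry_in_future(faked, 1020));
	/* Read more than min_delta later: it is in the past again */
	assert(!expiry_in_future(faked, 1100));
	/* Same answer as for the original, unfaked past expiry */
	assert(!expiry_in_future(900, 1100));
}
```

Whether the reader sees "future" depends purely on when it looks, which
the fixup cannot control.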

Thanks,

tglx