Re: [PATCH v2 1/1] sparc64: Fix comparator problem with timer interrupts

From: Tony Rodriguez

Date: Tue May 19 2026 - 19:27:14 EST


Hi Thomas,

Thanks again for the careful analysis of the comparator ordering. After applying the changes in the patch I confirmed S7‑2 and T7‑1 systems no longer hang; timer scheduling now behaves as expected.

I don’t have the bandwidth right now to finish the remaining cleanup steps, and I’d rather not delay this fix. Since we worked on this together and you’re familiar with the issue, would you be willing to take it over so upstream has everything required for approval? I’m happy to answer any questions; if you’re busy, I can ask the maintainer to reassign it.

Thanks again for the review and the clear explanation.

Tony

On 5/19/26 7:22 AM, Thomas Gleixner wrote:
> > On SPARC64 the check:
> >
> >     return ((long)(new_tick - (orig_tick + adj))) > 0L;
> >
> > is safe only if retries make forward progress. The comparator can
> > take effect with a latency, so the moment when counter == comparator
> > may be missed, which can cause delays or hangs on some SPARC64 systems.
> >
> > For clarity:
> >     exp = orig_tick + adj /* expected comparator value */
> >
> > The current check requires new_tick to be strictly greater than exp;
> > equality (new_tick == exp) is treated as not yet passed and the caller
> > will retry.
>
> That's confusing at best. You really want to explain how the ordering is
> similar to what I described in the analysis:
>
>     exp = read_cnt() + delta_ticks;
>     write_cmp(exp);
>     return (read_cnt() - exp) > 0;
>
> If the counter advanced past the expected expiry time, after writing it,
> then the caller will retry, as the calling code does:
>
>     return tick.add_compare(delta_ticks) ? -ETIME : 0;
>
> But it won't do so when the counter is equal, which is causing the
> problem.
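
For reference, here is a minimal user-space sketch of the ordering described above. It is not the kernel code: the fake counter, fake_reset(), fake_step and both add_compare_*() helpers are hypothetical names made up for illustration, and the "hardware" is modelled as a counter that simply advances by a fixed step on every read. It only shows that the strict check reports success when the second read lands exactly on the expiry value, while the inclusive check asks for a retry.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hardware counter and comparator. */
static uint64_t fake_cnt;   /* simulated free-running counter */
static uint64_t fake_step;  /* ticks that elapse on every counter read */
static uint64_t fake_cmp;   /* last value written to the comparator */

static void fake_reset(uint64_t start, uint64_t step)
{
	fake_cnt = start;
	fake_step = step;
	fake_cmp = 0;
}

static uint64_t read_cnt(void)
{
	uint64_t now = fake_cnt;

	fake_cnt += fake_step;  /* time passes between two reads */
	return now;
}

static void write_cmp(uint64_t exp)
{
	fake_cmp = exp;
}

/* Strict check: nonzero means "expiry already passed, please retry". */
static int add_compare_strict(uint64_t delta_ticks)
{
	uint64_t exp = read_cnt() + delta_ticks;

	write_cmp(exp);
	/* counter == exp is reported as success (0) here */
	return (int64_t)(read_cnt() - exp) > 0;
}

/* Inclusive check: counter == exp is also treated as "retry". */
static int add_compare_inclusive(uint64_t delta_ticks)
{
	uint64_t exp = read_cnt() + delta_ticks;

	write_cmp(exp);
	return (int64_t)(read_cnt() - exp) >= 0;
}
```

With fake_reset(100, 5) and delta_ticks = 5, the second read returns exactly exp (105): the strict variant returns 0 and the caller never retries, while the inclusive variant returns 1.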

> > By contrast, using:
> >
> >     return ((long)(new_tick - (orig_tick + adj))) >= 0L;
> >
> > causes the caller to stop retrying and assume the timer is scheduled;
> > both equality and greater-than are accepted (new_tick == exp or
> > new_tick > exp).
>
> It's the other way round. When counter >= expiry time, then the write is
> considered failed. If the counter has not yet reached expiry time,
> i.e. it is smaller, then it assumes the timer is scheduled.
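
To spell out the semantics of the result, here is a small stand-alone sketch of the predicate and its mapping to the caller's return code. The helper names expiry_reached() and program_result() are hypothetical and exist only for this illustration; the only things taken from the thread are the signed-difference comparison and the `? -ETIME : 0` mapping. The cast to a signed type is what keeps the comparison correct when the counter wraps. It assumes the usual two's-complement conversion that the kernel relies on as well.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Nonzero when the counter has reached or passed the expiry value,
 * i.e. the comparator write must be considered failed.
 * The signed difference stays correct across counter wraparound.
 */
static int expiry_reached(uint64_t new_tick, uint64_t exp)
{
	return (int64_t)(new_tick - exp) >= 0;
}

/*
 * Mirrors the shape of the calling code quoted above:
 * failed write -> -ETIME (caller retries), otherwise 0 (timer scheduled).
 */
static int program_result(uint64_t new_tick, uint64_t exp)
{
	return expiry_reached(new_tick, exp) ? -ETIME : 0;
}
```

For example, with exp = 105: a counter value of 104 gives 0 (scheduled), while 105 and 106 both give -ETIME (retry). The same holds when exp has wrapped to a small value and the counter is still just below the wrap point.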

> > Signed-off-by: Tony Rodriguez <unixpro1970@xxxxxxxxx>
>
> It would be nice to have a link to the original thread in the change log
> itself as that gives people quick access when they are wondering about
> this a year down the road.
>
> Thanks,
>
> tglx