Re: [REGRESSION] osnoise: "eventpoll: Replace rwlock with spinlock" causes ~50µs noise spikes on isolated PREEMPT_RT cores
From: Crystal Wood
Date: Thu Mar 26 2026 - 14:14:57 EST

On Thu, 2026-03-26 at 16:00 +0200, Ionut Nechita (Wind River) wrote:
> Hi,
>
> I'm reporting a regression introduced by commit 0c43094f8cc9
> ("eventpoll: Replace rwlock with spinlock"), backported to stable 6.12.y.
>
> On a PREEMPT_RT system with nohz_full isolated cores, this commit causes
> significant osnoise degradation on the isolated CPUs.
>
> Setup:
> - Kernel: 6.12.78 with PREEMPT_RT
> - Hardware: x86_64, dual-socket (CPUs 0-63)
> - Boot params: nohz_full=1-16,33-48 isolcpus=nohz,domain,managed_irq,1-16,33-48
>   rcu_nocbs=1-31,33-63 kthread_cpus=0,32 irqaffinity=17-31,49-63
> - Tool: osnoise tracer (./osnoise -c 1-16,33-48)

Is SMT disabled?

> With commit applied (spinlock, kernel 6.12.78-vanilla-0):
>
> CPU    RUNTIME  MAX_NOISE    AVAIL%  NOISE  NMI   IRQ  SIRQ  Thread
> [001]   950000      50163   94.719%     14    0  6864     0    5922
> [004]   950000      50294   94.705%     14    0  6864     0    5920
> [007]   950000      49782   94.759%     14    0  6864     1    5921
> [033]   950000      49528   94.786%     15    0  6864     2    5922
> [016]   950000      48551   94.889%     20    0  6863    19    5942
> [008]   950000      44343   95.332%     14    0  6864     0    5925
>
> With commit reverted (rwlock restored, kernel 6.12.78-vanilla-1):
>
> CPU    RUNTIME  MAX_NOISE    AVAIL%  NOISE  NMI   IRQ  SIRQ  Thread
> [001]   950000          0  100.000%      0    0     6     0       0
> [004]   950000          0  100.000%      0    0     4     0       0
> [007]   950000          0  100.000%      0    0     4     0       0
> [033]   950000          0  100.000%      0    0     4     0       0
> [016]   950000          0  100.000%      0    0     5     0       0
> [008]   950000          7   99.999%      7    0     5     0       0
>
> Summary across all isolated cores (32 CPUs):
>
>                        With spinlock      With rwlock (reverted)
> MAX noise (ns):        44,343 - 51,869    0 - 10
> IRQ count/sample:      ~6,650 - 6,870     3 - 7
> Thread noise/sample:   ~5,700 - 5,940     0 - 1
> CPU availability:      94.5% - 95.3%      ~100%
>
> The regression is roughly 3 orders of magnitude in noise on isolated
> cores. The test was run over many consecutive samples and the pattern
> is consistent: with the spinlock, every isolated core sees thousands
> of IRQs and ~50µs of noise per 950ms sample window. With the rwlock,
> the cores are essentially silent.
>
> Note that CPU 016 occasionally shows SIRQ noise (softirq) with both
> kernels, which is a separate known issue with the tick on the first
> nohz_full CPU. The eventpoll regression is the dominant noise source.
>
> My understanding of the root cause: the original rwlock allowed
> ep_poll_callback() (producer side, running from IRQ context on any CPU)
> to use read_lock, which does not cause cross-CPU contention on isolated
> cores when no local epoll activity exists. With the spinlock conversion,
> on PREEMPT_RT spinlock_t becomes an rt_mutex. This means that even if
> the isolated core is not involved in any epoll activity, the lock's
> cacheline bouncing and potential PI-boosted wakeups from housekeeping
> CPUs can inject noise into the isolated cores via IPI or cache
> invalidation traffic.

That sounds like a general isolation problem... it's not a bug for
non-isolated CPUs to bounce cachelines or send IPIs to each other.

Whether it's IPIs or not, osnoise is showing IRQs on the isolated CPUs,
so I'd look into which IRQs and why. Even with the patch reverted,
there are some IRQs on the isolated CPUs.
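
As a rough sketch of one way to do that (not the only way; the
osnoise:irq_noise trace event, where available, should attribute the noise
more directly): snapshot /proc/interrupts twice and print which interrupt
lines advanced on the isolated CPUs. The helper names below
(parse_interrupts, diff_on_cpus, sample_and_diff) are hypothetical, and the
parser assumes the usual /proc/interrupts layout of a "CPUn" header row
followed by one "label: counts... description" row per interrupt.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into (cpu_ids, {label: (counts, description)})."""
    lines = text.strip().splitlines()
    # Header row looks like "CPU0 CPU1 ..."; keep the numeric ids in column order.
    cpus = [int(name[3:]) for name in lines[0].split()]
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Per-CPU counters come first; rows such as ERR/MIS carry fewer columns.
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (counts, description)
    return cpus, table


def diff_on_cpus(before, after, watch):
    """Return {(label, cpu): (delta, description)} for counters that grew on watched CPUs.

    Assumes both snapshots list the same CPUs in the same order (no hotplug in between).
    """
    _, old = parse_interrupts(before)
    cpus, new = parse_interrupts(after)
    grew = {}
    for label, (counts, description) in new.items():
        old_counts = old.get(label, ([], ""))[0]
        for idx, cpu in enumerate(cpus):
            if cpu not in watch or idx >= len(counts):
                continue
            previous = old_counts[idx] if idx < len(old_counts) else 0
            delta = counts[idx] - previous
            if delta > 0:
                grew[(label, cpu)] = (delta, description)
    return grew


def sample_and_diff(watch, interval=10.0, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart and diff them for the watched CPUs."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return diff_on_cpus(before, after, watch)
```

With the nohz_full= list from the report, the watched set would be
set(range(1, 17)) | set(range(33, 49)); running sample_and_diff() on that set
while osnoise is active, on both kernels, should show whether the extra
~6,860 IRQs per sample are local timer ticks, reschedule/function-call IPIs,
or something else.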
>
> The commit message acknowledges the throughput regression but argues
> real workloads won't notice. However, for RT/latency-sensitive
> deployments with CPU isolation, the impact is severe and measurable
> even with zero local epoll usage.
>
> I believe this needs either:
> a) A revert of the backport for stable RT trees, or

Even if the patch weren't trying to address an RT issue in the first
place, this would just be a bandaid rather than a real solution.

> b) A fix that avoids the spinlock contention path for isolated CPUs

If there's truly no epoll activity on the isolated CPUs, when would you
ever reach that path on an isolated CPU?

-Crystal