Re: [PATCH v4 0/5] mm: zone lock tracepoint instrumentation
From: Dmitry Ilvokhin
Date: Thu Mar 19 2026 - 09:23:14 EST
On Mon, Mar 16, 2026 at 05:40:50PM +0000, Dmitry Ilvokhin wrote:
[...]
> A possible generic solution is a trace_contended_release() for spin
> locks, for example:
>
>     if (trace_contended_release_enabled() &&
>         atomic_read(&lock->val) & ~_Q_LOCKED_MASK)
>             trace_contended_release(lock);
>
> This might work on x86, but could increase code size and regress
> performance on arches where spin_unlock() is inlined, such as arm64
> under !PREEMPTION.
I took a stab at this idea and submitted an RFC [1].
The implementation builds on Matthew's earlier observation that
_raw_spin_unlock() is not inlined in most configurations. In those
cases, when the tracepoint is disabled, this adds a single NOP on the
fast path, with the conditional check staying out of line. The measured
text size increase in this configuration is +983 bytes.
For configurations where _raw_spin_unlock() is inlined, the
instrumentation increases code size more noticeably (+71 KB in my
measurements), since the check and out-of-line call are replicated at
each call site.
This provides a generic release-side signal for contended locks,
allowing correlation of lock holders with waiters and measurement of
contended hold times.
This RFC addresses the same visibility gap without introducing per-lock
instrumentation.
If this tradeoff is acceptable, this could be a generic alternative to
lock-specific tracepoints.
[1]: https://lore.kernel.org/all/51aad0415b78c5a39f2029722118fa01eac77538.1773858853.git.d@xxxxxxxxxxxx