Re: [PATCH v4 0/5] mm: zone lock tracepoint instrumentation
From: Dmitry Ilvokhin
Date: Thu Mar 19 2026 - 09:23:14 EST
On Mon, Mar 16, 2026 at 05:40:50PM +0000, Dmitry Ilvokhin wrote:
[...]
> A possible generic solution is a trace_contended_release() for spin
> locks, for example:
>
>     if (trace_contended_release_enabled() &&
>         atomic_read(&lock->val) & ~_Q_LOCKED_MASK)
>             trace_contended_release(lock);
>
> This might work on x86, but could increase code size and regress
> performance on arches where spin_unlock() is inlined, such as arm64
> under !PREEMPTION.
I took a stab at this idea and submitted an RFC [1].
The implementation builds on Matthew's earlier observation that
_raw_spin_unlock() is not inlined in most configurations. In those
cases, when the tracepoint is disabled, this adds a single NOP on the
fast path, with the conditional check staying out of line. The measured
text size increase in this configuration is +983 bytes.
For configurations where _raw_spin_unlock() is inlined, the
instrumentation increases code size more noticeably (+71 KB in my
measurements), since the check and out-of-line call are replicated at
each call site.
This provides a generic release-side signal for contended locks,
allowing correlation of lock holders with waiters and measurement of
contended hold times.
This RFC addresses the same visibility gap without introducing per-lock
instrumentation.
If this tradeoff is acceptable, this could be a generic alternative to
lock-specific tracepoints.
[1]: https://lore.kernel.org/all/51aad0415b78c5a39f2029722118fa01eac77538.1773858853.git.d@xxxxxxxxxxxx