Re: [PATCH] locking/osq_lock: Use READ_ONCE() for node->prev

From: Waiman Long

Date: Mon May 18 2026 - 11:04:58 EST

On 5/18/26 10:29 AM, David Laight wrote:

On Mon, 18 May 2026 11:29:53 +0100
Will Deacon <will@xxxxxxxxxx> wrote:

On Mon, Mar 30, 2026 at 09:32:55AM +0800, Yu Peng wrote:

osq_lock() consults node->prev in the vcpu_is_preempted() heuristic while
a concurrent predecessor may update it via WRITE_ONCE(next->prev, prev)
during unqueue.

This read only affects the decision to abort optimistic spinning; stale
values do not affect queue linkage or lock correctness. Use READ_ONCE()
to mark the shared read and match the concurrent WRITE_ONCE() update.

No functional change intended.

Signed-off-by: Yu Peng <pengyu@xxxxxxxxxx>
---
kernel/locking/osq_lock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index b4233dc2c2b04..db4545e7bb72c 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -144,7 +144,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
* polling, be careful.
*/
if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
- vcpu_is_preempted(node_cpu(node->prev))))
+ vcpu_is_preempted(node_cpu(READ_ONCE(node->prev)))))
return true;

Hmm, I wonder whether this is actually sufficient...

Architectures with relaxed memory models won't order plain reads to
different addresses, so the read of 'node->locked' is unordered wrt the
read of 'node->prev' in this condition. Given that we're using
smp_cond_load_relaxed(), can we end up using a value of 'node->prev'
that was loaded in a previous iteration of the loop?

I'd be much more comfortable if this was smp_cond_load_acquire(), in
addition to the READ_ONCE() that you are proposing.

I've got a patch 'pending' for this file that tidied some things up.
In particular it saves the cpu numbers not the per-cpu addresses.
So the above check doesn't bounce a cache line.
Perhaps it is time to resend it.

Note that the vpu_is_preempted() path is horribly expensive and
can't actually work reliably.
And can't work at all on arm where 'mwait' (equivalent) is used.

I believe the vcpu_is_preempted() check is only enabled in x86, loongarch, powerpc and s390. It is disabled (always returns false) in other arches. Yes, the vcpu_is_preempted() check, if implemented, can be expensive. Perhaps we can reduce the frequency that it is being called.

Cheers,
Longman