Re: [PATCH 15/14] selftests: futex: Add tests for robust unlock within the critical section.

From: André Almeida

Date: Wed May 27 2026 - 22:55:33 EST


On 04/04/2026 06:39, Sebastian Andrzej Siewior wrote:
From: Sebastian Andrzej Siewior <sebastian@xxxxxxxxxxxxx>

I took Thomas’ initial test case from the cover letter and reworked it
so that it uses ptrace() to single‑step through the VDSO unlock
operation. The test expects the lock to remain locked, with
`list_op_pending' pointing somewhere, when entering the VDSO unlock
path. Once execution steps into the critical section, it expects the
kernel to perform the fixup, that is, to unlock the lock and clear
`list_op_pending'.

The test requires VDSO debug symbols, typically provided by
vdso64.so.dbg or vdso32.so.dbg. It attempts to locate the appropriate
file automatically, but the user may override this by setting the
VDSO_DBG environment variable. If neither method succeeds, libelf falls
back to its usual lookup mechanism under /usr/lib/debug/.build-id/.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@xxxxxxxxxxxxx>

[...]

+
+ } else if (state == STATE_IN_CS) {
+ /*
+ * If the critical section has been entered then
+ * the kernel has to unlock and clean list_op_pending.
+ * On 32bit the pointer is just 32bit wide, the
+ * upper 32bit are cleaned on 64bit.
+ */
+ if (is_32bit)
+ rhead_val &= 0xffffffff;
+
+ ASSERT_EQ(rhead_val, 0);
+ ASSERT_EQ(lock_val, 0);
+ }

It turns out that the test success I saw with my aarch64 implementation was a false positive :/ There's no logic to verify that the code really enters the critical section. If the code just jumps over it, the test never checks whether lock_val and rhead_val are actually zeroed.
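One way to close that hole is to record whether any single-step stop landed inside the critical section, and fail if none did. This is a minimal C sketch. It assumes the test can read the child's PC at each stop and knows the critical section's address range from the VDSO debug symbols. The names `struct cs_tracker`, `cs_track_pc()` and `cs_was_entered()` are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical tracker: [cs_start, cs_end) is assumed to be the address
 * range of the VDSO critical section, resolved from the VDSO debug
 * symbols the test already loads.
 */
struct cs_tracker {
    uintptr_t cs_start; /* first instruction of the critical section */
    uintptr_t cs_end;   /* first address past the critical section */
    bool entered;       /* set once any stop lands inside the range */
};

/* Feed this with the child's PC after every PTRACE_SINGLESTEP stop. */
static void cs_track_pc(struct cs_tracker *t, uintptr_t pc)
{
    assert(t->cs_start < t->cs_end); /* sanity check on the range */
    if (pc >= t->cs_start && pc < t->cs_end)
        t->entered = true;
}

/*
 * The lock_val/rhead_val checks only mean something if this returns
 * true; otherwise the unlock path skipped the critical section and the
 * test should fail instead of passing silently.
 */
static bool cs_was_entered(const struct cs_tracker *t)
{
    return t->entered;
}
```

In the selftest, a check such as `ASSERT_TRUE(cs_was_entered(&t));` after the child returns from the VDSO unlock would turn the silent pass into a failure.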

+
+ if (ptrace(PTRACE_SINGLESTEP, child, 0, 0))
+ err(1, "PTRACE_SINGLESTEP");

After I fixed my code, the selftest got stuck in an infinite loop (maybe we should add a maximum step count?). Single-stepping doesn't work for LL/SC sequences like this one:

retry:
    ldxr    %w[val], %[lock]
    cmp     %w[tid], %w[val]
    bne     end
    stlxr   %w[result], wzr, %[lock]
    cbnz    %w[result], retry
end:


Single-stepping with ptrace() causes a context switch (a trap into the kernel) after each instruction, and that clears the exclusive monitor [1]. So the stlxr store fails, and the code branches back to retry forever. We need to run from ldxr to stlxr without stopping and resume stepping at `cbnz %w[result], retry`. I tried single-stepping with GDB, and it turns out that it is smart enough to run the code from ldxr to stlxr "atomically" [2] to avoid disturbing the exclusive monitor, and then it worked as expected.
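To make the failure mode concrete, here is a toy model in C of the ldxr/stlxr unlock loop quoted earlier. It is not ptrace or hardware code, and `struct llsc_model` and `run_unlock()` are hypothetical names. It assumes, as described in this mail, that a debug trap between ldxr and stlxr clears the local exclusive monitor. It compares per-instruction stepping with stepping over the whole ldxr..stlxr sequence in one go, which is what GDB does. It also caps the number of retries, roughly what a max-steps limit in the selftest would provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy state: the lock word and the local exclusive monitor. */
struct llsc_model {
    uint32_t lock; /* lock word, holds the owner TID */
    bool monitor;  /* true while the exclusive monitor is armed */
};

/*
 * Runs the unlock loop for at most max_iters iterations.
 * trap_inside: true models plain PTRACE_SINGLESTEP (a trap between
 * ldxr and stlxr, assumed to clear the monitor); false models stepping
 * over ldxr..stlxr as one unit.
 * Returns the iteration on which the loop exited, or -1 if it never did.
 */
static int run_unlock(struct llsc_model *m, uint32_t tid, bool trap_inside,
                      int max_iters)
{
    assert(max_iters > 0);
    for (int iter = 1; iter <= max_iters; iter++) {
        uint32_t val = m->lock;   /* ldxr: load and arm the monitor */
        m->monitor = true;
        if (trap_inside)
            m->monitor = false;   /* single-step trap clears it */
        if (val != tid)           /* cmp + bne end: not our lock */
            return iter;
        if (m->monitor) {         /* stlxr wzr: succeeds only if armed */
            m->lock = 0;
            m->monitor = false;
            return iter;          /* result == 0, cbnz falls through */
        }
        /* result != 0: cbnz branches back to retry */
    }
    return -1;
}
```

With per-instruction stepping, the model never unlocks and stops only because of the retry budget. Stepping over the sequence unlocks on the first iteration.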

[1] https://developer.arm.com/documentation/dht0008/a/arm-synchronization-primitives/exclusive-accesses/exclusive-monitors
[2] https://github.com/gnutools/binutils-gdb/blob/aa5685c0fa9f299ae0f94e537a1f55991c972e9c/gdb/aarch64-tdep.c#L3514