Re: [PATCH] x86: Implement _THIS_IP_ using inline asm for 32-bit

From: H. Peter Anvin

Date: Thu May 21 2026 - 06:54:13 EST


On May 21, 2026 12:08:01 AM PDT, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>On Thu, May 21, 2026 at 02:00:09AM +0200, Marco Elver wrote:
>> Both GCC [1] and Clang [2] consider the generic version of _THIS_IP_ to
>> be broken:
>>
>> #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })
>>
>> In particular, the address of a label is only expected to be used with a
>> computed goto.
>>
>> While the generic version more or less works today, it is known to be
>> brittle and may break with current and future optimizations. For
>> example, Clang -O2 always returns 1 when this function is inlined:
>>
>> static inline unsigned long get_ip(void)
>> { return ({ __label__ __here; __here: (unsigned long)&&__here; }); }
>>
>
>Oh gawd :/
>
>> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120071 [1]
>> Link: https://github.com/llvm/llvm-project/issues/138272 [2]
>> Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
>> ---
>> arch/x86/include/asm/linkage.h | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/linkage.h b/arch/x86/include/asm/linkage.h
>> index a7294656ad90..bce3c6f4b94f 100644
>> --- a/arch/x86/include/asm/linkage.h
>> +++ b/arch/x86/include/asm/linkage.h
>> @@ -13,11 +13,12 @@
>> * The generic version tends to create spurious ENDBR instructions under
>> * certain conditions.
>> */
>> -#define _THIS_IP_ ({ unsigned long __here; asm ("lea 0(%%rip), %0" : "=r" (__here)); __here; })
>> +#define _THIS_IP_ ({ unsigned long __here; asm volatile("lea 0(%%rip), %0" : "=r" (__here)); __here; })
>> #endif
>>
>> #ifdef CONFIG_X86_32
>> #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
>> +#define _THIS_IP_ ({ unsigned long __ip; asm volatile("call 1f\n1: pop %0" : "=r" (__ip)); __ip; })
>
>This will mess up the RSB and cause bad performance ripple effects for a
>bit each use. Now, I don't think anybody still cares about performance
>on 32bit (I certainly don't), so perhaps this is fine. But urgh.

Most microarchitectures do *not* have a problem with call/pop, as they
know that a call with a zero offset is not going to return. The main
exception was the Pentium 4.