Re: [PATCH V3 05/17] perf/x86: Support XMM register for non-PEBS and REGS_USER
From: Liang, Kan
Date: Tue Aug 19 2025 - 11:55:22 EST
On 2025-08-19 6:39 a.m., Peter Zijlstra wrote:
> On Fri, Aug 15, 2025 at 02:34:23PM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
>> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>>
>> Collecting the XMM registers in a PEBS record has been supported since
>> Icelake, but non-PEBS events don't support the feature. It's
>> possible to retrieve the XMM registers from XSAVE for non-PEBS events.
>> Add that support to make the feature complete.
>>
>> To use XSAVE, a 64-byte aligned buffer is required. Add a
>> per-CPU ext_regs_buf to store the vector registers. The buffer
>> is ~2K. kzalloc_node() is used because kmalloc() _guarantees_ that
>> allocations with power-of-2 sizes are naturally aligned, and hence
>> also 64-byte aligned.
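The alignment argument can be illustrated in user space. This is a sketch only: aligned_alloc() stands in for kzalloc_node() (which provides natural alignment for power-of-2 sizes by itself), and the explicit power-of-2 rounding plus the helper names roundup_pow2(), is_aligned() and alloc_xsave_buf() are hypothetical, added for illustration. XSAVE itself requires its save area to be 64-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: round n up to the next power of two (n > 0). */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/* True if p is a multiple of a (a must be a power of two). */
static int is_aligned(const void *p, size_t a)
{
	return ((uintptr_t)p & (a - 1)) == 0;
}

/*
 * User-space stand-in for kzalloc_node(): a zeroed buffer whose size is a
 * power of two and whose alignment equals its size ("natural alignment").
 * Any size >= 64 that is naturally aligned is also 64-byte aligned, which
 * is what XSAVE needs.
 */
static void *alloc_xsave_buf(size_t xsave_size, size_t *out_size)
{
	size_t sz = roundup_pow2(xsave_size);
	void *buf;

	if (sz < 64)
		sz = 64;
	buf = aligned_alloc(sz, sz);	/* size is a multiple of alignment */
	if (buf)
		memset(buf, 0, sz);
	*out_size = sz;
	return buf;
}
```

For a ~2K XSAVE area, the rounded size is 2048 and the resulting buffer is 2048-byte aligned, which trivially satisfies the 64-byte requirement.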
>>
>> Extend the support to both REGS_USER and REGS_INTR. For REGS_USER,
>> perf_get_regs_user() returns the regs from task_pt_regs(current),
>> which is a struct pt_regs. They need to be moved to a local
>> struct x86_perf_regs, x86_user_regs.
>>
>> For PEBS, the HW support is still preferred. The XMM registers should
>> be retrieved from PEBS records.
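Why a bare struct pt_regs is not enough: the x86 register-output code (perf_reg_value()) recovers the enclosing struct x86_perf_regs from a struct pt_regs pointer with container_of() to find the XMM values. The sketch below uses simplified mock structs (the real ones have many more fields), and fill_user_regs() and get_xmm() are hypothetical helpers. It shows that the user regs must be copied into an x86_perf_regs before the XMM pointer can be attached and found again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mock: the kernel's struct pt_regs has many more fields. */
struct pt_regs {
	uint64_t ip;
	uint64_t sp;
};

/* Mock of struct x86_perf_regs: pt_regs is embedded, not pointed to. */
struct x86_perf_regs {
	struct pt_regs regs;
	uint64_t *xmm_regs;	/* saved XMM state, or NULL */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * A consumer that only receives a struct pt_regs * gets back to the
 * enclosing x86_perf_regs. Valid only if regs really lives inside one.
 */
static uint64_t *get_xmm(struct pt_regs *regs)
{
	assert(regs != NULL);
	return container_of(regs, struct x86_perf_regs, regs)->xmm_regs;
}

/* Hypothetical: copy the task's user regs and attach the XMM buffer. */
static struct pt_regs *fill_user_regs(struct x86_perf_regs *dst,
				      const struct pt_regs *task_regs,
				      uint64_t *xmm_buf)
{
	assert(dst != NULL && task_regs != NULL);
	memcpy(&dst->regs, task_regs, sizeof(*task_regs));
	dst->xmm_regs = xmm_buf;
	return &dst->regs;
}
```

Passing task_pt_regs(current) directly would make container_of() point at memory that is not an x86_perf_regs, which is the reason for the local copy.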
>>
>> More vector registers may be supported later. Add ext_regs_mask
>> to track the supported vector register groups.
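A mask like ext_regs_mask is typically one bit per register group. The bit assignments, the enum, struct ext_regs_state and the helpers below are all hypothetical; the patch's actual layout may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; only XMM is covered by this patch. */
enum ext_reg_group {
	EXT_REG_XMM = 0,
	EXT_REG_YMMH,	/* possible future group */
	EXT_REG_ZMM,	/* possible future group */
	EXT_REG_MAX,
};

struct ext_regs_state {
	uint64_t ext_regs_mask;	/* bit n set => group n is supported */
};

static void ext_regs_enable(struct ext_regs_state *s, enum ext_reg_group g)
{
	assert(g < EXT_REG_MAX);
	s->ext_regs_mask |= 1ULL << g;
}

static int ext_regs_supported(const struct ext_regs_state *s,
			      enum ext_reg_group g)
{
	return g < EXT_REG_MAX && (s->ext_regs_mask & (1ULL << g));
}
```

Adding a new group later then only needs a new enum entry and an enable call where the hardware support is detected.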
>
>
> I'm a little confused... *again* :-)
>
> Specifically, we should consider two sets of registers:
>
> - the live set, as per the CPU (XSAVE)
> - the stored set, as per x86_task_fpu()
>
> regs_intr should always get a copy of the live set; however
> regs_user should not. It might need a copy of the x86_task_fpu() instead
> of the live set, depending on TIF_NEED_FPU_LOAD (more or less, we need
> another variable set in kernel_fpu_begin_mask() *after*
> save_fpregs_to_fpstate() is completed).
>
> I don't see this code make this distinction.
>
> Consider getting a sample while the kernel is doing some avx enhanced
> crypto and such.

The regs_user only needs a set when the NMI hits user mode
(user_mode(regs)) or a non-kernel thread (!(current->flags &
PF_KTHREAD)). The live set is good enough for both cases.

I think kernel crypto should run in a kernel thread (current->flags &
PF_KTHREAD). If so, regs_user should be NULL.

Thanks,
Kan
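The three cases listed above follow the same shape as perf_sample_regs_user() in kernel/events/core.c. The sketch below models that choice with mock types: struct pt_regs carries a user_mode flag in place of user_mode(regs), task_struct.user_regs stands in for task_pt_regs(), the PF_KTHREAD value is a mock bit, and pick_regs_user() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_KTHREAD 0x00200000u	/* mock flag bit for this sketch */

struct pt_regs {
	bool user_mode;		/* mock for user_mode(regs) */
};

struct task_struct {
	unsigned int flags;
	struct pt_regs *user_regs;	/* mock for task_pt_regs(task) */
};

/* Which registers describe user space at the time of the sample? */
static const struct pt_regs *pick_regs_user(const struct task_struct *cur,
					    const struct pt_regs *regs)
{
	assert(cur != NULL && regs != NULL);
	if (regs->user_mode)
		return regs;		/* NMI hit user mode */
	if (!(cur->flags & PF_KTHREAD))
		return cur->user_regs;	/* user task running in the kernel */
	return NULL;			/* kernel thread: no user regs */
}
```

In the kernel-thread case nothing is reported, so a kernel thread doing AVX crypto would not expose its vector registers through regs_user.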