Re: [PATCH v5 1/2] mm/process_vm_access: pidfd and nowait support for process_vm_readv/writev

From: Christian Brauner

Date: Thu Jun 04 2026 - 09:12:06 EST


On 2026-06-03 10:27 +0200, Alban Crequy wrote:
> Sashiko raised a question about pidfd_get_task() and PIDFD_THREAD [1],
> so I ran some tests to understand the behavior.
> [1] https://sashiko.dev/#/patchset/20260602100917.3641359-1-alban.crequy@xxxxxxxxx
>
> pidfd_get_task() always resolves pidfds using PIDTYPE_TGID (kernel/pid.c
> line 640), regardless of whether the pidfd was created with PIDFD_THREAD.
> This means:
>
> - A PIDFD_THREAD pidfd for a non-leader thread fails with ESRCH.
> - A regular pidfd for a process whose leader has exited (pthread_exit
> in main, secondary thread still alive) also fails with ESRCH.
>
> This is not specific to my patch: process_madvise() uses pidfd_get_task()
> in the same way and has the same behavior. I wrote a test program
> confirming this:
>
> https://github.com/alban/tests/tree/alban_pvm_flags/pvm_flags/pidfd_thread_test
>
> Results summary:
>
> All threads alive:
>   pidfd_open(pid, 0) + process_vm_readv:             OK
>   pidfd_open(tid, PIDFD_THREAD) + process_vm_readv:  OK    (leader tid)
>   pidfd_open(tid, PIDFD_THREAD) + process_vm_readv:  ESRCH (non-leader)
>
> Leader thread exited (secondary still alive):
>   pidfd_open(pid, 0) + process_vm_readv:             ESRCH
>   pidfd_open(pid, PIDFD_THREAD) + process_vm_readv:  ESRCH
>   pidfd_open(tid, PIDFD_THREAD) + process_vm_readv:  ESRCH (non-leader)
>   process_vm_readv(tid, flags=0):                    OK    (plain TID path)
>
> process_madvise() behaves identically in all cases above.
>
> For the non-leader thread case when all threads are alive, this is fine in
> practice: all threads share the same mm_struct, so profilers just use a regular
> pidfd for the thread-group leader.

This was an intentional limitation back then because pidfds only came in
thread-group flavor. I only added subthread pidfds much later.
pidfd_get_task() should drop the flags argument btw. I think that's
unused.

> However, the exited-leader case is a real limitation for profilers.
> OpenTelemetry eBPF Profiler wants to profile a process where the main thread
> has exited but secondary threads are still running [2].
> [2] https://github.com/open-telemetry/opentelemetry-ebpf-profiler/pull/376

If the thread-group leader exits before all of its subthreads exit
then this is a broken program - even if it is a legal state. The
thread-group leader cannot be reaped while there are live subthreads and
it also means that any subthread exec "resurrects" the thread-group
leader struct pid. So that's going to make for fun profiling...

> Using plain TIDs (flags=0) would work, but it means users cannot use
> PROCESS_VM_PIDFD in this scenario.
>
> What do you think this patch should do? I see two options:
> - Address this limitation in a separate future patch that fixes
> pidfd_get_task() to use PIDTYPE_PID when PIDFD_THREAD is detected in
> f_flags, benefiting all callers (process_vm_readv, process_madvise,
> and any future users).

As long as all users of the interface are fine with operating on
subthreads this should be perfectly fine.
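
To make the option quoted above concrete, here is a toy userspace model of
the lookup choice. It is only a sketch and not kernel code: every name in it
(toy_pid, TOY_PIDFD_THREAD, toy_get_task_current, toy_get_task_proposed) is
made up for illustration, and the flag value is not the uapi one. It models
nothing beyond which lookup slot gets consulted, assuming a non-leader
thread's pid has a task attached for the per-thread type but not for the
thread-group type.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins; none of these are the kernel's real definitions. */
enum toy_pid_type { TOY_PIDTYPE_PID, TOY_PIDTYPE_TGID, TOY_PIDTYPE_MAX };

#define TOY_PIDFD_THREAD 0x1u /* hypothetical bit, not the uapi value */

/* Records which lookup slots of a pid have a task attached. */
struct toy_pid {
	bool attached[TOY_PIDTYPE_MAX];
};

/* Behaviour described in this thread: always look up by thread group. */
int toy_get_task_current(const struct toy_pid *pid, unsigned int f_flags)
{
	(void)f_flags; /* the pidfd's flags play no role in the lookup */
	assert(pid);
	return pid->attached[TOY_PIDTYPE_TGID] ? 0 : -ESRCH;
}

/* Proposed option: look up by thread when the pidfd is a thread pidfd. */
int toy_get_task_proposed(const struct toy_pid *pid, unsigned int f_flags)
{
	enum toy_pid_type type;

	assert(pid);
	type = (f_flags & TOY_PIDFD_THREAD) ? TOY_PIDTYPE_PID
					    : TOY_PIDTYPE_TGID;
	return pid->attached[type] ? 0 : -ESRCH;
}
```

In this model a non-leader thread pid ({ .attached = { true, false } })
yields -ESRCH from toy_get_task_current() but 0 from toy_get_task_proposed()
when TOY_PIDFD_THREAD is set, while a thread-group pidfd (flags 0) behaves
the same in both.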