Re: [RFC PATCH] ptrace: don't report syscall-exit if the tracee was killed by seccomp

From: Kees Cook

Date: Sun Mar 22 2026 - 10:47:58 EST




On March 22, 2026 6:44:54 AM PDT, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>__seccomp_filter() does
>
> case SECCOMP_RET_KILL_THREAD:
> case SECCOMP_RET_KILL_PROCESS:
> ...
> /* Show the original registers in the dump. */
> syscall_rollback(current, current_pt_regs());
>
> /* Trigger a coredump with SIGSYS */
> force_sig_seccomp(this_syscall, data, true);
>
>syscall_rollback() does regs->ax = orig_ax. This means that
>ptrace_get_syscall_info_exit() will see .is_error == 0. To the tracer,
>it looks as if the aborted syscall actually succeeded and returned its
>own syscall number.
>
>And since force_sig_seccomp() uses force_coredump == true, SIGSYS won't
>be reported (see the SA_IMMUTABLE check in get_signal()), so the tracee
>will "silently" exit with error_code == SIGSYS after the bogus report.
>
>Change syscall_exit_work() to avoid the bogus single-step/syscall-exit
>reports if the tracee is SECCOMP_MODE_DEAD.
>
>TODO: With or without this change, get_signal() -> ptrace_signal() may
>report other !SA_IMMUTABLE pending signals before it dequeues SIGSYS.
>Perhaps it makes sense to change get_signal() to check SECCOMP_MODE_DEAD
>too and prioritize the fatal SIGSYS.
>
>Reported-by: Max Ver <dudududumaxver@xxxxxxxxx>
>Closes: https://lore.kernel.org/all/CABjJbFJO+p3jA1r0gjUZrCepQb1Fab3kqxYhc_PSfoqo21ypeQ@xxxxxxxxxxxxxx/
>Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
>---
> include/linux/entry-common.h | 3 +++
> include/linux/seccomp.h | 8 ++++++++
> kernel/seccomp.c | 3 ---
> 3 files changed, 11 insertions(+), 3 deletions(-)
>
>diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
>index f83ca0abf2cd..5c62bda9dcf9 100644
>--- a/include/linux/entry-common.h
>+++ b/include/linux/entry-common.h
>@@ -250,6 +250,9 @@ static __always_inline void syscall_exit_work(struct pt_regs *regs, unsigned lon
> if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
> trace_syscall_exit(regs, syscall_get_return_value(current, regs));
>
>+ if (killed_by_seccomp(current))
>+ return;

Hmm. I'm still not convinced this is right, but if we make this change, I'd want to see a behavioral test added (likely to the seccomp selftests), and to make sure the rr test suite doesn't regress. rr has traditionally been the most sensitive to these kinds of notification ordering/behavior changes.

-Kees

>+
> step = report_single_step(work);
> if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
> arch_ptrace_report_syscall_exit(regs, step);
>diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>index 9b959972bf4a..e95a251955c1 100644
>--- a/include/linux/seccomp.h
>+++ b/include/linux/seccomp.h
>@@ -22,6 +22,12 @@
> #include <linux/atomic.h>
> #include <asm/seccomp.h>
>
>+/* Not exposed in uapi headers: internal use only. */
>+#define SECCOMP_MODE_DEAD (SECCOMP_MODE_FILTER + 1)
>+
>+#define killed_by_seccomp(task) \
>+ ((task)->seccomp.mode == SECCOMP_MODE_DEAD)
>+
> extern int __secure_computing(void);
>
> #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
>@@ -49,6 +55,8 @@ static inline int seccomp_mode(struct seccomp *s)
>
> struct seccomp_data;
>
>+#define killed_by_seccomp(task) 0
>+
> #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
> static inline int secure_computing(void) { return 0; }
> #else
>diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>index 066909393c38..461eb15c66c3 100644
>--- a/kernel/seccomp.c
>+++ b/kernel/seccomp.c
>@@ -31,9 +31,6 @@
>
> #include <asm/syscall.h>
>
>-/* Not exposed in headers: strictly internal use only. */
>-#define SECCOMP_MODE_DEAD (SECCOMP_MODE_FILTER + 1)
>-
> #ifdef CONFIG_SECCOMP_FILTER
> #include <linux/file.h>
> #include <linux/filter.h>

--
Kees Cook
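
For readers following the quoted description, here is a small userspace model of why the syscall-exit report looks like a success after the rollback. It is only a sketch, not kernel code: every model_* name and the two-field register struct are hypothetical, and it assumes the x86-64 behaviour where ax is preset to -ENOSYS at syscall entry, syscall_rollback() copies orig_ax into ax, and only values in [-4095, -1] are treated as errors.

```c
/* Userspace model only: all model_* names are hypothetical, not kernel APIs. */
#include <stdbool.h>

#define MODEL_MAX_ERRNO 4095L
#define MODEL_ENOSYS    38L

/* Just the two x86-64 pt_regs fields that matter for this report. */
struct model_regs {
	long ax;      /* return-value register */
	long orig_ax; /* syscall number saved at entry */
};

/* The fields a tracer reads back for a syscall-exit stop. */
struct model_exit_info {
	long rval;
	bool is_error;
};

/* Assumed x86-64 entry state: ax = -ENOSYS, orig_ax = syscall number. */
static struct model_regs model_syscall_entry(long nr)
{
	struct model_regs regs = { .ax = -MODEL_ENOSYS, .orig_ax = nr };
	return regs;
}

/* Models the rollback: an assignment, not a comparison. */
static void model_syscall_rollback(struct model_regs *regs)
{
	regs->ax = regs->orig_ax;
}

/* Only values in [-MAX_ERRNO, -1] are reported as errors. */
static long model_syscall_get_error(const struct model_regs *regs)
{
	return (regs->ax < 0 && regs->ax >= -MODEL_MAX_ERRNO) ? regs->ax : 0;
}

/* Models how the exit report derives is_error and rval from ax. */
static struct model_exit_info model_report_exit(const struct model_regs *regs)
{
	struct model_exit_info info;

	info.rval = model_syscall_get_error(regs);
	info.is_error = info.rval != 0;
	if (!info.is_error)
		info.rval = regs->ax; /* non-error: report ax as the return value */
	return info;
}
```

Under these assumptions, for syscall number 39 the model reports rval = -38 with is_error set before the rollback, and rval = 39 with is_error clear after it, which is the "succeeded and returned its own syscall number" report described in the quoted text.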