Re: Race condition in __modify_ftrace_direct() between tmp_ops registration and direct_functions hash update

From: Steven Rostedt

Date: Sun May 17 2026 - 12:53:59 EST


[ RESEND - I didn't realize you replied to me privately. Adding back Cc list ]

On Sun, 17 May 2026 15:16:17 +0000
Afi0 <capyenglishlite@xxxxxxxxx> wrote:

> Hi Steven,
>
> Thanks for the detailed feedback, and for adding Jiri.
>
> You're right to challenge this. Let me clarify the exact scenario:
>
> The race is not about direct being NULL before assignment. The issue arises
> specifically in the *modification* path where an existing non-NULL direct
> is being replaced:
>
> 1. Caller holds a valid trampoline at address old_addr
> 2. Caller calls modify_ftrace_direct(ops, new_addr)
> 3. __modify_ftrace_direct() registers tmp_ops -> ftrace starts using
> tmp_ops
> 4. *Window opens:* CPUs entering traced function read entry -> direct =
> old_addr via ftrace_find_rec_direct()
> 5. Caller, believing the update is complete after modify_ftrace_direct()
> returns, frees old_addr
> 6. entry->direct = new_addr executes - too late, CPUs already jumped to
> freed memory
>
> The key assumption being violated: the caller cannot know when it is safe
> to free old_addr because modify_ftrace_direct() returns before entry ->
> direct is updated. The API implies atomicity that isn't guaranteed.

But __modify_ftrace_direct() calls unregister_ftrace_function(&tmp_ops).

Hmm, tmp_ops being static may be considered part of the core kernel in
which case the FTRACE_OPS_DYNAMIC is not set and the synchronization
will not be called from the unregister function.

>
> If the convention is that callers *must* never free the old trampoline
> until some explicit synchronization point after modify_ftrace_direct()
> returns, then you're correct that this is a caller bug rather than a bug in
> __modify_ftrace_direct() itself. Could you point me to documentation of
> this requirement? I may have misread the contract.

I'll let Jiri answer this part, but it does seem that there should be a
synchronization to make sure that the code is freed. BPF is the only
user of this, and this is a new feature.

Jiri, if the modify_ftrace_direct() is used to change the trampoline,
what synchronization is done to make make sure it's not called before
being freed?

-- Steve