[GIT PULL] tracing: Fixes for 7.0
From: Steven Rostedt
Date: Sun Mar 22 2026 - 11:54:50 EST
Linus,
tracing fixes for 7.0:
- Revert "tracing: Remove pid in task_rename tracing output"
A change was made to remove the pid field from the task_rename event
because it was thought that the event was always for the current task,
making the recorded pid redundant. This turned out to be incorrect: there
are a few corner cases where that is not true, and the change caused some
regressions in tooling.
- Fix the reading from user space for migration
The reading of user space uses seqlock-style logic with a per-cpu
temporary buffer: it disables migration, enables preemption, does the
copy from user space, disables preemption, enables migration, and then
checks whether any schedule switches happened while preemption was
enabled. If there was a context switch, the per-cpu buffer may have been
corrupted and the read is tried again. As a protection, if it takes a
hundred tries, a warning is issued and the read bails out to prevent a
live lock.
This was triggered when the task was selected by the load balancer to be
migrated to another CPU. Every time preemption was enabled, the
migration thread would schedule in, try to migrate the task, fail
because migration was disabled, and let it run again. The scheduler thus
scheduled out the task every time it enabled preemption, and the loop
never exited (until the 100 iteration check triggered).
Fix this by enabling and then disabling preemption, keeping migration
enabled, when the read from user space needs to be done again. This lets
the migration thread migrate the task, and the copy from user space will
likely succeed on the next iteration.
- Fix trace_marker copy option freeing
The "copy_trace_marker" option allows a tracing instance to get a copy of
a write to the trace_marker file of the top level instance. This is
managed by a linked list protected by RCU. When an instance is removed,
a check is made whether the option is set, and if so synchronize_rcu()
is called. The problem is that the iteration that resets all the flags
to what they were when the instance was created (to perform clean ups)
was done before the check of the copy_trace_marker option. That cleared
the option, so synchronize_rcu() was never called.
Move the clearing of all the flags to after the check that calls
synchronize_rcu(), so that the option is still set if it was before and
the synchronization is performed.
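The ordering bug boils down to a check that depends on state a later
step clears. A minimal userspace sketch (hypothetical names; the bool
stands in for the COPY_MARKER flag and setting "synchronized" stands in
for calling synchronize_rcu()):

```c
#include <assert.h>
#include <stdbool.h>

static bool copy_marker_set;
static bool synchronized;

/* Stand-in for "if the option is set, synchronize_rcu()". */
static void maybe_sync(void)
{
	if (copy_marker_set)
		synchronized = true;
}

/* Stand-in for the loop that resets all the instance flags. */
static void clear_all_flags(void)
{
	copy_marker_set = false;
}

/* Buggy teardown order: the flag is cleared first, so the check
 * never fires and no synchronization happens. */
static void remove_instance_buggy(void)
{
	clear_all_flags();
	maybe_sync();
}

/* Fixed order per the changelog: check-and-sync first, then clear. */
static void remove_instance_fixed(void)
{
	maybe_sync();
	clear_all_flags();
}
```

The same two calls in the opposite order silently skip the
synchronization, which is exactly the failure mode being fixed.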
- Fix entries setting when validating the persistent ring buffer
When validating the persistent ring buffer on boot up, the number of
events per sub-buffer is added to the sub-buffer meta page. The validator
was updating cpu_buffer->head_page (the first sub-buffer of the per-cpu
buffer) and not the "head_page" variable that was iterating over the
sub-buffers. This caused the first sub-buffer to be assigned the entries
of every sub-buffer instead of the sub-buffer that was supposed to be
updated.
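This is an instance of a common bug class: writing through the list head
instead of the loop iterator. A tiny sketch with hypothetical types (not
the ring buffer code) shows how the head page absorbs every update:

```c
#include <assert.h>

struct subbuf { int entries; };

/* Bug pattern: every iteration writes the first element, so only the
 * head ends up with a value (the last one written). */
static void fill_buggy(struct subbuf *pages, int n)
{
	for (int i = 0; i < n; i++)
		pages[0].entries = i + 1;
}

/* Fixed pattern: write through the iterator, updating each element. */
static void fill_fixed(struct subbuf *pages, int n)
{
	for (int i = 0; i < n; i++)
		pages[i].entries = i + 1;
}
```

In the buggy variant the remaining elements keep their stale values,
mirroring how only cpu_buffer->head_page was being updated.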
- Use "hash" value to update the direct callers
When updating the ftrace direct callers, a temporary callback was
assigned to all the functions of the ftrace_ops and not just the
functions represented by the passed in hash. This caused an unnecessary
slowdown of the ftrace_ops functions that were not being modified.
Only update the functions that are going to be modified to call the
ftrace loop function, so that the update is made on just those
functions.
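The fix's idea can be illustrated with a simple filtering sketch in
userspace C (hypothetical names; strings stand in for the functions an
ops traces, and the "hash" is just the subset being modified):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NFUNCS 4

/* The full set of functions this ops traces, and whether each one is
 * currently redirected through the (slower) temporary callback. */
static const char *traced[NFUNCS] = { "f0", "f1", "f2", "f3" };
static bool redirected[NFUNCS];

static bool in_hash(const char *name, const char **hash, int n)
{
	for (int i = 0; i < n; i++)
		if (strcmp(hash[i], name) == 0)
			return true;
	return false;
}

/* Attach the temporary callback only to functions in the passed-in
 * hash; functions outside it keep running at full speed. */
static void attach_tmp_ops(const char **hash, int n)
{
	for (int i = 0; i < NFUNCS; i++)
		redirected[i] = in_hash(traced[i], hash, n);
}
```

Before the fix, the equivalent of attach_tmp_ops() covered every traced
function; afterwards only the hash's subset pays the redirection cost.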
Please pull the latest trace-v7.0-rc4 tree, which can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
trace-v7.0-rc4
Tag SHA1: c1d7d0804e221b6c0789184efcb354ccea104f2f
Head SHA1: 50b35c9e50a865600344ab1d8f9a8b3384d7e63d
Jiri Olsa (1):
ftrace: Use hash argument for tmp_ops in update_ftrace_direct_mod
Masami Hiramatsu (Google) (1):
ring-buffer: Fix to update per-subbuf entries of persistent ring buffer
Steven Rostedt (2):
tracing: Fix failure to read user space from system call trace events
tracing: Fix trace_marker copy link list updates
Xuewen Yan (1):
tracing: Revert "tracing: Remove pid in task_rename tracing output"
----
include/trace/events/task.h | 7 +++++--
kernel/trace/ftrace.c | 4 ++--
kernel/trace/ring_buffer.c | 2 +-
kernel/trace/trace.c | 36 +++++++++++++++++++++++++++---------
4 files changed, 35 insertions(+), 14 deletions(-)
---------------------------
diff --git a/include/trace/events/task.h b/include/trace/events/task.h
index 4f0759634306..b9a129eb54d9 100644
--- a/include/trace/events/task.h
+++ b/include/trace/events/task.h
@@ -38,19 +38,22 @@ TRACE_EVENT(task_rename,
TP_ARGS(task, comm),
TP_STRUCT__entry(
+ __field( pid_t, pid)
__array( char, oldcomm, TASK_COMM_LEN)
__array( char, newcomm, TASK_COMM_LEN)
__field( short, oom_score_adj)
),
TP_fast_assign(
+ __entry->pid = task->pid;
memcpy(entry->oldcomm, task->comm, TASK_COMM_LEN);
strscpy(entry->newcomm, comm, TASK_COMM_LEN);
__entry->oom_score_adj = task->signal->oom_score_adj;
),
- TP_printk("oldcomm=%s newcomm=%s oom_score_adj=%hd",
- __entry->oldcomm, __entry->newcomm, __entry->oom_score_adj)
+ TP_printk("pid=%d oldcomm=%s newcomm=%s oom_score_adj=%hd",
+ __entry->pid, __entry->oldcomm,
+ __entry->newcomm, __entry->oom_score_adj)
);
/**
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8df69e702706..413310912609 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6606,9 +6606,9 @@ int update_ftrace_direct_mod(struct ftrace_ops *ops, struct ftrace_hash *hash, b
if (!orig_hash)
goto unlock;
- /* Enable the tmp_ops to have the same functions as the direct ops */
+ /* Enable the tmp_ops to have the same functions as the hash object. */
ftrace_ops_init(&tmp_ops);
- tmp_ops.func_hash = ops->func_hash;
+ tmp_ops.func_hash->filter_hash = hash;
err = register_ftrace_function_nolock(&tmp_ops);
if (err)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 17d0ea0cc3e6..170170bd83bd 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2053,7 +2053,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
entries += ret;
entry_bytes += local_read(&head_page->page->commit);
- local_set(&cpu_buffer->head_page->entries, ret);
+ local_set(&head_page->entries, ret);
if (head_page == cpu_buffer->commit_page)
break;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ebd996f8710e..a626211ceb9a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -555,7 +555,7 @@ static bool update_marker_trace(struct trace_array *tr, int enabled)
lockdep_assert_held(&event_mutex);
if (enabled) {
- if (!list_empty(&tr->marker_list))
+ if (tr->trace_flags & TRACE_ITER(COPY_MARKER))
return false;
list_add_rcu(&tr->marker_list, &marker_copies);
@@ -563,10 +563,10 @@ static bool update_marker_trace(struct trace_array *tr, int enabled)
return true;
}
- if (list_empty(&tr->marker_list))
+ if (!(tr->trace_flags & TRACE_ITER(COPY_MARKER)))
return false;
- list_del_init(&tr->marker_list);
+ list_del_rcu(&tr->marker_list);
tr->trace_flags &= ~TRACE_ITER(COPY_MARKER);
return true;
}
@@ -6783,6 +6783,23 @@ char *trace_user_fault_read(struct trace_user_buf_info *tinfo,
*/
do {
+ /*
+ * It is possible that something is trying to migrate this
+ * task. What happens then, is when preemption is enabled,
+ * the migration thread will preempt this task, try to
+ * migrate it, fail, then let it run again. That will
+ * cause this to loop again and never succeed.
+ * On failures, enabled and disable preemption with
+ * migration enabled, to allow the migration thread to
+ * migrate this task.
+ */
+ if (trys) {
+ preempt_enable_notrace();
+ preempt_disable_notrace();
+ cpu = smp_processor_id();
+ buffer = per_cpu_ptr(tinfo->tbuf, cpu)->buf;
+ }
+
/*
* If for some reason, copy_from_user() always causes a context
* switch, this would then cause an infinite loop.
@@ -9744,18 +9761,19 @@ static int __remove_instance(struct trace_array *tr)
list_del(&tr->list);
- /* Disable all the flags that were enabled coming in */
- for (i = 0; i < TRACE_FLAGS_MAX_SIZE; i++) {
- if ((1ULL << i) & ZEROED_TRACE_FLAGS)
- set_tracer_flag(tr, 1ULL << i, 0);
- }
-
if (printk_trace == tr)
update_printk_trace(&global_trace);
+ /* Must be done before disabling all the flags */
if (update_marker_trace(tr, 0))
synchronize_rcu();
+ /* Disable all the flags that were enabled coming in */
+ for (i = 0; i < TRACE_FLAGS_MAX_SIZE; i++) {
+ if ((1ULL << i) & ZEROED_TRACE_FLAGS)
+ set_tracer_flag(tr, 1ULL << i, 0);
+ }
+
tracing_set_nop(tr);
clear_ftrace_function_probes(tr);
event_trace_del_tracer(tr);