Re: [PATCH v2] perf stat: Fix crash on arm64
From: Ian Rogers
Date: Wed Mar 25 2026 - 14:26:16 EST
On Wed, Mar 25, 2026 at 3:25 AM Breno Leitao <leitao@xxxxxxxxxx> wrote:
>
> perf stat crashes on arm64 hosts with the following assertion failure:
>
> # make -C tools/perf DEBUG=1
> # perf stat sleep 1
> perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
> [1] 1220794 IOT instruction (core dumped) ./perf stat
>
> The sorting function introduced by commit a745c0831c15c ("perf stat:
> Sort default events/metrics") compares events based on their individual
> properties. This can cause events from different groups to be
> interleaved, resulting in group members appearing before their leaders
> in the sorted evlist.
>
> When the iterator opens events in list order, a group member may be
> processed before its leader has been opened.
>
> For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
> could be sorted before its leader, causing the crash when CPU_CYCLES
> tries to get its group fd from the not-yet-opened leader.
>
> Fix this by comparing events based on their leader's attributes instead
> of their own attributes when the events are in different groups. This
> ensures all members of a group share the same sort key as their leader,
> keeping groups together and guaranteeing leaders are opened before their
> members.
>
> Reported-by: Denis Yaroshevskiy <dyaroshev@xxxxxxxx>
> Fixes: a745c0831c15c ("perf stat: Sort default events/metrics")
> Tested-by: Dmitry Ilvokhin <d@xxxxxxxxxxxx>
> Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>

Ah, this is the separate sorting for perf stat output, not the
parse-events sorting. This sorting happens after we've regrouped for
PMUs, etc., so the invariants I expect shouldn't be broken by the
change. FWIW, the Intel hybrid sorting isn't impacted:
Before:
```
$ perf stat -a sleep 1
Performance counter stats for 'system wide':

       22,355  context-switches         # 792.6 cs/sec cs_per_second
    28,205.79  msec cpu-clock           # 27.6 CPUs CPUs_utilized
          477  cpu-migrations           # 16.9 migrations/sec migrations_per_second
          294  page-faults              # 10.4 faults/sec page_faults_per_second
    1,796,969  cpu_core/branch-misses/  # 2.0 % branch_miss_rate
   90,819,614  cpu_core/branches/       # 3.2 M/sec branch_frequency
  465,688,306  cpu_core/cpu-cycles/     # 0.0 GHz cycles_frequency
  435,799,882  cpu_core/instructions/   # 0.9 instructions insn_per_cycle
    2,239,373  cpu_atom/branch-misses/  # 4.6 % branch_miss_rate (49.78%)
   48,356,802  cpu_atom/branches/       # 1.7 M/sec branch_frequency (50.17%)
  477,205,378  cpu_atom/cpu-cycles/     # 0.0 GHz cycles_frequency (50.37%)
  236,677,090  cpu_atom/instructions/   # 0.5 instructions insn_per_cycle (50.35%)
               TopdownL1 (cpu_core)     # 7.1 % tma_bad_speculation
                                        # 40.2 % tma_frontend_bound
                                        # 35.7 % tma_backend_bound
                                        # 17.1 % tma_retiring
               TopdownL1 (cpu_atom)     # 32.7 % tma_backend_bound (59.74%)
                                        # 37.8 % tma_frontend_bound (59.54%)
                                        # 17.2 % tma_bad_speculation
                                        # 12.3 % tma_retiring (59.62%)

  1.006767726 seconds time elapsed
```
After:
```
$ perf stat -a sleep 1
Performance counter stats for 'system wide':

       21,329  context-switches         # 758.5 cs/sec cs_per_second
    28,120.76  msec cpu-clock           # 27.7 CPUs CPUs_utilized
          482  cpu-migrations           # 17.1 migrations/sec migrations_per_second
          217  page-faults              # 7.7 faults/sec page_faults_per_second
    1,606,877  cpu_core/branch-misses/  # 1.8 % branch_miss_rate
   90,472,412  cpu_core/branches/       # 3.2 M/sec branch_frequency
  459,566,033  cpu_core/cpu-cycles/     # 0.0 GHz cycles_frequency
  482,577,042  cpu_core/instructions/   # 1.1 instructions insn_per_cycle
    2,430,046  cpu_atom/branch-misses/  # 7.0 % branch_miss_rate (49.85%)
   35,031,345  cpu_atom/branches/       # 1.2 M/sec branch_frequency (50.20%)
  494,493,558  cpu_atom/cpu-cycles/     # 0.0 GHz cycles_frequency (50.22%)
  190,397,854  cpu_atom/instructions/   # 0.4 instructions insn_per_cycle (50.24%)
               TopdownL1 (cpu_core)     # 6.6 % tma_bad_speculation
                                        # 35.9 % tma_frontend_bound
                                        # 38.2 % tma_backend_bound
                                        # 19.3 % tma_retiring
               TopdownL1 (cpu_atom)     # 30.2 % tma_backend_bound (59.74%)
                                        # 38.8 % tma_frontend_bound (59.71%)
                                        # 21.7 % tma_bad_speculation
                                        # 9.3 % tma_retiring (59.68%)

  1.005096844 seconds time elapsed
```
Tested-by: Ian Rogers <irogers@xxxxxxxxxx>
Thanks,
Ian
> ---
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> ---
> Changes in v2:
> - No changes from v1; resending the exact same patch.
> - Link to v1: https://patch.msgid.link/20260205-perf_stat-v1-1-e433b0c918af@xxxxxxxxxx
> ---
> tools/perf/builtin-stat.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 73c2ba7e30760..a97b9d0de3f58 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1917,25 +1917,33 @@ static int default_evlist_evsel_cmp(void *priv __maybe_unused,
> const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
> const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
> const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
> + const struct evsel *lhs_leader = evsel__leader(lhs);
> + const struct evsel *rhs_leader = evsel__leader(rhs);
>
> - if (evsel__leader(lhs) == evsel__leader(rhs)) {
> + if (lhs_leader == rhs_leader) {
> /* Within the same group, respect the original order. */
> return lhs_core->idx - rhs_core->idx;
> }
>
> + /*
> + * Compare using leader's attributes so that all members of a group
> + * stay together. This ensures leaders are opened before their members.
> + */
> +
> /* Sort default metrics evsels first, and default show events before those. */
> - if (lhs->default_metricgroup != rhs->default_metricgroup)
> - return lhs->default_metricgroup ? -1 : 1;
> + if (lhs_leader->default_metricgroup != rhs_leader->default_metricgroup)
> + return lhs_leader->default_metricgroup ? -1 : 1;
>
> - if (lhs->default_show_events != rhs->default_show_events)
> - return lhs->default_show_events ? -1 : 1;
> + if (lhs_leader->default_show_events != rhs_leader->default_show_events)
> + return lhs_leader->default_show_events ? -1 : 1;
>
> /* Sort by PMU type (prefers legacy types first). */
> - if (lhs->pmu != rhs->pmu)
> - return lhs->pmu->type - rhs->pmu->type;
> + if (lhs_leader->pmu != rhs_leader->pmu)
> + return lhs_leader->pmu->type - rhs_leader->pmu->type;
>
> - /* Sort by name. */
> - return strcmp(evsel__name((struct evsel *)lhs), evsel__name((struct evsel *)rhs));
> + /* Sort by leader's name. */
> + return strcmp(evsel__name((struct evsel *)lhs_leader),
> + evsel__name((struct evsel *)rhs_leader));
> }
>
> /*
>
> ---
> base-commit: 85964cdcad0fac9a0eb7b87a0f9d88cc074b854c
> change-id: 20260205-perf_stat-a0a2a37e21c5
>
> Best regards,
> --
> Breno Leitao <leitao@xxxxxxxxxx>
>