[PATCH v6 00/14] perf build: Reduce build time by nearly half
From: Ian Rogers
Date: Mon May 18 2026 - 00:48:22 EST
This patch series refactors Kbuild internals, BPF skeleton generation,
Python AST pre-computation, and foundational tooling dependencies across
the perf tool build system. By eliminating umbrella target synchronization
barriers, decoupling static library prerequisites, parallelizing single-core
script generators, and removing redundant feature checks, this series
substantially increases multi-core concurrency during Kbuild startup.
On a 28-core build workstation (make -j28 all from scratch), clean build
latency improves by over 44%:
Before:
real 0m29.006s
user 2m46.019s
sys 0m30.610s
After:
real 0m16.091s
user 2m40.135s
sys 0m25.740s
This saves 12.9 seconds of wall-clock time per clean build. Furthermore,
incremental builds with nothing to rebuild are nearly 7x faster:
Before:
real 0m11.528s
user 0m9.633s
sys 0m6.965s
After:
real 0m1.717s
user 0m1.682s
sys 0m0.960s
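The headline figures can be re-derived from the "real" timings quoted above.
A small Python check (the variable names are illustrative only):

```python
# Wall-clock ("real") timings in seconds, copied from the tables above.
clean_before = 29.006
clean_after = 16.091
noop_before = 11.528
noop_after = 1.717

# Absolute and relative saving for a clean build.
clean_saved = clean_before - clean_after
clean_reduction_pct = 100.0 * clean_saved / clean_before

# Speedup factor for a build with nothing to rebuild.
noop_speedup = noop_before / noop_after

print(f"clean build: {clean_saved:.3f}s saved ({clean_reduction_pct:.1f}% less)")
print(f"no-op build: {noop_speedup:.2f}x faster")
```

The clean-build reduction lands between 44% and 45%, and the no-op speedup
between 6.7x and 6.8x, matching the "over 44%" and "nearly 7x" claims.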
Summary of Patches:
1: Fast-Path Feature Detection
- Refactors the test-clang-bpf-co-re.bin and test-bpftool-skeletons.bin feature
checks to group their shell pipelines within curly braces ({ cmd1 | cmd2; }),
redirect both stdout and stderr to .make.output, and touch $@ only on
success (> $(@:.bin=.make.output) 2>&1 && touch $@). Grouping the pipeline
ensures that compiler stderr is captured in .make.output rather than
escaping to the parent shell. This matches standard Kbuild feature check
conventions. Because the target files now exist on disk after a successful
check, Kbuild can cache positive detections and avoid re-evaluating the
sub-make on every incremental build. Also adds test-bpftool-skeletons.bin to
the clean FILES list and adds test-clang-bpf-co-re.c as an explicit source
prerequisite.
2-4: Flattening Umbrella Prepare Barriers
- builtin-trace embedded inclusions and pmu-events generation are decoupled
from the sequential "prepare" umbrella target, eliminating Make AST
double-parsing overhead and removing a barrier to parallel compilation.
5-7: Decoupling & Pre-generating BPF Skeletons
- BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
- Decouples bpftool bootstrap from top-level static libbpf dependencies,
attaching bpf-skel-prepare directly to the umbrella prepare target. This
allows Make to pre-compile bpftool and dump vmlinux.h in the background at
build startup, removing the 7-second serialization bottleneck before BPF
object compilation.
- Ensures benchmark skeleton intermediate .bpf.o files are cleanly removed
during make clean, and adds bpf-skel-prepare to .PHONY.
8-9: Foundational Linkage Optimization
- Moves static libsymbol library prerequisites out of the prepare step.
- Eliminates redundant libbpf sub-make feature checks during static builds.
10-11: jevents.py Concurrency & Deduplication
- Splits the 2.8 MB big_c_string literal out of pmu-events.c into a
dedicated pmu-events-string.c compilation unit. This halves C compilation
latency by compiling the string and struct tables simultaneously on
separate CPU cores while preserving zero dynamic ELF relocations. Adds
pmu-events-string.c to .gitignore, declares extern const char big_c_string[];
locally inside output_string_file and output_file when split to prevent linkage
conflicts with empty-pmu-events.c, defers file closures to keep the two files'
timestamps consistent, and uses canonical Make 4.0 @: dependency chaining.
- Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
all available CPU cores using ProcessPoolExecutor (accelerating Python
execution by 11x, from 3.3s down to ~290ms). Moves _init_worker to top-level
scope to ensure clean pickling under spawn multiprocessing start methods.
12: Out-of-Tree Incremental Rebuild Fix
- Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
Make from re-executing the script installation rules on every invocation
in already-built out-of-tree builds.
13-14: AST Parsing Optimization & Shell Fork Eradication
- Converts ZENS, ARMS, and INTELS in pmu-events/Build from recursive assignment
(=) to simply expanded assignment (:=) and replaces model_name/vendor_name
with pure GNU Make string functions. This ensures Make runs the
directory-probing shell commands exactly once during parsing and evaluates
the path macros in memory, eliminating over 7,800 redundant sub-processes
during out-of-tree build evaluation.
- Converts llvm-config shell queries in Makefile.config from recursive assignment
(=) to simply expanded assignment (:=). This eliminates ~185 redundant sub-processes
that were previously executed across object compilation dependency checks.
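The saving from switching = to := comes from how often the right-hand side is
evaluated. The following is only a Python analogy of that behavior, not Make
code: ShellCounter is a hypothetical stand-in for a $(shell ...) call, its
return value is made up, and the reference count of 5 is arbitrary.

```python
class ShellCounter:
    """Hypothetical stand-in for $(shell ...): counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def run(self):
        self.calls += 1
        return "-I/usr/include/llvm"  # made-up command output


def expand_recursive(shell, references):
    # Like VAR = $(shell ...): the command re-runs at every reference.
    return [shell.run() for _ in range(references)]


def expand_simple(shell, references):
    # Like VAR := $(shell ...): the command runs once, at assignment time,
    # and every later reference reuses the stored value.
    value = shell.run()
    return [value for _ in range(references)]


recursive_shell = ShellCounter()
recursive_values = expand_recursive(recursive_shell, 5)

simple_shell = ShellCounter()
simple_values = expand_simple(simple_shell, 5)

print(recursive_shell.calls, simple_shell.calls)
```

Both variants yield the same values at each reference; only the number of
command executions differs.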
Changes since v5:
- perf pmu-events (Patch 10): Refactored jevents.py to explicitly close output_file
first and output_string_file second at the very end of main(), so that
pmu-events-string.c receives a filesystem modification timestamp greater than
or equal to that of pmu-events.c. This removes any nanosecond timestamp
discrepancy during Python teardown, so Make's canonical @: dependency chaining
rule sees pmu-events-string.c as up to date. Incremental builds therefore avoid
redundant recompilation without requiring manual touch commands in the
Makefile.
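The effect of the close ordering can be illustrated with a short Python
sketch. It is not taken from jevents.py: write_pair and the file contents are
assumptions, and only the two file names come from the series. It relies on
Python's buffered file objects flushing pending data when closed, so the file
closed last is also the one written to disk last.

```python
import os
import tempfile


def write_pair(directory):
    """Write both outputs, closing the struct file first and the string
    file second, then return their modification times in nanoseconds."""
    struct_path = os.path.join(directory, "pmu-events.c")
    string_path = os.path.join(directory, "pmu-events-string.c")

    output_file = open(struct_path, "w")
    output_string_file = open(string_path, "w")

    # Small writes stay in the userspace buffer until close() flushes them.
    output_string_file.write("/* string table */\n")
    output_file.write("/* struct tables */\n")

    # Explicit ordering: the dependent file is flushed and closed last.
    output_file.close()
    output_string_file.close()

    return os.stat(struct_path).st_mtime_ns, os.stat(string_path).st_mtime_ns


with tempfile.TemporaryDirectory() as tmp:
    struct_mtime, string_mtime = write_pair(tmp)

print(string_mtime >= struct_mtime)
```

Leaving the closes to interpreter teardown gives no such ordering, which is
the discrepancy this revision removes.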
Ian Rogers (14):
  tools build: Fix feature checks to touch target files on success
  perf trace beauty: Make beauty generated C code standalone .o files
  perf build: Decouple pmu-events from prepare umbrella target
  perf build: Remove empty archheaders target
  perf build: Move BPF skeleton generation out of Makefile.perf
  perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
  perf build: Pre-generate BPF skeleton tooling during umbrella prepare
    phase
  perf build: Move libsymbol dependency out of prepare step
  perf build: Remove redundant libbpf feature check for static builds
  perf pmu-events: Split big_c_string storage into standalone
    compilation unit
  perf pmu-events: Parallelize JSON and metric pre-computation in
    jevents.py
  perf build: Prefix SCRIPTS with output directory to fix continuous
    rebuilds
  perf pmu-events: Convert recursive shell assignments and macros to
    Make built-ins
  perf build: Convert llvm-config shell queries to simply expanded
    variables
tools/build/feature/Makefile | 13 +-
tools/perf/.gitignore | 1 +
tools/perf/Build | 2 +
tools/perf/Makefile.config | 19 +-
tools/perf/Makefile.perf | 423 ++----------------
tools/perf/bench/Build | 6 +
.../bpf_skel/bench_uprobe.bpf.c | 0
tools/perf/bench/uprobe.c | 2 +-
tools/perf/bpf_skel.mak | 109 +++++
tools/perf/builtin-trace.c | 32 +-
tools/perf/pmu-events/Build | 26 +-
tools/perf/pmu-events/jevents.py | 59 ++-
tools/perf/trace/beauty/Build | 276 ++++++++++++
tools/perf/trace/beauty/arch_errno_names.c | 2 +
tools/perf/trace/beauty/arch_errno_names.sh | 2 +-
tools/perf/trace/beauty/beauty.h | 60 +++
tools/perf/trace/beauty/eventfd.c | 6 +-
tools/perf/trace/beauty/fsconfig.c | 5 +
tools/perf/trace/beauty/futex_op.c | 5 +-
tools/perf/trace/beauty/futex_val3.c | 5 +-
tools/perf/trace/beauty/mmap.c | 24 +-
tools/perf/trace/beauty/mode_t.c | 6 +-
tools/perf/trace/beauty/msg_flags.c | 8 +-
tools/perf/trace/beauty/open_flags.c | 2 +
tools/perf/trace/beauty/perf_event_open.c | 21 +-
tools/perf/trace/beauty/pid.c | 5 +-
tools/perf/trace/beauty/sched_policy.c | 8 +-
tools/perf/trace/beauty/seccomp.c | 12 +-
tools/perf/trace/beauty/signum.c | 6 +-
tools/perf/trace/beauty/socket_type.c | 6 +-
.../perf/{util => trace/beauty}/syscalltbl.c | 0
.../perf/{util => trace/beauty}/syscalltbl.h | 0
tools/perf/trace/beauty/tracepoints/Build | 21 +
tools/perf/trace/beauty/waitid_options.c | 8 +-
tools/perf/util/Build | 17 +-
tools/perf/util/bpf-trace-summary.c | 2 +-
tools/perf/util/env.c | 4 -
tools/perf/util/env.h | 1 +
38 files changed, 688 insertions(+), 516 deletions(-)
rename tools/perf/{util => bench}/bpf_skel/bench_uprobe.bpf.c (100%)
create mode 100644 tools/perf/bpf_skel.mak
create mode 100644 tools/perf/trace/beauty/fsconfig.c
rename tools/perf/{util => trace/beauty}/syscalltbl.c (100%)
rename tools/perf/{util => trace/beauty}/syscalltbl.h (100%)
--
2.54.0.563.g4f69b47b94-goog