[PATCH v4 00/14] perf build: Reduce build time by nearly half
From: Ian Rogers
Date: Fri May 15 2026 - 14:21:30 EST
This patch series refactors Kbuild internals, BPF skeleton generation,
Python AST pre-computation, and foundational tooling dependencies across
the perf tool build system. By eliminating umbrella target synchronization
barriers, decoupling static library prerequisites, parallelizing single-core
script generators, and removing redundant feature checks, this series lets
Make keep far more cores busy during build startup.
On a 28-core build workstation (make -j28 all from scratch), clean build
wall-clock time improves by about 44.5%:
Before:
real 0m29.006s
user 2m46.019s
sys 0m30.610s
After:
real 0m16.091s
user 2m40.135s
sys 0m25.740s
This saves 12.9 seconds of wall-clock time per clean build. Furthermore,
"nothing to build" incremental builds are about 6.7x faster:
Before:
real 0m11.528s
user 0m9.633s
sys 0m6.965s
After:
real 0m1.717s
user 0m1.682s
sys 0m0.960s
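The headline figures can be re-derived from the timings quoted above. The
following is only a small sketch of that arithmetic; the helper names
(pct_reduction, speedup, avg_cores_busy) are made up for illustration and are
not part of the series.

```python
# Re-derive the quoted ratios from the "time" output in this cover letter.
# All helper names here are hypothetical, chosen only for this sketch.

def pct_reduction(before_s, after_s):
    """Percentage by which wall-clock time dropped."""
    return 100.0 * (before_s - after_s) / before_s

def speedup(before_s, after_s):
    """How many times faster the 'after' run is."""
    return before_s / after_s

def avg_cores_busy(real_s, user_s, sys_s):
    """CPU seconds consumed per wall-clock second (average parallelism)."""
    return (user_s + sys_s) / real_s

# Clean build: real / user / sys, converted to seconds.
clean_before = (29.006, 2 * 60 + 46.019, 30.610)
clean_after = (16.091, 2 * 60 + 40.135, 25.740)

# "Nothing to build" incremental build, wall-clock only.
noop_before, noop_after = 11.528, 1.717

print(f"clean: {pct_reduction(clean_before[0], clean_after[0]):.1f}% less wall time")
print(f"no-op: {speedup(noop_before, noop_after):.2f}x faster")
print(f"avg cores busy: {avg_cores_busy(*clean_before):.1f} -> "
      f"{avg_cores_busy(*clean_after):.1f}")
```

Total CPU time (user + sys) changes only modestly between the two clean
builds, so most of the wall-clock gain comes from higher average parallelism
rather than from doing less work.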
Summary of Patches:
1: Fast-Path Feature Detection
- Refactors test-clang-bpf-co-re.bin and test-bpftool-skeletons.bin feature
checks to redirect grep output to .make.output and touch $@ upon success
(> $(@:.bin=.make.output) 2>&1 && touch $@). This matches the standard
Kbuild feature check convention and ensures the target files are created on
disk only upon success, allowing Kbuild to cache positive detections and avoid
re-running the sub-make on every incremental build. For
test-clang-bpf-co-re.bin, adds the explicit source prerequisite
test-clang-bpf-co-re.c and simplifies the Clang recipe using $<.
2-4: Flattening Umbrella Prepare Barriers
- builtin-trace embedded inclusions and pmu-events generation are decoupled
from the sequential "prepare" umbrella target, eliminating Make AST
double-parsing overhead and removing barriers that blocked parallel compilation.
5-7: Decoupling & Pre-generating BPF Skeletons
- BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
- Decouples bpftool bootstrap from top-level static libbpf dependencies,
attaching bpf-skel-prepare directly to the umbrella prepare target. This
allows Make to pre-compile bpftool and dump vmlinux.h in the background at
build startup, removing the 7-second serialization bottleneck before BPF
object compilation.
- Ensures benchmark skeleton intermediate .bpf.o files are cleanly removed
during make clean, and adds bpf-skel-prepare to .PHONY.
8-9: Foundational Linkage Optimization
- Moves static libsymbol library prerequisites out of the prepare step.
- Eliminates redundant libbpf sub-make feature checks during static builds.
10-11: jevents.py Concurrency & Deduplication
- Splits the 2.8 MB big_c_string literal out of pmu-events.c into a
dedicated pmu-events-string.c compilation unit. This roughly halves C
compilation latency by compiling the string and struct tables simultaneously
on separate CPU cores while still requiring no dynamic ELF relocations. Adds
pmu-events-string.c to .gitignore, includes pmu-events.h for global extern
declarations, defers closing output_string_file so its timestamp matches
output_file, and uses Make 4.0 compatible dependency chaining that forces a
rebuild if the generated file is missing.
- Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
all available CPU cores using ProcessPoolExecutor (accelerating Python
execution by 11x, from 3.3s down to ~290ms). Moves _init_worker to top-level
scope to ensure clean pickling under spawn multiprocessing start methods.
12: Out-of-Tree Incremental Rebuild Fix
- Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
Make from re-executing the script installation rules on every invocation in
out-of-tree builds that are already up to date.
13-14: AST Parsing Optimization & Shell Fork Eradication
- Converts ZENS, ARMS, and INTELS in pmu-events/Build from recursive assignment
(=) to simply expanded assignment (:=) and replaces model_name/vendor_name
with pure GNU Make string functions. Make now runs the directory-probing
shell commands exactly once, while parsing the makefile, and evaluates the
path macros in memory, removing over 7,800 redundant sub-processes during
out-of-tree build evaluation.
- Converts llvm-config shell queries in Makefile.config from recursive assignment
(=) to simply expanded assignment (:=). This eliminates ~185 redundant sub-processes
that were previously executed across object compilation dependency checks.
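Why the change from = to := removes sub-processes can be shown with a toy
model. The Python below is not Make and not code from this series. It only
mimics the two assignment flavors: a recursively expanded variable re-runs its
$(shell ...) on every reference, while a simply expanded one runs it once at
definition. All names, the command string and the reference count are
illustrative assumptions.

```python
class ForkCounter:
    """Stand-in for $(shell ...): counts how often the command would run."""

    def __init__(self):
        self.forks = 0

    def shell(self, cmd):
        self.forks += 1
        return f"<output of {cmd}>"

def recursive_var(counter, cmd):
    # Models "VAR = $(shell cmd)": the right-hand side is stored unexpanded
    # and is expanded again each time the variable is referenced.
    return lambda: counter.shell(cmd)

def simple_var(counter, cmd):
    # Models "VAR := $(shell cmd)": the right-hand side is expanded once,
    # at definition time, and the resulting text is stored.
    value = counter.shell(cmd)
    return lambda: value

def count_forks(make_var, references):
    """Define a variable, reference it N times, and report shell forks."""
    counter = ForkCounter()
    var = make_var(counter, "llvm-config --version")  # illustrative command
    for _ in range(references):
        var()
    return counter.forks

print("recursive (=):       ", count_forks(recursive_var, 100), "forks")
print("simply expanded (:=):", count_forks(simple_var, 100), "forks")
```

In this model the fork count for = grows with the number of references, while
for := it stays at one, which is the effect described above.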
Changes since v3:
- Streamlined series to 14 patches by dropping Patches 1, 2, and 9 to focus on
the least controversial, highest-impact changes.
- tools build (Patch 1): Refactored test-bpftool-skeletons.bin and
test-clang-bpf-co-re.bin feature check recipes to match standard Kbuild
conventions by redirecting grep output to .make.output and touching $@ upon
success (> $(@:.bin=.make.output) 2>&1 && touch $@). Added explicit source
file prerequisite test-clang-bpf-co-re.c and simplified Clang recipe using $<.
- perf build (Patch 7): Fixed missing prerequisite on bpf-skel-prepare in
bpf_skel.mak by making it depend directly on explicit $(BPFTOOL) $(VMLINUX_H)
prerequisites, preventing it from executing as a no-op during prepare.
- perf pmu-events (Patch 10): Added extern const char big_c_string[]; declaration
to pmu-events.h and included it in output_string_file to satisfy Clang's
-Wmissing-variable-declarations warning. Deferred closing
output_string_file until the very end of main() to ensure identical
timestamps with output_file, preventing redundant incremental rebuilds. Updated
the secondary target rule in pmu-events/Build to verify the file exists on disk
and force a rebuild if it was manually deleted, so the build self-corrects.
Ian Rogers (14):
tools build: Fix feature checks to touch target files on success
perf trace beauty: Make beauty generated C code standalone .o files
perf build: Decouple pmu-events from prepare umbrella target
perf build: Remove empty archheaders target
perf build: Move BPF skeleton generation out of Makefile.perf
perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
perf build: Pre-generate BPF skeleton tooling during umbrella prepare
phase
perf build: Move libsymbol dependency out of prepare step
perf build: Remove redundant libbpf feature check for static builds
perf pmu-events: Split big_c_string storage into standalone
compilation unit
perf pmu-events: Parallelize JSON and metric pre-computation in
jevents.py
perf build: Prefix SCRIPTS with output directory to fix continuous
rebuilds
perf pmu-events: Convert recursive shell assignments and macros to
Make built-ins
perf build: Convert llvm-config shell queries to simply expanded
variables
tools/build/feature/Makefile | 8 +-
tools/perf/.gitignore | 1 +
tools/perf/Build | 2 +
tools/perf/Makefile.config | 19 +-
tools/perf/Makefile.perf | 423 ++----------------
tools/perf/bench/Build | 6 +
.../bpf_skel/bench_uprobe.bpf.c | 0
tools/perf/bench/uprobe.c | 2 +-
tools/perf/bpf_skel.mak | 109 +++++
tools/perf/builtin-trace.c | 32 +-
tools/perf/pmu-events/Build | 26 +-
tools/perf/pmu-events/jevents.py | 57 ++-
tools/perf/pmu-events/pmu-events.h | 2 +
tools/perf/trace/beauty/Build | 276 ++++++++++++
tools/perf/trace/beauty/arch_errno_names.c | 2 +
tools/perf/trace/beauty/arch_errno_names.sh | 2 +-
tools/perf/trace/beauty/beauty.h | 60 +++
tools/perf/trace/beauty/eventfd.c | 6 +-
tools/perf/trace/beauty/fsconfig.c | 5 +
tools/perf/trace/beauty/futex_op.c | 5 +-
tools/perf/trace/beauty/futex_val3.c | 5 +-
tools/perf/trace/beauty/mmap.c | 24 +-
tools/perf/trace/beauty/mode_t.c | 6 +-
tools/perf/trace/beauty/msg_flags.c | 8 +-
tools/perf/trace/beauty/open_flags.c | 2 +
tools/perf/trace/beauty/perf_event_open.c | 21 +-
tools/perf/trace/beauty/pid.c | 5 +-
tools/perf/trace/beauty/sched_policy.c | 8 +-
tools/perf/trace/beauty/seccomp.c | 12 +-
tools/perf/trace/beauty/signum.c | 6 +-
tools/perf/trace/beauty/socket_type.c | 6 +-
.../perf/{util => trace/beauty}/syscalltbl.c | 0
.../perf/{util => trace/beauty}/syscalltbl.h | 0
tools/perf/trace/beauty/tracepoints/Build | 21 +
tools/perf/trace/beauty/waitid_options.c | 8 +-
tools/perf/util/Build | 17 +-
tools/perf/util/bpf-trace-summary.c | 2 +-
tools/perf/util/env.c | 4 -
tools/perf/util/env.h | 1 +
39 files changed, 685 insertions(+), 514 deletions(-)
rename tools/perf/{util => bench}/bpf_skel/bench_uprobe.bpf.c (100%)
create mode 100644 tools/perf/bpf_skel.mak
create mode 100644 tools/perf/trace/beauty/fsconfig.c
rename tools/perf/{util => trace/beauty}/syscalltbl.c (100%)
rename tools/perf/{util => trace/beauty}/syscalltbl.h (100%)
--
2.54.0.563.g4f69b47b94-goog