Re: [PATCH v3 00/17] perf build: Reduce build time by nearly half

From: Ian Rogers

Date: Fri May 15 2026 - 12:59:07 EST


On Thu, May 14, 2026 at 3:23 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Thu, May 14, 2026 at 3:06 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > On Thu, May 14, 2026 at 09:33:52AM -0700, Ian Rogers wrote:
> > > This patch series refactors Kbuild internals, BPF skeleton generation,
> > > Python AST pre-computation, and foundational tooling dependencies across
> > > the perf tool build system. By eliminating umbrella target synchronization
> > > barriers, decoupling static library prerequisites, parallelizing single-core
> > > script generators, and eradicating redundant feature checks, this series
> > > increases multi-core concurrency during Kbuild startup.
> > >
> > > On a 28-core build workstation (make -j28 all from scratch), clean build
> > > latency improves by over 49%:
> > >
> > > Before:
> > > real 0m29.006s
> > > user 2m46.019s
> > > sys 0m30.610s
> > >
> > > After:
> > > real 0m14.782s
> > > user 2m39.527s
> > > sys 0m22.938s
> > >
> > > This saves 14.2 seconds per clean build. Furthermore, nothing-to-build
> > > incremental builds improve by nearly 7x:
> > >
> > > Before:
> > > real 0m11.528s
> > > user 0m9.633s
> > > sys 0m6.965s
> > >
> > > After:
> > > real 0m1.729s
> > > user 0m1.600s
> > > sys 0m0.884s
> >
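For anyone wanting to check the headline numbers, here is a minimal sketch of the arithmetic using only the time(1) values quoted above. The helper name to_seconds is made up for illustration and is not part of the series.

```python
def to_seconds(stamp):
    """Convert a time(1) style 'XmY.Zs' string to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Values copied from the cover letter (make -j28 all, clean build).
clean_real_before = to_seconds("0m29.006s")
clean_real_after = to_seconds("0m14.782s")
clean_user_before = to_seconds("2m46.019s")
clean_user_after = to_seconds("2m39.527s")

# Values copied from the cover letter (nothing-to-build incremental build).
noop_real_before = to_seconds("0m11.528s")
noop_real_after = to_seconds("0m1.729s")

saved = clean_real_before - clean_real_after
wall_reduction_pct = 100 * saved / clean_real_before
user_reduction_pct = 100 * (clean_user_before - clean_user_after) / clean_user_before
noop_speedup = noop_real_before / noop_real_after

print(f"clean build: {saved:.3f}s saved, {wall_reduction_pct:.1f}% less wall time")
print(f"clean build: {user_reduction_pct:.1f}% less user CPU time")
print(f"no-op build: {noop_speedup:.2f}x faster")
```

Wall time drops by roughly half while user CPU time only drops by a few percent, which is consistent with the gain coming mostly from better parallelism rather than from doing less work.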
> > I've quickly checked it with latency profiling like below:
> >
> > $ perf record --latency -- make -C tools/perf
> >
> > $ perf report --latency -s comm
> >
> > The result looks like this.
> >
> > Before:
> > #
> > # Samples: 715K of event 'cpu/cycles/Pu'
> > # Event count (approx.): 422452811481
> > #
> > > #  Latency  Overhead  Command
> > > # ........  ........  ...............
> > > #
> > >     45.28%    71.33%  cc1
> > >     34.48%    16.92%  python3
> > >     11.15%     2.21%  ld
> > >      2.58%     1.51%  x86_64-linux-gn
> > >      2.22%     0.99%  cc1plus
> > >      0.71%     0.63%  sh
> > >      0.69%     0.14%  llvm-config
> > >      0.62%     0.56%  clang
> > >      0.57%     4.40%  shellcheck
> > >      0.44%     0.12%  perl
> >
> > After:
> > #
> > # Samples: 709K of event 'cpu/cycles/Pu'
> > # Event count (approx.): 416654798495
> > #
> > > #  Latency  Overhead  Command
> > > # ........  ........  ...............
> > > #
> > >     64.99%    71.16%  cc1
> > >     15.07%     1.81%  ld
> > >      7.14%    17.59%  python3
> > >      3.66%     1.53%  x86_64-linux-gn
> > >      3.48%     0.75%  cc1plus
> > >      1.11%     4.43%  shellcheck
> > >      1.09%     0.74%  sh
> > >      0.86%     0.59%  clang
> > >      0.77%     0.12%  perl
> > >      0.45%     0.23%  make
> >
> > Now I see a big drop in the latency from python. And the llvm-config
> > doesn't show up in the top 10.
>
> This looks good. What is "x86_64-linux-gn", and since we default off
> LIBPERL, why does perl show up in the commands?
>
> Thanks,
> Ian

So, Sashiko reviews caught 2 regressions I introduced in v3. Given
that people seem reasonably happy I propose the following for v4 to
make the easy bits easy to land:

1) Drop patch 1, the bpftool change. Hopefully someone on the BPF side
can pull in an equivalent change so that we're not testing libbfd for
disassembly support in a bootstrap version of bpftool that doesn't
support disassembly.
2) Drop patch 2, on feature testing debuginfod and avoiding repeated
tests. We can come back to it.
3) Fix patch 3 "tools build: Fix test-clang-bpf-co-re.bin" to cover
the other missed case and to make the dependency tracking better. We
shouldn't have to re-feature test BPF features on every incremental
build.
4) Patches 4 to 8: keep, everything seems happy.
5) Drop patch 9 "perf build: Move static libbpf dependency out of
prepare step." There's parallelism in the prepare step, so this likely
wasn't gaining much build time, and Sashiko seems to find many
potential but not real issues, which suggests the gain isn't worth the
pain.
6) Patch 10 needs a fix for a variable dropped from v2.
7) Patches 11 to 12: keep, everything seems happy.
8) Patch 13, the header file declaring the variable seems to have gone
AWOL. Needs a fix.
9) Patch 14, Sashiko is warning "Does this loop overwrite standard
events from earlier architectures?" which would be a worry if (1) we
cared much about building for >1 architecture, or (2) more than just
arm64 were using the architecture standard files. I think we can
address this problem later.
10) Patches 15 to 17: keep, everything seems happy.

Thanks,
Ian

> > Thanks,
> > Namhyung
> >
> > >
> > > Summary of Patches:
> > >
> > > 1-3: Foundational Tooling & Fast-Path Feature Detection
> > > - Exempts bpftool bootstrap from non-essential feature tests (LLVM, libbfd,
> > > libcap), saving 1.1s of sub-make fork overhead during Kbuild startup.
> > > - Integrates libdebuginfod directly into test-all.c, allowing Make to skip
> > > individual feature check sub-make forks during AST parsing on fully
> > > configured workstations. Escapes $(shell ...) macro expansion to prevent
> > > unconditional sub-make forks.
> > > - Fixes test-clang-bpf-co-re.bin feature check to correctly generate its
> > > target file on disk via atomic move (> $@.tmp && mv $@.tmp $@), allowing
> > > Kbuild to perfectly cache the detection result and avoid continuous sub-make
> > > re-evaluations.
> > >
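To illustrate why the temporary-file-then-move pattern matters for caching a feature check, here is a rough Python analogue. It is not the Makefile recipe itself: write_direct, write_atomic and the producer callback are hypothetical names, with write_direct standing in for `cmd > $@` and write_atomic for `cmd > $@.tmp && mv $@.tmp $@`.

```python
import os


def write_direct(path, produce):
    """Like `cmd > target`: the target is created before cmd has run."""
    with open(path, "w") as out:
        out.write(produce())


def write_atomic(path, produce):
    """Like `cmd > target.tmp && mv target.tmp target`."""
    tmp = path + ".tmp"
    try:
        with open(tmp, "w") as out:
            out.write(produce())
        # Rename only after the producer succeeded; atomic on POSIX when
        # both paths are on the same filesystem.
        os.replace(tmp, path)
    finally:
        # Unlike the shell one-liner, this sketch also removes a stale
        # temporary file after a failure.
        if os.path.exists(tmp):
            os.remove(tmp)
```

With write_direct, a failing producer still leaves an empty file at the target path, which an up-to-date check based on file existence and mtime would treat as a finished result. With write_atomic, the target only appears once the producer has succeeded.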
> > > 4-6: Flattening Umbrella Prepare Barriers
> > > - builtin-trace embedded inclusions and pmu-events generation are completely
> > > decoupled from the sequential "prepare" umbrella target, eliminating Make
> > > AST double-parsing overhead and unchoking parallel compilation barriers.
> > >
> > > 7-10: Decoupling & Pre-generating BPF Skeletons
> > > - BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
> > > - Decouples bpftool bootstrap from top-level static libbpf dependencies,
> > > attaching bpf-skel-prepare directly to the umbrella prepare target. This
> > > allows Make to pre-compile bpftool and dump vmlinux.h in the background at
> > > build startup, removing the 7-second serialization bottleneck before BPF
> > > object compilation.
> > > - Ensures benchmark skeleton intermediate .bpf.o files are cleanly removed
> > > during make clean, and adds bpf-skel-prepare to .PHONY.
> > >
> > > 11-12: Foundational Linkage Optimization
> > > - Eliminates redundant libbpf sub-make feature checks during static builds.
> > > - Moves static libsymbol and libbpf library prerequisites out of the
> > > prepare step, ensuring libbpf headers are installed before
> > > compiling BPF-dependent tests.
> > >
> > > 13-14: jevents.py Concurrency & Deduplication
> > > - Splits the massive 2.8 MB big_c_string literal out of pmu-events.c
> > > into a dedicated pmu-events-string.c compilation unit. This slices
> > > C compilation latency in half by compiling string and struct
> > > tables simultaneously across separate CPU cores while preserving
> > > zero dynamic ELF relocations. Adds pmu-events-string.c to
> > > .gitignore and uses Make 4.0 compatible dependency chaining.
> > > - Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
> > > all available CPU cores using ProcessPoolExecutor (accelerating Python
> > > execution by 11x, from 3.3s down to ~290ms). Moves _init_worker to top-level
> > > scope to ensure clean pickling under spawn multiprocessing start methods.
> > >
> > > 15: Out-of-Tree Incremental Rebuild Fix
> > > - Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
> > > Make from continuously re-executing script installation rules on already
> > > built out-of-tree builds.
> > >
> > > 16-17: AST Parsing Optimization & Shell Fork Eradication
> > > - Converts ZENS, ARMS, and INTELS in pmu-events/Build from recursive
> > > assignment (=) to simply expanded assignment (:=) and replaces
> > > model_name/vendor_name with pure GNU Make string functions. This
> > > guarantees Make executes directory probing shell forks exactly
> > > once during AST parsing and evaluates path macros purely in
> > > memory, completely eradicating over 7,800 redundant sub-processes
> > > during out-of-tree build evaluation.
> > > - Converts llvm-config shell queries in Makefile.config from
> > > recursive assignment (=) to simply expanded assignment (:=). This
> > > eliminates ~185 redundant sub-processes that were previously
> > > executed across object compilation dependency checks.
> > >
> > > Changes since v2:
> > > - Dropped Patch 4 (tools scripts: Short-circuit CC_NO_CLANG compiler
> > > probe in Makefile.include) to prevent potential cross-compilation
> > > regressions when CC and HOSTCC use different compilers.
> > > - tools build (Patch 2): Escaped $(shell ...) macro expansion as
> > > $$(shell ...) inside define feature_check_code to safely defer
> > > sub-make execution until after eval parses the ifeq guard.
> > > - tools build (Patch 3): Refactored test-clang-bpf-co-re.bin feature
> > > check recipe to redirect grep output to a temporary file and
> > > atomically move it upon success (> $@.tmp && mv $@.tmp $@),
> > > preventing Kbuild from permanently caching failed detections due to
> > > 0-byte files.
> > > - perf trace beauty (Patch 4): Updated commit description to accurately
> > > reflect the unconditional top-level recursive kbuild hook
> > > (perf-util-y += trace/beauty/).
> > > - perf build (Patch 7): Added $(OUTPUT)bench/bpf_skel/.tmp to
> > > bpf-skel-clean in Makefile.perf to ensure intermediate benchmark
> > > skeleton .bpf.o artifacts are cleanly removed during make clean.
> > > Removed unused bpf_skel_deps variable from bpf_skel.mak.
> > > - perf build (Patch 9): Added $(LIBBPF) as an explicit prerequisite to
> > > $(LIBPERF_TEST_IN) in Makefile.perf to guarantee libbpf headers are
> > > fully installed before compiling sigtrap.c or other BPF-dependent
> > > tests during parallel builds.
> > > - perf build (Patch 10): Added bpf-skel-prepare to the .PHONY target
> > > list in Makefile.perf to ensure Make never incorrectly skips the
> > > target if a file or directory named bpf-skel-prepare accidentally
> > > exists in the build tree.
> > > - perf pmu-events (Patch 13): Added pmu-events/pmu-events-string.c to
> > > tools/perf/.gitignore. Replaced grouped targets (&:) with Make 4.0
> > > compatible dependency chaining to guarantee backward compatibility
> > > with older Make versions (like 4.2.1) and prevent parallel builds
> > > from spawning multiple concurrent jevents.py processes.
> > > - perf pmu-events (Patch 14): Moved _init_worker from local main()
> > > scope to the top-level module scope in jevents.py to ensure it can be
> > > cleanly pickled when ProcessPoolExecutor uses the spawn
> > > multiprocessing start method (avoiding AttributeError crashes).
> > >
> > > Ian Rogers (17):
> > > bpftool build: Restrict feature tests during bootstrap compilation
> > > tools build: Integrate libdebuginfod into test-all fast path
> > > tools build: Fix test-clang-bpf-co-re.bin to generate target file
> > > perf trace beauty: Make beauty generated C code standalone .o files
> > > perf build: Decouple pmu-events from prepare umbrella target
> > > perf build: Remove empty archheaders target
> > > perf build: Move BPF skeleton generation out of Makefile.perf
> > > perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
> > > perf build: Move static libbpf dependency out of prepare step
> > > perf build: Pre-generate BPF skeleton tooling during umbrella prepare
> > > phase
> > > perf build: Move libsymbol dependency out of prepare step
> > > perf build: Remove redundant libbpf feature check for static builds
> > > perf pmu-events: Split big_c_string storage into standalone
> > > compilation unit
> > > perf pmu-events: Parallelize JSON and metric pre-computation in
> > > jevents.py
> > > perf build: Prefix SCRIPTS with output directory to fix continuous
> > > rebuilds
> > > perf pmu-events: Convert recursive shell assignments and macros to
> > > Make built-ins
> > > perf build: Convert llvm-config shell queries to simply expanded
> > > variables
> > >
> > > tools/bpf/bpftool/Makefile | 5 +
> > > tools/build/Makefile.feature | 6 +-
> > > tools/build/feature/Makefile | 4 +-
> > > tools/build/feature/test-all.c | 5 +
> > > tools/perf/.gitignore | 1 +
> > > tools/perf/Build | 2 +
> > > tools/perf/Makefile.config | 19 +-
> > > tools/perf/Makefile.perf | 431 ++----------------
> > > tools/perf/bench/Build | 6 +
> > > .../bpf_skel/bench_uprobe.bpf.c | 0
> > > tools/perf/bench/uprobe.c | 2 +-
> > > tools/perf/bpf_skel.mak | 109 +++++
> > > tools/perf/builtin-trace.c | 30 +-
> > > tools/perf/pmu-events/Build | 26 +-
> > > tools/perf/pmu-events/jevents.py | 56 ++-
> > > tools/perf/trace/beauty/Build | 280 ++++++++++++
> > > tools/perf/trace/beauty/arch_errno_names.c | 2 +
> > > tools/perf/trace/beauty/arch_errno_names.sh | 2 +-
> > > tools/perf/trace/beauty/beauty.h | 60 +++
> > > tools/perf/trace/beauty/eventfd.c | 6 +-
> > > tools/perf/trace/beauty/fsconfig.c | 5 +
> > > tools/perf/trace/beauty/futex_op.c | 6 +-
> > > tools/perf/trace/beauty/futex_val3.c | 6 +-
> > > tools/perf/trace/beauty/mmap.c | 24 +-
> > > tools/perf/trace/beauty/mode_t.c | 6 +-
> > > tools/perf/trace/beauty/msg_flags.c | 8 +-
> > > tools/perf/trace/beauty/open_flags.c | 1 +
> > > tools/perf/trace/beauty/perf_event_open.c | 22 +-
> > > tools/perf/trace/beauty/pid.c | 5 +-
> > > tools/perf/trace/beauty/sched_policy.c | 8 +-
> > > tools/perf/trace/beauty/seccomp.c | 12 +-
> > > tools/perf/trace/beauty/signum.c | 6 +-
> > > tools/perf/trace/beauty/socket_type.c | 6 +-
> > > .../perf/{util => trace/beauty}/syscalltbl.c | 0
> > > .../perf/{util => trace/beauty}/syscalltbl.h | 0
> > > tools/perf/trace/beauty/tracepoints/Build | 22 +
> > > tools/perf/trace/beauty/waitid_options.c | 8 +-
> > > tools/perf/util/Build | 17 +-
> > > tools/perf/util/bpf-trace-summary.c | 2 +-
> > > tools/perf/util/env.c | 4 +-
> > > tools/perf/util/env.h | 1 +
> > > 41 files changed, 717 insertions(+), 504 deletions(-)
> > > rename tools/perf/{util => bench}/bpf_skel/bench_uprobe.bpf.c (100%)
> > > create mode 100644 tools/perf/bpf_skel.mak
> > > create mode 100644 tools/perf/trace/beauty/fsconfig.c
> > > rename tools/perf/{util => trace/beauty}/syscalltbl.c (100%)
> > > rename tools/perf/{util => trace/beauty}/syscalltbl.h (100%)
> > >
> > > --
> > > 2.54.0.563.g4f69b47b94-goog
> > >