Re: [PATCH v7 00/14] perf build: Reduce build time by nearly half

From: Arnaldo Carvalho de Melo

Date: Wed May 20 2026 - 18:43:43 EST


On Tue, May 19, 2026 at 05:18:31PM -0700, Namhyung Kim wrote:
> On Tue, May 19, 2026 at 11:53:08AM -0700, Ian Rogers wrote:
> > On Tue, May 19, 2026 at 11:49 AM Arnaldo Carvalho de Melo
> > <acme@xxxxxxxxxx> wrote:
> > >
> > > On Tue, May 19, 2026 at 11:27:05AM -0700, Namhyung Kim wrote:
> > > > On Mon, May 18, 2026 at 08:46:24AM -0700, Ian Rogers wrote:
> > > > > This patch series refactors Kbuild internals, BPF skeleton generation,
> > > > > Python AST pre-computation, and foundational tooling dependencies across
> > > > > the perf tool build system. By eliminating umbrella target synchronization
> > > > > barriers, decoupling static library prerequisites, parallelizing single-core
> > > > > script generators, and eradicating redundant feature checks, this series
> > > > > unlocks absolute theoretical peak multi-core concurrency during Kbuild startup.
> > > > >
> > > > > On a 28-core build workstation (make -j28 all from scratch), clean build
> > > > > latency improves by over 44%:
> > > > >
> > > > > Before:
> > > > > real 0m29.006s
> > > > > user 2m46.019s
> > > > > sys 0m30.610s
> > > > >
> > > > > After:
> > > > > real 0m16.091s
> > > > > user 2m40.135s
> > > > > sys 0m25.740s
> > > > >
> > > > > > This saves 12.9 seconds per clean build. Furthermore, incremental
> > > > > > builds with nothing to rebuild are improved by nearly 7x:
> > > > >
> > > > > Before:
> > > > > real 0m11.528s
> > > > > user 0m9.633s
> > > > > sys 0m6.965s
> > > > >
> > > > > After:
> > > > > real 0m1.717s
> > > > > user 0m1.682s
> > > > > sys 0m0.960s
> > > > >
> > > > > Summary of Patches:
> > > > >
> > > > > 1: Fast-Path Feature Detection
> > > > > - Refactors test-clang-bpf-co-re.bin and test-bpftool-skeletons.bin feature
> > > > > checks to group shell pipelines within curly braces and redirect both stdout
> > > > > and stderr to .make.output before touching $@ purely upon success
> > > > > (> $(@:.bin=.make.output) 2>&1 && touch $@). Grouping the pipeline ({ cmd1 | cmd2; })
> > > > > ensures that compiler stderr is successfully captured in .make.output rather
> > > > > than escaping to the parent shell. This perfectly matches standard Kbuild
> > > > > feature check conventions and ensures the target files are touched on disk
> > > > > purely upon success, allowing Kbuild to cache positive detections and avoid
> > > > > continuous sub-make re-evaluations during incremental builds. Adds
> > > > > test-bpftool-skeletons.bin to the clean FILES list and explicit source
> > > > > prerequisite test-clang-bpf-co-re.c.
> > > >
> > > > I think patch 1 can be separated and needs Ack/Review from BPF folks.
> > > >
> > > > >
> > > > > 2-4: Flattening Umbrella Prepare Barriers
> > > > > - builtin-trace embedded inclusions and pmu-events generation are completely
> > > > > decoupled from the sequential "prepare" umbrella target, eliminating Make
> > > > > AST double-parsing overhead and unchoking parallel compilation barriers.
> > > > >
> > > > > 5-7: Decoupling & Pre-generating BPF Skeletons
> > > > > - BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
> > > > > - Decouples bpftool bootstrap from top-level static libbpf dependencies,
> > > > > attaching bpf-skel-prepare directly to the umbrella prepare target. This
> > > > > allows Make to pre-compile bpftool and dump vmlinux.h in the background at
> > > > > build startup, removing the 7-second serialization bottleneck before BPF
> > > > > object compilation.
> > > > > - Ensures benchmark skeleton intermediate .bpf.o files are cleanly removed
> > > > > during make clean, and adds bpf-skel-prepare to .PHONY.
> > > > >
> > > > > 8-9: Foundational Linkage Optimization
> > > > > - Moves static libsymbol library prerequisites out of the prepare step.
> > > > > - Eliminates redundant libbpf sub-make feature checks during static builds.
> > > > >
> > > > > 10-11: jevents.py Concurrency & Deduplication
> > > > > - Splits the massive 2.8 MB big_c_string literal out of pmu-events.c into a
> > > > > dedicated pmu-events-string.c compilation unit. This slices C compilation
> > > > > latency in half by compiling string and struct tables simultaneously across
> > > > > separate CPU cores while preserving zero dynamic ELF relocations. Adds
> > > > > pmu-events-string.c to .gitignore, declares extern const char big_c_string[];
> > > > > locally inside output_string_file and output_file when split to prevent linkage
> > > > > conflicts with empty-pmu-events.c, defers file closures to ensure identical
> > > > > timestamps, and uses canonical Make 4.0 @: dependency chaining.
> > > > > - Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
> > > > > all available CPU cores using ProcessPoolExecutor (accelerating Python
> > > > > execution by 11x, from 3.3s down to ~290ms). Moves _init_worker to top-level
> > > > > scope to ensure clean pickling under spawn multiprocessing start methods.
> > > > >
> > > > > 12: Out-of-Tree Incremental Rebuild Fix
> > > > > - Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
> > > > > Make from continuously re-executing script installation rules on already
> > > > > built out-of-tree builds.
> > > > >
> > > > > 13-14: AST Parsing Optimization & Shell Fork Eradication
> > > > > - Converts ZENS, ARMS, and INTELS in pmu-events/Build from recursive assignment
> > > > > (=) to simply expanded assignment (:=) and replaces model_name/vendor_name
> > > > > with pure GNU Make string functions. This guarantees Make executes directory
> > > > > probing shell forks exactly once during AST parsing and evaluates path macros
> > > > > purely in memory, completely eradicating over 7,800 redundant sub-processes
> > > > > during out-of-tree build evaluation.
> > > > > - Converts llvm-config shell queries in Makefile.config from recursive assignment
> > > > > (=) to simply expanded assignment (:=). This eliminates ~185 redundant sub-processes
> > > > > that were previously executed across object compilation dependency checks.
> > > > >
> > > > > Changes since v6:
> > > > > > - Rebase/resend, as Sashiko failed to apply the last series.
> > > > >
> > > > > Ian Rogers (14):
> > > > > tools build: Fix feature checks to touch target files on success
> > > > > perf trace beauty: Make beauty generated C code standalone .o files
> > > > > perf build: Decouple pmu-events from prepare umbrella target
> > > > > perf build: Remove empty archheaders target
> > > > > perf build: Move BPF skeleton generation out of Makefile.perf
> > > > > perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
> > > > > perf build: Pre-generate BPF skeleton tooling during umbrella prepare
> > > > > phase
> > > > > perf build: Move libsymbol dependency out of prepare step
> > > > > perf build: Remove redundant libbpf feature check for static builds
> > > > > perf pmu-events: Split big_c_string storage into standalone
> > > > > compilation unit
> > > > > perf pmu-events: Parallelize JSON and metric pre-computation in
> > > > > jevents.py
> > > > > perf build: Prefix SCRIPTS with output directory to fix continuous
> > > > > rebuilds
> > > > > perf pmu-events: Convert recursive shell assignments and macros to
> > > > > Make built-ins
> > > > > perf build: Convert llvm-config shell queries to simply expanded
> > > > > variables
> > > >
> > > > Reviewed-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> > >
> > > So this is for 2-14? I haven't checked if 1 can be left out of an
> > > initial merge by me.
> >
> > I believe you are correct. Patch 1 is completely independent because
> > it is the only change in tools/build; everything else is in
> > tools/perf.
>
> Actually it applies to patch 1 as well. But we can take 2-14 in the
> perf tree first.

OK, let's go with 2-14; we can look at 1 later.

- Arnaldo
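
The pipeline grouping described for patch 1 in the quoted cover letter can be sketched in plain shell. This is a minimal illustration, not the recipe from tools/build: fake_compiler, the test-demo.* and test-fail.* file names, and the temporary directory are all hypothetical stand-ins. It shows only the two behaviours the cover letter names, namely that redirecting a braced group captures stderr from every stage of the pipeline, and that the target is touched only when the group succeeds.

```shell
#!/bin/sh
# Hypothetical stand-ins; none of these names come from the patch series.
workdir=$(mktemp -d)
target="$workdir/test-demo.bin"
log="$workdir/test-demo.make.output"

# Pretend compiler: diagnostics on stderr, payload on stdout.
fake_compiler() {
    echo "warning: demo diagnostic" >&2
    echo "object-bytes"
}

# Grouped pipeline: the redirection applies to the whole { ...; } group, so
# stderr from fake_compiler lands in the log instead of the parent shell.
# The target is touched only if the pipeline exits with status 0.
{ fake_compiler | cat; } > "$log" 2>&1 && touch "$target"

# Failing check: the last pipeline stage returns non-zero, so the && list
# stops and the target file is never created.
failed="$workdir/test-fail.bin"
{ fake_compiler | false; } > "$workdir/test-fail.make.output" 2>&1 && touch "$failed"

# List what was produced; test-fail.bin should be absent.
ls "$workdir"
```

Because the successful target exists on disk afterwards, a later make run can treat the check as up to date, which is the caching effect the cover letter attributes to touching $@ only on success.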