[REGRESSION] mm: MADV_PAGEOUT THP/no-swap refault takes ~1.7x longer on v6.19 than v6.12
From: Chengfeng Lin
Date: Mon May 18 2026 - 09:09:05 EST
Hi,
I would like to report a userspace-visible performance regression in a
MADV_PAGEOUT workload.
The workload is intentionally narrow:
- map 16 MiB anonymous memory
- use the default THP policy
- run in a guest with no configured swap
- call madvise(MADV_PAGEOUT)
- refault/write-touch the mapping
This is not meant as a generic madvise() or generic MADV_PAGEOUT
regression report. The signal is currently scoped to the THP + no-swap +
refault/write-touch workflow above.
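For readers who do not want to open the repository first, the steps above can be sketched roughly as follows. This is not the workload source linked below, only a minimal approximation: the names run_cycle, cycle_result, write_touch and now_ns are hypothetical, and the timing split into an advise segment and a refault segment is my assumption about how such a measurement could be structured. THP is left at whatever the system default is, so no MADV_HUGEPAGE call is made.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* asm-generic value; only used if libc headers lack it */
#endif

/* Hypothetical result record; field names are illustrative only. */
struct cycle_result {
    size_t pages;      /* base pages in the mapping */
    double advise_ns;  /* time spent inside madvise(MADV_PAGEOUT) */
    double refault_ns; /* time spent write-touching afterwards */
};

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Write one byte per base page so that every page is faulted in. */
static void write_touch(volatile unsigned char *p, size_t len, size_t page)
{
    for (size_t off = 0; off < len; off += page)
        p[off] = (unsigned char)off;
}

/* One populate -> MADV_PAGEOUT -> refault cycle. Returns 0 on success,
 * -1 with errno set if mmap() or madvise() fails. */
int run_cycle(size_t len, struct cycle_result *out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    write_touch(buf, len, (size_t)page);      /* initial population */

    double t0 = now_ns();
    int rc = madvise(buf, len, MADV_PAGEOUT); /* reclaim-side segment */
    double t1 = now_ns();
    if (rc != 0) {
        int saved = errno;
        munmap(buf, len);
        errno = saved;
        return -1;
    }

    write_touch(buf, len, (size_t)page);      /* refault/write-touch */
    double t2 = now_ns();

    out->pages = len / (size_t)page;
    out->advise_ns = t1 - t0;
    out->refault_ns = t2 - t1;
    munmap(buf, len);
    return 0;
}
```

Calling run_cycle((size_t)16 << 20, &r) from a small main() and dividing advise_ns and (advise_ns + refault_ns) by r.pages would give per-page figures in the spirit of advise_ns_per_page and cycle_ns_per_page, although the formal runs may define those metrics differently.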
The current public evidence bundle is here:
https://github.com/lcf0399/linux-mm-regression-evidence-2026-05/tree/e13469b/madvise-pageout-thp-noswap-refault
The standalone workload source is here:
https://github.com/lcf0399/linux-mm-regression-evidence-2026-05/tree/e13469b/madvise-pageout-thp-noswap-refault/workload
The formal experiment profile is here:
https://github.com/lcf0399/linux-mm-regression-evidence-2026-05/tree/e13469b/madvise-pageout-thp-noswap-refault/experiments
The formal timing runs compare v6.12.77 and v6.19.9, built with similar
kernel configurations and booted directly by QEMU. All timing runs had
coverage disabled. Coverage was collected in separate runs and is not
used for the timing numbers below.
Lab environment:
  host label:                     lcf
  host kernel:                    Linux 6.14.0-37-generic x86_64
  QEMU:                           qemu-system-x86_64 8.2.2
  container/cgroup CPU set:       0,2,4,6,8,10,12,14
  container/cgroup memory limit:  16106127360 bytes
  guest memory:                   QEMU_MEM_MB=14336
  guest CPUs:                     QEMU_SMP=1/2/4
  repetitions:                    9
  version order:                  interleaved
  performance coverage_enabled:   false
Primary result, cycle_ns_per_page, lower is better:
  CPU   v6.12.77   v6.19.9   old-lower-vs-new   v6.19/v6.12
    1     1900.3    3304.7              42.5%         1.74x
    2     2107.7    3583.2              41.2%         1.70x
    4     2154.2    3690.9              41.6%         1.71x
MADV_PAGEOUT syscall/reclaim-side segment, advise_ns_per_page, lower is
better:
  CPU   v6.12.77   v6.19.9   old-lower-vs-new   v6.19/v6.12
    1     1713.2    2922.7              41.4%         1.71x
    2     1924.7    3162.9              39.1%         1.64x
    4     1953.1    3284.2              40.5%         1.68x
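The two derived columns are consistent with the following arithmetic on the per-page times. The function names pct_lower and slowdown are hypothetical, and the formulas are my reading of the column labels rather than something taken from the experiment scripts.

```c
/* Percentage by which the old kernel's time is lower than the new one:
 * (new - old) / new * 100. */
double pct_lower(double old_ns, double new_ns)
{
    return (new_ns - old_ns) / new_ns * 100.0;
}

/* Slowdown factor of the new kernel relative to the old: new / old. */
double slowdown(double old_ns, double new_ns)
{
    return new_ns / old_ns;
}
```

For the 1-CPU cycle_ns_per_page row, pct_lower(1900.3, 3304.7) is about 42.5 and slowdown(1900.3, 3304.7) is about 1.74, matching the table.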
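My current interpretation is that the timing difference is in the
MADV_PAGEOUT/reclaim part, not primarily in the later refault touch: at
1 CPU, advise_ns_per_page grows by about 1210 ns/page, which is most of
the roughly 1404 ns/page growth in cycle_ns_per_page. The path evidence
points at the no-swap reclaim/swap-allocation-failure chain: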
  madvise(MADV_PAGEOUT)
    -> reclaim_pages()
    -> shrink_folio_list()
    -> folio_alloc_swap()
    -> swap allocation failure path
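Since the chain above is only expected to be hit when the guest has no swap, a reproducer may want to verify that precondition before timing anything. A minimal sketch using sysinfo(2) is below; the helper name swap_total_bytes is hypothetical and this check is not claimed to be part of the published workload.

```c
#include <sys/sysinfo.h>

/* Stores the total configured swap in bytes and returns 0,
 * or returns -1 if sysinfo(2) fails. */
int swap_total_bytes(unsigned long long *bytes)
{
    struct sysinfo si;

    if (sysinfo(&si) != 0)
        return -1;
    /* totalswap is expressed in units of mem_unit bytes. */
    *bytes = (unsigned long long)si.totalswap * si.mem_unit;
    return 0;
}
```

A result of 0 bytes corresponds to the "no configured swap" condition of this report; any non-zero value means the run is outside the reported scope.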
I have not bisected the exact culprit commit yet. Separate release-level
sanity checks showed v6.18.19 already in the slow range, so the current
best reporting range is:
#regzbot introduced: v6.12..v6.18
Please let me know if a different reproducer shape, a narrower bisect, or
additional raw logs would be more useful.