[REGRESSION] mm: MADV_PAGEOUT THP/no-swap refault takes ~1.7x longer on v6.19 than v6.12

From: Chengfeng Lin

Date: Mon May 18 2026 - 09:09:05 EST

Hi,

I would like to report a userspace-visible performance regression in a
MADV_PAGEOUT workload.

The workload is intentionally narrow:

- map 16 MiB of anonymous memory
- use the default THP policy
- run in a guest with no configured swap
- call madvise(MADV_PAGEOUT)
- refault/write-touch the mapping

This is not meant as a generic madvise() or generic MADV_PAGEOUT
regression report. The signal is currently scoped to the THP + no-swap +
refault/write-touch workflow above.
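For readers who want a feel for the workload shape without opening the repository, the steps listed above can be sketched as follows. This is only an illustration, not the workload from the evidence bundle: the initial populate pass, the choice of timer, and the reading of cycle_ns_per_page as "madvise call plus refault pass" are assumptions made for this sketch, and interpreter overhead makes the absolute numbers incomparable with the tables below. MADV_PAGEOUT is taken as 21, its value in the generic Linux UAPI headers, when Python's mmap module does not export it.

```python
import mmap
import time

SIZE = 16 * 1024 * 1024      # 16 MiB, as in the report
PAGE = mmap.PAGESIZE         # base page size of the running system
# Assumption: 21 is MADV_PAGEOUT in the generic Linux UAPI headers (e.g. x86_64).
MADV_PAGEOUT = getattr(mmap, "MADV_PAGEOUT", 21)


def touch(m):
    """Write one byte per base page so that every page is faulted in."""
    for off in range(0, SIZE, PAGE):
        m[off] = 1


def run_once():
    """Populate, MADV_PAGEOUT, then write-touch again; return per-page timings."""
    m = mmap.mmap(-1, SIZE,
                  flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        touch(m)                          # initial population (assumed step)
        pages = SIZE // PAGE
        t0 = time.perf_counter_ns()
        try:
            m.madvise(MADV_PAGEOUT)       # syscall/reclaim-side segment
            supported = True
        except (OSError, AttributeError):
            supported = False             # old kernel, old Python, or sandbox
        t1 = time.perf_counter_ns()
        touch(m)                          # refault/write-touch segment
        t2 = time.perf_counter_ns()
        return {
            "pages": pages,
            "pageout_supported": supported,
            "advise_ns_per_page": (t1 - t0) / pages,
            "cycle_ns_per_page": (t2 - t0) / pages,
        }
    finally:
        m.close()


if __name__ == "__main__":
    print(run_once())
```

The mapping is created with MAP_PRIVATE | MAP_ANONYMOUS explicitly because Python's default for an fd of -1 is a shared anonymous mapping, which is shmem-backed and would exercise a different path.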

The current public evidence bundle is here:

https://github.com/lcf0399/linux-mm-regression-evidence-2026-05/tree/e13469b/madvise-pageout-thp-noswap-refault

The standalone workload source is here:

https://github.com/lcf0399/linux-mm-regression-evidence-2026-05/tree/e13469b/madvise-pageout-thp-noswap-refault/workload

The formal experiment profile is here:

https://github.com/lcf0399/linux-mm-regression-evidence-2026-05/tree/e13469b/madvise-pageout-thp-noswap-refault/experiments

The formal timing runs compare v6.12.77 and v6.19.9 with similar kernel
configurations, using QEMU direct boot. Coverage was disabled for these
timing runs; it was collected separately and is not used for the timing
numbers below.

Lab environment:

host label: lcf
host kernel: Linux 6.14.0-37-generic x86_64
QEMU: qemu-system-x86_64 8.2.2
container/cgroup CPU set: 0,2,4,6,8,10,12,14
container/cgroup memory limit: 16106127360 bytes
guest memory: QEMU_MEM_MB=14336
guest CPUs: QEMU_SMP=1/2/4
repetitions: 9
version order: interleaved
performance coverage_enabled: false
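The two guest-side preconditions (active THP policy, no configured swap) can be confirmed before a run. The sketch below is an assumption about how one might check them, not part of the evidence bundle; it reads the standard /sys/kernel/mm/transparent_hugepage/enabled and /proc/swaps files, and the helper names are hypothetical.

```python
from pathlib import Path


def active_thp_mode(text):
    """Return the bracketed token from text such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def swap_areas(text):
    """Return the swap device/file names from /proc/swaps content (header skipped)."""
    lines = text.strip().splitlines()
    return [ln.split()[0] for ln in lines[1:] if ln.strip()]


def guest_state():
    """Read both files if present; None means the file does not exist."""
    thp = Path("/sys/kernel/mm/transparent_hugepage/enabled")
    swaps = Path("/proc/swaps")
    return {
        "thp_mode": active_thp_mode(thp.read_text()) if thp.exists() else None,
        "swap_areas": swap_areas(swaps.read_text()) if swaps.exists() else None,
    }


if __name__ == "__main__":
    print(guest_state())
```

An empty swap_areas list corresponds to the "no configured swap" condition of the report; thp_mode shows which policy the running kernel actually uses.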

Primary result, cycle_ns_per_page (lower is better). "old-lower-vs-new" is
the percentage by which the v6.12.77 value is lower than the v6.19.9 value:

CPU  v6.12.77  v6.19.9  old-lower-vs-new  v6.19/v6.12
  1    1900.3   3304.7             42.5%        1.74x
  2    2107.7   3583.2             41.2%        1.70x
  4    2154.2   3690.9             41.6%        1.71x

MADV_PAGEOUT syscall/reclaim-side segment, advise_ns_per_page (lower is
better):

CPU  v6.12.77  v6.19.9  old-lower-vs-new  v6.19/v6.12
  1    1713.2   2922.7             41.4%        1.71x
  2    1924.7   3162.9             39.1%        1.64x
  4    1953.1   3284.2             40.5%        1.68x
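The two derived columns can be recomputed from the raw per-page values. The sketch below assumes "old-lower-vs-new" means (new - old) / new and that the last column is new / old; both readings reproduce every row of the two tables.

```python
# (v6.12.77, v6.19.9) in ns per page, keyed by guest CPU count; copied from the tables.
ROWS = {
    "cycle_ns_per_page": {
        1: (1900.3, 3304.7),
        2: (2107.7, 3583.2),
        4: (2154.2, 3690.9),
    },
    "advise_ns_per_page": {
        1: (1713.2, 2922.7),
        2: (1924.7, 3162.9),
        4: (1953.1, 3284.2),
    },
}


def derived(old, new):
    """Return (old-lower-vs-new, v6.19/v6.12) formatted like the table columns."""
    lower_pct = (new - old) / new * 100.0   # how much lower the old value is
    ratio = new / old                       # slowdown factor
    return f"{lower_pct:.1f}%", f"{ratio:.2f}x"


if __name__ == "__main__":
    for metric, rows in ROWS.items():
        print(metric)
        for cpus, (old, new) in sorted(rows.items()):
            pct, ratio = derived(old, new)
            print(f"  {cpus} CPU: {pct:>6} {ratio:>6}")
```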

My current interpretation is that the timing difference lies mainly in
the MADV_PAGEOUT/reclaim segment rather than in the later refault touch.
The path evidence points at the no-swap reclaim path, where swap
allocation fails:

madvise(MADV_PAGEOUT)
  -> reclaim_pages()
    -> shrink_folio_list()
      -> folio_alloc_swap()
        -> swap allocation failure path
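The attribution of the slowdown to the MADV_PAGEOUT segment can be cross-checked against the two timing tables. The sketch below assumes that advise_ns_per_page is an additive part of cycle_ns_per_page, so that the ratio of the two deltas gives the share of the slowdown spent in the madvise call; that additivity is an assumption of this sketch, not something the tables state.

```python
# (v6.12.77, v6.19.9) in ns per page, keyed by guest CPU count; copied from the tables.
CYCLE = {1: (1900.3, 3304.7), 2: (2107.7, 3583.2), 4: (2154.2, 3690.9)}
ADVISE = {1: (1713.2, 2922.7), 2: (1924.7, 3162.9), 4: (1953.1, 3284.2)}


def advise_share(cpus):
    """Fraction of the cycle slowdown that falls inside the advise segment."""
    c_old, c_new = CYCLE[cpus]
    a_old, a_new = ADVISE[cpus]
    return (a_new - a_old) / (c_new - c_old)


if __name__ == "__main__":
    for cpus in sorted(CYCLE):
        share = advise_share(cpus) * 100.0
        print(f"{cpus} CPU: {share:.1f}% of the cycle delta is in the advise segment")
```

Under that assumption, roughly 84% to 87% of the per-page increase falls in the advise segment, which is consistent with the interpretation above. The remainder, presumably the refault touch, also grows but accounts for the smaller part of the delta.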

I have not bisected the exact culprit commit yet. Separate release-level
sanity checks showed v6.18.19 already in the slow range, so the current
best reporting range is:

#regzbot introduced: v6.12..v6.18

Please let me know if a different reproducer shape, a narrower bisect, or
additional raw logs would be more useful.