Re: [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF
From: Barry Song
Date: Tue Apr 28 2026 - 01:40:39 EST
On Fri, Apr 24, 2026 at 7:53 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, 18 Apr 2026 20:02:33 +0800 "Barry Song (Xiaomi)" <baohua@xxxxxxxxxx> wrote:
>
> > MGLRU gives high priority to folios mapped in page tables.
> > As a result, folio_set_active() is invoked for all folios
> > read during page faults. In practice, however, readahead
> > can bring in many folios that are never accessed via page
> > tables.
> >
> > A previous attempt by Lei Liu proposed introducing a separate
> > LRU for readahead[1] to make readahead pages easier to reclaim,
> > but that approach is likely over-engineered.
> >
> > Before commit 4d5d14a01e2c ("mm/mglru: rework workingset
> > protection"), folios with PG_active were always placed in
> > the youngest generation, leading to over-protection and
> > increased refaults. After that commit, PG_active folios
> > are placed in the second youngest generation, which is
> > still too optimistic given the presence of readahead. In
> > contrast, the classic active/inactive scheme is more
> > conservative.
> >
> > This patch switches to folio_mark_accessed(). If
> > folio_check_references() later detects referenced PTEs,
> > the folio will be promoted based on the reference flag
> > set by folio_mark_accessed().
> >
> > The following uses a simple model to demonstrate why the current
> > code is not ideal. It runs fio-3.42 in a memcg, reading a file in a
> > strided pattern—4KB every 64KB—to simulate prefaulted pages that may
> > not be accessed.
>
> Are you able to suggest any workloads which might regress? And test
> for those?

I don’t have a specific workload, but I can imagine one: a workload
with readahead where all readahead pages are mapped and all of those
folios also happen to be hot. In that case, placing them into the
active generations at the readahead stage might be beneficial. However,
this patch would not lose them either; it may just require one more
folio_check_references() PTE scan to confirm that they are truly
accessed before activating them.
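
To make the "one more scan" point concrete, here is a minimal userspace
sketch in C. It is a toy model and not kernel code: struct toy_folio,
toy_fault_in() and toy_scan() are hypothetical names, the two booleans
only stand in for PG_active and PG_referenced, and the rule that an
untouched active folio simply loses its protection after one scan is a
simplifying assumption.

```c
#include <stdbool.h>

/* Toy stand-in for a folio; NOT the kernel's struct folio. */
struct toy_folio {
    bool active;      /* models PG_active */
    bool referenced;  /* models PG_referenced */
};

enum toy_verdict { TOY_RECLAIM, TOY_KEEP, TOY_ACTIVATE };

/* Models the fault path: old behaviour sets active, patched sets referenced. */
static void toy_fault_in(struct toy_folio *f, bool patched)
{
    f->active = false;
    f->referenced = false;
    if (patched)
        f->referenced = true;   /* models folio_mark_accessed() */
    else
        f->active = true;       /* models folio_set_active() */
}

/* Models one reclaim scan; pte_young says whether a mapped PTE was accessed. */
static enum toy_verdict toy_scan(struct toy_folio *f, bool pte_young)
{
    if (f->active) {
        /* Assumption: a protected folio survives this scan regardless. */
        if (!pte_young)
            f->active = false;
        return TOY_KEEP;
    }
    if (!pte_young)
        return TOY_RECLAIM;     /* never touched via page tables */
    if (f->referenced) {
        f->active = true;       /* second piece of evidence: promote */
        return TOY_ACTIVATE;
    }
    f->referenced = true;
    return TOY_KEEP;
}
```

In this model a patched folio that really is hot gets TOY_ACTIVATE on
its first scan, while a patched readahead folio that was never touched
gets TOY_RECLAIM right away. With the old behaviour, the untouched folio
survives one extra scan before it becomes reclaimable.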
>
> > Without the patch, we observed 12883855 file refaults and a very low
> > bandwidth of 58.5 MiB/s, because prefaulted but unused pages occupy
> > hot positions, continuously pushing out the real working set and
> > causing incorrect reclaim. With the patch, we observed 0 refaults
> > and bandwidth increased to 5078 MiB/s.
>
> Wow. And that isn't a crazy workload.

Right.

Readahead mainly serves I/O performance; it does not necessarily mean
that those pages will be accessed or become hot. On a memory-limited
system, promoting readahead folios can be very harmful.
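
For readers who want to picture the access pattern of the model, the
following is a rough sketch in C of a strided reader: 4KB read at every
64KB offset of a mapped region. The helper name touch_strided() and the
two macros are made up for illustration; the actual test used fio, not
this code.

```c
#include <stddef.h>
#include <stdint.h>

#define TOUCH_BYTES  (4u * 1024)   /* bytes read per stride */
#define STRIDE_BYTES (64u * 1024)  /* distance between the reads */

/*
 * Read TOUCH_BYTES at every STRIDE_BYTES offset of [base, base + len).
 * Returns the number of chunks read; the bytes are summed into *checksum
 * so the reads are not optimised away.
 */
static size_t touch_strided(const volatile uint8_t *base, size_t len,
                            uint64_t *checksum)
{
    size_t chunks = 0;

    for (size_t off = 0; off + TOUCH_BYTES <= len; off += STRIDE_BYTES) {
        for (size_t i = 0; i < TOUCH_BYTES; i++)
            *checksum += base[off + i];
        chunks++;
    }
    return chunks;
}
```

If base comes from mmap() of a file, only 4KB out of every 64KB of the
mapping is ever read, so with 4KB pages 15 of every 16 pages in the
range are never accessed through the page tables.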
>
> > For those who want to try the model on x86, you will need the
> > following in arch/x86/include/asm/pgtable.h.
> >
> > #define arch_wants_old_prefaulted_pte arch_wants_old_prefaulted_pte
> > static inline bool arch_wants_old_prefaulted_pte(void)
> > {
> >         return true;
> > }
>
> Can you propose a patch? We can at least toss it in there for testing
> while we think about it.

I can include this as an RFC after this patch in v2. Right now only
arm64 supports fault_around with old PTEs; it seems this was disabled
for all architectures due to an x86 issue in commit 315d09bf30c2
("Revert 'mm: make faultaround produce old ptes'"). It was later
revived only for arm64 by Will.

I may reach out to the x86 and RISC-V communities to revisit this. I
actually have Zhe Qiao testing it on newer AMD and Intel platforms.
With sufficient memory and no reclaim pressure, mapping fault_around
PTEs as old still shows a small UnixBench regression, as below.
However, I doubt this is significant: under memory pressure,
reclaiming the correct folios is likely to matter more than the cost
of hardware access flag (HW AF) updates. Alternatively, we may need
some self-tuning logic that detects memory pressure and decides
whether to map fault_around PTEs as old on platforms where HW AF
handling is costly.
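
As a purely hypothetical illustration of what such self-tuning could
look like, here is a small sketch in C. Nothing like this exists in the
patch: struct toy_af_policy, its fields, and toy_map_prefault_old() are
invented names, and both the pressure estimate and the threshold are
assumptions.

```c
#include <stdbool.h>

/* Hypothetical inputs for the decision; not a kernel structure. */
struct toy_af_policy {
    bool hw_af_costly;          /* platform pays noticeably for HW AF updates */
    unsigned int pressure_pct;  /* assumed 0-100 memory pressure estimate */
    unsigned int threshold_pct; /* assumed cut-over point */
};

/* Decide whether fault_around PTEs should be mapped old. */
static bool toy_map_prefault_old(const struct toy_af_policy *p)
{
    /* Cheap HW AF: old PTEs cost little, so always prefer accurate aging. */
    if (!p->hw_af_costly)
        return true;

    /* Costly HW AF: only pay for it once reclaim accuracy matters. */
    return p->pressure_pct >= p->threshold_pct;
}
```

The idea is only that platforms with cheap HW AF always map prefaulted
PTEs as old, while costly ones switch over once memory pressure crosses
some threshold. How to measure that pressure is left open here.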
Thanks to Zhe Qiao for the data below. Cc’ing Zhe Qiao as well.
AMD Ryzen 9 9950X 16-Core Processor:
========== Kernel A: 7.0.0-14-generic ==========
================================================================
Kernel: 7.0.0-14-generic
================================================================
Test                 Index (avg)   n  Iters
----------------------------------------------------------------
dhry2reg               111703.38   4  [0,1,2,3]
whetstone-double        34170.80   4  [0,1,2,3]
execl                   15312.25   4  [0,1,2,3]
fstime                  59662.58   4  [0,1,2,3]
fsbuffer                72696.42   4  [0,1,2,3]
fsdisk                  31885.47   4  [0,1,2,3]
pipe                    61432.48   4  [0,1,2,3]
context1                15438.23   4  [0,1,2,3]
spawn                   16349.47   4  [0,1,2,3]
syscall                 38748.50   4  [0,1,2,3]
shell1                  24710.05   4  [0,1,2,3]
shell8                  20207.05   4  [0,1,2,3]
----------------------------------------------------------------
Valid tests: 12 / 12
╔══════════════════════════════════════════════════╗
║ System Benchmarks Index Score: 34045.39 ║
╚══════════════════════════════════════════════════╝
========== Kernel B: 7.0.0-custom-test+ ==========
================================================================
Kernel: 7.0.0-custom-test+
================================================================
Test                 Index (avg)   n  Iters
----------------------------------------------------------------
dhry2reg               105061.82   5  [0,1,2,3,4]
whetstone-double        33945.03   4  [0,1,2,3]
execl                   13992.70   5  [0,1,2,3,4]
fstime                  59037.77   4  [0,1,2,3]
fsbuffer                72465.12   4  [0,1,2,3]
fsdisk                  28388.05   4  [0,1,2,3]
pipe                    64047.38   4  [0,1,2,3]
context1                15322.56   5  [0,1,2,3,4]
spawn                   16020.58   4  [0,1,2,3]
syscall                 40669.70   4  [0,1,2,3]
shell1                  23514.32   4  [0,1,2,3]
shell8                  19393.55   4  [0,1,2,3]
----------------------------------------------------------------
Valid tests: 12 / 12
╔══════════════════════════════════════════════════╗
║ System Benchmarks Index Score: 33159.46 ║
╚══════════════════════════════════════════════════╝
================================================================
Per-Test Index Comparison
================================================================
Test                    Kernel A     Kernel B    Diff %
----------------------------------------------------------------
dhry2reg               111703.38    105061.82    -5.95% ⬇
whetstone-double        34170.80     33945.03    -0.66%
execl                   15312.25     13992.70    -8.62% ⬇
fstime                  59662.58     59037.77    -1.05%
fsbuffer                72696.42     72465.12    -0.32%
fsdisk                  31885.47     28388.05   -10.97% ⬇
pipe                    61432.48     64047.38    +4.26% ⬆
context1                15438.23     15322.56    -0.75%
spawn                   16349.47     16020.58    -2.01% ⬇
syscall                 38748.50     40669.70    +4.96% ⬆
shell1                  24710.05     23514.32    -4.84% ⬇
shell8                  20207.05     19393.55    -4.03% ⬇
================================================================
Final System Benchmarks Index Score
================================================================
7.0.0-14-generic        34045.39
7.0.0-custom-test+      33159.46
--------------------------------------------
B vs A                    -2.60%
INTEL(R) XEON(R) PLATINUM 8575C:
========== Kernel A: 7.0.0-14-generic ==========
================================================================
Kernel: 7.0.0-14-generic
Prefix: 24-300s- (24 threads)
================================================================
Test                    AvgScore    Baseline       Index   n
----------------------------------------------------------------
dhry2reg                87226.50    116700.0        7.47   4
whetstone-double        29323.70        55.0     5331.58   4
execl                   12638.17        43.0     2939.11   4
fstime                  84223.20      3960.0      212.68   4
fsbuffer                56491.22      1655.0      341.34   4
fsdisk                  34570.90      5800.0       59.61   4
pipe                    45941.97     12440.0       36.93   4
context1                20177.08      4000.0       50.44   4
spawn                    9498.88       126.0      753.88   4
syscall                 29181.70     15000.0       19.45   4
shell1                  21060.58        42.4     4967.12   4
shell8                  17669.08         6.0    29448.47   4
----------------------------------------------------------------
Valid tests: 12 / 12
╔════════════════════════════════════════════════╗
║ System Benchmarks Index Score: 335.36 ║
╚════════════════════════════════════════════════╝
========== Kernel B: 7.0.0-custom-test+ ==========
================================================================
Kernel: 7.0.0-custom-test+
Prefix: 24-300s- (24 threads)
================================================================
Test                    AvgScore    Baseline       Index   n
----------------------------------------------------------------
dhry2reg                87607.90    116700.0        7.51   4
whetstone-double        29092.33        55.0     5289.51   4
execl                   12318.00        43.0     2864.65   4
fstime                  85738.35      3960.0      216.51   4
fsbuffer                57621.05      1655.0      348.16   4
fsdisk                  33608.60      5800.0       57.95   4
pipe                    46320.38     12440.0       37.24   4
context1                20450.12      4000.0       51.13   4
spawn                    9579.15       126.0      760.25   4
syscall                 29563.43     15000.0       19.71   4
shell1                  20073.10        42.4     4734.22   4
shell8                  16946.05         6.0    28243.42   4
----------------------------------------------------------------
Valid tests: 12 / 12
╔════════════════════════════════════════════════╗
║ System Benchmarks Index Score: 333.55 ║
╚════════════════════════════════════════════════╝
================================================================
Per-Test Index Comparison
================================================================
Test                    Kernel A     Kernel B    Diff %
----------------------------------------------------------------
dhry2reg                87226.50     87607.90    +0.44%
whetstone-double        29323.70     29092.33    -0.79%
execl                   12638.17     12318.00    -2.53% ⬇
fstime                  84223.20     85738.35    +1.80%
fsbuffer                56491.22     57621.05    +2.00% ⬆
fsdisk                  34570.90     33608.60    -2.78% ⬇
pipe                    45941.97     46320.38    +0.82%
context1                20177.08     20450.12    +1.35%
spawn                    9498.88      9579.15    +0.85%
syscall                 29181.70     29563.43    +1.31%
shell1                  21060.58     20073.10    -4.69% ⬇
shell8                  17669.08     16946.05    -4.09% ⬇
================================================================
Final System Benchmarks Index Score
================================================================
7.0.0-14-generic          335.36
7.0.0-custom-test+        333.55
--------------------------------------------
B vs A                    -0.54%
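
For anyone re-checking the "Diff %" and "B vs A" figures, they are
consistent with a plain relative change of Kernel B against Kernel A. A
minimal sketch in C, where pct_change() is just an illustrative helper
name and not part of the benchmark scripts:

```c
/* Relative change of b versus a, in percent. */
static double pct_change(double a, double b)
{
    return (b - a) / a * 100.0;
}
```

For example, pct_change(34045.39, 33159.46) is about -2.60 for the AMD
index score, and pct_change(335.36, 333.55) is about -0.54 for the
Intel one.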
>
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -512,7 +512,7 @@ void folio_add_lru(struct folio *folio)
> >          /* see the comment in lru_gen_folio_seq() */
> >          if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
> >              lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
> > -                folio_set_active(folio);
> > +                folio_mark_accessed(folio);
> >
> >          folio_batch_add_and_move(folio, lru_add);
> >  }
>
> lol, I was expecting something larger ;)

Yep. I usually prefer small patches when they can resolve the problem,
since that makes our lives easier :-)
But we will likely need a 2/2 patch for refault activation, as
discussed with Shakeel and Axel.
Thanks
Barry