RE: [PATCH RFC] mm/mglru: lazily activate folios while folios are really mapped

From: wangzicheng

Date: Thu Mar 19 2026 - 06:31:21 EST


Hi Barry,

Thank you for the suggestion.

I have re-designed the workload and got relatively promising results.
The workload repeatedly launches and switches between 30 apps
for 500 rounds. Since the test takes quite a long time, the final results
are relatively stable across runs.

The testing was done on an Android 16 device with kernel 6.6.89,
8GB RAM, MGLRU enabled.

However, the results are not easy to interpret.

Average number of kept-alive apps: within ±0.08 apps
Average available memory (sampled after each app launch):
baseline vs patched: 2216 MB vs 2218 MB (~2 MB difference)

Below is the vmstat comparison (patched vs baseline):

Metric                       Change
---------------------------  --------
pgpgin                        +2.06%
pgpgout                       +3.10%
pswpin                       +14.13%
pswpout                       +4.55%
pgfault                       -3.19%
pgmajfault                   +12.75%
workingset_refault_anon      +14.77%
workingset_refault_file       +3.48%
workingset_activate_anon      -3.45%
workingset_activate_file     -17.76%
workingset_restore_anon       -3.44%
workingset_restore_file      -19.13%

In v6.6, lru_gen_add_folio() places folios with PG_active set into the
youngest generation, and folios without PG_active into the second oldest:
```
static inline bool lru_gen_add_folio(struct lruvec *lruvec,
				     struct folio *folio, bool reclaiming)
{
	...
	if (folio_test_active(folio))
		seq = lrugen->max_seq;
	...
	else
		seq = lrugen->min_seq[type] + 1;
	...
}
```

My rough expectation was that the patch would make file pages more
prone to reclaim and make file-page hot/cold aging more accurate, so
both file and anon refaults might decrease. But here anon refaults
increase instead.

I’m not sure if this assumption is correct. Could you share your thoughts
on how to interpret these results?

Thanks,
Zicheng

> -----Original Message-----
> From: owner-linux-mm@xxxxxxxxx <owner-linux-mm@xxxxxxxxx> On Behalf
> Of Barry Song
> Sent: Sunday, March 1, 2026 12:16 PM
> To: wangzicheng <wangzicheng@xxxxxxxxx>
> Cc: akpm@xxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; Suren Baghdasaryan <surenb@xxxxxxxxxx>; Lei Liu
> <liulei.rjpt@xxxxxxxx>; Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>;
> Axel Rasmussen <axelrasmussen@xxxxxxxxxx>; Yuanchu Xie
> <yuanchu@xxxxxxxxxx>; Wei Xu <weixugc@xxxxxxxxxx>; Kairui Song
> <kasong@xxxxxxxxxxx>; Tangquan Zheng <zhengtangquan@xxxxxxxx>;
> wangtao <tao.wangtao@xxxxxxxxx>
> Subject: Re: [PATCH RFC] mm/mglru: lazily activate folios while folios are
> really mapped
>
> On Sat, Feb 28, 2026 at 6:28 PM wangzicheng <wangzicheng@xxxxxxxxx>
> wrote:
> >
> > Hi Barry,
> > >
> > > I find your concern a bit surprising. If I understand correctly,
> > > you’re observing that file folios are currently being over-reclaimed.
> > > In that case, placing hot pages at the tail might make them harder
> > > to reclaim after PTE scanning (since they may still be young), but
> > > this seems to violate the fundamental principle of LRU. Moreover,
> > > when scanning encounters young file folios, reclaim will simply
> > > continue scanning more folios to find reclaimable ones, so scanning
> > > hot folios only wastes CPU time.
> > > Since read-ahead cold folios are placed at the head, relatively hotter
> > > folios may be reclaimed instead, causing refaults and further triggering
> > > reclaim, which can worsen the situation.
> > >
> > Thank you for the detailed explanation.
> > > >
> > > > We'll test this when available and report back. We hope to have a
> > > > chance to discuss this topic at LSF/MM/BPF.
> > > >
> > >
> > > Sure, thanks!
> > >
> > > Barry
> >
> > For evaluation I’m using a workload that repeatedly cold-starts and
> > drives the same user actions in 20+ apps on Android.
> > I’m comparing the baseline (v6.6) vs. the patched kernel and watching
> > `/proc/vmstat -> workingset_refault_file`, expecting it to go down.
> >
> > I ran 3 runs per kernel, but `workingset_refault_file` is quite noisy;
> > the coefficient of variation is around 40%, so the result doesn’t look
> > statistically solid.
> >
> > Do you have any suggestions on how to measure the benefit more
> > robustly? For example:
> > - different or longer-running workloads,
> > - better normalization for refaults (per time, per faults, etc.),
> > - or other vmstat metrics that you found more stable in practice?
>
> I've cc'ed Tangquan, and he may be able to share how he was testing.
> Basically, you may want to disable Wi-Fi, as it can introduce a lot of
> variability between runs. Aside from refault metrics, you should also
> see reduced I/O load and fewer swap-out/in events if you run the same
> sequence of apps consistently.
>
> >
> > I’m also considering increasing the number of runs and using a t-test,
> > or comparing the CDF between baseline and patched kernels.
> > If you have a preferred methodology, I’d like to align with that.
> >
>
> Thanks
> Barry