Re: [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF
From: Kairui Song
Date: Tue Apr 28 2026 - 23:18:34 EST
On Wed, Apr 29, 2026 at 6:26 AM Barry Song <baohua@xxxxxxxxxx> wrote:
>
> On Wed, Apr 29, 2026 at 2:55 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
> >
> > On Sat, Apr 18, 2026 at 8:03 PM Barry Song (Xiaomi) <baohua@xxxxxxxxxx> wrote:
> > >
> > > diff --git a/mm/swap.c b/mm/swap.c
> > > index 5cc44f0de987..e3cf703ccb89 100644
> > > --- a/mm/swap.c
> > > +++ b/mm/swap.c
> > > @@ -512,7 +512,7 @@ void folio_add_lru(struct folio *folio)
> > > /* see the comment in lru_gen_folio_seq() */
> > > if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
> > > lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
> > > - folio_set_active(folio);
> > > + folio_mark_accessed(folio);
> >
> > Hi Barry,
> >
> > Sorry I haven't checked everything yet, but just a naive idea: What if
> > we just remove this whole lru_gen_* check chunk here? Only keep the
>
> Do you mean the below?
>
> index 5cc44f0de987..499ad49c1b51 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -509,11 +509,6 @@ void folio_add_lru(struct folio *folio)
> folio_test_unevictable(folio), folio);
> VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
>
> - /* see the comment in lru_gen_folio_seq() */
> - if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
> - lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
> - folio_set_active(folio);
> -
> folio_batch_add_and_move(folio, lru_add);
> }
> EXPORT_SYMBOL(folio_add_lru);
>
> If so, this essentially resembles the active/inactive LRU. But I
> assume Yu Zhao’s earlier point about mmaped folio access still
> has some merit? The problem, however, is that readahead and
> prefaulting may have made this assumption less accurate, since
> being mmaped doesn’t necessarily mean the user actually wants
> to access it.
Right, my question is that, foundunmentally, it depends on which
assumption is true:
- All allocated folios should be treated equally.
vs
- Page fault allocation folios are more important.
Some databases have the option to use mmap or read, in that case, it's
only two different methods for reading the data in that case.
Classical LRU also used to treat anon folios differently, until commit
ccc5dc67340c and the series following it changed this.
Still I'm not against the idea that page fault allocation is more
important, I'm just not sure about this. I think taking your approach
first then improving it further might be a good idea.
> I mean, I’m at least convinced the following might be correct:
>
> @@ -509,10 +511,14 @@ void folio_add_lru(struct folio *folio)
> folio_test_unevictable(folio), folio);
> VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
>
> - /* see the comment in lru_gen_folio_seq() */
> - if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
> - lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
> - folio_set_active(folio);
> + /*
> + * For architectures without old prefaulted PTEs, we need a first
> + * PTE scan to clear the access flag set during prefault, and a second
> + * scan to confirm the folio is active. For architectures with old
> + * prefaulted PTEs, we can skip the scan that clears the access flag.
> + */
> + if (arch_wants_old_prefaulted_pte())
> + folio_mark_accessed(folio);
>
> folio_batch_add_and_move(folio, lru_add);
> }
>
> It could also be the case below to check whether fault_around is
> disabled, if it’s not too ugly :-)
>
> + if (arch_wants_old_prefaulted_pte() || fault_around_bytes == PAGE_SIZE)
> + folio_mark_accessed(folio);
Seems an interesting idea!
> > Is there any evidence that folios that are allocated through fault are
> > always frequently used folios? Because classical LRU has the exact
> > opposite assumption on that. Refault distance based activation is more
> > battle tested (I'm not saying that is absolutely right though).
>
> I agree with this. I’m also queuing some code for testing
> to check whether reclamation has occurred very recently.
> If so, we set the folios active:
> https://lore.kernel.org/linux-mm/20260428013520.47417-1-baohua@xxxxxxxxxx/
>
> So basically we’re on the same page, just taking slightly different
> approaches to checking recency during refault?
That's a very nice idea too!
And actually I hesitated a lot between using refault distance / gen
distance. And didn't find a good reason to make a decision on that.
Using refault distance makes the result much more accurate, its page
granularity, and aging is a kind of random thing, each gen will have
vastly different lifetime, and could be very rough.
Using gen number makes things much cleaner, avoid the bucket in
workingset.c and might work better if we have periodical aging
support. Also avoid to deal with eviction root problem which will
somehow might also help to make the swap table more compact :)
https://lore.kernel.org/linux-mm/CAMgjq7Aq5ckraKtNtet8+1ANuqnitFsXxefbDJQZpBxNmaW7Cg@xxxxxxxxxxxxxx/