Re: [PATCH v6 00/14] mm/mglru: improve reclaim loop and dirty folio handling
From: Kairui Song
Date: Sun Apr 26 2026 - 02:59:57 EST
On Sun, Apr 26, 2026 at 4:58 AM Barry Song (Xiaomi) <baohua@xxxxxxxxxx> wrote:
>
> On Sat, Apr 25, 2026 at 9:30 PM Kairui Song <ryncsn@xxxxxxxxx> wrote:
> [...]
> >
> > I just ran a matrix of for kernels (mainline, mm-new HEAD, before this
> > series, after this series) X 3 different memcg configs (-j96 3G, -j48
> > 2G, -j24 1G), and none of these showed any regression but all
> > improvement. That's really odd.
> >
> > One possibility is that I removed the:
> >
> > if (evictable_min_seq(lrugen->min_seq, swappiness) + MIN_NR_GENS >
> > lrugen->max_seq)
> > scanned = 0;
> >
> > Which will make the reclaim loop go further and trigger aging.
> > Previously if reclaim drained the LRU's cold gens, it may go reclaim
> > slab instead. So idle inodes will be dropped with the mapping and
> > reclaim more file, and we won't see any refault data from that since
> > the mapping itself is gone. Sys will be lower too, as IO isn't counted
> > as sys. Checking your data, despite sys is higher, real is acutually
> > lower, which matches my guess.
> >
> > Will the following patch help? I'm not sure if this is the problem,
> > but this added back that early abort, personally I don't think this
> > really makes much sense as it's more like a workaround for other
> > issues, but if that helps we might better keep it.
>
> Hi Kairui, after five hours of sleep I’m feeling much more
> refreshed and should have identified the issue. I think the
> problem will be clear once you look at the patch below.
> Feel free to include it in the new version if you find it
> helpful.
>
> From e3a0b50dc53a3a5f2ef1adfb73111336e3c2d08b Mon Sep 17 00:00:00 2001
> From: "Barry Song (Xiaomi)" <baohua@xxxxxxxxxx>
> Date: Sun, 26 Apr 2026 08:34:21 +1200
> Subject: [PATCH] mm/mglru: Avoid falling back to another type when
> scan_folios() leaves no remaining pages
>
> While remaining reaches 0 in scan_folios(), we quickly fall back
> to the other type in isolate_folios(). This is incorrect, as the
> current type may still have sufficient folios. Falling back can
> undermine the positive_ctrl_err() result from get_type_to_scan(),
> which is derived from swappiness.
>
> A simple fix is to continue scanning this type for another round.
>
> Signed-off-by: Barry Song (Xiaomi) <baohua@xxxxxxxxxx>
> ---
> mm/vmscan.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ef45f3a4aa38..169fbbe17c7b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4788,8 +4788,13 @@ static int isolate_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> *isolate_scanned = scanned;
> break;
> }
> -
> - type = !type;
> + /*
> + * If scanned > 0 and isolated == 0, avoid falling back to the
> + * other type, as this type remains sufficient. Falling back
> + * too readily can disrupt the positive_ctrl_err() bias.
> + */
> + if (!scanned)
> + type = !type;
> }
Thanks so much for catching this! The fix makes perfect sense, it
restores the pre-patch behavior. I was too aggressive with the
fallback; positive_ctrl_err() should win whenever the preferred type
is actually productive.
Would you prefer I squash this into the original patch (with a
Co-developed-by for you?), or keep it as a standalone fix on top? I'm
fine either way.
One related thought for a follow-up: scanned == 0 from scan_folios()
is essentially driven by get_nr_gens(lruvec, type) == MIN_NR_GENS. So
when the type get_type_to_scan() picked is the one that's
gen-exhausted, maybe the right response is also to age that type
rather than fall back. Right now should_run_aging() only looks at the
combined evictable_min_seq, so the ctrl_err preferred type can
silently stay starved while we evict from the other one, that could be
an old issue we can fix later.