Re: [PATCH v7 06/15] mm/mglru: avoid reclaim type fall back when isolation makes no progress
From: Kairui Song
Date: Tue Apr 28 2026 - 00:19:35 EST
Forwarding the sashiko review. I think these are all false positives,
but I'm replying to each point to make things clear:
> commit dd4c6bd11860d1bc69ef77e9f0d457abec425fc6
> Author: Barry Song (Xiaomi) <baohua@xxxxxxxxxx>
> mm/mglru: avoid reclaim type fall back when isolation makes no progress
> This commit modifies isolate_folios() to only fallback to the alternate LRU
> type if no folios were scanned. This prevents falling back when folios are
> scanned but none are isolated.
> Link: https://lore.kernel.org/linux-mm/CAGsJ_4zjdOYEtuO6gNjABm7NDxW0skzBFNRNee-k2D6VwsYEQA@xxxxxxxxxxxxxx/ [1]
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 43c0f22f15d5..304d5a4704ec 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4817,8 +4817,13 @@ static int isolate_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> > *isolate_scanned = scanned;
> > break;
> > }
> > -
> > - type = !type;
> > + /*
> > + * If scanned > 0 and isolated == 0, avoid falling back to the
> > + * other type, as this type remains sufficient. Falling back
> > + * too readily can disrupt the positive_ctrl_err() bias.
> > + */
> > + if (!scanned)
> > + type = !type;
> > }
> >
> > return total_scanned;
> Could skipping this fallback cause a livelock when folios are evictable but
> un-isolatable?
No. The total scan budget in try_to_shrink_lruvec() bounds the loop, and
scanned folios are moved to a newer gen. We used to skip the fallback here
just fine; this commit just restores that behavior.
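
To illustrate the bound, here is a toy userspace model, not kernel code.
Everything in it (struct toy_lru, toy_scan(), toy_shrink(), TOY_BATCH) is
a hypothetical stand-in, and it assumes, as stated above, that every
scanned folio leaves the oldest gen. It only sketches why a scan budget
plus "fall back only when nothing was scanned" always terminates:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BATCH 64	/* hypothetical per-call scan batch */

/* Toy stand-in for the oldest gen of each type (0 = anon, 1 = file). */
struct toy_lru {
	long oldest[2];		/* folios left in the oldest gen */
	bool isolatable[2];	/* can folios of this type be isolated? */
};

/*
 * Scan up to one batch. Every scanned folio leaves the oldest gen: it is
 * either isolated or (assumption taken from the reply) moved to a newer gen.
 */
static long toy_scan(struct toy_lru *lru, int type, long *isolated)
{
	long scanned = lru->oldest[type] < TOY_BATCH ?
		       lru->oldest[type] : TOY_BATCH;

	lru->oldest[type] -= scanned;
	*isolated = lru->isolatable[type] ? scanned : 0;
	return scanned;
}

/* Outer loop: the budget shrinks on every pass that scans anything. */
static long toy_shrink(struct toy_lru *lru, long nr_to_scan, int type,
		       long *passes)
{
	long reclaimed = 0;

	*passes = 0;
	while (nr_to_scan > 0) {
		long isolated = 0;
		long scanned = toy_scan(lru, type, &isolated);

		(*passes)++;
		assert(isolated <= scanned);
		reclaimed += isolated;

		if (!scanned) {
			/* Fall back only when this type had nothing to scan. */
			type = !type;
			if (!lru->oldest[type])
				break;	/* both types exhausted */
			continue;
		}
		nr_to_scan -= scanned;
	}
	return reclaimed;
}
```

With 1000 un-isolatable folios of type 0 and a budget of 256, the model
stops after four passes without touching type 1. With a larger budget it
drains type 0, falls back, and reclaims from type 1.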
> For example, in constrained direct reclaim like GFP_NOIO, isolate_folio()
> can early-reject anonymous folios due to missing __GFP_IO and return false.
> When scan_folios() places these rejected folios into a temporary list and
> splices them back at the head of the oldest generation's list, this code
> path will see scanned > 0 but isolated == 0.
> Because the fallback is skipped, isolate_folios() scans the exact same LRU
> type on the next iteration. It encounters the identical un-isolatable
> folios at the head of the list, skips them again, and returns 0 isolated
> folios.
> When try_to_shrink_lruvec() decrements nr_to_scan and retries, no memory
> has been reclaimed, so the MGLRU statistics are unchanged.
> get_type_to_scan() will then select the exact same blocked LRU type again.
> Will this starve the alternate LRU type and potentially trigger premature
> OOMs?
No. Folios stuck in a non-reclaimable state (GFP_NOIO anon, writeback-waiting
file, etc.) are a pre-existing concern orthogonal to this change, and the
same concern applies to the fallback type.
Falling back to the other type is still done if we run out of gens (scanning
consumes the cold gen). The writeback throttling introduced by a following
commit in this series improves the handling of folios stuck in a
non-reclaimable state, but that is not related to this commit.
> I note this behavior is addressed later in the patch series by commit
> 8992ac501e543b77bf0d5e24175632b6eba8086c, which removes the redundant swap
> constraint check. That allows the folios to be isolated and promoted, emptying
> the oldest generation so scanned can become 0. However, does this patch
> introduce an intermediate bisection regression for the LRU fallback mechanism?
No. As said above, that later commit is not related to this behavior.