Re: [PATCH 7/8] mm/mglru: simplify and improve dirty writeback handling
From: Kairui Song
Date: Sun Mar 22 2026 - 12:22:57 EST
On Sat, Mar 21, 2026 at 5:24 AM Axel Rasmussen <axelrasmussen@xxxxxxxxxx> wrote:
>
> On Tue, Mar 17, 2026 at 12:11 PM Kairui Song via B4 Relay
> <devnull+kasong.tencent.com@xxxxxxxxxx> wrote:
> >
> > From: Kairui Song <kasong@xxxxxxxxxxx>
> >
> > The current handling of dirty writeback folios does not work well for
> > file-page-heavy workloads: dirty folios are protected and moved to the
> > next gen upon isolation, instead of getting throttled or reactivated
> > during pageout (shrink_folio_list).
> >
> > This might reduce LRU lock contention slightly, but as a result folios
> > ping-pong badly between the head and tail of the last two gens, since
> > the shrinker runs into protected dirty writeback folios far more often
> > than it would with reactivation. The dirty-flush wakeup condition is
> > also much more passive than in the active/inactive LRU: the
> > active/inactive LRU wakes the flusher when an entire batch of folios
> > passed to shrink_folio_list is unevictable due to being under
> > writeback, whereas MGLRU can only check this after the whole reclaim
> > loop is done, by comparing the number of isolation-protected folios
> > against the total reclaim count.
> >
> > We previously saw OOM problems caused by this as well; they were
> > fixed, but the fix is still not perfect [1].
> >
> > So drop the special handling for dirty writeback folios and simply
> > reactivate them, as the active/inactive LRU does, and move the
> > dirty-flush wakeup check to right after shrink_folio_list. This should
> > improve both throttling and performance.
> >
> > Testing with YCSB workloadb showed a major performance improvement:
> >
> > Before this series:
> > Throughput(ops/sec): 61642.78008938203
> > AverageLatency(us): 507.11127774145166
> > pgpgin 158190589
> > pgpgout 5880616
> > workingset_refault 7262988
> >
> > After this commit:
> > Throughput(ops/sec): 80216.04855744806 (+30.1%, higher is better)
> > AverageLatency(us): 388.17633477268913 (-23.5%, lower is better)
> > pgpgin 101871227 (-35.6%, lower is better)
> > pgpgout 5770028
> > workingset_refault 3418186 (-52.9%, lower is better)
> >
> > The refault rate is 50% lower and throughput is 30% higher, which is a
> > huge gain. We also observed significant performance gains for other
> > real-world workloads.
> >
> > One concern was that more aggressive dirty flushing could cause extra
> > SSD wear. That should not be a problem here: the flusher is only woken
> > once dirty folios have been pushed to the tail of the LRU, which
> > indicates memory pressure is already high enough that writeback is
> > blocking the workload.
>
> This looks reasonable to me overall. I unfortunately don't have a fast
> way of reproducing the results under production workloads. At least
> under basic functional testing, this seems to work as advertised.
>
> Besides one small clean-up:
>
> Reviewed-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
>
> >
> > Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
> > Link: https://lore.kernel.org/linux-mm/20241026115714.1437435-1-jingxiangzeng.cas@xxxxxxxxx/ [1]
> > ---
> > mm/vmscan.c | 44 +++++++++++++-------------------------------
> > 1 file changed, 13 insertions(+), 31 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index b26959d90850..e11d0f1a8b68 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4577,7 +4577,6 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
> > int tier_idx)
> > {
> > bool success;
> > - bool dirty, writeback;
> > int gen = folio_lru_gen(folio);
> > int type = folio_is_file_lru(folio);
> > int zone = folio_zonenum(folio);
> > @@ -4627,21 +4626,6 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
> > return true;
> > }
> >
> > - dirty = folio_test_dirty(folio);
> > - writeback = folio_test_writeback(folio);
> > - if (type == LRU_GEN_FILE && dirty) {
> > - sc->nr.file_taken += delta;
> > - if (!writeback)
> > - sc->nr.unqueued_dirty += delta;
>
> A grep says that after this commit, nobody is left *reading* from
> `unqueued_dirty`, so can we remove that field and the couple of
> remaining places that modify it?
>
> In `struct scan_control` I mean, we do still use this field in `struct
> reclaim_stat`.
>
Thanks for the review! Yeah, I cleaned up one of the unused variables in
the next patch, but there is still one left :)