Re: [PATCH v3 1/4] mm/zswap: Make shrink_worker writeback cursor per-memcg

From: Yosry Ahmed

Date: Thu Jun 04 2026 - 12:24:35 EST


On Thu, Jun 4, 2026 at 6:06 AM Hao Jia <jiahao.kernel@xxxxxxxxx> wrote:
>
>
>
> On 2026/6/4 13:34, Yosry Ahmed wrote:
> >>>> For instance, suppose a parent memcg has two children, memcg1 and memcg2,
> >>>> each with 200MB of zswap (100MB inactive). Triggering proactive writeback on
> >>>> the parent memcg will exhaust memcg1's inactive zswap pages. After that,
> >>>> even though memcg2 still has plenty of inactive zswap pages, it will
> >>>> continue to write back memcg1's active zswap pages. Writing back active
> >>>> zswap pages causes the user-space agent to prematurely abort the writeback
> >>>> because it detects that certain memcg metrics have exceeded predefined
> >>>> thresholds.
> >>>
> >>> This will only happen if the reclaim size is smaller than the batch
> >>> size, right? Otherwise the kernel should reclaim more or less equally
> >>> from both memcgs?
> >>>
> >>
> >> I gave it some thought. Not using a cursor could lead to unfairness
> >> issues with certain writeback sizes:
> >>
> >> - If the writeback size is an odd multiple of WB_BATCH (e.g.,
> >> triggering a writeback of 3 * WB_BATCH), with 2 child cgroups, the
> >> writeback ratio might end up being 2:1.
> >> - If a memcg has 5 child cgroups and a writeback of 2 * WB_BATCH is
> >> triggered, it might repeatedly write back from only the first 2 child
> >> cgroups.
> >>
> >> Although setting a smaller WB_BATCH might mitigate this unfairness, it
> >> could hurt writeback efficiency. Let's just use per-memcg cursors to
> >> completely fix these corner cases.
> >
> > Exactly, the batch size should be small enough that any unfairness is
> > not a problem. I would honestly just do batching without a per-memcg
> > cursor, unless we have numbers to prove that the efficiency is
> > affected when we use a small batch size. Let's only introduce
> > complexity when needed please.
>
>
> If you prefer not to use per-cgroup cursors, do we still need to keep
> the global cursor (i.e., the root cgroup's cursor) zswap_next_shrink?
> I found this part to be quite tricky when trying to reuse the main logic
> of shrink_worker() in zswap_proactive_writeback().
>
> Of course, I think we could also keep zswap_next_shrink and write a
> small helper to check if it's the root cgroup, allowing us to use
> different memcg iteration methods.

I think we want to keep the global cursor, at least for now.
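
For reference, the two unfairness cases quoted above (3 * WB_BATCH across 2 children, and 2 * WB_BATCH across 5 children) can be reproduced with a toy model. This is only a sketch, not kernel code: it assumes that without a cursor every writeback request restarts iteration at the first child, and that each visited child gets exactly one batch. The names writeback() and simulate() are made up for illustration.

```python
def writeback(children, batches, start=0):
    """Hand out `batches` batches round-robin, beginning at index `start`.

    Returns (per-child batch counts, index where the next request would resume).
    """
    counts = {c: 0 for c in children}
    idx = start
    for _ in range(batches):
        counts[children[idx]] += 1
        idx = (idx + 1) % len(children)
    return counts, idx


def simulate(children, batches, calls, use_cursor):
    """Issue `calls` requests of `batches` batches each and total the result.

    With use_cursor=False every request restarts at the first child
    (assumed behaviour of cursor-less iteration). With use_cursor=True the
    resume position is remembered across requests.
    """
    total = {c: 0 for c in children}
    cursor = 0
    for _ in range(calls):
        start = cursor if use_cursor else 0
        counts, next_idx = writeback(children, batches, start)
        for child, n in counts.items():
            total[child] += n
        if use_cursor:
            cursor = next_idx
    return total


two = ["memcg1", "memcg2"]
five = ["c0", "c1", "c2", "c3", "c4"]

# 3 batches per request, 2 children: 2:1 split without a cursor.
print(simulate(two, batches=3, calls=4, use_cursor=False))
print(simulate(two, batches=3, calls=4, use_cursor=True))

# 2 batches per request, 5 children: only the first two are ever touched.
print(simulate(five, batches=2, calls=5, use_cursor=False))
print(simulate(five, batches=2, calls=5, use_cursor=True))
```

In this model the skew is bounded by a single batch per child per request, so whether it matters depends on how small the batch can be made without hurting writeback efficiency.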