Re: [RFC PATCH 0/5] mm, swap: Virtual Swap Space (Swap Table Edition)
From: Kairui Song
Date: Mon Jun 01 2026 - 13:52:19 EST
On Tue, Jun 2, 2026 at 12:22 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Mon, Jun 1, 2026 at 8:56 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> >
> > On Mon, Jun 1, 2026 at 12:34 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
> > >
> > > On Thu, May 28, 2026 at 02:29:24PM +0800, Nhat Pham wrote:
> > > > III. Follow-ups:
> > > >
> > > > In no particular order (and most of which can be done as follow-up
> > > > patch series rather than shoving everything in the initial landing):
> > > >
> > > > - More thorough stress testing is very much needed.
> > > >
> > > > - Performance benchmarks to make sure I don't accidentally regress
> > > > the vswap-less case, and that the vswap's case performance is
> > > > good. I suspect I will have to port a lot of the
> > > > optimizations I implemented in v6 over here - some of the
> > > > inefficiencies are inherent in any swap virtualization, and
> > > > would require the same fix (e.g. the MRU cluster caching
> > > > for faster cluster lookup - see [8] and [9]).
> > >
> > > This could be improved by a per-si percpu cluster. Both YoungJun's
> > > tiering and Baoquan's previous swap ops mentioned this is needed,
> > > and now vswap also needs that. If the vswap is also a si, then it will
> > > make use of this too.
>
> Oh and the MRU cluster caching I mentioned here is not the allocation
> caching. It's the lookup caching, basically to avoid doing the
> xa_load() to look up clusters for consecutive swap operations on the
> same vswap cluster (which is the common case with vswap). For v6, it
> massively reduces this indirection lookup overhead. Performance-wise
> it's an absolute winner, just more complexity (because I need to
> handle reference counting carefully).

Ah alright, that's interesting. And I think we can keep things simple
to start, since sensitive users are still able to use a plain device
this way.

BTW maintaining the MRU is also an overhead. I'm not sure the lookup
pattern always follows that?

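To make the question concrete, here is a rough userspace sketch of a
one-entry MRU lookup cache. It is not the v6 code: `struct vcluster`,
`slow_lookup()`, `lookup_cluster()`, the flat table standing in for the
xarray and the 512-slot cluster size are all made up for illustration,
and the reference counting you mentioned is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CLUSTERS   64
#define CLUSTER_SHIFT 9   /* assumption: 512 slots per cluster */

struct vcluster {
	unsigned long id;
};

/* Stand-in for the xarray: a flat table plus a slow-path counter. */
static struct vcluster table[NR_CLUSTERS];
static unsigned long slow_lookups;

/* Models the indirection cost that xa_load() would add. */
static struct vcluster *slow_lookup(unsigned long id)
{
	slow_lookups++;
	if (id >= NR_CLUSTERS)
		return NULL;
	table[id].id = id;
	return &table[id];
}

/*
 * One-entry MRU cache. A real version would likely be per-CPU and
 * would have to hold a reference on the cached cluster (omitted here).
 */
struct mru_cache {
	struct vcluster *last;
};

static struct vcluster *lookup_cluster(struct mru_cache *c, unsigned long slot)
{
	unsigned long id = slot >> CLUSTER_SHIFT;

	if (c->last && c->last->id == id)
		return c->last;		/* hit: no indirection lookup */

	c->last = slow_lookup(id);	/* miss: pay for the lookup, then cache */
	return c->last;
}
```

With this, 512 consecutive operations on one cluster cost a single slow
lookup. A pattern that alternates between two clusters misses every
time and still pays for the compare and the cache update, which is the
case I'm wondering about.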
> I also just realized we'll induce the indirection overhead on
> allocation here too, even if the cached cluster still has slots for
> allocation, because we look up the cluster (which is basically free
> for static swap device, but not free for vswap devices). Might need to
> take care of that to maintain vswap performance (but it will then
> diverge from your existing code...).

That part should indeed be coverable by the si->percpu cluster though, I think.