Re: [PATCH v4 09/21] mm: swap: allocate a virtual swap slot for each swapped out page
From: Peter Zijlstra
Date: Thu Mar 19 2026 - 17:03:51 EST
On Thu, Mar 19, 2026 at 11:37:19AM -0700, Nhat Pham wrote:
> On Thu, Mar 19, 2026 at 12:56 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Mar 18, 2026 at 03:29:40PM -0700, Nhat Pham wrote:
> > > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> > > index 62cd7b35a29c9..85cb45022e796 100644
> > > --- a/include/linux/cpuhotplug.h
> > > +++ b/include/linux/cpuhotplug.h
> > > @@ -86,6 +86,7 @@ enum cpuhp_state {
> > > CPUHP_FS_BUFF_DEAD,
> > > CPUHP_PRINTK_DEAD,
> > > CPUHP_MM_MEMCQ_DEAD,
> > > + CPUHP_MM_VSWAP_DEAD,
> > > CPUHP_PERCPU_CNT_DEAD,
> > > CPUHP_RADIX_DEAD,
> > > CPUHP_PAGE_ALLOC,
> >
> > > +static int vswap_cpu_dead(unsigned int cpu)
> > > +{
> > > + struct vswap_cluster *cluster;
> > > + int order;
> > > +
> > > + rcu_read_lock();
> >
> > nit:
> > guard(rcu)();
> >
> > > + for (order = 0; order < SWAP_NR_ORDERS; order++) {
> > > + cluster = per_cpu(percpu_vswap_cluster.clusters[order], cpu);
> > > + if (cluster) {
> > > + per_cpu(percpu_vswap_cluster.clusters[order], cpu) = NULL;
> > > + spin_lock(&cluster->lock);
> >
> > This breaks on PREEMPT_RT as this is ran with IRQs disabled. This must
> > be a raw_spinlock_t.
> >
> > > + cluster->cached = false;
> > > + if (refcount_dec_and_test(&cluster->refcnt))
> > > + vswap_cluster_free(cluster);
> >
> > And this... below.
> >
> > > + spin_unlock(&cluster->lock);
> > > + }
> > > + }
> > > + rcu_read_unlock();
> > > +
> > > + return 0;
> > > +}
> >
> > > +static void vswap_cluster_free(struct vswap_cluster *cluster)
> > > +{
> > > + VM_WARN_ON(cluster->count || cluster->cached);
> > > + VM_WARN_ON(!spin_is_locked(&cluster->lock));
> >
> > This is terrible, please use:
> >
> > lockdep_assert_held(&cluster->lock);
> >
> > > + xa_lock(&vswap_cluster_map);
> >
> > This is again broken, this cannot be from a DEAD callback with IRQs
> > disabled.
> >
> > > + list_del_init(&cluster->list);
> > > + __xa_erase(&vswap_cluster_map, cluster->id);
> >
> > Strictly speaking this can end up in xas_alloc(), which is again, not
> > allowed in a DEAD callback.
>
> I see. I'll take a look at this. Thanks for pointing this out, Peter!
Oh, I think I might have confused DEAD and DYING here. DYING is the
tricky one, DEAD should be okay. Sorry about that.