Re: [PATCH v2 01/20] locking/rt: Use raw_spin_lock_irqsave() in __rwbase_read_unlock()

From: David Woodhouse

Date: Mon Jun 01 2026 - 06:52:46 EST


On Sat, 2026-05-30 at 16:40 +0200, Paolo Bonzini wrote:
> On Sat, May 30, 2026 at 3:04 PM David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> >
> > On Sat, 2026-05-30 at 12:26 +0200, Paolo Bonzini wrote:
> > >
> > > Yeah, I think so.
> > >
> > > The write side needs kvm->srcu so it would have to be yet another SRCU.
> > > I initially thought that sucks for the code that calls kvm_gpc_check(),
> > > but maybe not because it simply replaces read_lock/read_unlock.
> > >
> > > By using a seqcount for the data, SRCU only needs to be synchronized in
> > > gpc_unmap().  So, something like this:
> >
> > It isn't just gpc_unmap() which does the invalidation. We also
> > invalidate from the MMU notifier in gfn_to_pfn_cache_invalidate_start()
> > which would also have to synchronize, wouldn't it?
>
> You're right, the write_lock_irq() there drains the readers and that
> is needed because khva is not pinned, only kmap()-ed.
>
> That is already broken for the OOM case under PREEMPT_RT, where
> rwlock_t becomes sleepable. But using SRCU would break it on
> !PREEMPT_RT as well.

I don't think 'sleepable' is the problem per se, is it? *Why* does the
OOM killer use mmu_notifier_invalidate_range_start_nonblock()?

Commit 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu
notifiers") did say:

    There are several blockable mmu notifiers which might sleep in
    mmu_notifier_invalidate_range_start and that is a problem for the
    oom_reaper because it needs to guarantee a forward progress so it cannot
    depend on any sleepable locks.

But that was in 2018, when mmap_lock was an rw_semaphore.

Is "sleepable" still a problem even under PREEMPT_RT, where almost
*everything* is now strictly sleepable? Wouldn't that mean drivers
aren't even allowed to take their own spinlocks?

I think the *real* constraint in the OOM path is that it mustn't block
on anything which might be waiting for memory allocation. So waiting on
an actual mutex is bad... but waiting for an rwlock which PREEMPT_RT
just happens to have made sleepable... should be OK?

And waiting for the in-guest CPU to respond to the IPI in Fred's patch
should actually be OK too, but then so would returning -EAGAIN if any
vCPUs really did need kicking.
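
To make the -EAGAIN option concrete, here is a rough userspace sketch of
the shape I have in mind. It is a toy model only, not kernel code: struct
toy_vcpu, its in_guest_using_cache flag and toy_invalidate_start() are all
names made up for illustration, and the "kick and wait" step is reduced to
clearing a flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU that may be using a cached mapping. */
struct toy_vcpu {
	bool in_guest_using_cache;	/* would need a kick before the mapping goes away */
};

/*
 * Toy model of an invalidate_range_start-style callback.
 *
 * Returns 0 once no vCPU is using the mapping any more, or -EAGAIN when
 * the caller asked not to block and at least one vCPU would need kicking.
 */
int toy_invalidate_start(struct toy_vcpu *vcpus, size_t n, bool blockable)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!vcpus[i].in_guest_using_cache)
			continue;

		/* Non-blocking caller (e.g. the OOM path): bail out, let it retry. */
		if (!blockable)
			return -EAGAIN;

		/* Blockable caller: model "kick the vCPU and wait for it to respond". */
		vcpus[i].in_guest_using_cache = false;
	}

	return 0;
}
```

In this model a non-blocking caller only ever sees -EAGAIN when a vCPU
really is using the mapping; once a blockable invalidation has drained it,
the non-blocking call succeeds too.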
