Re: [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path

From: Suren Baghdasaryan

Date: Wed May 13 2026 - 21:44:54 EST

On Tue, May 5, 2026 at 9:39 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 5/4/26 16:15, Edgecombe, Rick P wrote:
> >
> > I guess the problem is the lock ordering. Not sure if there is any slow path
> > avoidance details that could make this splat a false positive. But how about
> > this simpler munmap() case:
> >
> > Shadow stack signal munmap()
> > ------------------- --------
> > vma_start_read() (VM_SHADOW_STACK check)
> > mmap_write_lock()
> > mmap_read_lock() (user fault) <- deadlock
> > vma_start_write() <-deadlock
>
> It's a little more complicated than that in practice, but I think you're
> right.
>
> I'm not sure when this would happen in practice because the fault is
> actually on the VMA that's being held for read. So I think another
> writer would have had to sneak in there and zap the VMA.

Not necessarily to zap the VMA but to modify it, which requires
write-locking it (split, merge, remap, etc). Note that write-locking a
VMA always involves mmap_write_lock() followed by vma_start_write(),
IOW we end up write-locking both mmap and per-VMA locks in that order.

The lock ordering when we take both mmap and per-VMA locks is to
always take mmap_lock first, followed by the per-VMA lock. With this
change we read-lock the VMA first and then call
get_shstk_data()->get_user()->might_fault()->might_lock_read(mm->mmap_lock).
So, we take per-VMA read-lock and then mmap_read_lock and that's the
problem.

>
> The funny thing is that the fault handler is really just trying to find
> the VMA. The thing causing the fault *has* the VMA. So it's as simple as
> just passing the VMA down into the fault handler, right? How hard could
> it be? ;)
>
> There are still games to play, but they all involve dropping locks and
> retrying, like:
>
> retry:
> vma = lock_vma_under_rcu()
> // muck with VMA
> pagefault_disable() // avoid deadlock
> ret = copy_from_user()
> pagefault_enable()
> vma_end_read();
>
> if (!ret)
> return SUCCESS;
>
> mmap_read_lock()
> vma = vma_lookup()
> mmap_read_unlock() // avoid deadlock before touching userspace
> // check for valid VMA to avoid looping when there is no VMA
> if (!vma)
> return -ERRNO;
>
> // uh oh, slow path, something faulted
> get_user_pages()??
> //or
> copy_from_user() without the VMA??
>
> goto retry;
>
>
> This also needs some very careful thought, but something like this
> should work, where we avoid fault handling (and lock taking) in the
> actual #PF and do it in a context where the VMA lock is held:
>
> vma = lock_vma_under_rcu();
> pagefault_disable() // avoid deadlock
>
> while (1) {
> ret = copy_from_user()
> if (!ret)
> break;
> handle_mm_fault(vma, address, FAULT_FLAG_VMA_LOCK...);
> };
>
> pagefault_enable()
> vma_end_read();
>
> That's effectively just short-circuiting the #PF code which does the:
>
> vma = lock_vma_under_rcu(mm, address);
> ...
> fault = handle_mm_fault(vma, address, ... FAULT_FLAG_VMA_LOCK)
>
> sequence _itself_.

IMHO that looks more complex than the original speculation+retry solution :)