Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF
From: Ackerley Tng
Date: Wed Jun 03 2026 - 11:29:30 EST
Yan Zhao <yan.y.zhao@xxxxxxxxx> writes:
> On Tue, Jun 02, 2026 at 05:40:00PM -0700, Ackerley Tng wrote:
>> Yan Zhao <yan.y.zhao@xxxxxxxxx> writes:
>>
>> Coming here from https://lore.kernel.org/all/ahn6qysOGfa74A2E@xxxxxxxxxx/.
>>
>> > On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote:
>> >> Add more context and information to the comment in kvm_gmem_release() that
>> >> explains why there's no synchronization on RCU _or_ kvm->srcu. Point (b)
>> >> from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute
>> >> from slot->gmem.file")
>> >>
>> >> b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
>> >> occur after the memslot is no longer visible to kvm_gmem_get_pfn().
>> >>
>> >> is especially difficult to fully grok, particularly in light of commit
>> >> ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when
>> >> gmem is dying"), which addressed a race between unbind() and release().
>> > As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on
>> > memslot deletion when gmem is dying"), unbind() and release() are mutually
>> > exclusive, i.e., both protected by slots_lock, mentioning "race" here is
>> > confusing. So, that commit addressed the mishandling in unbind() after
>> > kvm_gmem_get_file() returning NULL?
>> >
>>
>> I agree that the use of the word "race" here is odd since both unbind
>> and release() are mutually exclusive and protected by slots_lock.
>>
>> Looking at ae431059e75d again, there are really 3 changes:
>>
>> 1. Refactoring out __kvm_gmem_unbind(), which is mostly about setting
>> bindings in the xarray to NULL and setting slot->gmem.file to NULL,
>> 2. Checking that if slot->gmem.file is NULL, skip unbinding
>> 3. Checking that if kvm_gmem_get_file() returns NULL, just unbind
>> without taking filemap_invalidate_lock().
>>
>> (2.) is explained in the comment
>>
>> /*
>> * Nothing to do if the underlying file was _already_ closed, as
>> * kvm_gmem_release() invalidates and nullifies all bindings.
>> */
>>
>> The real fix is in (3.), I think? Could you explain how
>> kvm_gmem_get_file() works? Why would the file be dying with the
>> slots_lock making the unbind and release mutually exclusive?
> Below is my understanding:
>
Thanks for explaining!
> kvm_gmem_unbind() is invoked under slots_lock.
> kvm_gmem_release() holds slots_lock before it accesses any slot and sets
> slot->gmem.file to NULL.
>
> When kvm_gmem_get_file() returns NULL, it could be
> 1) kvm_gmem_release() is to be invoked soon.
> 2) kvm_gmem_release() is being invoked, before holding slots_lock.
> 3) kvm_gmem_release() is being invoked, holding slots_lock.
> 4) kvm_gmem_release() is being invoked, after releasing slots_lock.
> 5) kvm_gmem_release() has been invoked and completed.
>
> So, when kvm_gmem_unbind() finds slot->gmem.file != NULL while
> kvm_gmem_get_file() returns NULL, it could only be 1) or 2). And at
To clarify, 3, 4, 5 are not possible because kvm_gmem_unbind() is called
with slots_lock held, right?
> that point,
> the remaining user of the slot->gmem.file should only be kvm_gmem_release().
>
> For both 1) and 2), kvm_gmem_unbind() and kvm_gmem_release() can't operate on
> f->bindings simultaneously. kvm_gmem_unbind() needs to remove the slot from
> slot->gmem.file's binding. Otherwise, after kvm_gmem_unbind() returns, the slot
> will be deleted and kvm_gmem_release() will later hold slots_lock and cause
> use-after-free.
Thanks, this makes sense. The purpose of using kvm_gmem_get_file() is
that if the file is _being_ closed, as stated in the comment in
kvm_gmem_unbind(), kvm_gmem_get_file() would find the refcount to be 0
and return NULL. (The missing part for me was that _being_ closed leads
to kvm_gmem_get_file() finding a 0 refcount.)
Annotating all of these:
> void kvm_gmem_unbind(struct kvm_memory_slot *slot)
> {
> /*
> * Nothing to do if the underlying file was _already_ closed, as
> * kvm_gmem_release() invalidates and nullifies all bindings.
> */
>
Above comment rephrased: !slot->gmem.file means kvm_gmem_release() had
already taken slots_lock, iterated all bindings and set slot->gmem.file
to NULL, and dropped slots_lock.
>
> if (!slot->gmem.file)
> return;
>
If we get here, we know for sure that slot->gmem.file is valid (not
freed) and can be dereferenced safely. We know this because
kvm_gmem_release() is the place that frees slot->gmem.file, and if there
was any chance of slot->gmem.file being freed, slot->gmem.file would
have been NULL-ed first.
Hence we can also safely proceed to do
filemap_invalidate_lock(slot->gmem.file->f_mapping), but as described in
ae431059e75d, file->f_mapping is not under KVM's purview. A future
change could NULL out file->f_mapping before .release() is called, for
example, and break the assumption that gmem.file->f_mapping is still
valid. Hence, rely on taking a reference on gmem.file from slot instead,
which is within KVM's control.
> CLASS(gmem_get_file, file)(slot);
>
If we do get the reference, we're good and we can take
filemap_invalidate_lock() and then unbind.
> /*
> * However, if the file is _being_ closed, then the bindings need to be
If we don't get the reference, then the file is _being_ closed, so the
file's refcount is already 0, but kvm_gmem_release() hasn't yet been
called.
Later, when kvm_gmem_release() is called, it will iterate bindings and
then dereference slot. Hence, kvm_gmem_release() must not see the slot -
the slot must be removed from bindings. Hence we must unbind.
Unbinding without taking filemap_invalidate_lock() is fine since if
!file, we're sure there's only one last user of file -
kvm_gmem_release(), and kvm_gmem_release() is locked out by slots_lock.
> * removed as kvm_gmem_release() might not run until after the memslot
> * is freed. Note, modifying the bindings is safe even though the file
> * is dying as kvm_gmem_release() nullifies slot->gmem.file under
> * slots_lock, and only puts its reference to KVM after destroying all
> * bindings. I.e. reaching this point means kvm_gmem_release() hasn't
> * yet destroyed the bindings or freed the gmem_file, and can't do so
> * until the caller drops slots_lock.
> */
> if (!file) {
> __kvm_gmem_unbind(slot, slot->gmem.file->private_data);
> return;
> }
>
If we did get the reference, this is the most straightforward case. Take
the filemap_invalidate_lock(), unbind.
> filemap_invalidate_lock(file->f_mapping);
> __kvm_gmem_unbind(slot, file->private_data);
> filemap_invalidate_unlock(file->f_mapping);
> }