Re: [PATCH v2] KVM: x86: Use gfn_to_pfn_cache for record_steal_time

From: David Woodhouse

Date: Tue Mar 17 2026 - 19:00:38 EST


On Thu, 2026-03-12 at 17:17 -0700, Sean Christopherson wrote:
>
> I'm pretty sure this is going to make PROVE_LOCKING unhappy due to PREEMPT_RT
> making rwlock_t sleepable (when called from kvm_sched_out()).  I've been content
> to ignore the kvm_xen_set_evtchn_fast() warning[*] because I can't imagine anyone
> is crazy enough to emulate Xen with an RT kernel, but I do know there are RT users
> that run VMs, and so this path would be more than just a PROVE_LOCKING issue.
>
> If we want to push the gpc stuff broadly, we need a solution to that (though I'm
> still not 100% convinced using a gpc here is a net positive).
>
> [*] https://lore.kernel.org/all/673f4bbc.050a0220.3c9d61.0174.GAE@xxxxxxxxxx

I set up a test case, and you're quite right:

[ 478.182893] [ BUG: Invalid wait context ]
[ 478.182895] 7.0.0-rc3+ #2 Tainted: G OE
[ 478.182896] -----------------------------
[ 478.182897] steal_time/5227 is trying to lock:
[ 478.182898] ff2c7aa6637b8cf8 (&gpc->lock){++++}-{3:3}, at: kvm_arch_vcpu_put+0xe8/0x240 [kvm]
[ 478.182982] other info that might help us debug this:
[ 478.182982] context-{5:5}
[ 478.182983] 3 locks held by steal_time/5227:
[ 478.182984] #0: ff2c7aa6637b80a0 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0xe4/0x950 [kvm]
[ 478.183026] #1: ff2c7ac4fe0367a0 (&rq->__lock){-...}-{2:2}, at: raw_spin_rq_lock_nested+0x24/0xb0
[ 478.183034] #2: ff2c7aa67e96a4e0 (&kvm->srcu){.+.+}-{0:0}, at: kvm_arch_vcpu_put+0x7d/0x240 [kvm]

It appears to be fixable just by making it use read_trylock(). In the
PREEMPT_RT case that appears to go through rt_read_trylock() →
rwbase_read_trylock(), which does an atomic cmpxchg on the reader count
and returns immediately if it can't get it. And, crucially, doesn't
seem to whine about it.
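
To illustrate why a trylock is acceptable in that context, here is a rough userspace model of a non-blocking reader trylock built on a compare-and-exchange loop. It is only a sketch loosely patterned on the behaviour described above, not the kernel's rwbase_rt code: every name in it (model_rwlock, MODEL_READER_BIAS, model_read_trylock() and so on) is hypothetical, it uses C11 atomics rather than the kernel's atomic_t API, and the "writer" simply clears the bias without waiting for readers to drain.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, NOT the kernel implementation.
 * Assumption: a negative reader count means "read-biased", i.e. no
 * writer currently owns the lock and readers may enter.
 */
#define MODEL_READER_BIAS INT_MIN

struct model_rwlock {
	atomic_int readers;
};

static void model_init(struct model_rwlock *l)
{
	atomic_init(&l->readers, MODEL_READER_BIAS);
}

/* Take the read side if possible; never sleeps and never waits for a writer. */
static bool model_read_trylock(struct model_rwlock *l)
{
	int r = atomic_load_explicit(&l->readers, memory_order_relaxed);

	while (r < 0) {
		/* On failure, r is refreshed with the current value and re-checked. */
		if (atomic_compare_exchange_weak_explicit(&l->readers, &r, r + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;	/* bias is gone: a writer is active, so bail out */
}

static void model_read_unlock(struct model_rwlock *l)
{
	int prev = atomic_fetch_sub_explicit(&l->readers, 1, memory_order_release);

	/* There must have been at least one reader before the decrement. */
	assert(prev != MODEL_READER_BIAS && prev != 0);
}

/* Simplified writer: removes the bias so that new readers fail the trylock. */
static void model_writer_block(struct model_rwlock *l)
{
	atomic_fetch_sub_explicit(&l->readers, MODEL_READER_BIAS, memory_order_acquire);
}

static void model_writer_unblock(struct model_rwlock *l)
{
	atomic_fetch_add_explicit(&l->readers, MODEL_READER_BIAS, memory_order_release);
}

/* Best-effort update: silently skipped if the lock cannot be taken right now. */
static bool set_preempted_best_effort(struct model_rwlock *l, unsigned char *flag)
{
	if (!model_read_trylock(l))
		return false;
	*flag = 1;
	model_read_unlock(l);
	return true;
}
```

The point of the model is the shape of the caller: set_preempted_best_effort() either gets the lock immediately or gives up, so it never has to wait on anything that might sleep.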

There's absolutely no need for the irqsave in this case; this lock is
never obtained from interrupt context anyway.

I think that just using read_trylock() will work for the
kvm_xen_set_evtchn_fast() case too; will look at that in the morning.

@@ -5244,20 +5244,33 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	if (unlikely(current->mm != vcpu->kvm->mm))
 		return;
 
-	read_lock_irqsave(&gpc->lock, flags);
+	/*
+	 * Use a trylock as this is called from the scheduler path (via
+	 * kvm_sched_out), where rwlock_t is not safe on PREEMPT_RT (it
+	 * becomes sleepable). Setting preempted is best-effort anyway;
+	 * the old HVA-based code used copy_to_user_nofault() which could
+	 * also silently fail.
+	 *
+	 * Since we only trylock and bail on failure, there is no risk of
+	 * deadlock with an interrupt handler, so no need to disable
+	 * interrupts.
+	 */
+	if (!read_trylock(&gpc->lock))
+		return;
+
 	if (!kvm_gpc_check(gpc, sizeof(*st)))
 		goto out_unlock_gpc;

