Re: [PATCH v2] KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing

From: Xu Yilun
Date: Fri Jan 19 2024 - 00:54:01 EST


On Thu, Jan 18, 2024 at 09:22:37AM -0800, Sean Christopherson wrote:
> On Fri, Jan 19, 2024, Xu Yilun wrote:
> > On Tue, Jan 09, 2024 at 05:20:45PM -0800, Sean Christopherson wrote:
> > > Retry page faults without acquiring mmu_lock if the resolved gfn is covered
> > > by an active invalidation. Contending for mmu_lock is especially
> > > problematic on preemptible kernels as the mmu_notifier invalidation task
> > > will yield mmu_lock (see rwlock_needbreak()), delay the in-progress
> >
> > Is it possible for the fault-in task to avoid contending for mmu_lock by using _trylock()?
> > Like:
> >
> > 	while (!read_trylock(&vcpu->kvm->mmu_lock))
> > 		cpu_relax();
> >
> > 	if (is_page_fault_stale(vcpu, fault))
> > 		goto out_unlock;
> >
> > 	r = kvm_tdp_mmu_map(vcpu, fault);
> >
> > out_unlock:
> > 	read_unlock(&vcpu->kvm->mmu_lock);
>
> It's definitely possible, but the downsides far outweigh any potential benefits.
>
> Doing trylock means the CPU is NOT put into the queue for acquiring the lock,
> which means that forward progress isn't guaranteed. E.g. in a pathological
> scenario (and by "pathological", I mean if NUMA balancing or KSM is active ;-)),
> it's entirely possible for a near-endless stream of mmu_lock writers to be in
> the queue, thus preventing the vCPU from acquiring mmu_lock in a timely manner.

Ah yes, I forgot that the main purpose of yielding is to let the vCPU make
forward progress when the faulting page is not covered by the invalidation.

Thanks,
Yilun

>
> And hacking the page fault path to bypass KVM's lock contention detection would
> be a very willful, deliberate violation of the intent of the MMU's yielding logic
> for preemptible kernels.
>
> That said, I would love to revisit KVM's use of rwlock_needbreak(), at least in
> the TDP MMU. As evidenced by this rash of issues, it's not at all obvious that
> yielding on mmu_lock contention is *ever* a net positive for KVM, or even for the
> kernel. The shadow MMU is probably a different story since it can't parallelize
> page faults with other operations, e.g. yielding in kvm_zap_obsolete_pages() to
> allow vCPUs to make forward progress is probably a net positive.
>
> But AFAIK, no one has done any testing to prove that yielding on contention in
> the TDP MMU is actually a good thing. I'm 99% certain the only reason the TDP
> MMU yields on contention is because the shadow MMU yields on contention, i.e.
> I'm confident that no one ever did performance testing to show that there is
> any benefit whatsoever to yielding mmu_lock in the TDP MMU.
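
For reference, the check described in the patch summary quoted at the top
(retry the fault without acquiring mmu_lock if the resolved gfn is covered by
an active invalidation) can be sketched in plain userspace C. This is only an
illustration under stated assumptions: the names inval_state and
gfn_in_active_invalidation() are hypothetical and not taken from the patch,
the state is modeled as a single half-open gfn range, and the sketch ignores
the concurrency and memory-ordering concerns that the real fault path has to
handle.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical snapshot of invalidation state; not KVM's actual layout. */
struct inval_state {
	int   in_progress;	/* number of invalidations currently running */
	gfn_t range_start;	/* first gfn covered (inclusive) */
	gfn_t range_end;	/* one past the last gfn covered (exclusive) */
};

/*
 * Return true if the fault on @gfn should be retried without taking the
 * lock, i.e. an invalidation is running and its range covers @gfn.
 */
static bool gfn_in_active_invalidation(const struct inval_state *s, gfn_t gfn)
{
	if (!s->in_progress)
		return false;

	return gfn >= s->range_start && gfn < s->range_end;
}
```

A fault on a gfn outside the range, or one that arrives while no invalidation
is running, returns false and would proceed to take the lock as usual. Only
faults that are bound to be retried anyway skip the lock.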

