Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users

From: Yan Zhao
Date: Tue Aug 08 2023 - 21:29:56 EST


On Mon, Aug 07, 2023 at 10:19:07AM -0700, Sean Christopherson wrote:
> On Mon, Aug 07, 2023, Like Xu wrote:
> > On 23/12/2022 8:57 am, Sean Christopherson wrote:
> > > +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> > > + const u8 *new, int bytes)
> > > +{
> > > + __kvm_page_track_write(vcpu, gpa, new, bytes);
> > > +
> > > + kvm_mmu_track_write(vcpu, gpa, new, bytes);
> > > +}
> >
> > The kvm_mmu_track_write() is only used for x86, where the incoming parameter
> > "u8 *new" has not been required since 0e0fee5c539b ("kvm: mmu: Fix race in
> > emulated page table writes"), please help confirm if it's still needed ? Thanks.
> > A minor clean up is proposed.
>
> Hmm, unless I'm misreading things, KVMGT ultimately doesn't consume @new either.
> So I think we can remove @new from kvm_page_track_write() entirely.
Sorry for the late reply.
Yes, KVMGT does not consume @new and it reads the guest PTE again in the
page track write handler.

But I have a couple of questions about the mentioned commit (quoted
below for reference):

(1) If the fix is "re-reading the current value of the guest PTE after the MMU
lock has been acquired", should KVMGT also acquire the MMU lock?
If so, could we move the MMU lock and unlock into kvm_page_track_write(),
since it would be common to both users?

(2) Even if KVMGT consumed @new,
would kvm_page_track_write() be called once or twice if there are two
concurrent emulated writes?


commit 0e0fee5c539b61fdd098332e0e2cc375d9073706
Author: Junaid Shahid <junaids@xxxxxxxxxx>
Date: Wed Oct 31 14:53:57 2018 -0700

kvm: mmu: Fix race in emulated page table writes

When a guest page table is updated via an emulated write,
kvm_mmu_pte_write() is called to update the shadow PTE using the just
written guest PTE value. But if two emulated guest PTE writes happened
concurrently, it is possible that the guest PTE and the shadow PTE end
up being out of sync. Emulated writes do not mark the shadow page as
unsync-ed, so this inconsistency will not be resolved even by a guest TLB
flush (unless the page was marked as unsync-ed at some other point).

This is fixed by re-reading the current value of the guest PTE after the
MMU lock has been acquired instead of just using the value that was
written prior to calling kvm_mmu_pte_write().
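
To make the race described in the commit message concrete, below is a small
userspace model of it. This is only a sketch, not KVM code: struct toy_mmu and
all toy_*() helpers are hypothetical names invented for illustration, the two
"vCPUs" are replayed in one fixed interleaving instead of running on real
threads, and the MMU lock is only indicated by comments. It shows that using
the value passed in by the writer can leave the shadow PTE stale, while
re-reading the guest PTE at update time cannot.

```c
#include <stdint.h>

/* Toy model: one guest PTE and the shadow PTE derived from it. */
struct toy_mmu {
	uint64_t guest_pte;
	uint64_t shadow_pte;
};

/* Emulated write: store the new value to guest memory (no MMU lock held). */
void toy_emulated_store(struct toy_mmu *mmu, uint64_t val)
{
	mmu->guest_pte = val;
}

/* Pre-fix behaviour: update the shadow PTE from the value the writer passed in. */
void toy_pte_write_stale(struct toy_mmu *mmu, uint64_t written)
{
	/* The MMU lock would be held here. */
	mmu->shadow_pte = written;
}

/* Post-fix behaviour: ignore the passed-in value and re-read the guest PTE. */
void toy_pte_write_reread(struct toy_mmu *mmu)
{
	/* The MMU lock would be held here. */
	mmu->shadow_pte = mmu->guest_pte;
}

/*
 * Replay one bad interleaving of two emulated writes (A writes 1, B writes 2):
 *   A stores, B stores, B updates the shadow PTE, A updates the shadow PTE.
 * Returns 1 if guest and shadow PTE end up in sync, 0 otherwise.
 */
int toy_replay_stale(void)
{
	struct toy_mmu mmu = { 0, 0 };

	toy_emulated_store(&mmu, 1);	/* vCPU A */
	toy_emulated_store(&mmu, 2);	/* vCPU B, guest PTE is now 2 */
	toy_pte_write_stale(&mmu, 2);	/* vCPU B takes the lock first */
	toy_pte_write_stale(&mmu, 1);	/* vCPU A overwrites with its old value */

	return mmu.guest_pte == mmu.shadow_pte;
}

/* Same interleaving, but each update re-reads the guest PTE. */
int toy_replay_reread(void)
{
	struct toy_mmu mmu = { 0, 0 };

	toy_emulated_store(&mmu, 1);	/* vCPU A */
	toy_emulated_store(&mmu, 2);	/* vCPU B, guest PTE is now 2 */
	toy_pte_write_reread(&mmu);	/* vCPU B */
	toy_pte_write_reread(&mmu);	/* vCPU A, sees the current value 2 */

	return mmu.guest_pte == mmu.shadow_pte;
}
```

In this model toy_replay_stale() ends with guest PTE 2 and shadow PTE 1, while
toy_replay_reread() ends with both at 2. The model says nothing about how many
times the real handler is invoked, so it does not answer question (2).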