Re: [PATCH 08/16] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults

From: Sean Christopherson
Date: Mon Mar 04 2024 - 11:00:17 EST


On Fri, Mar 01, 2024, Kai Huang wrote:
> On 1/03/2024 12:06 pm, Sean Christopherson wrote:
> > E.g. in this case, KVM will just skip various fast paths because of the RSVD flag,
> > and treat the fault like a PRIVATE fault. Hmm, but page_fault_handle_page_track()
> > would skip write tracking, which could theoretically cause data corruption, so I
> > guess arguably it would be safer to bail?
> >
> > Anyone else have an opinion? This type of bug should never escape development,
> > so I'm a-ok effectively killing the VM. Unless someone has a good argument for
> > continuing on, I'll go with Kai's suggestion and squash this:
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index cedacb1b89c5..d796a162b2da 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> >  		error_code |= PFERR_PRIVATE_ACCESS;
> >  	r = RET_PF_INVALID;
> > -	if (unlikely((error_code & PFERR_RSVD_MASK) &&
> > -		     !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) {
> > +	if (unlikely(error_code & PFERR_RSVD_MASK)) {
> > +		if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
> > +			return -EFAULT;
>
> -EFAULT is part of guest_memfd() memory fault ABI. I didn't think over this
> thoroughly but do you want to return -EFAULT here?

Yes, I/we do. There are many existing paths that can return -EFAULT from KVM_RUN
without setting run->exit_reason to KVM_EXIT_MEMORY_FAULT. Userspace is responsible
for checking run->exit_reason on -EFAULT (and -EHWPOISON), i.e. must be prepared
to handle a "bare" -EFAULT, where for all intents and purposes "handle" means
"terminate the guest".

That's actually one of the reasons why KVM_EXIT_MEMORY_FAULT exists: it'd require
an absurd amount of work and churn in KVM to *safely* return useful information
on *all* -EFAULTs. FWIW, I had hopes and dreams of actually doing exactly this,
but have long since abandoned those dreams.

In other words, KVM_EXIT_MEMORY_FAULT essentially communicates to userspace that
(a) userspace can likely fix whatever badness triggered the -EFAULT, and (b) that
KVM is in a state where fixing the underlying problem and resuming the guest is
safe, e.g. resuming won't corrupt the guest because KVM was left in a half-baked state.
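
To make the userspace side of that contract concrete, here is a rough sketch of
how a VMM might classify the result of KVM_RUN. This is illustrative only, not
code from any real VMM: classify_kvm_run(), enum run_action and
SKETCH_EXIT_MEMORY_FAULT are made-up names, the constant is a stand-in for
KVM_EXIT_MEMORY_FAULT from <linux/kvm.h>, and the -EINTR retry is a common VMM
convention rather than something discussed above.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Stand-in for KVM_EXIT_MEMORY_FAULT; a real VMM would include <linux/kvm.h>
 * and use the kernel's definition instead of this hypothetical one.
 */
#define SKETCH_EXIT_MEMORY_FAULT 39u

/* Fallback for libcs that don't expose the Linux-specific errno. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Hypothetical actions for the vCPU run loop. */
enum run_action {
	RUN_HANDLE_EXIT,	/* KVM_RUN returned 0, dispatch on exit_reason */
	RUN_RETRY,		/* interrupted by a signal, call KVM_RUN again */
	RUN_FIX_AND_RESUME,	/* KVM_EXIT_MEMORY_FAULT: fix the memory, resume */
	RUN_TERMINATE_GUEST,	/* "bare" error: nothing sane to do but give up */
};

/*
 * ret:         return value of ioctl(vcpu_fd, KVM_RUN, 0)
 * err:         errno sampled immediately after the ioctl
 * exit_reason: run->exit_reason from the shared kvm_run page
 */
static enum run_action classify_kvm_run(int ret, int err, uint32_t exit_reason)
{
	if (ret >= 0)
		return RUN_HANDLE_EXIT;

	if (err == EINTR)
		return RUN_RETRY;

	/*
	 * Only trust exit_reason when errno is EFAULT or EHWPOISON; for any
	 * other error it is treated as stale.
	 */
	if ((err == EFAULT || err == EHWPOISON) &&
	    exit_reason == SKETCH_EXIT_MEMORY_FAULT)
		return RUN_FIX_AND_RESUME;

	/* Bare -EFAULT and everything else: "handle" means terminate. */
	return RUN_TERMINATE_GUEST;
}
```

A run loop would call classify_kvm_run(ret, errno, run->exit_reason) right after
each KVM_RUN. With the squashed change above, the WARN path surfaces as a bare
-EFAULT and lands in RUN_TERMINATE_GUEST.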