Re: [PATCH v7 5/8] KVM: x86/mmu: Don't pass FOLL_GET to __kvm_follow_pfn

From: Sean Christopherson
Date: Thu Aug 31 2023 - 17:18:24 EST


On Fri, Aug 25, 2023, David Stevens wrote:
> On Fri, Aug 25, 2023 at 12:15 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Thu, Aug 24, 2023, David Stevens wrote:
> > > On Wed, Jul 5, 2023 at 7:25 PM Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Jul 04, 2023 at 04:50:50PM +0900, David Stevens wrote:
> > > > > @@ -4529,7 +4540,8 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
> > > > >
> > > > > out_unlock:
> > > > > read_unlock(&vcpu->kvm->mmu_lock);
> > > > > - kvm_release_pfn_clean(fault->pfn);
> > > >
> > > > Yet kvm_release_pfn() can still be triggered for the kvm_vcpu_maped gfns.
> > > > What if guest uses a non-referenced page(e.g., as a vmcs12)? Although I
> > > > believe this is not gonna happen in real world...
> > >
> > > kvm_vcpu_map still uses gfn_to_pfn, which eventually passes FOLL_GET
> > > to __kvm_follow_pfn. So if a guest tries to use a non-refcounted page
> > > like that, then kvm_vcpu_map will fail and the guest will probably
> > > crash. It won't trigger any bugs in the host, though.
> > >
> > > It is unfortunate that the guest will be able to use certain types of
> > > memory for some purposes but not for others. However, while it is
> > > theoretically fixable, it's an unreasonable amount of work for
> > > something that, as you say, nobody really cares about in practice [1].
> > >
> > > [1] https://lore.kernel.org/all/ZBEEQtmtNPaEqU1i@xxxxxxxxxx/
> >
> > There are use cases that care, which is why I suggested allow_unsafe_kmap.
> > Specifically, AWS manages their pool of guest memory in userspace and maps it all
> > via /dev/mem. Without that module param to let userspace opt-in, this series will
> > break such setups. It still arguably is a breaking change since it requires
> > userspace to opt-in, but allowing such behavior by default is simply not a viable
> > option, and I don't have much sympathy since so much of this mess has its origins
> > in commit e45adf665a53 ("KVM: Introduce a new guest mapping API").
> >
> > The use cases that no one cares about (AFAIK) is allowing _untrusted_ userspace
> > to back guest RAM with arbitrary memory. In other words, I want KVM to allow
> > (by default) mapping device memory into the guest for things like vGPUs, without
> > having to do the massive and invasive overhaul needed to safely allow backing guest
> > RAM with completely arbitrary memory.
>
> Do you specifically want the allow_unsafe_kmap breaking change? v7 of
> this series should have supported everything that is currently
> supported by KVM, but you're right that the v8 version of
> hva_to_pfn_remapped doesn't support mapping
> !kvm_pfn_to_refcounted_page() pages. That could be supported
> explicitly with allow_unsafe_kmap as you suggested,

I think it needs to be explicit, i.e. needs the admin to opt-in to the unsafe
behavior.