Re: [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs

From: Florent Revest
Date: Tue Sep 26 2017 - 17:14:55 EST


On Thu, 2017-08-31 at 11:23 +0200, Christoffer Dall wrote:
> > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > index 2ea21da..1d2d3df 100644
> > --- a/virt/kvm/arm/mmu.c
> > +++ b/virt/kvm/arm/mmu.c
> > @@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm *kvm,
> >         phys_addr_t size = PAGE_SIZE * memslot->npages;
> >         hva_t reg_end = hva + size;
> >
> > +       if (unlikely(!kvm->mm)) {
> I think you should consider using a predicate so that it's clear that
> this is for in-kernel VMs and not just some random situation where mm
> can be NULL.

Internal VMs should be the only case in which kvm->mm is NULL. However,
if you'd prefer, I'll add such a predicate to make the intent of this
check clearer.
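
For instance, a minimal sketch (the helper name kvm_is_internal_vm is
hypothetical):

/*
 * Hypothetical predicate making the intent explicit: internal
 * (in-kernel) VMs are the only VMs created without a userspace mm.
 */
static inline bool kvm_is_internal_vm(struct kvm *kvm)
{
        return unlikely(!kvm->mm);
}

The check in stage2_unmap_memslot() would then read
"if (kvm_is_internal_vm(kvm))".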

> So it's unclear to me why we don't need any special casing in
> kvm_handle_guest_abort, related to MMIO exits etc.  You probably
> assume that we will never do emulation, but that should be described
> and addressed somewhere before I can critically review this patch.

This is indeed what I was assuming. This RFC does not allow MMIO with
internal VMs, and I cannot think of a use case where it would be
useful. I'll make sure this is documented in a later version of the
RFC.
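
If it helps the review, one way to make this explicit would be to
refuse MMIO faults for internal VMs in kvm_handle_guest_abort(); a
rough sketch (the error path shown is indicative only):

/*
 * Sketch: in kvm_handle_guest_abort(), when the fault does not land
 * in a memslot, fail instead of falling through to MMIO emulation.
 */
if (unlikely(!vcpu->kvm->mm)) {
        kvm_err("MMIO emulation is not supported for internal VMs\n");
        ret = -EINVAL;
        goto out_unlock;
}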

> > +static int internal_vm_prep_mem(struct kvm *kvm,
> > +                               const struct kvm_userspace_memory_region *mem)
> > +{
> > +       phys_addr_t addr, end;
> > +       unsigned long pfn;
> > +       int ret;
> > +       struct kvm_mmu_memory_cache cache = { 0 };
> > +
> > +       end = mem->guest_phys_addr + mem->memory_size;
> > +       pfn = __phys_to_pfn(mem->guest_phys_addr);
> > +       addr = mem->guest_phys_addr;
> My main concern here is that we don't do any checks on this region
> and we could be mapping device memory here as well.  Are we intending
> that to be ok, and are we then relying on the guest to use proper
> memory attributes?

Indeed, being able to map device memory is intended; it is needed for
Runtime Services sandboxing. It does rely on the guest using the
proper memory attributes.
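
If we wanted to be more defensive, the mapping loop could also derive
the stage-2 attributes from the PFN type instead of relying only on
the guest; a sketch:

/*
 * Sketch: use device attributes for non-RAM PFNs instead of relying
 * only on the guest's stage-1 attributes.
 */
pgprot_t mem_type = PAGE_S2;

if (!pfn_valid(pfn))    /* same test as kvm_is_device_pfn() */
        mem_type = PAGE_S2_DEVICE;

pte = pfn_pte(pfn, mem_type);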

> > +
> > +       for (; addr < end; addr += PAGE_SIZE) {
> > +               pte_t pte = pfn_pte(pfn, PAGE_S2);
> > +
> > +               pte = kvm_s2pte_mkwrite(pte);
> > +
> > +               ret = mmu_topup_memory_cache(&cache,
> > +                                            KVM_MMU_CACHE_MIN_PAGES,
> > +                                            KVM_NR_MEM_OBJS);
> You should be able to allocate all you need up front instead of doing
> it in sequences.

Ok.
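
Concretely, I will try to fill the cache to capacity once before the
loop and only refill it when it runs low, along these lines (untested
sketch; since the cache is capped at KVM_NR_MEM_OBJS entries, a single
topup cannot cover arbitrarily large regions):

/*
 * Sketch: fill the cache to capacity before the loop and refill it
 * only when it drops below what one stage2_set_pte() call may use.
 */
ret = mmu_topup_memory_cache(&cache, KVM_NR_MEM_OBJS, KVM_NR_MEM_OBJS);
if (ret)
        return ret;

for (; addr < end; addr += PAGE_SIZE, pfn++) {
        pte_t pte = kvm_s2pte_mkwrite(pfn_pte(pfn, PAGE_S2));

        if (cache.nobjs < KVM_MMU_CACHE_MIN_PAGES) {
                ret = mmu_topup_memory_cache(&cache, KVM_NR_MEM_OBJS,
                                             KVM_NR_MEM_OBJS);
                if (ret)
                        break;
        }

        spin_lock(&kvm->mmu_lock);
        ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
        spin_unlock(&kvm->mmu_lock);
        if (ret)
                break;
}
mmu_free_memory_cache(&cache);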

> >
> > +               if (ret) {
> > +                       mmu_free_memory_cache(&cache);
> > +                       return ret;
> > +               }
> > +               spin_lock(&kvm->mmu_lock);
> > +               ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
> > +               spin_unlock(&kvm->mmu_lock);
> Since you're likely to allocate some large contiguous chunks here,
> can you have a look at using section mappings?

Will do.
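
Since the mapping is an idmap, the PA alignment follows from the IPA
alignment, so the loop could opportunistically use PMD-sized sections,
reusing stage2_set_pmd_huge() from mmu.c; an untested sketch:

/*
 * Sketch: because the mapping is an idmap, IPA alignment implies PA
 * alignment, so PMD-sized sections can be used whenever the address
 * and the remaining size allow it.
 */
phys_addr_t size;

for (; addr < end; addr += size, pfn += size >> PAGE_SHIFT) {
        if (IS_ALIGNED(addr, PMD_SIZE) && end - addr >= PMD_SIZE) {
                pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, PAGE_S2));

                pmd = kvm_s2pmd_mkwrite(pmd);
                size = PMD_SIZE;
                spin_lock(&kvm->mmu_lock);
                ret = stage2_set_pmd_huge(kvm, &cache, addr, &pmd);
                spin_unlock(&kvm->mmu_lock);
        } else {
                pte_t pte = kvm_s2pte_mkwrite(pfn_pte(pfn, PAGE_S2));

                size = PAGE_SIZE;
                spin_lock(&kvm->mmu_lock);
                ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
                spin_unlock(&kvm->mmu_lock);
        }
        if (ret)
                break;
}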

Thank you very much,
  Florent