Re: [PATCH v2 3/3] kvm/arm: Align the VMID allocation with the arm64 ASID one

From: Will Deacon
Date: Thu Jul 22 2021 - 05:50:26 EST


On Thu, Jul 22, 2021 at 06:45:14AM +0000, Shameerali Kolothum Thodi wrote:
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > index 83dc3b271bc5..42df9931ed9a 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> > > @@ -140,10 +140,10 @@ void __kvm_flush_cpu_context(struct
> > kvm_s2_mmu *mmu)
> > > __tlb_switch_to_host(&cxt);
> > > }
> > >
> > > -void __kvm_flush_vm_context(void)
> > > +void __kvm_tlb_flush_local_all(void)
> > > {
> > > - dsb(ishst);
> > > - __tlbi(alle1is);
> > > + dsb(nshst);
> > > + __tlbi(alle1);
> > >
> > > /*
> > > * VIPT and PIPT caches are not affected by VMID, so no maintenance
> > > @@ -155,7 +155,7 @@ void __kvm_flush_vm_context(void)
> > > *
> > > */
> > > if (icache_is_vpipt())
> > > - asm volatile("ic ialluis");
> > > + asm volatile("ic iallu" : : );
> > >
> > > - dsb(ish);
> > > + dsb(nsh);
> >
> > Hmm, I'm wondering whether having this local stuff really makes sense for
> > VMIDs. For ASIDs, where rollover can be frequent and TLBI could result in
> > IPI on 32-bit, the local option was important, but here rollover is less
> > frequent, DVM is relied upon to work and the cost of a hypercall is
> > significant with nVHE.
> >
> > So I do think you could simplify patch 2 slightly to drop the
> > flush_pending and just issue inner-shareable invalidation on rollover.
> > With that, it might also make it straightforward to clear active_asids
> > when scheduling out a vCPU, which would solve the other problem I
> > mentioned
> > about unnecessarily reserving a bunch of the VMID space.
>
> Ok. I will try out the above suggestion. Hope it will be acceptable for 8 bit
> VMID systems as well as there is a higher chance for rollover especially
> when we introduce pinned VMIDs(I am not sure such platforms care about
> Pinned VMID or not. If not, we could limit Pinned VMIDs to 16 bit systems).

So I woke up at 3am in a cold sweat after dreaming about this code.

I think my suggestion above still stands for the VMID allocator, but
interestingly, it would _not_ be valid for the ASID allocator because there
the ASID is part of the active context and so, during the window where the
active_asid is out of sync with the TTBR, receiving a broadcast TLBI from a
concurrent rollover wouldn't be enough to knock out the old ASID from the
TLB despite it subsequently being made available for reallocation. So the
local TLB invalidation is not just a performance hint as I said; it's crucial
to the way the thing works (and this is also why the CnP code has to switch
to the reserved TTBR0).

As an aside: I'm more and more inclined to rip out the CnP stuff given
that it doesn't appear to being any benefits, but does have some clear
downsides. Perhaps something for next week.

Will