Re: [RFC PATCH 0/9] kvm: implement atomic memslot updates

From: Maxim Levitsky
Date: Thu Sep 29 2022 - 11:41:41 EST


On Thu, 2022-09-29 at 17:28 +0200, Paolo Bonzini wrote:
> On 9/28/22 22:41, Sean Christopherson wrote:
> > On Wed, Sep 28, 2022, Paolo Bonzini wrote:
> > > On 9/28/22 17:58, Sean Christopherson wrote:
> > > > I don't disagree that the memslots API is lacking, but IMO that is somewhat
> > > > orthogonal to fixing KVM x86's "code fetch to MMIO" mess. Such a massive new API
> > > > should be viewed and prioritized as a new feature, not as a bug fix, e.g. I'd
> > > > like to have the luxury of being able to explore ideas beyond "let userspace
> > > > batch memslot updates", and I really don't want to feel pressured to get this
> > > > code reviewed and merge.
> > >
> > > I absolutely agree that this is not a bugfix. Most new features for KVM can
> > > be seen as bug fixes if you squint hard enough, but they're still features.
> >
> > I guess I'm complaining that there isn't sufficient justification for this new
> > feature. The cover letter provides a bug that would be fixed by having batched
> > updates, but as above, that's really due to deficiencies in a different KVM ABI.
>
> I disagree. Failure to fetch should be fixed but is otherwise a red
> herring. It's the whole memslot API (including dirty logging) that is a
> mess.
>
> If you think we should overhaul it even more than just providing atomic
> batched updates, that's fine. But still, the impossibility to perform
> atomic updates in batches *is* a suboptimal part of the KVM API.
>
> > - Why can't this be solved in userspace?
>
> I don't think *can't* is the right word. If the metric of choice was
> "what can be solved in userspace", we'd all be using microkernels. The
> question is why userspace would be a better place to solve it.
>
> The only reason to do it in userspace would be if failure to fetch is
> something that is interesting to userspace, other than between two
> KVM_SET_USER_MEMORY_REGION. Unless you provide an API to pass failures
> to fetch down to userspace, the locking in userspace is going to be
> inferior, because it would have to be unconditional. This means worse
> performance and more complication, not to mention having to do it N
> times instead of 1 for N implementations.
>
> Not forcing userspace to do "tricks" is in my opinion a strong part of
> deciding whether an API belongs in KVM.
>
> > - What operations does userspace truly need? E.g. if the only use case is to
> > split/truncate/hole punch an existing memslot, can KVM instead provide a
> > memslot flag and exit reason that allows kicking vCPUs to userspace if the
> > memslot is accessed? E.g. KVM_MEM_DISABLED that acts like an invalid memslot,
> > but KVM exists with a dedicated exit reason instead of generating MMIO semantics.
>
> The main cases are:
>
> - for the boot case, splitting and merging existing memslots. QEMU
> likes to merge adjacent memory regions into a single memslot, so if
> something goes from read-write to read-only it has to be split and vice
> versa. I guess a "don't merge this memory region" flag would be the
> less hideous way to solve it in userspace.
>
> - however, there is also the case of resizing an existing memslot which
> is what David would like to have for virtio-mem. This is not really
> fixable because part of the appeal of virtio-mem is to have a single
> huge memslot instead of many smaller ones, in order to reduce the
> granularity of add/remove (David, correct me if I'm wrong).
>
> (The latter _would_ be needed by other VMMs).
>
> > If updates only need to be "atomic" for an address space, does the API allowing
> > mixing non-SMM and SMM memslots?
>
> I agree that the address space should be moved out of the single entries
> and into the header if we follow through with this approach.
>
> > The update needs to be "atomic", i.e. vCPUs
> > must never see an invalid/deleted memslot, but if the memslot is writable,
> > how does KVM prevent some writes from hitting the old HVA and some from hitting
> > the new HVA without a quiescent period?
>
> (Heh, and I forgot likewise that non-x86 does not retry on
> KVM_MEMSLOT_INVALID. Yes, that would be treated as a bug on other
> architectures).
>
> > > Wholesale replacement was my first idea when I looked at the issue, I think
> > > at the end of 2020. I never got to a full implementation, but my impression
> > > was that allocating/deallocating dirty bitmaps, rmaps etc. would make it any
> > > easier than arbitrary batch updates.
> >
> > It's not obvious to me that the memslot metadata is going to be easy to handle
> > regardless of what we do. E.g. I'm pretty sure that batching updates will "corrupt"
> > the dirty bitmap if a hole is punched in a memslot that's being dirty logged.
>
> Indeed; I would have thought that it is clear with the batch updates API
> (which requires the update to be split into delete and insert), but
> apparently it's not and it's by no means optimal.


I 100% agree with everything Paolo said.


Best regards,
Maxim Levitsky
>
> Paolo
>