Re: Rename restrictedmem => guardedmem? (was: Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM)

From: Vishal Annapurve
Date: Fri Jul 14 2023 - 19:10:07 EST


On Fri, Jul 14, 2023 at 12:29 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> ...
> And _if_ there is a VMM that instantiates memory before KVM_CREATE_VM, IMO making
> the ioctl() /dev/kvm scoped would have no meaningful impact on adapting userspace
> to play nice with the required ordering. If userspace can get at /dev/kvm, then
> it can do KVM_CREATE_VM, because the only input to KVM_CREATE_VM is the type, i.e.
> the only dependencies for KVM_CREATE_VM should be known/resolved long before the
> VMM knows it wants to use gmem.

I am not sure about the benefits of tying gmem creation to any given
kvm instance. I think the most important requirement here is that a
given gmem range is always tied to a single VM; this can be enforced
when memslots are bound to the gmem files.

I believe "Required ordering" is that gmem files are created first and
then supplied while creating the memslots whose gpa ranges can
generate private memory accesses.
Is there any other ordering we want to enforce here?
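
To make what I mean by that ordering concrete, the flow I'm assuming is
roughly the below. The ioctl/struct/field names are placeholders for
whatever the final uAPI ends up being, so please treat this purely as a
sketch, not as the proposed interface:

        /*
         * Illustrative userspace flow, assuming a VM-scoped creation
         * ioctl that hands back a gmem fd, which is then bound to a
         * memslot covering the gpa range that can be mapped private.
         * Assumes <linux/kvm.h> with the proposed definitions.
         */
        int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

        struct kvm_create_guest_memfd gmem = {
                .size  = memslot_size,
                .flags = 0,
        };
        int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

        struct kvm_userspace_memory_region2 region = {
                .slot            = 0,
                .flags           = KVM_MEM_PRIVATE,
                .guest_phys_addr = gpa,
                .memory_size     = memslot_size,
                .userspace_addr  = (__u64)shared_hva, /* shared backing  */
                .gmem_fd         = gmem_fd,           /* private backing */
                .gmem_offset     = 0,
        };
        ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);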

> ...
> Practically, I think that gives us a clean, intuitive way to handle intra-host
> migration. Rather than transfer ownership of the file, instantiate a new file
> for the target VM, using the gmem inode from the source VM, i.e. create a hard
> link. That'd probably require new uAPI, but I don't think that will be hugely
> problematic. KVM would need to ensure the new VM's guest_memfd can't be mapped
> until KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM (which would also need to verify the
> memslots/bindings are identical), but that should be easy enough to enforce.
>
> That way, a VM, its memslots, and its SPTEs are tied to the file, while allowing
> the memory and the *contents* of memory to outlive the VM, i.e. be effectively
> transferred to the new target VM. And we'll maintain the invariant that each
> guest_memfd is bound 1:1 with a single VM.
>
> As above, that should also help us draw the line between mapping memory into a
> VM (file), and freeing/reclaiming the memory (inode).
>
> There will be extra complexity/overhead as we'll have to play nice with the
> possibility of multiple files per inode, e.g. to zap mappings across all files
> when punching a hole, but the extra complexity is quite small, e.g. we can use
> address_space.private_list to keep track of the guest_memfd instances associated
> with the inode.

Are we talking about a different use case of sharing a gmem fd across
VMs other than intra-host migration?
If not, ideally only one of the files should be catering to the guest
memory mappings at any given time, i.e. any inode should ideally be
bound (through the file) to a single kvm instance, as we are planning
to ensure that guest_memfd can't be mapped until
KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM is invoked on the target side.
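
For enforcing the "only one file serving mappings at a time" part, I'm
imagining something like the below on the inode side. None of these
names exist today, this is just to illustrate the idea:

        /* Purely illustrative; all names below are made up. */
        struct gmem_inode_private {
                struct mutex lock;
                /* The one guest_memfd file allowed to serve mappings. */
                struct file *active_file;
        };

        /* Checked in the fault/get_pfn path before handing out a pfn. */
        static bool gmem_file_is_active(struct inode *inode, struct file *file)
        {
                struct gmem_inode_private *priv = inode->i_mapping->private_data;
                bool active;

                mutex_lock(&priv->lock);
                active = (priv->active_file == file);
                mutex_unlock(&priv->lock);
                return active;
        }

        /*
         * Called from KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM on the target side,
         * after the memslot bindings have been verified to be identical.
         */
        static void gmem_transfer_active_file(struct inode *inode, struct file *dst)
        {
                struct gmem_inode_private *priv = inode->i_mapping->private_data;

                mutex_lock(&priv->lock);
                priv->active_file = dst;
                mutex_unlock(&priv->lock);
        }

That would keep the source VM's file fully functional until the move
actually happens, while guaranteeing the target VM's file can't fault
anything in early.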