Re: Rename restrictedmem => guardedmem? (was: Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM)

From: Michael Roth
Date: Thu May 11 2023 - 20:21:58 EST


On Fri, Apr 21, 2023 at 06:33:26PM -0700, Sean Christopherson wrote:
>
> Code is available here if folks want to take a look before any kind of formal
> posting:
>
> https://github.com/sean-jc/linux.git x86/kvm_gmem_solo

Hi Sean,

I've been working on getting the SNP patches ported to this but I'm having
some trouble working out a reasonable scheme for how to work the
RMPUPDATE hooks into the proposed design.

One of the main things is kvm_gmem_punch_hole(): this can free pages
back to the host whenever userspace feels like it. Pages that are still
marked private in the RMP table will blow up the host if they aren't returned
to the normal/shared state before being handed back to the kernel. So I'm
trying to add a hook, orchestrated by kvm_arch_gmem_invalidate(), to handle
that, e.g.:

static long kvm_gmem_punch_hole(struct file *file, int mode, loff_t offset,
				loff_t len)
{
	struct kvm_gmem *gmem = file->private_data;
	pgoff_t start = offset >> PAGE_SHIFT;
	pgoff_t end = (offset + len) >> PAGE_SHIFT;
	struct kvm *kvm = gmem->kvm;

	/*
	 * Bindings must be stable across the invalidation to ensure the
	 * start+end notifications are balanced.
	 */
	filemap_invalidate_lock(file->f_mapping);
	kvm_gmem_invalidate_begin(kvm, gmem, start, end);

	/* Handle arch-specific cleanups before releasing pages */
	kvm_arch_gmem_invalidate(kvm, gmem, start, end);
	truncate_inode_pages_range(file->f_mapping, offset, offset + len - 1);

	kvm_gmem_invalidate_end(kvm, gmem, start, end);
	filemap_invalidate_unlock(file->f_mapping);

	return 0;
}

But there's another hook, kvm_arch_gmem_set_mem_attributes(), needed to put
the page in its intended state in the RMP table prior to mapping it into the
guest's NPT. Currently I'm calling that hook via
kvm_vm_ioctl_set_mem_attributes(), just after kvm->mem_attr_array is updated
based on the ioctl. The reasoning there is that the KVM MMU can then rely on
the existing mmu_invalidate_seq logic to ensure the state in the
mem_attr_array and the RMP table are in sync and up-to-date once the MMU lock
is acquired and the MMU is ready to map the page, or to retry the #NPF
otherwise.
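For reference, the placement I'm experimenting with looks roughly like this
(just a sketch against my WIP tree; the hook name/signature is from my patches
and the surrounding helper names may not match what ends up upstream):

```c
	/* Sketch: kvm_vm_ioctl_set_mem_attributes(), with the proposed hook */
	mutex_lock(&kvm->slots_lock);

	kvm_mmu_invalidate_begin(kvm);
	/* ... update kvm->mem_attr_array entries for [start, end) ... */

	/*
	 * Proposed: update the RMP table here, while mmu_invalidate_seq
	 * still guards the range, so the #NPF fault path either sees
	 * matching mem_attr_array/RMP state under mmu_lock or retries.
	 */
	kvm_arch_gmem_set_mem_attributes(kvm, start, end, attrs->attributes);
	kvm_mmu_invalidate_end(kvm);

	mutex_unlock(&kvm->slots_lock);
```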

But if I implement things as above, kvm_gmem_punch_hole() racing with
kvm_vm_ioctl_set_mem_attributes() can potentially result in something like
the following sequence:

  CPU0: kvm_gmem_punch_hole():
          kvm_gmem_invalidate_begin()
          kvm_arch_gmem_invalidate()         // set pages to default/shared state in RMP table before freeing
  CPU1: kvm_vm_ioctl_set_mem_attributes():
          kvm_arch_gmem_set_mem_attributes() // maliciously set pages to private in RMP table
  CPU0:   truncate_inode_pages_range()       // HOST BLOWS UP TOUCHING PRIVATE PAGES
          kvm_gmem_invalidate_end()

One quick and lazy solution is to rely on the fact that
kvm_vm_ioctl_set_mem_attributes() holds the kvm->slots_lock throughout the
entire begin()/end() portion of the invalidation sequence, and to similarly
hold the kvm->slots_lock throughout the begin()/end() sequence in
kvm_gmem_punch_hole() to prevent any interleaving.
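Concretely, the lazy version would just be something like the following
(sketch only; same caveats as above about helper names):

```c
	/* Sketch: kvm_gmem_punch_hole() serialized against the attributes ioctl */
	mutex_lock(&kvm->slots_lock);
	filemap_invalidate_lock(file->f_mapping);

	kvm_gmem_invalidate_begin(kvm, gmem, start, end);
	kvm_arch_gmem_invalidate(kvm, gmem, start, end);
	truncate_inode_pages_range(file->f_mapping, offset, offset + len - 1);
	kvm_gmem_invalidate_end(kvm, gmem, start, end);

	filemap_invalidate_unlock(file->f_mapping);
	mutex_unlock(&kvm->slots_lock);
```

Since kvm_vm_ioctl_set_mem_attributes() already runs under kvm->slots_lock,
CPU1 in the trace above could then no longer slip in between the RMP cleanup
and the truncate.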

I'd imagine overloading kvm->slots_lock is not the proper approach, though.
Would introducing a similar mutex to keep these operations grouped/atomic be
a reasonable approach to you, or should we be doing something else entirely
here?
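If a dedicated lock is more palatable, I was picturing something along these
lines (the lock name is just a placeholder, not anything that exists today):

```c
	/* In struct kvm (hypothetical field): */
	struct mutex gmem_attr_lock;	/* serializes RMP-affecting gmem ops */

	/* kvm_vm_ioctl_set_mem_attributes(): */
	mutex_lock(&kvm->gmem_attr_lock);
	/* ... mem_attr_array update + kvm_arch_gmem_set_mem_attributes() ... */
	mutex_unlock(&kvm->gmem_attr_lock);

	/* kvm_gmem_punch_hole(): */
	mutex_lock(&kvm->gmem_attr_lock);
	/* ... invalidate_begin() through invalidate_end() ... */
	mutex_unlock(&kvm->gmem_attr_lock);
```

Being a mutex rather than a spinlock, it would also be compatible with the
sleeping operations mentioned below.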

Keep in mind that RMP updates can't be done while holding the kvm->mmu_lock
spinlock, because we also need to unmap pages from the direct map, which can
lead to scheduling-while-atomic BUG()s[1], so that's another constraint we
need to work around.

Thanks!

-Mike

[1] https://lore.kernel.org/linux-coco/20221214194056.161492-7-michael.roth@xxxxxxx/T/#m45a1af063aa5ac0b9314d6a7d46eecb1253bba7a
