Re: [PATCH v7 09/14] KVM: Extend the memslot to support fd-based private memory

From: Sean Christopherson
Date: Fri Jul 29 2022 - 15:51:44 EST


On Wed, Jul 06, 2022, Chao Peng wrote:
> @@ -1332,9 +1332,18 @@ yet and must be cleared on entry.
> __u64 userspace_addr; /* start of the userspace allocated memory */
> };
>
> + struct kvm_userspace_memory_region_ext {
> + struct kvm_userspace_memory_region region;
> + __u64 private_offset;
> + __u32 private_fd;
> + __u32 pad1;
> + __u64 pad2[14];
> +};
> +
> /* for kvm_memory_region::flags */
> #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
> #define KVM_MEM_READONLY (1UL << 1)
> + #define KVM_MEM_PRIVATE (1UL << 2)

Very belatedly following up on prior feedback...

| I think a flag is still needed, the problem is private_fd can be safely
| accessed only when this flag is set, e.g. without this flag, we can't
| copy_from_user these new fields since they don't exist for previous
| kvm_userspace_memory_region callers.

I forgot about that aspect of things. We don't technically need a dedicated
PRIVATE flag to handle that, but it does seem to be the least awful soltuion.
We could either add a generic KVM_MEM_EXTENDED_REGION or an entirely new
ioctl(), e.g. KVM_SET_USER_MEMORY_REGION2, but in both approaches there's a decent
chance that we'll end up needed individual "this field is valid" flags anways.

E.g. if KVM requires pad1 and pad2 to be zero to carve out future extensions,
then we're right back here if some future extension needs to treat '0' as a legal
input.

TL;DR: adding KVM_MEM_PRIVATE still seems like the best approach.

> @@ -4631,14 +4658,35 @@ static long kvm_vm_ioctl(struct file *filp,
> break;
> }
> case KVM_SET_USER_MEMORY_REGION: {
> - struct kvm_userspace_memory_region kvm_userspace_mem;
> + struct kvm_user_mem_region mem;
> + unsigned long size;
> + u32 flags;
> +
> + kvm_sanity_check_user_mem_region_alias();
> +
> + memset(&mem, 0, sizeof(mem));
>
> r = -EFAULT;
> - if (copy_from_user(&kvm_userspace_mem, argp,
> - sizeof(kvm_userspace_mem)))
> +
> + if (get_user(flags,
> + (u32 __user *)(argp + offsetof(typeof(mem), flags))))
> + goto out;


Indentation is funky. It's hard to massage this into something short and
readable What about capturing the offset separately? E.g.

struct kvm_user_mem_region mem;
unsigned int flags_offset = offsetof(typeof(mem), flags));
unsigned long size;
u32 flags;

kvm_sanity_check_user_mem_region_alias();

memset(&mem, 0, sizeof(mem));

r = -EFAULT;
if (get_user(flags, (u32 __user *)(argp + flags_offset)))
goto out;

But this can actually be punted until KVM_MEM_PRIVATE is fully supported. As of
this patch, KVM doesn't read the extended size, so I believe the diff for this
patch can simply be:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index da263c370d00..5194beb7b52f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4640,6 +4640,10 @@ static long kvm_vm_ioctl(struct file *filp,
sizeof(kvm_userspace_mem)))
goto out;

+ r = -EINVAL;
+ if (mem.flags & KVM_MEM_PRIVATE)
+ goto out;
+
r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem);
break;
}