Re: [PATCH v3 07/10] KVM: arm64: Provide a PV_TIME device to user space

From: Jonathan Cameron
Date: Thu Aug 22 2019 - 09:45:48 EST


On Thu, 22 Aug 2019 12:11:55 +0100
Steven Price <steven.price@xxxxxxx> wrote:

> On 22/08/2019 11:57, Jonathan Cameron wrote:
> > On Wed, 21 Aug 2019 16:36:53 +0100
> > Steven Price <steven.price@xxxxxxx> wrote:
> >
> >> Allow user space to inform the KVM host where in the physical memory
> >> map the paravirtualized time structures should be located.
> >>
> >> A device is created which provides the base address of an array of
> >> Stolen Time (ST) structures, one for each VCPU. There must be (64 *
> >> total number of VCPUs) bytes of memory available at this location.
> >>
> >> The address is given in terms of the physical address visible to
> >> the guest and must be page aligned. The guest will discover the address
> >> via a hypercall.
> >>
> >> Signed-off-by: Steven Price <steven.price@xxxxxxx>
> >
> > Hi Steven,
> >
> > One general question inline. I'm not particularly familiar with this area
> > of the kernel, so maybe I'm missing something obvious, but having
> > .destroy free the kvm_device which wasn't created in .create seems
> > 'unusual'.
> >
> > Otherwise, FWIW looks good to me.
> >
> > Jonathan
> >
> [...]
> >> +static void kvm_arm_pvtime_destroy(struct kvm_device *dev)
> >> +{
> >> + struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
> >> +
> >> + pvtime->st_base = GPA_INVALID;
> >> + kfree(dev);
> >
> > Nothing to do with your patch as such... All users do the same.
> >
> > This seems unbalanced. Why do we need to free the device by hand
> > when we didn't create it in the create function? I appreciate
> > the comments say this is needed, but as far as I can see every
> > single callback does kfree(dev) at the end which seems an
> > odd thing to do.
>
> Yes I think this is odd too - indeed when I initially wrote this I
> missed off the kfree() call and had to track down the memory leak.
>
> When I looked into potentially tidying this up I found some other
> oddities, e.g. "kvm-xive" (arch/powerpc/kvm/book3s_xive.c) doesn't have
> a destroy callback. But I can't see anything in the common code which
> deals with that case. So I decided to just "go with the flow" at the
> moment, since I don't understand how some of these existing devices work
> (perhaps they are already broken?).

It does have a .release callback, however, and kvm_device_release also
removes the device from the list that kvm_destroy_devices would
otherwise walk and clear.

kvm_device_release is the release callback for the device's file
operations, so it 'might' be called in all paths.
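
To make the release/destroy split concrete, here is a rough user-space
model of it. Every name below (model_dev, model_fd_release and so on) is
made up for illustration and is not a kernel API; the real locking and
reference counting are left out. It only shows why a device with
.release but no .destroy need not be seen by the VM teardown loop.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_dev;
struct model_vm;

struct model_ops {
	void (*destroy)(struct model_dev *dev);	/* run at VM teardown */
	void (*release)(struct model_dev *dev);	/* run when the device fd closes */
};

struct model_dev {
	const struct model_ops *ops;
	struct model_vm *vm;
	struct model_dev *next;
};

struct model_vm {
	struct model_dev *devices;	/* devices still owned by the VM */
	int freed;			/* how many devices have been freed */
};

/* Shared by both callbacks: the callback itself frees the device. */
static void model_free_dev(struct model_dev *dev)
{
	dev->vm->freed++;
	free(dev);
}

static struct model_dev *model_add(struct model_vm *vm,
				   const struct model_ops *ops)
{
	struct model_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->ops = ops;
	dev->vm = vm;
	dev->next = vm->devices;
	vm->devices = dev;
	return dev;
}

/* fd close path: only devices that provide .release are unlinked here. */
static void model_fd_release(struct model_dev *dev)
{
	struct model_dev **p = &dev->vm->devices;

	if (!dev->ops->release)
		return;
	while (*p && *p != dev)
		p = &(*p)->next;
	if (*p)
		*p = dev->next;
	dev->ops->release(dev);
}

/* VM teardown: whatever is still on the list gets .destroy. */
static void model_vm_destroy(struct model_vm *vm)
{
	while (vm->devices) {
		struct model_dev *dev = vm->devices;

		vm->devices = dev->next;
		dev->ops->destroy(dev);
	}
}

static const struct model_ops release_only_ops = { .release = model_free_dev };
static const struct model_ops destroy_only_ops = { .destroy = model_free_dev };

/* Returns the number of devices freed, or -1 if allocation failed. */
static int model_lifecycle_demo(void)
{
	struct model_vm vm = { 0 };
	struct model_dev *a = model_add(&vm, &release_only_ops);
	struct model_dev *b = model_add(&vm, &destroy_only_ops);

	if (!a || !b)
		return -1;	/* demo only: no cleanup on allocation failure */
	model_fd_release(a);	/* unlinked and freed by .release */
	model_fd_release(b);	/* no .release, so it stays on the list */
	model_vm_destroy(&vm);	/* only b is left, and b has .destroy */
	return vm.freed;
}
```

In this model both devices are freed exactly once, provided the fd
release path really does run for the release-only device before the VM
goes away.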

Fun though: in kvm_ioctl_create_device, the error handling for a failed
anon_inode_getfd call invokes ops->destroy without checking that it
exists. Boom.

Possibly never happens in reality but looks like a bug to me.
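
For illustration only, the kind of guard I have in mind looks roughly
like the user-space sketch below. The names (sketch_create, sketch_getfd
and friends) are hypothetical stand-ins, and falling back to .release
when .destroy is absent is my assumption about one possible fix, not
what the kernel does.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model of the create error path; not kernel code. */
struct sketch_dev;

struct sketch_ops {
	void (*destroy)(struct sketch_dev *dev);	/* may be NULL */
	void (*release)(struct sketch_dev *dev);	/* may be NULL */
};

struct sketch_dev {
	const struct sketch_ops *ops;
};

static int sketch_freed;	/* counts devices freed through a callback */

static void sketch_free(struct sketch_dev *dev)
{
	sketch_freed++;
	free(dev);
}

/* Stand-in for the fd allocation step, which can be made to fail. */
static int sketch_getfd(int should_fail)
{
	return should_fail ? -1 : 3;
}

static const struct sketch_ops sketch_release_only = { .release = sketch_free };

/*
 * Returns the fd on success and hands the device back through *out.
 * On failure the device is torn down through whichever callback exists.
 */
static int sketch_create(const struct sketch_ops *ops, int fd_should_fail,
			 struct sketch_dev **out)
{
	struct sketch_dev *dev = calloc(1, sizeof(*dev));
	int ret;

	*out = NULL;
	if (!dev)
		return -1;
	dev->ops = ops;

	ret = sketch_getfd(fd_should_fail);
	if (ret < 0) {
		/* An unguarded ops->destroy(dev) would call through NULL here. */
		if (ops->destroy)
			ops->destroy(dev);
		else if (ops->release)
			ops->release(dev);
		else
			free(dev);
		return ret;
	}

	*out = dev;
	return ret;
}
```

With a release-only ops table and a forced fd failure, sketch_create
returns an error and frees the device through .release instead of
calling a NULL .destroy.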

Jonathan


>
> Steve
>
> >> +}
> >> +
> >> +static int kvm_arm_pvtime_set_attr(struct kvm_device *dev,
> >> + struct kvm_device_attr *attr)
> >> +{
> >> + struct kvm *kvm = dev->kvm;
> >> + struct kvm_arch_pvtime *pvtime = &kvm->arch.pvtime;
> >> + u64 __user *user = (u64 __user *)attr->addr;
> >> + struct kvm_dev_arm_st_region region;
> >> +
> >> + switch (attr->group) {
> >> + case KVM_DEV_ARM_PV_TIME_REGION:
> >> + if (copy_from_user(&region, user, sizeof(region)))
> >> + return -EFAULT;
> >> + if (region.gpa & ~PAGE_MASK)
> >> + return -EINVAL;
> >> + if (region.size & ~PAGE_MASK)
> >> + return -EINVAL;
> >> + switch (attr->attr) {
> >> + case KVM_DEV_ARM_PV_TIME_ST:
> >> + if (pvtime->st_base != GPA_INVALID)
> >> + return -EEXIST;
> >> + pvtime->st_base = region.gpa;
> >> + pvtime->st_size = region.size;
> >> + return 0;
> >> + }
> >> + break;
> >> + }
> >> + return -ENXIO;
> >> +}
> >> +
> >> +static int kvm_arm_pvtime_get_attr(struct kvm_device *dev,
> >> + struct kvm_device_attr *attr)
> >> +{
> >> + struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
> >> + u64 __user *user = (u64 __user *)attr->addr;
> >> + struct kvm_dev_arm_st_region region;
> >> +
> >> + switch (attr->group) {
> >> + case KVM_DEV_ARM_PV_TIME_REGION:
> >> + switch (attr->attr) {
> >> + case KVM_DEV_ARM_PV_TIME_ST:
> >> + region.gpa = pvtime->st_base;
> >> + region.size = pvtime->st_size;
> >> + if (copy_to_user(user, &region, sizeof(region)))
> >> + return -EFAULT;
> >> + return 0;
> >> + }
> >> + break;
> >> + }
> >> + return -ENXIO;
> >> +}
> >> +
> >> +static int kvm_arm_pvtime_has_attr(struct kvm_device *dev,
> >> + struct kvm_device_attr *attr)
> >> +{
> >> + switch (attr->group) {
> >> + case KVM_DEV_ARM_PV_TIME_REGION:
> >> + switch (attr->attr) {
> >> + case KVM_DEV_ARM_PV_TIME_ST:
> >> + return 0;
> >> + }
> >> + break;
> >> + }
> >> + return -ENXIO;
> >> +}
> >> +
> >> +static const struct kvm_device_ops pvtime_ops = {
> >> + .name = "Arm PV time",
> >> + .create = kvm_arm_pvtime_create,
> >> + .destroy = kvm_arm_pvtime_destroy,
> >> + .set_attr = kvm_arm_pvtime_set_attr,
> >> + .get_attr = kvm_arm_pvtime_get_attr,
> >> + .has_attr = kvm_arm_pvtime_has_attr
> >> +};
> >> +
> >> +void kvm_pvtime_init(void)
> >> +{
> >> + kvm_register_device_ops(&pvtime_ops, KVM_DEV_TYPE_ARM_PV_TIME);
> >> +}
> >