Re: [PATCH v3 24/30] vfio-pci/zdev: wire up group notifier

From: Alex Williamson
Date: Tue Feb 08 2022 - 14:26:34 EST


On Tue, 8 Feb 2022 14:51:41 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Tue, Feb 08, 2022 at 10:43:19AM -0700, Alex Williamson wrote:
> > On Fri, 4 Feb 2022 16:15:30 -0500
> > Matthew Rosato <mjrosato@xxxxxxxxxxxxx> wrote:
> >
> > > KVM zPCI passthrough device logic will need a reference to the associated
> > > kvm guest that has access to the device. Let's register a group notifier
> > > for VFIO_GROUP_NOTIFY_SET_KVM to catch this information in order to create
> > > an association between a kvm guest and the host zdev.
> > >
> > > Signed-off-by: Matthew Rosato <mjrosato@xxxxxxxxxxxxx>
> > > ---
> > >  arch/s390/include/asm/kvm_pci.h  |  2 ++
> > >  drivers/vfio/pci/vfio_pci_core.c |  2 ++
> > >  drivers/vfio/pci/vfio_pci_zdev.c | 46 ++++++++++++++++++++++++++++++++
> > >  include/linux/vfio_pci_core.h    | 10 +++++++
> > >  4 files changed, 60 insertions(+)
> > >
> > > diff --git a/arch/s390/include/asm/kvm_pci.h b/arch/s390/include/asm/kvm_pci.h
> > > index e4696f5592e1..16290b4cf2a6 100644
> > > --- a/arch/s390/include/asm/kvm_pci.h
> > > +++ b/arch/s390/include/asm/kvm_pci.h
> > > @@ -16,6 +16,7 @@
> > >  #include <linux/kvm.h>
> > >  #include <linux/pci.h>
> > >  #include <linux/mutex.h>
> > > +#include <linux/notifier.h>
> > >  #include <asm/pci_insn.h>
> > >  #include <asm/pci_dma.h>
> > >
> > > @@ -32,6 +33,7 @@ struct kvm_zdev {
> > >  	u64 rpcit_count;
> > >  	struct kvm_zdev_ioat ioat;
> > >  	struct zpci_fib fib;
> > > +	struct notifier_block nb;
> > >  };
> > >
> > >  int kvm_s390_pci_dev_open(struct zpci_dev *zdev);
> > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > > index f948e6cd2993..fc57d4d0abbe 100644
> > > --- a/drivers/vfio/pci/vfio_pci_core.c
> > > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > > @@ -452,6 +452,7 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
> > >
> > >  	vfio_pci_vf_token_user_add(vdev, -1);
> > >  	vfio_spapr_pci_eeh_release(vdev->pdev);
> > > +	vfio_pci_zdev_release(vdev);
> > >  	vfio_pci_core_disable(vdev);
> > >
> > >  	mutex_lock(&vdev->igate);
> > > @@ -470,6 +471,7 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_close_device);
> > >  void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev)
> > >  {
> > >  	vfio_pci_probe_mmaps(vdev);
> > > +	vfio_pci_zdev_open(vdev);
> > >  	vfio_spapr_pci_eeh_open(vdev->pdev);
> > >  	vfio_pci_vf_token_user_add(vdev, 1);
> > >  }
> >
> > If this handling were for a specific device, I think we'd be suggesting
> > this is the point at which we cross over to a vendor variant making use
> > of vfio-pci-core rather than hooking directly into the core code.
>
> Personally, I think it is wrong layering for VFIO to be aware of KVM
> like this. This marks the first time that VFIO core code itself is
> being made aware of the KVM linkage.

I agree, but I've resigned myself to having lost that battle. Both mdev vGPU
vendors make specific assumptions about running on a VM. VFIO was
never intended to be tied to KVM or the specific use case of a VM.

> It copies the same kind of design the s390 specific mdev use of
> putting VFIO in charge of KVM functionality. If we are doing this we
> should just give up and admit that KVM is a first-class part of struct
> vfio_device and get rid of the notifier stuff too, at least for s390.

Euw. You're right, I really don't like vfio core code embracing this
dependency for s390, device specific use cases are bad enough.

> Reading the patches and descriptions pretty much everything is boiling
> down to 'use vfio to tell the kvm architecture code to do something' -
> which I think needs to be handled through a KVM side ioctl.

AIF at least sounds a lot like the reason we invented the irq bypass
mechanism to allow interrupt producers and consumers to register
independently and associate to each other with a shared token.

Is the purpose of IOAT to associate the device to a set of KVM page
tables? That seems like a container or future iommufd operation. I
read DTSM as supported formats for the IOAT.

> Or, at the very least, everything needs to be described in some way
> that makes it clear what is happening to userspace, without kvm,
> through these ioctls.

As I understand the discussion here:

https://lore.kernel.org/all/20220204211536.321475-15-mjrosato@xxxxxxxxxxxxx/

The assumption is that there is no non-KVM userspace currently. This
seems like a regression to me.

> This seems especially true now that it seems s390 PCI support is
> almost truly functional, with actual new userspace instructions to
> issue MMIO operations that work outside of KVM.
>
> I'm not sure how this all fits together, but I would expect an outcome
> where DPDK could run on these new systems and not have to know
> anything more about s390 beyond using the proper MMIO instructions via
> some compilation time enablement.

Yes, fully enabling zPCI with vfio, but only for KVM, is not optimal.

> (I've been reviewing s390 patches updating rdma for a parallel set of
> stuff)
>
> > this is meant to extend vfio-pci proper for the whole arch. Is there a
> > compromise in using #ifdefs in vfio_pci_ops to call into zpci specific
> > code that implements these arch specific hooks and the core for
> > > everything else? SPAPR code could probably be converted similarly, it
> > exists here for legacy reasons. [Cc Jason]
>
> I'm not sure I get what you are suggesting? Where would these ifdefs
> be?

Essentially just:

static const struct vfio_device_ops vfio_pci_ops = {
	.name		= "vfio-pci",
#ifdef CONFIG_S390
	.open_device	= vfio_zpci_open_device,
	.close_device	= vfio_zpci_close_device,
	.ioctl		= vfio_zpci_ioctl,
#else
	.open_device	= vfio_pci_open_device,
	.close_device	= vfio_pci_core_close_device,
	.ioctl		= vfio_pci_core_ioctl,
#endif
	.read		= vfio_pci_core_read,
	.write		= vfio_pci_core_write,
	.mmap		= vfio_pci_core_mmap,
	.request	= vfio_pci_core_request,
	.match		= vfio_pci_core_match,
};

It would at least provide more validation/exercise of the core/vendor
split. Thanks,

Alex